Avoiding non-English content

Hi,
Just wondering if this is the right way to avoid irrelevant content:

interaction.content contains_any "Centro"
AND
language.tag == "en"

We restrict mentions to English only but still receive quite a few posts in Spanish.
Example: https://twitter.com/Nico_Morena/status/508356203706343424

Can you please advise?

Thank you

Raisa Tsemel

Official comment

Hi,
While our language detection is correct in the majority of cases, there will be some circumstances when interactions are classified incorrectly. There are a number of ways to improve the strictness of your filter.

You can use the language.confidence target within your filter. It lets you specify, as a percentage, the minimum confidence required for the detected language. For example:

language.confidence >= 75 and
language.tag == "en"

This would only return interactions detected as English with a confidence of 75% or more.

Other targets you can use for Twitter include twitter.lang, which is the language detected by Twitter; you could require that language.tag and twitter.lang agree. Another target that can be used for language detection is twitter.user.lang. This is the language the user has set for their Twitter account rather than the language of an individual tweet, so it is not always the most reliable filter.

Using a combination of these targets you should be able to improve your results.
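
As a rough sketch, the targets above could be combined with the original filter along these lines. The threshold of 75 is only the example value used earlier in this reply, and it is an assumption that twitter.lang reports English with the same "en" code as language.tag, so check both against your own stream before relying on them.

```
interaction.content contains_any "Centro"
AND language.tag == "en"
AND language.confidence >= 75
AND twitter.lang == "en"
```

Because every condition is joined with AND, requiring twitter.lang limits the filter to Twitter interactions. twitter.user.lang is left out here since it reflects the account setting; adding it would probably tighten the filter further, at the cost of dropping some genuine English posts.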

Paul M.