Use the TextCat plugin.
In /etc/spamassassin/local.pre add:
loadplugin Mail::SpamAssassin::Plugin::TextCat
In /etc/spamassassin/local.cf add:
ok_languages en af
Here “en af” is a space-separated list of the codes for the languages you wish to accept. The full list is in the TextCat documentation.
If the guessed language is not in the ok_languages list, the “UNWANTED_LANGUAGE_BODY” rule is triggered.
The results are stored as an “X-Language” name-value pair in the message metadata. This may be useful as Bayes tokens and can also be used in rules for scoring. The results can also be added to marked-up messages using “add_header” with the _LANGUAGES_ tag. See Mail::SpamAssassin::Conf for details.
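As a minimal sketch of the add_header approach, the following local.cf line would record the guessed languages on every processed message. The header name “Languages” is an arbitrary choice for illustration; SpamAssassin prefixes headers added this way with “X-Spam-”.

```
# local.cf: add TextCat's guesses to all messages (spam and ham).
# "Languages" is an illustrative name; the resulting header is "X-Spam-Languages".
add_header all Languages _LANGUAGES_
```

With a line like this in place, a message guessed to be English would be expected to carry a header along the lines of “X-Spam-Languages: en”.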
NOTE: The language cannot always be recognized with sufficient confidence. In that case, no action is taken.
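If too many messages end up unclassified, the plugin's tuning options may help. The fragment below is only a sketch: the values are illustrative assumptions, not recommendations, and the options and their defaults should be checked against the TextCat documentation for your installed version.

```
# local.cf: illustrative values only; verify against your version's TextCat docs.
# Maximum number of candidate languages before the result is treated as unknown.
textcat_max_languages 5
# Include any language whose score is within this factor of the best match.
textcat_acceptable_score 1.05
```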
IMPORTANT: The loadplugin line must go in a .pre file, not a .cf file, so that the plugin is loaded before the rules files are read. Otherwise the rules that depend on it will not be enabled.
You may also want to increase the score from the default of 2.8:
score UNWANTED_LANGUAGE_BODY 5
The language codes recognized by TextCat are:
af | Afrikaans |
am | Amharic |
ar | Arabic |
be | Byelorussian |
bg | Bulgarian |
bs | Bosnian |
ca | Catalan |
cs | Czech |
cy | Welsh |
da | Danish |
de | German |
el | Greek |
en | English |
eo | Esperanto |
es | Spanish |
et | Estonian |
eu | Basque |
fa | Persian |
fi | Finnish |
fr | French |
fy | Frisian |
ga | Irish Gaelic |
gd | Scottish Gaelic |
he | Hebrew |
hi | Hindi |
hr | Croatian |
hu | Hungarian |
hy | Armenian |
id | Indonesian |
is | Icelandic |
it | Italian |
ja | Japanese |
ka | Georgian |
ko | Korean |
la | Latin |
lt | Lithuanian |
lv | Latvian |
mr | Marathi |
ms | Malay |
ne | Nepali |
nl | Dutch |
no | Norwegian |
pl | Polish |
pt | Portuguese |
qu | Quechua |
rm | Rhaeto-Romance |
ro | Romanian |
ru | Russian |
sa | Sanskrit |
sco | Scots |
sk | Slovak |
sl | Slovenian |
sq | Albanian |
sr | Serbian |
sv | Swedish |
sw | Swahili |
ta | Tamil |
th | Thai |
tl | Tagalog |
tr | Turkish |
uk | Ukrainian |
vi | Vietnamese |
yi | Yiddish |
zh | Chinese (both Traditional and Simplified) |
zh.big5 | Chinese (Traditional only) |
zh.gb2312 | Chinese (Simplified only) |