Use content classification systems for better SPAM detection #10038
Comments
Considering the ClassifierReborn documentation:

- Bayesian Classifier: "Bayesian Classifiers are accurate, fast, and have modest memory requirements."
- Latent Semantic Indexer (LSI): "Latent Semantic Indexing engines are not as fast or as small as Bayesian classifiers, but are more flexible, providing fast search and clustering detection as well as semantic analysis of the text that theoretically simulates human learning."

As per the documentation: "As a rule, indexing will be fairly swift on modern machines until you have well over 500 documents indexed, or have an incredibly diverse vocabulary for your documents."

Using the LSI algorithm is a no-go: indexing 2,000 out of 30,000 comments already produced a word list of 9,100 entries.
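The Bayesian approach discussed above can be sketched in a few lines of plain Ruby. This is a toy stand-in for ClassifierReborn's Bayes classifier, only meant to illustrate the word-frequency idea; the class name and tokenization here are made up for this sketch, not the gem's actual implementation.

```ruby
# Minimal word-frequency Bayes sketch (hypothetical, stdlib only).
# Counts word frequencies per category and picks the category with
# the higher smoothed log-probability for new text.
class TinyBayes
  def initialize(*categories)
    @counts = categories.to_h { |c| [c, Hash.new(0)] }
    @totals = Hash.new(0)
  end

  def train(category, text)
    tokenize(text).each do |word|
      @counts[category][word] += 1
      @totals[category] += 1
    end
  end

  def classify(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    @counts.keys.max_by do |category|
      tokenize(text).sum do |word|
        # Laplace smoothing so unseen words don't zero out the score
        Math.log((@counts[category][word] + 1.0) / (@totals[category] + vocab))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

classifier = TinyBayes.new(:spam, :ham)
classifier.train(:spam, "You are the lucky winner! Claim your holiday prize.")
classifier.train(:ham, "I would like to propose a new bicycle lane on Main Street.")
classifier.classify("Claim your prize now, lucky winner") # => :spam
```

With ClassifierReborn itself the equivalent calls would be `ClassifierReborn::Bayes.new`, `train` and `classify`, but the scaling caveat from the documentation quoted above applies to LSI, not to this Bayes path.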
@alecslupu so far everything I have tried on your staging environment (leaving comments in different languages with the word Scort, adding and editing profiles, proposals, etc.) has worked just fine. It would be optimal to test it with Metadecidim data, being careful with personal data.
I have a feeling that my remarks may have gotten lost in the wild, scattered over multiple issues, discussions, PRs and chats, so I will post this information here collected in one piece so that it is not lost in the wilderness, as I suspect it might take a while to develop a good enough solution for SPAM classification. This issue is not only technical; it has a lot to do with the strategy for solving it properly. There is nothing wrong with the technical implementation done by @alecslupu (given the requirements), but the main problem is that the problem and the desired solution have not been researched and specified well enough to begin with. TL;DR / conclusions:
What are the remaining problems?

As I have already posted in the reviews of #11319 and #10151, the automatic classification boils down to two things:
How to fight SPAM?

Generally there is no single way to detect SPAM (such as text classification alone); you need to combine multiple signals based on the context being classified. There is some good conversation on this topic already, especially regarding spammer profiles at #8239, which also shows that the most reliable classification method relies not only on the content these users write in their profile bio but on multiple factors in how spammers typically behave. And as time goes on, they will learn, and the classification strategy has to be adjusted accordingly. Here is a good resource I found regarding classifying SPAM in general:

Note that typical email SPAM filters do not rely only on content classification; they use multiple factors to make more accurate guesses, e.g. content frequency, content tone, volume and sender verification. On top of this, some commercial SPAM classification tools can also check the sender against blocklists or check the location of the user (which is, by the way, one technique we have used successfully in the past to fight SPAM on a Decidim website).

This also applies to detecting the typical behavior of a profile spammer on a website. Most of them come from certain areas of the world and they typically behave in a certain way: they fill out their profile description, image and URL, have very few logins, and typically have a small amount of interactions on the platform, etc. The more of these factors are taken into account during classification, the more accurate the guess is. One algorithm we have previously used successfully to find profile spammers with very high accuracy:
This results in a list of users who are likely to be spammers, which can then be further analyzed by admins, who manually decide which are spammers and which are not. There are also a few considerations we have applied in order to nullify the guess, as there are some factors that indicate a user is not likely to be a spammer, such as:
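As a rough illustration of the multi-factor scoring approach described above, here is a hypothetical Ruby sketch. The field names (`sign_in_count`, `interactions`, etc.) and the weights are assumptions for illustration, not Decidim's actual schema or the exact algorithm we used:

```ruby
# Hypothetical multi-factor spammer scoring sketch. Each suspicious
# signal adds to a score, counter-signals nullify it, and accounts
# above a threshold are queued for manual admin review.
UserSnapshot = Struct.new(
  :bio, :personal_url, :has_avatar, :sign_in_count, :interactions,
  :verified, keyword_init: true
)

def spam_score(user)
  return 0 if user.verified # e.g. authorized against a census: not a spammer

  score = 0
  score += 2 unless user.bio.to_s.empty?          # filled-out bio
  score += 2 unless user.personal_url.to_s.empty? # filled-out personal URL
  score += 1 if user.has_avatar                   # uploaded a profile image
  score += 1 if user.sign_in_count <= 2           # barely ever logged in
  score += 1 if user.interactions.zero?           # no real participation
  score
end

def likely_spammers(users, threshold: 5)
  users.select { |u| spam_score(u) >= threshold }
end
```

The output is deliberately a candidate list for humans, not an automatic block list, which matches the manual-review step described above.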
Also, there are other ways already implemented to make profile spamming less interesting for spammers, such as limiting the link juice on the profile URLs (#12827, #8779, #9047). Another idea I have had would be to take the personal URL completely out of the profile as, at least in our experience, it is largely unnecessary; almost no real users use it. Even better if we could get rid of the profile description, but I understand that on some sites it might be nice to have. And by that I really mean "nice to have", because I don't think many people find it useful, at least in our experience. Currently both of these fields mostly just serve as honeypot traps to catch spammers.

This is mostly about profile SPAMming. Note that comment and content SPAM is a different problem to solve, but most of the points made here apply there as well.

How does the selected algorithm (Naïve Bayes) compare to alternatives?

I have been testing different algorithms and different strategies for classifying SPAM, including:
First of all, comparing Naïve Bayes and fastText, I see them as equally good for the scope of classifying SPAM. If they are fed the same training data, they are almost equally reliable. Last year I did some tests with these on the profile descriptions at MetaDecidim, and their performance on the classification task alone was comparable. fastText is able to do a lot more out of the box with its pre-trained models, such as language identification and word vectors (e.g. context classification for proposals). But if we only need to detect SPAM, there is really no difference between these algorithms. There might be some nuanced differences, but overall they perform equally well with the same training data. Regarding the LLMs, while there is a lot of hype around them being able to solve this kind of problem "automatically", there are some problems with using them here:
I tested some of these models (only the open versions of them) with the actual data to be analyzed (i.e. MetaDecidim profiles), and none of them performed any better than the simpler classification models. They can be used for the task, but using them requires a lot more energy and resources, without any added benefit over the alternatives. People get too excited when they see something behaving close to a human. The truth is that they are just algorithms that have had a lot of data fed to them. They can be used for SPAM detection, but they are definitely not the best tools for that. The SPAM detection problem boils down to being able to classify a text into a certain category, which can be done much more easily and efficiently.

In conclusion, I think that for the specific task at hand, Naïve Bayes is a fine selection. It just needs a lot of data samples to be reliable enough. Don't eat the LLM hype regarding this particular problem without chewing. Once there are enough content samples, and if there is still doubt about the classification algorithm, use PyCaret to compare different classification algorithms and their precision (it does not compare against LLMs, but that does not matter):

Content samples

Having large enough sample sizes for the data to be classified is the key to getting this working correctly. But there are several problems regarding this, such as not having enough clean content to train with. I believe finding SPAM is not the problem, but rather finding enough HAM (non-SPAM). The current implementation at #11319 relies on a sample set of 230 SPAM profiles, 5574 SMS messages and 104 SPAM comments. The only dataset that contains non-SPAM content is the SMS set. The total content distribution is about 90% SPAM and 10% HAM. I don't have to be too much of a statistical researcher or a wizard to predict that this will classify about 90% of the content as SPAM.
And just to verify this, (last year) I ran the current model against all profiles at MetaDecidim and it classified about 94% of the profiles as spammers. While there are a lot of spammer profiles, manual inspection shows that it makes a lot of mistakes on genuine profiles. E.g. "product manager", "product owner", "PO at company name", "building products with purpose", "Telecommunication Engineer - Developer", filling the profile description only with emojis, or writing the description in any language other than English will all be classified as SPAM, just as some examples. This is comparable to a model that classifies 9/10 profiles as SPAM at random. It is likely good at detecting SMS SPAM in its current state, but that is the most it can do, and there is no SMS SPAM in Decidim. It might find some comment SPAM, but probably makes a lot of mistakes there too. It won't be able to detect profile SPAM reliably in its current state. It is only as good as the data you feed it. We need a lot more samples of data for each content category we are classifying. The more data it is fed, the more reliable the guessing will be. And the data needs to have close to equal weights of both SPAM and HAM. I would start with a ballpark of 100k samples for both HAM and SPAM per category. This means that the datasets needed are:
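The point about the 90/10 distribution can be made concrete with a small Ruby sketch: a degenerate "model" that labels everything SPAM already reaches 90% accuracy on such a dataset while flagging every genuine profile, which is why per-class recall matters more than raw accuracy here. The numbers below are made up for illustration:

```ruby
# Why accuracy is misleading on a ~90% SPAM / 10% HAM dataset:
# compare an always-SPAM baseline against the true labels.
labels      = [:spam] * 90 + [:ham] * 10
predictions = [:spam] * 100 # the degenerate always-SPAM "model"

accuracy = labels.zip(predictions).count { |l, p| l == p } / labels.size.to_f

# Recall on the HAM class: how many genuine profiles survive?
ham_recall = labels.zip(predictions)
                   .select { |l, _| l == :ham }
                   .count { |l, p| l == p } / 10.0

accuracy   # => 0.9 -- looks great on paper
ham_recall # => 0.0 -- but every genuine profile is flagged
```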
I believe finding SPAM would not be the biggest problem here, but finding large enough sets of good content probably will be. From MetaDecidim alone you can get about 30k SPAM profile descriptions, as most of the registered users are spammers, but it needs to be manually classified to begin with. Here is where LLMs might be helpful. If it's not possible to find this much actual data, you can take all the data that is available and generate the missing part with LLMs by feeding them all the source data. I actually tried this as an experiment: I used only 10 HAM and 10 SPAM descriptions to generate about 200k entries using the open GPT-2 model, and the end result was a dataset which I fed to the Naïve Bayes classifier. It categorized about 70% of the profiles as SPAM and 30% as HAM. It probably still makes a lot of mistakes, as no further work was done on that data, but it could be improved by experimenting with different models and feeding it more accurate data to begin with. It just shows that it is likely to make fewer mistakes than the current model, because a lot more data has been fed to it. And yes, on the other hand it will also miss some spammers, but I would suggest aiming for a more precise model rather than a model that makes a lot of mistakes. The problem with generating this type of content using LLMs is again the data they have been trained on. The GPT-2 model, for instance, puts a lot of weight on news and wiki articles, which gives you somewhat of an idea of the content it will generate. The lucky part regarding the profiles context is that many news articles contain content that resembles a profile description, e.g. "[...] says Dr. John Doe who works as the research lead at XYZ". And on the other hand, it has read a lot of ads too, as most news articles contain ads. It would need more research and developing the prompts in a way that makes these models generate the needed content, but it might work.
It just needs more research on how to develop the correct prompts and which model to use. If this approach is used, the final content still needs to be manually classified by humans into the desired categories. Some cleanup work is also needed on the datasets, and some entries they create need to be omitted because they make no sense. This is some amount of manual work that can also be crowdsourced.

Content language

Another problem for a working model is the content language, which is not only English. There are two alternative ways to tackle this problem:
The first one would be easier to implement but, again, it requires a heavy model to run and would add a GPU requirement for the system to work fast. Detecting the language is easy and can be done fast even on a CPU (e.g. using fastText), but translating text to another language is slower and works best using a GPU. The remaining option is then to translate all the content to all possible languages that the spammers are using. This can also include languages used by the platform, as sometimes the spammers target those languages specifically. Also note that in some cases participants can post genuine content in languages that are not configured for the instance (there may be language groups that the city serves in their language even if they don't officially translate all the website content to all these languages). So posting in some language other than those configured for the system is not always a direct indication of a spammer.

In the following table I have collected all the languages used by the spammers based on the MetaDecidim profile analysis, and also the languages currently supported by Decidim based on the language files shipped within the Decidim gems (I know many of these languages are not actually translated). There is an open tool called Argos Translate that supports most of these languages, the only exception being Karakalpak (kaa), which is not actually translated currently in Decidim although the language files are shipped with the gems. It works fairly well and is based on the data available through the OpenNMT translations. Note that this might have changed, as the analysis was done last year, but it can serve as a starting point. The explanations of the table columns:
I would assume that training the models specifically for each language would be the most reliable method, and then selecting the model based on automatic detection of the source text language, e.g. using fastText, which is fairly reliable, fast, and has a pre-trained model for language detection. Having the source content only in English should not be a limiting factor, as it is possible to translate the datasets to all these languages with only one exception (which is irrelevant). It might not be as reliable as using real content, but I don't see it as feasible to collect enough SPAM and HAM content in all these languages, or to find an LLM that supports all of them, so I believe some corners can be cut here.

Steps forward

With all of this I just wanted to transfer all the relevant points for further planning on how to proceed with this issue. It requires quite a lot more work than was initially planned for this issue. My suggestion would be as follows:
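The detect-then-route approach suggested above could look roughly like the following Ruby sketch. `detect_language` here is a crude stub standing in for a real detector such as fastText's pre-trained language identification model, and the per-language classifiers are dummies; only the routing logic is the point:

```ruby
# Route each text to a classifier trained for its detected language.
# The detection stub below only distinguishes a couple of scripts so
# the sketch stays self-contained; a real system would call a trained
# language-identification model instead.
def detect_language(text)
  return "ru" if text.match?(/\p{Cyrillic}/)
  return "zh" if text.match?(/\p{Han}/)
  "en"
end

def classify(text, classifiers:, fallback: "en")
  lang = detect_language(text)
  model = classifiers[lang] || classifiers[fallback]
  model.call(text)
end

# Usage with dummy per-language models (lambdas in place of trained
# classifiers):
classifiers = {
  "en" => ->(t) { t.include?("prize") ? :spam : :ham },
  "ru" => ->(_t) { :spam } # placeholder model
}
classify("Claim your prize", classifiers: classifiers) # => :spam
```

Falling back to a default language model keeps the system working even for languages without a trained classifier yet.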
Note that this process is not a guaranteed way to get a working classification system. A lot of the work that goes into machine learning is based on trial and error. This should give a good starting point for further work and evaluation, but there is no way to know how well it works in action other than trying it. If it fails, it will be a lot of wasted effort; if it works, it will be great. The only thing I can say is that the first version will definitely not work perfectly, but it would be more reliable than the current model. And once again, there will never be a perfect classifier, as the spammers will also learn. But if it works 80% of the time, it is 80% less work for all the administrators (where this problem is not already solved through other means). And finally, spammers are also capable of learning: once the method is created and shipped, it will only work for a certain period of time. As time goes on, the model needs to be adjusted as the spammers adjust their work.
Ref: SPAM06
This proposal was originally created by @ahukkanen and is available at
https://meta.decidim.org/processes/roadmap/f/122/proposals/16256
There are a couple of changes introduced by @decidim/product.
Is your feature request related to a problem? Please describe.
SPAM users are becoming a bigger and bigger problem for all Decidim instances. They register profiles to put advertisements in their profile bio or a SPAM link in their personal URL, and they are flooding the comments section with SPAM.
This is a real issue that is causing a lot of extra work for the moderators of the platform. We should apply some automation in order to help with their work.
Describe the solution you'd like
There is a gem available named Classifier Reborn which provides two alternative content classification algorithms:
Bayes - The system is trained using a predefined set of sentences to detect what is considered good and what is considered bad. When classifying content, it applies a word density search for the new content against this predefined database and provides a probability of whether the new content is considered good or bad.
Latent Semantic Indexer (LSI) - Behaves with similar logic as above but adds semantic indexing to the equation. Slower but more flexible.
More information available from:
Based on one of these algorithms, we could calculate a SPAM probability score for any content the user enters, plus the user profile itself when it is updated, because in past years we have seen many users who create SPAM profiles to get a backlink to their site for improved SEO scores.
The only automated action that will be taken is to report this user account (and SPAM contents) to the Moderation panel, so a human can review the report and hide/block it if it is indeed SPAM. In the future this could evolve into automatically hiding the content, once we have more experience.
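The report-only flow described above could be sketched as follows. Names like `SpamReport` and the threshold value are illustrative assumptions, not actual Decidim classes or settings:

```ruby
# Hypothetical sketch of the proposed flow: score the content, and if
# the probability crosses a threshold, file a report for human
# moderators instead of hiding anything automatically.
SpamReport = Struct.new(:reportable, :score, keyword_init: true)

REPORT_THRESHOLD = 0.8 # illustrative cut-off, would need tuning

def review_content(content, classifier:, reports:)
  score = classifier.call(content)
  # Only report; a human decides whether to hide or block.
  if score >= REPORT_THRESHOLD
    reports << SpamReport.new(reportable: content, score: score)
  end
  score
end

# Usage with a dummy scoring lambda standing in for the trained model:
reports = []
dummy = ->(text) { text.include?("prize") ? 0.95 : 0.05 }
review_content("Claim your holiday prize", classifier: dummy, reports: reports)
review_content("New bike lane proposal", classifier: dummy, reports: reports)
```

Keeping the automated side down to filing reports matches the issue's intent: the classifier narrows the moderators' queue but never takes the final decision.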
Describe alternatives you've considered
Additional context
The suggested content classification systems with the predefined databases are likely to work only for English. I haven't dug deeper into whether such databases are available for other languages.
But, in our experience, most of the SPAM users are spamming in English, so I think such classification systems could solve the problem at least for English SPAM.
If the classification needs to be applied to other languages as well, there could be some way to train the system further with other datasets. By default, it could just be trained in English to get rid of most of the SPAM users.
See original proposal at Metadecidim.
Could this issue impact users' private data?
No.
Funded by
Decidim Association
Acceptance criteria
When I run the command
bin/rails decidim:spam:train:moderation
Then the algorithm is trained with previously moderated content.
When I run the command
bin/rails decidim:spam:train:file[path/to/file]
Then the algorithm is trained with a spam database file.
When I block a participant
Then the algorithm is trained with its profile data.
When I hide a content
Then the algorithm is trained with its data.
When I create a proposal with some words that appear as spam (for instance, "You are the lucky winner! Claim your holiday prize.")
Then the system automatically reports this content.
When I edit a proposal with some words that appear as spam (for instance, "You are the lucky winner! Claim your holiday prize.")
Then the system automatically reports this content.
When I create a comment with some words that appear as spam (for instance, "You are the lucky winner! Claim your holiday prize.")
Then the system automatically reports this content.
When I edit a comment with some words that appear as spam (for instance, "You are the lucky winner! Claim your holiday prize.")
Then the system automatically reports this content.
When I edit my profile with some words that appear as spam (for instance, "You are the lucky winner! Claim your holiday prize.")
Then the system automatically reports this participant.
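The file-based training criterion above could be backed by a loader along these lines. The `label<TAB>text` line format mirrors the SMS spam corpus mentioned earlier in this thread, but is an assumption here, as is the classifier interface:

```ruby
# Illustrative loader for a rake task like decidim:spam:train:file.
# Reads a labelled dataset (assumed format: one "label<TAB>text" row
# per line) and feeds each valid row to the classifier; malformed or
# unknown-label rows are skipped.
def train_from_file(path, classifier)
  File.foreach(path) do |line|
    label, text = line.chomp.split("\t", 2)
    next unless %w(spam ham).include?(label) && text

    classifier.train(label.to_sym, text)
  end
end
```

Hooking the same `train` call into the block-participant and hide-content events would cover the other training criteria, so every moderation decision keeps improving the model.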