After the first article (available here), we went through an analysis of our users' needs and gathered a lot of feedback, which we present in this second index.
We interviewed developers, community managers, and database administrators.
The results of those interviews are as follows. We have good news and bad news:
Bad news
Most community managers do not handle spam themselves
On common platforms like Facebook, YouTube, and similar apps, spam filtering is handled directly by the platform, and they do not allow external APIs to be integrated to extend their existing solutions.
So we have decided to focus on developers and database administrators for now and for the upcoming versions. If the need for integration with external systems comes up again, we will revisit it then.
There are existing solutions for platforms like WordPress
I found that there are some quite popular spam-filtering integrations for WordPress, some of which come bundled with it by default. After analyzing them, we found that these plugins only detect spam in comments that contain links. We therefore decided to take this into account while developing our module and to provide a more accurate solution that goes beyond link analysis.
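To make that limitation concrete, here is a minimal Python sketch of a link-only check, assuming a rule that flags a comment once it contains at least one URL. The URL pattern and the `min_links` parameter are hypothetical; this is not the code of any specific WordPress plugin. A spam comment without links passes straight through.

```python
import re

# Hypothetical URL pattern: matches "http://...", "https://..." or "www...."
# tokens up to the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_links(comment: str) -> int:
    """Return the number of URL-like tokens in a comment."""
    return len(URL_PATTERN.findall(comment))


def is_spam_link_only(comment: str, min_links: int = 1) -> bool:
    """Flag a comment only if it contains at least `min_links` links.

    Comments without links are never flagged, whatever their content.
    """
    return count_links(comment) >= min_links


if __name__ == "__main__":
    print(is_spam_link_only("Great deals at https://spam.example"))  # True
    print(is_spam_link_only("Buy cheap watches now!!!"))             # False
```

The second call is the blind spot: an obvious spam message with no link is never flagged.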
It is extremely hard to find a good dataset for this use case
For the machine learning that powers the software, we decided to train a model on a large dataset, so that it learns to detect spam from having seen many spam examples and can then easily classify the text we give it. Buuuut that was before we started looking for paid and free datasets online. There are very few datasets for website and comment spam, mainly because most of the time people focus on email spam, and that is where the research and data collection effort goes. In the next index, we will start searching for a dataset.
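The post does not say which model dixto will use. As an illustration of "learning from many labeled examples", here is a small multinomial naive Bayes classifier in Python, a common text-classification baseline swapped in for this purpose. The six training comments are made up; a real filter would need the large dataset discussed above, which is exactly the problem.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (comment, label) pairs.
TOY_DATA = [
    ("buy cheap watches now", "spam"),
    ("cheap pills limited offer buy now", "spam"),
    ("win money now click here", "spam"),
    ("great article thanks for sharing", "ham"),
    ("i disagree with the second point", "ham"),
    ("thanks this helped me fix my bug", "ham"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        """Count words per label; each label must appear at least once."""
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def _log_score(self, tokens: list[str], label: str) -> float:
        """log P(label) + sum of log P(token | label), smoothed."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        """Return the label with the highest posterior score."""
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda lbl: self._log_score(tokens, lbl))


if __name__ == "__main__":
    model = NaiveBayesSpamFilter()
    model.train(TOY_DATA)
    print(model.predict("buy cheap pills"))         # spam
    print(model.predict("thanks for the article"))  # ham
```

With only six examples the model just memorizes a handful of words, which illustrates why dataset size matters so much here.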
Good news
At least there is some good news, which can be summarized as follows:
Developers need a solution like dixto.
After talking with developers, we found that they may be really interested in a solution like the one dixto offers. We are developing an open-source SaaS for spam detection and explicit content mitigation, aimed at developers and database administrators. It is designed to integrate easily with any programming language and to help detect spam in text, images, or videos.
There is currently no solution for infected databases
We also found that there is no existing solution for people whose databases are already infected. This means that if you build your application today and decide three years later to integrate a spam filtering tool, you may have to go through your database yourself and delete everything manually. The software will provide a database integration that browses your selected tables and deletes unwanted content.
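A minimal sketch of such a database integration, using Python's built-in sqlite3 module. The table and column names, the `clean_table` helper, and the keyword-based `is_spam` predicate are all hypothetical; in the real tool the predicate would be the spam classifier. Because SQL identifiers cannot be bound as query parameters, the sketch validates them before building the statements.

```python
import re
import sqlite3
from typing import Callable

# Only plain identifiers are accepted, since table/column names are
# interpolated into the SQL text rather than bound as parameters.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def clean_table(
    conn: sqlite3.Connection,
    table: str,
    id_column: str,
    text_column: str,
    is_spam: Callable[[str], bool],
) -> int:
    """Delete rows whose text column is flagged by `is_spam`; return the count."""
    for name in (table, id_column, text_column):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    rows = conn.execute(f'SELECT "{id_column}", "{text_column}" FROM "{table}"').fetchall()
    # Collect the ids of flagged rows, skipping NULL text values.
    spam_ids = [(row_id,) for row_id, text in rows if text is not None and is_spam(text)]

    conn.executemany(f'DELETE FROM "{table}" WHERE "{id_column}" = ?', spam_ids)
    conn.commit()
    return len(spam_ids)


def demo() -> tuple[int, int]:
    """Build an in-memory table, clean it, and return (removed, remaining)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO comments (body) VALUES (?)",
        [("Nice post!",), ("Cheap pills at https://spam.example",), ("Thanks, this helped.",)],
    )
    # Hypothetical predicate standing in for the real classifier.
    removed = clean_table(conn, "comments", "id", "body", lambda t: "cheap pills" in t.lower())
    remaining = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    conn.close()
    return removed, remaining


if __name__ == "__main__":
    print(demo())  # (1, 2)
```

For large tables, reading rows in batches (for example with `cursor.fetchmany`) would avoid loading everything into memory; the sketch keeps `fetchall` for brevity.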
Dixto can be developed as a simple solution
The software itself is not meant to be complex or complicated. It can be built as a simple API with a clean interface for key management and database cleaning utilities.
From this, we can conclude that there is a need for the software, and that the hard part will be the core of the software and the machine learning module. We will tackle that in the upcoming indexes.
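The key-management side can stay equally small. Below is a minimal in-memory sketch, assuming API keys are random tokens from Python's `secrets` module and that only their SHA-256 digests are stored, so a leaked key table does not expose usable keys. The `ApiKeyStore` class is hypothetical and not part of dixto; a real service would persist the digests in a database.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ApiKeyStore:
    """Hypothetical in-memory store that keeps only SHA-256 digests of keys."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # key digest -> owner name

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, owner: str) -> str:
        """Create a new key for `owner`; the plain key is returned only once."""
        key = secrets.token_urlsafe(32)
        self._owners[self._digest(key)] = owner
        return key

    def owner_of(self, key: str) -> Optional[str]:
        """Return the owner of a valid key, or None if unknown or revoked."""
        return self._owners.get(self._digest(key))

    def revoke(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        return self._owners.pop(self._digest(key), None) is not None


if __name__ == "__main__":
    store = ApiKeyStore()
    key = store.issue("alice")
    print(store.owner_of(key))  # alice
    store.revoke(key)
    print(store.owner_of(key))  # None
```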