Blacklist REGEX Support
I believe that Twitch should add REGEX support for AutoMod "Blocked words and phrases". This would be a very helpful feature for chat moderation, since a lot of users can easily get past a blocked term by intentionally misspelling a word. This would also allow streamers to enforce English-only characters in chat. I'm aware that Twitch allows using the "*" wildcard, but in many instances this is simply not enough.
I have a fairly basic but very useful idea for something Twitch should have for moderating a channel's chat. It is already implemented in Discord, and it is a very basic and extremely efficient tool: regex, or "regular expressions".
Regexes are extremely efficient at blocking exactly what you want. Take the word "idiota", for example: on Twitch you can block the word "idiota", but some users write "idiot4", "idiiota", "iiidiota", or even "iidi0t4". A regex can block all of those. Regexes are also useful for eliminating spam. A good regex system can also replace a word: instead of "idiota", the chat would show whatever you choose, such as "caso perdido" (lost cause). That is, the user types "idiota" and the chat shows "caso perdido". You can do many things with regexes; they are efficient and powerful, and they would be a basic tool for advanced, effective moderation.
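To make the "idiota" example concrete, here is a minimal Python sketch of how one pattern could cover the misspellings listed above and swap in a replacement. The pattern, the `censor` helper, and the choice of Python are assumptions for illustration only; nothing here reflects how Twitch or Discord actually implement filtering.

```python
import re

# Hypothetical pattern: every letter may be repeated, and "o"/"a" may be
# swapped for the look-alike digits "0"/"4".
IDIOTA = re.compile(r"i+d+i+[o0]+t+[a4]+", re.IGNORECASE)


def censor(message: str, replacement: str = "caso perdido") -> str:
    """Replace every match of the blocked word with the chosen text."""
    return IDIOTA.sub(replacement, message)


# The variants mentioned in the post.
samples = ["idiota", "idiot4", "idiiota", "iiidiota", "iidi0t4"]
blocked = [bool(IDIOTA.search(s)) for s in samples]

print(blocked)
print(censor("eres un idiot4"))
```

A single rule replaces what would otherwise be one blocked term per misspelling.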
Thanks for sharing this type of informative article. I have learned some right stuff here. I really like your articles.
The AutoMod Blocked Terms filter currently only checks whether the exact text appears somewhere within the message. The only dynamic function is the asterisk (*) character, which acts as a wildcard replacing any number of simple characters at that point in the sequence. This does not include spaces or similar-looking international characters, however, so all a user or bot needs to do is space a word out, and covering that requires an entirely new rule, if not several.
For example, a bot site with three words in its name is often posted by bots. These sites also use words that can be relevant on stream, so the individual words could never be blocked on their own. To account for the various characters and spacings, I would currently need to write as many rules for this one bot site as I have in total right now. The variations can be as simple as "Cool" vs "C O O L".
To deal with these en masse, it would be best to allow the use of Regex rules. The rules always begin with a forward slash (/) and end with flags after the final forward slash (/), making them easy to identify.
To explain how this helps: if the blocked term has an L, the user could replace it with an uppercase I. Covering that would take either two rules or a *, but the latter could affect other words unintentionally. With regex you could place (i|l) to catch either of only those two characters. Similar rules can account for a Unicode character from another language that looks the same as a Latin one; these are not currently caught by blocked terms. For instance, (e|е) would account for one case I have dealt with previously.
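As a rough sketch of the (i|l) and (e|е) idea, the Python below uses "hello" as a made-up blocked term. The term and the variable names are hypothetical, and Python's `re` module stands in for whatever engine Twitch might use.

```python
import re

# Hypothetical blocked term "hello". Each "l" may also be an "i" (with
# IGNORECASE that covers the uppercase I look-alike), and the "e" may be
# the Cyrillic letter U+0435, which looks the same on screen.
LOOKALIKE = re.compile(r"h(e|\u0435)(i|l)(i|l)o", re.IGNORECASE)

attempts = [
    "hello",        # plain spelling
    "heIIo",        # uppercase I in place of each l
    "h\u0435llo",   # Cyrillic "е" in place of the Latin "e"
]
caught = [bool(LOOKALIKE.search(a)) for a in attempts]

# Unlike a "*" wildcard, the alternation only accepts the listed characters.
unaffected = LOOKALIKE.search("help") is None

print(caught, unaffected)
```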
To return to my example bot site, I could have a single rule which is specific enough to catch posts about this site and wide enough to catch any variation of it, all while not interrupting any use of the words in its name. While these rules can become long from repeatedly checking for white space or spelled-out symbols, that is still preferable to many near-identical rules.
As Regex is a tool which has been widely used for some time there is a lot of existing documentation and online resources for users to reference and test their rules.
Worth noting that Regex has a flag for case insensitivity; however, Blocked Terms already behave this way, so the need for its use depends on the existing implementation.
The expression /Site[\s]*For([\s.(dot)])*pros/gi catches all of the following cases, each of which would otherwise need its own Blocked Term rule.
SiteFor . Pros
SiteFor (dot) Pros
Site For Pros
Site For .Pros
Site For. Pros
Site For . Pros
Site For (dot)Pros
Site For(dot) Pros
Site For (dot) Pros
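The list above can be checked mechanically. The Python sketch below runs the expression from the post against the nine cases; Python has no "g" flag and the "i" flag becomes `re.IGNORECASE`. One caveat worth knowing: inside square brackets, "(dot)" is read as the individual characters "(", "d", "o", "t", ")" rather than as a word, so the expression is a bit looser than it looks. The `STRICT` variant is my own assumption about how to tighten it, not part of the original suggestion.

```python
import re

# The expression from the post, translated to Python flags.
ORIGINAL = re.compile(r"Site[\s]*For([\s.(dot)])*pros", re.IGNORECASE)

# Assumed stricter variant: "(dot)" must appear as a whole token.
STRICT = re.compile(r"Site\s*For(?:\s|\.|\(dot\))*Pros", re.IGNORECASE)

CASES = [
    "SiteFor . Pros",
    "SiteFor (dot) Pros",
    "Site For Pros",
    "Site For .Pros",
    "Site For. Pros",
    "Site For . Pros",
    "Site For (dot)Pros",
    "Site For(dot) Pros",
    "Site For (dot) Pros",
]

original_hits = [bool(ORIGINAL.search(c)) for c in CASES]
strict_hits = [bool(STRICT.search(c)) for c in CASES]

# Because the class matches d, o and t one by one, the original expression
# also accepts unrelated letters between the words; the strict one does not.
loose_example = "Site For todo pros"
original_is_loose = bool(ORIGINAL.search(loose_example))
strict_is_loose = bool(STRICT.search(loose_example))

print(all(original_hits), all(strict_hits), original_is_loose, strict_is_loose)
```

Both versions replace nine separate Blocked Term rules with one; the stricter one just lowers the chance of false positives.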
We faced a similar problem about 20 years ago, in the form of spam about the trademarked V pill, back when SpamAssassin was just a baby. We created and sold an email filtering package that included a program we called WordAssassin to identify bad words. Users were able to add good and bad words to a couple of lists, and every time an email came in, Procmail would run WordAssassin, which would give the desired output. Actions included removing the word, returning "bad word found", adjusting the subject line, setting a return value, and so on.
Most of the world, ourselves included, started with a REGEX, but that didn't really work too well for catching the message, and every time an email came in a Perl program would run the tests. Needless to say, this overloaded the 450 MHz servers, and it would still be quite the task given the workload here. Furthermore, it didn't catch all the manipulations of the words very well. We created a C++ program to handle all the fun stuff spammers were doing to get their message through spam filters and to the user. My old sales pitch seems to fit here, so here we go.
It was always difficult to explain how spammers were using different character sets, spaces, etc., but customers were already seeing it anyway. I would explain that we had built an extremely fast C binary that performed a variety of tests to detect and provide short ***** protection, longer ***** protection, mangled ***** protection, HTML ***** protection, and so on. Yep, it got a lot of laughs, and the product is most likely still running somewhere today.
The core engine.cpp would solve this problem and has been available for licensing or for some good use for quite some time now at wordassassin.com
Currently, the list of blocked terms and phrases found at https://dashboard.twitch.tv/u/[CHANNEL]/settings/moderation/blocked-terms only matches directly against the input. This means that in order to ban variations of a word, one would have to input many millions of phrases through the API even for a single word (as shown here: https://twitter.com/thomsimonson/status/1429472208506822659).
Expand this page to allow soft-matching and regex-based matching entries (the latter being advanced, and potentially restricted to the API only).
Soft-matching would match every entry in the list not by its exact characters, but by any Latin variations, etc., so banning "information" (example) would also match "înfórmàtion" etc.
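One possible way to get this kind of soft-matching, sketched in Python: decompose both the blocked term and the message with Unicode normalization, drop the accent marks, and compare what is left. The `fold` and `soft_match` names are made up for this example, and this is only one approach under those assumptions; it handles accented Latin letters but not look-alike letters from other alphabets.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip accents and case so that 'înfórmàtion' folds to 'information'."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def soft_match(term: str, message: str) -> bool:
    """True if the folded blocked term occurs anywhere in the folded message."""
    return fold(term) in fold(message)


print(fold("înfórmàtion"))
print(soft_match("information", "some înfórmàtion here"))
```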
Regex-based matching would match via regular expressions (no further explanation needed).
Love AutoMod. But as a lead mod for Papa_pixels, which is a multicultural channel, having English only would really not work well for us. AutoMod should be multicultural.
I have a bot that can handle this. I've been using it to block our favorite "Wanna be famous" spam. It can target individual messages and delete them, rather than clearing the chat of all of a particular user's messages, in case of a false positive, and it keeps a log.
Their bot interface is IRC and it's plaintext and easy to use.
You can sort of do this already, for example if you want to ban excessive use of HeyGuys (like 3 times right after each other).
If you disallow HeyGuys HeyGuys HeyGuys
every message containing HeyGuys three or more times right after each other (even at the end of the message) is blocked.
The same goes for anything you could blacklist.
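For comparison, the same "three or more in a row" rule could be written as a single regular expression. This is only a Python sketch with a made-up `is_blocked` helper; it does not describe how the existing blocked-terms matching works internally.

```python
import re

# Three or more HeyGuys directly after each other, with optional
# whitespace between them.
TRIPLE = re.compile(r"(?:HeyGuys\s*){3,}")


def is_blocked(message: str) -> bool:
    """True if the message contains HeyGuys at least three times in a row."""
    return TRIPLE.search(message) is not None


print(is_blocked("hello HeyGuys HeyGuys HeyGuys"))
print(is_blocked("HeyGuys hi HeyGuys HeyGuys"))
```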