Erfwiki:SpamWars
Revision as of 18:15, 14 May 2011

Branched off from main ErfWiki:Maintenance Portal page. (It's a Wiki. You don't like it, move it back.) Abb3w 22:15, 7 May 2011 (UTC)

From Maintenance Portal

  • As long as pages can be edited by anyone without a captcha-protected login, spambots can do whatever they want
  • Add spambot IPs to [[Category:Spammer]]
    • No IP is being used more than once. All this accomplishes is blacklisting a massive pile of random IPs. --ChroniclerC 21:49, 2 May 2011 (UTC)
      • The spambots are like that, but the wandering vandal I spent a few hours reverting last night does double back over his IPs a bit, probably using a limited number of proxies. Slapping those down would eventually deal with that part of the problem. --Pickled Tink 02:49, 7 May 2011 (UTC)
  • List of steps recommended by MediaWiki's manual:
    • Requiring user logins to edit pages
    • Requiring email and CAPTCHA validations of user creation
    • Requiring CAPTCHA for edits, from users who are not well known.
    • Blocking edits which add specific key words or external links
    • Blocking usernames and page title patterns that are commonly used by bots
    • Blocking registration using known spam domains (mail.ru)
    • Using several blacklist services
    • Cleanup scripts that revert changes caused by recently identified spammers
  • List of steps not recommended by MediaWiki's manual
    • Don't allow anonymous editing on low-volume/low-maintenance wikis

--- Question here, and I'm not sure where else to put this: Do you have an autorevert script in mind or already set up? Do you need help with one? Is there a specific place we could discuss this? A thread on the forum seems like a good idea, but I don't see one. Oh, and if I could post this, then a spammer can post whatever they want, too. Requiring user registration and hassling contributors until they're recognized as non-spammers might drive off some users, but spam drives many more people off. -- 00:08, 3 May 2011 (UTC)

-- Over at the unofficial exalted wiki, I found that the most successful thing with fixing spam was switching to a straight type-in-the-word captcha on account creation - recaptcha and logic puzzles are pretty much cracked, but the spammers are generally not targeting single wikis, so something like 'type in the name of the wiki' cut my spam down to zero. - Xyphoid

I would suggest something more specific - the KH Wiki asks you to type in the name of the main protagonist, so something like that would be wise. Like "Who is the Perfect Warlord? Parson Gotti." 05:12, 7 May 2011 (UTC)
I'll second this approach - spambots don't target sites in particular, they're just designed to crack common captchas, so site-unique captchas tend to kill them dead. They won't keep out deliberate spammers, of course, but they're far rarer than bots these days. --Tommy 21:13, 7 May 2011 (UTC)
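The site-unique question approach maps directly onto the ConfirmEdit extension's QuestyCaptcha mode. A rough, untested sketch of the LocalSettings.php lines — file paths and option names may differ by ConfirmEdit version, and the question/answer pair is just the example from above:

```php
# LocalSettings.php sketch — assumes the ConfirmEdit extension is installed;
# include paths and settings names may vary with the ConfirmEdit version.
require_once "$IP/extensions/ConfirmEdit/ConfirmEdit.php";
require_once "$IP/extensions/ConfirmEdit/QuestyCaptcha.php";
$wgCaptchaClass = 'QuestyCaptcha';

# A site-unique question, per the suggestion above.
$wgCaptchaQuestions[] = array(
    'question' => 'Who is the Perfect Warlord? (first and last name)',
    'answer'   => 'Parson Gotti',
);

# Challenge account creation, and edits that add external URLs.
$wgCaptchaTriggers['createaccount'] = true;
$wgCaptchaTriggers['addurl']        = true;
```

Swapping the question out later (Ansom, etc.) is a one-line edit, which fits the rotate-when-cracked strategy suggested above.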

-- Another ham-handed but simple method would be to create a bot that automatically bans any new user with a username between five and eight characters in length (true of almost every single bot so far), and leave a note on the registration page that this happens. I merely add it here because the lazy option must always be presented. --Pickled Tink 11:21, 7 May 2011 (UTC)
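As a sketch of the lazy option above: the length check itself is trivial, and could be run over the new-user log to flag names for review rather than auto-banning outright. The usernames and thresholds below are illustrative only; actual blocking would still go through an admin or the MediaWiki API.

```python
# Sketch of the length heuristic above: flag (rather than auto-ban) new
# usernames of five to eight characters for human review.

def flag_suspicious_usernames(usernames, lo=5, hi=8):
    """Return the subset of usernames whose length falls in [lo, hi]."""
    return [name for name in usernames if lo <= len(name) <= hi]

# Example run over a hypothetical batch from the new-user log:
batch = ["Abb3w", "PickledTink", "Xqzpf", "ChroniclerC", "Hb7d2k9a"]
print(flag_suspicious_usernames(batch))  # ['Abb3w', 'Xqzpf', 'Hb7d2k9a']
```

Note the obvious false positives (plenty of legitimate users have short names), which is why flag-and-review beats auto-ban here.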

-- I see you're running Apache. Do you have access to change Apache or is this shared hosting? If you can change your Apache configs and install modules I would highly suggest installing mod_security and then ASL Lite (free ruleset for modsec). http://www.gotroot.com Won't stop all your spam but it would cut down some of the nastier stuff. --

Strategery Suggestions

For Manual Hunters

There are probably more effective ways, in the long run. However, there's something to be said for the quality of wetware AI. So, if you care to start hunting through Uncategorized Pages or (ick) all the Recent Changes, you can. (Non-spam uncategorized pages ought to be considered as to whether a category can be found for them.) Once you find pages with Spam....

  1. Check the page history
    1. If the page is newly created, blank and label with {{delete}} to mark for speedy deletion.
    2. If the page existed before
      1. It may have been only vandalized once since last clean
        1. The history page has a handy "undo" link
        2. To help distinguish yourself from spambots, change "revision" to "vandalism" or variant thereof.
        3. You might also flag it as a minor edit
      2. It may have been vandalized more than once by spam-bot
        1. Start wading back through the changelog to find the last-clean variant
        2. Click timestamp link for that variant
        3. Edit, and give some manner of reverty-description for the summary
  2. Go back to that history, dude
    1. Find the spammer's change
    2. Click the link for the username or IP that made the offending change
    3. Check the associated user and/or talk pages
      1. See if there's already indication there that they're in [[Category:Spammer]] or [[Category:Banned]]
      2. If not, check the ban log to see if they're already banned
      3. Mark already banned accounts/IPs by adding {{banned}} to their talk and/or user page
      4. Mark accounts/IPs to be drawn to the Banhammer's attention by adding {{spammer}} to their talk and/or user page
    4. Check the user contributions; they may have more SPAM to their "credit"
  3. Go back to manual hunting...

For Bot Coding

...err... yeah, someone get on that

  • Install pywikipediabot (needs Python)
    • use python delete.py -cat:"Candidates for speedy deletion" -always to mass-delete all pages in that category. Maybe there should be a category just for spam.
    • use spamremove.py spamsite.com to find all pages containing spamsite.com and remove the spam link
    • maybe there are other useful bots in that package too
  • and, for erf's sake, start using MediaWiki:Spam-blacklist. Spam will drop off if they can't post the same link twice. --Baumgeist 18:26, 11 May 2011 (UTC)
    • Whacking the recent contributions page with a curl-grep-sed pipe suggests a good start might be adding these dot-com domains:
  1. allforresult
  2. allhomesearch
  3. awarefinance
  4. banansearch
  5. bestentrypoint
  6. bestsmartfind
  7. bigtopsale
  8. blogbasters
  9. dawnloadonline
  10. detailedlook
  11. esecuritys
  12. find24hs
  13. findfavour
  14. findthisall
  15. godsearchs
  16. google-reseach
  17. gottacatch
  18. greatfound
  19. hardlyfind
  20. lehmanbrotherbankruptcy
  21. listenhanced
  22. mydrugdir
  23. myglobaldirs
  24. mylinkdirs
  25. mysearchends
  26. mystarsearch
  27. rocketcarrental
  28. seachtop
  29. signforcover
  30. siteslocate
  31. sitesworlds
  32. starsearchtool
  33. tag.contextweb
  34. thedirsite
  35. toptopdir
  36. widesearchengine
to start getting a handle on the Centermen bots. Abb3w 16:20, 14 May 2011 (UTC)
    • Also add:
  1. pandorashopsonline
  2. levispeichern
  3. tiffanyjewelleryoutlet
  4. billigechanel
  5. tiffanyco-mall (a Dot-net rather than Dot-com)
...to start dealing with the H-bots
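The domain lists above could be turned into Spam-blacklist entries mechanically. A sketch — it assumes the SpamBlacklist extension's one-regex-fragment-per-line format, defaults the TLD to .com to match the Centermen list, and leaves per-domain exceptions (e.g. tiffanyco-mall being a dot-net) to the caller:

```python
# Sketch: turn bare spam domain names into SpamBlacklist-style regex
# fragments (one per line, regex metacharacters escaped). The TLD
# defaults to .com; pass tld="net" etc. for the exceptions noted above.
import re

def blacklist_lines(domains, tld="com"):
    """Escape each domain plus TLD into a regex fragment, one per line."""
    return [re.escape("%s.%s" % (d, tld)) for d in domains]

sample = ["allforresult", "tag.contextweb", "toptopdir"]
for line in blacklist_lines(sample):
    print(line)  # e.g. allforresult\.com
```

The output lines can be pasted straight into MediaWiki:Spam-blacklist, one fragment per line.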
  • So far I have found 2 types of spam.
      1. replaces the text of a page with a random engrish compliment. All of these have a long stream of random letters in the comment field. A bot could probably flag all edits with gibberish for comments as spam. Alternatively, we could probably find or compile a list of the phrases this spambot uses and search for those.
      2. replaces or creates an article with a spam message linking to another website. The first lines of these spam articles all appear not to be formatted the same way as our real articles -- they don't begin with a left-justified level 1 heading (=). Some begin with ==<center>. Others begin with plain text. If this is correct, a bot could probably tag any page that doesn't start with a left-justified level 1 heading (=) as spam. -- circa 13 May 2011
The first type has been commonly termed "cheerleaders"; some admin action (probably basic KittenAuth) seems to have temporarily stopped them. The other two look to be
      1. What I will dub "H-Bots", based on a few which were active 02:43 through 02:49 on 2011-05-13. They all have mixed alphanumeric names, mostly beginning with "H"; they edit their own User talk page with spam links.
      2. What I will dub "Centermen", based on the use of the center tag. They mostly have six-to-nine character phonetic but somewhat random names. Some stay inactive for a few minutes before posting, others sit around for weeks or months before spamming.
        1. A minor mutation seems to have shown up, which first posts, and then edits the links (e.g. the edit history of Business_Loan_From_Banks).
There also appear at one point to have been some bots using names on a "Word-Word(-number)" pattern back around February or so. They seem to mostly have been taken out and prevented by previous admin action. Abb3w 16:03, 14 May 2011 (UTC)
There also was the highly anomalous Draconianguy2, an apparent one-off (knock on wood) back on the 12th. Abb3w 23:15, 14 May 2011 (UTC)
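The two heuristics suggested above (gibberish edit summaries for the cheerleaders, pages not opening with a left-justified level-1 heading for the Centermen) are easy to prototype. A sketch — the vowel-ratio and length cutoffs are guesses that would need tuning against real edit data before anything acts on them automatically:

```python
# Prototype of the two spam heuristics described above. Thresholds are
# assumptions; flag for review rather than auto-revert until tuned.

def looks_gibberish(comment, min_len=12, max_vowel_ratio=0.25):
    """Flag long edit summaries with very few vowels as probable
    random-letter strings (the 'cheerleader' signature)."""
    letters = [c for c in comment.lower() if c.isalpha()]
    if len(letters) < min_len:
        return False
    vowels = sum(1 for c in letters if c in "aeiou")
    return vowels / len(letters) < max_vowel_ratio

def suspicious_page_start(wikitext):
    """Flag pages whose first line is not a left-justified level-1
    heading ('=Title=' but not '==...'), per the Centermen pattern."""
    first = wikitext.lstrip("\n").split("\n", 1)[0]
    return not (first.startswith("=") and not first.startswith("=="))
```

A bot run would apply these to Special:RecentChanges output and queue the hits for a human to confirm.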

For Wiki Work

Spotted... Abb3w 02:46, 8 May 2011 (UTC)


I've got it! Use KittenAuth. Normally it shows a bunch of pictures of different animals and asks you to select which one is the kitten, but you can customise it.

So you just load a heap of images of Erfworld characters, particularly Wanda, and ask people to select the picture of Wanda. That should stop even the human operators unless they actually know the comic, and it's simple enough to switch to a different Erfworld character like Ansom if they start to figure it out. http://www.mediawiki.org/wiki/Extension:KittenAuth --Charles 01:52, 8 May 2011 (UTC)

Looks like KittenAuth was effective for about 15 hours, uncustomized. Abb3w 05:31, 10 May 2011 (UTC)
Another possibility: Take your KittenAuth image library and make LOTS of versions of each picture (subtle changes to a couple of pixels, just recompressing it, etc.). One of the more common ways of bypassing it involves building URL-path or MD5 hashes of the good images, so more images is better: more "good" images means less chance that an already-used (and identified) "good" image will be presented in a given selection run. The core KittenAuth library has already been fingerprinted by a lot of spammers, so use your own custom pictures, and switch them out whenever its success rate starts falling. 21:11, 10 May 2011 (UTC)
You could probably generate the images on the fly. Have a large set of base images (and preferably more than one question, so that the correct answer is not always the same for a given set of base images). Then, when you need to display the auth images, randomly alter the base images (say, apply some distorting filter) to generate the images shown on that particular login attempt. The image generation would use a bit of CPU, so a downside is that it's a sweet target for DoS attacks.
You can probably reduce the amount of CPU required by the above system by using a small set of "good" images, loaded from the disk in uncompressed format, then adding some live text over them (which is fast when your source image is already uncompressed and no anti-aliasing is used), and JPEG-encoding the result on the fly. If you only modify the good image, a spammer who is really intent on breaking the wiki will have to fingerprint all the other images, then select the one that does not match any known fingerprint. To render statistical analysis unfeasible, you will need to run a timed script that generates a whole new set of bad images every so often (with the same system of adding text to the existing ones and saving them); once a day should be sufficient, unless the attacker is REALLY determined. This saves server CPU power with only a small cost in effectiveness. Incidentally, if you do add KittenAuth, you may want to make all images greyscale, for the benefit of color-blind users. 19:41, 11 May 2011 (UTC)
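The fingerprinting point above can be illustrated without an image library: any single-byte change to an image file gives it a completely different MD5 (or URL-path) fingerprint. A stdlib-only sketch on raw bytes — a real deployment would perturb pixel data or re-encode the JPEG with an imaging library so the change stays visually invisible:

```python
# Sketch: a one-byte tweak defeats hash-based fingerprinting of "good"
# CAPTCHA images. This operates on raw file bytes for illustration;
# real code would alter pixel data so the picture still looks right.
import hashlib

def perturbed_fingerprint(image_bytes, offset=-1):
    """Flip the low bit of one byte and return the new MD5 fingerprint."""
    data = bytearray(image_bytes)
    data[offset] ^= 0x01
    return hashlib.md5(bytes(data)).hexdigest()

original = b"\xff\xd8...pretend JPEG data...\xff\xd9"
print(hashlib.md5(original).hexdigest())
print(perturbed_fingerprint(original))  # differs from the original hash
```

Generating such a variant per request (or per the timed-regeneration scheme above) means a spammer's table of known-good hashes never matches.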

Usergroup Idea

Okay, this will take some manual work on the part of someone who has high-rank power within the wiki, but what if the default usergroup for a newly created account could not put in links? Just stay with me a minute, it's kind of a weird idea that I'm getting from a mix of forum protocols, and clan protocols in an MMO I play.

  • Anonymous users (you know, people who haven't logged in) can only edit talk pages and can't add external links.
  • Freshly-created accounts are automatically put into limited-access usergroup; let's call it Newbie for now. Only step up they have is that they can now edit main pages. They still can't create new pages, upload images, move pages, or add external links.
  • Users that have proven that they aren't mooks (after, say, a week or so) get promoted to Trusted. Trusted users have pretty much full access since we know they aren't bots by this point.
  • Above that, but just below the wiki admin, should be some Moderators. They'd be in charge of deleting and moving pages, and possibly deal with promoting Newbies to Trusted or banning schmucks.

--ChroniclerC 20:22, 14 May 2011 (UTC)

Doesn't that diverge a bit from the standard access model for Wikis? Or do other Wikis have additional access groups beyond Administrator and Bureaucrat (and the tacit "physical access" group)? Does the wiki software support additional groups, or would that have to be a code-level change that might cause larger security/maintenance issues down the line? Abb3w 22:44, 14 May 2011 (UTC)
I know that all of that can be done with standard wiki tech, except for the creation of the new usergroup (for Trusted). If you've got Admin/Bureaucrat access (whichever is higher, I can never remember), you can tweak what each usergroup, including anonymous users, is allowed to do, and the lower of Admin/Bureaucrat can take over the Moderator stuff if we don't want to make another usergroup just for it. I'm fairly certain that it's possible to create a new usergroup, but I'm not sure how. --ChroniclerC 23:06, 14 May 2011 (UTC)
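For what it's worth, stock MediaWiki does let LocalSettings.php define arbitrary groups just by assigning them rights, and $wgAutopromote can handle the time-based Newbie-to-Trusted promotion. A rough, untested sketch — group names are placeholders, and note that core permissions are per-wiki rather than per-namespace, so "talk pages only" for anons, and blocking external links specifically, would need an extension (ConfirmEdit's addurl trigger) or a $wgSpamRegex rule:

```php
# LocalSettings.php sketch of the usergroup idea above. Group names
# "trusted" and "moderator" are placeholders; exact rights may need
# adjusting for the installed MediaWiki version.

# Anonymous users: no editing at all (per-namespace carve-outs for
# talk pages would need an extension or hook).
$wgGroupPermissions['*']['edit'] = false;

# Fresh accounts ("Newbie" = the default 'user' group): may edit,
# but not create pages, upload, or move.
$wgGroupPermissions['user']['edit']       = true;
$wgGroupPermissions['user']['createpage'] = false;
$wgGroupPermissions['user']['upload']     = false;
$wgGroupPermissions['user']['move']       = false;

# "Trusted": normal access, auto-promoted after a week of account age.
$wgGroupPermissions['trusted']['createpage'] = true;
$wgGroupPermissions['trusted']['upload']     = true;
$wgGroupPermissions['trusted']['move']       = true;
$wgAutopromote['trusted'] = array( APCOND_AGE, 7 * 86400 );

# "Moderator": may delete pages and manage group membership.
$wgGroupPermissions['moderator']['delete'] = true;
$wgAddGroups['moderator']    = array( 'trusted' );
$wgRemoveGroups['moderator'] = array( 'user' );
```

Merely assigning rights to a new group name is what "creates" it, which answers the how-to-create-a-usergroup question above.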