Tech & VC 29 May 2007 02:19 pm
reCAPTCHA on my Blog
reCAPTCHA is the latest project from Luis von Ahn, the inventor of the CAPTCHA and creator of the highly addictive ESP Game. The underlying principle of all of Luis von Ahn’s work is that perhaps the smartest computer is one that is powered by the crowdsourced intelligence of humans. Typically, Luis leverages games in order to incentivize humans to contribute work to a greater cause, such as labeling images for improved search quality and seeing-impaired access.
reCAPTCHA is not a game. It’s a web service version of a CAPTCHA. reCAPTCHA is a leap forward in OCR technology. A normal CAPTCHA is just randomly chosen characters; by contrast, a reCAPTCHA is two words that modern-day OCR technology fails to recognize in book digitizing efforts. One of the words is known to the computer (based on previous reCAPTCHAs) and one of the words is unknown. The computer tests whether you can recognize the known word. If you get it right, it assumes you also read the unknown word correctly. The unknown word is considered unknown until a statistically significant number of people agree on the word. Once there is strong agreement, the word is known and can be used as a known word in future reCAPTCHAs. More importantly, once a previously unknown word is known, it can be used to improve the digitization of the book that was the source of the word. So, if there is a smudge in the scan of Moby Dick and OCR fails to recognize a word, that word can possibly be recovered by the human computing power of reCAPTCHA.
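Here’s a rough sketch of how that known/unknown bookkeeping might look in code. To be clear, this is my own illustration and not reCAPTCHA’s actual implementation: the WordPool class, the image ids, and the two thresholds are all made up, and I’m using a simple vote count plus agreement ratio as a stand-in for whatever “statistically significant” really means on their end.

```python
from collections import Counter, defaultdict

# Assumption: the real service's thresholds are not public to me;
# these two numbers are placeholders for "statistically significant".
MIN_VOTES = 3
MIN_AGREEMENT = 0.75


def _normalize(text):
    # Compare answers case-insensitively and ignore stray whitespace.
    return text.strip().lower()


class WordPool:
    """Hypothetical store of verified words and pending guesses."""

    def __init__(self, known):
        self.known = dict(known)           # image id -> verified text
        self.votes = defaultdict(Counter)  # image id -> guesses so far

    def submit(self, known_id, known_answer, unknown_id, unknown_answer):
        """Return True if the user passes the challenge."""
        if _normalize(known_answer) != _normalize(self.known[known_id]):
            # Failed the control word, so the other guess is not trusted.
            return False
        if unknown_id not in self.known:
            self.votes[unknown_id][_normalize(unknown_answer)] += 1
            self._maybe_promote(unknown_id)
        return True

    def _maybe_promote(self, unknown_id):
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        text, top = counts.most_common(1)[0]
        if total >= MIN_VOTES and top / total >= MIN_AGREEMENT:
            # Strong agreement: the word can now serve as a control word.
            self.known[unknown_id] = text
            del self.votes[unknown_id]


pool = WordPool({"img-1": "whale"})
for _ in range(3):
    pool.submit("img-1", "Whale", "img-2", "harpoon")
print(pool.known["img-2"])
```

The key design point is that a guess for the unknown word only counts when the same person got the known word right, so a bot typing garbage never pollutes the votes.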
It’s free to use (unless you suck up a lot of bandwidth). You need to request an API key and then implement their API (or use a plugin, like the WordPress plugin I’m using). As a nice side benefit, the usability of reCAPTCHA is significantly improved over original CAPTCHAs (for example, CAPTCHAs are not accessible to the blind, but reCAPTCHA has an audio option for the seeing-impaired).
Filling out a reCAPTCHA is now required in order to comment on my blog. I’m not a big fan of making it harder for people to comment here, but I’m glad this will reduce comment spam, and I’m really glad I will be adding to the book digitizing efforts of archive.org, which is the first beneficiary of reCAPTCHA’s OCR computing power. If the number of comments I receive dips significantly, then I’ll kill it, but otherwise, the reCAPTCHA is here to stay.
Also, I have a feature request for reCAPTCHA: I wish they would report to me how many people successfully and unsuccessfully filled out my reCAPTCHA. Just a simple counter of both numbers would be great.
I love the reCAPTCHA tagline… it’s a perfect description of their value proposition: “Stop Spam. Read Books.” Write a comment here to test it out. :) The WordPress plugin isn’t perfect, but I’m looking into customizing the CSS to make it a little more intuitive, especially when a user fails to fill out the reCAPTCHA correctly.
I first learned about reCAPTCHAs from the consistently excellent O’Reilly Radar blog. Also, check out this original announcement by Ben Maurer, a student of Luis von Ahn.
3 Responses to “reCAPTCHA on my Blog”

on 29 May 2007 at 3:32 pm 1.Greg said …
I was looking at blogs for info on reCAPTCHA. I like your article! To show you that comments won’t go significantly down, here’s one from me :)
on 30 May 2007 at 12:46 am 2.harsh shah said …
Interesting article. Just testing the new system.
on 14 Aug 2007 at 9:27 pm 3.Guy Rintoul said …
Interesting to read about reCaptcha. Just looking for a captcha for my blog and this sounds like the best one :-)