Man Woman and Child

reCAPTCHA – two birds with one stone

December 9th, 2008

ReCAPTCHA have launched a new service which is based on audio recognition of old radio programmes. See here for their blog entry.

I love the concept of reCAPTCHA (described below). It’s such a great win-win that it makes me wonder where other win-wins can be found. Maybe retirement homes should be built in public parks: a pleasant environment for the residents, plus some watchful security and maybe gardening for the park. One day we’ll have many of these pairings, I’m sure, and we’ll wonder why we ever did it any other way.

Any (better) ideas?

The reCAPTCHA concept

In general, a “CAPTCHA” is one of those distorted-word puzzles that test whether you’re a human being rather than a spammer’s computer. 200 million are solved each day.

ReCAPTCHA is a special type, in that it sources words from projects that are digitizing books. The words that their OCR software can’t read are the ones reCAPTCHA uses.

Here’s a reCAPTCHA (click on it to go to the reCAPTCHA site to try it).

It’s win-win genius: the authentication process gets words that are known to be difficult to recognize through OCR, and the book scanners get their problem words solved.

According to Wikipedia, spammers have grabbed the words and used unwitting human labour on pornographic sites to solve them. By typing the word, the user gets more pornography.

You’ll note that there are two words: the system knows one answer already (which it uses to check the person’s response), and it gathers answers on the second word until consensus is reached. In other words, duplication of labour is the accuracy check.
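The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the class name `WordPairChallenge`, the `votes_needed` threshold, and the rule that consensus means "the most common answer has enough votes" are all assumptions made for the example.

```python
from collections import Counter


class WordPairChallenge:
    """Hypothetical model of one two-word challenge (not reCAPTCHA's real code)."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = control_answer  # the word the system already knows
        self.votes_needed = votes_needed      # assumed consensus threshold
        self.unknown_votes = Counter()        # answers collected for the unread word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record their answer for the unread word."""
        if control_guess.strip().lower() != self.control_answer.lower():
            return False  # failed the known word, so the other answer is discarded
        self.unknown_votes[unknown_guess.strip().lower()] += 1
        return True

    def consensus(self):
        """Return the agreed reading of the unread word, or None if not yet settled."""
        if not self.unknown_votes:
            return None
        word, count = self.unknown_votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = WordPairChallenge("morning", votes_needed=2)
challenge.submit("morning", "overlooks")   # passes, one vote for "overlooks"
challenge.submit("mourning", "overbooks")  # fails the known word, vote ignored
challenge.submit("Morning", "Overlooks")   # passes, second vote for "overlooks"
print(challenge.consensus())               # prints "overlooks"
```

Only answers from people who get the known word right are counted, which is what lets the duplicated labour act as the accuracy check.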

It brings to mind Amazon’s Mechanical Turk, which makes labour very cheap, possibly so cheap that it can be duplicated to achieve accuracy in a similar way.

Entry Filed under: Analysis

