Google Buys reCAPTCHA


Google has acquired reCAPTCHA, the service that powers some of those squiggly-letter fields (or CAPTCHAs) you have to fill out before submitting a form. (This is usually done to hinder bots attempting to mass-submit the forms for purposes such as spamming.)

The interesting part of reCAPTCHA is where it gets its squiggly letters. The words come from scanned public-domain books and newspapers. Because computers are bad at reading the words in scanned images, the scans are chopped up and served through reCAPTCHA, where users help transcribe them into plain text. Each challenge shows two words: one whose plain text reCAPTCHA already knows and one it doesn’t. If you type the known word correctly, the CAPTCHA validates and your input for the second word is logged.
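To make the two-word trick concrete, here is a rough Python sketch of how that validation could work. It is not reCAPTCHA's actual code: the function names, the in-memory store, and the idea of accepting a transcription once a couple of users agree on it are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word image ID -> tally of user guesses.
guesses = defaultdict(Counter)


def normalize(text):
    """Lowercase and trim so trivial differences do not fail a human."""
    return text.strip().lower()


def check_challenge(control_answer, control_input, unknown_id, unknown_input):
    """Validate on the known (control) word only; log the other word if it passes."""
    if normalize(control_input) != normalize(control_answer):
        return False  # failed the known word, so the second input is discarded
    guesses[unknown_id][normalize(unknown_input)] += 1
    return True


def best_guess(unknown_id, min_votes=2):
    """Assumed policy: trust the top transcription once min_votes users agree."""
    if not guesses[unknown_id]:
        return None
    word, votes = guesses[unknown_id].most_common(1)[0]
    return word if votes >= min_votes else None
```

The point of the design is that the form never needs to know the second word. Passing the known word is taken as evidence that a human is typing, and that is what makes the second answer worth logging.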

Google's announcement explains the reasoning:

reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we’ll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.

  • Joseph

    You guys / gals rock! Love this blog. Great information written with some tongue in cheek commentary. One of my favorites!!

    • redwall_hp

      Glad you enjoy Webmaster-Source so much. It's pretty much a solo operation at the moment, which means it can be a bit of a stretch to keep things running smoothly, but I try to get by. Be sure to check out the archives. There's some good stuff in there. :)