This is the catalog for a crawl of Amazon.com. catalog.isbns.txt contains a list of the ISBNs collected in the crawl. catalog.txt is the actual catalog. Each line contains an ISBN, a space, a segment code, and then the title of the downloaded page. To look up an ISBN, take the segment code from its catalog line (call it $SEG), download http://www.archive.org/download/amazon_crawl.$SEG/amazon.$SEG.gz and gunzip the file. The file consists of repeated objects of the form: an ISBN-10, a comma, a natural number, a colon, and then a string whose length is that natural number. Example: "0000000000,25:This is a test of the um." The ISBN is the ISBN of the book; the string is the contents of the page. Pages were downloaded in September and October 2007.
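The lookup described above can be sketched in Python roughly as follows. This is only an illustration, not the tool used for the crawl: the function names (segment_url, parse_catalog_line, iter_records, find_page) are made up for this example, and it assumes that the natural number counts bytes, that records follow one another directly, and that any stray whitespace between records can be ignored. None of these assumptions is confirmed by the description above.

```python
import gzip


def segment_url(seg):
    """Build the download URL for a segment code, following the pattern above."""
    return "http://www.archive.org/download/amazon_crawl.%s/amazon.%s.gz" % (seg, seg)


def parse_catalog_line(line):
    """Split one catalog.txt line into (isbn, segment code, page title)."""
    parts = line.rstrip("\n").split(" ", 2)
    title = parts[2] if len(parts) > 2 else ""
    return parts[0], parts[1], title


def iter_records(stream):
    """Yield (isbn, page_bytes) from a binary stream of ISBN,LENGTH:PAGE records.

    Assumption: LENGTH is a byte count, so the page is read by length and
    may itself contain commas and colons.
    """
    while True:
        header = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                # End of input: fine between records, an error inside a header.
                if header.strip():
                    raise ValueError("truncated record header")
                return
            if ch == b":":
                break
            header += ch
        isbn, _, length = bytes(header).partition(b",")
        n = int(length.decode("ascii"))
        page = stream.read(n)
        if len(page) != n:
            raise ValueError("truncated page for ISBN %r" % isbn)
        yield isbn.decode("ascii").strip(), page


def find_page(gz_path, isbn):
    """Scan a downloaded amazon.$SEG.gz file and return the page for isbn, or None."""
    with gzip.open(gz_path, "rb") as stream:
        for record_isbn, page in iter_records(stream):
            if record_isbn == isbn:
                return page
    return None
```

For example, after finding an ISBN's line in catalog.txt, parse_catalog_line gives its segment code, segment_url gives the file to download, and find_page("amazon.$SEG.gz", isbn) scans the downloaded file for the page (gzip.open decompresses on the fly, so a separate gunzip step is not needed here). The scan is linear, since the description above mentions no index within a segment.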