
How IMDB’s Speedy Search Suggestions Work

If you type a few letters into the search field over at the Internet Movie Database, you might notice how fast the suggestions appear. That’s because they’re not generated dynamically by IMDB’s primary servers. Instead, IMDB serves the JSON data for search suggestions from a CDN, using pregenerated static files, which results in a significant speed boost.

For example, if you visit this URL, you’ll get a JSON file of results for Harry Potter films:

http://sg.media-imdb.com/suggests/h/harry.json

The “h” directory means the query starts with an “h,” as they group their result sets alphabetically, and the “harry” part is what was typed into the search box. So if you wanted results that would match Doctor Who, you could use /d/doct.json. (Spaces are replaced with underscores.)

They only seem to have result sets for 4-5 character inputs, though. So you can query “ince” but not “inception.” The latter will result in an error. I guess most searches common enough to be matched in the suggestion box are covered within that limitation.
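As a rough sketch of the URL scheme described above, here is how a query could be mapped to a suggestion file path in Python. The function name, the lowercasing step, and the default five-character cutoff are my own assumptions based on the observations in this post, not documented behavior, and the code only builds the URL without requesting it.

```python
# Base URL taken from the example in this post.
BASE_URL = "http://sg.media-imdb.com/suggests"


def suggestion_url(query: str, max_len: int = 5) -> str:
    """Build the static suggestion URL for a partial search query.

    Assumptions (not confirmed by IMDB): queries are lowercased,
    and only the first `max_len` characters are used.
    """
    # Spaces are replaced with underscores, as noted in the post.
    key = query.strip().lower().replace(" ", "_")[:max_len]
    if not key:
        raise ValueError("query must not be empty")
    # The directory is the first character of the query.
    return f"{BASE_URL}/{key[0]}/{key}.json"


if __name__ == "__main__":
    print(suggestion_url("harry"))
    print(suggestion_url("Doctor Who", max_len=4))
```

With these assumptions, "harry" maps to the /h/harry.json path shown earlier, and "Doctor Who" cut to four characters maps to /d/doct.json.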

It’s a clever implementation, and it has to save a lot of computing power on a site that large, in addition to being fast.
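To illustrate the general technique of pregenerating suggestion files, here is a minimal Python sketch. It is not IMDB’s actual pipeline, which isn’t public. The function names, the JSON shape, the result limit, and the simple match-on-title-prefix rule are all hypothetical simplifications, and titles containing characters that are invalid in file names are not handled.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_suggestion_index(titles, max_len=5, limit=8):
    """Map every prefix (up to max_len characters) to matching titles."""
    index = defaultdict(list)
    for title in titles:
        # Same normalization as the URL scheme: lowercase, underscores.
        key = title.lower().replace(" ", "_")
        for n in range(1, min(max_len, len(key)) + 1):
            prefix = key[:n]
            # Cap the number of suggestions stored per prefix.
            if len(index[prefix]) < limit:
                index[prefix].append(title)
    return dict(index)


def write_static_files(index, out_dir):
    """Write one JSON file per prefix, grouped by first character."""
    out = Path(out_dir)
    for prefix, matches in index.items():
        path = out / prefix[0] / f"{prefix}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Hypothetical payload shape, not IMDB's real format.
        path.write_text(json.dumps({"query": prefix, "results": matches}))


if __name__ == "__main__":
    index = build_suggestion_index(["Harry Potter", "Inception"])
    write_static_files(index, "suggests")
```

Once the files are written, the whole directory can be uploaded to a CDN or any static file host, and no application server has to be involved when a suggestion is requested.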

(Note that this is not a public API, and IMDB/Amazon probably wouldn’t be happy about you scraping it or anything like that. But it’s a nice thing to learn from.)

Host Static Websites With Amazon S3

Amazon S3, the inexpensive storage service, can now be used to host entire static websites. The service accepts any kind of file, which makes it great for keeping large or frequently accessed data (podcasts, software downloads, JavaScript widgets, etc.) off your server, but until recently it didn’t support index files. You could point a domain to an S3 bucket and upload HTML files, but visitors would get an automatically generated listing of files instead of your index.html content. That has changed: Amazon now lets you set up custom root and error documents.

To get started, open the Amazon S3 Management Console, and follow these simple steps:

1) Right-click on your Amazon S3 bucket and open the Properties pane
2) Configure your root and error documents in the Website tab
3) Click Save
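If you’d rather script it than click through the console, the same settings can be applied through the S3 API. Below is a hedged sketch in Python that relies on the put_bucket_website call from the boto3 library. The helper names, bucket name, and document file names are placeholders of my own, and the client is passed in so the snippet itself doesn’t require boto3 to be installed.

```python
def website_configuration(index_document="index.html", error_document="error.html"):
    """Build the website settings: a root (index) and an error document."""
    return {
        "IndexDocument": {"Suffix": index_document},
        "ErrorDocument": {"Key": error_document},
    }


def apply_website_configuration(s3_client, bucket, config):
    """Apply the settings to a bucket using a boto3 S3 client."""
    s3_client.put_bucket_website(Bucket=bucket, WebsiteConfiguration=config)


# Example usage (requires boto3 and AWS credentials; the bucket
# name below is a placeholder):
#
#   import boto3
#   apply_website_configuration(
#       boto3.client("s3"),
#       "www.example.com",
#       website_configuration("index.html", "404.html"),
#   )
```

The files in the bucket also need to be publicly readable for visitors to see them, which this sketch does not handle.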

It seems like a good way to throw up a quick traffic-resistant website, though I imagine it could get expensive pretty quickly if it were, say, submitted to Reddit.