How to process AOL’s search logs with PHP

AOL, for some reason, released roughly 20 million search records covering three months of searches. With a little programming, you can search through these records with PHP. This post describes how.

First, you need to download the data. You’ll have to find the archive on your own, as I can’t remember where I found it. The file is about 450MB.


wget http://whereverthosefilesare.com/AOL-data.tgz

Next, unpack the archive into its separate parts:


tar zxvf AOL-data.tgz

Then, un-gzip the compressed files:


gzip -d user*.gz

This will leave you with a bunch of text files containing the actual log data. These files are very large, so large in fact that your web server might not be able to deal with them in PHP. To get around that, split them into smaller files. For each .txt file, run a command such as:


csplit -f aol user*01.txt 1000 {1000} &

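That command splits the file into pieces of about 1000 lines each, naming them aol00, aol01 and so on. The number in braces is how many times the split is repeated, so the last piece holds whatever is left over. PHP can deal with these smaller chunks more easily. When I ran this command, I ended up with 2000 files of 1000 lines each. If you split more than one .txt file, give each run a different prefix after -f so that later runs don’t overwrite earlier output. For the code below to work, each of these files must have the letters “aol” in its name.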

Next, you can create a PHP page that searches the AOL log files for a certain string and returns the matching lines. The string you are looking for is passed in the URL, as in http://example.com/search.php?search=sex

Here is the code for the PHP page.
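
What follows is a minimal sketch of such a page, not a definitive implementation. It assumes PHP 7 or later and that the split files sit in the same directory as search.php. The function name search_aol_logs and the 500-line cap on results are my own choices for illustration.

```php
<?php
// Hypothetical helper: return every line that contains $needle
// (case-insensitive) from the files in $dir whose names contain "aol".
function search_aol_logs(string $dir, string $needle, int $limit = 500): array
{
    $matches = [];
    if ($needle === '') {
        return $matches; // an empty search would match every line
    }
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($dir . '/*aol*') ?: [] as $path) {
        if (!is_file($path)) {
            continue;
        }
        $handle = fopen($path, 'r');
        if ($handle === false) {
            continue; // skip unreadable files
        }
        // Read one line at a time so a whole chunk is never held in memory.
        while (($line = fgets($handle)) !== false) {
            if (stripos($line, $needle) !== false) {
                $matches[] = rtrim($line, "\r\n");
                if (count($matches) >= $limit) {
                    fclose($handle);
                    return $matches; // stop early once the cap is reached
                }
            }
        }
        fclose($handle);
    }
    return $matches;
}

// Only produce output when a search term is present in the URL.
if (isset($_GET['search']) && is_string($_GET['search'])) {
    header('Content-Type: text/html; charset=utf-8');
    echo "<pre>\n";
    foreach (search_aol_logs(__DIR__, $_GET['search']) as $line) {
        // The logs contain arbitrary user text, so escape it before printing.
        echo htmlspecialchars($line), "\n";
    }
    echo "</pre>\n";
}
```

Reading line by line with fgets keeps memory use low, and the cap stops a very common word from returning millions of lines to the browser.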

Save the file as search.php or whatever, and then access it at its URL, e.g., http://yourwebsite.com/search.php?search=thewordyouarelookingfor

Change the URL to run another search. You can take the user id from one search and put it on the URL, thereby finding all the searches by that user.
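
One caveat with searching for a user id as a plain string: the same digits can also turn up inside a longer id, a query or a URL. Here is a rough sketch of a stricter check. It assumes each log line is tab-separated with the user id in the first column, and the function names are hypothetical.

```php
<?php
// Assumption: each log line is tab-separated and the first field is the user id.
function line_belongs_to_user(string $line, string $userId): bool
{
    $fields = explode("\t", rtrim($line, "\r\n"));
    return $fields[0] === $userId; // exact match on the first field only
}

// Keep only the lines whose first field equals $userId.
function filter_by_user(array $lines, string $userId): array
{
    $kept = [];
    foreach ($lines as $line) {
        if (line_belongs_to_user($line, $userId)) {
            $kept[] = $line;
        }
    }
    return $kept;
}
```

You could run the lines returned by an ordinary search through filter_by_user to drop the accidental matches.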
