A while back I was contacted by Douglas Hubbard regarding a book he was writing entitled How to Measure Anything. He was interested in finding out more about tools that could automate online data collection, and screen-scraping popped up on his list as one method to go about this. Last week Douglas contacted me indicating that he was essentially done with the work, and it was on its way to press. He sent me a recent draft copy, and asked if I might blog a bit about it. I happily consented, and, I have to admit, I’ve really enjoyed what I’ve read so far.
Before digging into my commentary, I thought I’d include a snippet from the book that deals specifically with screen-scraping:
There is quite a lot of information on the internet and it changes fast. If you use a standard search engine, you get a list of websites, but that’s it. Suppose, instead, you needed to measure the number of times your firm’s name comes up in certain news sites or measure the blog traffic about a new product. You might even need to use this information in concert with other specific data reported in structured formats on other sites such as economic data from government agencies, etc.
Internet “Screen-scrapers” are a way to gather all this information on a regular basis without hiring a 24×7 staff of interns to do it all. You could use a tool like this to track used-market versions of your product on www.ebay.com, correlate your stores’ sales in different cities to the local weather by screen-scraping data from www.weather.com, or even just the number of hits on your firm’s name on various search engines hour-by-hour. As a search on the internet will reveal, there are several examples on the web of “mashups” where data is pulled from multiple sources and presented in a way that provides new insight. A common angle with mashups now is to plot information about business, real estate, traffic, and so on against a map site like Mapquest or Google Earth. I’ve found a mashup of Google Earth and real-estate data on www.housingmaps.com that allows you to see recently sold home prices on a map. Another mashup on socaltech.com shows a map that plots locations of businesses that recently received venture capital. At first glance, someone might think these are just for looking to buy a house or find a job with a new company. But how about research for a construction business or forecasting business growth in a new industry? We are limited only by our resourcefulness.
You can imagine almost limitless combinations of analysis by creating mashups of sites like Myspace and/or YouTube to measure cultural trends or public opinion. Ebay gives us tons of free data about behavior of sellers, buyers and what is being bought and sold and there are already several powerful analytical tools to summarize all the data on Ebay. Comments and reviews of individual products on the sites of Sears, Walmart, Target, and Overstock.com are a source of free input from consumers if we are clever enough to exploit it. The mind reels.
If you step back from it, screen-scraping fundamentally deals with repurposing information. The information you’re after happens to be in a format that makes it less usable, and screen-scraping allows you to put it in a format that is more usable. As Douglas points out, the ability to do this opens up a nearly endless range of possibilities.
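To make the idea of repurposing concrete, here is a minimal sketch in Python that turns an HTML table into a list of records using only the standard library. The sample HTML, the `TableScraper` class, and the `scrape_table` function are all made up for illustration; a real page would need its own handling, and this is not how any particular screen-scraping product works.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a real web page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>High</th></tr>
  <tr><td>Denver</td><td>71</td></tr>
  <tr><td>Boise</td><td>65</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
```

Once the data is in a structure like `records`, it can be stored, compared against yesterday's values, or combined with other sources, which is where the real value tends to come from.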
He touches on a few basic reasons for doing screen-scraping:
- Watching information as it changes over time.
- Aggregating data into a single repository.
- Combining information from multiple sources in such a way that the whole is greater than the sum of the parts.
Chances are, any one of us could come up with all kinds of examples of each, and many of them would apply directly to the type of work we do. Every industry deals with information. It’s likely that some of the information you deal with on a day-to-day basis would be more useful to you if it could be repurposed in one of the three ways listed above. How would your business benefit if you could be notified whenever one of your products is mentioned? How much time could you save if you were able to take a set of data you deal with frequently and enrich it with information from other sources? For example, you might take real estate property listings and enhance them by adding information for each property that can be readily obtained from a county assessor’s web site. The end product could be quite useful, but it would be unreasonable to copy and paste the information from the web site by hand. Screen-scraping allows this kind of thing to be done in an automated fashion.
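The property listing example can be sketched in a few lines of Python. This assumes the assessor data has already been scraped into a list of dictionaries, and that both data sets share an identifier; the `parcel_id` field, the sample values, and the `enrich_listings` function are all hypothetical.

```python
def enrich_listings(listings, assessor_records, key="parcel_id"):
    """Add assessor fields to each listing that has a matching record."""
    # Index the assessor data once so each lookup is cheap.
    by_key = {record[key]: record for record in assessor_records}
    enriched = []
    for listing in listings:
        extra = by_key.get(listing[key], {})
        # Listing fields take precedence if both sources define the same field.
        enriched.append({**extra, **listing})
    return enriched


# Made-up sample data, for illustration only.
listings = [
    {"parcel_id": "A-100", "address": "12 Elm St", "price": 250000},
    {"parcel_id": "B-200", "address": "9 Oak Ave", "price": 310000},
]
assessor_records = [
    {"parcel_id": "A-100", "assessed_value": 231000, "year_built": 1987},
]

enriched = enrich_listings(listings, assessor_records)
```

Listings without a matching assessor record pass through unchanged, so a gap in the scraped data does not drop a property from the result.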
How to Measure Anything isn’t available just yet, but I’d highly recommend keeping an eye out for it. If you work in an industry that deals with information and measurement (and I can’t think of one that doesn’t), you’d likely benefit from the principles Douglas teaches. Keep an eye on his How to Measure Anything web site for updates, or if you’d like to pre-order the book.