What is the difference between data mining and scraping?

There seems to be a lot of confusion out there about the differences between screen scraping and data mining, and about what each is used for. It’s understandable, because these two guys tend to hang out together.

Screen scraping is one of many possible tools for gathering the data that miners then run their analysis on. Scraping can pull data from one site, from dozens, or from hundreds. The sky’s the limit, and data mining generally becomes more effective the more data there is to mine.
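To make the "gathering" half concrete, here is a minimal sketch of what a scraper does: read a page’s HTML and pull structured rows out of it. It uses only Python’s standard `html.parser` module and assumes, hypothetically, that the target page lists its data in a plain HTML `<table>`. The `PAGE` string and the `scrape_rows` helper are made up for illustration; a real scraper would fetch live pages and would need to match each site’s actual markup.

```python
from html.parser import HTMLParser


class TableRowScraper(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells end up empty and are skipped
                self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Hypothetical helper: return the <td> text of every table row in `html`."""
    parser = TableRowScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content. In practice this string would come from fetching
# the page, e.g. urllib.request.urlopen(url).read().decode("utf-8").
PAGE = """
<table>
  <tr><th>Model</th><th>Color</th><th>Price</th></tr>
  <tr><td>Sedan</td><td>Orange</td><td>8500</td></tr>
  <tr><td>Hatchback</td><td>Silver</td><td>7200</td></tr>
</table>
"""

print(scrape_rows(PAGE))
```

Run the same function over many pages and you start building the kind of pile that data mining needs.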

Data mining is where you have a big pile of information, and you start digging into it for insights. Most of the time you’re looking for trends, or for previously unnoticed correlations between records. I’ve come across a few data mining results that I’ve found very interesting. One of my favorites, from 2012, was mentioned in the NY Times: data scientists analyzed used car sales data and found that if you want a reliable used car, you should buy an orange one. What does orange have to do with reliability? The article offers some theories, but who would have guessed that correlation even existed?
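The simplest version of that kind of digging is grouping records by one attribute and comparing an outcome rate across the groups. The sketch below does exactly that. The `sales` records and the `color` and `bad_buy` field names are invented toy data for illustration only; they are not the data behind the study mentioned above.

```python
from collections import defaultdict


def rate_by_attribute(records, attribute, outcome):
    """Return {attribute value: fraction of records where `outcome` is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        key = record[attribute]
        totals[key] += 1
        if record[outcome]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up toy records; NOT real sales data.
sales = [
    {"color": "orange", "bad_buy": False},
    {"color": "orange", "bad_buy": False},
    {"color": "silver", "bad_buy": True},
    {"color": "silver", "bad_buy": False},
    {"color": "black", "bad_buy": True},
    {"color": "black", "bad_buy": False},
    {"color": "black", "bad_buy": False},
]

rates = rate_by_attribute(sales, "color", "bad_buy")
# List colors from lowest to highest bad-buy rate.
for color, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{color}: {rate:.0%} bad buys")
```

With a handful of rows like this, any difference is just noise. A real analysis would need far more records and a check on whether the gap between groups is statistically meaningful, and even then a correlation doesn’t say *why* orange cars do better.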

Facebook is also a notorious data miner. One of the classic problems of data mining is getting enough data for a meaningful analysis, but that’s not a problem for Facebook. They collect data from every user account, every post, and every comment, so they can mine data without any screen scraping at all. Since most of us don’t have a trove of data like that readily at hand, tools like screen scraping can help fill the void.

Data mining and scraping aren’t always paired up, but they are often found in the same circles. I have had many clients who use scraped data to augment data they already have, and then look through the combined set for interesting trends.
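"Augmenting" usually just means joining the scraped records onto your own by some shared key. Here is a minimal sketch of a left join in plain Python, assuming, hypothetically, that both data sets share a `sku` field. The `inventory` and `competitor` records are invented for illustration. With larger data, a library join such as pandas’ `merge` would typically do the same job.

```python
def augment(own_records, scraped_records, key):
    """Left-join scraped fields onto our own records by a shared key.

    Every one of our records is kept. If the scraped data has the same key
    more than once, the last occurrence wins. On conflicting field names,
    our own values take precedence over scraped ones.
    """
    scraped_by_key = {row[key]: row for row in scraped_records}
    merged = []
    for record in own_records:
        extra = scraped_by_key.get(record[key], {})
        merged.append({**extra, **record})  # own fields override scraped ones
    return merged


# Hypothetical example: a client's product list plus prices scraped from a competitor's site.
inventory = [
    {"sku": "A1", "our_price": 20.0},
    {"sku": "B2", "our_price": 35.0},
]
competitor = [
    {"sku": "A1", "competitor_price": 18.5},
]

for row in augment(inventory, competitor, "sku"):
    print(row)
```

Once the two sources are combined, the grouping and comparison shown for the car example can run over the merged records.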

