Data mining vs. screen-scraping

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts.

In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
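To make the "pulling out data" step concrete, here is a minimal sketch in Python that uses only the standard library's html.parser module. The HTML fragment, the class names ("product", "name", "price"), and the ProductScraper and scrape_products names are all made up for illustration. A real scraper would download the page first and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a site.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # which field we are inside, if any
        self.rows = []             # one dict per product

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})   # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the fields we care about.
        if self.current_field and self.rows:
            self.rows[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


def scrape_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in html."""
    scraper = ProductScraper()
    scraper.feed(html)
    return scraper.rows


rows = scrape_products(SAMPLE_HTML)
for row in rows:
    print(row["name"], row["price"])
```

Notice that nothing here analyzes anything. The output is just rows of text, ready to be dropped into a spreadsheet or a database.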

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
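As a toy illustration of "searching for patterns," the sketch below counts which pairs of items show up together most often in a set of shopping baskets. The basket data, the frequent_pairs function, and the min_support threshold are all invented for this example. Real data mining uses far more sophisticated statistical methods, but the starting point is the same: the data is already in hand.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-collected data: each set is one shopping basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(BASKETS, min_support=3)
print(pairs)
```

With this made-up data, the only pair that clears the threshold is bread and milk, which appear together in four of the five baskets. How the baskets were gathered, whether by screen-scraping or some other means, makes no difference to the analysis.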

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks. For example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping” 🙂 ). So it presents a bit of a problem: we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.

4 thoughts on “Data mining vs. screen-scraping”

  1. I actually wouldn’t consider myself to be much of an expert on data mining, as the area of expertise of our company is screen-scraping. I have an acquaintance, however, who is near completion of an M.S. in data mining, and, based on his description, it looks to be a degree in pretty high demand. Good luck with it!

  2. When it comes to aggregating data, is it fair to say that the quality of the data from data mining is superior to that from screen-scraping?

  3. It’s actually more of an apples to oranges comparison. Screen-scraping is a process whereby you obtain data, and has nothing specific to do with what gets done with the data. Data mining, on the other hand, assumes that you already have the data and that you’d like to analyze it using certain statistical methods.
