Screen scraping is an automated process to extract information from a website. That would include anything you can see in your web browser, such as text, images, or PDF files. Once scraped, the information can be delivered to you in a format that’s much more useful. For example, web scraping could allow you to copy data about thousands of products from Amazon, which could be saved to a CSV file. You could then open that file in Excel for analysis.
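To make the product-to-CSV idea concrete, here is a minimal sketch using only Python's standard library. The HTML snippet, the class names ("product", "name", "price"), and the function name are assumptions made up for illustration; a real site's markup will look different, and fetching the page is left out entirely.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product listing markup; a real site's HTML will differ.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">4.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">12.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per finished product
        self._field = None   # field currently being read, if any
        self._current = {}   # product being assembled

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # Closing </li> marks the end of one product.
            self.rows.append(self._current)
            self._current = {}


def products_to_csv(html):
    """Parse the listing and return its products as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()
```

Writing the returned text to a file ending in .csv gives you something Excel can open directly.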
If you had all the time in the world you could do the same thing that a data scraper does. There’s nothing magical about it. But would you want to? Surely you have better ways to spend your time than copying and pasting data from a website. Not only could it take a long time, but it would also be prone to errors. Using a scraper for the same task is way faster and much more accurate.
Screen scrapers go by a large variety of names, such as “bot,” “crawler,” “scraping agent,” and “spider.” Yes, I feel your sympathy for our SEO efforts.
Sometimes scraping is done on demand (i.e., “real time” or “near time”). This is useful when you need to extract and compare data from several sites at once, or when the data you’re retrieving fluctuates and you need the most current values possible. Here are a couple of examples of on-demand scraping:
Let’s suppose you need to place an order for 10,000 sprockets, and there are five suppliers that you could choose from. The problem is that the prices change often, and each supplier has different volume discounts. You could create a scraper that takes as inputs the various part numbers and the quantity you need of each. When you run the routine, it queries all of the supplier sites and returns the current price from each. Voilà! Comparison shopping at the touch of a button.
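The comparison step of that sprocket example might look something like the sketch below. It assumes the scraping has already happened and each supplier's volume discounts have been collected as (minimum quantity, unit price in cents) pairs. The supplier names, the prices, and the Quote class are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    supplier: str
    tiers: list  # (minimum quantity, unit price in cents) pairs

    def unit_price(self, quantity):
        """Best unit price this supplier offers at the given quantity."""
        eligible = [price for min_qty, price in self.tiers if quantity >= min_qty]
        if not eligible:
            raise ValueError(f"{self.supplier} has no tier for quantity {quantity}")
        return min(eligible)


def cheapest(quotes, quantity):
    """Return the quote with the lowest unit price at this quantity."""
    return min(quotes, key=lambda quote: quote.unit_price(quantity))


# Hypothetical scraped results; a real run would fill these in from the supplier sites.
QUOTES = [
    Quote("Supplier A", [(1, 125), (5000, 110), (10000, 98)]),
    Quote("Supplier B", [(1, 120), (10000, 101)]),
]

best = cheapest(QUOTES, 10000)
total_cents = best.unit_price(10000) * 10000
print(f"{best.supplier}: ${total_cents / 100:,.2f} for 10,000 sprockets")
```

Prices are kept as whole cents so the arithmetic stays exact. Notice that the cheapest supplier can change with the quantity, which is exactly why the quantity needs to be an input.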
As another example, let’s suppose you own a trucking company, and you’re always on the lookout for jobs. You need to keep your rigs filled and moving. There are websites that advertise available trucking jobs, and you spend half your day incessantly monitoring them. The trouble is, when a job appears it gets snatched up almost immediately by a competitor. Rather than pounding on your keyboard, why not get a web scraper on your team? The bot could monitor the websites for jobs and, if one appears that meets the criteria you specify, claim it for you.
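The “meets criteria you specify” part of that bot could be sketched roughly as follows. The job fields ("id", "pay", "miles"), the thresholds, and the function name are assumptions chosen for illustration. Fetching the listings and actually claiming a job depend entirely on the site in question, so both are left out.

```python
def find_new_matches(jobs, seen_ids, min_pay, max_miles):
    """Return jobs not seen on earlier polls that meet the pay and distance criteria."""
    matches = []
    for job in jobs:
        if job["id"] in seen_ids:
            continue  # already handled on an earlier poll
        seen_ids.add(job["id"])
        if job["pay"] >= min_pay and job["miles"] <= max_miles:
            matches.append(job)
    return matches


# Hypothetical listings, standing in for what one poll of a job board might return.
LISTINGS = [
    {"id": 1, "pay": 1800, "miles": 400},
    {"id": 2, "pay": 900, "miles": 150},
    {"id": 3, "pay": 2500, "miles": 1200},
]

seen = set()
for job in find_new_matches(LISTINGS, seen, min_pay=1500, max_miles=600):
    print("Worth claiming:", job)
```

Run on a schedule, each poll reports only the jobs that are both new and worth taking, because the set of seen IDs carries over from one poll to the next.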
In contrast to on-demand scraping, most of the time you want to get the data in bulk: one big download of a large data set. You might want this data extracted on a daily, weekly, or monthly basis. The data could be saved to a database or a spreadsheet. With data like this, you can track price, availability, reviews, or pretty much anything that you can find on the internet.
Sometimes we compare screen-scraping to plastic: there are so many uses for it that it’s difficult to describe them all. That said, if you ever find yourself repeating the same task over and over, spare your sanity and consider screen-scraping instead.