Screen-scraping is a means to automate anything you can do on the internet, and there is no end to things you can do on the internet. Anytime you find yourself repeating a task online, it might be worth seeing if a screen-scraper could do the same thing faster and more accurately.
The versatility of screen-scraping sometimes leaves people at a loss as to where to start. Here are some common uses for screen-scraping, though the list is by no means complete.
Update prices and inventory
This is probably the most common request we get for scraping. If the items are on a site that requires a login, the scrape would log in first. For each site that you want to scrape, we need to find a way to get to the items you want. For some sites it’s reasonable to scrape the whole catalog; for others we might build the routine to go into only the categories that interest you, or to read in a list of item numbers and search for each.
Once we decide on the best way to find your data, the scraping routine can be scheduled to run at any interval you need. If you scheduled your scrape to launch daily, it could save the currently displayed price and inventory note in a database, or as a CSV you could open with Excel, or pretty much anything that works for you. Once done, your list wouldn’t ever be more than a day out of date.
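As a rough illustration of the "save the price and inventory note as a CSV" step, here is a minimal Python sketch using only the standard library. The HTML snippet, the class names (`item`, `price`, `stock`), and the `data-sku` attribute are assumptions made up for the example; a real site's markup will differ, and a real routine would fetch the page rather than read a hard-coded string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical catalog markup; a real site will be structured differently.
SAMPLE_HTML = """
<div class="item" data-sku="A100"><span class="price">$19.99</span><span class="stock">In stock</span></div>
<div class="item" data-sku="B200"><span class="price">$5.49</span><span class="stock">Backordered</span></div>
"""


class ItemParser(HTMLParser):
    """Collects the SKU, price, and stock note from each item div."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # class of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "item":
            self.items.append({"sku": attrs.get("data-sku"), "price": "", "stock": ""})
        elif tag == "span" and attrs.get("class") in ("price", "stock") and self.items:
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that appears inside a price or stock span.
        if self._field:
            self.items[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_items(html):
    """Return a list of dicts, one per item found in the HTML."""
    parser = ItemParser()
    parser.feed(html)
    return parser.items


def to_csv(items):
    """Render the scraped items as CSV text that Excel can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["sku", "price", "stock"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return out.getvalue()
```

Calling `to_csv(scrape_items(SAMPLE_HTML))` yields the CSV text, which a scheduled job could write to a dated file or load into a database instead.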
Real estate listings are another field for which we get a lot of requests. These can be more complicated than retail items because of the number of options one needs to capture, and the wild array of different types of properties. Most have a section of valuable data points such as the number of bedrooms, bathrooms, neighborhood, etc. There is usually a text description, and in this type of listing the pictures are often valuable too; they can almost always be captured.
For example, suppose you have four listing sites that we scrape daily. On each run, all listings are saved, so you will be able to tell which listings are no longer on the site, which are still active, and which have been added since the last run. We can deliver files with the listing data to you after each daily run.
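Telling removed, active, and new listings apart comes down to comparing the listing IDs from two runs. The sketch below assumes each run is stored as a dictionary keyed by a listing ID; the function name and the sample IDs are hypothetical.

```python
def compare_runs(previous, current):
    """Classify listing IDs by comparing the previous run with the current one.

    Both arguments are dicts keyed by listing ID (an assumed storage format).
    """
    prev_ids, cur_ids = set(previous), set(current)
    return {
        "removed": sorted(prev_ids - cur_ids),  # on the site last run, gone now
        "active": sorted(prev_ids & cur_ids),   # present in both runs
        "added": sorted(cur_ids - prev_ids),    # new since the last run
    }


# Made-up data for two consecutive daily runs.
yesterday = {"MLS-1": {"beds": 3}, "MLS-2": {"beds": 2}}
today = {"MLS-2": {"beds": 2}, "MLS-3": {"beds": 4}}
changes = compare_runs(yesterday, today)
```

Here `changes` reports `MLS-1` as removed, `MLS-2` as still active, and `MLS-3` as added.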
Real-time comparison
There are situations where one needs up-to-the-second, hot-off-the-griddle information. I’ve seen real-time scraping used for commodity prices, lending rates, verification of inventory, and a number of other things.
For example, suppose that whenever you get an order on your website, you need to check three different supplier sites to make sure the ordered item is in stock and to find the lowest price. In this case we could create a nice little web page where you input the part number, and within a few moments the price and availability from each supplier are checked and displayed for you to choose from.
Most of the time when people think of scraping, they think of saving data from a site. While that is the most common scenario, you can also use scraping to post data to websites. It’s not hard, as the HTTP requests involved are the same kind used in the website searches we do every day.
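Once each supplier's price and availability have been scraped, picking the cheapest in-stock offer is a small step. The following sketch assumes the results have already been gathered into simple records; the `Quote` type, the supplier names, and the prices are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Quote:
    """One supplier's scraped result for a part number (hypothetical shape)."""
    supplier: str
    price: float
    in_stock: bool


def best_quote(quotes: Iterable[Quote]) -> Optional[Quote]:
    """Return the cheapest quote that is in stock, or None if none are."""
    available = [q for q in quotes if q.in_stock]
    return min(available, key=lambda q: q.price) if available else None


# Made-up results from three supplier sites.
quotes = [
    Quote("Supplier A", 42.50, True),
    Quote("Supplier B", 39.99, False),  # cheapest, but out of stock
    Quote("Supplier C", 41.00, True),
]
choice = best_quote(quotes)
```

With this data, `choice` is the Supplier C quote, since Supplier B is cheaper but not in stock.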
An example of this we worked on was a car dealership that wanted to post their inventory online. Most of the very large automobile listing sites offer APIs or tools so there is no need for a scrape to post to them. This dealership also wanted to post to some smaller, more regional sites, though. In these cases we would acquire a list of new inventory to post, and the scraping routines would log into several sites, creating a new ad for each. Upon success, we would record the ad number, or link to the newly posted ad. In this way, ads for several new vehicles for sale could be posted to dozens of sites automatically.
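To give a feel for the posting side, here is a hedged sketch of building a form submission for one ad and pulling the ad number out of the confirmation page. The URL, the form field names, and the "Ad #12345" confirmation text are all assumptions for the example; every real site has its own form and response format, and a real routine would also log in first and actually send the request.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request


def build_ad_request(post_url, vehicle):
    """Build (but do not send) a form POST for one vehicle ad."""
    body = urlencode(vehicle).encode("utf-8")
    return Request(
        post_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_ad_id(response_html):
    """Find the ad number in a confirmation page, assuming text like 'Ad #12345'."""
    match = re.search(r"Ad #(\d+)", response_html)
    return match.group(1) if match else None


# Hypothetical site and vehicle data.
request = build_ad_request(
    "https://classifieds.example.com/ads/new",
    {"make": "Honda", "model": "Civic LX", "price": 8500},
)
ad_id = extract_ad_id("<p>Your listing was posted as Ad #48213.</p>")
```

Passing `request` to `urllib.request.urlopen` would send it; the returned page would then go through `extract_ad_id` so the ad number can be recorded.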
We have created scrapes for attendees of trade shows, where we go to the webpage of an upcoming event and find the list of participants to scrape the business name, contact name, phone number, business description, and booth location. The client would then use that list to prioritize places to visit during the show, and it gave him a means of contact if he missed anyone or wanted to follow up with someone.
Most of the time, if you need to migrate data from one website to another, it is ideally done by accessing the source and database directly, but there are occasions where that is not possible. We had one client who had started a business, and her husband created the Customer Relationship Management (CRM) site they used. A few years and one ugly divorce later, she wanted to migrate off the ex-husband’s tool, and he refused to cooperate. This was a job for screen-scraper. We devised methods to log into the original site, scrape all the contacts, notes, appointments, etc., and save them in a new database ready to connect to a new site.
The crazy number of things you can do with screen-scraping can be intimidating, but it’s nothing more complex than using a program to do something a person could do on the internet. So if you already have a process you go through on the internet, or you can envision one you would do if it weren’t so huge, it is exactly the sort of thing that screen-scrapers do.