Three common methods for data extraction

Building on my earlier post on data discovery vs. data extraction: in the data extraction phase of the web-scraping process, you've already arrived at the page containing the data you're interested in, and you now need to pull it out of the HTML. Probably the most common technique traditionally used to do this …
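To make the extraction step concrete, here is a minimal sketch that pulls the text of table cells out of an HTML page already in hand. It uses Python's standard-library `html.parser` purely for illustration. It is not necessarily one of the three methods the post goes on to discuss, and the sample page, `TableCellExtractor`, and `extract_rows` are hypothetical.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Skip rows with no <td> cells (e.g. a header row made of <th>).
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Data extraction: return the <td> text of each table row in the page."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


page = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$9.99</td></tr>
  <tr><td>Gadget</td><td>$24.50</td></tr>
</table>
"""

print(extract_rows(page))  # [['Widget', '$9.99'], ['Gadget', '$24.50']]
```

A real page is rarely this tidy, so in practice the parser would need to target a specific table (for example by an `id` or `class` attribute) rather than every `<td>` on the page.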

New screen-scraper tutorial available

We’ve just released a new screen-scraper tutorial: http://www.screen-scraper.com/support/tutorials/tutorial7/tutorial_overview.php. It’s just received the blessing of our project manager and aspiring professional writer/editor, Jason Bellows, so it should be ready for public consumption. Here’s a snippet from the tutorial introduction: “It’s often the case in screen-scraping that you want to submit a form multiple times using different …

Untrusted Server Certificate Chain fix

Some of you may have run into this dreaded message when trying to access a site that uses HTTPS:

java.security.cert.CertificateException: Untrusted Server Certificate Chain

I’m happy to report that we’ve just issued a fix for that in version 2.7.0.1a. See this FAQ if you run into any trouble upgrading.

Data discovery vs. data extraction

Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally when people think of screen-scraping …
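The discovery stage can be sketched as collecting links from a listing or search-results page and keeping only those that lead to pages holding the data. This is a minimal illustration using Python's standard library on an in-memory page (no network access). The `/product/` marker, the base URL, and the function names are hypothetical assumptions, not part of screen-scraper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def discover_detail_pages(listing_html, base_url, marker="/product/"):
    """Data discovery: return absolute URLs of pages likely to hold the target data.

    `marker` is a hypothetical URL fragment that identifies detail pages.
    Duplicates are dropped while preserving the order links appear in.
    """
    collector = LinkCollector()
    collector.feed(listing_html)
    collector.close()
    urls = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the listing page
        if marker in absolute and absolute not in urls:
            urls.append(absolute)
    return urls


listing = """
<a href="/product/101">Widget</a>
<a href="/about">About us</a>
<a href="product/102">Gadget</a>
<a href="/product/101">Widget (again)</a>
"""

print(discover_detail_pages(listing, "https://example.com/search"))
# ['https://example.com/product/101', 'https://example.com/product/102']
```

Each URL returned here would then be fetched and handed to the extraction stage, which keeps the two concerns separate: discovery decides *where* to go, extraction decides *what* to pull out once there.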