Astute screen-scraper Fred came up with a scenario that arises from time to time: you’ve got a page containing one or more HTML tables, all of which are nearly identical in structure. You want to pull the data from each table, but need to be able to distinguish which row came from which table. Standard old extractor … Read more: Scraping data from similar tables
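To make the scenario concrete, here is a minimal sketch in plain Python (standard library only, not screen-scraper's own extractor patterns). It tags every row with the index of the table it came from. The class name `TableRowCollector`, the helper `rows_by_table`, and the sample HTML are all made up for illustration, and the sketch assumes the tables are not nested.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect cell text from every table, tagging each row with its table's index."""

    def __init__(self):
        super().__init__()
        self.rows = []            # list of (table_index, [cell_text, ...])
        self.table_index = -1     # becomes 0 when the first <table> opens
        self.current_row = None   # cells of the row being read, or None
        self.current_cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_index += 1  # assumption: tables are not nested
        elif tag == "tr":
            self.current_row = []
        elif tag in ("td", "th") and self.current_row is not None:
            self.current_cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self.current_cell is not None:
            self.current_cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.current_cell is not None:
            self.current_row.append("".join(self.current_cell).strip())
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            self.rows.append((self.table_index, self.current_row))
            self.current_row = None


def rows_by_table(html):
    """Return (table_index, cells) pairs for every row in the page."""
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page with two structurally identical tables.
SAMPLE_HTML = """
<table><tr><td>Widget</td><td>4.99</td></tr><tr><td>Gadget</td><td>9.50</td></tr></table>
<table><tr><td>Sprocket</td><td>1.25</td></tr></table>
"""

rows = rows_by_table(SAMPLE_HTML)
for table_index, cells in rows:
    print(table_index, cells)
```

Carrying the table index alongside each row is what lets you tell the rows apart later, even though every table has the same structure.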
Building off my earlier posting on data discovery vs. data extraction, in the data extraction phase of the web scraping process you’ve already arrived at the page containing the data you’re interested in, and you now need to pull it out of the HTML. Probably the most common technique used traditionally to do this … Read more: Three common methods for data extraction
We’ve just released a new screen-scraper tutorial: http://www.screen-scraper.com/support/tutorials/tutorial7/tutorial_overview.php. It’s just received the blessing from our project manager and aspiring professional writer/editor, Jason Bellows, so it should be ready for public consumption. Here’s a snippet from the tutorial introduction: “It’s often the case in screen-scraping that you want to submit a form multiple times using different … Read more: New screen-scraper tutorial available
Some of you in the past may have run into this dreaded message when trying to access a site that uses HTTPS:

java.security.cert.CertificateException: Untrusted Server Certificate Chain

I’m happy to report that we’ve just issued a fix for that in version 18.104.22.168a. See this FAQ if you run into any trouble upgrading.
Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally when people think of screen-scraping … Read more: Data discovery vs. data extraction
Come ‘n get it, friends and neighbors. You can download it fresh from our site or update your existing instance. This is definitely our cleanest release yet. Probably the coolest feature in my opinion is the RSS stuff. Check out our new tutorial on it. It may end up being kind of a “gee whiz” … Read more: Version 2.7 of screen-scraper available
Up till now it’s been a pretty big pain to add a number to a session variable. Oftentimes you’ll have something like a page number that you need to increment as you loop through search results pages. The page number is usually stored as a String, and to increment it you normally have to cast … Read more: Adding numbers to session variables
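For readers who want to see the pain point spelled out, here is a small sketch in plain Python rather than screen-scraper's own scripting API. A plain dictionary stands in for the session variables, and the helper name `add_to_variable` and the variable name `PAGE` are assumptions made up for this example.

```python
def add_to_variable(variables, name, amount):
    """Add `amount` to a numeric value that is stored as a string.

    `variables` is a plain dict standing in for session variables
    (an assumption for this sketch, not screen-scraper's API).
    """
    current = int(variables[name])           # convert the stored string to a number
    variables[name] = str(current + amount)  # store it back as a string
    return variables[name]


# Hypothetical session state: the page number is held as a string.
session_variables = {"PAGE": "1"}

# Increment the page number as we would when moving to the next results page.
add_to_variable(session_variables, "PAGE", 1)
print(session_variables["PAGE"])
```

The convert, add, convert-back round trip is the boilerplate that gets repeated each time you move to the next page of search results.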
OK, so one bug slipped under our radar. Fortunately, it’s been fixed and hopefully this one will become 2.7. Please feel free to upgrade and let us know of anything quirky.
This probably isn’t a big deal to most of you reading, but we can now accept credit cards on our site. For a great while we only handled PayPal, which turned out to be a pretty big pain in the patookis for many. So if you don’t have a PayPal account, and have hesitated registering … Read more: Now accepting credit cards
Get it while it’s hot. This could become version 2.7. We’re doing our own internal hammering on this one, but please let me know if any of you out there find bugs we miss. As usual, you can reach me at todd-[at]-screen-scraper.com.