Three common methods for data extraction
Building on my earlier post about data discovery vs. data extraction: in the data extraction phase of the web-scraping process, you’ve already arrived at the page containing the data you’re interested in, and now you need to pull that data out of the HTML. Probably the most common technique traditionally used to do this …
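As a point of reference, here’s a minimal Java sketch of one widely used extraction approach: a regular expression anchored on the text that surrounds the value you want. It isn’t necessarily one of the three methods the full post covers, and the `RegexExtractor` class, its `extractBetween` helper, and the sample HTML are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull every value that sits between a known "before" marker
// and "after" marker in a page's HTML.
class RegexExtractor {

    static List<String> extractBetween(String html, String before, String after) {
        // Quote the markers so HTML punctuation is matched literally, not as regex syntax.
        // The lazy group (.*?) stops at the first closing marker; DOTALL lets a value span lines.
        Pattern pattern = Pattern.compile(
                Pattern.quote(before) + "(.*?)" + Pattern.quote(after), Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group(1).trim()); // group 1 is the text between the markers
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample HTML standing in for a page you've already navigated to.
        String html = "<tr><td class=\"price\">$19.99</td></tr>"
                    + "<tr><td class=\"price\">$5.00</td></tr>";
        System.out.println(extractBetween(html, "<td class=\"price\">", "</td>"));
    }
}
```

The weakness is the usual one for this style of extraction: if the site changes the markup around the value, the pattern quietly stops matching.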
New screen-scraper tutorial available
We’ve just released a new screen-scraper tutorial: http://www.screen-scraper.com/support/tutorials/tutorial7/tutorial_overview.php. It has just received the blessing of our project manager and aspiring professional writer/editor, Jason Bellows, so it should be ready for public consumption. Here’s a snippet from the tutorial introduction: “It’s often the case in screen-scraping that you want to submit a form multiple times using different …
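The heart of that technique is producing one request per input value while everything else on the form stays the same. Here’s a minimal, generic Java sketch of just that part; it is not screen-scraper’s own mechanism. It assumes a form with hypothetical fields `category` (fixed) and `zip` (varied), and it only builds the URL-encoded request bodies so it runs offline; actually sending each one (for example with `HttpURLConnection`) is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: build one form submission per value of a single varying field.
class FormLooper {

    // Encodes fields as application/x-www-form-urlencoded text (spaces become '+').
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) body.append('&');
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Returns one request body per value; the fixed fields are copied into each body unchanged.
    static List<String> bodiesFor(Map<String, String> fixed, String varyingField, List<String> values) {
        List<String> bodies = new ArrayList<>();
        for (String value : values) {
            Map<String, String> fields = new LinkedHashMap<>(fixed); // fresh copy per submission
            fields.put(varyingField, value);
            bodies.add(encodeForm(fields));
        }
        return bodies;
    }

    public static void main(String[] args) {
        Map<String, String> fixed = new LinkedHashMap<>();
        fixed.put("category", "books");
        for (String body : bodiesFor(fixed, "zip", Arrays.asList("84101", "10001"))) {
            // In a real scraper, each body would be POSTed to the form's action URL.
            System.out.println(body);
        }
    }
}
```

Copying the fixed fields into a fresh map for each value keeps one submission’s data from leaking into the next.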
Some of you in the past may have run into this dreaded message when trying to access a site that uses HTTPS:

java.security.cert.CertificateException: Untrusted Server Certificate Chain

I’m happy to report that we’ve just issued a fix for that in version 184.108.40.206a. See this FAQ if you run into any trouble upgrading.
Data discovery vs. data extraction
Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally, when people think of screen-scraping …
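To make the split concrete, here’s a minimal Java sketch that keeps the two stages in separate methods. It’s a toy built on assumptions: the “site” is a hypothetical in-memory map of URLs to HTML rather than real HTTP fetches, and the link and price patterns are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStageScraper {
    // Hypothetical in-memory "site": URL -> HTML. A real scraper would fetch these over HTTP.
    static final Map<String, String> SITE = new LinkedHashMap<>();
    static {
        SITE.put("/search?q=widgets",
                 "<a href=\"/item/1\">Widget A</a> <a href=\"/item/2\">Widget B</a>");
        SITE.put("/item/1", "<h1>Widget A</h1><span class=\"price\">$3.50</span>");
        SITE.put("/item/2", "<h1>Widget B</h1><span class=\"price\">$4.25</span>");
    }

    private static final Pattern ITEM_LINK = Pattern.compile("href=\"(/item/\\d+)\"");
    private static final Pattern PRICE = Pattern.compile("<span class=\"price\">([^<]*)</span>");

    // Stage 1, data discovery: navigate from a starting page to the pages that hold the data.
    static List<String> discover(String startUrl) {
        List<String> detailUrls = new ArrayList<>();
        Matcher m = ITEM_LINK.matcher(SITE.getOrDefault(startUrl, ""));
        while (m.find()) {
            detailUrls.add(m.group(1));
        }
        return detailUrls;
    }

    // Stage 2, data extraction: pull the wanted value off a page we've already arrived at.
    // Returns null if the page is missing or the value isn't there.
    static String extractPrice(String url) {
        Matcher m = PRICE.matcher(SITE.getOrDefault(url, ""));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String url : discover("/search?q=widgets")) {
            System.out.println(url + " -> " + extractPrice(url));
        }
    }
}
```

Keeping the stages apart means a change in the site’s navigation only touches `discover`, and a change in the detail-page markup only touches `extractPrice`.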
Version 2.7 of screen-scraper available
Come ’n get it, friends and neighbors. You can download it fresh from our site or update your existing instance. This is definitely our cleanest release yet. In my opinion, the coolest feature is probably the RSS stuff. Check out our new tutorial on it. It may end up being kind of a “gee whiz” …
Adding numbers to session variables
Up till now it’s been a pretty big pain to add a number to a session variable. Often you’ll have something like a page number that you need to increment as you loop through search-results pages. The page number is usually stored as a String, and to increment it you normally have to cast …
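For reference, here’s roughly what that chore looks like done by hand, as a minimal Java sketch. The session is modeled as a plain `Map`, and `incrementVariable` is a hypothetical helper, not screen-scraper’s API. Java won’t cast a String straight to an int, so the sketch parses the value, adds to it, and writes the result back as a String.

```java
import java.util.HashMap;
import java.util.Map;

class SessionCounter {
    // Hypothetical stand-in for adding a number to a session variable.
    // The value may be a String (e.g. scraped from HTML) or a number; a missing one counts as 0.
    static String incrementVariable(Map<String, Object> vars, String name, int amount) {
        Object current = vars.get(name);
        int value = (current == null) ? 0 : Integer.parseInt(current.toString().trim());
        String updated = String.valueOf(value + amount);
        vars.put(name, updated); // store it back as a String, the way page numbers usually live
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        session.put("PAGE", "1");
        // Loop through a few search-results pages, bumping PAGE after each one.
        for (int i = 0; i < 3; i++) {
            System.out.println("Requesting page " + session.get("PAGE"));
            incrementVariable(session, "PAGE", 1);
        }
    }
}
```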
OK, so one bug slipped under our radar. Fortunately, it’s been fixed, and hopefully this build will become 2.7. Please feel free to upgrade and let us know of anything quirky.
Now accepting credit cards
This probably isn’t a big deal to most of you reading, but we can now accept credit cards on our site. For a long while we only handled PayPal, which turned out to be a pretty big pain in the patookis for many. So if you don’t have a PayPal account and have hesitated to register …
Get it while it’s hot. This could become version 2.7. We’re doing our own internal hammering on this one, but please let me know if any of you out there find bugs we miss. As usual, you can reach me at todd-[at]-screen-scraper.com.
New screen-scraper tutorial on inserting data into databases
I must be on some kind of tutorial rampage. I’ve just written a fifth tutorial on an oft-requested topic: inserting scraped data into databases. You can find it here: http://www.screen-scraper.com/support/tutorials/tutorial5/tutorial_overview.php. For quite a while I mulled over how to approach this, given how many ways there are to go about it. Recently I had somewhat …
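Since there are so many ways to go about it, here’s just one of them as a minimal Java sketch: plain JDBC with a `PreparedStatement`. It’s not necessarily the approach the tutorial takes, the `products` table and its columns are hypothetical, and connecting for real needs a JDBC driver for your database on the classpath, so `main` only prints the generated SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ScrapedRowWriter {
    // Builds "INSERT INTO table (a, b) VALUES (?, ?)" for the given column names.
    // Table and column names must come from your own code, never from scraped text,
    // because identifiers can't be bound as parameters.
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    // Inserts one scraped record. Values are bound as parameters, so quotes or other
    // odd characters in scraped text can't break (or inject into) the SQL.
    static int insertRecord(Connection conn, String table, Map<String, String> record)
            throws SQLException {
        List<String> columns = new ArrayList<>(record.keySet());
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (int i = 0; i < columns.size(); i++) {
                ps.setString(i + 1, record.get(columns.get(i))); // JDBC parameters are 1-based
            }
            return ps.executeUpdate(); // number of rows inserted
        }
    }

    public static void main(String[] args) {
        // A real run would first open a Connection via DriverManager.getConnection(...)
        // with your database's JDBC URL; here we only show the statement that would be used.
        System.out.println(insertSql("products", Arrays.asList("name", "price")));
    }
}
```

Binding scraped values as parameters, rather than pasting them into the SQL string, matters more than usual here: scraped text routinely contains apostrophes and other characters that would otherwise break the statement.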