Version 5.0.28a of screen-scraper Released

Changes: Based on feedback, you can now run the screen-scraper workbench and server simultaneously by adding the “AllowMultipleSimultaneousInstances” property to the screen-scraper.properties file. Fixed a bug where screen-scraper would freeze when very large requests were included in proxy sessions and scrapeable files. Fixed a bug where space characters in URLs would generate an error.

Version 5.0.27a of screen-scraper Released

Just a few changes in this one: Fixed a scrolling bug related to displaying script instances associated with extractor patterns. Removed a log message that was appearing each time a redirect occurred. screen-scraper will now display a “start page” when the workbench initially launches. The start page will hopefully be especially helpful for newer users. …

Version 5.0.26a of screen-scraper Released

Fixed a number of bugs in this one: Fixed a bug that arose when the pool of available anonymous proxy servers was depleted to zero. Running multiple screen-scraper interfaces simultaneously is now disallowed. For example, previously the screen-scraper workbench could be run concurrently with the server. This ended up causing database corruption in some cases, though, so we’re …

Version 5.0 of screen-scraper released! And it’s on sale!!!

Okay, that was probably too many exclamation points in the title. It’s with good reason, though. Version 5.0 represents a major upgrade in screen-scraper’s functionality (take a glance at the release notes to see what I mean). Not only have we made all kinds of bug fixes, but there are lots of enhancements to the …

To Recurse is Human, to Iterate, Divine

Well, that’s actually not always true. Take a quick look at this blog posting here. The fundamental issue described by that posting is one of recursion vs. iteration. When recursion is used (a page calls a page which calls a page…) objects tend to get stacked up, and subsequently fill up memory. When iteration is …
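
The recursion-versus-iteration contrast in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not screen-scraper code: the function names (`scrape_recursive`, `scrape_iterative`, `recursion_overflows`) are hypothetical, and integer page numbers stand in for real pages. In the recursive version every page keeps a stack frame alive until the last page finishes; in the iterative version a single loop does the same work with constant stack depth.

```python
import sys


def scrape_recursive(page, last_page, visited=None):
    """Hypothetical recursive crawl: each page 'calls' the next page.

    One stack frame per page stays alive until the final page returns,
    so memory use grows with the number of pages visited.
    """
    if visited is None:
        visited = []
    visited.append(page)
    if page < last_page:
        scrape_recursive(page + 1, last_page, visited)
    return visited


def scrape_iterative(first_page, last_page):
    """Hypothetical iterative crawl: one loop, constant stack depth."""
    visited = []
    page = first_page
    while page <= last_page:
        visited.append(page)
        page += 1
    return visited


def recursion_overflows(last_page):
    """Report whether the recursive crawl exhausts Python's call stack."""
    try:
        scrape_recursive(1, last_page)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    many_pages = sys.getrecursionlimit() + 100
    print("iterative pages visited:", len(scrape_iterative(1, many_pages)))
    print("recursive crawl overflowed:", recursion_overflows(many_pages))
```

For a small number of pages both versions return the same list; the difference only shows up once the chain of pages is long enough for the stacked frames to matter.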

Tidy Time

So lately we’ve been experimenting with different tidiers in the latest alpha versions of screen-scraper. This is the little utility that will clean up malformed HTML, making extraction easier. For some time we’ve used a library called JTidy to handle this, which has worked quite well, but does have a couple of problems. First, at …