Complex Forms

Posted in Tips on 03.22.17 by jason

Some sites have pretty complex forms, sometimes because of the sheer number of parameters and sometimes because they are incomprehensible to humans. In such cases we have a method to get all the form elements for you.


Version 7.0.1a released

Posted in Updates on 04.19.16 by jason

When you update to version 7.0.1a, the first thing you’ll notice is the spruced-up GUI, but there is quite a bit going on under the hood too. You can see all the release notes here.

If you want to use this update, here are the instructions to update.

Screen-scraper 7.0 Released

Posted in Updates on 03.02.16 by jason

This new stable version adds many new features and gives you the ability to scrape sites that use the latest SSL features.


Dynamic Content

Posted in Tips on 10.28.15 by jason

One’s first experience with a page full of dynamic content can be pretty confusing. Generally one can request the HTML, but it’s missing the data that is sought.

What you’re usually seeing is a page containing JavaScript that makes a subsequent HTTP request and adds the returned data into the HTML. That subsequent HTTP response is often JSON, but can be plain HTML, XML, or myriad other things.
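
To make that concrete, here is a minimal sketch in plain Java (not screen-scraper’s own API). Everything in it is made up for illustration: the class name, the /page and /data.json paths, and the sample payload. A throwaway local server stands in for the real site, so the example runs without touching the network. It shows that the page HTML lacks the data, while the URL the script calls returns it directly.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class DynamicContentDemo {

    // Hypothetical page: the table stays empty until the script fills it in.
    static final String PAGE_HTML =
        "<html><body><table id=\"results\"></table>"
        + "<script>fetch('/data.json').then(r => r.json()).then(render);</script>"
        + "</body></html>";

    // Hypothetical payload that the script requests after the page loads.
    static final String DATA_JSON = "[{\"name\":\"George Washington Carver\"}]";

    // Starts a local server on a free port that stands in for the real site.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/page", ex -> send(ex, "text/html", PAGE_HTML));
        server.createContext("/data.json", ex -> send(ex, "application/json", DATA_JSON));
        server.start();
        return server;
    }

    // Writes a 200 response with the given content type and body.
    static void send(HttpExchange ex, String type, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", type);
        ex.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Plain GET request that returns the response body as a string.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Requests the page first, then the URL its script would call.
    static String[] fetchPageAndData() throws Exception {
        HttpServer server = startServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            return new String[] { fetch(base + "/page"), fetch(base + "/data.json") };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String[] result = fetchPageAndData();
        System.out.println("Page contains the name: " + result[0].contains("Carver"));
        System.out.println("Data contains the name: " + result[1].contains("Carver"));
    }
}
```

On a real site, you would find the equivalent of /data.json by watching the requests the browser makes while the page loads, then request that URL instead of (or after) the page itself.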


HTTPS connection issues

Posted in Updates on 04.29.15 by jason

We’ve been seeing lots of issues with scrapes connecting to HTTPS sites. Some of the errors include:

  • ssl_error_rx_record_too_long
  • An input/output error occurred while connecting to https:// … The message was peer not authenticated.
  • javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

The issue came about when the Heartbleed vulnerability necessitated changes to some HTTPS connections: some types aren’t secure anymore, and new versions have come out. Screen-scraper needed two changes to catch up:

  • Update to use Java 8
  • Update of HTTPClient to 4.4

Both of these are pretty large changes, so they aren’t in the stable release yet. However, in some cases they are the only option to make a scrape work, so here are the instructions to get what you need.
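
Separately from those instructions, when you are diagnosing errors like the ones listed above it can help to know which protocol versions your Java runtime is able to negotiate. Below is a minimal sketch in plain Java. The class and method names are my own and are not part of screen-scraper. It only reads the runtime’s defaults and does not connect to any site.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class TlsSupportCheck {

    // Protocol versions this Java runtime is able to negotiate at all.
    static List<String> supportedProtocols() throws Exception {
        SSLParameters params = SSLContext.getDefault().getSupportedSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    // Protocol versions that are switched on by default for new connections.
    static List<String> defaultProtocols() throws Exception {
        SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Java version:       " + System.getProperty("java.version"));
        System.out.println("Supported:          " + supportedProtocols());
        System.out.println("Enabled by default: " + defaultProtocols());
    }
}
```

The lists vary with the Java version and its security settings. If the protocol a site requires is missing from the “enabled by default” list, that is one possible explanation for a failed connection, though not the only one.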

Scraping data from various industries

Posted in Miscellaneous on 06.10.13 by Todd Wilson

We’ve just added several new scraping sessions that exemplify extracting data from sites in various industries.  If you go to our home page and click on one of the buttons corresponding to an industry you’ll be taken to a page where you can download the scraping session.  The e-commerce section also has a video to walk you through the process, and we’ll be adding videos to the others shortly.

Apache Commons

Posted in Uncategorized on 05.28.13 by jason

We’ve recently included libraries for Apache Commons Lang. There are a large number of useful things in there, but I find the most use for StringUtils and WordUtils.

For example, some sites you scrape might return results in all caps. You could do this:

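import org.apache.commons.lang.*;

// Sample value as it might come back from a site, in all caps
name = "GEORGE WASHINGTON CARVER";
// Lowercase first, because capitalize() only changes the first letter of each word
name = StringUtils.lowerCase(name);
name = WordUtils.capitalize(name);
session.log("Name now shows as: " + name);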

At the end, the name is formatted as “George Washington Carver”. Almost all of the methods are already null-safe, and there are a lot of little tools in there to try.

End-of-year sale!

Posted in Miscellaneous on 11.29.12 by Todd Wilson

This is our biggest sale in quite a while.  Until December 31, 2012 take 40% off Professional Edition licenses and 60% off Enterprise Edition licenses.  Click here to take advantage.

Version 6.0.18a of screen-scraper Released

Posted in Updates on 10.16.12 by Todd Wilson

A few minor updates in this one, along with a long-awaited global find feature!

Let Us Help You Learn screen-scraper

Posted in Uncategorized on 07.19.12 by Todd Wilson

We are pleased to announce our new coaching program. To help them get started, new users can receive up to two free hours of one-on-one coaching (click here for details).

Existing users can receive help planning a project or solving that one tough issue, learn new techniques, and refine current scraping projects. Purchase hours of training by calling our offices at 800-672-0113.
