HTTPS connection issues

We’ve been seeing lots of issues with scrapes connecting to HTTPS sites. Some of the errors include:

  • ssl_error_rx_record_too_long
  • An input/output error occurred while connecting to https:// … The message was peer not authenticated.
  • javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

The issue came about when the Heartbleed vulnerability necessitated changes to some HTTPS connections; some types aren’t secure anymore, and new versions have come out. Screen-scraper needed two changes to catch up:

  • Update to use Java 8
  • Update of HTTPClient to 4.4

Both of these are pretty large changes, so they aren’t in the stable release yet. In some cases, however, they are the only way to make a scrape work, so here are the instructions to get what you need.

The update to HTTPClient 4.4 was pushed in screen-scraper v 6.0.50a. Since it is a large change, some bugs are anticipated, and we’re working through them. You may therefore see newer versions available, and that is good.

The update utility cannot update the bundled JRE. You can update screen-scraper without moving to Java 8, and it works pretty well, but if you still cannot connect to a site, or part of a scrape isn’t working, try updating the JRE. Linux/OSX/BSD users can just install Java 8 on the system and follow these instructions to use it. The best solution for Windows is to reinstall, so:

  1. Export your scraping sessions
  2. Download your installer and run it. Please don’t install to a directory where there is already an install of screen-scraper. You can either move the old one or choose a new location for the installation.

 

Windows

OSX

Once done, you’ll be at v 6.0 with Java 8. You can try your scrape then, and it could work; if not, make sure you update to the newest version. We’ve been testing the new builds, and they are working well and are very stable, but if you run across any bugs please report them and we’ll hop on them posthaste.
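
To confirm which JRE you ended up on and which TLS versions it offers, a small standalone check like the one below can help. It is only a sketch: the class name `TlsProtocolReport` is made up for illustration, it reports on whichever JRE runs it (so run it with the same Java that screen-scraper uses), and screen-scraper’s own HTTP client may configure protocols differently from the JRE defaults.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

/** Hypothetical helper: reports the Java version and the TLS protocols of this JRE. */
class TlsProtocolReport {

    /** Returns the JRE's default SSLContext, wrapping the checked exception. */
    static SSLContext context() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Protocols the default SSLContext enables for new connections. */
    static String[] defaultProtocols() {
        return context().getDefaultSSLParameters().getProtocols();
    }

    /** Protocols the JRE can speak if they are explicitly enabled. */
    static String[] supportedProtocols() {
        return context().getSupportedSSLParameters().getProtocols();
    }

    public static void main(String[] args) {
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("enabled by default = " + Arrays.toString(defaultProtocols()));
        System.out.println("supported          = " + Arrays.toString(supportedProtocols()));
    }
}
```

If the version printed is not a Java 8 one, screen-scraper is most likely still being started with an older JRE.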

7 thoughts on “HTTPS connection issues”

  1. We’ve seen a couple of instances where someone running screen-scraper v 6.0.61a and Java 8 has gotten an error:

    An input/output error occurred while connecting to 'https://xxxx'. The message was handshake alert: unrecognized_name.

    I see that this is an error caused by a configuration error on the site, and most apps ignore the handshake alert. Java does not.

    If you edit the screen-scraper.properties, you can add a line

    EnableSNIExtension=false

    This is a global change, and it has fixed the cases that I have seen, but there is a chance this setting will adversely affect another HTTPS site. If you see that, please let me know.
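
For anyone hitting the same `unrecognized_name` alert in their own Java code, the standard JSSE switch is the `jsse.enableSNIExtension` system property. Whether screen-scraper’s `EnableSNIExtension` setting maps onto that property is an assumption on my part, not something confirmed above, and the class name `DisableSni` is purely illustrative. A minimal sketch:

```java
/** Hypothetical helper: turns off the SNI extension for the whole JVM. */
class DisableSni {

    /** Standard JSSE system property that controls the SNI extension. */
    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    /**
     * Call this before the first HTTPS connection is opened.
     * Like the screen-scraper setting, it is global and affects every HTTPS site.
     */
    static void disableSni() {
        System.setProperty(SNI_PROPERTY, "false");
    }

    public static void main(String[] args) {
        disableSni();
        System.out.println(SNI_PROPERTY + " = " + System.getProperty(SNI_PROPERTY));
    }
}
```

The same effect can be had without code by starting the JVM with `-Djsse.enableSNIExtension=false`.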

  2. Another issue with screen-scraper v 6.0.64a and Java 8. The EnableSNIExtension setting has no effect:

    An input/output error occurred while connecting to 'xxxx'. The message was sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

    The root cause of this is either a certificate that is not properly signed by a certificate authority or a certificate trust chain that the HTTP client we use cannot handle.

    The fix for this is to use an alternate HTTP client. You will need to be at v 6.0.63a or newer, and (as of now) use a script at the start of the session to switch to the alternate client. Download the script from here
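
The errors discussed so far, and the fix suggested for each, can be summed up as a small lookup. This is only an illustrative sketch: `HttpsErrorHints` and `hintFor` are made-up names, not part of screen-scraper, and the hints simply restate the advice given in this post and its comments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a fragment of an error message to the suggested fix. */
class HttpsErrorHints {

    private static final Map<String, String> HINTS = new LinkedHashMap<>();

    static {
        HINTS.put("unrecognized_name",
                "Add EnableSNIExtension=false to screen-scraper.properties.");
        HINTS.put("PKIX path building failed",
                "Use the alternate HTTP client (v 6.0.63a or newer).");
        HINTS.put("peer not authenticated",
                "Update screen-scraper and run it on Java 8.");
        HINTS.put("ssl_error_rx_record_too_long",
                "Update screen-scraper and run it on Java 8.");
    }

    /** Returns the first hint whose key appears in the error message. */
    static String hintFor(String errorMessage) {
        for (Map.Entry<String, String> entry : HINTS.entrySet()) {
            if (errorMessage.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "No known fix yet. Please report the error.";
    }

    public static void main(String[] args) {
        System.out.println(hintFor("The message was handshake alert: unrecognized_name."));
    }
}
```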

  3. We upgraded from 6.0.47a to 6.0.65a and started getting this error when we open the workbench:

    java.lang.NoClassDefFoundError: java/lang/AutoCloseable
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$000(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:303)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.exe4j.runtime.LauncherEngine.launch(Unknown Source)
    at com.exe4j.runtime.WinLauncher.main(Unknown Source)
    Caused by: java.lang.ClassNotFoundException: java.lang.AutoCloseable
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    … 19 more

  4. Parkash, this has been happening when one updates to 6.0.65a and is still using Java 6. You need to follow the steps on the main post to use an updated Java.
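
`java.lang.AutoCloseable` was added in Java 7, which is why a Java 6 runtime cannot find it. A quick way to see what a given JRE provides is a check along these lines. It is a sketch only: `JavaVersionCheck` is a hypothetical name, and it reports on the JRE that runs it, which has to be the same one screen-scraper is launched with for the result to mean anything.

```java
/** Hypothetical helper: reports the Java major version and whether AutoCloseable exists. */
class JavaVersionCheck {

    /** Converts a specification version such as "1.6", "1.8" or "11" into 6, 8 or 11. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts.length > 1 && parts[0].equals("1")) {
            return Integer.parseInt(parts[1]); // old scheme: 1.6, 1.7, 1.8
        }
        return Integer.parseInt(parts[0]);     // new scheme: 9, 10, 11, ...
    }

    /** True when the running JRE ships java.lang.AutoCloseable (Java 7 and later). */
    static boolean hasAutoCloseable() {
        try {
            Class.forName("java.lang.AutoCloseable");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Java major version: " + major);
        System.out.println("AutoCloseable present: " + hasAutoCloseable());
        if (major < 8) {
            System.out.println("This JRE is older than Java 8; follow the steps in the main post.");
        }
    }
}
```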

  5. I just updated my Ubuntu 14.04 system to Java 8 and SS 6.0. I am still getting the following error:
    Login page: An input/output error occurred while connecting to ‘http://webpage.php’. The message was peer not authenticated.

    Automatic updates are on, though I reinstalled just in case.

    Any suggestions?
    Randy
