Release notes for screen-scraper

Public Release 5.0 (06.30.10)

  • feature: added REST interface
  • feature: can now filter out less useful proxy transactions
  • feature: added DataManager to facilitate saving data to a database
  • feature: generate multiple scrapeable files from proxy session
  • feature: made button bar persistent for extractor patterns
  • feature: retained number of lines to display for scraping session log between sessions
  • feature: updated scrapeable file icons to indicate when they are and are not invoked in sequence
  • feature: added a delete option for scraping sessions to web interface
  • feature: enhanced data set viewer with list view and colored tokens
  • feature: improved script error messages
  • feature: added a method to allow HTTP parameters to be removed from scrapeable files
  • feature: added logging levels to scraping session
  • feature: added ability to compare request in scrapeable file with transaction in proxy session
  • feature: enhanced breakpoint window to show more information, such as current script and number of scripts on the stack
  • feature: added syntax highlighting to extractor pattern pane
  • feature: added ability to pause/breakpoint a scraping session with a button
  • feature: extracted data can now be highlighted in last response tab
  • feature: pane now scrolls down when an extractor pattern is added
  • feature: character set can now be determined on a scraping session and scrapeable file level
  • feature: added ability to limit length of response for a scrapeable file
  • feature: enhanced handling of database backups over time
  • feature: can now add more session variables to a scheduled scraping session in the web interface
  • feature: added ability to clear completed scraping sessions from web interface
  • feature: enhanced a few default regular expressions
  • feature: properties file can now be reloaded from the web interface
  • feature: can now copy and paste sub-extractor patterns
  • feature: added ability to trim white space from extracted data
  • feature: added a couple of new options to invoking scripts from an extractor pattern
  • feature: added sutil to handle more general methods
  • feature: provided a way to null out session variables for tokens that didn't match
  • feature: provided a way to save data sets without appending to an existing data set
  • feature: added session.setMaxConcurrentFileDownloads
  • feature: added ability to install multiple screen-scraper services in Windows
  • feature: now highlighting selected words in script text pane
  • feature: added code completion and macros to script pane
  • feature: now using syntax highlighting in last response tab
  • feature: added alternative HTML tidier
  • feature: added notes column to proxy
  • feature: added getv and setv to session object
  • feature: now limiting script stack size in order to avoid memory problems
  • feature: added ability to force files to be regarded as non-binary
  • feature: added scrapeableFile.connectionTimedOut
  • feature: added find feature in proxy session
  • bugfix: rearranged and made redundant some GUI elements to make working with scrapeable files easier
  • bugfix: extractor pattern token window no longer scrolls to the bottom when a new token is added
  • bugfix: scrollable panes no longer scroll to the bottom when first viewed
  • bugfix: now retaining scroll position in panes when user selects various tabs
  • bugfix: the find dialog box now appears within screen-scraper's frame by default
  • bugfix: improved default open/save dialog box on Windows and Mac OS X
  • bugfix: added message if DATARECORD is absent when a sub-extractor pattern is added
  • bugfix: fixed resizing of child elements in breakpoint window
  • bugfix: fixed a bug where scraping session notes couldn't be deleted
  • bugfix: now clearing main panel when a folder gets deleted
  • bugfix: fixed a bug where a copied extractor pattern would retain script instances
  • bugfix: main panel is now getting cleared when a script gets deleted
  • bugfix: now updating list of scraping sessions for proxy session when a scraping session is renamed
  • bugfix: session.getNotes() was generating an exception
  • bugfix: pop-up windows are now appearing closer to the mouse cursor
  • bugfix: fixed an issue where scrapeable files couldn't be generated from certain proxy transactions
  • bugfix: made various fixes to proxy so that it more accurately identifies binary and non-binary responses
  • bugfix: fixed a bug where extractor patterns weren't being generated from selected HTML
  • bugfix: now clearing lower pane when proxy transactions are deleted
  • bugfix: fixed dataSet.writeToFile so that column headers are updated correctly
  • bugfix: now remembering wrap text state in scripts
  • bugfix: now properly resequencing scrapeable files upon deletion
  • bugfix: now accurately indicating when a request is multi-part
  • bugfix: fixed an issue where logs were being truncated
  • bugfix: improved handling of international characters in RemoteScrapingSession
  • bugfix: fixed an issue on import when character set wasn't indicated
  • bugfix: improved handling of hard returns in extractor patterns containing embedded variables
  • bugfix: improved error message on export
  • bugfix: improved handling of null values with data records
  • bugfix: will now recreate log file and continue logging when log file gets deleted
  • bugfix: fixed an issue where extractor patterns weren't getting highlighted properly after edit
  • bugfix: improved handling of large proxy transactions
  • bugfix: fixed an issue when resolving certain URLs from relative to absolute
  • bugfix: now exporting scripts that are invoked via session.executeScript
  • bugfix: improved handling of breakpoints in server mode
  • bugfix: fixed an issue where script pane wasn't being updated on import
  • bugfix: fixed an issue where tokens with duplicate names in sub-extractor patterns weren't being saved properly
  • bugfix: made running time human-readable in web interface
  • bugfix: user's IP address is now displayed when access is denied
  • bugfix: fixed a bug where the extracted data window couldn't be displayed while the breakpoint window was visible
  • bugfix: no longer overwriting .vmoptions files
  • bugfix: now using scraping session character set when exporting
  • bugfix: the ? character is now disallowed in object names
  • bugfix: logging level was always defaulting to debug when invoking scraping sessions from the command line
  • bugfix: can now copy text from the last request tab
  • bugfix: now displaying an error when invalid regular expression is entered in token
  • bugfix: now coloring text in log when tidying fails
  • bugfix: enhanced resizing of table columns
  • bugfix: fixed an issue where GUI would freeze up when applying an extractor pattern while scraping session was running
  • bugfix: made a few minor fixes to the .NET driver
  • bugfix: enhanced Ruby driver to be more Ruby-like
  • bugfix: updated XML libraries
  • bugfix: fixed a bug where an exception was being thrown when values were blank in file used by session.loadVariables
  • bugfix: fixed BASE HREF issue when viewing HTML in local web browser
  • bugfix: now deprecating unstable Windows features, including using IE as the HTTP client and allowing VBScript as a scripting language
  • bugfix: now allowing parentheses to be used in regular expressions, as well as back references
  • bugfix: fixed an issue where a script wasn't being deleted when its parent folder was deleted
  • bugfix: fixed an issue where headers were being munged in certain redirect responses
  • bugfix: made visual sequencing of invoked scripts more logical
  • bugfix: improved progress bar when downloading an update
  • bugfix: improved placement of pop-up windows in web interface
  • bugfix: session.loadVariables now allows spaces before and after = symbol
  • bugfix: text not wrapping by default if checkbox was checked in script pane
  • bugfix: auto-refresh not occurring in web interface if checkbox is initially checked
  • bugfix: no longer requiring web.htm in web interface URL
  • bugfix: now highlighting button corresponding to current section in settings dialog box
  • bugfix: enhanced icons in Windows
  • bugfix: added icons for all menu items and buttons
  • bugfix: updated PHP class for better backward compatibility

Public Release 4.5 (03.04.09)

  • feature: syntax highlighting in the script editor
  • feature: added icons to menu items
  • feature: logging levels (debug, info, warn, error)
  • feature: general optimization in both workbench and server
  • feature: session.clearAllSessionVariables
  • feature: context menu on root folder
  • feature: scripts now automatically import com.screenscraper.common.*
  • feature: port conflicts now being output to the error.log file
  • feature: added method to determine whether or not session is running
  • feature: emails sent regarding anonymization status
  • feature: ability to log int values
  • feature: user notified when database connection is lost
  • feature: scripts can be force overwritten via a property in the properties file
  • feature: time stamp added to exported scraping session file
  • feature: look and feel can be set
  • feature: memory usage displayed in workbench
  • feature: memory usage accessible via a method call
  • feature: user is notified in proxy session if external proxy is set
  • feature: custom HTTP headers can be added
  • feature: added "Nickname" property
  • bugfix: proxy sessions weren't handling web sites utilizing non-standard port numbers
  • bugfix: fixed quirks with scrapeable file names
  • bugfix: bug in reordering scrapeable file parameters
  • bugfix: scraping session start/stop button not updating properly
  • bugfix: in some cases not displaying correct script in breakpoint window
  • bugfix: mapping sets and options on tokens not applying in data set window
  • bugfix: invalid URL message not being trapped properly
  • bugfix: problem importing scraping sessions with unusual names
  • bugfix: double prompts when overwriting scripts
  • bugfix: tidying can now be turned off in basic edition
  • bugfix: now producing a better error message when exporting a scraping session using an invalid character set
  • bugfix: highlight was offset in some cases in finding text
  • bugfix: clicking a proxy transaction with large post data was chewing up CPU
  • bugfix: session variables can be embedded in extractor patterns (again)
  • bugfix: visual state for proxy sessions not being retained
  • bugfix: token options and mappings not applying when data is extracted manually from a script
  • bugfix: cleared up quirks with script instance drop-down list
  • bugfix: last number of records scraped inaccurately recorded in web interface
  • bugfix: notify user when script doesn't import because existing script is not to be overwritten
  • bugfix: scraping session file locked after export
  • bugfix: scroll and cursor location not remembered on last response tab
  • bugfix: context menu in log not displaying correctly
  • bugfix: settings window is resizable to accommodate different font sizes and screen resolutions
  • bugfix: parameter type drop-down list disappearing when parameters were deleted

Public Release 4.0 (01.21.08)

  • feature: web interface for scheduling and managing scrapes
  • feature: added real-time integration with external applications
  • feature: automatic anonymization
  • feature: scrapeable files and extractor patterns can be copied and pasted
  • feature: added a "notes" field to scraping sessions
  • feature: improved cookie compatibility
  • feature: added sequence to sub-extractor patterns
  • feature: scraping sessions can now be run directly from the command line
  • feature: HTML entities can now be automatically converted from scraped data
  • feature: cookies can be cleared for a scraping session
  • feature: last response for a scrapeable file can now be viewed in a browser
  • feature: current time and elapsed time can be output in a script
  • feature: greatly improved look 'n feel on Mac OS X
  • feature: added new regular expressions
  • feature: "update.zip" files will be decompressed and imported
  • feature: objects in the tree can be deleted with the "delete" key
  • feature: enhanced the "status" bar
  • feature: the licensed email address now appears in the "about" screen
  • feature: the default file extension for exported objects is now "sss"
  • feature: a "start/stop scraping" button was added to the scraping session "log" tab
  • feature: HTML can be automatically stripped from extracted data
  • feature: screen-scraper can check for updates on startup
  • feature: enhanced installers
  • bugfix: mappings were not being imported properly from exported scraping sessions
  • bugfix: null interpolated session variables were not being properly handled
  • bugfix: "deflate" encoding was not being properly handled
  • bugfix: in some cases sequence numbers were being duplicated for scrapeable files
  • bugfix: in certain cases folders could not be deleted
  • bugfix: the proxy server was misidentifying some files as binary
  • bugfix: the "last response" tab was blanking out prematurely in some cases
  • bugfix: now catching class loader exceptions for jar files compiled with a higher Java version
  • bugfix: ports weren't being displayed for SSL URLs in the proxy
  • bugfix: exceptions thrown in scripts were causing some subsequent scripts not to be executed
  • bugfix: various fixes for Windows Vista
  • bugfix: mapping sets were not always being deleted properly
  • bugfix: multiple command line instances were not being handled properly
  • bugfix: drag 'n drop to folders in some cases wasn't working
  • bugfix: double-clicking extractor pattern tokens didn't always allow them to be edited
  • bugfix: extractor pattern tokens were getting repeated after editing a token
  • bugfix: sequence numbers that were too high were causing extractor patterns to disappear
  • bugfix: new scripts weren't being sorted properly
  • deprecated: embedded session variables in extractor patterns
  • deprecated: the "Optional?" flag for extractor pattern tokens
  • deprecated: the "Run Script" button
  • deprecated: automatic joining of data sets
  • deprecated: RunnableScrapingSession for everything but enterprise edition

Public Release 3.0 (01.10.07)

  • feature: added a "Find" feature to the scraping session log and script panel.
  • feature: the scraping session log can now be limited to a specified number of lines.
  • feature: the scraping session log can automatically remain scrolled to the end.
  • feature: scripts can now be called from other scripts.
  • feature: the database now gets backed up automatically.
  • feature: screen-scraper can now be registered in a GUI-less environment.
  • feature: tab state is now preserved when moving between objects.
  • feature: added context menus for editing commands.
  • feature: upgraded Mac interface to be like Windows and Linux.
  • feature: added a new library used to write out XML from scripts.
  • feature: enhanced firewall handling.
  • feature: for new installs, the user is now referred to the tutorials.
  • feature: screen-scraper now checks for blocked ports on startup.
  • feature: added a method to load and save session state between sessions.
  • feature: integrated a new HTML renderer.
  • feature: objects can now be organized into folders.
  • feature: improved "Strip HTML" feature.
  • bugfix: fixed an issue related to passing in remote variables containing the ! character.
  • bugfix: fixed an issue related to truncated error messages in scripts.
  • bugfix: when invoked from the command line with no parameters, the "params" variable was coming through as void.
  • bugfix: in some cases duplicate scripts were showing up on import.
  • bugfix: there was an issue related to saving while a command line instance was running.
  • bugfix: fixed an issue in the proxy related to URLs containing multiple adjacent slash characters.
  • bugfix: in some cases the database was closing prematurely.
  • bugfix: fixed an issue related to repainting after an extractor pattern was added.
  • bugfix: the "breakpoint" window wasn't always updating properly.
  • bugfix: addressed issues related to database corruption.
  • bugfix: fixed a bug related to tildes in URLs.
  • bugfix: made multiple fixes related to international character sets and non-ASCII characters.
  • bugfix: fixed a few issues related to running screen-scraper in various modes simultaneously.

Public Release 2.7.2 (03.24.06)

  • bugfix: updated the http-client library to accept all SSL certificates.
  • bugfix: in certain situations the database was getting closed prematurely when screen-scraper was invoked from the command line.

Public Release 2.7 (03.08.06)

  • feature: screen-scraper can now generate RSS feeds from scraped data.
  • feature: added session.addToSessionVariable method.
  • feature: log messages have been enhanced and clarified.
  • feature: all of screen-scraper's ports are now settable in the properties file.
  • feature: the web server can now be disabled.
  • feature: added a warning when using VBScript, because of a bug in the third-party library that handles the VBScript engine.
  • bugfix: hot swapping scraping sessions and scripts has been improved.
  • bugfix: the server can now be run via the shell scripts on more recent versions of Mac OS X.
  • bugfix: a few fixes were made to increase database robustness.

Public Release 2.6 (11.01.05)

  • feature: international character sets are now supported.
  • feature: files can be uploaded within scrapeable files.
  • feature: added scrapeableFile.saveFileOnRequest, which allows for binary files to be downloaded via POST requests.
  • feature: added session.reformatDate, which allows for extracted dates to be reformatted.
  • bugfix: fixed bugs where harmless SQL errors were being generated.
  • bugfix: under certain circumstances errors would occur when proxying binary files.

Public Release 2.5 (08.02.05)

  • feature: automatic hot swap from the "import" folder on start-up
  • feature: scripts can be stopped mid-stream
  • feature: tidying settable on a scrapeable file level
  • feature: external proxy settable on a scraping session level
  • feature: workbench, server, and command line can be run simultaneously
  • feature: added a system tray icon for the server when running on Windows
  • feature: added scrapeableFile.extractData and scrapeableFile.extractOneValue
  • feature: added "mappings" feature for extractor pattern tokens
  • feature: implemented saving and loading of state
  • feature: caching of data sets
  • feature: filtering duplicates from data sets
  • feature: regular expressions can now be designated from a drop-down list
  • feature: HTML can be automatically stripped from extracted data
  • feature: requests can be made multiple times for a URL in case of failures
  • change: multiple script instances can be deleted at once
  • change: text box is highlighted in the "find" dialog box by default
  • change: changed highlight color for "find" feature
  • change: "last response" is now cleared before exporting
  • change: installer now sets working directory and installs COM driver
  • change: enhanced dataSet.writeToFile
  • change: added "Strict Mode" cookie policy
  • change: upgraded some third-party libraries
  • change: performed a number of code optimizations
  • bugfix: an error message related to help files was being output to the error log
  • bugfix: dataset window spawned from "breakpoint" dialog window wasn't getting initial focus
  • bugfix: resolved database corruption issues
  • bugfix: server now generates logs by default
  • bugfix: scrapingSession.downloadFile now makes use of existing cookies

Public Release 2.0 (02.02.05)

  • feature: option for disabling log file generation when run as server
  • feature: sending email through scripts
  • feature: SOAP connection support
  • feature: updated look and feel
  • feature: button bar for commonly used tasks
  • feature: status bar for application messages
  • feature: screen-scraper is automatically installed as a service in the professional edition
  • change: single "Import..." menu item instead of choosing between scraping sessions or scripts
  • change: "Yes to all" on import
  • change: merged cookie drop-down menu items in scraping session general tab
  • bugfix: new scripts with the same name will get an incremented number
  • bugfix: VBScript scripts can now be invoked when in server mode

Public Release 1.5 (09.11.04)

  • change: HTTPS is now handled with a temporary secure certificate
  • change: Rename gui to workbench
  • feature: Cookie handling option in scraping sessions
  • feature: .NET connector added
  • feature: Local files can now be scraped
  • feature: Delete table rows by right-click and pop-up menu
  • feature: Edit menu w/ copy, paste, etc. for text boxes
  • feature: Allow selection and deletion of multiple HTTP transactions from table
  • feature: Undo/redo on text boxes from Edit menu
  • feature: Search function in "Last Response" tab
  • feature: Script instances can be enabled/disabled
  • feature: Save and restore last window size.
  • feature: Data sets can be written to a delimited file
  • feature: Basic, Digest or NTLM Authentication handling in scraping session
  • feature: Hot deploy by copying scraping sessions and independent scripts to import dir
  • feature: Breakpoint debugging in scripts
  • feature: Extensibility by adding custom jars to the ext dir
  • bugfix: Extractor pattern token data is now saved by default when editor window is closed
  • bugfix: Confirm overwrite on export
  • bugfix: When an error occurs while retrieving an HTML page, the HTTP status code (e.g. 404) is now displayed in the log
  • bugfix: "Chunked" transfer encoding now handled properly in proxy server
  • bugfix: New scraping sessions and scripts default names will increment

Public Release 1.2 (06.02.04)

  • Numerous bug fixes and optimizations
  • Sub-extractor patterns
  • More flexible cookie handling
  • New methods added to built-in screen-scraper objects

Release 1.1.5 (10.01.03)

  • Several bug fixes and a few minor feature enhancements.
  • Two new tutorials are now available.

Release 1.1 (09.02.03)

  • Numerous bug fixes and minor feature enhancements
  • Internal scripts can now be written in Interpreted Java, JavaScript, JScript, Perl, Python, or VBScript.
  • The current scrapeable file can now be accessed within a script, also allowing access to the full data scraped for a page.
  • A method can be called to determine if an error occurred while the file was being requested.
  • Scraping sessions can be paused in a script.
  • Maximum number of concurrent scraping sessions can be set via a property.
  • The connection timeout can now be set via a property.

Release 1.0 (07.31.03)

  • Numerous bug fixes and minor feature enhancements
  • Improvements to server security
  • Extracted data can automatically be saved into session variables
  • Extracted data can be joined or appended to existing data sets
  • Significant improvements to the install procedure
  • Improvements to documentation
  • Self-updating when new versions become available
  • Improved usability of running screen-scraper as a server

Release 0.9.5b (06.12.03)

  • Various improvements in documentation
  • Several bug fixes and minor feature enhancements were made.
  • Several optimization and memory leak issues were resolved.
  • Data set and data record objects can be accessed from remote sources (e.g. ASP or PHP scripts).
  • A lock file now gets generated when screen-scraper starts up in order to allow only one instance to be run at a time, avoiding potential database corruption.
  • Basic authentication parameters are now associated directly with a scrapeable file.

Release 0.8.7b (05.27.03)

  • Includes several bug fixes and feature enhancements.
  • Allows screen-scraper to import and export objects.
  • Improved support for external proxies, including those that make use of NTLM.

Release 0.8.6b (03.04.03)

  • Fixes several miscellaneous bugs.
  • screen-scraper can now clean up HTML using JTidy in order to facilitate data extraction.

Release 0.8.5b (02.18.03)

  • Fixed a bug in the proxy server that garbled some URL query strings.

Release 0.8.5a (02.08.03)

  • screen-scraper now uses HttpClient (http://jakarta.apache.org/commons/httpclient/) to handle all of the HTTP transactions, which allows for a broader range of sites to be correctly scraped.

Release 0.8.4b (01.15.03)

  • added ability to invoke screen-scraper from the command line
  • added ability to run screen-scraper as a server
  • created language bindings for Java, PHP, and COM
  • when viewing the last response from scrapeable files, HTTP headers are now displayed or removed depending on whether the content is viewed as text or HTML
  • patterns can be formed by highlighting HTML
  • extractor tokens can be created from highlighting HTML

Release 0.8.2b (11.17.02)

  • context-sensitive documentation added
  • several bug fixes and feature enhancements
  • added support for an external proxy server
  • added "settings" dialog

Release 0.8b (10.22.02)

  • initial release