Scraping a Date Range

Much of the time in scraping, one wants to fill in a web form and grab the results, and many of those forms ask the user for a date range. That’s not a daunting prospect if you just want to scrape the form once, but for jobs where you want to run a scrape weekly and get a full week’s worth of data, making a script for that has been challenging. I have therefore developed a simple, generic script that will figure the date a given number of days from today and save it in a session variable.

For the purposes of this post, I’m going to make the script give me the date a week from today, formatted as a 2-digit month, 2-digit day, and 4-digit year. I’ll make both the number of days and the format easy to change.

To start, one needs to import some useful Java components:

import java.util.*;
import java.text.*;

These allow us to go ahead and create an instance of “right now”.

Calendar rightNow = Calendar.getInstance();

This gives me a “right now” to which I can add 7 days, thusly:

rightNow.add( Calendar.DATE, 7 );

And all that is left is to format it:

Date endDate = rightNow.getTime();
SimpleDateFormat formatter = new SimpleDateFormat( "MM/dd/yyyy" );
String newDate = formatter.format( endDate );

Now I have a nicely formatted local variable named newDate, and I just need to save it as a session variable so the rest of the scrape can use it.

session.setVariable( "NEW_DATE", newDate );

That’s enough to make the script work, but in order to make it into a good template, one should make it easy to find and change the things that will have to be set differently in each application. My attempt to do so ended up like this:

import java.util.*;
import java.text.*;

// Set number of days to add to current date.
int addDays = 7;

// Set the format in which the date should be output.
String dateFormat = "MM/dd/yyyy";

// Figure the new date.
Calendar rightNow = Calendar.getInstance();
rightNow.add( Calendar.DATE, addDays );
Date endDate = rightNow.getTime();
SimpleDateFormat formatter = new SimpleDateFormat( dateFormat );
String newDate = formatter.format( endDate );

// Output the new date.
session.setVariable( "NEW_DATE", newDate );

Of course you can use this process to make more than one date for your form if needed; from here it should just be a matter of some minor editing.
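As a rough sketch of that minor editing, here is one way the same steps could be wrapped in a helper so that both ends of a date range come from a single starting point. This is plain Java rather than a screen-scraper script; the class name DateRangeSketch, the method offsetDate, and the session variable names mentioned in the comments are all made up for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DateRangeSketch {

    // Hypothetical helper: returns the date addDays away from base,
    // formatted with the given SimpleDateFormat pattern.
    static String offsetDate(Calendar base, int addDays, String dateFormat) {
        // Work on a copy so the caller's calendar is left untouched.
        Calendar copy = (Calendar) base.clone();
        copy.add(Calendar.DATE, addDays);
        return new SimpleDateFormat(dateFormat).format(copy.getTime());
    }

    public static void main(String[] args) {
        Calendar rightNow = Calendar.getInstance();
        String dateFormat = "MM/dd/yyyy";

        // Today as the start of the range, a week from today as the end.
        String startDate = offsetDate(rightNow, 0, dateFormat);
        String endDate = offsetDate(rightNow, 7, dateFormat);

        // In a screen-scraper script these would be saved with
        // session.setVariable, e.g. under assumed names such as
        // "START_DATE" and "END_DATE".
        System.out.println(startDate + " to " + endDate);
    }
}
```

A negative number of days works the same way, so a weekly job that wants the previous week could pass -7 for the start and 0 for the end.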

For information on the date formatting patterns, see the Java documentation for SimpleDateFormat at: http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html
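To give a feel for what changing the format string does, here is a small standalone sketch that runs one fixed sample date through a few common patterns. The class name DatePatternSketch and the sample date are arbitrary choices for illustration, and Locale.US is passed only so the month abbreviation comes out the same on every machine.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DatePatternSketch {

    // Hypothetical helper: formats a calendar with the given pattern.
    static String format(Calendar when, String pattern) {
        return new SimpleDateFormat(pattern, Locale.US).format(when.getTime());
    }

    public static void main(String[] args) {
        // Fixed sample date: March 5, 2024.
        Calendar sample = new GregorianCalendar(2024, Calendar.MARCH, 5);

        System.out.println(format(sample, "MM/dd/yyyy"));  // 03/05/2024
        System.out.println(format(sample, "dd/MM/yyyy"));  // 05/03/2024
        System.out.println(format(sample, "yyyy-MM-dd"));  // 2024-03-05
        System.out.println(format(sample, "MMM d, yyyy")); // Mar 5, 2024
    }
}
```

Note that the capital MM is the month; a lowercase mm means minutes, which is an easy slip to make when editing the format.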

And for a trick to make the formatting of dates far easier when you’re in screen-scraper, read up on the reformatDate method that is available in the professional edition.
