Combining Scraped Data from Multiple Sites

Posted in Thoughts on 01.30.19 by Todd Wilson

Often data sets become richer when they’re combined. A good example of this is a small study done by Streaming Observer on the quality of movies available from the big streaming services: Amazon, Netflix, Hulu, and HBO. The study concluded that, even though Amazon has by far the most movies, Netflix has more quality movies than the other three combined. This was determined by combining data about the movies available from each streaming service with data from Rotten Tomatoes, which rates the quality of movies.

Read more »

Enterprise-Scale Screen-Scraping

Posted in Thoughts on 11.23.10 by Todd Wilson

One of the main ways that I think screen-scraper differs from many other solutions is in its ability to handle large-scale scraping needs. Additionally, it was designed from the ground up to integrate with other systems, so it generally fits nicely into almost any existing setup.

If you’re doing a simple one-off data extraction project, screen-scraper could certainly handle it, but, truthfully, you may be better off with something a little more quick-and-easy. On the other hand, if you’re looking to pull data from multiple web sites, and need the extracted data to be made available to other systems, screen-scraper is an excellent option. There are many solutions out there that may get you up and running fairly quickly, but that would fall apart when faced with some of the jobs screen-scraper tackles.

Along these lines, we’ve added a new Enterprise-Ready page to our site that summarizes some of what screen-scraper can do. If you need big iron for your project, take a close look at what screen-scraper offers.

Data Cravings

Posted in Thoughts on 11.10.10 by Todd Wilson

Yesterday ReadWriteWeb published an article entitled “Overwhelmed Executives Still Crave Big Data, Says Survey”. The basic gist of it is that data is vital to making business decisions, and many managers feel that they don’t have enough of it. This got me thinking about how screen-scraping plays into all of this.

At a basic level, as a data extraction company, we deal in information. It really doesn’t make much difference what industry the information pertains to; if it’s out there on the Web, we can probably grab it. There’s a lot of talk these days about information overload, which is unquestionably a real phenomenon, but oftentimes it’s not so much the quantity of the information as it is getting access to that information in a usable format. If the data you’re interested in consists of hundreds of thousands of records spread across dozens of web sites, it may not be nearly as useful as it would be if it could be searched and analyzed in a single repository. Much of the time this is what we do. We’re tasked with aggregating large numbers of data points, normalizing and cleaning them up, then consolidating them into a highly structured central repository. Once the data is in such a repository, its real value surfaces. It’s at this point that the information can be analyzed statistically, summarized, or browsed in a structured way. This leads to business intelligence, which in turn (hopefully) yields good business decisions.

On a related note, as mentioned in the article, timeliness of information can also be critical. Once again, screen-scraping can play an important role here. I can’t count the number of times a client has approached us for a project when they already have access to all (or most of) the information they want us to acquire. The trouble is that much of the time the data they already have is old, inaccurate, and/or incomplete. Web sites and other data providers will often provide an API to their information. This can be a great thing; however, much of the time the API is insufficient because it provides access to information that is old or incomplete. For example, if you want information about automobile sales, an API may give you the make, model, and year of a car that was sold, but not the asking price. In contrast, live web sites generally contain the most up-to-date, complete, and accurate representation of the information. As such, even when data may be available via an API (or, gasp, a mailed CD), it’s often better to go directly to the web site if you want the best data.

Further thoughts on hindering screen-scraping

Posted in Thoughts, Tips on 08.17.09 by jason

We previously listed some means of trying to stop screen-scraping, but since it is an ongoing topic for us, it bears revisiting. Any site can be scraped, but some require such an investment of time and resources as to make scraping them prohibitively expensive. Some of the common methods for doing so are:

Turing tests

The most common implementation of the Turing Test is the old CAPTCHA, which tries to ensure that a human reads the text in an image and types it into a form.

We have found that a large number of sites implement very weak CAPTCHAs that take only a few minutes to get around. On the other hand, there are some very good implementations of Turing Tests that we would opt not to deal with given the choice, but sophisticated OCR can sometimes overcome them, and many bulletin board spammers have clever tricks for getting past them as well.

Data as images

Sometimes you know which parts of your data are valuable. In that case it becomes reasonable to replace that text with an image. As with the Turing Test, there is OCR software that can read it, and there’s no reason we can’t save the image and have someone read it later.
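
As a rough illustration of that OCR route, here is a minimal sketch assuming the open-source Tesseract engine and its pytesseract Python wrapper (plus Pillow) are installed; the file name is a hypothetical placeholder for an image saved during a scrape.

    # A minimal sketch: read text out of a scraped image with OCR.
    # Assumes Tesseract plus the pytesseract and Pillow packages are
    # installed; "price.png" is a hypothetical saved image.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("price.png"))
    print(text.strip())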

Oftentimes, however, presenting data as an image without a text alternative is in violation of the Americans with Disabilities Act (ADA), and the practice can be overcome with a couple of phone calls to a company’s legal department.

Code obfuscation

Using something like a JavaScript function to render data on the page even though it appears nowhere in the HTML source is a good trick. Other examples include scattering prolific, extraneous comments throughout the page, or ordering page elements in an unpredictable way (the example I think of used CSS to make the display appear the same no matter how the code was arranged).

CSS Sprites

Recently we’ve encountered some instances where a page has a single image containing numbers and letters, and uses CSS to display only the characters desired. This is in effect a combination of the previous two methods. First we have to get that master image and read what characters are there, then we need to read the site’s CSS and determine which character each tag points to.

While this is very clever, I suspect it too would run afoul of the ADA, though I’ve not tested that yet.

Limit search results

Most of the data we want to get at is behind some sort of form. Some forms are easy, and submitting them blank will yield all of the results. Some need an asterisk or percent sign in the form. The hardest ones are those that will return only so many results per query. Sometimes we just make a loop that submits each letter of the alphabet to the form, but if that’s too general, we have to make a loop that submits every combination of two or three letters; at three letters, that’s 17,576 page requests (see the sketch below).
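
For concreteness, here is a minimal sketch of that brute-force loop in Python; the search URL and field name in the comment are hypothetical placeholders, and no requests are actually sent here.

    # Generate every 3-letter search term: 26 ** 3 = 17,576 queries.
    import itertools
    import string

    def search_terms(length):
        """Yield every combination of `length` lowercase letters."""
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(combo)

    terms = list(search_terms(3))
    print(len(terms))  # 17576
    # Each term would then be submitted to the site's search form, e.g.:
    # session.post("http://example.com/search", data={"q": term})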

IP Filtering

On occasion, a diligent webmaster will notice a large number of page requests coming from a particular IP address, and block requests from that address. There are a number of ways to route requests through alternate IP addresses, however, so this method isn’t generally very effective.
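
As a rough example of what routing through alternate addresses can look like, here is a minimal sketch using the Python requests library; the proxy URLs are hypothetical placeholders for proxies you would actually control.

    # Rotate outbound requests across a pool of HTTP proxies so that no
    # single IP address generates all of the traffic. Proxy hosts below
    # are placeholders.
    import itertools
    import requests

    proxy_pool = itertools.cycle([
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
    ])

    def fetch(url):
        proxy = next(proxy_pool)
        return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)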

Site Tinkering

Scraping always keys off of certain things in the HTML. Some sites have the resources to constantly tweak their HTML so that any scrape quickly goes out of date, making it cost-ineffective to continually update the scrape to keep up with the ever-changing markup.

How to Measure Anything

Posted in Thoughts on 03.16.07 by Todd Wilson

A while back I was contacted by Douglas Hubbard regarding a book he was writing entitled How to Measure Anything. He was interested in finding out more about tools that could automate online data collection, and screen-scraping popped up on his list as one method to go about this. Last week Douglas contacted me indicating that he was essentially done with the work, and it was on its way to press. He sent me a recent draft copy, and asked if I might blog a bit about it. I happily consented, and, I have to admit, I’ve really enjoyed what I’ve read so far.

Before digging into my commentary, I thought I’d include a snippet from the book that deals specifically with screen-scraping:

There is quite a lot of information on the internet and it changes fast. If you use a standard search engine, you get a list of websites, but that’s it. Suppose, instead, you needed to measure the number of times your firm’s name comes up in certain news sites or measure the blog traffic about a new product. You might even need to use this information in concert with other specific data reported in structured formats on other sites, such as economic data from government agencies, etc.

Internet “Screen-scrapers” are a way to gather all this information on a regular basis without hiring a 24×7 staff of interns to do it all. You could use a tool like this to track used-market versions of your product on www.ebay.com, correlate your stores’ sales in different cities to the local weather by screen-scraping data from www.weather.com, or even just track the number of hits on your firm’s name on various search engines hour-by-hour. As a search on the internet will reveal, there are several examples on the web of “mashups” where data is pulled from multiple sources and presented in a way that provides new insight. A common angle with mashups now is to plot information about business, real estate, traffic, and so on against a map site like Mapquest or Google Earth. I’ve found a mashup of Google Earth and real-estate data on www.housingmaps.com that allows you to see recently sold home prices on a map. Another mashup on socaltech.com shows a map that plots locations of businesses that recently received venture capital. At first glance, someone might think these are just for looking to buy a house or find a job with a new company. But how about research for a construction business or forecasting business growth in a new industry? We are limited only by our resourcefulness.

You can imagine almost limitless combinations of analysis by creating mashups of sites like Myspace and/or YouTube to measure cultural trends or public opinion. Ebay gives us tons of free data about the behavior of sellers and buyers and what is being bought and sold, and there are already several powerful analytical tools to summarize all the data on Ebay. Comments and reviews of individual products on the sites of Sears, Walmart, Target, and Overstock.com are a source of free input from consumers if we are clever enough to exploit it. The mind reels.

If you step back from it, fundamentally screen-scraping simply deals with repurposing information. The information you’re after just happens to be in a format that makes it less usable, and screen-scraping allows you to put it into a format that is. As Douglas points out, the ability to do this leads to infinite possibilities.

He touches on a few basic reasons for doing screen-scraping:

  1. Watching information as it changes over time.
  2. Aggregating data into a single repository.
  3. Combining information from multiple sources in such a way that the whole is greater than the sum of the parts.

Chances are, any one of us could come up with all kinds of examples of each, and many of them would apply directly to the type of work we do. Every industry deals with information. It’s likely that some of the information you deal with on a day-to-day basis would be more useful to you if it could be repurposed in one of the three ways I mention. How would your business benefit if you could be notified whenever one of your products is mentioned? How much time could you save if you were able to take any existing set of data you deal with frequently and enrich it by aggregating related information onto it? For example, you might take real estate property listings and enhance them by adding information for each property that can be readily obtained from a county assessor’s web site. The end product could be quite useful, but it would be unreasonable to manually copy and paste the information from the web site. Screen-scraping allows this kind of thing to be done in an automated fashion.

How to Measure Anything isn’t available just yet, but I’d highly recommend keeping an eye out for it. If you work in an industry that deals with information and measurement (and I can’t think of one that doesn’t), you’d likely benefit from the principles Douglas teaches. Keep an eye on his How to Measure Anything web site for updates, or if you’d like to pre-order the book.

Using screen-scraper to automatically test embedded devices

Posted in Miscellaneous, Thoughts on 09.12.06 by Todd Wilson

A while back I flew out to Huntsville, AL to work with a government contractor company on automating the testing of embedded devices. To this day I’m not entirely sure what these little machines did, but they each had a web interface that needed testing (much like that of a wireless router, if you’ve worked with those before). This isn’t the most common usage for screen-scraper, but it turned out to be just what they needed.

I worked closely with Greg Chapman, one of their engineers, and he recently wrote an article on the experience entitled Testing aerospace UUTs leads to Web solution. Greg’s a smart guy, and has continued to use screen-scraper in ways that I wouldn’t have even considered.

It’s gratifying to see screen-scraper used in so many different ways, but it’s interesting that its versatility has at times almost been a curse for us. Our software can be used for all kinds of purposes, but we’re finding that, from a business standpoint, we’re often better off narrowing our focus to very specific applications. As one marketing expert we consulted with put it, “You guys have plastic.” Plastic is incredibly useful, but it gains value as you craft it into something with a specific purpose. I’m planning on blogging more about this idea later, but it’s interesting to consider the pros and cons of a general-purpose tool like screen-scraper.

Developing software by the 15% rule

Posted in Miscellaneous, Thoughts on 08.24.06 by Todd Wilson

Writing software on a consulting basis can often be a losing proposition for developers or clients or both. There are too many things that can go wrong, and that ultimately translates into loss of time and money. The “15% rule” we’ve come up with is intended to create a win-win situation for both parties (or at least make it fair for everyone). Clients generally get what they want, and development shops make a fair profit. It’s not a perfect solution, but so far it seems to be working for us.

This may come as a surprise to some, but we make very little money selling software licenses. The vast majority of our revenue comes through consulting services–writing code for hire. Having now done this for several years, we’ve learned some hard lessons. On a few projects the lessons were so hard we actually lost money.

A few months ago I put together somewhat of a manifesto-type document intended to address the difficulties we’ve faced in developing software for clients. I’m pleased to say that it’s made a noticeable difference so far for us. My hope is that this blog entry will be read by others who develop software on a consulting basis, so that they can learn these lessons the easy way rather than the way we learned them.

What follows in this article is a summary of one of the main principles we now follow in developing software–the 15% rule. If you’d like, you’re welcome to read the full “Our Approach to Software Development” document.

For the impatient, the 15% rule goes like this…

Before undertaking a development project we create a statement of work (which acts as a contract and a specification) that outlines what we’ll do, how many hours it will require, and how much it will cost the client. As part of the contract we commit to investing up to the amount of time outlined in the document plus 15%. That is, if the statement of work says that the project will take us 100 hours to complete, we’ll spend up to 115 hours on it (but no more). As to the whys and wherefores of how this works, read on.

Those who have developed software for hire know that the end product almost never ends up exactly as the client had pictured it. There are invariably tweaks that need to be made (which may or may not have been discussed up front) in order to get the thing to at least resemble what the client has in mind. And, yes, this can happen even if you spend hours upon hours fine-tuning the specification to reflect the client’s wishes. Additionally, technical issues can crop up that weren’t anticipated by the programming team. In theory, the better the programming team, the less likely this should be, but it doesn’t always end up that way (Microsoft’s Vista operating system is a sterling example). These two factors, among others, constitute the risk that is inherent in the project. Something isn’t going to go right, and that will almost always mean someone pays or loses more money than originally anticipated. The question is, who should be responsible for those extra dollars?

Up until relatively recently, we would shoulder almost all of the risk in our projects. If the app didn’t do what the client had in mind, or if unforeseen technical issues cropped up, it generally came out of our pockets. For the most part it wasn’t a huge problem, but it always seemed to have at least some effect (the extreme cases obviously being when we lost money on a project).

This seems kind of unfair, doesn’t it? The risk inherent to the project isn’t necessarily the fault of either party. It’s just there. We didn’t put it there, and neither did the client. As such, it shouldn’t be the case that one party shoulders it all. That’s where the 15% rule comes in.

The 15% rule allows both parties to share the risk. By following this rule, we’re acknowledging that something probably won’t go as either party intended, so we need a buffer to handle the stuff that spills over. By capping it at a specific amount, though, we’re also ensuring that the buffer isn’t so big that it devours the profits of the developers.

For the most part, the clients with whom we’ve used the 15% rule are just fine with it. It is a pretty reasonable arrangement, after all. We have had the occasional party that squirms and wiggles about it, but, in the end, they’ve gone along with it and I think everyone has benefited as a result.

Three common methods for data extraction

Posted in Miscellaneous, Thoughts on 03.21.06 by Todd Wilson

Building off of my earlier posting on data discovery vs. data extraction, in the data extraction phase of the web scraping process you’ve already arrived at the page containing the data you’re interested in, and you now need to pull it out of the HTML.

Probably the most common technique traditionally used to do this is to cook up some regular expressions that match the pieces you want (e.g., URLs and link titles). Our screen-scraper software actually started out as an application written in Perl for this very reason. In addition to regular expressions, you might also use code written in something like Java or Active Server Pages to parse out larger chunks of text. Using raw regular expressions to pull out the data can be a little intimidating to the uninitiated, and can get a bit messy when a script contains a lot of them. At the same time, if you’re already familiar with regular expressions and your scraping project is relatively small, they can be a great solution.
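
To make that concrete, here is a minimal sketch of the regular-expression approach in Python; the HTML snippet and pattern are illustrative only and would need hardening for messier real-world markup.

    # Pull URLs and link titles out of raw HTML with a regular expression.
    import re

    html = '<a href="http://example.com/news/1">First headline</a>'
    link_pattern = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

    for url, title in link_pattern.findall(html):
        print(url, title)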

Other techniques for getting the data out can get very sophisticated as algorithms that make use of artificial intelligence and such are applied to the page. Some programs will actually analyze the semantic content of an HTML page, then intelligently pull out the pieces that are of interest. Still other approaches deal with developing “ontologies”, or hierarchical vocabularies intended to represent the content domain.

There are a number of companies (including our own) that offer commercial applications specifically intended to do screen-scraping. The applications vary quite a bit, but for medium to large-sized projects they’re often a good solution. Each one will have its own learning curve, so you should plan on taking time to learn the ins and outs of a new application. Especially if you plan on doing a fair amount of screen-scraping it’s probably a good idea to at least shop around for a screen-scraping application, as it will likely save you time and money in the long run.

So what’s the best approach to data extraction? It really depends on what your needs are, and what resources you have at your disposal. Here are some of the pros and cons of the various approaches, as well as suggestions on when you might use each one:

Raw regular expressions and code

Advantages:

  • If you’re already familiar with regular expressions and at least one programming language, this can be a quick solution.
  • Regular expressions allow for a fair amount of “fuzziness” in the matching such that minor changes to the content won’t break them.
  • You likely don’t need to learn any new languages or tools (again, assuming you’re already familiar with regular expressions and a programming language).
  • Regular expressions are supported in almost all modern programming languages. Heck, even VBScript has a regular expression engine. It’s also nice that the various regular expression implementations don’t vary too significantly in their syntax.

Disadvantages:

  • They can be complex for those who don’t have a lot of experience with them. Learning regular expressions isn’t like going from Perl to Java. It’s more like going from Perl to XSLT, where you have to wrap your mind around a completely different way of viewing the problem.
  • They’re often confusing to analyze. Take a look through some of the regular expressions people have created to match something as simple as an email address and you’ll see what I mean.
  • If the content you’re trying to match changes (e.g., they change the web page by adding a new “font” tag) you’ll likely need to update your regular expressions to account for the change.
  • The data discovery portion of the process (traversing various web pages to get to the page containing the data you want) will still need to be handled, and can get fairly complex if you need to deal with cookies and such.

When to use this approach: You’ll most likely use straight regular expressions in screen-scraping when you have a small job you want to get done quickly. Especially if you already know regular expressions, there’s no sense in getting into other tools if all you need to do is pull some news headlines off of a site.

Ontologies and artificial intelligence

Advantages:

  • You create it once and it can more or less extract the data from any page within the content domain you’re targeting.
  • The data model is generally built in. For example, if you’re extracting data about cars from web sites the extraction engine already knows what the make, model, and price are, so it can easily map them to existing data structures (e.g., insert the data into the correct locations in your database).
  • There is relatively little long-term maintenance required. As web sites change you likely will need to do very little to your extraction engine in order to account for the changes.

Disadvantages:

  • It’s relatively complex to create and work with such an engine. The level of expertise required to even understand an extraction engine that uses artificial intelligence and ontologies is much higher than what is required to deal with regular expressions.
  • These types of engines are expensive to build. There are commercial offerings that will give you the basis for doing this type of data extraction, but you still need to configure them to work with the specific content domain you’re targeting.
  • You still have to deal with the data discovery portion of the process, which may not fit as well with this approach (meaning you may have to create an entirely separate engine to handle data discovery). Data discovery is the process of crawling web sites such that you arrive at the pages where you want to extract data.

When to use this approach: Typically you’ll only get into ontologies and artificial intelligence when you’re planning on extracting information from a very large number of sources. It also makes sense to do this when the data you’re trying to extract is in a very unstructured format (e.g., newspaper classified ads). In cases where the data is very structured (meaning there are clear labels identifying the various data fields), it may make more sense to go with regular expressions or a screen-scraping application.

Screen-scraping software

Advantages:

  • Abstracts most of the complicated stuff away. You can do some pretty sophisticated things in most screen-scraping applications without knowing anything about regular expressions, HTTP, or cookies.
  • Dramatically reduces the amount of time required to set up a site to be scraped. Once you learn a particular screen-scraping application the amount of time it requires to scrape sites vs. other methods is significantly lowered.
  • Support from a commercial company. If you run into trouble while using a commercial screen-scraping application, chances are there are support forums and help lines where you can get assistance.

Disadvantages:

  • The learning curve. Each screen-scraping application has its own way of going about things. This may imply learning a new scripting language in addition to familiarizing yourself with how the core application works.
  • A potential cost. Most ready-to-go screen-scraping applications are commercial, so you’ll likely be paying in dollars as well as time for this solution.
  • A proprietary approach. Any time you use a proprietary application to solve a computing problem (and proprietary is obviously a matter of degree) you’re locking yourself into using that approach. This may or may not be a big deal, but you should at least consider how well the application you’re using will integrate with other software applications you currently have. For example, once the screen-scraping application has extracted the data how easy is it for you to get to that data from your own code?

When to use this approach: Screen-scraping applications vary widely in their ease-of-use, price, and suitability to tackle a broad range of scenarios. Chances are, though, that if you don’t mind paying a bit, you can save yourself a significant amount of time by using one. If you’re doing a quick scrape of a single page you can use just about any language with regular expressions. If you want to extract data from hundreds of web sites that are all formatted differently you’re probably better off investing in a complex system that uses ontologies and/or artificial intelligence. For just about everything else, though, you may want to consider investing in an application specifically designed for screen-scraping.

As an aside, I thought I should also mention a recent project we’ve been involved with that has actually required a hybrid approach of two of the aforementioned methods. We’re currently working on a project that deals with extracting newspaper classified ads. The data in classifieds is about as unstructured as you can get. For example, in a real estate ad the term “number of bedrooms” can be written about 25 different ways. The data extraction portion of the process is one that lends itself well to an ontologies-based approach, which is what we’ve done. However, we still had to handle the data discovery portion. We decided to use screen-scraper for that, and it’s handling it just great. The basic process is that screen-scraper traverses the various pages of the site, pulling out raw chunks of data that constitute the classified ads. These ads then get passed to code we’ve written that uses ontologies in order to extract out the individual pieces we’re after. Once the data has been extracted we then insert it into a database.

Data discovery vs. data extraction

Posted in Miscellaneous, Thoughts on 03.16.06 by Todd Wilson

Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally when people think of screen-scraping they focus on the data extraction portion of the process, but my experience has been that data discovery is often the more difficult of the two.

The data discovery step in screen-scraping might be as simple as requesting a single URL. For example, you might just need to go to the home page of a site and extract out the latest news headlines. On the other end of the spectrum, data discovery may involve logging in to a web site, traversing a series of pages in order to get needed cookies, submitting a POST request on a search form, traversing through search results pages, and finally following all of the “details” links within the search results pages to get to the data you’re actually after. In the former case, a simple Perl script would often work just fine. For anything much more complex than that, though, a tool like screen-scraper can be an incredible time-saver. Especially for sites that require logging in, writing code to handle screen-scraping can be a nightmare when it comes to dealing with cookies and such.
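
To illustrate what that more involved flavor of data discovery can look like in code, here is a hedged sketch using the Python requests library; the URLs, form fields, and link pattern are hypothetical placeholders rather than anything from a real site.

    # A sketch of the data-discovery steps: log in, search, follow details links.
    import re
    import requests

    session = requests.Session()  # carries cookies from one request to the next

    # Step 1: log in, which sets the session cookies the site expects.
    session.post("http://example.com/login", data={"user": "me", "password": "secret"})

    # Step 2: submit the search form via POST.
    results = session.post("http://example.com/search", data={"query": "*", "page": 1})

    # Step 3: follow each "details" link found in the results page.
    for path in re.findall(r'href="(/details/\d+)"', results.text):
        detail_page = session.get("http://example.com" + path)
        # ...hand detail_page.text off to the data-extraction phase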

In the data extraction phase you’ve already arrived at the page containing the data you’re interested in, and you now need to pull it out of the HTML. Traditionally this has involved creating a series of regular expressions that match the pieces of the page you want (e.g., URLs and link titles). Regular expressions can be a bit complex to deal with, so screen-scraper hides most of those details behind the scenes, which simplifies the process. screen-scraper actually uses regular expressions to perform the data extraction, but you may not even be aware of that when you use it.

As an addendum, I should probably mention a third phase that is often ignored: what do you do with the data once you’ve extracted it? Common examples include writing the data to a CSV or XML file, or saving it to a database. In the case of a live web site, you might even scrape the information and display it in the user’s web browser in real time. When shopping around for a screen-scraping tool, you should make sure that it gives you the flexibility you need to work with the data once it’s been extracted. One of the primary design goals of screen-scraper was to make it as flexible as possible in this regard. Our FAQ on saving information to a database gives several suggestions on how screen-scraper can be used for this.
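
As a small illustration of the simplest of those options, writing extracted records to a CSV file, here is a minimal sketch; the field names and records are made up for the example.

    # Write extracted records to a CSV file using Python's standard library.
    import csv

    records = [
        {"title": "First headline", "url": "http://example.com/news/1"},
        {"title": "Second headline", "url": "http://example.com/news/2"},
    ]

    with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(records)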

Data mining vs. screen-scraping

Posted in Thoughts on 02.16.06 by Todd Wilson

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts.

In a nutshell, you might state it this way: screen-scraping allows you to get information, where data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks. For example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping” 🙂 ). So it presents a bit of a problem–we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.