Data sets often become richer when they’re combined. A good example is a small study by Streaming Observer on the quality of movies available from the big streaming services: Amazon, Netflix, Hulu, and HBO. The study concluded that, even though Amazon has by far the most movies, Netflix has more quality movies than the other three combined. This was determined by combining data about the movies available from each streaming service with data from Rotten Tomatoes, which ranks the quality of movies.
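To make the idea of combining data sets concrete, here is a minimal sketch in Python. It is not the study's actual method or data: the titles, the scores, and the quality cutoff are all made up for illustration. It joins a per-service catalog with a table of scores keyed by title, then counts how many of each service's titles clear the cutoff.

```python
from collections import Counter

# Hypothetical catalog of (service, title) pairs. Made-up data, not from the study.
catalog = [
    ("Amazon", "Movie A"),
    ("Amazon", "Movie B"),
    ("Amazon", "Movie C"),
    ("Netflix", "Movie A"),
    ("Netflix", "Movie D"),
    ("Hulu", "Movie B"),
    ("HBO", "Movie E"),
]

# Hypothetical quality scores keyed by title (made-up, 0-100 scale).
scores = {
    "Movie A": 91,
    "Movie B": 45,
    "Movie C": 60,
    "Movie D": 88,
    "Movie E": 72,
}

# Assumed cutoff for a "quality" movie; the study's real criterion is not given here.
QUALITY_THRESHOLD = 75


def count_quality_titles(catalog, scores, threshold):
    """Join catalog rows to scores by title and count qualifying titles per service."""
    counts = Counter()
    for service, title in catalog:
        score = scores.get(title)  # None when a title has no score
        if score is not None and score >= threshold:
            counts[service] += 1
    return counts


# Total titles per service, before any quality filter.
total = Counter(service for service, _ in catalog)
quality = count_quality_titles(catalog, scores, QUALITY_THRESHOLD)

for service in total:
    print(f"{service}: {quality[service]} of {total[service]} titles "
          f"at or above {QUALITY_THRESHOLD}")
```

Neither data set answers the question on its own: the catalog only says who carries what, and the scores only say how good each title is. The join on title is what lets you compare services by quality rather than by raw count.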
Thoughts
Enterprise-Scale Screen-Scraping
One of the main things that I think differentiates screen-scraper from many other solutions is its ability to handle large-scale scraping needs. Additionally, it was designed from the ground up to integrate with other systems, so it generally fits nicely into almost any existing setup. If you’re doing a simple one-off data extraction project screen-scraper …
Data Cravings
Yesterday ReadWriteWeb published an article entitled “Overwhelmed Executives Still Crave Big Data, Says Survey.” The gist of it is that data is vital to making business decisions, and many managers feel that they don’t have enough of it. This got me thinking about how screen-scraping plays into all of this. At a basic level, …
Further thoughts on hindering screen-scraping
We previously listed some ways to try to stop screen-scraping, but since it is an ongoing topic for us, it bears revisiting. Any site can be scraped, but some require so much time and so many resources that scraping them becomes prohibitively expensive. Some of the common methods to do so are: Turing tests The …
How to Measure Anything
A while back I was contacted by Douglas Hubbard regarding a book he was writing entitled How to Measure Anything. He was interested in finding out more about tools that could automate online data collection, and screen-scraping popped up on his list as one method to go about this. Last week Douglas contacted me indicating …
Using screen-scraper to automatically test embedded devices
A while back I flew out to Huntsville, AL to work with a government contractor on automating the testing of embedded devices. To this day I’m not entirely sure what these little machines did, but they each had a web interface that needed testing (much like that of a wireless router, if you’ve worked …
Developing software by the 15% rule
Writing software on a consulting basis can often be a losing proposition for developers or clients or both. There are too many things that can go wrong, and that ultimately translates into loss of time and money. The “15% rule” we’ve come up with is intended to create a win-win situation for both parties (or …
Three common methods for data extraction
Building on my earlier posting on data discovery vs. data extraction: in the data extraction phase of the web scraping process you’ve already arrived at the page containing the data you’re interested in, and you now need to pull it out of the HTML. Probably the most common technique used traditionally to do this …
Data discovery vs. data extraction
Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally when people think of screen-scraping …
Data mining vs. screen-scraping
Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts. In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll …