September 2008 Archives

Here's a HUGE tip for those of you who are about to build a screen scraper with WWW::Mechanize and Hpricot. If you have any intention of parsing hundreds or thousands of pages (you probably do, right? Otherwise you wouldn't be screen scraping), you'd better set max_history to a very low number (say, 1).

If you don't, Mechanize will remember every page you visit... and that can quickly add up to megabytes of memory when you're crawling a lot of pages.
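Here's a rough sketch of what that looks like in a scraping loop. The helper name `scrape_each` and its arguments are made up for illustration; the only piece taken from Mechanize itself is the `max_history=` setter on the agent, and I'm assuming the agent responds to `get(url)` as a Mechanize agent does.

```ruby
# Hypothetical helper (not part of Mechanize): visits each URL with a
# history-capped agent and yields every fetched page to the caller's block.
def scrape_each(agent, urls)
  # Keep only the most recent page in the agent's history so that
  # earlier pages can be garbage collected.
  agent.max_history = 1

  urls.each do |url|
    page = agent.get(url)
    yield page
  end
end

# Intended use with the real library (not executed here):
#   require 'rubygems'
#   require 'mechanize'
#   agent = WWW::Mechanize.new   # plain Mechanize.new in newer releases
#   scrape_each(agent, list_of_urls) { |page| puts page.title }
```

Setting the cap once, before the loop starts, should be enough. A value of 1 still leaves the current page in the history, in case anything relies on it.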

Who's this guy?

Aaron Longwell is Chief Web Craftsman at New Media Logic Corporation in Coeur d' Alene, Idaho. As a professional software developer for 12 years and a student of public policy, he occasionally has interesting things to say about software, technology, culture and politics.
