How to Stop WWW::Mechanize Memory "Leaks"

Here's a HUGE tip for anyone about to build a screen scraper with WWW::Mechanize and Hpricot. If you intend to parse hundreds or thousands of pages (and you probably do, or you wouldn't be screen scraping), you had better set max_history to a very low number (1, for example).

If you don't, Mechanize keeps every page you visit in its history, and over a long crawl that quickly adds up to many megabytes of memory.
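To put a number on that, here is a toy model of the effect. It is only a sketch: `BoundedHistory` is a hypothetical stand-in written for this example, not Mechanize's own history class, and the 50 KB page size and 1,000-page crawl are made-up figures.

```ruby
# Hypothetical stand-in for a browser history; NOT Mechanize's actual class.
class BoundedHistory
  attr_reader :pages

  # max_size = nil means "remember everything".
  def initialize(max_size = nil)
    @max_size = max_size
    @pages = []
  end

  # Append a page, then drop the oldest entries beyond the cap.
  def push(page)
    @pages << page
    @pages.shift while @max_size && @pages.size > @max_size
    self
  end

  # Total bytes of page bodies still referenced by the history.
  def bytes_held
    @pages.sum(&:bytesize)
  end
end

fake_page = "x" * 50_000          # assumed page size: 50 KB
unbounded = BoundedHistory.new    # no cap
capped    = BoundedHistory.new(1) # cap of one page

1_000.times do
  unbounded.push(fake_page.dup)
  capped.push(fake_page.dup)
end

puts unbounded.bytes_held # 50000000
puts capped.bytes_held    # 50000
```

With these assumed numbers, the uncapped history is holding on to about 50 MB after 1,000 pages, while the capped one never holds more than a single page.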

Who's this guy?

Aaron Longwell is Chief Web Craftsman at New Media Logic Corporation in Coeur d'Alene, Idaho. As a professional software developer for 12 years and a student of public policy, he occasionally has interesting things to say about software, technology, culture and politics.
