[Tfug] finding linked pages

Glen Pfeiffer glen at thepfeiffers.net
Thu May 1 09:14:15 MST 2008


On 04/30/2008 11:08 PM, christopher wrote:
> Hi everyone. Is there a way to search for linked pages
> on a website?
> 
A web crawler will do it. On Debian, searching the package descriptions turns up two:

#aptitude search '~dcrawl' | grep web
p   harvestman       - a very flexible web crawler application   
p   webcheck         - website link and structure checker  


#aptitude show harvestman
...
Description: a very flexible web crawler application
 HarvestMan can be used to download files from websites, 
 according to a number of user-specified rules. The latest 
 version of HarvestMan supports as much as 60 plus customization 
 options. HarvestMan is a console (command-line) application. 
 
 Homepage: http://harvestman.freezope.org/

#aptitude show webcheck
...
Description: website link and structure checker
 webcheck is a website checking tool for webmasters. It crawls a 
 given website and generates a number of reports in the form of 
 html pages. It is easy to use and generates simple, clear and 
 readable reports. 
 
 Features of webcheck include: 
 * support for http, https, ftp and file schemes 
 * view the structure of a site 
 * track down broken links 
 * find potentially outdated and new pages 
 * list links pointing to external sites 
 * can run without user intervention
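
If you only need the links from a page or two and would rather not install
anything, the core step of any crawler (pulling the link targets out of a
page) can be sketched with the Python standard library. This is a minimal
sketch, not how harvestman or webcheck work internally; the names
LinkCollector and extract_links are made up for illustration, and the
example.com URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL.

    Hypothetical helper for illustration; not part of any package above.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are considered; images, scripts etc. are ignored.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from one page of HTML, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Placeholder HTML and URL, just to show the call.
    sample = '<a href="/about.html">About</a> <a href="http://example.org/">Ext</a>'
    for link in extract_links(sample, "http://example.com/index.html"):
        print(link)
```

To cover a whole site you would fetch each page (for example with
urllib.request.urlopen), feed it to extract_links, and queue any link on the
same host that you have not visited yet. That bookkeeping is what the
packaged crawlers above already handle for you.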

-- 
Glen