Difference between revisions of "Open Source Crawlers"

From Wiki History Database
Line 19: Line 19:
 
[http://bithack.se/methabot/ Methabot] is a speed-optimized web crawler and command-line utility written in the C programming language and released under a 2-clause BSD License. It features an extensive configuration system and a module system, and supports targeted crawling over the local filesystem, HTTP, or FTP.
 
[[Nutch]] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [[Lucene]] text indexing package.
+
[http://en.wikipedia.org/wiki/Nutch Nutch] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [http://en.wikipedia.org/wiki/Lucene Lucene] text indexing package.
  
 
[http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/webbase-pages.html#Spider WebVac] is a crawler used by the [http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ Stanford WebBase Project].
 
Revision as of 11:18, 31 October 2007