Difference between revisions of "Open Source Crawlers"

From Wiki History Database
 
Line 19:
 
[http://bithack.se/methabot/ Methabot] is a speed-optimized web crawler and command-line utility written in C and released under a 2-clause BSD License. It features an extensive configuration system and a module system, and supports targeted crawling through the local filesystem, HTTP, or FTP.
  
[http://en.wikipedia.org/wiki/Nutch Nutch] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [http://en.wikipedia.org/wiki/Lucene Lucene] text indexing package.
+
http://en.wikipedia.org/wiki/Nutch [[Nutch]] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [http://en.wikipedia.org/wiki/Lucene Lucene] text indexing package.
  
 
[http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/webbase-pages.html#Spider WebVac] is a crawler used by the [http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ Stanford WebBase Project].
 
Line 42:
 
[http://www.agentkernel.com/ Agent Kernel] is a Java framework for schedule, thread, and storage management when crawling.
  
[[Category:FOSS]]
+
==Links==
 +
*http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
 +
[[Category:This Site]]

Latest revision as of 12:56, 11 January 2009