Difference between revisions of "Open Source Crawlers"
From Wiki History Database
NewUserName
Line 19:
[http://bithack.se/methabot/ Methabot] is a speed-optimized web crawler and command-line utility written in the C programming language and released under a 2-clause BSD License. It features an extensive configuration system and a module system, and supports targeted crawling through the local filesystem, HTTP, or FTP.
− [[Nutch]] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [
+ [http://en.wikipedia.org/wiki/Nutch Nutch] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [http://en.wikipedia.org/wiki/Lucene Lucene] text indexing package.
[http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/webbase-pages.html#Spider WebVac] is a crawler used by the [http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ Stanford WebBase Project].
Line 42:
[http://www.agentkernel.com/ Agent Kernel] is a Java framework for scheduling, thread, and storage management when crawling.
− [[Category:
+ ==Links==
+ *http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
+ [[Category:This Site]]