Difference between revisions of "Open Source Crawlers"
From Wiki History Database
NewUserName (Talk | contribs) |
NewUserName (Talk | contribs) |
Line 19:
[http://bithack.se/methabot/ Methabot] is a speed-optimized web crawler and command-line utility written in the C programming language and released under a 2-clause BSD License. It features an extensive configuration system and a module system, and supports targeted crawling over the local filesystem, HTTP, or FTP.
+ [http://en.wikipedia.org/wiki/Nutch Nutch] is a crawler written in Java and released under an Apache License. It can be used in conjunction with the [http://en.wikipedia.org/wiki/Lucene Lucene] text-indexing package.
[http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/webbase-pages.html#Spider WebVac] is a crawler used by the [http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ Stanford WebBase Project].