Nutch is Doug Cutting's effort to build an open source search engine that uses Lucene Java for its search and indexing components. The fetcher ("robot" or "web crawler") was written from scratch specifically for this project. Nutch has a highly modular architecture that lets developers create plugins for media-type parsing, data retrieval, querying, and clustering. To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project also implemented a MapReduce facility and a distributed file system. These two facilities have since been spun out into their own subproject, Hadoop.

==Links==
*http://en.wikipedia.org/wiki/Nutch

[[Category:Blog Linked Pages]]