Nutch is an Apache licensed open source search engine project that I have been keeping an eye on for a while. One thing that makes this project especially compelling is that the author of the (fabulous) Lucene search library Doug Cutting is also a principle designer and implementer of Nutch. You can grab the source code using subversion:
svn co http://svn.apache.org/repos/asf/lucene/nutch/
Nutch now contains two new modules: the Nutch Distributed File System (patterned after the Google File System) and a Java version of MapReduce (patterned after Google’s MapReduce). So far, I have only been looking at the source code (no builds and playing with it yet!) but this stuff looks really good. Anyone want to start a search engine company?