Peter Norvig's recent talk at UC Berkeley discussed how the effects of large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. Well worth an hour to listen to.
There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.