
Sunday, July 04, 2010

Reading two good books on using MapReduce algorithms for large-scale text processing

I have a fair amount of experience with Hadoop, but little experience with associated tools like Pig and Mahout. I can spend more time with Pig in my local sandbox, but I wanted more formal help getting up to speed with Mahout and general MapReduce application programming. I purchased the MEAP for Mahout in Action and have been reading new chapters as they become available. The authors (especially Robin Anil) have been very helpful on the online forum for the book, and I have found the material to be useful and interesting.

Another book I bought was delivered yesterday morning: Data-Intensive Text Processing with MapReduce. I have only read the first few chapters, but so far the book has been very interesting and informative.

I have done Hadoop-based work for about half of the customers I have had in the last year and a half, and I believe that knowing how to horizontally scale out machine learning and text analytics applications has become a must-have skill.