Hadoop: The Definitive Guide
Author: Tom White

Publisher: O'Reilly
ISBN: 9789350237564
Pages: 704
Apache Hadoop is an open-source software framework for storing and processing large-scale data on clusters of commodity hardware. It is a top-level Apache project, used and contributed to by people all over the world.

Hadoop: The Definitive Guide is one of a kind and covers all the major projects in the Apache Hadoop field. The book is ideal for programmers who have to deal with datasets of various sizes. It is also suitable for administrators who wish to set up and run Hadoop clusters.

Hadoop: The Definitive Guide also has detailed instructions for installing Apache Hadoop, along with information about Cloudera’s distribution, which includes Apache Hadoop.

This book helps readers recognize the snags that are encountered most often. It explains how to design, build, and administer a dedicated Hadoop cluster. The advanced features available for writing MapReduce programs for real-world processes are discussed. The uses of Sqoop, ZooKeeper, and the Pig query language are also covered.

The book's chapters include The Hadoop Distributed Filesystem, Developing a MapReduce Application, MapReduce Types and Formats, and How MapReduce Works. It helps readers build and maintain scalable distributed systems in a reliable way, and shows how to analyze datasets using Hive.

The problem-solving abilities of Hadoop are demonstrated in Hadoop: The Definitive Guide through various case studies. In addition, the third edition is up to date with the recent changes to Hadoop. It includes new material on the MapReduce API and MapReduce 2. YARN, the flexible execution model, is also covered in the book.