Hadoop backup and recovery solutions : learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems / Gaurav Barot, Chintan Mehta, Amij Patel.

Barot, Gaurav, author.
1st edition
Birmingham, England ; Mumbai, [India] : Packt Publishing, 2015.
Community experience distilled.
1 online resource (206 p.)
Apache Hadoop.
Electronic data processing -- Distributed processing.
Electronic books.
System Details:
text file
If you are a Hadoop administrator and you want to get a good grounding in how to back up large amounts of data and manage Hadoop clusters, then this book is for you.
Cover; Copyright; Credits; About the Authors; About the Reviewers; Table of Contents; Preface; Chapter 1: Knowing Hadoop and Clustering Basics; Understanding the need for Hadoop; Apache Hive; Apache Pig; Apache HBase; Apache HCatalog; Understanding HDFS design; Getting familiar with HDFS daemons; Scenario 1 – writing data to the HDFS cluster; Scenario 2 – reading data from the HDFS cluster; Understanding the basics of Hadoop cluster; Summary
Chapter 2: Understanding Hadoop Backup and Recovery Needs; Understanding the backup and recovery philosophies; Replication of data using DistCp; Updating and overwriting using DistCp; The backup philosophy; Changes since the last backup; The rate of new data arrival; The size of the cluster; Priority of the datasets; Selecting the datasets or parts of datasets; The timelines of data backups; Reducing the window of possible data loss; Backup consistency; Avoiding invalid backups; The recovery philosophy
Knowing the necessity of backing up Hadoop; Determining backup areas – what should I back up?; Datasets; Block size – a large file divided into blocks; Replication factor; A list of all the blocks of a file; A list of DataNodes for each block – sorted by distance; The ACK package; The checksums; The number of under-replicated blocks; The secondary NameNode; Active and passive nodes in second generation Hadoop; Hardware failure; Software failure; Applications; Configurations; Is taking backup enough?
Understanding the disaster recovery principle; Knowing a disaster; The need for recovery; Understanding recovery areas; Summary; Chapter 3: Determining Backup Strategies; Knowing the areas to be protected; Understanding the common failure types; Hardware failure; Host failure; Using commodity hardware; Hardware failures may lead to loss of data; User application failure; Software causing task failure; Failure of slow-running tasks; Hadoop's handling of failing tasks; Task failure due to data
Bad data handling – through code; Hadoop's skip mode; Learning a way to define the backup strategy; Why do I need a strategy?; What should be considered in a strategy?; Filesystem check (fsck); Filesystem balancer; Upgrading your Hadoop cluster; Designing network layout and rack awareness; Most important areas to consider while defining a backup strategy; Understanding the need for backing up Hive metadata; What is Hive?; Hive replication; Summary; Chapter 4: Backing Up Hadoop; Data backup in Hadoop; Distributed copy
Architectural approach to backup
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed August 5, 2015).
Mehta, Chintan, author.
Patel, Amij, author.