Data Warehousing in the Age of Big Data.

Author/Creator:
Krishnan, Krish.
Publication:
San Francisco : Elsevier Science & Technology, 2013.
Format/Description:
Book
1 online resource (371 pages)
Series:
The Morgan Kaufmann Series on Business Intelligence Ser.
Subjects:
Data warehousing.
Big data.
Form/Genre:
Electronic books.
Summary:
Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data-ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory. Ultimately, this book will help you navigate through the complex layers of Big Data and data warehousing while providing you with information on how to effectively think about using all these technologies and the architectures to design the next-generation data warehouse. Learn how to leverage Big Data by effectively integrating it into your data warehouse. Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies. Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements.
Contents:
Front Cover
Data Warehousing in the Age of Big Data
Copyright Page
Contents
Acknowledgments
About the Author
Introduction
Part 1: Big Data
Part 2: The Data Warehousing
Part 3: Building the Big Data - Data Warehouse
Appendixes
Companion website
1 BIG DATA
1 Introduction to Big Data
Introduction
Big Data
Defining Big Data
Why Big Data and why now?
Big Data example
Social Media posts
Survey data analysis
Survey data
Weather data
Twitter data
Integration and analysis
Additional data types
Summary
Further reading
2 Working with Big Data
Introduction
Data explosion
Data volume
Machine data
Application log
Clickstream logs
External or third-party data
Emails
Contracts
Geographic information systems and geo-spatial data
Example: Funshots, Inc.
Data velocity
Amazon, Facebook, Yahoo, and Google
Sensor data
Mobile networks
Social media
Data variety
Summary
3 Big Data Processing Architectures
Introduction
Data processing revisited
Data processing techniques
Data processing infrastructure challenges
Storage
Transportation
Processing
Speed or throughput
Shared-everything and shared-nothing architectures
Shared-everything architecture
Shared-nothing architecture
OLTP versus data warehousing
Big Data processing
Infrastructure explained
Data processing explained
Telco Big Data study
Infrastructure
Data processing
4 Introducing Big Data Technologies
Introduction
Distributed data processing
Big Data processing requirements
Technologies for Big Data processing
Google file system
Hadoop
Hadoop core components
HDFS
HDFS architecture
NameNode
DataNodes
Image
Journal
Checkpoint
HDFS startup
Block allocation and storage in HDFS
HDFS client
Replication and recovery
Communication and management
Heartbeats
CheckpointNode and BackupNode
CheckpointNode
BackupNode
File system snapshots
JobTracker and TaskTracker
MapReduce
MapReduce programming model
MapReduce program design
MapReduce implementation architecture
MapReduce job processing and management
MapReduce limitations (Version 1, Hadoop MapReduce)
MapReduce v2 (YARN)
YARN scalability
Comparison between MapReduce v1 and v2
SQL/MapReduce
Zookeeper
Zookeeper features
Locks and processing
Failure and recovery
Pig
Programming with Pig Latin
Pig data types
Running Pig programs
Pig program flow
Common Pig command
HBase
HBase architecture
HBase components
Write-ahead log
Hive
Hive architecture
Infrastructure
Execution: how does Hive process queries?
Hive data types
Hive query language (HiveQL)
Chukwa
Flume
Oozie
HCatalog
Sqoop
Sqoop1
Sqoop2
Hadoop summary
NoSQL
CAP theorem
Key-value pair: Voldemort
Column family store: Cassandra
Data model
Data partitioning
Data sorting
Consistency management
Write consistency
Read consistency
Specifying client consistency levels
Built-in consistency repair features
Cassandra ring architecture
Data placement
Data partitioning
Peer-to-Peer: simple scalability
Gossip protocol: node management
Document database: Riak
Graph databases
NoSQL summary
Textual ETL processing
Further reading
5 Big Data Driving Business Value
Introduction
Case study 1: Sensor data
Summary
Vestas
Overview
Producing electricity from wind
Turning climate into capital
Tackling Big Data challenges
Maintaining energy efficiency in its data center
Case study 2: Streaming data
Summary
Surveillance and security: TerraEchos
The need
The solution
The benefit
Advanced fiber optics combine with real-time streaming data
Solution components
Extending the security perimeter creates a strategic advantage
Correlating sensor data delivers a zero false-positive rate
Case study 3: The right prescription: improving patient outcomes with Big Data analytics
Summary
Business objective
Challenges
Overview: giving practitioners new insights to guide patient care
Challenges: blending traditional data warehouse ecosystems with Big Data
Solution: getting ready for Big Data analytics
Results: eliminating the "Data Trap"
Why Aster?
About Aurora
Case study 4: University of Ontario Institute of Technology: leveraging key data to provide proactive patient care
Summary
Overview
Business benefits
Making better use of the data resource
Smarter healthcare
Solution components
Merging human knowledge and technology
Broadening the impact of Artemis
Case study 5: Microsoft SQL Server customer solution
Customer profile
Solution spotlight
Business needs
Solution
Benefits
Speed efficiency and cut costs
Increases insight and advantage
Facilitates innovation
Case study 6: Customer-centric data integration
Overview
Solution design
Enabling a better cross-sell and upsell opportunity
Example
Summary
2 THE DATA WAREHOUSING
6 Data Warehousing Revisited
Introduction
Traditional data warehousing, or data warehousing 1.0
Data architecture
Infrastructure
Pitfalls of data warehousing
Performance
Scalability
Architecture approaches to building a data warehouse
Pros and cons of information factory approach
Pros and cons of datamart BUS architecture approach
Data warehouse 2.0
Overview of Inmon's DW 2.0
Overview of DSS 2.0
Summary
Further reading
7 Reengineering the Data Warehouse
Introduction
Enterprise data warehouse platform
Transactional systems
Operational data store
Staging area
Data warehouse
Datamarts
Analytical databases
Issues with the data warehouse
Choices for reengineering the data warehouse
Replatforming
Platform engineering
Data engineering
Modernizing the data warehouse
Case study of data warehouse modernization
Current-state analysis
Recommendations
Business benefits of modernization
The appliance selection process
Request For Information/Request For Proposal (RFI/RFP)
Vendor information
Product information
Scorecard
Proof of concept process
Program roadmap
Modernization ROI
Additional benefits
Summary
8 Workload Management in the Data Warehouse
Introduction
Current state
Defining workloads
Understanding workloads
Data warehouse outbound
End-user application
Data outbound to users
Data inbound from users
Datamarts
Data outbound to users
Data inbound from users
Analytical databases
Data warehouse inbound
Data warehouse processing overheads
Query classification
Wide/Wide
Wide/Narrow
Narrow/Wide
Narrow/Narrow
Unstructured/semi-structured data
ETL and CDC workloads
Measurement
Current system design limitations
New workloads and Big Data
Big Data workloads
Technology choices
Summary
9 New Technologies Applied to Data Warehousing
Introduction
Data warehouse challenges revisited
Data loading
Availability
Data volumes
Storage performance
Query performance
Data transport
Data warehouse appliance
Appliance architecture
Data distribution in the appliance
Key best practices for deploying a data warehouse appliance
Big Data appliances
Cloud computing
Infrastructure as a service
Platform as a service
Software as a service
Cloud infrastructure
Benefits of cloud computing for data warehouse
Issues facing cloud computing for data warehouse
Data virtualization
What is data virtualization?
Increasing business intelligence performance
Workload distribution
Implementing a data virtualization program
Pitfalls to avoid when using data virtualization
In-memory technologies
Benefits of in-memory architectures
Summary
Further reading
3 BUILDING THE BIG DATA - DATA WAREHOUSE
10 Integration of Big Data and Data Warehousing
Introduction
Components of the new data warehouse
Data layer
Algorithms
Technology layer
Integration strategies
Data-driven integration
Data classification
Architecture
Workload
Analytics
Physical component integration and architecture
Data loading
Data availability
Data volumes
Storage performance
Operational costs
External data integration
Hadoop & RDBMS
Big Data appliances
Data virtualization
Semantic framework
Lexical processing
Clustering
Semantic knowledge processing
Information extraction
Visualization
Summary
11 Data-Driven Architecture for Big Data
Introduction
Metadata
Technical metadata
Business metadata
Contextual metadata
Process design-level metadata
Program-level metadata
Infrastructure metadata
Core business metadata
Operational metadata
Business intelligence metadata
Master data management
Processing data in the data warehouse
Processing complexity of Big Data
Processing limitations
Processing Big Data
Gather stage
Analysis stage
Process stage
Context processing
Metadata, master data, and semantic linkage
Types of probabilistic links
Notes:
Description based on publisher supplied metadata and other sources.
Local notes:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2021. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Other format:
Print version: Krishnan, Krish. Data Warehousing in the Age of Big Data
ISBN:
9780124059207
9780124058910
OCLC:
843860813