Big Data and Social Science : A Practical Guide to Methods and Tools.

Author/Creator:
Foster, Ian.
Edition:
1st ed.
Publication:
Philadelphia, PA : CRC Press LLC, 2016.
Series:
Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Ser.
Format/Description:
Book
1 online resource (377 pages)
Subjects:
Social sciences -- Data processing.
Form/Genre:
Electronic books.
Contents:
Cover
Half Title
Title
Copyright
Contents
Preface
Editors
Contributors
1: Introduction
1.1: Why this book?
1.2: Defining big data and its value
1.3: Social science, inference, and big data
1.4: Social science, data quality, and big data
1.5: New tools for new data
1.6: The book's "use case"
1.7: The structure of the book
1.7.1: Part I: Capture and curation
1.7.2: Part II: Modeling and analysis
1.7.3: Part III: Inference and ethics
1.8: Resources
I: Capture and Curation
2: Working with Web Data and APIs
2.1: Introduction
2.2: Scraping information from the web
2.2.1: Obtaining data from the HHMI website
2.2.2: Limits of scraping
2.3: New data in the research enterprise
2.4: A functional view
2.4.1: Relevant APIs and resources
2.4.2: RESTful APIs, returned data, and Python wrappers
2.5: Programming against an API
2.6: Using the ORCID API via a wrapper
2.7: Quality, scope, and management
2.8: Integrating data from multiple sources
2.8.1: The Lagotto API
2.8.2: Working with a corpus
2.9: Working with the graph of relationships
2.9.1: Citation links between articles
2.9.2: Categories, sources, and connections
2.9.3: Data availability and completeness
2.9.4: The value of sparse dynamic data
2.10: Bringing it together: Tracking pathways to impact
2.10.1: Network analysis approaches
2.10.2: Future prospects and new data sources
2.11: Summary
2.12: Resources
2.13: Acknowledgements and copyright
3: Record Linkage
3.1: Motivation
3.2: Introduction to record linkage
3.3: Preprocessing data for record linkage
3.4: Indexing and blocking
3.5: Matching
3.5.1: Rule-based approaches
3.5.2: Probabilistic record linkage
3.5.3: Machine learning approaches to linking
3.5.4: Disambiguating networks
3.6: Classification
3.6.1: Thresholds
3.6.2: One-to-one links
3.7: Record linkage and data protection
3.8: Summary
3.9: Resources
4: Databases
4.1: Introduction
4.2: DBMS: When and why
4.3: Relational DBMSs
4.3.1: Structured Query Language (SQL)
4.3.2: Manipulating and querying data
4.3.3: Schema design and definition
4.3.4: Loading data
4.3.5: Transactions and crash recovery
4.3.6: Database optimizations
4.3.7: Caveats and challenges
4.4: Linking DBMSs and other tools
4.5: NoSQL databases
4.5.1: Challenges of scale: The CAP theorem
4.5.2: NoSQL and key-value stores
4.5.3: Other NoSQL databases
4.6: Spatial databases
4.7: Which database to use?
4.7.1: Relational DBMSs
4.7.2: NoSQL DBMSs
4.8: Summary
4.9: Resources
5: Programming with Big Data
5.1: Introduction
5.2: The MapReduce programming model
5.3: Apache Hadoop MapReduce
5.3.1: The Hadoop Distributed File System
5.3.2: Hadoop: Bringing compute to the data
5.3.3: Hardware provisioning
5.3.4: Programming language support
5.3.5: Fault tolerance
5.3.6: Limitations of Hadoop
5.4: Apache Spark
5.5: Summary
5.6: Resources
II: Modeling and Analysis
6: Machine Learning
6.1: Introduction
6.2: What is machine learning?
6.3: The machine learning process
6.4: Problem formulation: Mapping a problem to machine learning methods
6.5: Methods
6.5.1: Unsupervised learning methods
6.5.2: Supervised learning
6.6: Evaluation
6.6.1: Methodology
6.6.2: Metrics
6.7: Practical tips
6.7.1: Features
6.7.2: Machine learning pipeline
6.7.3: Multiclass problems
6.7.4: Skewed or imbalanced classification problems
6.8: How can social scientists benefit from machine learning?
6.9: Advanced topics
6.10: Summary
6.11: Resources
7: Text Analysis
7.1: Understanding what people write
7.2: How to analyze text
7.2.1: Processing text data
7.2.2: How much is a word worth?
7.3: Approaches and applications
7.3.1: Topic modeling
7.3.1.1: Inferring topics from raw text
7.3.1.2: Applications of topic models
7.3.2: Information retrieval and clustering
7.3.3: Other approaches
7.4: Evaluation
7.5: Text analysis tools
7.6: Summary
7.7: Resources
8: Networks: The Basics
8.1: Introduction
8.2: Network data
8.2.1: Forms of network data
8.2.2: Inducing one-mode networks from two-mode data
8.3: Network measures
8.3.1: Reachability
8.3.2: Whole-network measures
8.4: Comparing collaboration networks
8.5: Summary
8.6: Resources
III: Inference and Ethics
9: Information Visualization
9.1: Introduction
9.2: Developing effective visualizations
9.3: A data-by-tasks taxonomy
9.3.1: Multivariate data
9.3.2: Spatial data
9.3.3: Temporal data
9.3.4: Hierarchical data
9.3.5: Network data
9.3.6: Text data
9.4: Challenges
9.4.1: Scalability
9.4.2: Evaluation
9.4.3: Visual impairment
9.4.4: Visual literacy
9.5: Summary
9.6: Resources
10: Errors and Inference
10.1: Introduction
10.2: The total error paradigm
10.2.1: The traditional model
10.2.2: Extending the framework to big data
10.3: Illustrations of errors in big data
10.4: Errors in big data analytics
10.4.1: Errors resulting from volume, velocity, and variety, assuming perfect veracity
10.4.2: Errors resulting from lack of veracity
10.4.2.1: Variable and correlated error
10.4.2.2: Models for categorical data
10.4.2.3: Misclassification and rare classes
10.4.2.4: Correlation analysis
10.4.2.5: Regression analysis
10.5: Some methods for mitigating, detecting, and compensating for errors
10.6: Summary
10.7: Resources
11: Privacy and Confidentiality
11.1: Introduction
11.2: Why is access important?
11.3: Providing access
11.4: The new challenges
11.5: Legal and ethical framework
11.6: Summary
11.7: Resources
12: Workbooks
12.1: Introduction
12.2: Environment
12.2.1: Running workbooks locally
12.2.2: Central workbook server
12.3: Workbook details
12.3.1: Social Media and APIs
12.3.2: Database basics
12.3.3: Data Linkage
12.3.4: Machine Learning
12.3.5: Text Analysis
12.3.6: Networks
12.3.7: Visualization
12.4: Resources
Bibliography
Index.
Notes:
Description based on publisher supplied metadata and other sources.
Local notes:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2021. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Contributor:
Ghani, Rayid.
Jarmin, Ron S.
Kreuter, Frauke.
Lane, Julia.
Other format:
Print version: Foster, Ian. Big Data and Social Science
ISBN:
9781498751414
9781498751407
OCLC:
959149123