Linguistic structure prediction [electronic resource] / Noah A. Smith.

Author/Creator:
Smith, Noah A.
Publication:
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2011.
Series:
Synthesis digital library of engineering and computer science.
Synthesis lectures on human language technologies, 1947-4059 ; # 13.
Format/Description:
Book
1 electronic text (xx, 248 p.) : ill., digital file
Subjects:
Natural language processing (Computer science).
Computational linguistics.
Linguistic analysis (Linguistics) -- Data processing.
System Details:
Mode of access: World Wide Web.
Summary:
A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure, seeking a unified view across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we aim to bridge the gap between the two fields. The focus is on decoding (i.e., carrying out linguistic structure prediction) and on supervised and unsupervised learning of models that predict discrete structures as outputs. We also survey natural language processing problems to which these methods are applied, and we address related topics in probabilistic inference, optimization, and experimental methodology.
Contents:
Preface
Acknowledgments

1. Representations and linguistic data
Sequential prediction
Sequence segmentation
Word classes and sequence labeling
Morphological disambiguation
Chunking
Syntax
Semantics
Coreference resolution
Sentiment analysis
Discourse
Alignment
Text-to-text transformations
Types
Why linguistic structure is a moving target
Conclusion

2. Decoding: making predictions
Definitions
Five views of decoding
Probabilistic graphical models
Polytopes
Parsing with grammars
Graphs and hypergraphs
Weighted logic programs
Dynamic programming
Shortest or minimum-cost path
Semirings
DP as logical deduction
Solving DPs
Approximate search
Reranking and coarse-to-fine decoding
Specialized graph algorithms
Bipartite matchings
Spanning trees
Maximum flow and minimum cut
Conclusion

3. Learning structure from annotated data
Annotated data
Generic formulation of learning
Generative models
Decoding rule
Multinomial-based models
Hidden Markov models
Probabilistic context-free grammars
Other generative multinomial-based models
Maximum likelihood estimation by counting
Maximum a posteriori estimation
Alternative parameterization: log-linear models
Comments
Conditional models
Globally normalized conditional log-linear models
Logistic regression
Conditional random fields
Feature choice
Maximum likelihood estimation
Maximum a posteriori estimation
Pseudolikelihood
Toward discriminative learning
Large margin methods
Binary classification
Perceptron
Multi-class support vector machines
Structural SVM
Optimization
Discussion
Conclusion

4. Learning structure from incomplete data
Unsupervised generative models
Expectation maximization
Word clustering
Hard and soft k-means
The structured case
Hidden Markov models
EM iterations improve likelihood
Extensions and improvements
Log-linear EM
Contrastive estimation
Bayesian unsupervised learning
Empirical Bayes
Latent Dirichlet allocation
EM in the empirical Bayesian setting
Inference
Nonparametric Bayesian methods
Discussion
Hidden variable learning
Generative models with hidden variables
Conditional log-linear models with hidden variables
Large margin methods with hidden variables
Conclusion

5. Beyond decoding: inference
Partition functions: summing over Y
Summing by dynamic programming
Other summing algorithms
Feature expectations
Reverse DPs
Another interpretation of reverse values
From reverse values to expectations
Deriving the reverse DP
Non-DP expectations
Minimum Bayes risk decoding
Cost-augmented decoding
Decoding with hidden variables
Conclusion

A. Numerical optimization
A.1. The hill-climbing analogy
A.2. Coordinate ascent
A.3. Gradient ascent
Subgradient methods
Stochastic gradient ascent
A.4. Conjugate gradient and quasi-Newton methods
Conjugate gradient
Newton's method
Limited memory BFGS
A.5. "Aggressive" online learners
A.6. Improved iterative scaling

B. Experimentation
B.1. Methodology
Training, development, and testing
Cross-validation
Comparison without replication
Oracles and upper bounds
B.2. Hypothesis testing and related topics
Terminology
Standard error
Beyond standard error for sample means
Confidence intervals
Hypothesis tests
Closing notes

C. Maximum entropy

D. Locally normalized conditional models
Probabilistic finite-state automata
Maximum entropy Markov models
Directional effects
Comparison to globally normalized models
Decoding
Theory vs. practice

Bibliography
Author's biography
Index.
Notes:
Part of: Synthesis digital library of engineering and computer science.
Series from website.
Includes bibliographical references (p. 209-240) and index.
ISBN:
9781608454068 (electronic bk.)
9781608454051 (pbk.)
Publisher Number:
10.2200/S00361ED1V01Y201105HLT013 doi
Access Restriction:
Restricted for use by site license.