Distant Speech Recognition.

Other records:
Woelfel, Matthias.
1st ed.
New York : John Wiley & Sons, Incorporated, 2009.
1 online resource (595 pages)
Automatic speech recognition.
Pattern perception.
Electronic books.
A complete overview of distant automatic speech recognition.

The performance of conventional automatic speech recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. While traditional ASR systems underperform for speech captured with far-field sensors, a number of novel techniques within the recognition system, as well as techniques developed in other areas of signal processing, can mitigate the deleterious effects of noise and reverberation and separate speech from overlapping speakers. Distant Speech Recognition presents a contemporary and comprehensive description of both the theoretical abstractions and the practical issues inherent in the distant ASR problem.

Key features:
- Covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it
- Provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems
- Gives relevant background information in acoustics and filter techniques
- Explains the extraction and enhancement of classification-relevant speech features
- Describes maximum likelihood as well as discriminative parameter estimation, and maximum likelihood normalization techniques
- Discusses the use of multi-microphone configurations for speaker tracking and channel combination
- Presents several applications of the methods and technologies described in this book
- Accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems

This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students, in the fields of speech technology, signal processing, acoustics, statistics and artificial intelligence.
Distant Speech Recognition
1 Introduction
1.1 Research and Applications in Academia and Industry
1.1.1 Intelligent Home and Office Environments
1.1.2 Humanoid Robots
1.1.3 Automobiles
1.1.4 Speech-to-Speech Translation
1.2 Challenges in Distant Speech Recognition
1.3 System Evaluation
1.4 Fields of Speech Recognition
1.5 Robust Perception
1.5.1 A Priori Knowledge
1.5.2 Phonemic Restoration and Reliability
1.5.3 Binaural Masking Level Difference
1.5.4 Multi-Microphone Processing
1.5.5 Multiple Sources by Different Modalities
1.6 Organizations, Conferences and Journals
1.7 Useful Tools, Data Resources and Evaluation Campaigns
1.8 Organization of this Book
1.9 Principal Symbols used Throughout the Book
1.10 Units used Throughout the Book
2 Acoustics
2.1 Physical Aspect of Sound
2.1.1 Propagation of Sound in Air
2.1.2 The Speed of Sound
2.1.3 Wave Equation and Velocity Potential
2.1.4 Sound Intensity and Acoustic Power
2.1.5 Reflections of Plane Waves
2.1.6 Reflections of Spherical Waves
2.2 Speech Signals
2.2.1 Production of Speech Signals
2.2.2 Units of Speech Signals
2.2.3 Categories of Speech Signals
2.2.4 Statistics of Speech Signals
2.3 Human Perception of Sound
2.3.1 Phase Insensitivity
2.3.2 Frequency Range and Spectral Resolution
2.3.3 Hearing Level and Speech Intensity
2.3.4 Masking
2.3.5 Binaural Hearing
2.3.6 Weighting Curves
2.3.7 Virtual Pitch
2.4 The Acoustic Environment
2.4.1 Ambient Noise
2.4.2 Echo and Reverberation
2.4.3 Signal-to-Noise and Signal-to-Reverberation Ratio
2.4.4 An Illustrative Comparison between Close and Distant Recordings
2.4.5 The Influence of the Acoustic Environment on Speech Production
2.4.6 Coloration
2.4.7 Head Orientation and Sound Radiation
2.4.8 Expected Distances between the Speaker and the Microphone
2.5 Recording Techniques and Sensor Configuration
2.5.1 Mechanical Classification of Microphones
2.5.2 Electrical Classification of Microphones
2.5.3 Characteristics of Microphones
2.5.4 Microphone Placement
2.5.5 Microphone Amplification
2.6 Summary and Further Reading
2.7 Principal Symbols
3 Signal Processing and Filtering Techniques
3.1 Linear Time-Invariant Systems
3.1.1 Time Domain Analysis
3.1.2 Frequency Domain Analysis
3.1.3 z-Transform Analysis
3.1.4 Sampling Continuous-Time Signals
3.2 The Discrete Fourier Transform
3.2.1 Realizing LTI Systems with the DFT
3.2.2 Overlap-Add Method
3.2.3 Overlap-Save Method
3.3 Short-Time Fourier Transform
3.4 Summary and Further Reading
3.5 Principal Symbols
4 Bayesian Filters
4.1 Sequential Bayesian Estimation
4.2 Wiener Filter
4.2.1 Time Domain Solution
4.2.2 Frequency Domain Solution
4.3 Kalman Filter and Variations
4.3.1 Kalman Filter
4.3.2 Extended Kalman Filter
4.3.3 Iterated Extended Kalman Filter
4.3.4 Numerical Stability
4.3.5 Probabilistic Data Association Filter
4.3.6 Joint Probabilistic Data Association Filter
4.4 Particle Filters
4.4.1 Approximation of Probabilistic Expectations
4.4.2 Sequential Monte Carlo Methods
4.5 Summary and Further Reading
4.6 Principal Symbols
5 Speech Feature Extraction
5.1 Short-Time Spectral Analysis
5.1.1 Speech Windowing and Segmentation
5.1.2 The Spectrogram
5.2 Perceptually Motivated Representation
5.2.1 Spectral Shaping
5.2.2 Bark and Mel Filter Banks
5.2.3 Warping by Bilinear Transform - Time vs Frequency Domain
5.3 Spectral Estimation and Analysis
5.3.1 Power Spectrum
5.3.2 Spectral Envelopes
5.3.3 LP Envelope
5.3.4 MVDR Envelope
5.3.5 Perceptual LP Envelope
5.3.6 Warped LP Envelope
5.3.7 Warped MVDR Envelope
5.3.8 Warped-Twice MVDR Envelope
5.3.9 Comparison of Spectral Estimates
5.3.10 Scaling of Envelopes
5.4 Cepstral Processing
5.4.1 Definition and Characteristics of Cepstral Sequences
5.4.2 Homomorphic Deconvolution
5.4.3 Calculating Cepstral Coefficients
5.5 Comparison between Mel Frequency, Perceptual LP and Warped MVDR Cepstral Coefficient Front-Ends
5.6 Feature Augmentation
5.6.1 Static and Dynamic Parameter Augmentation
5.6.2 Feature Augmentation by Temporal Patterns
5.7 Feature Reduction
5.7.1 Class Separability Measures
5.7.2 Linear Discriminant Analysis
5.7.3 Heteroscedastic Linear Discriminant Analysis
5.8 Feature-Space Minimum Phone Error
5.9 Summary and Further Reading
5.10 Principal Symbols
6 Speech Feature Enhancement
6.1 Noise and Reverberation in Various Domains
6.1.1 Frequency Domain
6.1.2 Power Spectral Domain
6.1.3 Logarithmic Spectral Domain
6.1.4 Cepstral Domain
6.2 Two Principal Approaches
6.3 Direct Speech Feature Enhancement
6.3.1 Wiener Filter
6.3.2 Gaussian and Super-Gaussian MMSE Estimation
6.3.3 RASTA Processing
6.3.4 Stereo-Based Piecewise Linear Compensation for Environments
6.4 Schematics of Indirect Speech Feature Enhancement
6.5 Estimating Additive Distortion
6.5.1 Voice Activity Detection-Based Noise Estimation
6.5.2 Minimum Statistics Noise Estimation
6.5.3 Histogram- and Quantile-Based Methods
6.5.4 Estimation of the a Posteriori and a Priori Signal-to-Noise Ratio
6.6 Estimating Convolutional Distortion
6.6.1 Estimating Channel Effects
6.6.2 Measuring the Impulse Response
6.6.3 Harmful Effects of Room Acoustics
6.6.4 Problem in Speech Dereverberation
6.6.5 Estimating Late Reflections
6.7 Distortion Evolution
6.7.1 Random Walk
6.7.2 Semi-random Walk by Polyak Averaging and Feedback
6.7.3 Predicted Walk by Static Autoregressive Processes
6.7.4 Predicted Walk by Dynamic Autoregressive Processes
6.7.5 Predicted Walk by Extended Kalman Filters
6.7.6 Correlated Prediction Error Covariance Matrix
6.8 Distortion Evaluation
6.8.1 Likelihood Evaluation
6.8.2 Likelihood Evaluation by a Switching Model
6.8.3 Incorporating the Phase
6.9 Distortion Compensation
6.9.1 Spectral Subtraction
6.9.2 Compensating for Channel Effects
6.9.3 Distortion Compensation for Distributions
6.10 Joint Estimation of Additive and Convolutional Distortions
6.11 Observation Uncertainty
6.12 Summary and Further Reading
6.13 Principal Symbols
7 Search: Finding the Best Word Hypothesis
7.1 Fundamentals of Search
7.1.1 Hidden Markov Model: Definition
7.1.2 Viterbi Algorithm
7.1.3 Word Lattice Generation
7.1.4 Word Trace Decoding
7.2 Weighted Finite-State Transducers
7.2.1 Definitions
7.2.2 Weighted Composition
7.2.3 Weighted Determinization
7.2.4 Weight Pushing
7.2.5 Weighted Minimization
7.2.6 Epsilon Removal
7.3 Knowledge Sources
7.3.1 Grammar
7.3.2 Pronunciation Lexicon
7.3.3 Hidden Markov Model
7.3.4 Context Dependency Decision Tree
7.3.5 Combination of Knowledge Sources
7.3.6 Reducing Search Graph Size
7.4 Fast On-the-Fly Composition
7.5 Word and Lattice Combination
7.6 Summary and Further Reading
7.7 Principal Symbols
8 Hidden Markov Model Parameter Estimation
8.1 Maximum Likelihood Parameter Estimation
8.1.1 Gaussian Mixture Model Parameter Estimation
8.1.2 Forward-Backward Estimation
8.1.3 Speaker-Adapted Training
8.1.4 Optimal Regression Class Estimation
8.1.5 Viterbi and Label Training
8.2 Discriminative Parameter Estimation
8.2.1 Conventional Maximum Mutual Information Estimation Formulae
8.2.2 Maximum Mutual Information Training on Word Lattices
8.2.3 Minimum Word and Phone Error Training
8.2.4 Maximum Mutual Information Speaker-Adapted Training
8.3 Summary and Further Reading
8.4 Principal Symbols
9 Feature and Model Transformation
9.1 Feature Transformation Techniques
9.1.1 Vocal Tract Length Normalization
9.1.2 Constrained Maximum Likelihood Linear Regression
9.2 Model Transformation Techniques
9.2.1 Maximum Likelihood Linear Regression
9.2.2 All-Pass Transform Adaptation
9.3 Acoustic Model Combination
9.3.1 Combination of Gaussians in the Logarithmic Domain
9.4 Summary and Further Reading
9.5 Principal Symbols
10 Speaker Localization and Tracking
10.1 Conventional Techniques
10.1.1 Spherical Intersection Estimator
10.1.2 Spherical Interpolation Estimator
10.1.3 Linear Intersection Estimator
10.2 Speaker Tracking with the Kalman Filter
10.2.1 Implementation Based on the Cholesky Decomposition
10.3 Tracking Multiple Simultaneous Speakers
10.4 Audio-Visual Speaker Tracking
10.5 Speaker Tracking with the Particle Filter
10.5.1 Localization Based on Time Delays of Arrival
10.5.2 Localization Based on Steered Beamformer Response Power
10.6 Summary and Further Reading
10.7 Principal Symbols
11 Digital Filter Banks
11.1 Uniform Discrete Fourier Transform Filter Banks
11.2 Polyphase Implementation
11.3 Decimation and Expansion
11.4 Noble Identities
11.5 Nyquist(M) Filters
11.6 Filter Bank Design of De Haan et al.
11.6.1 Analysis Prototype Design
11.6.2 Synthesis Prototype Design
11.7 Filter Bank Design with the Nyquist(M) Criterion
11.7.1 Analysis Prototype Design
11.7.2 Synthesis Prototype Design
Description based on publisher supplied metadata and other sources.
Local notes:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2021. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
McDonough, John.
Other format:
Print version: Woelfel, Matthias. Distant Speech Recognition