(NOTE: Each chapter ends with Historical Perspective and Further
Reading.)
1. Introduction.
Motivations. Spoken Language System Architecture. Book
Organization. Target Audiences.
I. FUNDAMENTAL THEORY.
2. Spoken Language Structure.
Sound and Human Speech Systems. Phonetics and Phonology. Syllables
and Words. Syntax and Semantics.
3. Probability, Statistics, and Information Theory.
Probability Theory. Estimation Theory. Significance Testing.
Information Theory.
4. Pattern Recognition.
Bayes' Decision Theory. How to Construct Classifiers.
Discriminative Training. Unsupervised Estimation Methods.
Classification and Regression Trees.
II. SPEECH PROCESSING.
5. Digital Signal Processing.
Digital Signals and Systems. Continuous-Frequency Transforms.
Discrete-Frequency Transforms. Digital Filters and Windows. Digital
Processing of Analog Signals. Multirate Signal Processing.
Filterbanks. Stochastic Processes.
6. Speech Signal Representations.
Short-Time Fourier Analysis. Acoustical Model of Speech Production.
Linear Predictive Coding. Cepstral Processing. Perceptually
Motivated Representations. Formant Frequencies. The Role of
Pitch.
7. Speech Coding.
Speech Coders Attributes. Scalar Waveform Coders. Scalar Frequency
Domain Coders. Code Excited Linear Prediction (CELP). Low-Bit Rate
Speech Coders.
III. SPEECH RECOGNITION.
8. Hidden Markov Models.
The Markov Chain. Definition of the Hidden Markov Model. Continuous
and Semicontinuous HMMs. Practical Issues in Using HMMs. HMM
Limitations.
9. Acoustic Modeling.
Variability in the Speech Signal. How to Measure Speech Recognition
Errors. Signal Processing—Extracting Features. Phonetic
Modeling—Selecting Appropriate Units. Acoustic Modeling—Scoring
Acoustic Features. Adaptive Techniques—Minimizing Mismatches.
Confidence Measures: Measuring the Reliability. Other Techniques.
Case Study: Whisper.
10. Environmental Robustness.
The Acoustical Environment. Acoustical Transducers. Adaptive Echo
Cancellation (AEC). Multimicrophone Speech Enhancement. Environment
Compensation Preprocessing. Environment Model Adaptation. Modeling
Nonstationary Noise.
11. Language Modeling.
Formal Language Theory. Stochastic Language Models. Complexity
Measure of Language Models. N-Gram Smoothing. Adaptive Language
Models. Practical Issues.
12. Basic Search Algorithms.
Basic Search Algorithms. Search Algorithms for Speech Recognition.
Language Model States. Time-Synchronous Viterbi Beam Search. Stack
Decoding (A* Search).
13. Large-Vocabulary Search Algorithms.
Efficient Manipulation of a Tree Lexicon. Other Efficient Search
Techniques. N-Best and Multipass Search Strategies.
Search-Algorithm Evaluation. Case Study—Microsoft Whisper.
IV. TEXT-TO-SPEECH SYSTEMS.
14. Text and Phonetic Analysis.
Modules and Data Flow. Lexicon. Document Structure Detection. Text
Normalization. Linguistic Analysis. Homograph Disambiguation.
Morphological Analysis. Letter-to-Sound Conversion. Evaluation.
Case Study: Festival.
15. Prosody.
The Role of Understanding. Prosody Generation Schematic. Speaking
Style. Symbolic Prosody. Duration Assignment. Pitch Generation.
Prosody Markup Languages. Prosody Evaluation.
16. Speech Synthesis.
Attributes of Speech Synthesis. Formant Speech Synthesis.
Concatenative Speech Synthesis. Prosodic Modification of Speech.
Source-Filter Models for Prosody Modification. Evaluation of TTS
Systems.
V. SPOKEN LANGUAGE SYSTEMS.
17. Spoken Language Understanding.
Written vs. Spoken Languages. Dialog Structure. Semantic
Representation. Sentence Interpretation. Discourse Analysis. Dialog
Management. Response Generation and Rendition. Evaluation. Case
Study—Dr. Who.
18. Applications and User Interfaces.
Application Architecture. Typical Applications. Speech Interface
Design. Internationalization. Case Study—MIPAD.
Index.
XUEDONG HUANG is founder and head of the Speech Technology Group at
Microsoft Research. He received his Ph.D. from the University of
Edinburgh. He is an IEEE Fellow.
ALEX ACERO and HSIAO-WUEN HON are Senior Researchers at Microsoft
Research and Senior Members of IEEE. Both received doctorates from
Carnegie Mellon University.
Foreword by Dr. Raj Reddy, Carnegie Mellon University