
525.747 - Speech Processing Course Homepage

Instructor Information

John Carmody

Email: carmody@erols.com
Work Phone: (443) 255-6444
Home Phone: (410) 750-1019

Dr. Carmody has thirty years of industry experience developing human speech processing systems and communications systems. He has been described as a "blue collar" PhD because he has worked on every aspect, from conceptual design through deployment and support.

Course Information

Course Description

This course emphasizes processing of the human speech waveform, primarily using digital techniques. Theory of speech production and speech perception as related to signals in the time and frequency domains is covered, as well as the measurement of model parameters, the short-time Fourier spectrum, and linear predictor coefficients. Speech coding, speech recognition, speech synthesis, and speaker identification are discussed. Application areas include telephony, Internet VoIP, and man-machine interfaces. Considerations for embedded realization of speech processing systems will be covered as time permits. Several application-oriented software projects will be required.

Prerequisites

525.427 Digital Signal Processing and 525.414 Probability and Stochastic Processes for Engineers. A background in linear algebra and MATLAB is helpful.

Course Goal

The course is intended to provide the student with an understanding of the various aspects of Human Language Technology and of the signal processing tools available to exploit them.

The course covers the physiology of human hearing and speech production, the modeling of those functions, speech coding, and speech recognition.

Course Objectives

  • One of the primary objectives of the course is to gain enough understanding to enable the student to pursue further research and development, including independent reading and contributions in the field.
  • Specific topics include:

    1. The physics of acoustic waves,
    2. Components of human speech,
    3. Human hearing and speech systems and mathematical models,
    4. Speech analysis and synthesis,
    5. Transmission of speech and audio data,
    6. Quality measures and coders,
    7. Aids to the handicapped,
    8. Speech recognition overview,
    9. Large vocabulary continuous speech recognition,
    10. Speaker identification and verification,
    11. Language identification,
    12. Text to speech, including prosody,
    13. Speech enhancement, and
    14. Speaker normalization.
  • Applications and connections to Natural Language Processing will also be covered.

When This Course is Typically Offered

This course is offered in the spring at the Dorsey Center.

Syllabus

Topics Covered

  • Models of the human speech production system
  • Models of the human hearing system
  • Speech signal analysis tools
  • Speech coding
  • Recognition
  • Enhancement of speech
  • Prosody
  • Text to speech
  • Vector quantization
  • Tools for spectral analysis

Student Assessment Criteria

Homework 5%
Midterm [at home] 15%
Individual project 30%
Group project 25%
Final [at home] 25%

Homework includes analytic solutions, MATLAB problems, literature reviews, and topic research.

Computer and Technical Requirements

A basic knowledge of MATLAB is helpful, but not required.

Participation Expectations

Students are expected to participate in class and be responsible for all material.

Textbooks

Textbook information for this course is available online through the MBS Direct Virtual Bookstore.

Course Notes

There are notes for this course.

Final Words from the Instructor

Speech processing is a passion of mine. I hope I can show the student how diverse and exciting this field is.

As in all my courses, I try to provide skills that can be used in many areas, producing better engineers.

(Last Modified: 01-17-2009 at 8:49:00 PM)