dg.o Web

Project Profile:  
ITR Large: MALACH: Multilingual Access to Large spoken ArChives

Grant Number: 122466

  • Description: Continuing grant
  • Associated Project:
  • Award Date:
  • Award Period: 2001-10-01 to 2005-09-30
  • Amount: $6,823,719.00

Primary Investigator:
Sam Gustman

Government Domain:

Primary Institution:
Survivors of the Shoah Visual History Foundation

Project Home Page:

Latest Project Highlight:

The project proposes to advance the state of the art in automatic speech recognition by detecting emotional and highly accented speech, as well as differences based on age and gender, and then optimizing the acoustic model for those conditions. It will apply long-term adaptation techniques to improve robustness and will implement an innovative second-pass decoding technique that improves accuracy by using side information such as thesaurus terms and human-prepared summaries.

The techniques to be developed will dramatically improve the efficiency of professional catalogers by leveraging automatic segmentation to suggest topic boundaries in interviews, using domain-tuned classification algorithms to recommend thesaurus terms, and providing automated tools to support the generation of event timelines. Volunteers will help assign metadata and provide transcripts. Efforts will be made to automate the transfer of capabilities originally developed for English to other languages.

Access to multilingual materials will be provided by combining knowledge-based and corpus-based techniques to extend existing thesauri to new languages and by supporting cross-language searching of manually prepared segment-level summaries and automatic speech recognition transcripts. Each component will be evaluated, and user studies will be conducted to measure the overall impact on support for cataloging, search, and exploration.

The project is expected to have significant impact, both through improved access to our cultural heritage and through the application of the techniques to other important problems. The spoken material used in this project will be the 116,000 hours held by the Survivors of the Shoah Visual History Foundation, a collection of already digitized video recordings of great historical importance.