dg.o Web

DG Research Team Is Developing a Digital Interpreter
Machine Learning Is Used to Build Ontologies for Translating Data Across International Borders
For the DGRC


  Eduard Hovy
  Avigdor Gal

Project home: QUALEG

The 2004 Digital Government Conference, held last spring, kicked off with a one-day workshop that brought together American Digital Government researchers and their European Union counterparts. The group produced a white paper, "International Collaboration in eGovernment Research," which laid out principles and goals for joint projects.

One of the first examples of those principles in action is a collaboration between Eduard Hovy of USC's Information Sciences Institute (ISI) and Professor Avigdor Gal of the Technion-Israel Institute of Technology. They will provide part of the technological infrastructure for an EU project called "QUALEG," an acronym for "Quality of Service and Legitimacy in eGovernment," a multi-institution initiative to improve the delivery of government services.

QUALEG is a classic example of the challenges Europe faces as it creates a multinational federated system. The countries directly involved are Poland, France, and Germany. That means integrating documents written in at least three languages, most likely four, since many European documents also have English versions. It means combining different legacy software systems as well. It is a where-do-you-start nightmare, as tricky as trying to refold a map in a convertible on the Autobahn.

One of the first steps in the process is the creation of an ontology, a cross-lingual thesaurus that provides a thematic structure for all the terms one is likely to encounter. Ontologies need to be semantically sensitive: "bank" near "river" means something quite different from "bank" near "teller."
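To make the "bank" example concrete, here is a minimal sketch of context-sensitive sense selection, assuming a hand-built sense inventory in which each sense is described by a few cue words. The inventory, the sense labels, and the `pick_sense` helper are hypothetical illustrations, not part of QUALEG or the ISI software; the technique is a simplified overlap (Lesk-style) heuristic.

```python
from typing import Optional

# Hypothetical toy sense inventory: each sense of an ambiguous word is
# described by cue words that tend to appear near it.
SENSES = {
    "bank": {
        "financial_institution": {"teller", "loan", "deposit", "account", "money"},
        "river_edge": {"river", "water", "shore", "fishing", "mud"},
    },
}


def pick_sense(word: str, context: list) -> Optional[str]:
    """Return the sense of `word` whose cue words overlap most with `context`.

    Returns None if the word is not in the inventory or no cue word occurs.
    """
    senses = SENSES.get(word.lower())
    if not senses:
        return None
    ctx = {w.lower() for w in context}
    # Score each sense by how many of its cue words appear near the target word.
    scores = {sense: len(cues & ctx) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(pick_sense("bank", ["walked", "along", "the", "river"]))   # river_edge
print(pick_sense("bank", ["the", "teller", "counted", "money"]))  # financial_institution
```

A real ontology would attach such senses to concepts shared across languages; the point here is only that the surrounding words decide which concept a term maps to.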

"In the context of Digital Government, ontologies play an increasingly important role, as database metadata schemas, terminology standardization structures and the foundation for interfaces between applications," says Hovy. "Yet the complexity and cost of building ontologies remains a daunting challenge."

For QUALEG, Hovy's group will provide a "starter" ontology that a machine can learn from. The ISI portion will thus become the essential backend piece underlying the integration of the QUALEG databases. Gal's group is creating software they have dubbed "OntoBuilder." Essentially, it is a heuristic front end: through a simple interface, users can input terms that help a topic-specific ontology learn and become more accurate.

OntoBuilder will be used by city administrators in three cities in Poland, Germany, and France to create their local ontologies. The automated ISI software should help boost their productivity, Hovy says.

Because this project involves at least four languages, some with declensional endings and gender-based forms, creating an ontology would seem not only daunting but nearly impossible. Yet given his group's experience in machine translation, Hovy says this is one challenge that is under control: "There are superficial differences in looking at word forms, so you need software that gets to the root forms of each word. In machine translation, it's mostly a solved problem."
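Getting "to the root forms of each word" can be illustrated, very crudely, with longest-suffix stripping. The sketch below assumes small, hypothetical suffix lists for German, French, and Polish and a minimum stem length of three characters; production machine-translation systems use full morphological analyzers, and a stripper this naive will over- or under-stem many real words.

```python
# Hypothetical, deliberately tiny suffix lists; real systems rely on
# morphological lexicons rather than hand-picked endings.
SUFFIXES = {
    "de": ["en", "er", "es", "e", "n", "s"],
    "fr": ["es", "e", "s"],
    "pl": ["ach", "ami", "iem", "owi", "em", "u", "i", "y", "a", "e"],
}
MIN_STEM = 3  # never strip a word down to fewer than three characters


def crude_stem(word: str, lang: str) -> str:
    """Strip the longest matching suffix for `lang`, keeping at least MIN_STEM chars."""
    w = word.lower()
    # Try longer suffixes first so, e.g., Polish "ach" wins over "a".
    for suffix in sorted(SUFFIXES.get(lang, []), key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


# Inflected forms collapse to a shared (not necessarily dictionary) root.
print(crude_stem("Banken", "de"), crude_stem("Bank", "de"))       # bank bank
print(crude_stem("banques", "fr"), crude_stem("banque", "fr"))    # banqu banqu
print(crude_stem("bankach", "pl"), crude_stem("bankiem", "pl"))   # bank bank
```

The roots only need to be consistent, not real words: "banqu" is fine as long as every inflection of "banque" maps to it.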

But there is an additional hurdle, says Hovy: "You have overlapping meanings: this word may have four meanings in this language and seven meanings in that one." However, Hovy says, results improve markedly if you're working in a particular domain. If you know that all your terms are drawn from finance, the software will more easily translate "bank" as a financial institution and not a geographic entity.
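The benefit of knowing the domain can be shown with a small sketch, assuming a hypothetical lexicon keyed by (English word, domain). Without a domain, "bank" yields every candidate translation; restricting to finance prunes the riverbank sense. The lexicon entries are ordinary dictionary translations chosen for illustration; the data structure and `candidate_translations` helper are not taken from QUALEG.

```python
from typing import Optional

# Hypothetical lexicon: each (word, domain) sense carries its translations.
LEXICON = {
    ("bank", "finance"): {"de": "Bank", "fr": "banque", "pl": "bank"},
    ("bank", "geography"): {"de": "Ufer", "fr": "rive", "pl": "brzeg"},
}


def candidate_translations(word: str, target_lang: str,
                           domain: Optional[str] = None) -> list:
    """List possible translations of `word`; a known domain filters out other senses."""
    word = word.lower()
    return [
        translations[target_lang]
        for (w, d), translations in LEXICON.items()
        if w == word and (domain is None or d == domain)
    ]


print(candidate_translations("bank", "de"))             # ['Bank', 'Ufer']
print(candidate_translations("bank", "de", "finance"))  # ['Bank']
print(candidate_translations("bank", "fr", "finance"))  # ['banque']
```

With four languages and many senses per word, the candidate lists grow quickly; a domain restriction is one cheap way to cut them down before any statistical disambiguation.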

Since ontologies are such a necessary and time-consuming precursor to any integration project, researchers are investigating "semi-automated methods to build ontologies to align and merge existing ones for new purposes and to adapt old ones for re-use," says Hovy, who hopes this will become a prototype project for just such semi-automation technology.
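One way to picture "semi-automated" alignment is a matcher that proposes candidate correspondences between two ontologies and leaves the final decision to a person. The sketch below assumes concepts are identified only by their labels and scores pairs by token-set Jaccard similarity after dropping a few stopwords; the labels, stopword list, threshold, and `propose_alignments` helper are all hypothetical, and this is a generic lexical matcher, not the ISI or OntoBuilder method.

```python
# Hypothetical stopwords ignored when comparing concept labels.
STOPWORDS = {"of", "on", "the", "for", "and"}


def tokens(label: str) -> set:
    """Lowercase a concept label and split it into content words."""
    return {t for t in label.lower().split() if t not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_alignments(onto_a: list, onto_b: list, threshold: float = 0.5) -> list:
    """Return (label_a, label_b, score) candidates, best first, for human review."""
    proposals = []
    for a in onto_a:
        for b in onto_b:
            score = jaccard(tokens(a), tokens(b))
            if score >= threshold:
                proposals.append((a, b, score))
    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


city_a = ["Building Permit", "Property Tax", "Birth Certificate", "Parking Permit"]
city_b = ["building permit", "tax on property", "certificate of birth", "parking fine"]
for a, b, s in propose_alignments(city_a, city_b):
    print(f"{a!r} <-> {b!r} ({s:.2f})")
```

"Parking Permit" shares only one word with each of its nearest neighbors (score 1/3), so it falls below the threshold and is left for a human to map by hand, which is the division of labor the quote describes.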

The researchers will use a "clustering" approach. In clustering, the machine is taught to recognize relationships from which words occur together: glass, paperweight, perfume bottle versus glass, windshield, rearview mirror. One starts with "topic signatures": a set of words for each category, each weighted by its relevance to that topic. Using this system, accuracy can go as high as 75%, depending on how clear and distinct the topics are, says Hovy.
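The glass example can be turned into a small seeded-clustering sketch. It assumes documents are bags of words and that each topic starts from a hand-weighted signature; each round assigns every document to the topic whose signature it scores highest against, then re-estimates each signature from word frequencies among its members. The seed weights, documents, and function names are hypothetical, and this k-means-style loop is a generic stand-in, not ISI's algorithm; it does not reproduce the 75% figure.

```python
from collections import Counter


def score(doc_words: set, signature: dict) -> float:
    """Sum the signature weights of the words present in a document."""
    return sum(signature.get(w, 0.0) for w in doc_words)


def cluster(docs: list, signatures: dict, iterations: int = 3):
    """Seeded clustering: docs are word lists, signatures map topic -> {word: weight}."""
    sigs = {t: dict(s) for t, s in signatures.items()}
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each document joins its best-matching topic
        # (ties, including all-zero scores, go to the first topic listed).
        assignment = {
            i: max(sigs, key=lambda t: score(set(doc), sigs[t]))
            for i, doc in enumerate(docs)
        }
        # Update step: re-weight each signature by relative word frequency
        # among its member documents; a topic with no members keeps its weights.
        for topic in sigs:
            counts = Counter(
                w for i, doc in enumerate(docs) if assignment[i] == topic for w in set(doc)
            )
            total = sum(counts.values())
            if total:
                sigs[topic] = {w: c / total for w, c in counts.items()}
    return assignment, sigs


seeds = {
    "glassware": {"glass": 0.2, "paperweight": 1.0, "perfume": 1.0, "bottle": 1.0},
    "car_parts": {"glass": 0.2, "windshield": 1.0, "rearview": 1.0, "mirror": 1.0},
}
docs = [
    ["glass", "paperweight", "desk"],
    ["perfume", "bottle", "glass", "gift"],
    ["windshield", "glass", "crack"],
    ["rearview", "mirror", "car", "glass"],
]
assignment, learned = cluster(docs, seeds)
print(assignment)  # {0: 'glassware', 1: 'glassware', 2: 'car_parts', 3: 'car_parts'}
```

The shared word "glass" gets a low seed weight so that the distinctive words decide membership; after the update step it appears in both learned signatures, while new words such as "desk" and "crack" are picked up from the documents. A system that needs human help, as Hovy describes, would let a person correct assignments between rounds.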

The process still requires some human intervention, at least in its initial stages. "Clustering has been used since the 1960s," says Hovy, "but it's never been very accurate by itself. If you give it additional help, it can be."

At this phase, QUALEG is a pilot program which, it is hoped, can be extended throughout Europe. But Hovy's and Gal's work is equally ambitious: it could lay the groundwork for an "ontology service bureau," where those charged with database creation could have that initial painstaking step performed for them. Even if not fully automated, such a system could eliminate much of the individual time and effort that goes into ontology creation.