Turing Center at University of Washington

Investigating problems at the crossroads of natural language processing, data mining, Web search, and the Semantic Web.

Previous Events

2006

Turing Talk

Siegfried Frey (Institute for Cognition and Communication, Duisburg-Essen)
Knowledge-Based Animation of Virtual Humans
January 6 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303
Recent publications: "Dreamteam Mensch-Computer--Wer bestimmt die Spielregeln?", "Die Macht des Bildes"

Abstract

In this talk I shall demonstrate a prototype implementation of a new approach to computer animation and explain how it can overcome limits and inefficiencies of current methods.

The Skriptanimation approach is based on European research over three decades into the perceptual and emotional effects of human body movement in political, artistic, and commercial communication, and on the Lorenz-Tinbergen model of "subtractive dummy experimentation". Skriptanimation provides an alternative to the keyframe and motion-capturing paradigms, both of which suffer from high cost, theoretical poverty, and the consequent backfiring of attempts to make animated characters realistic and appealing.

The research underlying Skriptanimation has revealed that particular dimensions of body movement, including lateral head flexion and head-torso angular offset, exert surprisingly massive impacts on person perception, swamping the effects of static facial features. These results, and our conviction that a far richer body of usable knowledge awaits us if we can perform animation experiments efficiently, have motivated our development of the Skriptanimation multidimensional motion representation language and associated software. Our system permits the translation of video-recorded natural human movement into a data protocol that provides the empirical basis for dimension-specific analysis, experimentation, and editing via a graphic interface with instant comparative previews.

Time will be provided for hands-on demonstrations by members of the audience.

Turing Talk

David Goss-Grubbs (Linguistics)
Panlingual Representation of Tense and Aspect
January 27 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

Across the languages of the world, there are numerous grammatical strategies for expressing tense (the location of an event in time) and aspect (ways of viewing the internal temporal constituency of an event). However, these can all be mapped to a common set of constructions in a semantic representation. In this talk I present a method for representing tense and aspect in a language-independent way in Minimal Recursion Semantics (MRS), an approach for representing the semantics of natural language sentences intended for use with computational grammars. I argue for an analysis where grammatical tense contributes a restriction on a free time variable. Aspect is handled by using operators to relate event variables to various of their aspectual counterparts. Both tense and aspect supply motivation for formally dividing events into types corresponding to Vendler's action types.

Colloquium

Oren Etzioni (Turing Center)
All I Really Need to Know I Learned from Google
February 2 (Thursday), 3:30 pm - 4:30 pm, Electrical Engineering 105

Abstract

For the last quarter century (measured in person years), the KnowItAll project has focused on accumulating massive amounts of information from the Web by utilizing domain-independent, fully automated techniques. If successful, this effort has the potential to address the long-standing "Knowledge Acquisition Bottleneck" in Artificial Intelligence, and enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. This talk will describe the evolution of the KnowItAll family of systems (or is it Intelligent Design?) culminating in TextRunner--a program that has extracted over 1,000,000,000 "facts" from the Web without breaking a sweat.

See and hear this lecture.

Turing Talk

Shou-de Lin (Computer Science, Linguistics, and Information Sciences Institute, Southern California, and Los Alamos National Laboratory)
Unsupervised Frameworks for Machine Discovery in Complex Semantic Networks and Natural Languages
February 3 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

In this talk I will report my work on interesting instance discovery and ancient script decipherment to demonstrate how one can deal with challenging problems for which there are no training examples.

I will introduce an unsupervised discovery system that can identify and explain interesting or suspicious individuals from a huge, heterogeneous social network or semantic network. Such a system has a wide range of applications in homeland security, fraud detection, or general scientific discovery. For example, an interesting individual involved in a victim's social network could be a suspect, or a type of food that is interestingly connected to a given disease might be a potential cause or cure for it. In this talk I will describe a hybrid approach integrating techniques from symbolic and statistical AI, KDD, and NLP together with the experimental results.

If time permits, I will briefly introduce an unsupervised framework for deciphering the linear writing order of a two-dimensional ancient hieroglyphic script, Luwian. I will illustrate how we handle this problem by exploiting Shannon's noisy-channel model, EM (expectation maximization), and a forward-backward search algorithm. I will also describe several strategies to verify such unsupervised discovery systems and report the results.

Symposium

Eighth UW/Microsoft Quarterly Symposium in Computational Linguistics
February 3 (Friday), 3:00 pm - 5:00 pm, Microsoft Building 113, 1st floor, room 1021

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; the MSR NLP and Speech and Natural Language (SNL) groups; and UW alumni at Microsoft. The symposium consists of two invited talks, followed by an informal reception and an opportunity for demos and/or poster exhibitions.

To facilitate the creation of badges, the symposium is compiling a list of attendees in advance. If you plan to attend, please email jparvi at u.washington.edu, with "UW/MS Symposium" in the subject line.

William Lewis (Linguistics, UW)
Locating, Recognizing, and Converting Interlinear Text on the Web

The linguistics community is confronted by a quandary: while languages are going extinct at an alarming rate, the digital revolution has provided technologies that make it possible to record, analyze, and disseminate language data more efficiently than ever before. Traditional recording media of notebooks and analog recorders have become things of the past, being replaced by sophisticated digital recorders, laptops, and PDAs. However, despite cheap storage and the communicative efficiencies provided by the World Wide Web, much language data has been recorded and analyzed without a great deal of attention to the need for its preservation or dissemination. Language data is often housed locally and recorded in proprietary data formats that may themselves go extinct.

The Open Language Archives Community (Bird and Simons 2003) was formed to develop standards for data encoding to ensure that language data can be used over the long term, and promotes metadata standards to ensure that the data can be located and used. OLAC's problem now is to find resource providers willing to make the extra effort to encode their resources and data in a way that makes them available to search. Critical mass cannot be achieved until a sufficient number of archives, institutions, and individuals make their data available to OLAC, yet at the same time, many will not take the extra steps necessary to reformulate their data until they recognize that the extra effort will be worth it.

A number of efforts are currently underway to leverage existing language data as it is currently made available on the Web and make it searchable, whether the data is embedded in journal articles, posted as part of language learning materials for revitalization efforts, housed in language archives hidden behind idiosyncratic user interfaces, provided in Word or text formats, etc. The ODIN (the Online Database of INterlinear text) project was started as a pilot to test the potential for locating language resources and data by concentrating on commonly used semi-structured data types, particularly those used to encode language data. ODIN’s focus thus far has been on interlinearized text, a format common to the field of linguistics. By scanning online resources and documents for instances of interlinear text and applying both novel and commonly used methods for language identification to the data, ODIN has achieved a fairly high degree of success at both locating language resources (most specifically, linguistic resources) and identifying the languages encoded, becoming the first fully automated OLAC data provider. Beyond locating resources containing language data, the ODIN team has experimented with methods for the automated migration of data encoded in legacy formats to best practice XML, from which more extensive metadata, and significantly greater interoperation, can be obtained.

Manuela Noske (Windows Localization, Microsoft)
Unsupervised Acquisition of Ateso Morphology

This paper reports on experimental work applying the unsupervised learning algorithm known as Linguistica v2.0.4 (Goldsmith 2002) to a corpus of approx. 460,000 alphanumeric tokens of the Eastern Nilotic language Ateso. Linguistica divides the morphological discovery process into a set of heuristics which guide the segmentation process and a Minimum Description Length model (Rissanen 1989, Goldsmith 2001) which evaluates the outcome; it has been tested on mostly non-agglutinating languages so far. The results of Linguistica are compared with a manual analysis of 3 samples of 100 words each that are randomly chosen from the Ateso corpus. A quantitative evaluation of Linguistica in terms of recall and precision is supplemented by a qualitative evaluation and a summary which describes what difficulties are encountered in running this experiment on an under-documented language.
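
As a rough illustration of the quantitative evaluation mentioned above, the sketch below scores automatic segmentations against manual ones at the level of morpheme boundaries. The abstract does not say which unit was scored, so boundary-level scoring is an assumption, as are the function names and the `+` separator. The toy words are invented English examples, not Ateso data.

```python
def boundaries(segmented):
    """Return the character offsets at which a '+'-segmented word is cut."""
    cuts, offset = set(), 0
    for morph in segmented.split("+")[:-1]:
        offset += len(morph)
        cuts.add(offset)
    return cuts


def boundary_precision_recall(predicted, gold):
    """Compare predicted and gold segmentations of the same words, pairwise."""
    tp = fp = fn = 0
    for pred_word, gold_word in zip(predicted, gold):
        p, g = boundaries(pred_word), boundaries(gold_word)
        tp += len(p & g)   # boundaries found in both analyses
        fp += len(p - g)   # boundaries only the system proposed
        fn += len(g - p)   # boundaries only the manual analysis has
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data (hypothetical system output vs. hypothetical manual analysis).
predicted = ["walk+ing", "un+do+ing", "cats"]
gold = ["walk+ing", "undo+ing", "cat+s"]
precision, recall = boundary_precision_recall(predicted, gold)
```

Here the system proposes three boundaries, two of which match the manual analysis, and misses one of the three manual boundaries, so both precision and recall come out at 2/3.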

Transportation by bus: From the UW area, you can reach the site in 12 minutes by Route 545 from the Montlake/SR 520 freeway stop, leaving at 2:12 pm and arriving at Overlake Transit Center at 2:24 pm. You can return in 20 minutes from 148th Avenue NE and NE 34th Street by Route 242, leaving at 5:25 pm and arriving at Montlake at 5:45 pm. See the map for details on the bus stops.

Transportation by automobile: Take the 148th Ave NE northbound exit from SR 520. Limited visitor parking can be found in front of both building 112 (from NE 36th St) and building 114 (from NE 31st Circle). Please carpool if possible; the visitor parking is very limited.

Turing Talk

Laurie Poulson (Linguistics)
Evaluation and Incremental Panlingual Resource Development
February 10 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

My research involves the development of a methodology for evaluating a process designed to facilitate the rapid prototyping of broad-coverage knowledge-based grammars. This process solicits typologically-based information about a language and then produces a grammar start for that language that contains the foundation of an HPSG grammar with MRS semantics as well as the core elements of an implementation of specific phenomena. This process is essentially panlingual, theoretically generating an appropriate grammar start, customized with regard to these phenomena, for any natural language. I will present a strategy for evaluating the quality of large numbers of these grammar starts as part of the test-development cycle. The immaturity of the grammars, the continually evolving nature of the process as more phenomena are handled, and the panlingual nature of the output have all presented significant design challenges. Given the panlingual scope of the process, however, the most problematic issues have been how to design an evaluation procedure that is systematic while still tractable and how to develop an appropriate resource of testing data. The proposed evaluation strategy addresses both of these issues and incorporates an incremental method for developing a panlingual resource.

Turing Talk

Bo Pang (Computer Science, Cornell)
A Sentimental Education: Automatic Analysis of Opinions with Graph-Theoretic Formulations
February 13 (Monday), 3:00 pm - 4:00 pm, Paul G. Allen Center for Computer Science and Engineering 403

Abstract

Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". We describe our earlier work, which applies standard text-categorization techniques to this sentiment polarity classification problem, and a novel approach we proposed later, which applies these categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs that incorporate cross-sentence contextual constraints.

We then address the rating-inference problem, where one must determine the reviewer's evaluation with respect to a multi-point scale (e.g., one to five "stars"). We apply a meta-algorithm, based on a metric labeling formulation of the problem, that explicitly exploits relations between classes. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

Portions of this work are joint with Lillian Lee and Shivakumar Vaithyanathan.
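
The minimum-cut idea in the first paragraph of the abstract can be sketched as follows. This is a minimal illustration, not the speakers' implementation: it assumes each sentence has an individual subjectivity score (an integer percentage here) and that neighboring sentences carry an association weight that penalizes giving them different labels. The graph layout, the function names, the Edmonds-Karp solver, and all scores are assumptions made for the example.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Return the nodes on the source side of a minimum s-t cut (Edmonds-Karp)."""
    # residual[u][v] holds the remaining capacity on edge u -> v.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path left: reachable nodes form the source side.
            return set(parents)
        # Find the bottleneck along the shortest augmenting path.
        bottleneck, v = float("inf"), sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push that much flow along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def subjective_sentences(ind_subj, assoc):
    """ind_subj[i]: subjectivity score of sentence i (0-100).
    assoc[(i, j)]: cost of putting sentences i and j in different classes."""
    capacity = {"s": {}, "t": {}}
    for i, score in enumerate(ind_subj):
        capacity["s"][i] = score                         # cut if i is labeled objective
        capacity.setdefault(i, {})["t"] = 100 - score    # cut if i is labeled subjective
    for (i, j), weight in assoc.items():
        capacity[i][j] = weight
        capacity[j][i] = weight
    side = min_cut_source_side(capacity, "s", "t")
    return sorted(n for n in side if n not in ("s", "t"))


# Made-up scores: sentence 2 is weakly objective on its own (40),
# but is strongly associated with its subjective neighbor, sentence 1.
scores = [90, 55, 40, 10]
with_context = subjective_sentences(scores, {(0, 1): 30, (1, 2): 30, (2, 3): 5})
without_context = subjective_sentences(scores, {})
```

With these numbers the context-free labeling keeps sentences 0 and 1, while the cut that accounts for cross-sentence association also keeps sentence 2, because separating it from sentence 1 would cost more than overriding its individual score.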

Seminar

Steven Poltrock and Mark Handel (Mathematics and Computing Technology, Boeing Phantom Works)
Modeling Collaboration: A Case Study
February 17 (Friday), 11:30 am - 12:20 pm, Loew 206

Turing Talk

Steven L. Tanimoto (Computer Science and Engineering)
Human Visual Communication in the Web: Design for Community-Driven Language Evolution
February 17 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

As greater numbers of people from more and more communities throughout the world gain access to the Internet, the probability that any two randomly chosen people share the same language tends to diminish. However, they may have a reason to interact because of a shared interest or medical condition. Visual languages offer one means of enabling communication across language barriers, and they can take advantage of improving technology for transmitting graphics. Past efforts to design and promulgate various international languages have tended to falter, in part because their designers could not foresee all of the challenges users would face in learning and adapting the languages to their own needs. One approach to avoiding this problem is to design a system that supports community-driven evolution of the language. Issues to be considered include language structural complexity, community rating systems such as that developed by Cheng and Vassileva, vocabulary expansion, and integration with other web services such as automatic language translation and search engines. Two simple demonstrations based on collaborations with C. Bernardelli will be used to illustrate issues of language design and integration with web search.

Turing Talk

Massimiliano Ciaramita (Laboratory for Applied Ontology, Italian National Research Council)
Unsupervised Acquisition of Semantic Relations from Text: A Case Study in Molecular Biology
February 22 (Wednesday), 11:30 am - 12:20 pm, Paul G. Allen Center for Computer Science and Engineering 503

Abstract

This work presents an unsupervised method for the acquisition of labelled relations between ontological objects from unstructured text. The ultimate goal of the proposed architecture is that of providing resources to support broad-coverage, information-extraction-based text mining and ontology engineering. The method is based on simple well-studied off-the-shelf natural language processing techniques such as named entity recognition, statistical syntactic parsing, and hypothesis testing for lexical collocations and selectional preference learning. The work focuses in particular on a case study in molecular biology concerning the GENIA (Ohta et al., 2002) corpus and ontology. Relative to this study we analyze and evaluate relevant properties of this approach and discuss directions for future research, particularly with respect to Web-scale text mining and retrieval tasks.

Turing Talk

Oren Etzioni, Stephen Soderland, and Ethan Phelps-Goodman (Turing Center)
Initial Steps towards Corpus-Based Panlingual Translation
February 24 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

Panlingual translation is an exciting and ambitious goal that is at the heart of the Turing Center. In this talk, we will describe our basic approach to the problem, which relies on corpus-based techniques, and discuss in detail two ongoing projects. First, we are investigating the use of a controlled-language interface to improve translation performance. Second, we are utilizing the TextRunner information-extraction system to learn phrase translations from comparable corpora. We will conclude with a discussion of our plans for future work.

Turing Talk

Nicholas Kushmerick (Computer Science and Informatics, University College Dublin)
Email Activity Management: A Machine Learning Approach
February 24 (Friday), 3:30 pm - 4:30 pm, Paul G. Allen Center for Computer Science and Engineering 403

Lecture

Meredith Ringel Morris (Computer Science, Stanford)
Supporting Effective Interaction with Tabletop Groupware
February 28 (Tuesday), 3:30-5:00 pm, Electrical Engineering 105

See and hear this lecture.

Turing Talk

Sharon Oviatt (Center for Human-Computer Communication, Oregon Health & Science University)
Toward Mobile and Adaptive Conversational Interfaces: Modeling Speech Convergence between Humans and Animated Personas
March 3 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

During the past decade, rapid advances in spoken language technology, natural language processing, dialogue modeling, multimodal interfaces, animated character design, and mobile applications all have stimulated interest in a new class of conversational interfaces. Such systems are being designed to support users' performance in a variety of task applications (commercial, medical, educational, in-vehicle), and many have been designed with animated characters that aim to facilitate user performance. However, the development of robust systems that process conversational speech is a challenging problem, largely because users' spoken language can be extremely variable. In this talk, I'll describe research in our lab that has identified a new source of variability in users' spoken language to computers. Basically, people spontaneously and rapidly adapt the basic acoustic-prosodic features of their speech signal to the text-to-speech output they hear from a computer partner. These speech adaptations are delivered dynamically, since users will quickly readapt their speech when communicating with a different computer voice. They also are flexibly bi-directional--for example, users will increase their own speech amplitude and rate when conversing with a computer partner that has louder and faster text-to-speech (TTS) output, and will decrease these features when the TTS is quieter and slower. In fact, an analysis of speakers' amplitude, durational features, and dialogue response latencies confirmed that these adaptations can be substantial in magnitude (10-50%), with the largest adaptations involving utterance pause structure and amplitude. This research underscores the need for new speech and multimodal systems that can adapt to users and their communication context. It also emphasizes the importance of auditory interface design for next-generation mobile systems. Implications are discussed for designing future conversational interfaces that are more reliable, well synchronized, and supportive of user performance.

Lecture

Luis von Ahn (Computer Science, Carnegie Mellon)
Human Computation
March 28 (Tuesday), 3:30-5:00 pm, Electrical Engineering 105

See and hear this lecture.

Lecture

Kalervo Järvelin (Information Studies, Tampere)
Ontology-Based Searching in Unannotated Text Collections
March 29 (Wednesday), 3:30-5:00 pm, Mary Gates 389 (preregistration requested)

Lecture

David Hawking (ICT Centre, Commonwealth Scientific and Industrial Research Organisation, Canberra)
Does Topic Metadata Help with Web Search?
April 5 (Wednesday), 3:30-4:30 pm, Mary Gates 420

Abstract

It has been claimed that topic metadata can be used to improve the accuracy of text searches. Justin Zobel (RMIT) and I tested this claim by examining the contribution of metadata to effective searching within websites published by a university with a strong commitment to and substantial investment in metadata. In this talk I will report experiments we conducted to measure the ability of subject and description metadata to contribute to effectively answering four different types of queries, extracted from the university's official query logs and from the university's site map. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of websites. A follow-up experiment with the websites published in a particular government jurisdiction confirmed our findings. Our experiments show that link anchor text, which can be regarded as metadata created by others, is much more effective in identifying best answers to queries than other textual evidence.

Symposium

Ninth UW/Microsoft Quarterly Symposium in Computational Linguistics
April 7 (Friday), 3:30 pm - 5:00 pm, Mary Gates 241

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Germanics, Electrical Engineering, and Computer Science; the MSR NLP and Speech and Natural Language (SNL) groups; and UW alumni at Microsoft. The symposium consists of two invited talks.

Simon Corston-Oliver and Anthony Aue (Microsoft Research)
Multilingual Dependency Parsing Using Bayes Point Machines

Michael J. Cafarella and Oren Etzioni (Turing Center)
A Search Engine for Natural Language Applications

Turing Talk

Xue Nianwen (Cognitive Science, Colorado)
Strategies for the Syntactic and Semantic Annotation of Chinese Corpora
April 14 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

Corpus-based approaches have been the predominant research paradigm in natural language processing for the past decade or so. In the first part of this talk, I will describe our efforts to build a multi-layered, multi-dimensional Chinese corpus during the past seven years that we hope will provide the fuel for research in Chinese language processing. I will first discuss the syntactic annotation in the Chinese treebank, and move on to the semantic annotation of verbs and their nominalizations in the Chinese propbank. And finally I will touch on some preliminary work we have done on Chinese discourse connectives. A recurring issue in corpus annotation is the competing demands of linguistic (most intuitive and elegant representation) and engineering (high annotation consistency) principles, and I will discuss some tradeoffs that we have made in each of the annotation tasks discussed. In the second part of the talk, I will describe some machine-learning systems that we have developed using this corpus as training and test material. I will present some experimental results on semantic role labeling of Chinese verbs and their nominalizations and discuss some challenges facing Chinese NLP.

Turing Talk

Rich Henry (UnifiedField Associates)
New Tool, dotSUB, Dramatically Eases Film Translation and Subtitling
April 21 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

dotSUB dramatically decreases the effort and cost for translating and subtitling films and video, while dramatically increasing access for both viewers and potential translators. The result will be much greater access to films and videos from all over the world, increasing cross-cultural communication and understanding: "Any Film Any Language."

No special software or training is required. dotSUB provides a simple web browser-based interface where anyone who knows the film's source language and any second language can contribute to the translation into that second language. Translations are entered phrase by phrase. As soon as the "submit" button is hit, those phrases immediately become available as subtitles when the film is watched. Films can literally have subtitles available in 20 languages in the morning and 25 in the afternoon.

In addition to dotSUB's significant impact on access to films from other cultures, it also has great potential as a teaching tool, for example in foreign language education.

Lecture

Steven Greenberg (Silicon Speech; Centre for Applied Hearing Research, Technical University of Denmark)
What are the Essential Cues for Understanding Spoken Language?
July 14 (Friday), 11:00 am - 12:00 noon, Electrical Engineering 403

Turing Talk

Jonathan Pool (Turing Center and Utilika Foundation)
Can Controlled Languages Scale to the Web?
August 4 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

In a multilingual Semantic Web, authors might write in precise, expressive varieties of diverse languages. Do such controlled languages exist? Of 41 candidates, just 4 were (1) designed for multiple domains and genres and (2) documented enough for evaluation. A sample of Web statements on health and human rights revealed limited expressivity or precision in each language. The most expressive one avoided structural ambiguity but allowed semantic ambiguity that could frustrate human and machine comprehension. The possibility of a practical Web-scale controlled language remains undemonstrated but unrefuted.

Turing Talk

Chung-chieh Shan (Computer Science, Rutgers)
Linguistic Side Effects
August 18 (Friday), 1:30 pm - 2:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

How does the meaning of an utterance determine its effects on the discourse and its participants? I study this question using tools from programming-language theory. In particular, how do utterances interact with each other in context?

For example, pronouns in natural languages and variable references in programming languages both retrieve values from the context. In general, computational side effects in programming languages and apparently noncompositional phenomena in natural languages both let expressions access their contexts.

This link stresses the dynamic, operational view on meaning. For example, the notion of evaluation order unifies an unprecedented variety of linguistic generalizations: crossover in anaphora, superiority in questions, and ordering effects in polarity licensing and quantifier scope.

Poster Session and Graduation Event

UW Professional Master's in Computational Linguistics
September 22 (Friday), 6:00 pm - 8:00 pm, Communications 202

Posters

"TreeTran: A Tool for Visual Selection and Testing of Transfer Rules for Machine Translation", David Bullock

"Syntactic Structure by Projecting Interlinear Glossed Text Structure", Dan Jinguji

"ask574: A Web-based Question Answering System Adapted from the CLMA Program's TREC 2006 Entry", Josh Minor

"An Evaluation of the Classification of IGT to Topic Hierarchies", Brian Nisonger

"Finding and Evaluating Structured Bilingual Corpora on the Web", Achim Ruopp

Colloquium and Reception

Emily Bender (Linguistics)
Grammar Engineering for Crosslinguistic Hypothesis Testing
October 13 (Friday), 3:30 pm - 5:00 pm, Law 127

Abstract

In this talk, part of the "fundamental issues" series, I ask what it means to test a syntactic hypothesis and describe how the techniques of grammar engineering can expand the practical empirical base of syntax. Computerized implementations of their grammars allow linguists to more efficiently and effectively test hypotheses, for two reasons: First, languages are made up of many subsystems with complex interactions. Linguists generally focus on just one subsystem at a time, yet the predictions of any particular analysis cannot be calculated independently of the interacting subsystems. With implemented grammars, the computer can track the effects of all aspects of the implementation while the linguist focuses on developing just one. Second, automated application of grammars to test suites and naturally occurring data allows for much more thorough testing of linguistic analyses--against thousands as opposed to tens of examples and including examples not anticipated by the linguist. I describe how current work in the Grammar Matrix project is extending these techniques to cross-linguistic hypothesis testing in computational typology.

Symposium

Tenth UW/Microsoft Quarterly Symposium in Computational Linguistics
October 20 (Friday), 3:00 pm - 5:00 pm, Microsoft Building 113

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science and Engineering; the MSR NLP Group; the Microsoft Natural Language (NLG) and Speech Components groups; and UW alumni at Microsoft. The symposium consists of two invited talks, followed by an informal reception and an opportunity for demos and/or poster exhibitions.

To facilitate the creation of badges, the symposium is compiling a list of attendees in advance. If you plan to attend, please email jparvi at u.washington.edu, with "UW/MS Symposium" in the subject line.

Presentations: Fei Xia (with William Lewis and Dan Jinguji), "Towards Automatic Enrichment and Analysis of Linguistic Data for Low-Density Languages"; Mark Johnson, "Features of Reranking Parsers". See the announcement referenced above for details.

Turing Talk

Peter Clark (Boeing Phantom Works)
The Reading to Learn Project: An Experiment in Controlled Language Processing
October 24 (Tuesday), 2:30 pm - 3:20 pm, Paul G. Allen Center for Computer Science and Engineering 303

Abstract

One of the most important methods by which human beings learn is reading, a task which includes integrating what was read with existing, prior knowledge. While in its full generality, the reading task is still too difficult a capability to be implemented in a computer, significant (if partial) approaches to the task are now feasible. Our goal in this DARPA seedling project was to study issues and develop solutions for this task by working with a reduced version of the problem, namely working with text written in a simplified version of English (a controlled language) rather than full natural language. Our experience and results reveal that even this reduced version of the task is still challenging, and we have uncovered several major insights into this challenge. In particular, our work indicates a need for fairly substantial domain and linguistic knowledge to ensure reliable interpretation, and for a radical revision of traditional knowledge representation structures to support knowledge integration. We describe our work and analysis, present a synthesis and evaluation of our work, and make several recommendations for future work in this area. Our conclusion is that ultimately, to bridge the "knowledge gap", a pipelined approach is inappropriate, and that to address the knowledge requirements for good language understanding an iterative (bootstrapped) approach is the most promising way forward.

Colloquium and Reception

Scott Farrar (University of Arizona)
e-Linguistics: GOLD, ODIN, and Intelligent Search
October 27 (Friday), 3:30 pm - 5:00 pm, Law 138

Abstract

Although no statistic is available, the amount of electronically available linguistic field data seems to be increasing at a phenomenal rate. While the situation opens up enormous opportunities for automated empirical research, it is argued that such a rapid increase in the number of Web resources motivates the need for community consensus to take advantage of the Web as the primary resource for accessing data. Thus, linguistics as a data-driven field is poised either to revolutionize itself by changing the way its research is conducted or to miss the opportunity because of a lack of codified best practices. In this talk, the notion of 'e-Linguistics' is proposed as the practice of using the Internet as the primary means to access and analyze language data with automated tools. e-Linguistics requires intelligent frameworks and tools. The GOLD Community of Practice is one such framework for linking on-line linguistic data to an ontology. The key components of the model include the linguistic data resources themselves and those focused on the knowledge derived from data. Data resources include the ever-increasing amount of linguistic field data and other descriptive language resources being migrated to the Web. The knowledge resources capture generalizations about the data and are anchored in the General Ontology for Linguistic Description (GOLD). It is argued that such a model is in the spirit of the vision for a Semantic Web and, thus, provides a concrete methodology for rendering highly divergent resources semantically interoperable. The focus of this work, then, is not on annotation at the syntactic level, but rather on how annotated Web resources can be linked to an ontology. Furthermore, a methodology is given for creating specific communities of practice within the overall Web infrastructure for linguistics.
Finally, ontology-driven search in the context of the Online Database for Interlinear Text (ODIN) is discussed as a key application of the proposed model and an exemplar of e-Linguistics for the 21st century.

Symposium

Ontologies for Human-Machine and Panlingual Human Communication
November 7 (Tuesday), 1:30 pm - 3:20 pm, Paul G. Allen Center for Computer Science and Engineering 691 (Bill and Melinda Gates Commons)

An informal round-table discussion. Topics include (1) strategies for human compilation and machine learning of ontologies and (2) the potential contributions of ontologies, folksonomies, wordnets, thesauri, classifications, controlled languages, etc. to translation and information access among all languages. Led by the Turing Center's director, Oren Etzioni. Brief introductory remarks by Emily Bender (Linguistics), Peter Clark (Boeing), Ed Cutrell (Microsoft Research), Oren Etzioni (CSE), Scott Farrar (Linguistics and U of Arizona), John Gennari (Biomedical Informatics), Jonathan Pool (Utilika Foundation), Lucy Vanderwende (Microsoft Research), and Stuart Weibel (iSchool and OCLC/DCMI). Interested researchers and students are welcome to take part. Refreshments will be served.
