Previous Events

2006

Turing Talk
Siegfried Frey (Institute for Cognition and Communication, Duisburg-Essen)

Abstract: In this talk I shall demonstrate a prototype implementation of a new approach to computer animation and explain how it can overcome the limits and inefficiencies of current methods. The Skriptanimation approach is based on three decades of European research into the perceptual and emotional effects of human body movement in political, artistic, and commercial communication, and on the Lorenz-Tinbergen model of "subtractive dummy experimentation". Skriptanimation provides an alternative to the keyframe and motion-capture paradigms, both of which suffer from high cost, theoretical poverty, and the consequent backfiring of attempts to make animated characters realistic and appealing. The research underlying Skriptanimation has revealed that particular dimensions of body movement, including lateral head flexion and head-torso angular offset, exert surprisingly large effects on person perception, swamping the effects of static facial features. These results, and our conviction that a far richer body of usable knowledge awaits us if we can perform animation experiments efficiently, have motivated our development of the Skriptanimation multidimensional motion representation language and associated software. Our system permits the translation of video-recorded natural human movement into a data protocol that provides the empirical basis for dimension-specific analysis, experimentation, and editing via a graphic interface with instant comparative previews. Time will be provided for hands-on demonstrations by members of the audience.

Turing Talk
David Goss-Grubbs (Linguistics)

Abstract: Across the languages of the world, there are numerous grammatical strategies for expressing tense (the location of an event in time) and aspect (ways of viewing the internal temporal constituency of an event). However, these can all be mapped to a common set of constructions in a semantic representation. In this talk I present a method for representing tense and aspect in a language-independent way in Minimal Recursion Semantics (MRS), an approach to representing the semantics of natural language sentences intended for use with computational grammars. I argue for an analysis in which grammatical tense contributes a restriction on a free time variable. Aspect is handled by using operators to relate event variables to various of their aspectual counterparts. Both tense and aspect supply motivation for formally dividing events into types corresponding to Vendler's action types.

Colloquium
Oren Etzioni (Turing Center)

Abstract: For the last quarter century (measured in person years), the KnowItAll project has focused on accumulating massive amounts of information from the Web by utilizing domain-independent, fully automated techniques. If successful, this effort has the potential to address the long-standing "Knowledge Acquisition Bottleneck" in Artificial Intelligence and enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. This talk will describe the evolution of the KnowItAll family of systems (or is it Intelligent Design?), culminating in TextRunner--a program that has extracted over 1,000,000,000 "facts" from the Web without breaking a sweat.
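As a flavor of what extracting relational "facts" from raw text involves, here is a toy sketch, assuming NLTK's tokenizer and POS tagger. The nearest-noun heuristic is an illustrative stand-in, not TextRunner's actual method, which trains a self-supervised extractor and runs over Web-scale crawls.

```python
# Toy sketch of open-information-extraction triples, assuming NLTK
# (nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')
# must have been run). Heuristic only -- not the TextRunner architecture.
import nltk

def extract_triples(sentence):
    """Return rough (arg1, relation, arg2) tuples for each verb."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    triples = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("VB"):
            # take the nearest noun on each side of the verb as arguments
            left = [w for w, t in tagged[:i] if t.startswith("NN")]
            right = [w for w, t in tagged[i + 1:] if t.startswith("NN")]
            if left and right:
                triples.append((left[-1], word, right[0]))
    return triples

print(extract_triples("Edison invented the phonograph in 1877."))
# [('Edison', 'invented', 'phonograph')]
```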
Turing Talk
Shou-de Lin (Computer Science, Linguistics, and Information Sciences Institute, Southern California, and Los Alamos National Laboratory)

Abstract: In this talk I will report my work on interesting-instance discovery and ancient-script decipherment to demonstrate how one can deal with challenging problems for which there are no training examples. I will introduce an unsupervised discovery system that can identify and explain interesting or suspicious individuals in a huge, heterogeneous social or semantic network. Such a system has a wide range of applications in homeland security, fraud detection, and general scientific discovery. For example, an interesting individual in a victim's social network could be a suspect, and a type of food that is interestingly connected to a given disease might be a potential cause of or cure for it. I will describe a hybrid approach integrating techniques from symbolic and statistical AI, KDD, and NLP, together with experimental results. If time permits, I will briefly introduce an unsupervised framework for deciphering the linear writing order of a two-dimensional ancient hieroglyphic script, Luwian. I will illustrate how we handle this problem by exploiting Shannon's noisy-channel model, EM (expectation maximization), and a forward-backward search algorithm. I will also describe several strategies for verifying such unsupervised discovery systems and report the results.

Symposium
Eighth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; the MSR NLP and Speech and Natural Language (SNL) groups; and UW alumni at Microsoft. The symposium consists of two invited talks, followed by an informal reception and an opportunity for demos and/or poster exhibitions. To facilitate the creation of badges, the symposium is compiling a list of attendees in advance. If you plan to attend, please email jparvi at u.washington.edu, with "UW/MS Symposium" in the subject line.

William Lewis (Linguistics, UW)

The Linguistics community is confronted by a quandary: while languages are going extinct at an alarming rate, the digital revolution has provided technologies that make it possible to record, analyze, and disseminate language data more efficiently than ever before. The traditional recording media of notebooks and analog recorders have become things of the past, replaced by sophisticated digital recorders, laptops, and PDAs. However, despite cheap storage and the communicative efficiencies provided by the World Wide Web, much language data has been recorded and analyzed without a great deal of attention to the need for its preservation or dissemination. Language data is often housed locally and recorded in proprietary data formats that may themselves go extinct. The Open Language Archives Community (Bird and Simons 2003) was formed to develop standards for data encoding to ensure that language data can be used over the long term, and promotes metadata standards to ensure that the data can be located and used. OLAC's problem now is to find resource providers willing to make the extra effort to encode their resources and data in a way that makes them available to search. Critical mass cannot be achieved until a sufficient number of archives, institutions, and individuals make their data available to OLAC; yet at the same time, many will not take the extra steps necessary to reformulate their data until they recognize that the extra effort will be worth it. A number of efforts are currently underway to leverage existing language data as it is currently made available on the Web and make it searchable, whether the data is embedded in journal articles, posted as part of language-learning materials for revitalization efforts, housed in language archives hidden behind idiosyncratic user interfaces, provided in Word or text formats, etc. The ODIN (Online Database of INterlinear text) project was started as a pilot to test the potential for locating language resources and data by concentrating on commonly used semi-structured data types, particularly those used to encode language data. ODIN's focus thus far has been on interlinearized text, a format common in the field of linguistics. By scanning online resources and documents for instances of interlinear text and applying both novel and commonly used methods of language identification to the data, ODIN has achieved a fairly high degree of success at both locating language resources (most specifically, linguistic resources) and identifying the languages encoded, becoming the first fully automated OLAC data provider. Beyond locating resources containing language data, the ODIN team has experimented with methods for the automated migration of data encoded in legacy formats to best-practice XML, from which more extensive metadata, and significantly greater interoperation, can be obtained.
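A minimal sketch of how a scanner might recognize a candidate block of interlinear glossed text, the data type ODIN targets. The gloss-tag inventory and thresholds below are illustrative assumptions; ODIN's actual detector is more elaborate and is paired with statistical language identification.

```python
import re

# Grammatical gloss abbreviations that commonly appear in the gloss line
# of interlinear text (illustrative subset, not ODIN's inventory).
GLOSS_TAGS = re.compile(r"\b(1|2|3)?(SG|PL|NOM|ACC|ERG|ABS|GEN|DAT|PST|PRS|FUT|NEG|DET)\b")

def looks_like_igt(lines):
    """Heuristic: a 3-line block is likely interlinear glossed text if the
    middle line carries gloss abbreviations and the last line reads like a
    quoted free translation."""
    if len(lines) != 3:
        return False
    src, gloss, trans = (line.strip() for line in lines)
    has_gloss = len(GLOSS_TAGS.findall(gloss)) >= 2
    has_translation = trans.startswith(("'", '"', "`"))
    return bool(src) and has_gloss and has_translation

example = [
    "canis  hominem   mordet",
    "dog.NOM man.ACC  bite.3SG.PRS",
    "'The dog bites the man.'",
]
print(looks_like_igt(example))  # True
```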
Manuela Noske (Windows Localization, Microsoft)

This paper reports on experimental work applying the unsupervised learning algorithm known as Linguistica v2.0.4 (Goldsmith 2002) to a corpus of approximately 460,000 alphanumeric tokens of the Eastern Nilotic language Ateso. Linguistica divides the morphological discovery process into a set of heuristics, which guide the segmentation process, and a Minimum Description Length model (Rissanen 1989, Goldsmith 2001), which evaluates the outcome; it has so far been tested mostly on non-agglutinating languages. The results of Linguistica are compared with a manual analysis of three samples of 100 words each, randomly chosen from the Ateso corpus. A quantitative evaluation of Linguistica in terms of recall and precision is supplemented by a qualitative evaluation and a summary of the difficulties encountered in running this experiment on an under-documented language.
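A sketch of the kind of boundary-level precision/recall scoring such an evaluation implies; how the original study counted matches may differ, and the segmentations below are invented.

```python
def boundary_prf(gold, predicted):
    """Precision/recall/F1 over morpheme boundary positions, comparing an
    automatic segmentation against a manual analysis. Segmentations are
    written with '-' between morphs."""
    def boundaries(seg):
        positions, offset = set(), 0
        for morph in seg.split("-")[:-1]:
            offset += len(morph)
            positions.add(offset)
        return positions

    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)
        fp += len(pb - gb)
        fn += len(gb - pb)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# invented forms: gold "a-kote" vs. predicted "ako-te" share no boundary
print(boundary_prf(["a-kote", "e-losi-t"], ["ako-te", "e-losi-t"]))
```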
Transportation by bus: From the UW area, you can reach the site in 12 minutes by Route 545 from the Montlake/SR 520 freeway stop, leaving at 2:12 pm and arriving at Overlake Transit Center at 2:24 pm. You can return in 20 minutes from 148th Avenue NE and NE 34th Street by Route 242, leaving at 5:25 pm and arriving at Montlake at 5:45 pm. See the map for details on the bus stops.

Transportation by automobile: Take the 148th Ave NE northbound exit from SR 520. Limited visitor parking can be found in front of both building 112 (from NE 36th St) and building 114 (from NE 31st Circle). Please carpool if possible; visitor parking is very limited.

Turing Talk
Laurie Poulson (Linguistics)

Abstract: My research involves the development of a methodology for evaluating a process designed to facilitate the rapid prototyping of broad-coverage knowledge-based grammars. This process solicits typologically based information about a language and then produces a grammar start for that language: the foundation of an HPSG grammar with MRS semantics, together with the core elements of an implementation of specific phenomena. The process is essentially panlingual, theoretically generating an appropriate grammar start, customized with regard to these phenomena, for any natural language. I will present a strategy for evaluating the quality of large numbers of these grammar starts as part of the test-development cycle. The immaturity of the grammars, the continually evolving nature of the process as more phenomena are handled, and the panlingual nature of the output have all presented significant design challenges. Given the panlingual scope of the process, however, the most problematic issues have been how to design an evaluation procedure that is systematic yet tractable and how to develop an appropriate resource of testing data. The proposed evaluation strategy addresses both of these issues and incorporates an incremental method for developing a panlingual resource.

Turing Talk
Bo Pang (Computer Science, Cornell)

Abstract: Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". We describe our previous work applying standard text-categorization techniques to this sentiment-polarity classification problem, as well as a novel approach we later proposed that applies these techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, a formulation that incorporates cross-sentence contextual constraints.
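To make the minimum-cut formulation concrete, here is a small sketch using networkx. In Pang and Lee's formulation the per-sentence scores come from a subjectivity classifier and the association scores from sentence proximity; the numbers below are invented.

```python
import networkx as nx

def subjective_sentences(ind_scores, assoc, proximity_weight=1.0):
    """Min-cut partition of sentences into subjective vs. objective.
    ind_scores[i] is the individual probability that sentence i is
    subjective; assoc[(i, j)] is a nonnegative association score that
    encourages nearby sentences to receive the same label."""
    G = nx.DiGraph()
    for i, p in enumerate(ind_scores):
        G.add_edge("SUBJ", i, capacity=p)          # cost of calling i objective
        G.add_edge(i, "OBJ", capacity=1.0 - p)     # cost of calling i subjective
    for (i, j), a in assoc.items():
        # cutting between associated sentences costs extra in both directions
        G.add_edge(i, j, capacity=proximity_weight * a)
        G.add_edge(j, i, capacity=proximity_weight * a)
    _, (subj_side, _) = nx.minimum_cut(G, "SUBJ", "OBJ")
    return sorted(n for n in subj_side if n != "SUBJ")

# three sentences; the middle one is pulled toward its subjective neighbor
print(subjective_sentences([0.9, 0.5, 0.1], {(0, 1): 0.3, (1, 2): 0.1}))
# [0, 1]
```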
Seminar
Steven Poltrock and Mark Handel (Mathematics and Computing Technology, Boeing Phantom Works)

Turing Talk
Steven L. Tanimoto (Computer Science and Engineering)

Abstract: As greater numbers of people from more and more communities throughout the world gain access to the Internet, the probability that any two randomly chosen people share the same language tends to diminish. However, they may have a reason to interact because of a shared interest or medical condition. Visual languages offer one means of enabling communication across language barriers, and they can take advantage of improving technology for transmitting graphics. Past efforts to design and promulgate various international languages have tended to falter, in part because their designers could not foresee all of the challenges users would face in learning and adapting the languages to their own needs. One approach to avoiding this problem is to design a system that supports community-driven evolution of the language. Issues to be considered include language structural complexity, community rating systems such as that developed by Cheng and Vassileva, vocabulary expansion, and integration with other web services such as automatic language translation and search engines. Two simple demonstrations based on collaborations with C. Bernardelli will be used to illustrate issues of language design and integration with web search.

Turing Talk
Massimiliano Ciaramita (Laboratory for Applied Ontology, Italian National Research Council)

Abstract: This work presents an unsupervised method for the acquisition of labelled relations between ontological objects from unstructured text. The ultimate goal of the proposed architecture is to provide resources that support broad-coverage, information-extraction-based text mining and ontology engineering. The method is based on simple, well-studied, off-the-shelf natural language processing techniques such as named entity recognition, statistical syntactic parsing, and hypothesis testing for lexical collocations and selectional preference learning. The work focuses in particular on a case study in molecular biology concerning the GENIA corpus and ontology (Ohta et al., 2002). With respect to this study, we analyze and evaluate relevant properties of the approach and discuss directions for future research, particularly Web-scale text mining and retrieval tasks.
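Hypothesis testing for lexical collocations is commonly done with Dunning's log-likelihood ratio over a 2x2 table of co-occurrence counts. The abstract does not say which statistic the method uses, so the following is a generic sketch with made-up counts.

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table of
    bigram counts: k11 = count(w1, w2), k12 = count(w1, not w2),
    k21 = count(not w1, w2), k22 = count(not w1, not w2). High scores
    mark word pairs that co-occur far more often than chance."""
    def h(*ks):  # sum of k * log(k / n) over nonzero cells
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)    # row marginals
                - h(k11 + k21, k12 + k22))   # column marginals

# invented counts: the pair occurs 40 times in a 100,000-token corpus,
# far above the ~5 occurrences expected under independence
print(round(llr(40, 960, 460, 98540), 1))
```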
Turing Talk
Oren Etzioni, Stephen Soderland, and Ethan Phelps-Goodman (Turing Center)

Abstract: Panlingual translation is an exciting and ambitious goal that is at the heart of the Turing Center. In this talk we will describe our basic approach to the problem, which relies on corpus-based techniques, and discuss two ongoing projects in detail. First, we are investigating the use of a controlled-language interface to improve translation performance. Second, we are utilizing the TextRunner information-extraction system to learn phrase translations from comparable corpora. We will conclude with a discussion of our plans for future work.

Turing Talk
Nicholas Kushmerick (Computer Science and Informatics, University College Dublin)

Lecture
Meredith Ringel Morris (Computer Science, Stanford)

Turing Talk
Sharon Oviatt (Center for Human-Computer Communication, Oregon Health & Science University)

Abstract: During the past decade, rapid advances in spoken language technology, natural language processing, dialogue modeling, multimodal interfaces, animated character design, and mobile applications all have stimulated interest in a new class of conversational interfaces. Such systems are being designed to support users' performance in a variety of task applications (commercial, medical, educational, in-vehicle), and many have been designed with animated characters that aim to facilitate user performance. However, the development of robust systems that process conversational speech is a challenging problem, largely because users' spoken language can be extremely variable. In this talk, I'll describe research in our lab that has identified a new source of variability in users' spoken language to computers. Basically, people spontaneously and rapidly adapt the basic acoustic-prosodic features of their speech signal to the text-to-speech output they hear from a computer partner. These speech adaptations are delivered dynamically, since users will quickly readapt their speech when communicating with a different computer voice. They also are flexibly bi-directional--for example, users will increase their own speech amplitude and rate when conversing with a computer partner that has louder and faster text-to-speech (TTS) output, and will decrease these features when the TTS is quieter and slower. In fact, an analysis of speakers' amplitude, durational features, and dialogue response latencies confirmed that these adaptations can be substantial in magnitude (10-50%), with the largest adaptations involving utterance pause structure and amplitude. This research underscores the need for new speech and multimodal systems that can adapt to users and their communication context. It also emphasizes the importance of auditory interface design for next-generation mobile systems. Implications are discussed for designing future conversational interfaces that are more reliable, better synchronized, and more supportive of user performance.

Lecture
Luis von Ahn (Computer Science, Carnegie Mellon)

Lecture
Kalervo Järvelin (Information Studies, Tampere)

Lecture
David Hawking (ICT Centre, Commonwealth Scientific and Industrial Research Organisation, Canberra)

Abstract: It has been claimed that topic metadata can be used to improve the accuracy of text searches. Justin Zobel (RMIT) and I tested this claim by examining the contribution of metadata to effective searching within websites published by a university with a strong commitment to, and substantial investment in, metadata. In this talk I will report experiments we conducted to measure the ability of subject and description metadata to contribute to effectively answering four different types of queries, extracted from the university's official query logs and from the university's site map. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of websites. A follow-up experiment with the websites published in a particular government jurisdiction confirmed our findings. Our experiments show that link anchor text, which can be regarded as metadata created by others, is much more effective in identifying best answers to queries than other textual evidence.

Symposium
Ninth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Germanics, Electrical Engineering, and Computer Science; the MSR NLP and Speech and Natural Language (SNL) groups; and UW alumni at Microsoft. The symposium consists of two invited talks: Simon Corston-Oliver and Anthony Aue (Microsoft Research); Michael J. Cafarella and Oren Etzioni (Turing Center).

Turing Talk
Xue Nianwen (Cognitive Science, Colorado)

Abstract: Corpus-based approaches have been the predominant research paradigm in natural language processing for the past decade or so. In the first part of this talk, I will describe our efforts over the past seven years to build a multi-layered, multi-dimensional Chinese corpus that we hope will provide the fuel for research in Chinese language processing. I will first discuss the syntactic annotation in the Chinese treebank, move on to the semantic annotation of verbs and their nominalizations in the Chinese propbank, and finally touch on some preliminary work we have done on Chinese discourse connectives. A recurring issue in corpus annotation is the competing demands of linguistic principles (the most intuitive and elegant representation) and engineering principles (high annotation consistency), and I will discuss the tradeoffs we have made in each of the annotation tasks. In the second part of the talk, I will describe some machine-learning systems we have developed using this corpus as training and test material. I will present experimental results on semantic role labeling of Chinese verbs and their nominalizations and discuss some challenges facing Chinese NLP.
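Semantic role labeling is typically cast as classifying parse-tree constituents using features read off the tree. Below is a minimal sketch of such feature extraction with NLTK; the toy tree, the feature set, and the crude rightmost-word head rule are illustrative, not taken from the systems described in the talk.

```python
from nltk import Tree  # assumes NLTK is installed

def candidate_features(tree, predicate):
    """For each constituent, record a few classic SRL features: phrase
    type, a crude head word, and position relative to the predicate.
    Real systems add the parse path, subcat frame, and more."""
    leaves = tree.leaves()
    pred_pos = leaves.index(predicate)
    feats = []
    for sub in tree.subtrees(lambda t: t.height() > 2):  # skip POS-level nodes
        span = sub.leaves()
        start = leaves.index(span[0])
        feats.append({
            "phrase_type": sub.label(),
            "head_word": span[-1],   # crude head rule: rightmost word
            "position": "before" if start < pred_pos else "after",
            "predicate": predicate,
        })
    return feats

# toy Chinese-treebank-style parse: "the economy grows seven percent"
t = Tree.fromstring("(IP (NP (NN 经济)) (VP (VV 增长) (NP (NN 百分之七))))")
for f in candidate_features(t, "增长"):
    print(f)
```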
Turing Talk
Rich Henry (UnifiedField Associates)

Abstract: dotSUB dramatically decreases the effort and cost of translating and subtitling films and video, while dramatically increasing access for both viewers and potential translators. The result will be much greater access to films and videos from all over the world, increasing cross-cultural communication and understanding: "Any Film Any Language."

Lecture
Steven Greenberg (Silicon Speech; Centre for Applied Hearing Research, Technical University of Denmark)
Turing Talk
Jonathan Pool (Turing Center and Utilika Foundation)

Abstract: In a multilingual Semantic Web, authors might write in precise, expressive varieties of diverse languages. Do such controlled languages exist? Of 41 candidates, just 4 were (1) designed for multiple domains and genres and (2) documented enough for evaluation. A sample of Web statements on health and human rights revealed limited expressivity or precision in each language. The most expressive one avoided structural ambiguity but allowed semantic ambiguity that could frustrate human and machine comprehension. The possibility of a practical Web-scale controlled language remains undemonstrated but unrefuted.

Turing Talk
Chung-chieh Shan (Computer Science, Rutgers)

Abstract: How does the meaning of an utterance determine its effects on the discourse and its participants? I study this question using tools from programming-language theory. In particular, how do utterances interact with each other in context? For example, pronouns in natural languages and variable references in programming languages both retrieve values from the context. In general, computational side effects in programming languages and apparently noncompositional phenomena in natural languages both let expressions access their contexts. This link stresses the dynamic, operational view of meaning. For example, the notion of evaluation order unifies an unprecedented variety of linguistic generalizations: crossover in anaphora, superiority in questions, and ordering effects in polarity licensing and quantifier scope.

Poster Session and Graduation Event
UW Professional Master's in Computational Linguistics

Posters:
"TreeTran: A Tool for Visual Selection and Testing of Transfer Rules for Machine Translation", David Bullock
"Syntactic Structure by Projecting Interlinear Glossed Text Structure", Dan Jinguji
"ask574: A Web-based Question Answering System Adapted from the CLMA Program's TREC 2006 Entry", Josh Minor
"An Evaluation of the Classification of IGT to Topic Hierarchies", Brian Nisonger
"Finding and Evaluating Structured Bilingual Corpora on the Web", Achim Ruopp

Colloquium and Reception
Emily Bender (Linguistics)

Abstract: In this talk, part of the "fundamental issues" series, I ask what it means to test a syntactic hypothesis and describe how the techniques of grammar engineering can expand the practical empirical base of syntax. Computerized implementations of their grammars allow linguists to test hypotheses more efficiently and effectively, for two reasons. First, languages are made up of many subsystems with complex interactions. Linguists generally focus on just one subsystem at a time, yet the predictions of any particular analysis cannot be calculated independently of the interacting subsystems. With implemented grammars, the computer can track the effects of all aspects of the implementation while the linguist focuses on developing just one. Second, automated application of grammars to test suites and naturally occurring data allows for much more thorough testing of linguistic analyses--against thousands as opposed to tens of examples, including examples not anticipated by the linguist. I describe how current work in the Grammar Matrix project is extending these techniques to cross-linguistic hypothesis testing in computational typology.
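The test-suite methodology described above can be pictured in a few lines of code: run the grammar over items labeled grammatical or ungrammatical and tally coverage and overgeneration. The parser here is a throwaway stand-in; a real setup would load an implemented HPSG grammar into a processor such as the LKB.

```python
def coverage_report(grammar_parse, test_items):
    """Run a parser over a test suite of (sentence, is_grammatical)
    pairs and report coverage (grammatical items parsed) and
    overgeneration (ungrammatical items parsed)."""
    coverage = overgen = n_good = n_bad = 0
    for sentence, grammatical in test_items:
        parses = grammar_parse(sentence)
        if grammatical:
            n_good += 1
            coverage += bool(parses)
        else:
            n_bad += 1
            overgen += bool(parses)
    print(f"coverage: {coverage}/{n_good} grammatical items parsed")
    print(f"overgeneration: {overgen}/{n_bad} ungrammatical items parsed")

def toy_parser(sentence):
    # throwaway stand-in: "parses" any sentence ending in "barks"
    return ["parse"] if sentence.split()[-1] == "barks" else []

coverage_report(toy_parser, [("the dog barks", True),
                             ("dog the barks", False),
                             ("barks dog the", False)])
```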
Symposium
Tenth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science and Engineering; the MSR NLP Group; the Microsoft Natural Language (NLG) and Speech Components groups; and UW alumni at Microsoft. The symposium consists of two invited talks, followed by an informal reception and an opportunity for demos and/or poster exhibitions. To facilitate the creation of badges, the symposium is compiling a list of attendees in advance. If you plan to attend, please email jparvi at u.washington.edu, with "UW/MS Symposium" in the subject line.

Presentations: Fei Xia (with William Lewis and Dan Jinguji), "Towards Automatic Enrichment and Analysis of Linguistic Data for Low-Density Languages"; Mark Johnson, "Features of Reranking Parsers". See the announcement referenced above for details.

Turing Talk
Peter Clark (Boeing Phantom Works)

Abstract: One of the most important methods by which human beings learn is reading, a task which includes integrating what was read with existing, prior knowledge. While the reading task in its full generality is still too difficult to implement in a computer, significant (if partial) approaches to the task are now feasible. Our goal in this DARPA seedling project was to study the issues and develop solutions for this task by working with a reduced version of the problem, namely text written in a simplified version of English (a controlled language) rather than full natural language. Our experience and results reveal that even this reduced version of the task is still challenging, and we have uncovered several major insights into this challenge. In particular, our work indicates a need for fairly substantial domain and linguistic knowledge to ensure reliable interpretation, and for a radical revision of traditional knowledge representation structures to support knowledge integration. We describe our work and analysis, present a synthesis and evaluation of our work, and make several recommendations for future work in this area. Our conclusion is that ultimately, to bridge the "knowledge gap", a pipelined approach is inappropriate, and that to address the knowledge requirements of good language understanding an iterative (bootstrapped) approach is the most promising way forward.

Colloquium and Reception
Scott Farrar (University of Arizona)

Abstract: This presentation will deal with the fact that, while there is no available statistic, the amount of electronically available linguistic field data seems to be increasing at a phenomenal rate. While this situation opens up enormous opportunities for automated empirical research, it is argued that such a rapid increase in the number of Web resources motivates the need for community consensus in order to take advantage of the Web as the primary resource for accessing data. Thus, linguistics as a data-driven field is poised either to revolutionize itself by changing the way its research is conducted or to miss the opportunity because of a lack of codified best practices. In this talk, the notion of 'e-Linguistics' is proposed as the practice of using the Internet as the primary means of accessing and analyzing language data with automated tools. e-Linguistics requires intelligent frameworks and tools.
The GOLD Community of Practice is one such framework for linking on-line linguistic data to an ontology. The key components of the model are the linguistic data resources themselves and the resources focused on the knowledge derived from that data. Data resources include the ever-increasing amount of linguistic field data and other descriptive language resources being migrated to the Web. The knowledge resources capture generalizations about the data and are anchored in the General Ontology for Linguistic Description (GOLD). It is argued that such a model is in the spirit of the vision for a Semantic Web and thus provides a concrete methodology for rendering highly divergent resources semantically interoperable. The focus of this work, then, is not on annotation at the syntactic level, but rather on how annotated Web resources can be linked to an ontology. Furthermore, a methodology is given for creating specific communities of practice within the overall Web infrastructure for linguistics. Finally, ontology-driven search in the context of the Online Database of INterlinear text (ODIN) is discussed as a key application of the proposed model and an exemplar of e-Linguistics for the 21st century.
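As an illustration of what linking annotated data to an ontology can look like in Semantic Web terms, here is a short rdflib sketch. GOLD is published under purl.org/linguistics/gold, but the class and property names and the data URIs below are assumptions made for the example, not verbatim GOLD terms.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Illustrative namespaces: the GOLD terms and the data URIs are assumed
# for this sketch, not quoted from the actual ontology.
GOLD = Namespace("http://purl.org/linguistics/gold/")
DATA = Namespace("http://example.org/odin/")

g = Graph()
morph = DATA["ateso-sentence-1/morph-2"]
g.add((morph, RDF.type, GOLD["Morpheme"]))                       # typed by the ontology
g.add((morph, GOLD["hasForm"], Literal("e-")))                   # the annotated form
g.add((morph, GOLD["hasGrammaticalFeature"], GOLD["ThirdPerson"]))
print(g.serialize(format="turtle"))
```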
Symposium
Ontologies for Human-Machine and Panlingual Human Communication

An informal round-table discussion. Topics include (1) strategies for human compilation and machine learning of ontologies and (2) the potential contributions of ontologies, folksonomies, wordnets, thesauri, classifications, controlled languages, etc., to translation and information access among all languages. Led by the Turing Center's director, Oren Etzioni. Brief introductory remarks by Emily Bender (Linguistics), Peter Clark (Boeing), Ed Cutrell (Microsoft Research), Oren Etzioni (CSE), Scott Farrar (Linguistics and U of Arizona), John Gennari (Biomedical Informatics), Jonathan Pool (Utilika Foundation), Lucy Vanderwende (Microsoft Research), and Stuart Weibel (iSchool and OCLC/DCMI). Interested researchers and students are welcome to take part. Refreshments will be served.