Turing Center Events

Symposium

Seventeenth UW/Microsoft Quarterly Symposium in Computational Linguistics
January 23 (Friday), 3:30 pm - 5:00 pm, Microsoft Corporation, Building 99 (14820 NE 36th Street), room 1919 (first floor)

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington. The symposium is sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; Microsoft Research; and UW alumni at Microsoft.

Transit from UW: Sound Transit Route 545 runs every 15 minutes from Montlake Freeway Station (departing 2:07 or 2:22 pm) to SR 520 and NE 40th Street (arriving 2:19 or 2:34 pm). From there, cross SR 520 to 148th Avenue NE, turn left, and go to NE 36th Street.

Carpooling from UW: Contact Greg Hullender (greghull@u.washington.edu).

Visitor badge preparation: Send your name to Joyce Parvi (phoneme@u.washington.edu).

Talks:

Jeremy Kahn (Linguistics)
Unlexicalizing the Comma: An Orthographic Assist to Statistical Machine Translation

Ye-Yi Wang (Microsoft Research, Speech Technology)
Voice Search of Structured Media Data

Turing Talk

Fabian Suchanek (Max-Planck-Institute for Computer Science)
YAGO - A Core of Semantic Knowledge
January 29 (Thursday), 11:00 am - 12:30 pm, Paul G. Allen Center for Computer Science and Engineering, room 305

The slides of this Turing Talk are available.

Abstract

This talk will present YAGO, a large ontology that currently contains more than 2 million entities and close to 20 million facts about them. The talk will explain how YAGO was constructed and how we represent YAGO's knowledge. It will also introduce the SOFIE project, which uses logical reasoning to extract new information for YAGO from Web sources.

Speaker

Fabian Suchanek recently received his PhD in Computer Science from the Max-Planck-Institute, where he was advised by Professor Gerhard Weikum. Fabian has worked in the areas of Automated Reasoning, Information Extraction, Ontologies, and the Semantic Web. His main work is the YAGO ontology.

Turing Talk

Francis Bond (MASTAR Project, National Institute of Information and Communications Technology)
Opportunistic Lexicon Development
February 18 (Wednesday), 11:00 am - 12:00 noon, Paul G. Allen Center for Computer Science and Engineering, room 403

Abstract

In this talk I present a variety of opportunistic methods for rapidly building bilingual lexicons. They include: linking through multiple languages, deducing semantics from numeral classifiers, identifying cognates using Chinese characters and finding translations in mainly monolingual corpora.

Symposium

Eighteenth UW/Microsoft Quarterly Symposium in Computational Linguistics
April 10 (Friday), 3:30 pm - 5:30 pm, Mary Gates Hall, room 241

Talk 1

Vikram Dendi (Microsoft Research)
Standing in the User’s Shoes—Enhancing the “Perceived Value” of Translation

Abstract

Microsoft recently announced a technology (Translator Widget) that makes it dead simple to bring machine translation to a web page. Why does MSR’s machine translation team see this as a critical step in increasing the value of MT? In his role planning the innovation value chain for this team at Microsoft Research, Vikram has been exploring the idea of “perceived value” and how it ties into translation quality. The “perceived value” concept contrasts with more empirical methods of measuring (translation) quality in that it captures a user’s overall expectation of the value that a technology delivers to them. In this brief conversation, you will have an opportunity to contrast the scientist’s viewpoint with that of a strategist.

Speaker

Vikram Dendi wears multiple hats in Microsoft Research’s machine translation team. He primarily leads business strategy and planning for the team, but also spends a lot of time thinking about the user experience.

Related Work

The Emergence Of Machine Translation.

Talk 2

Kevin Duh (Signal, Speech, and Language Interpretation Lab, UW)
Semi-Supervised Learning for Ranking

Abstract

The problem of ranking, whose goal is to predict an ordering over a set of objects, is a key problem in many applications. In web search, for instance, ranking algorithms are used to order webpages in terms of relevance to the user. In complex NLP systems (e.g. machine translation, parsing), a set of candidate hypotheses is re-ranked such that the best one emerges at the top. While ranking has become an active research area, most work is done under the framework of supervised learning.

In this talk, I will discuss whether ranking algorithms can be extended to the semi-supervised learning framework, i.e., “Can additional unlabeled data be exploited to improve ranking performance?” I will begin by surveying possible approaches in this open area (based on prior work in semi-supervised classification). Then I will introduce a general “transductive/local” meta-algorithm for turning a standard supervised algorithm into a semi-supervised one. Results in the context of information retrieval will be presented.

Speaker

Kevin Duh is a PhD student at the University of Washington. He is a member of the UW SSLI lab, where he works on natural language processing, information retrieval, and machine learning. In particular, he is interested in minimally-supervised approaches for resource-poor languages and complex structured problems. He received his B.S. in Electrical Engineering from Rice University in 2003 and was an NSF Graduate Fellow from 2005 to 2008.

Symposium

Nineteenth UW/Microsoft Quarterly Symposium in Computational Linguistics
October 9 (Friday), 3:00 pm - 5:00 pm, Microsoft Corporation, Building 99 (14820 NE 36th Street), room 1919 (first floor)

Transit from UW: Sound Transit Route 545 from Montlake Freeway Station to SR 520 and NE 40th Street; cross SR 520 to 148th Avenue NE, turn left, go to NE 36th Street.

Talk 1

Stanley Kok (Computer Science & Engineering, UW) and Chris Brockett (Microsoft Research Natural Language Processing Group)
Hitting the Right Paraphrases in Good Time

Abstract

We present a random-walk-based approach to extracting paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned. We sample random walks to compute the average number of steps it takes to reach each candidate phrase, which yields a ranking of paraphrases in which better ones are “closer” to the phrase of interest. This approach allows “feature” nodes that represent domain knowledge to be easily incorporated into the graph, and it includes techniques that keep the graph from growing too large to process efficiently. Current state-of-the-art approaches, by contrast, require the graph to be bipartite, are limited to finding paraphrases that are two edges away from a phrase, and do not generally permit easy incorporation of domain knowledge into the graph. Manual evaluation of the generated output shows that this approach outperforms the state of the art.
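The ranking idea in this abstract can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical phrase table (the `ALIGNED_PAIRS` list, with `en:`/`de:` prefixes marking each phrase's language) and uses a Monte Carlo estimate of hitting time capped at `max_steps`. All names, data, and parameters are illustrative assumptions, not the authors' implementation, and the sketch omits the feature nodes and graph-size controls the abstract mentions.

```python
import random
from collections import defaultdict

# Hypothetical toy "phrase table": each pair is an English phrase aligned to a
# German phrase. The "en:"/"de:" prefixes mark which language a node belongs to.
ALIGNED_PAIRS = [
    ("en:under control", "de:unter kontrolle"),
    ("en:under control", "de:im griff"),
    ("en:in check", "de:unter kontrolle"),
    ("en:in check", "de:im griff"),
    ("en:in check", "de:kontrolliert"),
    ("en:controlled", "de:kontrolliert"),
]


def build_graph(aligned_pairs):
    """Undirected graph: one node per phrase, one edge per aligned phrase pair."""
    graph = defaultdict(set)
    for a, b in aligned_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def truncated_hitting_times(graph, source, max_steps=6, num_walks=2000, seed=0):
    """Monte Carlo estimate, for every other node, of the mean number of steps a
    random walk from `source` needs to first reach it; a walk that never reaches
    the node within `max_steps` contributes `max_steps`."""
    rng = random.Random(seed)
    nodes = [n for n in graph if n != source]
    totals = dict.fromkeys(nodes, 0.0)
    for _ in range(num_walks):
        first_hit = {}
        current = source
        for step in range(1, max_steps + 1):
            # Sort neighbors so a fixed seed gives the same walks on every run.
            neighbors = sorted(graph.get(current, ()))
            if not neighbors:
                break
            current = rng.choice(neighbors)
            first_hit.setdefault(current, step)  # record only the first visit
        for node in nodes:
            totals[node] += first_hit.get(node, max_steps)
    return {node: total / num_walks for node, total in totals.items()}


def rank_paraphrases(graph, phrase, prefix="en:", **walk_args):
    """Rank same-language nodes by estimated hitting time: "closer" ranks higher."""
    times = truncated_hitting_times(graph, phrase, **walk_args)
    candidates = [n for n in times if n.startswith(prefix)]
    return sorted(candidates, key=times.get)


if __name__ == "__main__":
    graph = build_graph(ALIGNED_PAIRS)
    print(rank_paraphrases(graph, "en:under control"))
    # ['en:in check', 'en:controlled']
```

In this toy graph, "in check" shares two pivot phrases with "under control" and is reached in two steps, while "controlled" sits four edges away, so it ranks lower. Capping each walk at six steps is an arbitrary choice made for this sketch to keep walks short.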

Talk 2

Alan Ritter (Computer Science & Engineering, UW) and Colin Cherry (Microsoft Research Natural Language Processing Group)
Toward the Twuring Test: Conversation Modeling using Twitter

Abstract

The growing popularity of social media has had an interesting side-effect for language researchers: services such as Twitter have led people to hold instant-messenger-style conversations in a public medium that anyone can observe. This creates a unique opportunity to collect, study, and model large-scale conversation data. We present a method for mining conversations from Twitter's public feed. The resulting conversation corpus, which will be made publicly available, has more than 1.3 million conversations, 75 thousand of which have more than 5 turns, providing a rich resource for the study of both Twitter and internet chat. Furthermore, we present several methods that attempt to model the flow of conversation by discovering latent classes over Tweets. We show that a repurposed content model (Barzilay and Lee 2004) can discover meaningful dialogue acts, such as “question” and “comment”, which indicate not only the role a Tweet plays in its conversation, but also the sorts of Tweets that are likely to follow. This model is improved and extended by employing a Bayesian sampling-based approach, allowing us to model a conversation's topic and to introduce sparse priors during learning.

Talk 3

Hoifung Poon (Computer Science & Engineering, UW) and Lucy Vanderwende (Microsoft Research Natural Language Processing Group)
Joint Inference for Knowledge Extraction from Biomedical Literature

Abstract

Automatically extracting knowledge from online repositories (e.g., PubMed) holds the promise of dramatically speeding up biomedical research and drug design, and is an outstanding example of the larger vision of knowledge extraction from the Web. After initially focusing on entity recognition and binary interactions between proteins, the community has recently shifted its attention toward the more ambitious goal of recognizing complex, nested event structures, which are ubiquitous in the literature. However, state-of-the-art systems still adopt a pipeline architecture and fail to leverage the relational structures among candidate entities for mutual disambiguation. In this work, we present the first joint approach to bio-event extraction that obtains state-of-the-art results. Our system is based on Markov logic and jointly predicts events and their arguments. We evaluated it on the BioNLP-09 Shared Task and compared it to the participating systems; experimental results demonstrate the advantage of our approach.

Previous Events

2009
