Previous Events

2009

Symposium
Seventeenth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington.

Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; Microsoft Research; and UW alumni at Microsoft.

Transit from UW: Sound Transit Route 545 every 15 minutes from Montlake Freeway Station (2:07 or 2:22 p.m.) to SR 520 and NE 40th Street (2:19 or 2:34 p.m.); cross SR 520 to 148th Avenue NE, turn left, go to NE 36th Street.

Carpooling from UW: Contact Greg Hullender (greghull@u.washington.edu).

Visitor badge preparation: Send your name to Joyce Parvi (phoneme@u.washington.edu).

Talks:
Jeremy Kahn (Linguistics)
Ye-Yi Wang (Microsoft Research, Speech Technology)

Turing Talk
Fabian Suchanek (Max-Planck-Institute for Computer Science)

The slides of this Turing Talk are available.

Abstract
This talk will present YAGO. YAGO is a large ontology, which currently contains more than 2 million entities and close to 20 million facts about them. The talk will explain how YAGO was constructed and how we represent YAGO's knowledge. It will also introduce the SOFIE project. SOFIE uses logical reasoning to extract new information for YAGO from Web sources.

Speaker
Fabian Suchanek recently received his PhD in Computer Science at the Max-Planck-Institute; he was advised by Professor Gerhard Weikum. Fabian has been working in the areas of Automated Reasoning, Information Extraction, Ontologies and the Semantic Web. His main work is the YAGO ontology.

Turing Talk
Francis Bond (MASTAR Project, National Institute of Information and Communications Technology)

Abstract
In this talk I present a variety of opportunistic methods for rapidly building bilingual lexicons.
They include: linking through multiple languages, deducing semantics from numeral classifiers, identifying cognates using Chinese characters, and finding translations in mainly monolingual corpora.

Symposium
Eighteenth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington.

Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; Microsoft Research; and UW alumni at Microsoft.

Talk 1
Vikram Dendi (Microsoft Research)

Abstract
Microsoft recently announced a technology (Translator Widget) that makes it dead simple to bring machine translations to a web page. Why does MSR’s machine translation team see this as a critical step in increasing the value of MT? In his role planning the innovation value chain for this team at Microsoft Research, Vikram has been exploring the idea of “perceived value” and how it ties into translation quality. The “perceived value” concept contrasts with more empirical methods of measuring (translation) quality in that it captures a user’s overall expectation of the value that a technology delivers to them. You will have an opportunity to contrast the scientist’s viewpoint with that of a strategist in this brief conversation.

Speaker
Vikram Dendi wears multiple hats in Microsoft Research’s machine translation team. He primarily leads business strategy and planning for the team, but also spends a lot of time thinking about the user experience.

Related Work: The Emergence Of Machine Translation.

Talk 2
Kevin Duh (Signal, Speech, and Language Interpretation Lab, UW)

Abstract
The problem of ranking, whose goal is to predict an ordering over a set of objects, is a key problem in many applications. In web search, for instance, ranking algorithms are used to order webpages in terms of relevance to the user. In complex NLP systems (e.g.
machine translation, parsing), a set of candidate hypotheses is re-ranked such that the best one emerges at the top. While ranking has become an active research area, most work is done under the framework of supervised learning.

Speaker
Kevin Duh is a PhD student at the University of Washington. He is a member of the UW SSLI lab, where he works on natural language processing, information retrieval, and machine learning. In particular, he is interested in minimally-supervised approaches for resource-poor languages and complex structured problems. He received his B.S. in Electrical Engineering from Rice University in 2003 and was an NSF Graduate Fellow from 2005 to 2008.

Symposium
Nineteenth UW/Microsoft Quarterly Symposium in Computational Linguistics

You are invited to take advantage of this opportunity to connect with the computational linguistics community at Microsoft and the University of Washington.

Sponsored by the UW Departments of Linguistics, Electrical Engineering, and Computer Science; Microsoft Research; and UW alumni at Microsoft.

Transit from UW: Sound Transit Route 545 from Montlake Freeway Station to SR 520 and NE 40th Street; cross SR 520 to 148th Avenue NE, turn left, go to NE 36th Street.

Talk 1
Stanley Kok (Computer Science & Engineering, UW) and Chris Brockett (Microsoft Research Natural Language Processing Group)

Abstract
We present a random-walk-based approach to extracting paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned. We sample random walks to compute the average number of steps it takes to reach a ranking of paraphrases with better ones being “closer” to the phrase of interest. This approach allows “feature” nodes that represent domain knowledge to be easily incorporated into the graph, and incorporates techniques to prevent the graph from growing too large for efficiency.
Current state-of-the-art approaches, by contrast, require the graph to be bipartite, are limited to finding paraphrases that are of length two away from a phrase, and do not generally permit easy incorporation of domain knowledge into the graph. Manual evaluation of generated output shows that this approach outperforms the state of the art.

Talk 2
Alan Ritter (Computer Science & Engineering, UW) and Colin Cherry (Microsoft Research Natural Language Processing Group)

Abstract
The growing popularity of social media has had an interesting side-effect for language researchers: services such as Twitter have resulted in people having instant-messenger-style conversations using a public medium, where anyone can observe. This creates a unique opportunity to collect, study, and model large-scale conversation data. We present a method for mining conversations from Twitter's public feed. The resulting conversation corpus, which will be made publicly available, has more than 1.3 million conversations, 75 thousand of which have more than 5 turns, providing a rich resource for the study of both Twitter and internet chat.

Furthermore, we present several methods that attempt to model the flow of conversation by discovering latent classes over Tweets. We show that a repurposed content model (Barzilay and Lee 2004) can discover meaningful dialogue acts, such as “question” and “comment”, which indicate not only the role a Tweet plays in its conversation, but also the sorts of Tweets that are likely to follow. This model is improved and extended by employing a Bayesian sampling-based approach, allowing us to model a conversation's topic, and to introduce sparse priors during learning.

Talk 3
Hoifung Poon (Computer Science & Engineering, UW) and Lucy Vanderwende (Microsoft Research Natural Language Processing Group)

Abstract
Automatically extracting knowledge from online repositories (e.g., PubMed) holds the promise of dramatically speeding up biomedical research and drug design, and represents an outstanding example of the great vision of knowledge extraction from the Web. After initially focusing on entity recognition and binary protein interactions, the community has recently shifted its attention towards the more ambitious goal of recognizing complex, nested event structures, which are ubiquitous in the literature. However, the state-of-the-art systems still adopt a pipeline architecture and fail to leverage the relational structures among candidate entities for mutual disambiguation.

In this paper, we present the first joint approach for bioevent extraction that obtains state-of-the-art results. Our system is based on Markov logic and jointly predicts events and their arguments. We evaluated it using the BioNLP-09 Shared Task and compared it to the participating systems. Experimental results demonstrate the advantage of our approach.