RESEARCH
Learnable Constraint-based Grammars for Deep Language Understanding
The core of my research focuses on developing computational models for language understanding and learning in multilingual settings. I introduce a new grammar formalism, Lexicalized Well-Founded Grammar, which captures both syntax and semantics, has ontology constraints at the grammar rule level, and is learnable from a small set of annotated examples (Muresan, 2006; Muresan and Rambow, 2007; Muresan, 2008; Muresan, 2010). The semantic representation is ontology-based: expressive enough to capture various phenomena of natural language, yet restrictive enough to facilitate learning. I also introduce a new grammar learning model, Grammar Approximation by Representative Sublanguage, based on the concept of representative examples, whose importance to the model is defined linguistically rather than simply by frequency, as in most previous work. The search space for grammar induction is a complete grammar lattice, which guarantees the uniqueness of the solution. More generally, my research objective is to explore which computational formalisms and machine learning techniques are adequate for modeling both context and cumulative learning for language understanding in a unified way. The goal is to bring us closer to achieving scalable, robust, and human-inspired computational models for deep language understanding and language learning.
Identifying the Language of Opposition in Online Interactions
Project in collaboration with Mark Aakhus and Nina Wacholder.
This project contributes to the long-standing interest in developing socially intelligent systems that augment human reasoning and interaction in large-scale online communities. It focuses on a specific, yet ubiquitous, phenomenon of human behavior: opposition. The research uses theories from communication sciences and natural language processing techniques to identify and characterize the flow of opposition in online interactions. The project will offer a key reframing of prior natural language processing research related to opposition by introducing a robust and scalable deep-linguistic approach for modeling how texts relate to each other through opposition. The computational models of opposition will be tested in two applications for social collaborative environments: 1) The Controversy Alert System will detect and highlight controversial utterances and blunders in Wikipedia articles. 2) The Opposition Monitor will help users track the flow of opposition in an extended online interaction, such as who opposes whom and on what issue.
Past Projects
Exploring Richer Representations in Statistical Machine Translation
Project in collaboration with Philip Resnik at University of Maryland, College Park, funded by NSF SGER: "Exploiting Alternative Packagings of Source Meaning in Statistical Machine Translation".
Current approaches in statistical machine translation (MT) miss a key fact: the source language sentence is not the only way the author's meaning could have been expressed. The idea that the source sentence is just one of various "packagings" of underlying meaning was, of course, one familiar motivation for interlingual approaches to translation; however, interlingual semantic representations have generally been abandoned as notoriously difficult to define, and equally difficult to obtain accurately with broad coverage once defined. In this project, we are revisiting the idea of "packagings" of meaning, but exploring it in practical ways consistent with current practice in statistical MT. Unlike semantic transfer or interlingual approaches, we encode alternatives as source paraphrase lattices, a representation that allows us to exploit generalizations about the source language while still maintaining the surface-to-surface orientation that characterizes the statistical state of the art. Our exploratory work focuses on capturing syntactic and semantic variation using Lexicalized Well-Founded Grammars (LWFG), a recent formalism that balances expressiveness with practical and provable learnability results (Muresan, 2006; Muresan and Rambow, 2007; Muresan, 2008). We are quantifying and characterizing the information available in source paraphrase lattices, assessing the value of shallow paraphrasing, and exploring the relative promise of deeper techniques for source paraphrase generation using LWFG and other constraint-based grammatical frameworks. The ability to capture generalizations via source paraphrase may open new possibilities in the translation of minority and endangered languages, which lack training corpora on the scale necessary to support standard statistical MT techniques.
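To make the notion of a source paraphrase lattice concrete, here is a minimal toy sketch in Python. It is not the project's implementation: the function names (`build_lattice`, `enumerate_paths`), the example sentence, and the hand-written paraphrase table are all hypothetical, and in the project the alternatives would come from shallow paraphrasing or from LWFG-based generation rather than from a fixed dictionary. The lattice is represented as a directed acyclic graph over token positions, in which each edge carries either the original word or a paraphrase of a span.

```python
from collections import defaultdict


def build_lattice(tokens, paraphrases):
    """Build a toy paraphrase lattice over token positions.

    Returns a mapping start -> list of (end, phrase) edges.
    `paraphrases` is a hypothetical table: source span -> alternative phrasings.
    """
    edges = defaultdict(list)
    # The original words always form one path through the lattice.
    for i, tok in enumerate(tokens):
        edges[i].append((i + 1, tok))
    # Add one extra edge for every span that has a known paraphrase.
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            span = " ".join(tokens[i:j])
            for alt in paraphrases.get(span, []):
                edges[i].append((j, alt))
    return edges


def enumerate_paths(edges, start, end):
    """Yield every path from `start` to `end` as a list of phrases."""
    if start == end:
        yield []
        return
    for nxt, phrase in edges[start]:
        for rest in enumerate_paths(edges, nxt, end):
            yield [phrase] + rest


# Illustrative data only; not drawn from the project.
tokens = "the talks were called off".split()
paraphrases = {"called off": ["cancelled"], "talks": ["negotiations"]}

lattice = build_lattice(tokens, paraphrases)
sentences = sorted(" ".join(p) for p in enumerate_paths(lattice, 0, len(tokens)))
for s in sentences:
    print(s)
```

Each path through the lattice is one alternative "packaging" of the source sentence. A real system would pass the lattice itself to the decoder instead of enumerating paths, since the number of paths grows quickly with sentence length.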
Learning Consumer Health Terminologies from Text
Project in collaboration with Judith Klavans, University of Maryland Institute for Advanced Computer Studies.
Understanding and sharing terminology, both by systems and humans, are important aspects of communication. In this project, we propose a two-step approach towards building terminological knowledge bases for consumer-health systems from text (Muresan and Klavans, in prep). First, we automatically extract definitions from consumer-oriented on-line articles and web documents, which reflect language in use, rather than relying solely on dictionaries. This step is based on our previous work on DEFINDER (Klavans and Muresan, 2000; 2001; Muresan and Klavans, 2002), which we plan to extend by using machine learning approaches. Second, we learn a grammar that directly maps natural language to graph-based meaning representations, rather than using hand-written patterns, grammars, or semantic transfer rules applied on top of syntactic parses. We use our LWFG grammar formalism, which captures syntax and semantics and models semantic interpretation as grammar constraints.
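As a rough illustration of the first step, the sketch below pulls (term, definition) pairs out of running text using a few cue phrases. It is a toy example only: the cue list, the regular expression, and the function name `extract_definitions` are assumptions made for illustration and do not reproduce DEFINDER's actual rules or the planned machine learning extension.

```python
import re

# Hypothetical cue phrases that often introduce a definition in lay text.
DEFINITION_CUES = re.compile(
    r"(?P<term>[A-Za-z][A-Za-z \-]*?)\s+"
    r"(?:is defined as|refers to|is the term for)\s+"
    r"(?P<definition>[^.]+)"
)


def extract_definitions(text):
    """Return a list of (term, definition) pairs found via cue phrases."""
    return [
        (m.group("term").strip(), m.group("definition").strip())
        for m in DEFINITION_CUES.finditer(text)
    ]


# Illustrative input only.
sample = (
    "Angina is defined as chest pain caused by reduced blood flow "
    "to the heart. Patients should rest."
)
pairs = extract_definitions(sample)
print(pairs)
```

Hand-written patterns like this one are brittle, which is part of the motivation for learning the mapping from text to meaning representations instead.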
Smaranda Muresan
Assistant Professor
Rutgers University
Library and Information Science Department
4 Huntington St
New Brunswick, NJ 08901-1071
office: CIL 108
email: smuresan -AT- rci.rutgers.edu
phone: 1-732-932-7500 (ext 8267)
Curriculum Vitae
