<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<!-- THIS IS A MACHINE-GENERATED/MAINTAINED FILE. YOU CAN MANUALLY EDIT CONTENT
- ONLY BETWEEN HTML COMMENTS "begin HTML content" AND "end HTML content"
- OTHER CHANGES WILL BE LOST! TO MAINTAIN,
- USE http://www.cs.washington.edu/htbin-post/content-tool.cgi. -->
<title>Alan Ritter</title>
<link rel="stylesheet" href="/home/cse.css" type="text/css">
<script src="/home/cse2.js" type="text/javascript"></script>
<style type="text/css" media="screen">
body {
margin:50px 0px; padding:0px; /* Need to set body margin and padding to get consistency between browsers. */
}
#Content {
width:800px;
margin:0px auto; /* Right and left margin widths set to "auto" */
text-align:left; /* Counteract to IE5/Win Hack */
padding:15px;
border:1px dashed #333;
#background-color:#eee;
}
</style>
</head>
<body style="width:800px;margin:0px auto;">
<!-- <p align="center"><img border="0" src="IMG_0336.jpg" width="365" height="274"> <br/> -->
<div id="Content">
<table width="100%">
<tr>
<td>
<h1>Alan Ritter</h1>
<i>Ph.D. Student</i> <br>
<a href="http://www.cs.washington.edu/">Computer Science and Engineering</a><br>
<a href="http://www.washington.edu/">University of Washington</a><br>
<img src="/images/cse_logo_133.gif" width="133" height="100" border="0" alt="CSE logo" />
</td>
<td>
<p align="right"><img border="0" src="edinburgh.jpg" height="274"> <br/>
<a href="/htbin-post/unrestricted/mailto2.pl?to=aritter;sub=Hello">Email me!</a> </p><br/>
</td>
</tr>
</table>
<h1>Overview</h1>
I am a fifth-year Ph.D. student in the Computer Science Department at the University of Washington. My graduate adviser is <a href="http://www.cs.washington.edu/homes/etzioni/">Oren Etzioni</a>; I have collaborated with <a href="http://www.cs.northwestern.edu/~ddowney/">Doug Downey</a>, <a href="http://www.cs.washington.edu/homes/mausam/">Mausam</a>, and <a href="http://www.cs.washington.edu/homes/soderlan/">Stephen Soderland</a> at UW, in addition to <a href="https://sites.google.com/site/colinacherry/">Colin Cherry</a>, <a href="http://research.microsoft.com/en-us/people/billdol/">Bill Dolan</a> and <a href="http://research.microsoft.com/en-us/um/people/sumitb/">Sumit Basu</a> during internships at <a href='http://research.microsoft.com'>Microsoft Research</a>.
<!-- I was awarded an <a href="http://ndseg.asee.org/">NDSEG Fellowship</a> in 2008. -->
<h1> Research Summary </h1>
Broadly, I am interested in getting computers to better understand natural language, and in the new applications this will enable.
I enjoy working with large quantities of text that is not restricted to a narrow domain, such as that found on the <b>Web</b> or in <b>Social Media</b>.
Specifically, my work focuses on <a href="http://en.wikipedia.org/wiki/Information_extraction">Information Extraction</a>, <a href="http://www.siglex.org/">Computational Lexical Semantics</a>, <a href="http://en.wikipedia.org/wiki/Latent_variable">Latent Variable Models</a>, <a href="http://research.microsoft.com/en-us/events/lsm2011/">Language Processing in Social Media</a> and <a href="http://iuiconf.org/">Intelligent Interfaces</a>.
Short descriptions of my research directions follow.
<h3>Information Extraction in Social Media</h3>
<hr>
By extracting Named Entities, in addition to extracting and resolving temporal expressions (for example "next Friday"), we are able to automatically
extract a <a href="http://statuscalendar.com">calendar</a> of popular events occurring in the near future from Twitter.
<p>
Off-the-shelf tools such as Part-of-Speech Taggers and Named Entity Recognizers perform extremely poorly when applied to Twitter due to its noisy and
unique style. To address this I have been working towards building a set of <a href="http://github.com/aritter/twitter_nlp">Twitter-specific text processing tools</a> <a href="twitter_ner.pdf">[EMNLP 2011]</a>.
<h3>Conversational Modeling in Social Media</h3>
<hr>
I have worked on unsupervised modeling of dialogue acts in Twitter <a href="http://www.cs.washington.edu/homes/aritter/twitter_chat.pdf">[NAACL 2010]</a>. By remaining agnostic about the set of classes, we are
able to learn a model which provides insight into the nature of communication in a new medium.
<p>
I have investigated the feasibility of automatically replying to status messages by adapting techniques
from <b>Statistical Machine Translation</b> <a href="mt_chat.pdf">[EMNLP 2011]</a> and utilizing millions of naturally occurring Twitter conversations
as parallel text.
Although there are many differences between conversation and translation, with a few conversation-specific adaptations we
are able to build Response Models which <a href="http://www.cs.washington.edu/homes/aritter/mt_chat.html">often generate appropriate replies to Twitter status posts</a>.
This work has several possible applications, including conversationally aware predictive text entry.
<h3>Latent Variable Models of Lexical Semantics</h3>
<hr>
I have applied a variant of <b>Latent Dirichlet Allocation</b> to automatically infer the argument types or <b>Selectional Preferences</b> of
textual relations <a href="http://turing.cs.washington.edu/papers/acl-2010-ritter.pdf">[ACL 2010]</a>. Generative models have the advantage that they provide a principled way to perform many different
kinds of probabilistic queries about the data. For example, our model of selectional preferences is useful in filtering improper applications of inference rules in context, showing a substantial improvement over the previous state-of-the-art rule-filtering system which makes use of a predefined set of classes.
The topics discovered by our model can be browsed <a href="http://rv-n02.cs.washington.edu:1234/lda_sp_demo_v3/lda_sp/topics/">here</a>. Inference and evaluation code is available for download <a href="https://github.com/aritter/LDA-SP">here</a>.
<p>
In addition, I have investigated <b>Distant Supervision with Topic Models</b>. As a distant source of supervision we make use of facts from <a href="http://www.freebase.com/">Freebase</a>, a large, open-domain database, to generate constraints in the topic model.
This approach leverages the ambiguous training data provided by Freebase in a principled way, significantly
outperforming Co-Training on a weakly supervised named entity classification task <a href="twitter_ner.pdf">[EMNLP 2011]</a>.
<h3>Utilizing Implicit Feedback in Interactive File Selection</h3>
<hr>
Selection tasks are common in modern computer interfaces; we are often required to
select multiple files, emails, and other data entries for copying, modification, deletion, etc.
Complex selection tasks can require many clicks and mouse movements on the part of the user;
to aid users with these complex selections we propose an interactive machine learning solution <a href="http://www.cs.washington.edu/homes/aritter/p167-ritter.pdf">[IUI 2009]</a>.
In addition to making use of explicit selections and deselections, we utilize implicit
features of the user's behaviour such as passing over files, or proximity in the interface.
Since the behaviour features are task-independent, we use historical interaction traces as training
data. A video demonstration of our file-selection prototype can be viewed <a href="http://www.youtube.com/watch?v=4V-ukMndnFo">here</a>.
<h3>Finding Contradictions in Web Text</h3>
<hr>
Many textual relations map one argument to a unique value. For example, the verb <i>assassinated</i> should map each direct object to a unique subject. We investigate automatically
classifying relation functionality using an unsupervised EM-style algorithm, and evaluate performance at discovering naturally occurring contradictions within a large web corpus <a href="http://www.cs.washington.edu/homes/aritter/Ritter_emnlp08.pdf">[EMNLP 2008]</a>. We show
that contradiction detection on the web is a difficult task for a variety of reasons including name ambiguity (e.g. John Smith was born in many different locations), synonyms and meronyms
(Mozart was born in both Salzburg and Austria).
<h3>Interactive Information Integration with HTML Tables &amp; Freebase</h3>
<hr>
As part of the graduate databases class, I investigated using data-integration techniques to augment HTML tables with additional data from Freebase. Users can choose to display columns which are not present in the original table, but for which data exists in Freebase, providing direct benefit. Details on this project, including a prototype Firefox browser plugin, can be found <a href="http://www.cs.washington.edu/homes/aritter/freebase_html/">here</a>.
<hr>
<h1>Teaching Experience</h1>
I have served as a teaching assistant for the Machine Learning class at UW in <a href="http://www.cs.washington.edu/education/courses/cse446/10wi/">Winter 2010</a> and <a href="http://www.cs.washington.edu/education/courses/cse446/11wi/">Winter 2011</a>, where I helped design homework assignments, answered student questions one-on-one, and presented an hour-long lecture in class.
<p>
Also at UW, I mentored Sam Clark during his senior year and Master's program; during this time I supervised his work on annotating a corpus of Tweets with Parts of Speech
and building a <a href="https://github.com/aritter/twitter_nlp">Part-of-Speech Tagger</a> for Twitter <a href="twitter_ner.pdf">[EMNLP 2011]</a>. Sam is now at <a href="http://www.decide.com/">Decide.com</a>.
<br /><br />
<!-- <a href="cv.pdf">C.V.</a> <br/> -->
<hr>
<table>
<tr>
<td>
<h2>Publications</h2>
<h3>2011</h3>
<p>
<a href="twitter_ner.pdf">Named Entity Recognition in Tweets: An Experimental Study</a> <br>
Alan Ritter, Sam Clark, Mausam, Oren Etzioni <br>
Proceedings of EMNLP 2011 <br>
<!-- <b><a href="naacl10_v5.pptx"><font color="gray">Slides</font></a></b> -->
</p>
<p>
<a href="mt_chat.pdf">Data-Driven Response Generation in Social Media</a> <br>
Alan Ritter, Colin Cherry, Bill Dolan <br>
Proceedings of EMNLP 2011 <br>
<!-- <b><a href="naacl10_v5.pptx"><font color="gray">Slides</font></a></b> -->
</p>
<h3>2010</h3>
<p>
<a href="http://turing.cs.washington.edu/papers/acl-2010-ritter.pdf">A Latent Dirichlet Allocation Method for Selectional Preferences</a> <br>
Alan Ritter, Mausam, Oren Etzioni <br>
Proceedings of ACL 2010 <br>
<b><a href="ACL10_v4.pptx"><font color="gray">Slides</font></a></b>
</p>
<meta name="citation_title" content="A Latent Dirichlet Allocation Method for Selectional Preferences">
<meta name="citation_author" content="Ritter, Alan">
<meta name="citation_author" content="Mausam">
<meta name="citation_author" content="Etzioni, Oren">
<meta name="citation_date" content="2010/7/12">
<meta name="citation_journal_title" content="ACL 2010">
<meta name="citation_pdf_url" content="http://turing.cs.washington.edu/papers/acl-2010-ritter.pdf">
<p>
<a href="twitter_chat.pdf">Unsupervised Modeling of Twitter Conversations</a> <br>
Alan Ritter, Colin Cherry, Bill Dolan <br>
Proceedings of HLT-NAACL 2010 <br>
<b><a href="naacl10_v5.pptx"><font color="gray">Slides</font></a></b>
</p>
<h3>2009</h3>
<p>
<!-- <a href="http://research.microsoft.com/en-us/um/people/sumitb/smartselection/index.html">Learning to Generalize for Complex Selection Tasks</a><br/> -->
<a href="p167-ritter.pdf">Learning to Generalize for Complex Selection Tasks</a><br/>
Alan Ritter and Sumit Basu <br/>
<b><font color="blue">Best Student Paper Award</font></b><br/>
<i>IUI 2009</i><br/>
<b><a href="http://www.youtube.com/watch?v=4V-ukMndnFo"><font color="gray">Video</font></a></b>
<b><a href="IUI Presentation.pptx"><font color="gray">Slides</font></a></b>
</p>
<p>
<a href="chat.pdf">Filter, Rank, and Transfer the Knowledge: Learning to Chat</a> <br>
Sina Jafarpour, Chris Burges, Alan Ritter <br>
NIPS Workshop on Advances in Ranking, Vancouver, Canada, 2009
</p>
<p>
<a href="http://turing.cs.washington.edu/papers/ritter_aaai_ss09.pdf">What Is This, Anyway: Automatic Hypernym Discovery</a><br/>
Alan Ritter, Stephen Soderland, and Oren Etzioni<br/>
<i>2009 AAAI Spring Symposium on Learning by Reading and Learning to Read</i></p>
<h3>2008</h3>
<p>
<a href="Ritter_emnlp08.pdf">It's a Contradiction -- No, It's Not: A Case Study using Functional Relations</a> <br/>
Alan Ritter, Doug Downey, Stephen Soderland, and Oren Etzioni<br/>
<i>EMNLP 2008</i><br>
<b><a href="AuContraire EMNLP08.pptx"><font color="gray">Slides</font></a></b>
</p>
<h3>2006</h3>
<p>
<a href="cluster.pdf">Distributional Word Clustering in Parallel</a> <br/>
Alan Ritter, James Hearne, Philip Nelson<br/>
<i>ISCA PDCS 2006</i></p>
<p>
<a href="tuning_mosix.pdf">Machine Learning Approach to Tuning Distributed Operating System Load Balancing Algorithms</a> <br/>
Michael Meehan, Alan Ritter<br/>
<i>ISCA PDCS 2006</i></p>
<p>
<a href="http://www.drdobbs.com/windows/184406383;jsessionid=JCRKHCWRGSIWTQE1GHOSKHWATMY32JVN#0601ds2">NDIS Network Driver</a> <br>
Alan Ritter<br/>
<i>Dr. Dobb's Journal, January 2006</i></p>
</td>
<td align="left" valign="top">
<h2>Software/Data/Demos</h2>
<ul>
<li><a href='http://github.com/aritter/twitter_nlp'>Twitter Named Entity Recognizer</a></li>
<li><a href='http://statuscalendar.com'>Twitter Calendar</a></li>
<li><a href='http://www.cs.washington.edu/homes/aritter/mt_chat.html'>Twitter Response Generator</a></li>
<!-- <li><a href='http://www.cs.washington.edu/homes/aritter/twitter_chat/'>Twitter Conversation Corpus</a></li> -->
<li><a href='http://rv-n02.cs.washington.edu:1234/lda_sp_demo_v3/lda_sp/relations/'>Topic-Model Based Selectional Preferences</a> see also <a href="https://github.com/aritter/LDA-SP">Inference and Evaluation Code</a></li>
<li><a href='freebase_html'>Querying and Updating Freebase with Web Tables</a></li>
<!-- <li><a href='http://turingc.cs.washington.edu:7125/TextRunner/cgi-bin/raw_hyp.pl?http'>Hypernyms Extracted from the Web</a></li> -->
<li><a href="http://milkhog.cs.washington.edu:1234/tgrep/tgrep_extract.py">Extraction patterns over parsed news articles (using the Charniak Parser and tgrep2)</a></li>
<!-- <li><a href="http://turingc.cs.washington.edu:1234/conflict-dev.pl?pred=invented&n=100000">Automatically detecting conflicting statements on the web</a></li> -->
<!-- <li><a href="http://turingc.cs.washington.edu:7125/TextRunner/cgi-bin/analogy.pl">Solving analogies using TextRunner</a></li> -->
</ul>
</td>
</tr>
</table>
<h3>Presentations</h3>
<a href="navdas(2).ppt">NRL Seminar</a> <br/>
<a href="poster.pdf">openMosix Load Balancing Poster</a>
</div>
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-2879579-2";
urchinTracker();
</script>
</body>
</html>