<HTML>
<HEAD>
<title>Quality Estimation Task - ACL 2017 Second Conference on Machine Translation</title>
<style> h3 { margin-top: 2em; } </style>
</HEAD>
<body>
<center>
<script src="title.js"></script>
<p><h2>Shared Task: Quality Estimation</h2></p>
<script src="menu.js"></script>
</center>
<p>This shared task will build on its previous five editions to further
examine automatic methods for estimating the quality of machine
translation output at run-time, without relying on reference
translations. We include <b>word-level</b>, <b>phrase-level</b> and <b>sentence-level</b>
estimation. All tasks will make use of a large dataset produced from
post-editions by professional translators. The data will be
domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous
years. In addition to advancing the state of the art at all prediction
levels, our <b>goals</b> include:
</p><ul>
<li>To test the effectiveness of larger (domain-specific and professionally
annotated) datasets. We will do so by increasing the size of one of
last year's training sets. </li>
<li>To study the effect of language direction and domain. We will do so by
providing two datasets created in similar ways, but for different
domains and language directions.</li>
<li>To investigate the utility of detailed information logged during
post-editing. We will do so by providing post-editing time,
keystrokes, and actual edits.</li>
<li>To measure progress over the years at all prediction levels. We will do so by using last year's test set for comparative experiments.</li>
</ul>
This year's shared task provides new training and test datasets for all
tasks, and allows participants to explore any additional data and
resources deemed relevant. An in-house MT system was used to produce
translations for all tasks. MT system-dependent information can be made
available upon request. The data is publicly available, but since it has
been provided by our industry partners it is subject to specific terms
and conditions. However, these have no practical implications for the use
of this data for research purposes.
<p><br></p><hr>
<br>
<font color="purple"><b>Gold-standard labels for all subtasks </b> <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/wmt17_de-en_gold.tar.gz">German-English<a/> and <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/wmt17_en-de_gold.tar.gz">English-German<a/></font></b>.
<!-- BEGIN SENTENCE-LEVEL-->
<h3><font color="blue">Task 1: Sentence-level QE</font></h3>
<p><b><font color="purple">Results <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/wmt17_task1_results.pdf">here<a/></font></b>.
<p>Participating systems are required to score (and rank) sentences
according to post-editing effort. Multiple labels will be made
available, including the percentage of edits needed to fix the translation (HTER),
post-editing time, and keystrokes. The main prediction label will be
HTER, but we welcome participants wanting to submit models trained to
predict other labels.
Predictions according to each alternative label will be evaluated
independently. For the ranking variant, the predictions can be generated
by models built using any of these labels (or a combination of them), as
well as using external information. The <b>data</b> consists of:
</p><ul>
<li><b>English-German</b>: segments in the IT domain, translated by an in-house phrase-based SMT system and post-edited by professional translators (23,000 for training, 1,000 for development). Download the <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
<li><b>German-English</b>: segments in the Pharmaceutical domain, translated by an in-house phrase-based SMT system and post-edited by professional translators (25,000 for training, 1,000 for development). Download the <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features. <font color="red"><b>WARNING</b>: there was an issue with the baseline features extracted for this language pair, so if you are using them, please download the file again from the same link (the 'Corrected Version' file).</font></li>
</ul>
<p>The data for download contains source sentences, their machine translations, their post-editions, and HTER as the post-editing effort score. Other scores, such as post-editing time, will be made available shortly.
In both cases, the <a href="https://github.com/ghpaetzold/PET">PET</a> tool was used to collect these various types of information during post-editing. HTER labels were computed using <a href="http://www.umiacs.umd.edu/%7Esnover/terp/">TER</a> (default settings: tokenised, case insensitive, exact matching only, with scores capped to 1).</p>
<p></p>
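<p>As a rough illustration of how an HTER-style score relates an MT output to its post-edition, the Python sketch below divides the word-level edit distance by the length of the post-edition and caps the result at 1. This is only an approximation of the official labels, which were produced with the TER tool (TER additionally models block shifts); the function names are our own.</p>
<pre>
# Approximate HTER: word-level edit distance between the MT output and its
# post-edition, divided by the post-edition length and capped at 1.
# Illustrative sketch only; the official labels come from the TER tool,
# which also models block shifts.

def edit_distance(hyp_tokens, ref_tokens):
    """Standard Levenshtein distance over tokens."""
    m, n = len(hyp_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp_tokens[i - 1] == ref_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

def approximate_hter(mt_sentence, post_edit):
    hyp, ref = mt_sentence.lower().split(), post_edit.lower().split()
    if not ref:
        return 0.0
    return min(1.0, edit_distance(hyp, ref) / len(ref))
</pre>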
<p>As <i><font color="green">test data</font></i>, for each language pair we will provide <b>2,000</b> new sentence translations, produced by the same SMT system used for the training data for each language pair. <font color="red"><b>NEW</b></font>: Download the <a href="http://hdl.handle.net/11372/LRT-2135">test</a> data and baseline features. For English-German, note that we are also releasing the <b>2016 test data</b>. Please submit your results for both test sets so we can attempt to measure progress over years.
</p><p></p><p>
The usual <a href="http://www.quest.dcs.shef.ac.uk/quest_files/features_blackbox_baseline_17">17 features</a> used in WMT12-16 are used for the <b>baseline system</b>.
This system uses SVM regression with an RBF kernel, as well as a grid-search
algorithm to optimise the relevant parameters. <a href="https://github.com/ghpaetzold/questplusplus">QuEst++</a> is used to build the prediction models.</p>
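<p>For illustration, a comparable regression baseline could be trained with scikit-learn as sketched below. This is not the official QuEst++ setup: the file names, the parameter grid and the cross-validation settings are assumptions.</p>
<pre>
# Sketch of an SVM-regression baseline over the 17 black-box features,
# in the spirit of the official baseline. File names, the parameter grid
# and the cross-validation settings are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

X_train = np.loadtxt("train.features")   # 17 feature values per segment (hypothetical file)
y_train = np.loadtxt("train.hter")       # one HTER label per segment (hypothetical file)
X_test = np.loadtxt("test.features")

# Grid search over the RBF-kernel hyper-parameters.
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1], "epsilon": [0.1, 0.2]}
model = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
model.fit(X_train, y_train)

predictions = model.predict(X_test)      # predicted HTER per test segment
</pre>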
<p>As in previous years, two variants of the results can be submitted:
</p><ul>
<li><b>Scoring</b>: An absolute quality score for each sentence
translation according to the type of prediction, to be interpreted as an
error metric: lower scores mean better translations.</li>
<li><b>Ranking</b>: A ranking of sentence translations for all source
sentences from best to worst. For this variant, it does not matter how
the ranking is produced (from HTER predictions, Likert predictions,
post-editing time, etc.). The reference ranking will be defined based on
the true HTER scores.</li>
</ul>
<p><i><font color="green">Evaluation</font></i> is performed against the true label and/or ranking using as metrics:
</p><ul>
<li><b>Scoring</b>: Pearson's correlation (primary), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).</li>
<li><b>Ranking</b>: Spearman's rank correlation (primary) and DeltaAvg. </li>
</ul>
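<p>A minimal sketch of how these metrics can be computed with scipy and scikit-learn is given below; DeltaAvg is left to the official evaluation scripts.</p>
<pre>
# Scoring and ranking metrics for Task 1 (DeltaAvg omitted; it is computed
# by the official evaluation scripts).
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error

def sentence_level_metrics(gold, predicted):
    gold, predicted = np.asarray(gold), np.asarray(predicted)
    return {
        "pearson": pearsonr(gold, predicted)[0],    # primary scoring metric
        "mae": mean_absolute_error(gold, predicted),
        "rmse": float(np.sqrt(mean_squared_error(gold, predicted))),
        "spearman": spearmanr(gold, predicted)[0],  # primary ranking metric
    }
</pre>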
<!--<font color="red">Add link to evaluation script</font>-->
<p><br></p><hr>
<!-- BEGIN WORD-LEVEL-->
<h3><font color="blue">Task 2: Word-level QE</font></h3>
<p><b><font color="purple">Results <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/wmt17_task2_results.pdf">here<a/></font></b>.
<p>Participating systems are required to detect errors for each token
in MT output. We frame the problem as the binary task of distinguishing
between 'OK' and 'BAD' tokens. </p>
<p>The <b>data</b> for this task is the same as provided in Task 1. As
in previous years, all segments are automatically annotated for errors
with binary word-level labels by using the alignments provided by the <a href="http://www.cs.umd.edu/%7Esnover/tercom/" target="_blank">TER</a>
tool (settings: tokenised, case insensitive, exact matching only,
disabling shifts by using the <code>-d 0</code> option) between machine
translations and their post-edited versions. Shifts (word order errors)
were not annotated as such (but rather as deletions + insertions) to
avoid introducing noise in the annotation.</p>
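<p>For intuition only, the sketch below derives OK/BAD labels by aligning the MT output to its post-edition with a longest-common-subsequence matcher. The official annotation uses the TER tool with shifts disabled, so this is an approximation rather than a reproduction of the released labels.</p>
<pre>
# Illustrative derivation of word-level OK/BAD labels by aligning the MT
# output against its post-edition. Approximation only: the official labels
# come from the TER tool with shifts disabled.
import difflib

def word_labels(mt_tokens, pe_tokens):
    labels = ["BAD"] * len(mt_tokens)
    matcher = difflib.SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"   # token also present in the post-edition
    return labels

print(word_labels("der Haus ist groß".split(), "das Haus ist groß".split()))
# ['BAD', 'OK', 'OK', 'OK']
</pre>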
As <i><font color="green">training</font></i> and <i><font color="green">development</font></i> data, we provide the tokenised translation outputs with tokens annotated with 'OK' or 'BAD' labels. Download:
<ul>
<!--<li><b>English-German</b> <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/task2_en-de_training-dev.tar.gz">training and development</a> data and baseline features.</li>
<li><b>German-English</b> <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/task2_de-en_training-dev.tar.gz">training and development</a> data and baseline features.</li> -->
<li><b>English-German</b> <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
<li><b>German-English</b> <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
</ul>
<p>As <i><font color="green">test data</font></i>, for each language pair we will provide <b>2,000</b> new sentence translations, produced and annotated in the same way. <font color="red"><b>NEW</b></font>: Download the <a href="http://hdl.handle.net/11372/LRT-2135">test</a> data and baseline features. For English-German, note that we are also releasing the <b>2016 test data</b>. Please submit your results for both test sets so we can attempt to measure progress over years.
<p>The baseline system is similar to the baselines used at WMT-15 and WMT-16: the set of <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/task2_en-de_baseline.feature.list">baseline features</a> includes the same features as those used last year, with the addition of feature combinations (target word + left/right context, target word + source word, etc.). The features are extracted with the <a href="https://github.com/qe-team/marmot">Marmot</a> QE tool. The system is trained with the <a href="http://www.chokkan.org/software/crfsuite/">CRFSuite</a> toolkit using the passive-aggressive algorithm.</p>
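<p>A minimal sketch of training such a labeller with the python-crfsuite bindings and the passive-aggressive algorithm is shown below. The Marmot feature extraction is not reproduced; the toy features, example sentence, labels and file names are assumptions.</p>
<pre>
# Sketch of a word-level OK/BAD sequence labeller trained with python-crfsuite
# using the passive-aggressive ('pa') algorithm. The toy feature extraction
# stands in for the Marmot features used by the official baseline.
import pycrfsuite

def token_features(tokens, i):
    # Target word plus left/right context, mirroring the baseline feature combinations.
    left = tokens[i - 1] if i else "BOS"
    right = tokens[i + 1] if i + 1 != len(tokens) else "EOS"
    return ["word=" + tokens[i], "left=" + left, "right=" + right,
            "word+left=" + tokens[i] + "|" + left]

# Toy training data; in practice, load (mt_tokens, labels) pairs from the released files.
train_sentences = [("Geben Sie im Eigenschafteninspektor".split(), ["OK", "OK", "BAD", "OK"])]

trainer = pycrfsuite.Trainer(algorithm="pa", verbose=False)
for tokens, labels in train_sentences:
    trainer.append([token_features(tokens, i) for i in range(len(tokens))], labels)
trainer.train("wordlevel.crfsuite")   # hypothetical model file name

tagger = pycrfsuite.Tagger()
tagger.open("wordlevel.crfsuite")
test_tokens = "Geben Sie".split()
print(tagger.tag([token_features(test_tokens, i) for i in range(len(test_tokens))]))
</pre>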
<p>Submissions are <i><font color="green">evaluated</font></i> in
terms of classification performance via the multiplication of the F1-scores
for the 'OK' and 'BAD' classes against the original labels, as in WMT16.
We will also report the F1-BAD score.
We use this <a href="https://gist.github.com/varvara-l/028e4439fb992d089935" target="_blank">evaluation script</a> for the metrics, and
<a href="https://gist.github.com/varvara-l/d66450db8da44b8584c02f4b6c79745c">this script</a> to compute significance levels using approximate randomisation.</p>
<p><b><font color="red">NEW</font></b>: Submissions to the word-level task will also be <font color="green">evaluated</font> in terms of their performance at sentence level. The motivation for that is that we found that sometimes predictions at word level can work well as sentence-level predictors: the percentage of words labelled as 'BAD' in a sentence should essentially be similar to a sentence-level HTER score.
All submissions for Task 2 will automatically be evaluated analogously to the sentence-level scoring task: using Pearson correlation (primary metric), MAE and RMSE scores. Participants aiming to optimise their models against sentence-level metrics can submit one additional system per language pair if they wish so, using the submission format of Task 2. The binary word-level predictions will be used to compute the sentence-level score: number of words with 'BAD' label over the length of sentence.</p>
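<p>The sketch below illustrates both evaluation views: the product of the F1 scores for the 'OK' and 'BAD' classes over all tokens, and the derived sentence-level score as the fraction of tokens predicted 'BAD'.</p>
<pre>
# Word-level evaluation (product of per-class F1 scores) and the derived
# sentence-level score (fraction of tokens labelled 'BAD'), as described above.
from sklearn.metrics import f1_score

def f1_multiplied(gold_labels, predicted_labels):
    # gold_labels / predicted_labels: flat lists of 'OK'/'BAD' tokens.
    f1_ok = f1_score(gold_labels, predicted_labels, pos_label="OK")
    f1_bad = f1_score(gold_labels, predicted_labels, pos_label="BAD")
    return f1_ok * f1_bad

def sentence_score_from_words(word_predictions):
    # word_predictions: list of 'OK'/'BAD' labels for one sentence.
    return word_predictions.count("BAD") / len(word_predictions)
</pre>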
<br>
<hr>
<!-- BEGIN PHRASE-LEVEL-->
<h3><font color="blue">Task 3: Phrase-level QE</font></h3>
<p><b><font color="purple">Results <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/wmt17_task3_results.pdf">here<a/></font></b>.
<p>For this task, given a 'phrase' (segmentation as given by the SMT
decoder), participants are required to label it as 'OK' or 'BAD'. Errors
made by MT engines are interdependent and one incorrectly chosen word
can cause more errors, especially in its local context. Phrases as
produced by SMT decoders can be seen as a representation of this local
context and in this task we ask participants to consider them as atomic
units, using phrase-specific information to improve upon the results of
the word-level task.
<p>
The <b>data</b> for this task is the same as provided in Tasks 1 and 2.
The labelling of this data was adapted from the word-level labelling by
assigning the 'BAD' tag to any phrase that contains at least one 'BAD'
word (see the sketch below). We note, however, that <i>the order of the words in the source sentences differs from the original word order</i>, as some pre-ordering was applied to the source sentences before decoding. Given that our phrases correspond to the decoder segmentation (based on this reordered version of the source), it is not possible to revert the pre-ordering while keeping the segmentation produced by the decoder. We also provide the original source sentences (before pre-ordering) for those interested.
<p>As <i><font color="green">training</font></i> and <i><font color="green">development</font></i>
data, we provide the tokenised translation outputs with phrase
segmentation for both source and machine-translated sentences. We also
provide target-source phrase alignments and phrase-level labels.
Download:
<ul>
<li><b>English-German</b> <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
<li><b>German-English</b> <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
</ul>
<p>The baseline phrase-level system is analogous to last year's system: it uses a set of <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/task3_en-de_baseline.feature.list">baseline features</a> (based on black-box sentence-level features) extracted with the <a href="https://github.com/qe-team/marmot">Marmot</a> tool and is trained with the <a href="http://www.chokkan.org/software/crfsuite/">CRFSuite</a> tool.</p>
<p>As <i><font color="green">test data</font></i>, for each language pair we will provide <b>2,000</b> new sentence translations, produced and annotated in the same way. <font color="red"><b>NEW</b></font>: Download the <a href="http://hdl.handle.net/11372/LRT-2135">test</a> data and baseline features. For English-German, note that we are also releasing the <b>2016 test data</b>. Please submit your results for both test sets so we can attempt to measure progress over years. </p>
<p>Submissions will be <i><font color="green">evaluated</font></i> in terms of the multiplication of <b>phrase-level</b> F1-OK and F1-BAD. </p>
<!---- TASK 3B -->
<p><h3><font color="blue">Task 3b: Phrase-level QE with human annotation</font></h3>
<p><b><font color="red">This task was cancelled this year due to issues in the labelling of the data</font></b>.<p>
This task uses a subset of the data in Task 3 (German-English only) where each phrase has been annotated (as a phrase) by humans with three labels: 'OK', 'BAD' (as before) and 'BAD_word_order', which is a specific type of error where the phrase is in an incorrect position in the sentence.
<p>The <i><font color="green">training</font></i> and <i><font color="green">development</font></i> data follow the same structure as for Task 3, but it is smaller (124 and 3,769 sentences, respectively). Download:
<ul>
<li><b>German-English</b> <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1974">training and development</a> data and baseline features.</li>
</ul>
<p>The baseline phrase-level system and evaluation procedures are the same as for Task 3.
<p>As <i><font color="green">test data</font></i>, we will provide <b>306</b> new sentence translations, produced and annotated in the same way. <font color="red"><b>NEW</b></font>: Download the <a href="http://hdl.handle.net/11372/LRT-2135">test</a> data and baseline features. </p>
<br>
<hr>
<!-- EXTRA STUFF -->
<h3>Additional resources</h3>
<p>These are the resources we have used to extract the baseline features
in Task 1, which can also be useful for Tasks 2 and 3. If you require
other resources/info from the MT system, let us know:
</p><p>
<b>English-German</b>
</p><ul>
<li>English <a href="http://www.quest.dcs.shef.ac.uk/quest_files_16/lm.tok.en.tar.gz">language model</a></li>
<li>English <a href="http://www.quest.dcs.shef.ac.uk/quest_files_16/ngram-count.tok.en.out.clean.tar.gz">n-gram counts</a></li>
<li>German <a href="http://www.quest.dcs.shef.ac.uk/quest_files_16/lm.tok.de.tar.gz">language model</a></li>
<li>English-German (and v.v.) <a href="http://www.quest.dcs.shef.ac.uk/quest_files_16/EN-DE.lex.tar.gz">lexical translation tables</a></li>
</ul>
<p><b>German-English</b>
</p><ul>
<li>German <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/lm.tok.de.tar.gz">language model</a></li>
<li>German <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/ngram-count.tok.de.out.clean.tar.gz">n-gram counts</a></li>
<li>English <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/lm.tok.en.tar.gz">language model</a></li>
<li>German-English (and v.v.) <a href="http://www.quest.dcs.shef.ac.uk/wmt17_files_qe/DE-EN.lex.tar.gz">lexical translation tables</a></li>
</ul>
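<p>As an example of how the language models above can be used for feature extraction, the sketch below computes the log-probability and perplexity of a target sentence with the KenLM Python bindings, assuming the downloaded archive contains an ARPA or binary model file (the file name is a placeholder).</p>
<pre>
# Sketch of LM-based baseline features (log-probability and perplexity of the
# target sentence) using the KenLM Python bindings. The model file name is a
# placeholder for whatever the downloaded archive contains.
import kenlm

lm = kenlm.Model("lm.tok.de.arpa")             # hypothetical extracted file name
sentence = "Geben Sie im Eigenschafteninspektor ."
print(lm.score(sentence))                      # total log10 probability
print(lm.perplexity(sentence))                 # sentence-level perplexity
</pre>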
<p><br></p><hr>
<!-- SUBMISSION INFO -->
<h3>Submission Format</h3>
<h4><font color="red">Tasks 1</font></h4>
<p> The output of your system for a <b>given subtask</b> should produce scores for the translations at the <i>segment-level</i> formatted in the following way: </p>
<pre><METHOD NAME> <SEGMENT NUMBER> <SEGMENT SCORE> <SEGMENT RANK><br><br></pre>
Where:
<ul>
<li><code>METHOD NAME</code> is the name of your
quality estimation method.</li>
<li><code>SEGMENT NUMBER</code> is the line number
of the plain text translation file you are scoring/ranking.</li>
<li><code>SEGMENT SCORE</code> is the predicted score (e.g. HTER) for the
particular segment; set it to 0 for all segments if you are only submitting
ranking results. </li>
<li><code>SEGMENT RANK</code> is the rank of
the particular segment; set it to 0 for all segments if you are only submitting
absolute scores. </li>
</ul>
Each field should be delimited by a single tab character.
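<p>For instance, a submission file for this task could be written as in the sketch below (the method name, file name and segment numbering convention are placeholders/assumptions).</p>
<pre>
# Sketch: write a Task 1 submission with tab-separated fields
# METHOD NAME, SEGMENT NUMBER, SEGMENT SCORE, SEGMENT RANK.
method = "SHEF_1_SVM"                # placeholder method name
predictions = [0.21, 0.05, 0.38]     # predicted HTER per segment

with open("task1_submission.txt", "w") as out:   # hypothetical file name
    # Rank 1 = best (lowest predicted HTER); here the ranking is derived from the scores.
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    ranks = {seg: rank + 1 for rank, seg in enumerate(order)}
    for seg, score in enumerate(predictions):
        # Assuming segment numbers follow the line numbers of the translation file.
        fields = [method, str(seg + 1), str(score), str(ranks[seg])]
        out.write("\t".join(fields) + "\n")
</pre>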
<h4><font color="red">Task 2</font></h4>
<p> The output of your system should produce scores for the translations at the <i>word-level</i>
formatted in the following way: </p>
<pre><METHOD NAME> <SEGMENT NUMBER> <WORD INDEX> <WORD> <BINARY SCORE> <br><br></pre>
Where:
<ul>
<li><code>METHOD NAME</code> is the name of your quality estimation method.</li>
<li><code>SEGMENT NUMBER</code> is the line number of the plain text translation file you are scoring (starting at 0).</li>
<li><code>WORD INDEX</code> is the index of the word in the tokenised sentence, as given in the training/test sets (starting at 0).</li>
<li><code>WORD</code> is the actual word.</li>
<li><code>BINARY SCORE</code> is either 'OK' for no issue or 'BAD' for any issue.</li>
</ul>
Each field should be delimited by a single tab character.
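<p>Analogously, a word-level submission could be produced as sketched below (method and file names are placeholders).</p>
<pre>
# Sketch: write a Task 2 submission with tab-separated fields
# METHOD NAME, SEGMENT NUMBER, WORD INDEX, WORD, BINARY SCORE (0-based indices).
method = "SHEF_2_CRF"                           # placeholder method name
# One entry per test sentence: (mt_tokens, predicted OK/BAD labels).
predictions = [("Geben Sie im".split(), ["OK", "OK", "BAD"])]

with open("task2_submission.txt", "w") as out:  # hypothetical file name
    for seg, (tokens, labels) in enumerate(predictions):
        for idx, (word, label) in enumerate(zip(tokens, labels)):
            out.write("\t".join([method, str(seg), str(idx), word, label]) + "\n")
</pre>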
<h4><font color="red">Task 3 and Task 3b</font></h4>
<p> The output of your system should produce scores for the translations at the <i>phrase-level</i>
formatted in the following way: </p>
<pre><METHOD NAME> <SEGMENT NUMBER> <PHRASE INDEX> <PHRASE> <BINARY SCORE> <br><br></pre>
Where:
<ul>
<li><code>METHOD NAME</code> is the name of your quality estimation method.</li>
<li><code>SEGMENT NUMBER</code> is the line number of the plain text translation file you are scoring (starting at 0).</li>
<li><code>PHRASE INDEX</code> is the index of the phrase in the segmented sentence, as given in the training/test sets (starting at 0).</li>
<li><code>PHRASE</code> is the actual phrase. Multi-word phrases should be written in full, with words delimited by spaces.</li>
<li><code>BINARY SCORE</code> is either 'OK' for no issue or 'BAD' for any issue.</li>
</ul>
Each field should be delimited by a single tab character.
<p>Example of the phrase-level format:</p>
<table>
<tr>
<td width="20%"><tt>PHRASE_BASELINE</tt></td> <td width="10%">4</td> <td width="10%">0</td> <td>Geben Sie im Eigenschafteninspektor (</td> <td width="10%">BAD<td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>1</td> <td>" Fenster " > " Eigenschaften "</td> <td>OK</td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>2</td> <td>) , und wählen Sie</td> <td>BAD</td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>3</td> <td>Statischer Text</td> <td>OK</td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>4</td> <td>oder</td> <td>OK</td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>5</td> <td>Dynamischer Text</td> <td>OK</td>
</tr>
<tr>
<td><tt>PHRASE_BASELINE</tt></td> <td>4</td> <td>6</td> <td>.</td> <td>OK</td>
</tr>
</table>
<p>The example shows the labelling performed by the <tt>PHRASE_BASELINE</tt> system for the following sentence (double vertical lines show phrase borders):</p>
<p style="text-indent:25px">Geben Sie im Eigenschafteninspektor ( || ' Fenster ' > ' Eigenschaften ' || ) , und wählen Sie || Statischer Text || oder || Dynamischer Text || .</p>
<h3>Submission Requirements</h3>
Each participating team can submit at most 2 systems for each language pair of each subtask (systems producing alternative scores, e.g. post-editing time, can be submitted as additional runs). Submissions should be sent
via email to Lucia Specia <a href="mailto:[email protected]" target="_blank">[email protected]</a>. Please use the following pattern to name your files:
<p>
<code>INSTITUTION-NAME</code>_<code>TASK-NAME</code>_<code>METHOD-NAME</code>, where:
</p><p> <code>INSTITUTION-NAME</code> is an acronym/short name for your institution, e.g. SHEF
</p><p><code>TASK-NAME</code> is one of the following: 1, 2, 3.
</p><p><code>METHOD-NAME</code> is an identifier for your method in case you have multiple methods for the same task, e.g. 2_J48, 2_SVM
</p><p> For instance, a submission from team SHEF for task 2 using method "SVM" could be named SHEF_2_SVM.
</p><p>You are invited to submit a short paper (4 to 6 pages) to WMT
describing your QE method(s). You are not required to
submit a paper if you do not want to; in that case, we ask you
to provide an appropriate reference describing your method(s) that we can cite
in the WMT overview paper.</p>
<h3>Important dates</h3>
<table>
<tbody><tr><td>Release of training data </td><td>February 10, 2017</td></tr>
<tr><td>Release of test data </td><td>April 10, 2017</td></tr>
<tr><td>QE metrics results submission deadline </td><td>May 14, 2017</td></tr>
<tr><td>Paper submission deadline</td><td>June 9, 2017</td></tr>
<tr><td>Notification of acceptance</td><td>June 30, 2017</td></tr>
<tr><td>Camera-ready deadline</td><td>July 14, 2017</td></tr>
</tbody></table>
<h3>Organisers</h3>
<br>
Varvara Logacheva (University of Sheffield)
<br>
Lucia Specia (University of Sheffield)
<br>
<h3>Contact</h3>
<p> For questions or comments, email Lucia
Specia <a href="mailto:[email protected]" target="_blank">[email protected]</a>.
</p>
<p align="right">
Supported by the European Commission under the projects
<br>
<a href="http://www.qt21.eu/"><img src="figures/qt21.png" height="40" width="100" border="0" align="right"></a>
<a href="http://cracker-project.eu/"><img src="figures/cracker-logo-no-tag-large.png" height="40" width="100" border="0" align="right"></a>
</p>
</body></html>