GSOC2015_Progress_Emilio
Warmup Period (until 25th of May): Warm-up tickets on GitHub; made an experimental date normalization module using the ANTLR4 bindings for Python and the provided grammar. We decided to abandon this approach and write our own regular expressions directly in Python.
First Week (25/5 - 31/5): Some ideas about the date normalizer; meeting at FBK with mentor Marco Fossati.
Second Week (1/6 - 7/6): First prototype of the date normalizer; reviewed the crowd-annotated gold standard.
Third Week (8/6 - 14/6): Exams!
Fourth and Fifth Weeks (15/6 - 28/6): Almost finished and tested the date normalizer as well as the code using it.
Fifth Week (29/6 - 5/7): Final refinements for the mid-term: successfully outputting reified triples, plus a script for transforming the Wikipedia dump into sentences about soccer.
Sixth Week (6/7 - 12/7): Refactoring and cleaning of the code base; experiments with the unsupervised classifier. As it turned out, it depends heavily on the quality of the entities produced by the entity linker (for example, `stagione 2010-2011`, "2010-2011 season", was linked to `Serie B`) and on the mapping between frame elements and DBpedia ontology types.
Seventh Week (13/7 - 19/7): Script to compute Fleiss' kappa on the CrowdFlower results; slowly refactoring the code base.
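A minimal sketch of the Fleiss' kappa computation, assuming the CrowdFlower judgments have already been aggregated into per-item category counts (the aggregation step and the actual script's interface are not shown here):

```python
# Minimal sketch of Fleiss' kappa. `ratings[i][j]` holds how many workers
# assigned item i to category j; every item is assumed to have been judged
# by the same number of workers.
def fleiss_kappa(ratings):
    n_items = len(ratings)
    n_raters = sum(ratings[0])          # raters per item, assumed constant
    n_categories = len(ratings[0])

    # proportion of all assignments that went to each category
    p_j = [sum(row[j] for row in ratings) / float(n_items * n_raters)
           for j in range(n_categories)]

    # per-item observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) /
           float(n_raters * (n_raters - 1)) for row in ratings]

    P_bar = sum(P_i) / float(n_items)   # mean observed agreement
    P_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# e.g. 3 items, 5 workers, 3 categories
print(fleiss_kappa([[5, 0, 0], [2, 2, 1], [0, 4, 1]]))
```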
Eighth Week (20/7 - 26/7): Holidays.
Ninth Week (27/7 - 2/8): Created makefile rules to run the supervised classifier; thoughts on how to score triples' confidence, and implementation of a score for unsupervised classification based on the entity linking score.
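The exact scoring formula is not spelled out here; purely as an illustration, a confidence for an unsupervised triple could be derived from the entity-linking scores of the entities it uses, for instance by taking the weakest one (function and data names below are hypothetical):

```python
# Hypothetical sketch only: propagate entity-linking confidences to the
# triple level by taking the minimum score. Names and values are made up.
def triple_confidence(linked_entities):
    """linked_entities: list of (entity URI, linking score) pairs in the triple."""
    scores = [score for _, score in linked_entities]
    return min(scores) if scores else 0.0

print(triple_confidence([("dbpedia:Serie_A", 0.9),
                         ("dbpedia:Juventus_F.C.", 0.75)]))  # -> 0.75
```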
Tenth Week (3/8 - 9/8): Scoring supervised classification facts and serializing triples' scores in a separate dataset; heavy refactoring of the classifier.
Eleventh Week (10/8 - 16/8): Integrated the mappings to DBPO into the output triples; found some critical bugs related to the feature extraction for the SVM which might be responsible for the low frame classification performance. Fixed the bugs and computed the confusion matrix for the classifier. Performance is good but not exceptional, perhaps due to some problems in how the training set was built with respect to the gold standard (i.e. some labels in the gold standard do not appear in the training set).
Twelfth Week (17/8 - 23/8): As we are using the C-SVC flavour of libsvm, I explored the classifier's performance as the SVM parameters vary. First, I looked at how the different kernels perform; then I moved on to explore how the best-performing kernel reacts to parameter tuning.
The available kernels are polynomial, sigmoid, linear and radial basis function (RBF); they were tried with the default libsvm parameters (degree=3, coef0=0, C=1).
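A rough sketch of this comparison, assuming scikit-learn's `SVC` (a wrapper around libsvm's C-SVC) and synthetic data standing in for the real frame/role feature vectors, which are not shown here:

```python
# Rough sketch of the kernel comparison with libsvm's default parameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support

# synthetic placeholder data; the real pipeline uses its own feature files
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

for kernel in ("poly", "sigmoid", "linear", "rbf"):
    # libsvm defaults: C=1, degree=3, coef0=0
    clf = SVC(kernel=kernel, C=1, degree=3, coef0=0)
    clf.fit(X_train, y_train)
    p, r, f1, _ = precision_recall_fscore_support(
        y_test, clf.predict(X_test), average="micro")
    print("%-8s P=%.2f R=%.2f F1=%.2f" % (kernel, p, r, f1))
```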
| Kernel | Task | Precision | Recall | F1 |
|---|---|---|---|---|
| Polynomial | Frames | 0.41 | 0.31 | 0.35 |
| | Roles | 0.78 | 0.16 | 0.26 |
| Sigmoid | Frames | 0.76 | 0.76 | 0.76 |
| | Roles | 0.7 | 0.62 | 0.66 |
| Linear | Frames | 0.8 | 0.79 | 0.8 |
| | Roles | 0.78 | 0.68 | 0.73 |
| RBF | Frames | 0.76 | 0.76 | 0.76 |
| | Roles | 0.7 | 0.62 | 0.66 |
By far the worst-performing kernel is the polynomial one: it did not recognize any role, and it obtained these modest figures only thanks to the automatically annotated numerical FEs; this might suggest that the default parameters are unsuitable for this task. The other kernels perform fairly well overall, with the sigmoid and RBF kernels showing very similar results, differing by only a couple of instances. The best performance overall is obtained by the linear kernel, both in terms of precision and recall and both on frames and on roles, so it will be used throughout the rest of this work.
Next, the performance of the linear kernel is investigated as C varies, taking the values 0.001, 0.01, 0.1, 1, 10, 100 and 1000.
As for roles, the best performance is obtained with C values around 1 (0.1, 1, 10). The confusion matrix shows a lot of activity around roles which require some sort of semantic understanding of the sentence for proper tagging, and roles which are not clearly distinguishable. For example, `Competizione` is often mistaken for `Premio` and, in fact, the distinction is not clear at all (think of `Ha vinto la Champions League`, "he won the Champions League"). Another example is that teams and competitions are often tagged as `Luogo`, because many examples contain either international teams or championships.
Confusion also comes from overlapping (such as `Squadra`, `Perdente` and `Vincitore`) or ambiguous (`Agente` and `Entità`) FE definitions. Finally, the rules of the date normalizer interfere with the `Competizione` and `Tempo` roles, but this is acceptable because these FEs were missed by the classifier and the normalizer was able to recover partial information about the frame. Two distinct strips can be noticed in the row and column of the `O` role, representing missed tokens or tokens classified by mistake.
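As an illustration of how such a confusion matrix can be produced, a small sketch using scikit-learn on made-up token-level labels (the real evaluation uses the project's own gold standard and full label set):

```python
# Illustrative only: a role confusion matrix on invented token-level labels.
from sklearn.metrics import confusion_matrix

labels = ["Competizione", "Premio", "Luogo", "Squadra", "Tempo", "O"]
gold = ["Competizione", "O", "Squadra", "Tempo", "O",       "Competizione"]
pred = ["Premio",       "O", "Luogo",   "Tempo", "Squadra", "O"]

cm = confusion_matrix(gold, pred, labels=labels)  # rows = gold, columns = predicted
for label, row in zip(labels, cm):
    print("%-14s %s" % (label, " ".join("%2d" % c for c in row)))
```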
Frame classification is very stable, with very high precision and recall (more than 0.8) for all classes except `Stato`, which has a precision below 0.4; it is often mistaken for `Attività`. Some `Vittoria` instances are classified as `Trofeo`, a minor and understandable mistake which may be linked to the confusion between the roles `Competizione` and `Premio`.
Nonetheless, considering the simplicity of the features the classifier uses, these performances are promising. Other features could be added to further help the classifier on the classes it frequently mistakes; for example, considering the `Sconfitta` frames in the gold standard, the `Perdente` FE is located before the LU in 59 of the 62 times it appears in that frame (there are 97 instances of this frame). The opposite behaviour is observed in the opposite situation, `Vincitore`/`Vittoria`. Another similar example is the `Squadra` FE and the `Trofeo` frame: out of 92 sentences, 75 contain `Squadra`, and in 66 of these it appears before the LU.
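A sketch of the kind of positional cue described above; the function name and the example sentence are made up for illustration and are not part of the project's feature extractor:

```python
# Illustrative positional feature: does a candidate FE token occur before
# the lexical unit (LU) in the sentence?
def before_lu(fe_index, lu_index):
    """True if the candidate FE precedes the LU."""
    return fe_index < lu_index

tokens = ["La", "Juventus", "ha", "perso", "la", "finale"]
# 'perso' ("lost") is the LU (frame Sconfitta) at index 3; 'Juventus' is a
# candidate Perdente at index 1, so the feature fires.
print(before_lu(fe_index=1, lu_index=3))  # True
```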