This repository has been archived by the owner on Oct 18, 2018. It is now read-only.

iipc/twittervane

No longer under active development

Twittervane

Twittervane is a prototype application that collects and analyses Twitter feeds and outputs the URLs mentioned in the tweets. These URLs, shared on Twitter, could point to web resources relevant to web archive collections.
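
As a rough illustration of the idea described above, the sketch below pulls URLs out of tweet text and counts how often each one is mentioned. It is not Twittervane's actual code: the function name `extract_urls`, the regular expression and the sample tweets are all assumptions made for this example, and it does not expand shortened links.

```python
import re
from collections import Counter

# Assumed, deliberately simple pattern: anything starting with http:// or
# https:// up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_urls(tweets):
    """Count how often each URL is mentioned across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop trailing punctuation that belongs to the sentence, not the URL.
            counts[url.rstrip(".,;:!?)")] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample_tweets = [
    "Results at http://example.org/a.",
    "See http://example.org/a and https://example.com/b",
]
url_counts = extract_urls(sample_tweets)
```

Counting mentions is one plausible way to rank candidate URLs for a curator; whether frequency correlates with relevance is exactly what the evaluation below examines.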

Evaluation

The Evaluating Twittervane project was funded by the International Internet Preservation Consortium (IIPC) to build on an earlier project, Twittervane.

Six curators from the National Library of New Zealand, the National Library of France and the Library of Congress independently evaluated the Twittervane methodology and provided feedback. Curators had three weeks to use and test Twittervane. They not only provided valuable feedback on the user interface and documentation, but also set up collections and assessed the relevance of the URLs reported by Twittervane for their collections. Some feedback was addressed where the project's resources allowed; the rest has been logged as future requirements.

The general view is that Twittervane could be useful for events-based collections, as it could reduce the time spent on web searching, especially for events that run over a longer period of time (e.g. elections, Olympics). URLs reported by Twittervane tend to point to news sites and online periodicals. However, curators also found that only a small percentage (e.g. 20% to 30%) of the URLs found by Twittervane are relevant and can be accepted as valid selections. Many URLs lead to spam sites.

Issues & lessons learned

One curator pointed out that the choice of search terms directly affects the quality of the results produced by Twittervane. Unfortunately, the project team was not much more experienced than the curators and could not provide more useful hints. Basic training, including best practice in using search terms to obtain the most relevant tweets, seems a helpful area of future work.

The relevance and quality of the URLs expanded by Twittervane raise the question of whether they justify the amount of processing required to produce them. This may be related not only to the search terms used but also to the nature of social networks like Twitter, in which case the approach may only be useful for very specific collections.

Conclusions & recommendations

Most curators who took part in the evaluation were positive about the Twittervane approach and saw it as a complementary selection tool, especially for events-based collections. However, Twittervane also points to a large number of URLs that are not relevant to the collections and cannot be used as valid selections (e.g. spam sites and duplicates). This may improve as curators become more skilled and establish best practice in choosing the most appropriate search terms for a collection. More testing over a longer period of time would be needed to determine this. The data quality issues may also be addressed technically, for example by removing duplicates and detecting spam sites, but further investigation is required to achieve this.
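
The technical clean-up suggested above (removing duplicates and filtering spam sites) could look something like the following sketch. It is not part of Twittervane: the helper names `normalise` and `clean`, the normalisation rules and the blocklist of hosts are assumptions chosen for illustration, and real spam detection would need far more than a fixed list.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Reduce a URL to a comparison key (assumed rules, not a standard)."""
    parts = urlsplit(url.strip())
    # Lower-case the host and drop a leading "www."; any port is ignored.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/news/" and "/news" as the same page; drop the fragment.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def clean(urls, blocked_hosts=frozenset()):
    """Return URLs in their original order, without duplicates or blocked hosts."""
    seen = set()
    kept = []
    for url in urls:
        key = normalise(url)
        host = urlsplit(key).hostname or ""
        if host in blocked_hosts or key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


# Hypothetical input, for illustration only.
reported = [
    "http://www.example.org/news/",
    "http://example.org/news#top",
    "http://spam.example.net/win",
    "https://example.com/a?x=1",
]
selected = clean(reported, blocked_hosts={"spam.example.net"})
```

Here the second URL is dropped as a duplicate of the first and the third is dropped because its host is on the assumed blocklist. How aggressive the normalisation should be is a curatorial decision as much as a technical one.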

Twittervane is not a replacement for the curatorial process but has the potential to be a complementary tool, which may only be useful for events-based collections.

Further work needs to take place to productionise Twittervane. However, the question that needs to be answered first is whether the amount of processing required to produce the small number of relevant URLs can be justified.

Note also that there are third-party services like Topsy that fulfil a very similar role.

Other tools