This repository has been archived by the owner on Oct 18, 2018. It is now read-only.

Tweet Text to Web Collection Resolution #3

Open
terrylnwoods opened this issue Feb 1, 2013 · 0 comments

Comments

@terrylnwoods
Contributor

When TwitterVane receives a tweet, it stores the tweet in a database for further analysis.
The analysis is performed by the TweetAnalyser component, which is notified to run after every n tweets have been received (n is configured in the spring-servlet.xml file for the TweetStreamAgent component).
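A minimal sketch of the "run after n tweets" trigger, for illustration only. The class name `TweetBatchTrigger` is hypothetical, an in-memory list stands in for the database, and a `Consumer` stands in for TweetAnalyser; in TwitterVane the batch size would come from spring-servlet.xml rather than a constructor argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: collects incoming tweets and hands each full batch of n
// tweets to an analyser callback (standing in for TweetAnalyser).
class TweetBatchTrigger {
    private final int batchSize;                   // the configured "n"
    private final Consumer<List<String>> analyser; // stand-in for TweetAnalyser
    private final List<String> pending = new ArrayList<>(); // stand-in for the DB

    TweetBatchTrigger(int batchSize, Consumer<List<String>> analyser) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.analyser = analyser;
    }

    // Records one tweet; once n tweets have accumulated, runs the analyser on them.
    void onTweet(String tweetText) {
        List<String> batch = null;
        synchronized (pending) {
            pending.add(tweetText);
            if (pending.size() >= batchSize) {
                batch = new ArrayList<>(pending);
                pending.clear();
            }
        }
        if (batch != null) {
            analyser.accept(batch); // analysis runs outside the lock
        }
    }
}
```

Copying the batch and releasing the lock before calling the analyser keeps the stream listener from blocking while analysis runs.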

The analysis consists of:

  1. Expand the URLs associated with the tweet if they are shortened.
  2. Resolve the Web Collection to which these URLs belong.
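Step 1 can be sketched as following HTTP redirects until a non-redirect response is reached. This is an illustrative assumption about how expansion might work, not TwitterVane's code: `UrlExpander`, the hop limit, and the HEAD-request strategy are all hypothetical. The redirect lookup is injectable so the loop can be exercised without network access.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of step 1: expand a shortened URL by following redirects.
class UrlExpander {
    private final Function<String, Optional<String>> redirectLookup;
    private final int maxHops;

    UrlExpander(Function<String, Optional<String>> redirectLookup, int maxHops) {
        this.redirectLookup = redirectLookup;
        this.maxHops = maxHops;
    }

    // Default lookup: one HEAD request with automatic redirect following disabled,
    // returning the Location target of a 3xx response, if any.
    static Optional<String> httpRedirect(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status >= 300 && status < 400 && location != null) {
                // Location may be relative, so resolve it against the current URL.
                return Optional.of(URI.create(url).resolve(location).toString());
            }
            return Optional.empty();
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return Optional.empty(); // treat unreachable or non-HTTP URLs as final
        }
    }

    // Follows redirects for at most maxHops, stopping early if a URL repeats.
    String expand(String url) {
        String current = url;
        Set<String> seen = new HashSet<>();
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            Optional<String> next = redirectLookup.apply(current);
            if (next.isEmpty()) {
                break;
            }
            current = next.get();
        }
        return current;
    }
}
```

In production one would construct it as `new UrlExpander(UrlExpander::httpRedirect, 10)`; the hop limit and loop check guard against shortener chains that redirect back on themselves.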

The resolution in step 2 is based on the search terms defined for each Web Collection: the tweet text is checked for those terms.
If none of the search terms is found in the tweet text, the URL is allocated to a "bucket" Web Collection called "UNKNOWN".
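The current resolution rule can be sketched as a case-insensitive substring match of each collection's search terms against the tweet text, with "UNKNOWN" as the fallback. `WebCollectionResolver` is a hypothetical name, and two details are assumptions rather than documented behaviour: matching ignores case, and a tweet may resolve to several collections when it contains terms from more than one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of step 2: map tweet text to Web Collections via search terms.
class WebCollectionResolver {
    static final String UNKNOWN = "UNKNOWN"; // the "bucket" collection

    // Web Collection name -> its search terms.
    private final Map<String, List<String>> searchTerms;

    WebCollectionResolver(Map<String, List<String>> searchTerms) {
        this.searchTerms = new LinkedHashMap<>(searchTerms);
    }

    // Returns every collection with at least one term in the text, or [UNKNOWN].
    List<String> resolve(String tweetText) {
        String text = tweetText.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : searchTerms.entrySet()) {
            for (String term : entry.getValue()) {
                if (text.contains(term.toLowerCase(Locale.ROOT))) {
                    matches.add(entry.getKey());
                    break; // one matching term is enough for this collection
                }
            }
        }
        if (matches.isEmpty()) {
            matches.add(UNKNOWN);
        }
        return matches;
    }
}
```

Because only the tweet text is consulted, any tweet whose URL is relevant but whose wording avoids the exact search terms falls into "UNKNOWN", which is the weakness this issue describes.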

An improved process for resolving URLs to Web Collections is needed, because the current process allocates a large number of URLs to the "UNKNOWN" bucket.
