Twitter Streaming Api - Example

Overview

This is a simple application that makes use of Apache Lucene and Twitter's Streaming API.

Get Started

Download the distribution directory.
Make sure the machine has internet access (over Wi-Fi or LAN), because the application reads tweets live from Twitter.
Fill in the following credentials in app.properties (see Twitter's developer portal for how to obtain them):
1. App Token (the OAuth access token)
2. App Secret (the OAuth access token secret)
3. Consumer Key
4. Consumer Secret
On Windows
- Double-click twitter-search-example.cmd.
On Linux or OS X
- Open a console in the distribution folder and run "bash twitter-search-example.sh".
Check the console output.
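A minimal sketch of how the four credentials in app.properties might be loaded and checked before the stream is opened. The key names used here (consumer.key, consumer.secret, app.token, app.secret) are hypothetical; check the real app.properties in the distribution folder for the names the application expects. With hbc, these four values are what its OAuth1 authentication object is constructed from.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class AppPropertiesSketch {

    // Hypothetical key names: use the ones found in the real app.properties.
    static final String[] REQUIRED_KEYS = {
        "consumer.key", "consumer.secret", "app.token", "app.secret"
    };

    // Loads the properties and fails fast if any credential is missing or blank.
    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing value for '" + key + "' in app.properties");
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Inline example file; the real application would read app.properties from disk.
        String example = "consumer.key=YOUR_CONSUMER_KEY\n"
                + "consumer.secret=YOUR_CONSUMER_SECRET\n"
                + "app.token=YOUR_ACCESS_TOKEN\n"
                + "app.secret=YOUR_ACCESS_TOKEN_SECRET\n";
        Properties props = load(new StringReader(example));
        // With hbc these four values would feed new OAuth1(consumerKey, consumerSecret, token, secret).
        System.out.println("Loaded " + props.size() + " credentials");  // prints "Loaded 4 credentials"
    }
}
```

In a real run you would pass a FileReader for app.properties instead of the inline StringReader; failing fast on a blank value gives a clearer error than a rejected connection to the stream.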

Details

This is a standalone tool: apart from Twitter's Streaming API it does not depend on any third-party services, because the Lucene index runs inside the application. The application runs two threads. The first reads tweets from Twitter's public stream and passes them to Lucene for indexing. The second queries the index every 5 seconds to answer the following questions:

How many tweets matching the search term have been seen so far?
How many tweets containing the search term were there in the last 1, 5 and 15 minutes?
What are the ten most frequent terms (excluding the search term) that appear in tweets containing the search term over the last 1, 5 and 15 minutes?
Within tweets matching the search term, which ten tweeps (Twitter users) tweeted the most in the last 1, 5 and 15 minutes?
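The two-thread layout and the four questions above can be sketched with plain Java collections. This is not the project's code: a ConcurrentLinkedQueue stands in for the Lucene index, a fake producer stands in for the hbc stream reader, the search term "java" is hypothetical, and the tokenizer is a crude substitute for Lucene's StandardAnalyzer. It assumes Java 8 or later (streams and lambdas) and shows the windowed counting, top terms and top users logic, plus a ScheduledExecutorService that reports every 5 seconds.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

public class TweetStatsSketch {

    static final class Tweet {
        final String user;
        final String text;
        final long timeMillis;

        Tweet(String user, String text, long timeMillis) {
            this.user = user;
            this.text = text;
            this.timeMillis = timeMillis;
        }
    }

    // Lower-cased letter/digit runs: a crude stand-in for Lucene's StandardAnalyzer.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Tweets that contain the (single-word) search term and are at most windowMillis old.
    static List<Tweet> matching(Collection<Tweet> tweets, String term, long now, long windowMillis) {
        String needle = term.toLowerCase(Locale.ROOT);
        return tweets.stream()
                .filter(t -> now - t.timeMillis <= windowMillis)
                .filter(t -> tokens(t.text).contains(needle))
                .collect(Collectors.toList());
    }

    // The n keys with the highest counts; ties are broken alphabetically.
    static List<String> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Most frequent terms in the matching tweets, excluding the search term itself.
    static List<String> topTerms(List<Tweet> matches, String term, int n) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Long> counts = matches.stream()
                .flatMap(t -> tokens(t.text).stream())
                .filter(tok -> !tok.equals(needle))
                .collect(Collectors.groupingBy(tok -> tok, Collectors.counting()));
        return topN(counts, n);
    }

    // Users who posted the most matching tweets.
    static List<String> topUsers(List<Tweet> matches, int n) {
        Map<String, Long> counts = matches.stream()
                .collect(Collectors.groupingBy(t -> t.user, Collectors.counting()));
        return topN(counts, n);
    }

    public static void main(String[] args) throws InterruptedException {
        String term = "java";                                // hypothetical search term
        Queue<Tweet> index = new ConcurrentLinkedQueue<>();  // stand-in for the Lucene index

        // Thread 1: the real tool reads the public stream; this one fakes a tweet every 200 ms.
        Thread reader = new Thread(() -> {
            String[] users = {"alice", "bob", "carol"};
            for (int i = 0; !Thread.currentThread().isInterrupted(); i++) {
                index.add(new Tweet(users[i % 3], "Java tip number " + i, System.currentTimeMillis()));
                try { Thread.sleep(200); } catch (InterruptedException e) { return; }
            }
        });

        // Thread 2: queries the index every 5 seconds for the 1, 5 and 15 minute windows.
        ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
        reporter.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            System.out.println("total so far: " + matching(index, term, now, Long.MAX_VALUE).size());
            for (int minutes : new int[] {1, 5, 15}) {
                List<Tweet> m = matching(index, term, now, TimeUnit.MINUTES.toMillis(minutes));
                System.out.printf("last %d min: %d tweets, top terms %s, top users %s%n",
                        minutes, m.size(), topTerms(m, term, 10), topUsers(m, 10));
            }
        }, 5, 5, TimeUnit.SECONDS);

        reader.start();
        Thread.sleep(11_000);  // long enough for two reports
        reader.interrupt();
        reporter.shutdownNow();
    }
}
```

Scanning every stored tweet on each report is fine for a demo; with a real Lucene index the same questions would be answered by queries filtered on a timestamp field instead of a linear pass.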

Tokenization

Terms are tokenized with Lucene's StandardAnalyzer. Its StandardTokenizer splits text at word boundaries following the Unicode Text Segmentation rules (UAX #29), so whitespace and most punctuation act as separators. Many tweets contain URLs such as http://www.google.com, which the StandardAnalyzer splits into two tokens: http and www.google.com. This behaviour can be changed with a custom Lucene Analyzer, for example one built on UAX29URLEmailTokenizer, which keeps URLs and email addresses as single tokens.
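The sketch below does not use Lucene at all; it only illustrates, with java.util.regex, the kind of rule a URL-aware analyzer would apply. Anything starting with http:// or https:// is kept as one token, and everything else becomes lower-cased runs of letters and digits that may contain inner dots or apostrophes (so www.google.com and don't stay whole). The regular expression is an assumption for illustration, not what StandardTokenizer or UAX29URLEmailTokenizer actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlAwareTokenizerSketch {

    // First alternative: a URL ("http(s)://" followed by non-whitespace).
    // Second alternative: letters/digits, optionally joined by inner dots or apostrophes.
    private static final Pattern TOKEN = Pattern.compile(
            "https?://\\S+|[\\p{L}\\p{N}]+(?:['.][\\p{L}\\p{N}]+)*");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Search tips at http://www.google.com today!"));
        // prints [search, tips, at, http://www.google.com, today]
    }
}
```

Because the URL rule is simply "http(s):// followed by non-whitespace", a full stop or comma right after a URL stays attached to the token; a production analyzer needs a stricter rule.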

Improvements

Use BigDecimal instead of double primitives in the sentiment analysis to avoid floating-point rounding errors.
The project stores tweets in an in-memory Lucene index, so the number of tweets it can hold is limited by the available memory. A disk-based index (for example, Lucene's FSDirectory) would remove this limit.
Use multiple threads when writing to/reading from the index.
Create RESTful APIs for querying the data.
Replace Lucene with a distributed storage layer, e.g. Elasticsearch.
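The first improvement can be illustrated with a standalone sketch. It does not touch the project's sentiment code, which this README does not describe, and the helper names sumAsDouble and sumAsBigDecimal are hypothetical. It shows why decimal scores accumulated in a double drift, while BigDecimal values built from strings stay exact.

```java
import java.math.BigDecimal;

public class DecimalPrecisionSketch {

    // Adds decimal strings using binary floating point.
    static double sumAsDouble(String... values) {
        double total = 0.0;
        for (String v : values) total += Double.parseDouble(v);
        return total;
    }

    // Adds the same strings using exact decimal arithmetic.
    static BigDecimal sumAsBigDecimal(String... values) {
        BigDecimal total = BigDecimal.ZERO;
        for (String v : values) total = total.add(new BigDecimal(v));
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumAsDouble("0.1", "0.2"));      // prints 0.30000000000000004
        System.out.println(sumAsBigDecimal("0.1", "0.2"));  // prints 0.3
    }
}
```

BigDecimal is slower than double, should be created from strings (new BigDecimal(0.1) captures the binary approximation of 0.1, not 0.1 itself), and its divide method needs a scale or MathContext when the quotient has no finite decimal expansion; otherwise it throws ArithmeticException.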

Testing

The distribution version of the tool has been tested only on OS X Yosemite with Java 1.7. It should run on any computer with Java 1.5 or later, but other configurations have not been verified.

Libraries/APIs

https://github.com/twitter/hbc

Repository: gekalogiros/twitter-streaming-analyzer