-
Notifications
You must be signed in to change notification settings - Fork 0
/
twitter-stream-in-neo4j-see-twitts-differently.html
221 lines (193 loc) · 18.6 KB
/
twitter-stream-in-neo4j-see-twitts-differently.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1"><title>Twitter Stream in Neo4J : See Twitts differently - Toto do stuff</title><meta name="description" content="The idea here is to show a concrete exemple for implementing a Neo4J server (embeeded) that will represent a Twitter Stream Activity. Before coding, we…"><meta name="generator" content="Publii Open-Source CMS for Static Site"><link rel="canonical" href="https://totetmatt.github.io/twitter-stream-in-neo4j-see-twitts-differently.html"><link rel="alternate" type="application/atom+xml" href="https://totetmatt.github.io/feed.xml"><link rel="alternate" type="application/json" href="https://totetmatt.github.io/feed.json"><meta property="og:title" content="Twitter Stream in Neo4J : See Twitts differently"><meta property="og:site_name" content="Toto do stuff"><meta property="og:description" content="The idea here is to show a concrete exemple for implementing a Neo4J server (embeeded) that will represent a Twitter Stream Activity. Before coding, we…"><meta property="og:url" content="https://totetmatt.github.io/twitter-stream-in-neo4j-see-twitts-differently.html"><meta property="og:type" content="article"><style>:root{--body-font:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Oxygen,Ubuntu,Cantarell,"Fira Sans","Droid Sans","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";--heading-font:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Oxygen,Ubuntu,Cantarell,"Fira Sans","Droid Sans","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";--logo-font:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Oxygen,Ubuntu,Cantarell,"Fira Sans","Droid Sans","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";--menu-font:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Oxygen,Ubuntu,Cantarell,"Fira Sans","Droid Sans","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol"}</style><link rel="stylesheet" href="https://totetmatt.github.io/assets/css/style.css?v=825c89ac06c7215b642eda05e8a14751"><script type="application/ld+json">{"@context":"http://schema.org","@type":"Article","mainEntityOfPage":{"@type":"WebPage","@id":"https://totetmatt.github.io/twitter-stream-in-neo4j-see-twitts-differently.html"},"headline":"Twitter Stream in Neo4J : See Twitts differently","datePublished":"2013-01-29T18:00","dateModified":"2020-06-20T00:35","description":"The idea here is to show a concrete exemple for implementing a Neo4J server (embeeded) that will represent a Twitter Stream Activity. Before coding, we…","author":{"@type":"Person","name":"Totetmatt","url":"https://totetmatt.github.io/authors/totetmatt/"},"publisher":{"@type":"Organization","name":"Totetmatt"}}</script></head><body><div class="site-container"><header class="top" id="js-header"><a class="logo" href="https://totetmatt.github.io/">Toto do stuff</a></header><main><article class="post"><div class="hero"><figure class="hero__image hero__image--overlay"><img src="https://totetmatt.github.io/media/website/bg.jpg" srcset="https://totetmatt.github.io/media/website/responsive/bg-xs.jpg 300w, https://totetmatt.github.io/media/website/responsive/bg-sm.jpg 480w, https://totetmatt.github.io/media/website/responsive/bg-md.jpg 768w, https://totetmatt.github.io/media/website/responsive/bg-lg.jpg 1024w, https://totetmatt.github.io/media/website/responsive/bg-xl.jpg 1360w, https://totetmatt.github.io/media/website/responsive/bg-2xl.jpg 1600w" sizes="(max-width: 1600px) 100vw, 1600px" loading="eager" alt=""></figure><header class="hero__content"><div class="wrapper"><div class="post__meta"><time datetime="2013-01-29T18:00">13/01/29</time></div><h1>Twitter Stream in Neo4J : See Twitts differently</h1><div class="post__meta post__meta--author"><a href="https://totetmatt.github.io/authors/totetmatt/" class="feed__author invert">Totetmatt</a></div></div></header></div><div class="wrapper post__entry"><p>The idea here is to show a concrete exemple for implementing a <a href="http://www.neo4j.org">Neo4J </a>server (embeeded) that will represent a Twitter Stream Activity.</p><p>Before coding, we have to think about how to represent nodes and relationships from the twitter world. In my <a href="http://matthieu-totet.fr/Koumin/tools/naoyun/">tool</a>, I'm use to represent twitter entities like this :</p><p><img loading="lazy" alt="" src="https://totetmatt.github.io/media/posts/7/twittgrapher.jpg" sizes="(max-width: 48em) 100vw, 768px" srcset="https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-xs.jpg 300w, https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-sm.jpg 480w, https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-md.jpg 768w, https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-lg.jpg 1024w, https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-xl.jpg 1360w, https://totetmatt.github.io/media/posts/7/responsive/twittgrapher-2xl.jpg 1600w" width="548" height="238"></p><p>A node is a Twitter entity (Twitt, User, Hashtag, URL, Media) and it has some links between this entities</p><p>This is what I call a <strong>Network Logic</strong> and it works more or less like that :</p><ul><li>user-[twitt]->twitt</li><li>twitt-[mention]->user</li><li>twitt-[hashtag]->hashtag</li><li>twitt-[URL]->URL</li><li>twitt-[Media]->Media</li><li>user-[RT]->twitt</li><li>user-[RT]->user</li></ul><p>Now the idea is to Stream a topic (words and / or users) through the Twitter Streaming API ( I use <a href="http://twitter4j.org/">Twitter4J </a>) and use Neo4J to store & query the result (in real-time-like)</p><h2>Implementation</h2><p>I create a very simple class to handle the network logic and the initialization of the Neo4J embeeded server.</p><pre>
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseBuilder;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.server.WrappingNeoServerBootstrapper;
import twitter4j.HashtagEntity;
import twitter4j.MediaEntity;
import twitter4j.Status;
import twitter4j.URLEntity;
import twitter4j.UserMentionEntity;
public class Twitterj4neo {
private static enum RelType implements RelationshipType
{
MENTION,
TWITT,
HASHTAG,
URL,
MEDIA,
RT
}
private static String DB_PATH = "C:/neo4j-community-1.8.1-windows/TwittGraphTEST/";
private Index _entitiesIndex ;
private Index _UserIndex ;
private Index _TwittIndex ;
private Index _URLIndex ;
private Index _HashtagIndex ;
private Index _MediaIndex ;
private Index _relMENTION;
private Index_relTWITT;
private Index_relHASHTAG;
private Index_relURL;
private Index_relMEDIA;
private Index _relRT;
GraphDatabaseService _graphDB;
static int _relcounter = 0;
public Twitterj4neo()
{
createDB();
}
private void createDB()
{
_graphDB = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
_entitiesIndex= _graphDB.index().forNodes( "twitterEntities" );
_UserIndex= _graphDB.index().forNodes( "User" );
_TwittIndex= _graphDB.index().forNodes( "Twitt" );
_URLIndex = _graphDB.index().forNodes( "URL" );
_HashtagIndex= _graphDB.index().forNodes( "Hashtag" );
_MediaIndex= _graphDB.index().forNodes( "Media" );
_relMENTION = _graphDB.index().forRelationships("Mention");
_relTWITT = _graphDB.index().forRelationships("Twitt");
_relHASHTAG = _graphDB.index().forRelationships("Hashtag");
_relURL = _graphDB.index().forRelationships("URL");
_relMEDIA = _graphDB.index().forRelationships("Media");
_relRT = _graphDB.index().forRelationships("RT");
WrappingNeoServerBootstrapper srv;
srv = new WrappingNeoServerBootstrapper((GraphDatabaseAPI) _graphDB );
srv.start();
}
private Node createNodeType(Object id,String type)
{
Node statusNode = createNode(id);
statusNode.setProperty("type", type);
if(type.equals("User"))
{
_UserIndex.add(statusNode, "twitt_id", id);
}
if(type.equals("Hashtag"))
{
_HashtagIndex.add(statusNode, "twitt_id", id);
}
if(type.equals("URL"))
{
_URLIndex.add(statusNode, "twitt_id", id);
}
if(type.equals("Media"))
{
_MediaIndex.add(statusNode, "twitt_id", id);
}
if(type.equals("Twitt"))
{
_TwittIndex.add(statusNode, "twitt_id", id);
}
return statusNode;
}
private Node createNode(Object id)
{
Node statusNode;
statusNode = _entitiesIndex.get( "twitt_id", id).getSingle();
if(statusNode!=null)
{
return statusNode;
}
statusNode = _graphDB.createNode();
statusNode.setProperty("twitt_id",id);
_entitiesIndex.add( statusNode, "twitt_id", id );
return statusNode;
}
public void newStatus(Status s)
{
Transaction tx = _graphDB.beginTx();
try
{
Node statusNode = createNodeType(s.getId(),"Twitt");
statusNode.setProperty("text", s.getText());
if(s.getGeoLocation()!=null)
{
statusNode.setProperty("Geo_Lat", s.getGeoLocation().getLatitude());
statusNode.setProperty("Geo_Long", s.getGeoLocation().getLongitude());
}
Node user = createNodeType("@"+s.getUser().getScreenName().toLowerCase(),"User");
user.setProperty("name", s.getUser().getName());
_relTWITT.add(user.createRelationshipTo(statusNode, RelType.TWITT),"rel_id",_relcounter++);
if(s.isRetweet())
{
Status rts =s.getRetweetedStatus();
Node retweetedUser = createNodeType("@"+rts.getUser().getScreenName(),"User");
retweetedUser.setProperty("name", rts.getUser().getName());
Node retweetedTwitt = createNodeType(rts.getId(),"Twitt");
retweetedTwitt.setProperty("text", rts.getText());
if(rts.getGeoLocation()!=null)
{
retweetedTwitt.setProperty("Geo_Lat", rts.getGeoLocation().getLatitude());
retweetedTwitt.setProperty("Geo_Long", rts.getGeoLocation().getLongitude());
}
_relTWITT.add(retweetedUser.createRelationshipTo(retweetedTwitt, RelType.TWITT),"rel_id",_relcounter++);
_relRT.add(user.createRelationshipTo(retweetedUser, RelType.RT),"rel_id",_relcounter++);
_relRT.add(user.createRelationshipTo(retweetedTwitt, RelType.RT),"rel_id",_relcounter++);
}
for(UserMentionEntity me:s.getUserMentionEntities())
{
if(!s.isRetweet() || (!me.getScreenName().equals(s.getRetweetedStatus().getUser().getScreenName())) )
{
Node mention = createNodeType("@"+me.getScreenName().toLowerCase(),"User");
_relMENTION.add(statusNode.createRelationshipTo(mention, RelType.MENTION),"rel_id",_relcounter++);
}
}
for(HashtagEntity hs:s.getHashtagEntities())
{
Node mention = createNodeType("#"+hs.getText().toLowerCase(),"Hashtag");
_relHASHTAG.add(statusNode.createRelationshipTo(mention, RelType.HASHTAG),"rel_id",_relcounter++);
}
for(URLEntity url:s.getURLEntities())
{
Node mention = createNodeType(url.getExpandedURL(),"URL");
_relURL.add(statusNode.createRelationshipTo(mention, RelType.URL),"rel_id",_relcounter++);
}
for(MediaEntity media:s.getMediaEntities())
{
Node mention = createNodeType(media.getExpandedURL(),"Media");
_relMEDIA.add(statusNode.createRelationshipTo(mention, RelType.MEDIA),"rel_id",_relcounter++);
}
tx.success();
}
catch(Exception e)
{
e.printStackTrace();
}
finally
{
tx.finish();
}
}
public void closeDB()
{
_graphDB.shutdown();
}
}</pre><p>I need to use index to retrive existing nodes. So, I have a master index <strong>twitterEntities </strong>and several others that might be useful one day (I'm not advenced in Neo4J to know what to do with that).</p><p>Now it's complete, I plug it into my TwitterStreamer when it receives a Status.</p><pre>Twitterj4neo neo = new Twitterj4neo();
@Override
public void onStatus(Status arg0) {
neo.newStatus(arg0);
}</pre><p>And that's it ! Just have to run it and wait for t(w)itts !</p><p><img loading="lazy" alt="" src="https://totetmatt.github.io/media/posts/7/neo4j_4.png" sizes="(max-width: 48em) 100vw, 768px" srcset="https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-xs.png 300w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-sm.png 480w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-md.png 768w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-lg.png 1024w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-xl.png 1360w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_4-2xl.png 1600w" width="575" height="89"></p><p><img loading="lazy" alt="" src="https://totetmatt.github.io/media/posts/7/neo4j_5.png" sizes="(max-width: 48em) 100vw, 768px" srcset="https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-xs.png 300w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-sm.png 480w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-md.png 768w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-lg.png 1024w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-xl.png 1360w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_5-2xl.png 1600w" width="574" height="141"></p><p>I create a special style to have a nice visualuzation of my entities. That's very easy and helps <strong>a lot</strong> .</p><p><img loading="lazy" alt="" src="https://totetmatt.github.io/media/posts/7/neo4j_2.png" sizes="(max-width: 48em) 100vw, 768px" srcset="https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-xs.png 300w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-sm.png 480w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-md.png 768w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-lg.png 1024w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-xl.png 1360w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_2-2xl.png 1600w" width="823" height="267"></p><p>To compare, here is the gephi representation (same Network logic, here we see a part of the complete graph)</p><p><img loading="lazy" style="width: 482px; height: 399px;" alt="" src="https://totetmatt.github.io/media/posts/7/neo4j_3.png" sizes="(max-width: 48em) 100vw, 768px" srcset="https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-xs.png 300w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-sm.png 480w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-md.png 768w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-lg.png 1024w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-xl.png 1360w, https://totetmatt.github.io/media/posts/7/responsive/neo4j_3-2xl.png 1600w"></p><h2>Comments</h2><p>The webconsole ist zuper toll (as we say here) ! It's a kind of real-time and you can be more precise on your research. I don't master yet the query language, but it seems to very powerfull for complex query.</p><p>On the other hand, it's not very usable for "big graph", let say that it's enough for 100 nodes max or the graph goes on strike. I use a lot Gephi, which is more graphical and robust for real time large graph. I would say the 2 tools complete each other with :</p><ul><li>Gephi for global and live approach</li><li>Neo4J for focus and afterward analysis</li></ul><p>(there is a plugin in gephi ! but it's buging on my side ....).</p><p>I'm also eager to try <a href="http://linkurio.us/">Linkurious</a>, maybe this tools will bring more usabilly for playing with graph than the standard webconsole.</p></div><footer class="wrapper post__footer"><p class="post__last-updated">This article was updated on 20/06/20</p><ul class="post__tag"><li><a href="https://totetmatt.github.io/graph/">graph</a></li><li><a href="https://totetmatt.github.io/neo4j/">neo4j</a></li><li><a href="https://totetmatt.github.io/twitter/">twitter</a></li></ul><div class="post__share"></div><div class="post__bio bio"><div class="bio__info"><h3 class="bio__name"><a href="https://totetmatt.github.io/authors/totetmatt/" class="invert" rel="author">Totetmatt</a></h3></div></div></footer></article><nav class="post__nav"><div class="post__nav-inner"><div class="post__nav-next"><a href="https://totetmatt.github.io/tools.html" class="invert post__nav-link" rel="next"><span>Next</span> Tools </a><svg width="1.041em" height="0.416em" aria-hidden="true"><use xlink:href="https://totetmatt.github.io/assets/svg/svg-map.svg#arrow-next"/></svg></div></div></nav><div class="post__related related"><div class="wrapper"><h2 class="h5 related__title">You should also read:</h2><article class="related__item"><div class="feed__meta"><time datetime="2022-06-24T17:36" class="feed__date">22/06/24</time></div><h3 class="h1"><a href="https://totetmatt.github.io/gephis-twitter-streaming-importer-v2-is-out.html" class="invert">Gephi's Twitter Streaming Importer V2 is Out !</a></h3></article><article class="related__item"><div class="feed__meta"><time datetime="2014-06-15T09:55" class="feed__date">14/06/15</time></div><h3 class="h1"><a href="https://totetmatt.github.io/lets-play-gephi-streaming-api-the-hidden-websocket.html" class="invert">Let's Play Gephi : Streaming API - The 'hidden' Websocket</a></h3></article><article class="related__item"><div class="feed__meta"><time datetime="2013-03-25T18:59" class="feed__date">13/03/25</time></div><h3 class="h1"><a href="https://totetmatt.github.io/timelaps-of-foursquare-activity-on-twitter.html" class="invert">Timelaps of Foursquare activity on Twitter</a></h3></article></div></div></main><footer class="footer"><div class="footer__copyright"><p>Powered by <a href="https://getpublii.com" target="_blank" rel="nofollow noopener">Publii Static CMS</a></p></div><button class="footer__bttop js-footer__bttop" aria-label="Back to top"><svg><title>Back to top</title><use xlink:href="https://totetmatt.github.io/assets/svg/svg-map.svg#toparrow"/></svg></button></footer></div><script>window.publiiThemeMenuConfig = {
mobileMenuMode: 'sidebar',
animationSpeed: 300,
submenuWidth: 'auto',
doubleClickTime: 500,
mobileMenuExpandableSubmenus: true,
relatedContainerForOverlayMenuSelector: '.top',
};</script><script defer="defer" src="https://totetmatt.github.io/assets/js/scripts.min.js?v=f4c4d35432d0e17d212f2fae4e0f8247"></script><script>var images = document.querySelectorAll('img[loading]');
for (var i = 0; i < images.length; i++) {
if (images[i].complete) {
images[i].classList.add('is-loaded');
} else {
images[i].addEventListener('load', function () {
this.classList.add('is-loaded');
}, false);
}
}</script></body></html>