Stream limit loss count is misleading #36

sjacks26 · 2019-10-11T18:11:12Z

According to Twitter docs, Twitter keeps a running count of tweets lost to stream limits since the API connection was opened. That means our limit collection should probably update an existing doc with each new stream limit message rather than add a new doc for each one. If we want to keep all messages in the limit collection, the STACKS documentation needs to make it really clear that the count in the track field is cumulative since the API connection was opened.

sjacks26 · 2019-10-14T13:29:17Z

Actually, if we update docs rather than adding a new doc each time, we will lose some of the granularity of our data.

Another approach would be to do some additional processing on stream limit messages. For each stream limit message, if the collector hasn't restarted since the previous stream limit message:

Save the "track" field in the stream limit message as "total number of tweets lost to stream limits since XX:XX:XXTXX:XX:XXZ"
Create a new field for "tweets lost since last message" or something like that, which is equal to the most recent value for the track field minus the previous value for the track field.

This solution means keeping a couple of extra values in memory (the previous track value and the time the connection was opened), but I suspect any slowdown would be trivial.
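A rough sketch of that processing, in case it helps the discussion. The class name, method names, and output field names here are made up for illustration and are not taken from the STACKS code. It assumes limit messages look like `{"limit": {"track": N}}`, and it assumes the first message after a (re)connect should report the whole cumulative count as its delta.

```python
from datetime import datetime, timezone


class LimitTracker:
    """Hypothetical helper: turns cumulative limit counts into per-message deltas."""

    def __init__(self):
        self.connected_at = None    # when the current API connection was opened
        self.previous_track = None  # last cumulative count seen on this connection

    def on_connect(self, when=None):
        # Call whenever the collector (re)opens the stream, since Twitter's
        # running count starts over with each new connection.
        self.connected_at = when or datetime.now(timezone.utc)
        self.previous_track = None

    def process(self, message):
        total = message["limit"]["track"]
        if self.previous_track is None:
            # Assumption: no earlier message on this connection, so everything
            # counted so far was lost since the connection opened.
            lost_since_last = total
        else:
            lost_since_last = total - self.previous_track
        self.previous_track = total
        return {
            "total_lost_since_connect": total,
            "connected_at": self.connected_at.isoformat() if self.connected_at else None,
            "lost_since_last_message": lost_since_last,
        }
```

The dict returned by `process` is what would go into the limit collection in place of the raw message, so each doc carries both the cumulative count (with the time it is counted from) and the per-message loss.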

sjacks26 added the bug label Oct 11, 2019

sjacks26 assigned jhemsley Oct 11, 2019
