Best method for using multiple columns? #3
If you already have the fields broken out using an interceptor, I think the sink is fairly basic, so it does need some work to be used that way. If you have an interceptor that does this, I'll take a pull request. Thanks! On Thu, Jan 17, 2013 at 9:36 AM, jeffb4 [email protected] wrote:
> The interceptor (the default regex_extractor that comes with Flume) serializes the parsed-out data into event headers - I'm not familiar enough with Flume terminology to say whether that is LogEvent payload or attribute. My thought (instead of the JSON conversion and then deconversion) was something like:
>
> As far as your plugin goes, the big difference would be the addition of the .serializer config option (defaulting to your current use of the ByteBufferSerializer out of Hector). If JSON/BSON were being written to more than MongoDB, or if Flume event headers weren't capable of storing columns, I could see a more generic JSON solution for in-flight data.
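The setup being discussed might look like the following Flume properties sketch. The regex_extractor interceptor keys are standard Flume; the sink's `serializer` option is the hypothetical addition under discussion, and the sink type and serializer class names are placeholders, not the plugin's actual identifiers:

```properties
# Parse Apache log fields into event headers (standard Flume interceptor)
agent.sources.apache.interceptors = i1
agent.sources.apache.interceptors.i1.type = regex_extractor
agent.sources.apache.interceptors.i1.regex = ^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\]
agent.sources.apache.interceptors.i1.serializers = s1 s2
agent.sources.apache.interceptors.i1.serializers.s1.name = ip
agent.sources.apache.interceptors.i1.serializers.s2.name = timestamp

# Hypothetical sink-side option: which serializer turns the event into columns
agent.sinks.cass.type = <cassandra-sink-class>
agent.sinks.cass.serializer = <custom-column-serializer-class>
```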
Thanks for pinging me. I had some other things ahead of this, and wanted to understand a bit more (it's been a while since I've hit the code). Yes, I think you're on to something, but instead of using the regex interceptor, maybe just supply a serializer that does the regex directly into Cassandra columns? This would essentially mean that anyone could create a serializer to parse the Flume event into columns. Taking it one step further, how about defining the conversion in configuration, like JSON or XML?

1 - read the "conversion definition" based on something in the Flume headers (source id, app id, hostname, etc.)
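The "serializer that does the regex directly into Cassandra columns" idea could be sketched as below. The interface and class names are illustrative assumptions, not the sink's real API; the point is that any implementation of the contract could map an event body to column name/value pairs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical serializer contract: turn a raw Flume event body into
// named Cassandra columns. Names here are illustrative only.
interface ColumnSerializer {
    Map<String, String> toColumns(byte[] eventBody);
}

// One possible implementation: apply a regex to the event body and emit
// one column per capture group, using a configured list of column names.
class RegexColumnSerializer implements ColumnSerializer {
    private final Pattern pattern;
    private final String[] columnNames;

    RegexColumnSerializer(String regex, String... columnNames) {
        this.pattern = Pattern.compile(regex);
        this.columnNames = columnNames;
    }

    @Override
    public Map<String, String> toColumns(byte[] eventBody) {
        Map<String, String> columns = new LinkedHashMap<>();
        Matcher m = pattern.matcher(new String(eventBody));
        if (m.find()) {
            for (int i = 0; i < columnNames.length; i++) {
                columns.put(columnNames[i], m.group(i + 1));
            }
        }
        return columns;
    }
}

public class Demo {
    public static void main(String[] args) {
        // Parse an Apache combined-log-style line into four columns
        ColumnSerializer s = new RegexColumnSerializer(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]+)\" (\\d+)",
            "ip", "timestamp", "request", "status");
        Map<String, String> cols = s.toColumns(
            "127.0.0.1 - - [17/Jan/2013:09:36:00 -0500] \"GET / HTTP/1.1\" 200"
                .getBytes());
        System.out.println(cols);
    }
}
```

A real version would plug into the sink's write path instead of returning a map, but the configuration-driven shape is the same.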
I'm writing Apache webserver logs to Cassandra using Flume and this Sink, and I would like to break log entries into various fields/columns (I already break them into fields with an interceptor).
Would the best/canonical method of doing this be to extend flume-ng-cassandra-sink with a serializer config directive, default said directive to the existing serializer, and then (for my needs) create a custom serializer that takes desired fields as a configuration option, and stuffs them into Cassandra as columns?
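The custom serializer described above could be sketched like this: since the interceptor has already parsed fields into event headers, the serializer only needs to copy a configured subset of headers into column name/value pairs. The class name and comma-separated "fields" option are hypothetical, standing in for the proposed config directive:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a header-driven serializer: copy the configured fields from
// the Flume event headers into Cassandra column name/value pairs.
// The comma-separated field list stands in for a config directive.
class HeaderColumnSerializer {
    private final String[] fields;

    HeaderColumnSerializer(String commaSeparatedFields) {
        this.fields = commaSeparatedFields.split("\\s*,\\s*");
    }

    Map<String, String> toColumns(Map<String, String> eventHeaders) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String field : fields) {
            if (eventHeaders.containsKey(field)) {
                columns.put(field, eventHeaders.get(field));
            }
        }
        return columns;
    }
}

public class HeaderDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("ip", "127.0.0.1");
        headers.put("status", "200");
        headers.put("agent", "curl/7.29");

        // Only the configured fields become columns; "agent" is dropped.
        HeaderColumnSerializer s = new HeaderColumnSerializer("ip, status");
        System.out.println(s.toColumns(headers));
    }
}
```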