Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to retrieve Sysmon logs #75

Open
samsplunks opened this issue Jun 15, 2021 · 3 comments
Open

Not able to retrieve Sysmon logs #75

samsplunks opened this issue Jun 15, 2021 · 3 comments

Comments

@samsplunks
Copy link

samsplunks commented Jun 15, 2021

I have multiple custom channels where Security and Sysmon logs are forwarded.

<source>
  @type windows_eventlog2
  @id wec_channels
  channels ["WEC-Security", "WEC-Sysmon"]
  read_interval 2
  rate_limit 5000
  tag wec_channels
  render_as_xml true
  parse_description false
  <storage>
    @type local
    persistent true
    path C:\opt\td-agent\var\log\wec_channels
  </storage>
  <parse>
    @type none
  </parse>
</source>

<match wec_channels>
  @type kinesis_streams
  ...
</match>

At first I was only receiving Security logs and the systems worked like a charm.

As soon as I add sysmon logs in the configuration, I see this behaviour:

The ruby process flushes correctly the first pass of the first security logs, and after that, the memory keeps on increasing without releasing the logs.
Sometimes, when I stop the service, it flushes the Sysmon logs and at other times it does not.

I can correctly query that sysmon channel through winevt and receive XML logs.
The message part is empty and the xml one contains line returns (in do |xml, message, string_inserts|).

I tried to mimic the plugin by editing the log handling with the purest form and removing the parser helper (I only need XML output).

    def on_notify_xml(ch, subscribe)
        ...
        subscribe.each do |xml, message, string_inserts|
          xml = xml.gsub(/[ \x00\t\n\v\f\r]+/, " ")
          log.debug "XML LOG (#{xml})"
          es.add(Fluent::Engine.now, { "message" => "#{xml}" })
        end
        ...

The sysmon logs are correctly written as debug messages to the td-agent.log file but they don't go anywhere after that.

I tried to use different workers, different sources, different output plugins, it just doesn't get to the match part, keeps stuck in the source part, keeping the logs in memory. If I remove the sysmon logs channel and add other ones, it all works fine.

The WEC servers are beasts and the quantity of security logs is higher than the sysmon ones, there is no system or network bottleneck.

Tried with td-agent version 4.1.1 and also 4.1.0 because I saw some memory leak issues.

Any suggestion?

@samsplunks
Copy link
Author

samsplunks commented Jun 16, 2021

When I stop the service:

[warn]: #1 [wec_sysmon] event loop does NOT exit until hard timeout.
[debug]: #1 fluent/log.rb:308:debug: calling after_shutdown on output plugin type=:kinesis_streams plugin_id="object:d70"
[debug]: #1 fluent/log.rb:308:debug: closing input plugin type=:windows_eventlog2 plugin_id="wec_sysmon"
[debug]: #1 fluent/log.rb:308:debug: closing output plugin type=:kinesis_streams plugin_id="object:d70"
[debug]: #1 fluent/log.rb:308:debug: closing buffer instance=3420
[debug]: #1 fluent/log.rb:308:debug: calling terminate on input plugin type=:windows_eventlog2 plugin_id="wec_sysmon"
[warn]: #1 [wec_sysmon] killing existing thread thread=#<Thread:0x000002915e3e9760@event_loop
                           C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.12.3/lib/fluent/plugin_helper/thread.rb:70 run>
[warn]: #1 [wec_sysmon] thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::WindowsEventLog2Input
                           title=:event_loop thread=#<Thread:0x000002915e3e9760@event_loop
                           C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.12.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
[debug]: #1 fluent/log.rb:308:debug: calling terminate on output plugin type=:kinesis_streams plugin_id="object:d70"
[info]: fluent/log.rb:329:info: Worker 1 finished with status 0

One time, with @query.each do |xml, message, string_inserts| I could catch a message value of what seems to be a bad encoding for a string:

@kenhys
Copy link
Contributor

kenhys commented Jun 17, 2021

Does it work with only channels ["WEC-Sysmon"]?
Does the changing the value of rate_limit affect the behavior?

@samsplunks
Copy link
Author

It doesn't work when only channels ["WEC-Sysmon"].

I filtered the Sysmon subscription to have a very low EPS rate and I rate limited at 500 per 2 seconds.

Here's what I do for a clean test:

  • Stop service
  • Delete log file
  • Delete bookmarks
  • Clear WEC-Sysmon Eventlog
  • Start service

Added -vv to the service fluentdopt.

2021-06-17 13:04:02 +0000 [info]: fluent/log.rb:327:info: using configuration file: <ROOT>
...
  <source>
    @type windows_eventlog2
    @id wec_xml_logs
    channels ["WEC-Sysmon"]
    read_interval 2
    rate_limit 500
    tag "wec.xmllogs"
    render_as_xml true
    parse_description false
    <storage>
      @type "local"
      persistent true
      path "C:\\opt\\td-agent\\var\\log\\wec_xml_logs_bookmarks"
    </storage>
    <parse>
      @type "none"
    </parse>
  </source>
  <match wec.xmllogs>
    @type kinesis_streams
   ...
    <buffer>
      flush_interval 1s
      chunk_limit_size 1m
      flush_thread_interval 0.1
      flush_thread_burst_interval 0.01
      flush_thread_count 15
    </buffer>
  </match>
   ...

2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: registered output plugin 'kinesis_streams'
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: registered buffer plugin 'memory'
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: registered formatter plugin 'json'
2021-06-17 13:04:09 +0000 [info]: fluent/log.rb:327:info: adding match pattern="wec.xmllogs" type="kinesis_streams"
2021-06-17 13:04:09 +0000 [info]: fluent/log.rb:327:info: adding source type="windows_eventlog2"
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: registered input plugin 'windows_eventlog2'
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: registered parser plugin 'none'
2021-06-17 13:04:09 +0000 [info]: #0 fluent/log.rb:327:info: starting fluentd worker pid=7756 ppid=7400 worker=0
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: buffer started instance=3380 stage_size=0 queue_size=0
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: buffer started instance=3340 stage_size=0 queue_size=0
2021-06-17 13:04:09 +0000 [info]: #0 fluent/log.rb:327:info: fluentd worker is now running worker=0
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: enqueue_thread actually running
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing all chunks in buffer instance=3380
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: writing events into buffer instance=3340 metadata_size=1
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: enqueue_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [debug]: #0 fluent/log.rb:306:debug: flush_thread actually running
2021-06-17 13:04:09 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing all chunks in buffer instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing chunk instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk dequeued instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: trying flush for a chunk chunk="5c4f5d7c9002add240db680499a248f0"
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: adding write count instance=3360
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: executing sync write chunk="5c4f5d7c9002add240db680499a248f0"
2021-06-17 13:04:10 +0000 [debug]: #0 fluent/log.rb:306:debug: Write chunk 5c4f5d7c9002add240db680499a248f0 /  39 records /    4 KB
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing all chunks in buffer instance=3340
2021-06-17 13:04:10 +0000 [debug]: #0 fluent/log.rb:306:debug: Finish writing chunk
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: write operation done, committing chunk="5c4f5d7c9002add240db680499a248f0"
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: committing write operation to a chunk chunk="5c4f5d7c9002add240db680499a248f0" delayed=false
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: purging a chunk instance=3340 chunk_id="5c4f5d7c9002add240db680499a248f0" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk purged instance=3340 chunk_id="5c4f5d7c9002add240db680499a248f0" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:10 +0000 [trace]: #0 fluent/log.rb:284:trace: done to commit a chunk chunk="5c4f5d7c9002add240db680499a248f0"
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing chunk instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: dequeueing a chunk instance=3340
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk dequeued instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: trying flush for a chunk chunk="5c4f5d7cdb6b192d134f6baf4f9ef29c"
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: adding write count instance=3360
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: executing sync write chunk="5c4f5d7cdb6b192d134f6baf4f9ef29c"
2021-06-17 13:04:11 +0000 [debug]: #0 fluent/log.rb:306:debug: Write chunk 5c4f5d7cdb6b192d134f6baf4f9ef29c /  22 records /    3 KB
2021-06-17 13:04:11 +0000 [debug]: #0 fluent/log.rb:306:debug: Finish writing chunk
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: write operation done, committing chunk="5c4f5d7cdb6b192d134f6baf4f9ef29c"
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: committing write operation to a chunk chunk="5c4f5d7cdb6b192d134f6baf4f9ef29c" delayed=false
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: purging a chunk instance=3340 chunk_id="5c4f5d7cdb6b192d134f6baf4f9ef29c" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk purged instance=3340 chunk_id="5c4f5d7cdb6b192d134f6baf4f9ef29c" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: done to commit a chunk chunk="5c4f5d7cdb6b192d134f6baf4f9ef29c"
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing all chunks in buffer instance=3340
2021-06-17 13:04:11 +0000 [trace]: #0 fluent/log.rb:284:trace: writing events into buffer instance=3380 metadata_size=1
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing chunk instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk dequeued instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: trying flush for a chunk chunk="5c4f5d7dc31b44758b3b0b57b68cb414"
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: adding write count instance=3360
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: executing sync write chunk="5c4f5d7dc31b44758b3b0b57b68cb414"
2021-06-17 13:04:12 +0000 [debug]: #0 fluent/log.rb:306:debug: Write chunk 5c4f5d7dc31b44758b3b0b57b68cb414 /  15 records /    3 KB
2021-06-17 13:04:12 +0000 [debug]: #0 fluent/log.rb:306:debug: Finish writing chunk
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: write operation done, committing chunk="5c4f5d7dc31b44758b3b0b57b68cb414"
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: committing write operation to a chunk chunk="5c4f5d7dc31b44758b3b0b57b68cb414" delayed=false
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: purging a chunk instance=3340 chunk_id="5c4f5d7dc31b44758b3b0b57b68cb414" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk purged instance=3340 chunk_id="5c4f5d7dc31b44758b3b0b57b68cb414" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: done to commit a chunk chunk="5c4f5d7dc31b44758b3b0b57b68cb414"
2021-06-17 13:04:12 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing all chunks in buffer instance=3340
2021-06-17 13:04:13 +0000 [trace]: #0 fluent/log.rb:284:trace: enqueueing chunk instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>
2021-06-17 13:04:13 +0000 [trace]: #0 fluent/log.rb:284:trace: chunk dequeued instance=3340 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil, seq=0>

And it goes on writing and enqueuing chunks with the same last sequences of logs.

As soon as I change the config with security channels (one or more) and a rate_limit at 10000 / 2 seconds, everything runs smooth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants