Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File data logging v2 #12080

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

File data logging v2 #12080

wants to merge 3 commits into from

Conversation

regit
Copy link
Contributor

@regit regit commented Nov 2, 2024

Update of #12042

Contribution style:

Our Contribution agreements:

Changes (if applicable):

Link to ticket: https://redmine.openinfosecfoundation.org/issues/7347

Describe changes:

  • Rebase on latest master
  • Remove conditional logging of sha256 and add comment
  • Add suricata-verify test

Provide values to any of the below to override the defaults.

  • To use an LibHTP, Suricata-Verify or Suricata-Update pull request,
    link to the pull request in the respective _BRANCH variable.
  • Leave unused overrides blank or remove.

SV_BRANCH=OISF/suricata-verify#2113

regit added 3 commits November 2, 2024 10:04
This patch adds file data to alerts so an analyst can directly
understand what is the file. It can also be used to do some
detection on the file outside of Suricata without doing file
extraction.

Ticket: OISF#7347
@regit regit requested review from jufajardini, victorjulien and a team as code owners November 2, 2024 09:52
Copy link

codecov bot commented Nov 2, 2024

Codecov Report

Attention: Patch coverage is 37.50000% with 15 lines in your changes missing coverage. Please review.

Project coverage is 79.79%. Comparing base (b1e7917) to head (1e7720c).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12080      +/-   ##
==========================================
- Coverage   83.37%   79.79%   -3.59%     
==========================================
  Files         910      910              
  Lines      257556   257399     -157     
==========================================
- Hits       214748   205383    -9365     
- Misses      42808    52016    +9208     
Flag Coverage Δ
fuzzcorpus 61.55% <37.50%> (+<0.01%) ⬆️
livemode 19.41% <8.33%> (-0.01%) ⬇️
pcap 44.54% <37.50%> (+0.03%) ⬆️
suricata-verify ?
unittests 59.35% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 23242

@victorjulien victorjulien marked this pull request as draft November 4, 2024 12:05
"data": {
"type": "string"
},
"offset": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if these fields should be grouped together in a subobject (making the meaning of offset more clear)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a good idea. I can try implementing that to see if it works ok.

@@ -67,6 +67,7 @@
#include "util-validate.h"

#include "action-globals.h"
#include <stdint.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this include added ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to fix my LSP which is adding these headers by itself...

@catenacyber
Copy link
Contributor

Why is this better than using filestore ?

@regit
Copy link
Contributor Author

regit commented Nov 19, 2024

Why is this better than using filestore ?

I see multiple advantages which depends on the use case:

  • this provide a field in JSON with basic information about the file so user can make sure about what type of data it is without doing correlation. It can also be used to run some data analysis on the start of the data without having to have full file extraction.
  • JSON is not kept of the probe on a multi probe system so it allows basic understanding of binary content without querying a probe for the file

Where it definitely not work well:

  • reverse engineering use case
  • sandbox usage

@regit
Copy link
Contributor Author

regit commented Nov 19, 2024

Why is this better than using filestore ?

I see multiple advantages which depends on the use case:

  • this provide a field in JSON with basic information about the file so user can make sure about what type of data it is without doing correlation. It can also be used to run some data analysis on the start of the data without having to have full file extraction.
  • JSON is not kept of the probe on a multi probe system so it allows basic understanding of binary content without querying a probe for the file

Where it definitely not work well:

  • reverse engineering use case
  • sandbox usage

These examples apart, I would be happy to take some feedback from data analyst or SOC expert to see if they would be interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants