Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the exception behavior when config.string contains ${VAR} in the comments. #16050

Merged
merged 3 commits into from
Apr 11, 2024

Conversation

mashhurs
Copy link
Contributor

@mashhurs mashhurs commented Apr 1, 2024

Release notes

[rn:skip]

What does this PR do?

We were applying var substitution once we load from YAML and it affects when we use config.string. Other options like path.config since points to path, the configs will be loaded later and LSCL rules will be processed when compiling the pipeline. This is why we are experiencing the issue with config.string only.

This PR standardizes (same sequence will be with config.string as well) the way we are loading configs and compiling.

Why is it important/What is the impact to the user?

Probability might be low but it is still not an expectation to resolve the ${VAR} if commented out. So, users may have a better experience with this change.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  • pull the change
  • use following config in the pipelines.yml
     - pipeline.id: test
    config.string: |
      input { generator {} } #${I_AM_NOT_HERE}
      filter { sleep { time => 100 } }
      output {
      #${I_AM_NOT_HERE}
        stdout {
          codec => rubydebug
        }
        # ${I_AM_NOT_HERE}
      }
    
  • Run Logstash with bin/logstash --debug --config.debug to see the what config was applied.
  • Logstash runs without crash. Before the change, Logstash was failing Cannot evaluate ${I_AM_NOT_HERE}... error
    • and config will be applied after filtering out ${I_AM_NOT_HERE} with the comments
       ----------------------P[input-generator{"count"=>5}|[string]config_string:1:9:```
       generator { count => 5 }
       ```] -> __QUEUE__
       __QUEUE__ -> filter_to_output
       filter_to_output -> P[output-stdout{"codec"=>#<Java::OrgLogstashConfigIrImperative::PluginStatement:0x3344b612>}|     
       [string]config_string:3:1:```
       stdout {
       codec => rubydebug 
       }
       ```]
      

Related issues

Use cases

Screenshots

Logs

@mashhurs mashhurs added the bug label Apr 1, 2024
@mashhurs mashhurs self-assigned this Apr 1, 2024
@mashhurs mashhurs marked this pull request as ready for review April 1, 2024 20:19
@@ -36,7 +38,6 @@ def deep_replace(value)
end
else
if value.is_a?(Array)
value_array_index = 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review note: this isn't used anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+1 it's a leftover from #13603

Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this solves the specific use-case of config.string having a syntax that hides comments between a # char and a newline (or EOF), it effectively does so by infecting the substitution variables handler with knowledge about the syntax of one of the things it parses.

We may be better served to run the substitution variables on a key-by-key basis for settings, allowing settings like config.string to opt out of preemptive replacements because we know that they will be replaced later after the value has been syntactically parsed. This will ensure that the replacements run on config.string happen in a single pass, and not piecemeal in two passes.

…on the generic approach which LSCL happens during the pipeline compile.
@mashhurs mashhurs force-pushed the fix-config-with-commented-vars branch from 4678c4b to b0948a7 Compare April 3, 2024 06:15
@@ -172,7 +172,6 @@ public void displayDebugInformation() {
logger.debug("\n\n{}", configPart.getText());
}
logger.debug("Merged config");
logger.debug("\n\n{}", this.configString());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review note: this line makes twice same config print (when --debug and --config.debug enabled) which makes us confusing..., removing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's duplicated only when the config in composed by just one SourceWithMetadata, but when the config is loaded from multiple files, like when glob operator is used, it logs, every single section, and then also the effective pipeline definition as it's passed down to the compiler.
So I'm little bit doubtful it remove it, especially in cases of strange compilation errors.

Example single file:

[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] -------- Logstash Config ---------
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Config from source, source: LogStash::Config::Source::MultiLocal, pipeline_id:: test
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: string, id: config_string
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
#${I_AM_NOT_HERE}
  stdout {
    codec => rubydebug
  }
  # ${I_AM_NOT_HERE}
}

[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Merged config
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
#${I_AM_NOT_HERE}
  stdout {
    codec => rubydebug
  }
  # ${I_AM_NOT_HERE}
}

[2024-04-10T12:22:10,616][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>1}

Example multiple files:

[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] -------- Logstash Config ---------
[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] Config from source, source: LogStash::Config::Source::MultiLocal, pipeline_id:: test
[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: file, id: /tmp/logstash_1.config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: file, id: /tmp/logstash_2.config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

output {
     #${I_AM_NOT_HERE}
       stdout {
         codec => rubydebug
       }
       # ${I_AM_NOT_HERE}
}
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig] Merged config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
     #${I_AM_NOT_HERE}
       stdout {
         codec => rubydebug
       }
       # ${I_AM_NOT_HERE}
}
[2024-04-10T12:29:45,179][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>1}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andsel for checking this. It is not related to the PR/feature but printing config multiple times is really confuses user. I have placed back and will check if we can add a condition to properly print considering all config cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I agree with you that replicating information could be misleading, but please file an issue to improve logging, maybe segments could logged at trace level while the full pipeline definition at debug, or something similar.
Having a separate issue permit to discuss the problem separately. 👍

@@ -30,7 +30,7 @@ def initialize(settings)
end

def pipeline_configs
pipelines = deep_replace(retrieve_yaml_pipelines())
pipelines = retrieve_yaml_pipelines
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, things are clear now.
We do not really need var substitution when loading the YAML. Once we have pipeline definitions from YAML, we properly load the configs based on the given options (config.string, config.path). Then, imperative pipeline compile operation executes the LSCL which wipes out the comments based on the rule

@mashhurs
Copy link
Contributor Author

mashhurs commented Apr 3, 2024

While this solves the specific use-case of config.string having a syntax that hides comments between a # char and a newline (or EOF), it effectively does so by infecting the substitution variables handler with knowledge about the syntax of one of the things it parses.

We may be better served to run the substitution variables on a key-by-key basis for settings, allowing settings like config.string to opt out of preemptive replacements because we know that they will be replaced later after the value has been syntactically parsed. This will ensure that the replacements run on config.string happen in a single pass, and not piecemeal in two passes.

Ha.. what a fun! Your point totally makes sense (which I didn't know), thank you.
Now, things are clear that we do not necessarily need to apply var substitution when loading the YAML. Later on, we properly load the configs based on the given options (config.string, config.path). Then, imperative pipeline compile operation executes the LSCL which wipes out the comments based on the rule. Let me please know if I am still missing something.

@mashhurs mashhurs requested a review from yaauie April 3, 2024 06:43
@mashhurs mashhurs changed the title Wipe out comment lines if config comment contains . Fix the exception behavior when config.string contains ${VAR} in the comments. Apr 3, 2024
Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globally seems good, but left a doubt on the effectiveness of removing the debug line that prints the full pipeline string passed down to the compiler.

@@ -36,7 +38,6 @@ def deep_replace(value)
end
else
if value.is_a?(Array)
value_array_index = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+1 it's a leftover from #13603

@@ -172,7 +172,6 @@ public void displayDebugInformation() {
logger.debug("\n\n{}", configPart.getText());
}
logger.debug("Merged config");
logger.debug("\n\n{}", this.configString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's duplicated only when the config in composed by just one SourceWithMetadata, but when the config is loaded from multiple files, like when glob operator is used, it logs, every single section, and then also the effective pipeline definition as it's passed down to the compiler.
So I'm little bit doubtful it remove it, especially in cases of strange compilation errors.

Example single file:

[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] -------- Logstash Config ---------
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Config from source, source: LogStash::Config::Source::MultiLocal, pipeline_id:: test
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: string, id: config_string
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
#${I_AM_NOT_HERE}
  stdout {
    codec => rubydebug
  }
  # ${I_AM_NOT_HERE}
}

[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig] Merged config
[2024-04-10T12:22:10,615][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
#${I_AM_NOT_HERE}
  stdout {
    codec => rubydebug
  }
  # ${I_AM_NOT_HERE}
}

[2024-04-10T12:22:10,616][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>1}

Example multiple files:

[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] -------- Logstash Config ---------
[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] Config from source, source: LogStash::Config::Source::MultiLocal, pipeline_id:: test
[2024-04-10T12:29:45,178][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: file, id: /tmp/logstash_1.config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig] Config string, protocol: file, id: /tmp/logstash_2.config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

output {
     #${I_AM_NOT_HERE}
       stdout {
         codec => rubydebug
       }
       # ${I_AM_NOT_HERE}
}
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig] Merged config
[2024-04-10T12:29:45,179][DEBUG][org.logstash.config.ir.PipelineConfig]

input { generator {} } #${I_AM_NOT_HERE}
filter { sleep { time => 100 } }
output {
     #${I_AM_NOT_HERE}
       stdout {
         codec => rubydebug
       }
       # ${I_AM_NOT_HERE}
}
[2024-04-10T12:29:45,179][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>1}

…fig.java


Put the logging config back as it is being used with composed configs.
@mashhurs mashhurs requested a review from andsel April 10, 2024 16:54
Copy link

Quality Gate passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No Coverage information No data about Coverage
No Duplication information No data about Duplication

See analysis details on SonarQube

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @mashhurs

Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mashhurs mashhurs merged commit 9483ee0 into elastic:main Apr 11, 2024
6 checks passed
@mashhurs mashhurs deleted the fix-config-with-commented-vars branch April 11, 2024 14:32
andsel added a commit to andsel/logstash that referenced this pull request Jun 6, 2024
…es () inside config.string in the pipelines.yml file, but that created a regression, becuase env vars is expected to be replaced in all the other yaml keys
@andsel andsel mentioned this pull request Jun 6, 2024
5 tasks
jsvd pushed a commit that referenced this pull request Jun 6, 2024
The PR was created to skip resolving environment variable references in comments present in the “config.string” pipelines defined in the pipelines.yml file.
However it introduced a bug that no longer resolves env var references in values of settings like pipeline.batch.size or queue.max_bytes.
For now we’ll revert this PR and create a fix that handles both problems.
github-actions bot pushed a commit that referenced this pull request Jun 6, 2024
The PR was created to skip resolving environment variable references in comments present in the “config.string” pipelines defined in the pipelines.yml file.
However it introduced a bug that no longer resolves env var references in values of settings like pipeline.batch.size or queue.max_bytes.
For now we’ll revert this PR and create a fix that handles both problems.

(cherry picked from commit efa8378)
jsvd pushed a commit that referenced this pull request Jun 6, 2024
The PR was created to skip resolving environment variable references in comments present in the “config.string” pipelines defined in the pipelines.yml file.
However it introduced a bug that no longer resolves env var references in values of settings like pipeline.batch.size or queue.max_bytes.
For now we’ll revert this PR and create a fix that handles both problems.

(cherry picked from commit efa8378)

Co-authored-by: Andrea Selva <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Commented out variable substitutions may still be evaluated in pipelines.yml
5 participants