Optional inputs for transforms and sinks #21872

Open
tmccombs opened this issue Nov 22, 2024 · 1 comment
Labels
type: feature A value-adding code addition that introduces new functionality.

Comments

@tmccombs
Contributor

tmccombs commented Nov 22, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

I have a Vector configuration that is shared across multiple instances. In particular, it includes a sink configuration.

On some instances I want to add an additional source and some transforms by including another YAML file.

However, that poses the problem: how do I include the final transform in the inputs of the sink in the shared configuration file?
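A minimal sketch of the layout described above. The file names and component IDs (`app_logs`, `parse_app`, `extra_logs`, `parse_extra`, `out`) are hypothetical placeholders, and the `demo_logs` source and `console` sink merely stand in for real components.

```yaml
# shared.yaml (hypothetical name): loaded on every instance.
sources:
  app_logs:
    type: demo_logs
    format: syslog

transforms:
  parse_app:
    type: remap
    inputs: ["app_logs"]
    source: |
      .origin = "shared"

sinks:
  out:
    type: console
    # Open question: how to also list parse_extra here, given that it only
    # exists on instances that load extra.yaml?
    inputs: ["parse_app"]
    encoding:
      codec: json
---
# extra.yaml (hypothetical name): loaded only on some instances.
sources:
  extra_logs:
    type: demo_logs
    format: json

transforms:
  parse_extra:
    type: remap
    inputs: ["extra_logs"]
    source: |
      .origin = "extra"
```

The two YAML documents represent two separate files. The instances that need the extra pipeline would load both, for example by passing `--config` once per file.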

Attempted Solutions

Simply including the name of the transform in the sink's inputs doesn't work: on instances where the extra file isn't loaded, I get the error: Input "x" for sink "y" doesn't match any components. If I try using a wildcard that would only match the new transform, I get a similar error.

What does work is using an environment variable to specify the additional inputs, but then you have to set that environment variable as well as include the extra file. And it just feels kind of hacky.
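A sketch of what the environment variable workaround might look like. The variable name `SINK_INPUTS` and the component IDs are hypothetical. It assumes that Vector substitutes `${VAR:-default}` into the raw config text before parsing it, so the value can carry a comma-separated list.

```yaml
# shared.yaml (sketch)
sinks:
  out:
    type: console
    # SINK_INPUTS is a hypothetical variable name. When it is unset, the
    # default after ":-" applies and only the shared transform is wired in.
    # Setting SINK_INPUTS="parse_app, parse_extra" adds the optional one.
    inputs: [${SINK_INPUTS:-parse_app}]
    encoding:
      codec: json
```

The instances that load the extra file then also have to export `SINK_INPUTS`, which is the duplication described above. Note that the line is only valid YAML after substitution, so generic YAML linters may reject the raw file.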

Another option is to use a wildcard that also matches something that is always available. But that may or may not be reasonable to do, depending on what the other inputs are. In particular, it may be difficult to do if the only shared input is a route transform, and the added input isn't.
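For illustration, the wildcard variant could look like the sketch below. It assumes the shared and optional transforms can be given a common prefix; `parse_app` and `parse_extra` are hypothetical names.

```yaml
sinks:
  out:
    type: console
    # "parse_*" always matches the shared parse_app transform, so the
    # wildcard never comes up empty. parse_extra is picked up automatically
    # on instances where the extra file defines it.
    inputs: ["parse_*"]
    encoding:
      codec: json
```

The cost is that naming now carries meaning: any other component whose ID starts with `parse_` would also be fed into the sink.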

You could use some kind of programmatic generation to generate the config, so that you can optionally include the desired components and inputs, but that would add significant complexity if you don't already need that.

You could use some kind of dummy source that doesn't produce any events, but there isn't a clear way to do that either. And even if there were, you would need a separate file defining the dummy for the instances that don't need the extra components.

Proposal

I can think of a few ways that this could be addressed:

  1. Add a field on sinks and transforms for optional inputs, which don't produce an error if the input component doesn't exist
  2. Change wildcards to not produce an error if they don't match anything
  3. Have a way to specify on a source or transform that its output should be sent to a different transform or sink. Like an inverse of inputs.
  4. Have a way to set environment variables (or just variables) in one file that can be referenced in another, so that adding a file automatically sets a variable that can be used in the inputs of other components.
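To make the first three options easier to compare, here is a sketch of how each might look in a config file. This is hypothetical syntax only: the `optional_inputs` and `outputs` fields are invented for illustration and are not existing Vector options, and the component IDs are placeholders.

```yaml
# Proposal 1: a separate list whose entries are allowed to be missing.
sinks:
  out:
    type: console
    inputs: ["parse_app"]
    optional_inputs: ["parse_extra"]   # HYPOTHETICAL field
    encoding:
      codec: json
---
# Proposal 2: same syntax as today, but an unmatched wildcard is not an error.
sinks:
  out:
    type: console
    inputs: ["parse_app", "extra_*"]   # would be accepted even with no match
    encoding:
      codec: json
---
# Proposal 3: the optional file declares where its own output goes.
transforms:
  parse_extra:
    type: remap
    inputs: ["extra_logs"]
    source: |
      .origin = "extra"
    outputs: ["out"]                   # HYPOTHETICAL field, inverse of inputs
```

Each YAML document above is an independent alternative, not a single config. Proposals 1 and 2 keep the wiring in the shared file, while proposal 3 moves it into the optional file so the shared file never mentions components that might not exist.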

References

Version

vector 0.42.0 (x86_64-unknown-linux-gnu 3d16e34 2024-10-21 14:10:14.375255220)

tmccombs added the "type: feature" label on Nov 22, 2024
@pront
Member

pront commented Nov 25, 2024

Hi @tmccombs, this is a reasonable idea. IMO proposal (2) makes sense.

My only concern is that someone might depend on the existing behavior, i.e. the process stopping if there is no match. The proper way to do this is to introduce an opt-in global option that relaxes the wildcard matching. Optionally, we could later make this the default behavior, but we would have to go through a deprecation phase first.
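As a rough sketch of what such an opt-in could look like in a config file (the option name and value below are assumptions made for illustration; the name and shape of any real option may differ):

```yaml
# HYPOTHETICAL global option: the name "wildcard_matching" and the value
# "relaxed" are assumed for illustration and are not confirmed by this thread.
# Leaving it unset would keep the current strict behavior.
wildcard_matching: relaxed

sinks:
  out:
    type: console
    # With relaxed matching, "extra_*" matching nothing would produce a
    # warning instead of stopping the process.
    inputs: ["parse_app", "extra_*"]
    encoding:
      codec: json
```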

If the new option is on and there's no match, we should log a user-friendly warning, e.g. like this.

P.S.

> Have a way to specify on a source or transform that its output should be sent to a different transform or sink. Like an inverse of inputs.

Ideally, I would like all components (sources, transforms, sinks) to be just nodes in the pipeline graph. But that is not directly related to this issue.
