[receiver/awsfirehosereceiver] Support receiving multiple record type using single endpoint #35988

kaiyan-sheng · 2024-10-24T22:54:43Z

Component(s)

receiver/awsfirehose

Is your feature request related to a problem? Please describe.

When trying to ingest both logs and metrics from Firehose, I tried with two receivers configured with the same endpoint:

receivers:
      awsfirehose/cwmetrics:
        endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4433
        include_metadata: true
        record_type: cwmetrics
      awsfirehose/cwlogs:
        endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4433
        include_metadata: true
        record_type: cwlogs

This gave me an error:

2024-10-23T20:01:59.169Z	error	graph/graph.go:426	Failed to start component	{"error": "listen tcp 127.0.0.1:4433: bind: address already in use", "type": "Receiver", "id": "awsfirehose/cwmetrics"}

In order to workaround this issue, I have to use two different endpoints, one for metrics and one for logs.

I would like to have a single endpoint to ingest both metrics and logs.

Describe the solution you'd like

We should be able to check what does the request from Firehose looks like and figure out if it's logs or metrics, if it's JSON format metrics or OTEL1.0 format metrics and then apply the right marshaling.

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

github-actions · 2024-10-24T22:54:58Z

Pinging code owners:

receiver/awsfirehose: @Aneurysm9

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Aneurysm9 · 2024-11-27T17:12:31Z

I would strongly recommend simply using different ports for each signal type. The collector requires unique component instances per signal type, even for components that support multiple signal types, and the mechanism used by the OTLP receiver to support multiple signal types on the same port is generally considered a hack and may not be supported going forward.

Beyond that, the type of configuration presented in the issue description here doesn't really fit in with the sharedcomponent model, which expects the same configuration of the component to be used for multiple signal types by allowing signal type-specific instances to share an HTTP handler.

axw · 2024-11-27T23:40:30Z

Beyond that, the type of configuration presented in the issue description here doesn't really fit in with the sharedcomponent model, which expects the same configuration of the component to be used for multiple signal types by allowing signal type-specific instances to share an HTTP handler.

Agreed, I think @kaiyan-sheng was just showing what she tried to configure. It fails of course because two instances of the receiver cannot bind to the same host:port.

The proposal would be to either get rid of the record_type config setting, or introduce something like record_type: auto, and determine the record type from the data: check for CloudWatch JSON/OTLP metrics, and otherwise treat the data as logs.

Minimal config would be something like:

receivers:
  awsfirehose:
    endpoint: hostname:1234

So there would be no requirement for multiple receiver instances.

Aneurysm9 · 2024-11-27T23:53:31Z

The proposal would be to either get rid of the record_type config setting, or introduce something like record_type: auto, and determine the record type from the data: check for CloudWatch JSON/OTLP metrics, and otherwise treat the data as logs.

I don't think this is viable. There are already three different record types supported for two different signal types and more can be added in the future. These formats are not self-describing and I'm not sure it would be advisable to attempt to unmarshal every inbound request with every type of unmarshaller.

Rather than introducing more complexity locally why can the existing capability to listen on different ports not be used?

axw · 2024-11-28T00:44:09Z

Having multiple ports is an option, it's what we're currently doing. We're looking for a way to simplify our user experience. The goal is to have an experience similar to configuring an OTel SDK/collector for sending logs/metrics/traces/profiles to a single OTLP endpoint, regardless of the signal type or protobuf/JSON encoding.

There's a bit of cognitive overhead in needing to know which URL/port to send to based on the signal type and encoding. Imagine you're creating a CloudWatch Metric Stream: if you change from the JSON encoding to OTLP 1.0, now you need to remember to use a different port. Instead of making it the user's problem, we'd like to offer a single Firehose endpoint which can determine the data format itself.

It is possible to deduce the type by sniffing the data without attempting a full unmarshal of each type -- we've implemented this. To be fair though, that's not necessarily future proof.

constanca-m · 2024-12-05T10:35:16Z

These formats are not self-describing and I'm not sure it would be advisable to attempt to unmarshal every inbound request with every type of unmarshaller.

@Aneurysm9 This would not happen. The way I envision this going would be:

We unmarshall the request to map[string]interface{}.
Then based on this map, we will determine which type of record it is. We can make use of the existent functions already:
- If it is a cloudwatch metric, then it has these fields.
- If it is a cloudwatch log, then it has these fields.
- Otherwise, it is is a log directly sent to firehose.

Does this sound OK to you?

I am taking over the PR #36184, and I have come across this need because if we set a new record type firehoselogs, all the other types become redundant.

Additionally the code and comments seem to not be up to date:

opentelemetry-collector-contrib/receiver/awsfirehosereceiver/factory.go

Lines 76 to 84 in f55d810

    
           // createDefaultConfig creates a default config with the endpoint set 
        
           // to port 8443 and the record type set to the CloudWatch metric stream. 
        
           func createDefaultConfig() component.Config { 
        
           	return &Config{ 
        
           		ServerConfig: confighttp.ServerConfig{ 
        
           			Endpoint: testutil.EndpointForPort(defaultPort), 
        
           		}, 
        
           	} 
        
           }

We do not have a default record_type right now. If we add this new type, we could have this default configuration again. We could still keep all these types, and leave to the users to decide if they rather specify their own record type themselves or use an automatic record type.

What do you think @Aneurysm9 ?

kaiyan-sheng added enhancement New feature or request needs triage New item requiring triage labels Oct 24, 2024

github-actions bot added the receiver/awsfirehose label Oct 24, 2024

This was referenced Oct 29, 2024

Weekly Report: 2024-10-22 - 2024-10-29 #36039

Closed

Weekly Report: 2024-10-29 - 2024-11-05 #36187

Closed

github-actions bot mentioned this issue Nov 12, 2024

Weekly Report: 2024-11-05 - 2024-11-12 #36302

Closed

github-actions bot mentioned this issue Nov 19, 2024

Weekly Report: 2024-11-12 - 2024-11-19 #36426

Closed

github-actions bot mentioned this issue Nov 26, 2024

Weekly Report: 2024-11-19 - 2024-11-26 #36533

Closed

github-actions bot mentioned this issue Dec 3, 2024

Weekly Report: 2024-11-26 - 2024-12-03 #36628

Closed

constanca-m mentioned this issue Dec 6, 2024

[awsfirehosereceiver] Add auto type #36708

Open

github-actions bot mentioned this issue Dec 10, 2024

Weekly Report: 2024-12-03 - 2024-12-10 #36739

Closed

This was referenced Dec 17, 2024

Weekly Report: 2024-12-10 - 2024-12-17 #36867

Open

Weekly Report: 2024-12-17 - 2024-12-24 #36929

Open

github-actions bot mentioned this issue Dec 31, 2024

Weekly Report: 2024-12-24 - 2024-12-31 #36987

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[receiver/awsfirehosereceiver] Support receiving multiple record type using single endpoint #35988

[receiver/awsfirehosereceiver] Support receiving multiple record type using single endpoint #35988

kaiyan-sheng commented Oct 24, 2024

github-actions bot commented Oct 24, 2024

Aneurysm9 commented Nov 27, 2024

axw commented Nov 27, 2024

Aneurysm9 commented Nov 27, 2024

axw commented Nov 28, 2024

constanca-m commented Dec 5, 2024 •

edited

Loading

[receiver/awsfirehosereceiver] Support receiving multiple record type using single endpoint #35988

[receiver/awsfirehosereceiver] Support receiving multiple record type using single endpoint #35988

Comments

kaiyan-sheng commented Oct 24, 2024

Component(s)

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

github-actions bot commented Oct 24, 2024

Aneurysm9 commented Nov 27, 2024

axw commented Nov 27, 2024

Aneurysm9 commented Nov 27, 2024

axw commented Nov 28, 2024

constanca-m commented Dec 5, 2024 • edited Loading

constanca-m commented Dec 5, 2024 •

edited

Loading