[collector] Few improvements to discuss #11

Open
LaurenceJJones opened this issue Oct 22, 2024 · 1 comment

LaurenceJJones commented Oct 22, 2024

Hey 👋🏻

Wanted to start a discussion on the collector code; there are a few areas where I see room for improvement.

When ranging over a buffered channel, the loop blocks until the channel is drained and closed. Take the errChannel:

sentinel/collector.go

Lines 183 to 185 in 8b93836

for err := range errChannel {
	log.Println(err)
}

This means the writes to the database cannot happen while the data is flowing into the channel: we wait until the whole waitgroup is finished before even attempting to write. This doesn't seem too bad if the user only has a handful of containers, but if the user has many, it can increase memory usage (though probably not substantially, given the size of the struct). If we want to go fast, my first suggestion is to move the error channel loop into a goroutine and break out once the channel is closed so the routine can exit. Example:

go func() {
	for {
		err, ok := <-errChannel
		if !ok {
			// channel closed and drained; break so the goroutine can exit
			break
		}
		log.Println(err)
	}
}()

This allows errors to be logged as they come in, and the database writes can happen while data is still flowing into the channel.
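For completeness, a minimal sketch of how this would tie into the existing waitgroup (assuming the wg and errChannel names from collector.go): the producer side still needs to close errChannel after wg.Wait(), otherwise the draining goroutine never exits.

// Minimal sketch, assuming the wg/errChannel names from collector.go:
// drain errors in the background, then close the channel once all
// per-container goroutines have finished so the drainer can exit.
done := make(chan struct{})
go func() {
	defer close(done)
	for err := range errChannel {
		log.Println(err)
	}
}()

wg.Wait()         // all per-container goroutines have sent their results
close(errChannel) // the range loop above terminates once the buffer is drained
<-done            // optional: wait until every buffered error has been logged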

Now the other one I had was generating a goroutine per container to get metrics. On the surface this may seem fine, but it launches goroutines in O(N): if I have 10 containers, it launches 10 goroutines, each making a request (a Unix socket is not so bad, but we don't support TCP Docker yet 🤷🏻), reading the body, and unmarshalling the JSON response. I guess this is to ensure we read the metrics at the same time; however, as stated, this scales without bound with the user's usage of Coolify.

sentinel/collector.go

Lines 139 to 174 in 8b93836

go func(container types.Container) {
	defer wg.Done()
	containerNameFromLabel := container.Labels["coolify.name"]
	if containerNameFromLabel == "" {
		containerNameFromLabel = container.Names[0][1:]
	}
	resp, err := makeDockerRequest(fmt.Sprintf("/containers/%s/stats?stream=false", container.ID))
	if err != nil {
		errChannel <- fmt.Errorf("Error getting container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	defer resp.Body.Close()
	statsOutput, err := io.ReadAll(resp.Body)
	if err != nil {
		errChannel <- fmt.Errorf("Error reading container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	var v types.StatsJSON
	if err := json.Unmarshal(statsOutput, &v); err != nil {
		errChannel <- fmt.Errorf("Error decoding container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	metrics := ContainerMetrics{
		Name:                  containerNameFromLabel,
		CPUUsagePercentage:    calculateCPUPercent(v),
		MemoryUsagePercentage: calculateMemoryPercent(v),
		MemoryUsed:            calculateMemoryUsed(v),
		MemoryAvailable:       v.MemoryStats.Limit,
	}
	metricsChannel <- metrics
}(container)

I haven't wrapped my brain around better ways to handle this, other than chunking the container slice across a fixed set of goroutines (each routine responsible for X containers; rough sketch below). But this may lead to reading metrics at different times (it might differ by a second at most on the current Unix socket implementation, but TCP is always going to be an issue since we can't control network delays).
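Something like a fixed-size worker pool; numWorkers, jobs, and collectStats are hypothetical names here, not part of the current code, but the idea is that the goroutine count stays constant instead of growing with container count:

// Rough sketch of a fixed-size worker pool. numWorkers, jobs, and
// collectStats are hypothetical; containers is the slice we already
// iterate over today.
const numWorkers = 4

jobs := make(chan types.Container)
var wg sync.WaitGroup

for i := 0; i < numWorkers; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for container := range jobs {
			// collectStats would hold the existing per-container body:
			// request /containers/{id}/stats, decode, send to metricsChannel.
			collectStats(container)
		}
	}()
}

for _, container := range containers {
	jobs <- container
}
close(jobs) // no more work; workers exit their range loops
wg.Wait()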

I will keep checking the code and add to this as I see new areas.

LaurenceJJones (Author) commented:

Had a brainwave last night: the channel consumer currently executes multiple writes to the database, and because it's SQLite there is only a single writer and reader (currently, since we haven't enabled WAL). Couldn't we just exec a single SQL statement that inserts multiple rows at once? 🤷🏻

It seems that, as of SQLite 3.7.11, it supports the typical multi-row INSERT ... VALUES syntax of other SQL engines.
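A minimal sketch of what that could look like with database/sql; the table and column names here are made up for illustration, and only the ContainerMetrics fields come from the current code:

// Hypothetical batch insert: one Exec with a multi-row VALUES clause,
// so SQLite's single writer is hit once per batch instead of once per row.
func insertMetricsBatch(db *sql.DB, batch []ContainerMetrics) error {
	if len(batch) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(batch))
	args := make([]interface{}, 0, len(batch)*5)
	for _, m := range batch {
		placeholders = append(placeholders, "(?, ?, ?, ?, ?)")
		args = append(args,
			m.Name,
			m.CPUUsagePercentage,
			m.MemoryUsagePercentage,
			m.MemoryUsed,
			m.MemoryAvailable,
		)
	}
	query := "INSERT INTO container_metrics (name, cpu_pct, mem_pct, mem_used, mem_available) VALUES " +
		strings.Join(placeholders, ", ")
	_, err := db.Exec(query, args...)
	return err
}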
