Hey 👋🏻
Wanted to start a discussion on the collector code; there are a few areas where I see room for improvement.
When reading from a buffered channel, the read loop blocks until the channel is empty and closed, so take the errorChan:
sentinel/collector.go
Lines 183 to 185 in 8b93836
This means the writes to the database cannot happen while the data is flowing into the channel; we wait until the whole WaitGroup is finished before even attempting to write. This doesn't seem too bad if the user only has a handful of containers, but if the user has many it can cause an increase in memory usage (even though it should not be substantial given the size of the struct). If we want to go fast, my first suggestion is to read the error channel in its own goroutine and add a check in the loop that breaks when the channel is closed so the routine can exit. Example:
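Something along these lines, as a rough sketch (the wrapper function, errChannel, and the log call are placeholders for illustration, not the actual collector code):

// Sketch: log errors as they arrive instead of only after the WaitGroup finishes.
func drainErrors(wg *sync.WaitGroup, errChannel chan error) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			err, ok := <-errChannel
			if !ok {
				// Channel closed: all workers are finished, so the routine can exit.
				return
			}
			log.Printf("collector error: %v", err)
		}
	}()
	wg.Wait()         // wait for the per-container goroutines
	close(errChannel) // lets the logging goroutine break out of its loop
	<-done            // make sure every buffered error has been logged
}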
This will allow errors to be logged as they come in, and the database writes can happen while the data is still flowing into the channel.
Now the other one I had was that we generate a goroutine per container to get metrics. On the surface this may seem fine; however, it launches goroutines in proportion to the number of containers (O(N)), meaning if I have 10 containers it will launch 10 goroutines, each making a request (over the unix socket that is not so bad, but we don't support Docker over TCP yet 🤷🏻), reading the body, and unmarshalling the JSON response. I guess this is to ensure we read all metrics at the same time; however, as stated, this scales without bound with the user's usage of coolify.
sentinel/collector.go
Lines 139 to 174 in 8b93836
		errChannel <- fmt.Errorf("Error decoding container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	metrics := ContainerMetrics{
		Name:                  containerNameFromLabel,
		CPUUsagePercentage:    calculateCPUPercent(v),
		MemoryUsagePercentage: calculateMemoryPercent(v),
		MemoryUsed:            calculateMemoryUsed(v),
		MemoryAvailable:       v.MemoryStats.Limit,
	}
	metricsChannel <- metrics
}(container)
I haven't wrapped my head around a better way to handle this other than chunking the container slice across a fixed set of routines (each routine is responsible for X containers), but this may lead to reading metrics at slightly different times (it might differ by a second at most on the current unix socket implementation, but TCP is always going to become an issue as we can't control network delays). A rough sketch of the chunking idea is below.
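For illustration only, a minimal sketch of that chunking approach; Container, getMetrics, and the channel names here are assumptions standing in for whatever the collector actually uses:

// Sketch: a fixed number of workers, each responsible for a chunk of the
// container slice, instead of one goroutine per container.
func collectInChunks(containers []Container, workers int,
	metricsChannel chan<- ContainerMetrics, errChannel chan<- error) {
	if workers < 1 {
		workers = 1
	}
	chunkSize := (len(containers) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(containers); start += chunkSize {
		end := start + chunkSize
		if end > len(containers) {
			end = len(containers)
		}
		wg.Add(1)
		go func(chunk []Container) {
			defer wg.Done()
			for _, c := range chunk {
				m, err := getMetrics(c) // one stats request per container, sequential within the chunk
				if err != nil {
					errChannel <- err
					continue
				}
				metricsChannel <- m
			}
		}(containers[start:end])
	}
	wg.Wait()
	close(metricsChannel)
	close(errChannel)
}

The trade-off is the one mentioned above: reads inside a chunk happen one after another, so the sample times can drift a little compared to one goroutine per container.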
I will keep checking the code and will add to this as I see new areas.
Had a brainwave last night: the way we consume the channel means we execute multiple writes against the database, and because it's SQLite there is only a single writer and reader (currently, because we haven't enabled WAL). Couldn't we just exec a single SQL statement that inserts multiple rows at once? 🤷🏻
It seems that as of recent SQLite versions it supports the typical multi-row INSERT format of the other SQL engines. A sketch of what I mean is below.
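Roughly what I have in mind, as a sketch; the table and column names are made up here, not the real schema:

// Sketch: batch the collected metrics into one multi-row INSERT instead of
// executing a statement per metric. SQLite has supported the
// INSERT ... VALUES (...), (...) form since 3.7.11.
func insertMetrics(db *sql.DB, metrics []ContainerMetrics) error {
	if len(metrics) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(metrics))
	args := make([]interface{}, 0, len(metrics)*5)
	for _, m := range metrics {
		placeholders = append(placeholders, "(?, ?, ?, ?, ?)")
		args = append(args,
			m.Name,
			m.CPUUsagePercentage,
			m.MemoryUsagePercentage,
			m.MemoryUsed,
			m.MemoryAvailable,
		)
	}
	// SQLite caps the number of bound parameters per statement, so a very
	// large batch would still need to be split into a few statements.
	query := "INSERT INTO container_metrics (name, cpu_pct, mem_pct, mem_used, mem_available) VALUES " +
		strings.Join(placeholders, ", ")
	_, err := db.Exec(query, args...)
	return err
}

A single Exec is also just one implicit transaction from SQLite's point of view, so the single writer only takes the write lock once per collection cycle.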