[collector] Few improvements to discuss #11

Open
LaurenceJJones opened this issue Oct 22, 2024 · 1 comment

LaurenceJJones commented Oct 22, 2024

Hey 👋🏻

Wanted to start a discussion on the collector code; there are a few areas where I see room for improvement.

When ranging over a buffered channel, the loop blocks until the channel is drained and closed. Take the errChannel:

sentinel/collector.go

Lines 183 to 185 in 8b93836

for err := range errChannel {
	log.Println(err)
}

This means the writes to the database cannot happen while the data is flowing into the channel: we wait until the whole waitgroup is finished before even attempting to write. This doesn't seem too bad if the user only has a handful of containers, but if the user has many, it can increase memory usage (though probably not substantially, given the size of the struct). If we want to go fast, my first suggestion is to move the error channel loop into a goroutine and break out once the channel is closed so the routine can exit. Example:

go func() {
	for {
		err, ok := <-errChannel
		if !ok {
			// channel closed and drained; break so the goroutine can exit
			break
		}
		log.Println(err)
	}
}()

This allows errors to be logged as they come in, and the database writes can happen while data is still flowing into the channel.
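For completeness, a minimal sketch of how this would tie into the existing waitgroup (assuming the wg and errChannel names from collector.go): the producer side still needs to close errChannel after wg.Wait(), otherwise the draining goroutine never exits.

// Minimal sketch, assuming the wg/errChannel names from collector.go:
// drain errors in the background, then close the channel once all
// per-container goroutines have finished so the drainer can exit.
done := make(chan struct{})
go func() {
	defer close(done)
	for err := range errChannel {
		log.Println(err)
	}
}()

wg.Wait()         // all per-container goroutines have sent their results
close(errChannel) // the range loop above terminates once the buffer is drained
<-done            // optional: wait until every buffered error has been logged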

Now the other one I had was generating a goroutine per container to get metrics. On the surface this may seem fine, but it launches goroutines in O(N): if I have 10 containers, it launches 10 goroutines, each making a request (a Unix socket is not so bad, but we don't support TCP Docker yet 🤷🏻), reading the body, and unmarshalling the JSON response. I guess this is to ensure we read the metrics at the same time; however, as stated, this scales without bound with the user's usage of Coolify.

sentinel/collector.go

Lines 139 to 174 in 8b93836

go func(container types.Container) {
	defer wg.Done()
	containerNameFromLabel := container.Labels["coolify.name"]
	if containerNameFromLabel == "" {
		containerNameFromLabel = container.Names[0][1:]
	}
	resp, err := makeDockerRequest(fmt.Sprintf("/containers/%s/stats?stream=false", container.ID))
	if err != nil {
		errChannel <- fmt.Errorf("Error getting container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	defer resp.Body.Close()
	statsOutput, err := io.ReadAll(resp.Body)
	if err != nil {
		errChannel <- fmt.Errorf("Error reading container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	var v types.StatsJSON
	if err := json.Unmarshal(statsOutput, &v); err != nil {
		errChannel <- fmt.Errorf("Error decoding container stats for %s: %v", containerNameFromLabel, err)
		return
	}
	metrics := ContainerMetrics{
		Name:                  containerNameFromLabel,
		CPUUsagePercentage:    calculateCPUPercent(v),
		MemoryUsagePercentage: calculateMemoryPercent(v),
		MemoryUsed:            calculateMemoryUsed(v),
		MemoryAvailable:       v.MemoryStats.Limit,
	}
	metricsChannel <- metrics
}(container)

I haven't wrapped my brain around better ways to handle this, other than chunking the container slice across a fixed set of goroutines (each routine responsible for X containers; rough sketch below). But this may lead to reading metrics at different times (it might differ by a second at most on the current Unix socket implementation, but TCP is always going to be an issue since we can't control network delays).
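Something like a fixed-size worker pool; numWorkers, jobs, and collectStats are hypothetical names here, not part of the current code, but the idea is that the goroutine count stays constant instead of growing with container count:

// Rough sketch of a fixed-size worker pool. numWorkers, jobs, and
// collectStats are hypothetical; containers is the slice we already
// iterate over today.
const numWorkers = 4

jobs := make(chan types.Container)
var wg sync.WaitGroup

for i := 0; i < numWorkers; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for container := range jobs {
			// collectStats would hold the existing per-container body:
			// request /containers/{id}/stats, decode, send to metricsChannel.
			collectStats(container)
		}
	}()
}

for _, container := range containers {
	jobs <- container
}
close(jobs) // no more work; workers exit their range loops
wg.Wait()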

I will keep checking the code and add to this as I see new areas.

LaurenceJJones (Author) commented:

Had a brainwave last night: the channel consumer currently executes multiple writes to the database, and because it's SQLite there is only a single writer and reader (currently, since we haven't enabled WAL). Couldn't we just exec a single SQL statement that inserts multiple rows at once? 🤷🏻

It seems that, as of SQLite 3.7.11, it supports the typical multi-row INSERT ... VALUES syntax of other SQL engines.
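A minimal sketch of what that could look like with database/sql; the table and column names here are made up for illustration, and only the ContainerMetrics fields come from the current code:

// Hypothetical batch insert: one Exec with a multi-row VALUES clause,
// so SQLite's single writer is hit once per batch instead of once per row.
func insertMetricsBatch(db *sql.DB, batch []ContainerMetrics) error {
	if len(batch) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(batch))
	args := make([]interface{}, 0, len(batch)*5)
	for _, m := range batch {
		placeholders = append(placeholders, "(?, ?, ?, ?, ?)")
		args = append(args,
			m.Name,
			m.CPUUsagePercentage,
			m.MemoryUsagePercentage,
			m.MemoryUsed,
			m.MemoryAvailable,
		)
	}
	query := "INSERT INTO container_metrics (name, cpu_pct, mem_pct, mem_used, mem_available) VALUES " +
		strings.Join(placeholders, ", ")
	_, err := db.Exec(query, args...)
	return err
}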
