The following overview presents a list of collectd plugins and their configuration via TripleO or OSP Director. All of these plugins can be included in the list of CollectdExtraPlugins as listed in the examples below. These examples can be combined by simply adding the plugin to CollectdExtraPlugins and the respective config to ExtraConfig.
To disable a plugin for a group, such as generally disable the default plugin 'cpu', use something like the following:
ExtraConfig:
collectd::plugin::cpu::ensure: absent
When set to true, various statistics about the collectd daemon will be collected, with "collectd" as the plugin name. Defaults to false.
Configures the interval in which to query the read plugins. Obviously smaller values lead to a higher system load produced by collectd, while higher values lead to more coarse statistics.
Metrics are read by the read threads and then put into a queue to be handled by the write threads. If one of the write plugins is slow (e.g. network timeouts, I/O saturation of the disk) this queue will grow. In order to avoid running into memory issues in such a case, you can limit the size of this queue.
By default, there is no limit and memory may grow indefinitely. This is most likely not an issue for clients, i.e. instances that only handle the local metrics. For servers it is recommended to set this to a non-zero value, though.
You can set the limits using WriteQueueLimitHigh and WriteQueueLimitLow. Each of them takes a numerical argument which is the number of metrics in the queue. If there are HighNum metrics in the queue, any new metrics will be dropped. If there are less than LowNum metrics in the queue, all new metrics will be enqueued. If the number of metrics currently in the queue is between LowNum and HighNum, the metric is dropped with a probability that is proportional to the number of metrics in the queue (i.e. it increases linearly until it reaches 100%.)
If WriteQueueLimitHigh is set to non-zero and WriteQueueLimitLow is unset, the latter will default to half of WriteQueueLimitHigh.
If you do not want to randomly drop values when the queue size is between LowNum and HighNum, set WriteQueueLimitHigh and WriteQueueLimitLow to the same value.
Enabling the CollectInternalStats option is of great help to figure out the values to set WriteQueueLimitHigh and WriteQueueLimitLow to.
parameter_defaults:
ExtraConfig:
collectd::write_queue_limit_high: 100
collectd::write_queue_limit_low: 100
The Apache plugin queries the page generated by mod_status, the status module of the Apache web server, parses it and submits the number of bytes transfered, the number of requests received, and the number of processes in the various states of the scoreboard.
parameter_defaults:
ExtraConfig:
collectd::plugin::apache:
localhost:
url: "http://10.0.0.111/status?auto"
The ceph plugin gathers extensive data from ceph daemons.
parameter_defaults:
ExtraConfig:
collectd::plugin::ceph::daemons:
- ceph-osd.0
- ceph-osd.1
- ceph-osd.2
- ceph-osd.3
- ceph-osd.4
Note: the osds have to be listed, even if they don't exist on all nodes.
*Note: The ceph plugin will be added automatically on ceph nodes, when collectd is deployed. Adding the ceph plugin on ceph nodes to CollectdExtraPlugins will result in a deployment failure.
More info can be found on the man page and on the collectd wiki.
The CPU plugin collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for IO-operations and being idle.
The CPU plugin does not collect percentages. It collects “jiffies”, the units of scheduling. On many Linux systems there are circa 100 jiffies in one second, but this does not mean you will end up with a percentage. Depending on system load, hardware, whether or not the system is virtualized and possibly half a dozen other factors there may be more or less than 100 jiffies in one second. There is absolutely no guarantee that all states add up to 100, an absolute must for percentages.
The following values are reported by the CPU plugin:
Name | Description | Comment |
---|---|---|
idle | Amount of idle | collectd_cpu_total{...,type_instance='idle'} |
interrupt | cpu blocked by interrupts | collectd_cpu_total{...,type_instance='interrupt'} |
nice | amount of time running low priority processes | collectd_cpu_total{...,type_instance='nice'} |
softirq | amount of cycles spent in servicing interrupt requests | collectd_cpu_total{...,type_instance='waitirq'} |
steal | Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. | collectd_cpu_total{...,type_instance='steal'} |
system | amount spent on system level (kernel) | collectd_cpu_total{...,type_instance='system'} |
user | jiffies used by user processes | collectd_cpu_total{...,type_instance='user'} |
wait | cpu waiting on outstanding I/O request | collectd_cpu_total{...,type_instance='wait'} |
parameter_defaults:
CollectdExtraPlugins:
- cpu
ExtraConfig:
collectd::plugin::cpu::ReportByState: true
More options to be found at https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_cpu
The DF plugin collects file system usage information, i. e. basically how much space on a mounted partition is used and how much is available. It's named after and very similar to the df(1) UNIX command that's been around forever.
The following values are reported by the df plugin:
Name | Description | Comment |
---|---|---|
free | amount of free space | collectd_df_df_complex{..., type_instance="free"} |
reserved | reserved disk space | collectd_df_df_complex{..., type_instance="reserved"} |
used | used disk space | collectd_df_df_complex{..., type_instance="used"} |
parameter_defaults:
CollectdExtraPlugins:
- df
ExtraConfig:
collectd::plugin::df::FStype: "ext4"
The Disk plugin collects performance statistics of hard-disks and, where supported, partitions. While the “octets” and “operations” are quite straight forward, the other two datasets need a little explanation:
Name | Description |
---|---|
merged | are the number of operations, that could be merged into other, already queued operations, i. e. one physical disk access served two or more logical operations. Of course, the higher that number, the better. |
time | is the average time an I/O-operation took to complete. Since this is a little messy to calculate take the actual values with a grain of salt. |
io_time | time spent doing I/Os (ms). You can treat this metric as a device load percentage (Value of 1 sec time spent matches 100% of load). |
weighted_io_time | measure of both I/O completion time and the backlog that may be accumulating. |
pending_operations | shows queue size of pending I/O operations. |
The following is an example how to use or configure the disk plugin.
parameter_defaults:
CollectdExtraPlugins:
- disk
ExtraConfig:
collectd::plugin::disk::disk: "sda"
collectd::plugin::disk::ignoreselected: false
parameter_defaults:
CollectdExtraPlugins:
- interface
ExtraConfig:
collectd::plugin::interface:
The load plugin collects the system load and gives a rough overview on the system utilization.
The system load is defined as the number of runnable tasks in the run-queue.
Name | Description |
---|---|
load_longterm | average system load over the past 15 minues |
load_midterm | average system load over the past 5 minutes |
load_shortterm | average system load over the last minute |
parameter_defaults:
CollectdExtraPlugins:
- load
ExtraConfig:
collectd::plugin::load:
parameter_defaults:
CollectdExtraPlugins:
- memory
Open vSwitch, sometimes abbreviated as OvS, is a production-quality open-source implementation of a distributed virtual multi-layer switch. The main purpose of Open vSwitch is to provide a switching stack for hardware virtualisation environments, while supporting multiple protocols and standards used in computer networks.
The ovs_stats plugin collects statistics of OVS connected interfaces. This plugin uses OVSDB management protocol (RFC7047) monitor mechanism to get statistics from OVSDB.
parameter_defaults:
CollectdExtraPlugins:
- ovs_stats
ExtraConfig:
collectd::plugin::ovs_stats::socket: '/run/openvswitch/db.sock'
During deployment via tripleo aka OSP director, the ovs_stats plugin will get configured automatically. The options above are only mentioned in case you have to override them.
The intel_pmu plugin collects information provided by Linux perf interface which provides rich generalized abstractions over hardware specific capabilities. All events are reported on a per core basis.
Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots.
The sysevent plugin monitors rsyslog messages, filters for log messages and creates events when a defined filter was triggered.
This plugin allows CPU, disk, network load and other metrics to be collected for virtualized guests on the machine.
parameter_defaults:
CollectdExtraPlugins:
- virt
ExtraConfig:
collectd::plugin::virt:hostname_format: "hostname"
collectd::plugin::virt:plugin_instance_format: "uuid"
collectd::plugin::virt:extra_stats: "cpu_util disk vcpu"
the plugin writes values to an amqp1 message bus, such as QPID.
parameter_defaults:
CollectdExtraPlugins:
- amqp1
ExtraConfig:
collectd::plugin::amqp1::send_queue_limit: 50
This output plugin submits values to an HTTP server using POST requests and encoding metrics with JSON or using the PUTVAL command.
parameter_defaults:
CollectdExtraPlugins:
- write_http
ExtraConfig:
collectd::plugin::write_http::nodes:
collectd:
url: collectd.tld.org
metrics: true
header: foo
For more options, see https://collectd.org/wiki/index.php/Plugin:Write_HTTP
This write plugin sends values to a Kafka topic.
parameter_defaults:
CollectdExtraPlugins:
- write_kafka
ExtraConfig:
collectd::plugin::write_kafka::kafka_hosts:
- nodeA
- nodeB
collectd::plugin::write_kafka::topics:
some_events:
format: JSON