x-pack/metricbeat/module/meraki: Add new module #40669

DanH-Semplicity · 2024-08-31T05:28:07Z

Proposed commit message

Added Cisco Meraki module with several metricsets to metricbeat.

Added meraki module to x-pack/metricbeat/module/meraki

Added Metricsets to meraki module:

device_status
appliance_uplink_overview
appliance_uplink_status_and_ha
cellular_gateway_uplink_status
device_appliance_performance_score
device_uplink_loss_and_latency
license_overview
network_health_channel_utilization
wireless_device_channel_utilization

Please explain:

WHAT: Metricsets for monitoring Cisco Meraki
WHY: Extend Metricbeat to collect more observability metrics
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None to my knowledge.

Author's Checklist

[ ] create unit tests
[ ] run code scans
[ ] test on meraki prod system

How to test this PR locally

I had access to a local meraki system, and was able to test many of the metricsets.

Related issues

N/A

Use cases

N/A

Screenshots

N/A

Logs

N/A

cla-checker-service · 2024-08-31T05:28:12Z

❌ Author of the following commits did not sign a Contributor Agreement:
d6e356a, c048f77, 2986d09, 9fe235e, 53ae668, c122fc8, 729094b, 0c5b27f, 0ffeed9, a9f809b, 9223215, d2e5848, 07819cd, 998936f, c1f3dd1, bd41dd0

Please, read and sign the above mentioned agreement if you want to contribute to this project

botelastic · 2024-08-31T05:28:13Z

This pull request doesn't have a Team:<team> label.

mergify · 2024-08-31T05:28:44Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @DanH-Semplicity? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

DanH-Semplicity · 2024-08-31T05:29:46Z

I was asked to do an initial drop on the code tonight, however the code is not ready to be merged yet. I still have to work through unit tests, coding style, more testing on prod, etc.

ishleenk17 · 2024-09-02T05:44:55Z

@DanH-Semplicity : Thanks for the PR!
To get started, please make sure to sign the CLA.

Also, do you have access to Buildkite to review the errors?

For guidance on coding style and best practices, I recommend checking out the vSphere module, as it is being actively worked on.

ishleenk17 · 2024-09-03T10:57:05Z

/test

ycombinator · 2024-09-04T12:56:58Z

@ishleenk17 Thanks for looking into this PR. Going forward, will this module be owned by @elastic/obs-infraobs-integrations, similar to the vSphere module you mentioned earlier for implementation best practices? If so, this PR should contain a CODEOWNERS entry like so as well:

beats/.github/CODEOWNERS

Line 100 in b11b86a

/metricbeat/module/vsphere @elastic/obs-infraobs-integrations

tommyers-elastic

this is great progress.

i think a lot of the comments here are around consistency in metadata field names which would be solved by joining all this data in a single metricset called something like "device_health". this way we can combine all per-device metrics in a single place, and have just one call to GetOrganizationDevices per collection loop (same goes for uplink metrics/statuses). this should greatly reduce code complexity and result in fewer API calls. in addition this would remove a tonne of boilerplate and repeated code.
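The consolidation suggested above could look roughly like the following sketch. Every name in it (`Device`, `deviceLister`, `collect`, the two reporter functions, the sample serial) is a hypothetical stand-in, not the Meraki SDK's or this PR's actual types. The only point is that the device inventory is fetched once per collection loop and then shared by every per-device reporter.

```go
package main

import "fmt"

// Device is a hypothetical, trimmed-down device record keyed by serial.
type Device struct {
	Serial string
	Name   string
}

// deviceLister is a hypothetical stand-in for the API client; a real
// implementation would wrap the SDK's GetOrganizationDevices call.
type deviceLister interface {
	ListDevices(org string) ([]Device, error)
}

// reporter turns the shared device inventory into events.
type reporter func(devices map[string]Device) []map[string]any

// collect fetches the inventory once and hands it to every reporter.
func collect(client deviceLister, org string, reporters ...reporter) ([]map[string]any, error) {
	list, err := client.ListDevices(org)
	if err != nil {
		return nil, fmt.Errorf("listing devices for org %s: %w", org, err)
	}
	devices := make(map[string]Device, len(list))
	for _, d := range list {
		devices[d.Serial] = d
	}
	var events []map[string]any
	for _, r := range reporters {
		events = append(events, r(devices)...)
	}
	return events, nil
}

// fakeClient counts calls so the single-fetch behaviour is visible.
type fakeClient struct{ calls int }

func (f *fakeClient) ListDevices(org string) ([]Device, error) {
	f.calls++
	return []Device{{Serial: "Q2XX-0001", Name: "ap-1"}}, nil // made-up sample data
}

func statusReporter(devices map[string]Device) []map[string]any {
	var events []map[string]any
	for serial, d := range devices {
		events = append(events, map[string]any{"device.serial": serial, "device.name": d.Name})
	}
	return events
}

func uplinkReporter(devices map[string]Device) []map[string]any {
	var events []map[string]any
	for serial := range devices {
		events = append(events, map[string]any{"device.serial": serial, "uplink.status": "active"})
	}
	return events
}

func main() {
	client := &fakeClient{}
	events, err := collect(client, "org-1", statusReporter, uplinkReporter)
	if err != nil {
		fmt.Println("collect failed:", err)
		return
	}
	fmt.Println(len(events), "events from", client.calls, "device listing call(s)")
}
```

Because each reporter receives the same map, metadata such as the serial and name is attached under one set of field names, which is the consistency concern raised in the review.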

tommyers-elastic · 2024-09-04T14:34:09Z

x-pack/metricbeat/module/meraki/appliance_uplink_overview/_meta/fields.yml

+  release: beta
+  description: >
+    appliance_uplink_overview
+  fields:


we eventually need to include the proper mappings here and throughout

Thanks, I thought they were getting auto-updated, I will make sure they are mapped.

x-pack/metricbeat/module/meraki/appliance_uplink_overview/appliance_uplink_overview.go

...ck/metricbeat/module/meraki/appliance_uplink_status_and_ha/appliance_uplink_status_and_ha.go

...cbeat/module/meraki/network_health_channel_utilization/network_health_channel_utilization.go

x-pack/metricbeat/module/meraki/wireless_device_channel_utilization/types.go

...eat/module/meraki/wireless_device_channel_utilization/wireless_device_channel_utilization.go

DanH-Semplicity · 2024-09-05T17:48:51Z

Converting to Draft while I refactor the code to a single metricset.

ishleenk17 · 2024-09-05T17:51:36Z

That's right, @elastic/obs-infraobs-integrations would become the codeowners.

DanH-Semplicity · 2024-09-07T02:25:51Z

I am still working on the code, but I lost power for 5 hours this afternoon, so I wanted to commit what I had completed thus far. I am still working on review comments, and I have to add two more Meraki metric integrations for interfaces and tunnels, fix fields.yml, etc., but I wanted to get a code drop in, in case I lose power again.

mergify · 2024-09-11T11:54:38Z

backport-8.x has been added to help with the transition to the new branch 8.x.

x-pack/metricbeat/module/meraki/device_health/device_appliance_uplink_status_and_ha.go

x-pack/metricbeat/module/meraki/device_health/device_status.go

x-pack/metricbeat/module/meraki/device_health/device_health.go

x-pack/metricbeat/module/meraki/device_health/device_network_appliance_vpn_sitetosite.go

tommyers-elastic · 2024-09-12T12:50:43Z

x-pack/metricbeat/module/meraki/device_health/device_network_health_channel_utilization.go

+				metric["network.health.channel.radio.wifi0.utilizationNon80211"] = wifi0.UtilizationNon80211
+				metric["network.health.channel.radio.wifi0.utilizationTotal"] = wifi0.UtilizationTotal
+				wifi0_encountered = true
+				metrics = append(metrics, metric)


are these append calls here and line 86 supposed to be here as well as the call on line 90?

oh i see it's if neither of these blocks was entered

if we don't have utilization metrics (i.e. neither of these blocks were entered), why bother reporting any metric events at all?

If you look at lines 46 to 60, it is possible they returned some network health data but the loops were nil. That was my logic: in this specific case there is very little data if the for loops have nothing to iterate over. I actually used this in a few locations, where there is sometimes a little or a lot of data before a looping structure.

There are some cases where it is probably warranted, because several values are returned but the for loop is nil. If you search for _encountered you can see a few other spots.
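The two options debated in this thread can be put side by side in a small sketch. The `radioUtilization` type, the `buildEvents` function and its `emitEmpty` switch are hypothetical and are not the PR's code; only the utilization field name is taken from the diff above.

```go
package main

import "fmt"

// radioUtilization is a hypothetical per-radio sample.
type radioUtilization struct {
	Total float64
}

// buildEvents emits one event per radio sample. When no samples exist it
// either falls back to a single event carrying only the base fields (the
// "_encountered" pattern) or emits nothing, depending on emitEmpty.
func buildEvents(base map[string]any, samples []radioUtilization, emitEmpty bool) []map[string]any {
	var events []map[string]any
	encountered := false
	for _, s := range samples {
		// Copy the base fields so events do not share one underlying map.
		event := make(map[string]any, len(base)+1)
		for k, v := range base {
			event[k] = v
		}
		event["network.health.channel.radio.wifi0.utilizationTotal"] = s.Total
		events = append(events, event)
		encountered = true
	}
	if !encountered && emitEmpty {
		events = append(events, base)
	}
	return events
}

func main() {
	base := map[string]any{"network.id": "N_1"} // made-up sample value
	fmt.Println("fallback on, no samples:", len(buildEvents(base, nil, true)))
	fmt.Println("fallback off, no samples:", len(buildEvents(base, nil, false)))
	fmt.Println("two samples:", len(buildEvents(base, []radioUtilization{{Total: 12}, {Total: 30}}, true)))
}
```

With the fallback disabled, an empty loop produces no event at all, which is the behaviour the reviewer is asking about.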

tommyers-elastic · 2024-09-12T12:55:42Z

x-pack/metricbeat/module/meraki/device_health/device_network_health_channel_utilization.go

+	for _, network := range *networks {
+		for _, product_type := range network.ProductTypes {
+			if product_type == "wireless" {
+				networkHealthUtilization, res, err := client.Networks.GetNetworkNetworkHealthChannelUtilization(network.ID, &meraki_api.GetNetworkNetworkHealthChannelUtilizationQueryParams{})


we are still pulling one day's worth of data each time we run this - we should only pull data for the current collection period.

we talked offline a little about simplifying things here too, to ensure we only ever get one bucket per call (by specifying a maximum collection period no greater than the resolution of these metrics), were you able to try it out to verify it behaves as expected?

Before I resolved the comment, we were looking for additional input; I believe you had @-mentioned someone and they never got back to us. I was looking for guidance on 10 minutes versus 60 minutes, and with no input I left it at the defaults. I can switch to a static 3600 seconds (1 hr) if you want, so it matches the wireless default?
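A minimal sketch of the "collection period no greater than the resolution" idea follows. The `timespanSeconds` helper is hypothetical, and the 10-minute `bucketResolution` is an assumption taken from the options mentioned in this thread, not a confirmed API value. Whether the API then returns exactly one bucket per call still needs verifying against a live system.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// bucketResolution is an assumed metric resolution (the 10-minute option
// mentioned in the thread); adjust it to whatever the API actually uses.
const bucketResolution = 10 * time.Minute

// timespanSeconds validates a collection period against the metric
// resolution and returns the history window to request, in seconds.
func timespanSeconds(period, resolution time.Duration) (int, error) {
	if period <= 0 {
		return 0, errors.New("collection period must be positive")
	}
	if period > resolution {
		return 0, fmt.Errorf("collection period %s exceeds metric resolution %s", period, resolution)
	}
	return int(period / time.Second), nil
}

func main() {
	secs, err := timespanSeconds(5*time.Minute, bucketResolution)
	if err != nil {
		fmt.Println("invalid period:", err)
		return
	}
	fmt.Println("request timespan (seconds):", secs)
}
```

The returned value would be passed as the query's timespan instead of leaving the query parameters empty, so each run only asks for the window it is responsible for.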

tommyers-elastic · 2024-09-12T12:57:41Z

x-pack/metricbeat/module/meraki/device_health/device_status.go

+		}
+
+		if score, ok := devicePerformanceScores[serial]; ok {
+			if score.HttpStatusCode == 204 {


would much prefer to just not report metrics if there's no data

lol ... I was like crap, I thought I fixed it. I had removed it on the data capture, so there will never be a 204 here, but since it is dead code, I will remove it.

ok, now fixed both locations. Will be in next code drop.

tommyers-elastic · 2024-09-12T12:59:00Z

x-pack/metricbeat/module/meraki/device_health/device_switch_port.go

+			metrics = append(metrics, metric)
+		}
+
+		if !port_encountered {


same question here as above, if there's no data here, should we bother reporting?

I am honestly 50/50. If the Meraki API responded with info, it seems like we should return what they sent. They could respond with 6 metrics with the loops empty; perhaps it will never happen, or the info is not really pertinent. Some return 2, 3, or 6 values, and assuming I got loss/latency working correctly now, I need to keep that one because I combined two things.

tommyers-elastic · 2024-09-12T13:17:13Z

x-pack/metricbeat/module/meraki/device_health/device_health.go

+		reportNetworkHealthChannelUtilization(reporter, org, devices, networkHealthUtilizations)
+
+		// Get and Report Organization Wireless Devices Channel Utilization
+		wireless_res, wireless_err := m.client.Devices.GetOrganizationWirelessDevicesChannelUtilizationByDevice(org, &meraki_api.GetOrganizationWirelessDevicesChannelUtilizationByDeviceQueryParams{})


this is still getting 7 days' worth of data every time

I believe the wireless devices returns only 1 hr (3600 seconds) by default ... https://developer.cisco.com/meraki/api-v1/get-organization-wireless-devices-channel-utilization-by-device/

See my comment on network, should we set that from 1 day to 3600 seconds, so these are both the same ...

tommyers-elastic · 2024-09-12T14:59:48Z

x-pack/metricbeat/module/meraki/device_health/device_cellular_gateway_uplink_status.go

+
+			for _, item := range *uplink.Uplinks {
+				metrics = append(metrics, mapstr.Union(metric, mapstr.M{
+					"cellular.gateway.uplink.apn":              item.Apn,


do these need to be named differently from other uplink fields, or are the MG uplinks a distinct concept?

AFAICT the MG uplink metadata is just a superset of the other uplink fields (except 'ip_assigned_by')

"uplink.interface"
"uplink.status"
"uplink.ip"
"uplink.gateway"
"uplink.public_ip"
"uplink.primary_dns"
"uplink.secondary_dns"
"uplink.ip_assigned_by"
------------------
"cellular.gateway.uplink.interface"
"cellular.gateway.uplink.status"
"cellular.gateway.uplink.ip"
"cellular.gateway.uplink.gateway"
"cellular.gateway.uplink.public_ip"
"cellular.gateway.uplink.dns1"
"cellular.gateway.uplink.dns2"
"cellular.gateway.uplink.apn"
"cellular.gateway.uplink.connection_type"
"cellular.gateway.uplink.iccid"
"cellular.gateway.uplink.model"
"cellular.gateway.uplink.provider"
"cellular.gateway.uplink.signal_stat.rsrp"
"cellular.gateway.uplink.signal_stat.rsrq"
"cellular.gateway.uplink.signal_type"

Fair point, I agree the fields look the same. However, I have already seen cases where Meraki has two separate API calls that both return an IP address and the values are not the same. In this case they also appear to be distinct Meraki API calls into completely different code trees (client.Appliance.GetOrganizationApplianceUplinkStatuses() and client.CellularGateway.GetOrganizationCellularGatewayUplinkStatuses()). Unless I see the data side by side and returning exactly the same values, I am not comfortable assuming the APIs return the same thing, and even then, given two completely different calls, I am not sure I trust their API. If I return what Meraki returns and do not try to merge or combine it, then any issue is a Meraki issue and not an MB issue. For the naming pattern, I was trying to put the object, "cellulargateway", in the name for future debugging.
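Neither side proposes it in the thread, but one possible middle ground is to keep the two API namespaces separate while sharing the field suffixes. That way `uplink.ip` and `cellular.gateway.uplink.ip` line up without asserting that the two calls return the same data. The `withPrefix` helper and all sample values below are hypothetical.

```go
package main

import "fmt"

// withPrefix returns a copy of fields with every key placed under prefix.
func withPrefix(prefix string, fields map[string]any) map[string]any {
	out := make(map[string]any, len(fields))
	for k, v := range fields {
		out[prefix+"."+k] = v
	}
	return out
}

func main() {
	// Values from the two API calls stay in separate maps; nothing is merged.
	applianceUplink := map[string]any{"interface": "wan1", "status": "active", "ip": "192.0.2.10"}
	cellularUplink := map[string]any{"interface": "cellular", "status": "active", "ip": "198.51.100.7", "apn": "example-apn"}

	fmt.Println(withPrefix("uplink", applianceUplink))
	fmt.Println(withPrefix("cellular.gateway.uplink", cellularUplink))
}
```

Under this scheme only the suffixes that currently differ between the two lists (for example `primary_dns` versus `dns1`) would need a naming decision.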

tommyers-elastic · 2024-09-12T15:30:53Z

x-pack/metricbeat/module/meraki/device_health/device_network_health_channel_utilization.go

+					networkHealthUtilizations = append(networkHealthUtilizations, networkHealthUtilization)
+				}
+
+			}


nit: we can exit the loop (break) once we have completed this block
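The nit can be illustrated with a trimmed-down version of the loop. The `network` type and `wirelessNetworkIDs` function are hypothetical; the real code calls the Meraki client inside the matching branch instead of collecting IDs.

```go
package main

import "fmt"

// network is a hypothetical, trimmed-down network record.
type network struct {
	ID           string
	ProductTypes []string
}

// wirelessNetworkIDs returns the ID of every network that lists the
// "wireless" product type, visiting each network at most once.
func wirelessNetworkIDs(networks []network) []string {
	var ids []string
	for _, n := range networks {
		for _, productType := range n.ProductTypes {
			if productType == "wireless" {
				ids = append(ids, n.ID)
				break // one match is enough; stop scanning this network's product types
			}
		}
	}
	return ids
}

func main() {
	networks := []network{
		{ID: "N_1", ProductTypes: []string{"appliance", "wireless", "switch"}},
		{ID: "N_2", ProductTypes: []string{"switch"}},
	}
	fmt.Println(wirelessNetworkIDs(networks))
}
```

Besides skipping the remaining product types, the `break` guarantees the per-network API call cannot run twice for the same network.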

gpop63 · 2024-09-13T10:32:33Z

x-pack/metricbeat/module/meraki/device_health/device_appliance_uplink_status_and_ha.go

+								"uplink.loss_latancy.ip":           lossLatencyMetric.IP,
+								"@timestamp":                       lossLatency.Timestamp,
+								"uplink.loss_latancy.loss_percent": lossLatency.LossPercent,
+								"uplink.loss_latancy.latency_ms":   lossLatency.LatencyMs,


Suggested change

-								"uplink.loss_latancy.ip":           lossLatencyMetric.IP,
-								"@timestamp":                       lossLatency.Timestamp,
-								"uplink.loss_latancy.loss_percent": lossLatency.LossPercent,
-								"uplink.loss_latancy.latency_ms":   lossLatency.LatencyMs,
+								"uplink.loss_latency.ip":           lossLatencyMetric.IP,
+								"@timestamp":                       lossLatency.Timestamp,
+								"uplink.loss_latency.loss_percent": lossLatency.LossPercent,
+								"uplink.loss_latency.latency_ms":   lossLatency.LatencyMs,

mergify · 2024-09-17T15:28:48Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b dans_final_factor upstream/dans_final_factor
git merge upstream/main
git push upstream dans_final_factor

inital meraki module and metricsets

d6e356a

DanH-Semplicity requested a review from a team as a code owner August 31, 2024 05:28

DanH-Semplicity requested review from faec and VihasMakwana August 31, 2024 05:28

botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Aug 31, 2024

mergify bot assigned DanH-Semplicity Aug 31, 2024

shmsr self-requested a review September 2, 2024 10:06

shmsr added the enhancement label Sep 2, 2024

shmsr changed the title ~~Enhancement - Add to Metricbeat the Meraki Module~~ x-pack/metricbeat/meraki: Add new module Sep 2, 2024

shmsr changed the title ~~x-pack/metricbeat/meraki: Add new module~~ x-pack/metricbeat/module/meraki: Add new module Sep 2, 2024

tommyers-elastic requested changes Sep 4, 2024

Merge branch 'main' into dans_final_factor

c048f77

DanH-Semplicity marked this pull request as draft September 5, 2024 17:48

DanH-Semplicity added 2 commits September 6, 2024 20:06

Merge branch 'elastic:main' into dans_final_factor

2986d09

initial refactor for single meraki metricset device_health

9fe235e

DanH-Semplicity added 2 commits September 7, 2024 13:29

added tunnel support aka VPN support by Device

53ae668

adding interfaces aka switch ports and switch port status

c122fc8

mergify bot added the backport-v8.x label Sep 10, 2024

DanH-Semplicity added 2 commits September 10, 2024 14:21

Merge branch 'elastic:main' into dans_final_factor

729094b

processing review comments

0c5b27f

mergify bot added the backport-8.x label (Automated backport to the 8.x branch with mergify) Sep 11, 2024

v1v removed the backport-v8.x label Sep 11, 2024

DanH-Semplicity and others added 5 commits September 11, 2024 13:57

refactored for comments

0ffeed9

Merge branch 'elastic:main' into dans_final_factor

a9f809b

fixing default metricset

9223215

Removing unused variables and adding text to required variables

d2e5848

add go module deps

d438a84