x-pack/metricbeat/module/meraki: Add new module #40669
❌ Author of the following commits did not sign a Contributor Agreement: Please, read and sign the above mentioned agreement if you want to contribute to this project |
This pull request doesn't have a |
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
I was asked to do an initial drop of the code tonight; however, the code is not ready to be merged yet. I still have to work through unit tests, coding style, more testing on prod, etc.
@DanH-Semplicity : Thanks for the PR! Also, do you have access to Buildkite to review the errors? For guidance on coding style and best practices, I recommend checking out the vSphere module, as that is being actively worked on.
/test
@ishleenk17 Thanks for looking into this PR. Going forward, will this module be owned by @elastic/obs-infraobs-integrations, similar to the vSphere module you mentioned earlier for implementation best practices? If so, this PR should contain a CODEOWNERS entry like so as well: Line 100 in b11b86a
|
this is great progress.
i think a lot of the comments here are around consistency in metadata field names which would be solved by joining all this data in a single metricset called something like "device_health". this way we can combine all per-device metrics in a single place, and have just one call to GetOrganizationDevices per collection loop (same goes for uplink metrics/statuses). this should greatly reduce code complexity and result in fewer API calls. in addition this would remove a tonne of boilerplate and repeated code.
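A rough sketch of what that consolidation could look like, in case it helps the refactor. Everything here is hypothetical (the `Device` and `Event` types, the `collector` signature, and `collectDeviceHealth` are illustrative names, not the Meraki SDK or the module's actual code); the only point is that devices are listed once per collection loop and every per-device collector enriches the same set of events.

```go
package main

import "fmt"

// Device is a hypothetical, trimmed-down stand-in for a device record
// returned by the Meraki API.
type Device struct {
	Serial string
	Name   string
}

// Event is a flat key/value metric event.
type Event map[string]any

// collector adds fields to the per-device events, keyed by device serial.
// Hypothetical signature, for illustration only.
type collector func(org string, events map[string]Event) error

// collectDeviceHealth lists devices once, seeds one event per device with
// shared metadata, then lets each collector enrich those same events.
func collectDeviceHealth(
	org string,
	listDevices func(org string) ([]Device, error),
	collectors ...collector,
) (map[string]Event, error) {
	devices, err := listDevices(org)
	if err != nil {
		return nil, fmt.Errorf("listing devices for org %s: %w", org, err)
	}

	events := make(map[string]Event, len(devices))
	for _, d := range devices {
		events[d.Serial] = Event{
			"device.serial": d.Serial,
			"device.name":   d.Name,
		}
	}

	for _, collect := range collectors {
		if err := collect(org, events); err != nil {
			return nil, err
		}
	}
	return events, nil
}
```

With this shape, uplink status, performance score, and similar lookups each become one `collector`, and the metadata field names are set in exactly one place.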
release: beta
description: >
  appliance_uplink_overview
fields:
we eventually need to include the proper mappings here and throughout
Thanks, I thought they were getting auto-updated, I will make sure they are mapped.
x-pack/metricbeat/module/meraki/appliance_uplink_overview/appliance_uplink_overview.go
x-pack/metricbeat/module/meraki/appliance_uplink_overview/appliance_uplink_overview.go
...ck/metricbeat/module/meraki/appliance_uplink_status_and_ha/appliance_uplink_status_and_ha.go
...ck/metricbeat/module/meraki/appliance_uplink_status_and_ha/appliance_uplink_status_and_ha.go
...cbeat/module/meraki/network_health_channel_utilization/network_health_channel_utilization.go
x-pack/metricbeat/module/meraki/wireless_device_channel_utilization/types.go
...eat/module/meraki/wireless_device_channel_utilization/wireless_device_channel_utilization.go
...eat/module/meraki/wireless_device_channel_utilization/wireless_device_channel_utilization.go
x-pack/metricbeat/module/meraki/appliance_uplink_overview/appliance_uplink_overview.go
Converting to draft while I refactor the code into a single metricset.
That's right, @elastic/obs-infraobs-integrations would become the code owners.
I am still working on the code, but I lost power for 5 hours this afternoon, so I wanted to commit what I had completed so far. I am still working through the review comments, and I still have to add two more Meraki metric integrations (interfaces and tunnels), fix fields.yml, and so on, but I wanted to get a code drop in, in case I lose power again.
x-pack/metricbeat/module/meraki/device_health/device_appliance_uplink_status_and_ha.go
x-pack/metricbeat/module/meraki/device_health/device_network_appliance_vpn_sitetosite.go
metric["network.health.channel.radio.wifi0.utilizationNon80211"] = wifi0.UtilizationNon80211
metric["network.health.channel.radio.wifi0.utilizationTotal"] = wifi0.UtilizationTotal
wifi0_encountered = true
metrics = append(metrics, metric)
are these append calls here and line 86 supposed to be here as well as the call on line 90?
oh i see it's if neither of these blocks was entered
if we don't have utilization metrics (i.e. neither of these blocks were entered), why bother reporting any metric events at all?
If you look at lines 46 to 60, it is possible the API returned some network health data but the lists being looped over were nil. That was my logic: in this specific case there is very little data when the for loops have nothing to iterate over. I actually used this pattern in a few places where the API sometimes returns a little, or a lot of, data ahead of a looping structure.
There are some cases where it is probably warranted, because several values are returned even though the for loop may be nil. If you search for _encountered you can see a few other spots.
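To make the two options in this thread concrete, here is a minimal sketch. It assumes a hypothetical `Radio` sample type and a hypothetical `buildEvents` helper (neither is from the PR); it emits one event per sample, and the "report anyway" fallback is an explicit switch instead of per-loop `_encountered` flags, since `len(events) == 0` already says whether any loop body ran.

```go
package main

// Radio is a hypothetical stand-in for one per-radio utilization sample.
type Radio struct {
	Name             string
	UtilizationTotal float64
}

// copyFields returns a shallow copy so events never share the base map.
func copyFields(src map[string]any, extra int) map[string]any {
	dst := make(map[string]any, len(src)+extra)
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// buildEvents returns one event per radio sample. When there are no
// samples it returns nothing, unless reportEmpty is set, in which case it
// falls back to a single event carrying only the base metadata.
func buildEvents(base map[string]any, radios []Radio, reportEmpty bool) []map[string]any {
	events := make([]map[string]any, 0, len(radios))
	for _, r := range radios {
		ev := copyFields(base, 2)
		ev["radio.name"] = r.Name
		ev["radio.utilization_total"] = r.UtilizationTotal
		events = append(events, ev)
	}
	if len(events) == 0 && reportEmpty {
		events = append(events, copyFields(base, 0))
	}
	return events
}
```

Whichever way the discussion lands, keeping the decision in one parameter (or one config option) makes it easy to apply the same behaviour everywhere the pattern is used.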
for _, network := range *networks {
	for _, product_type := range network.ProductTypes {
		if product_type == "wireless" {
			networkHealthUtilization, res, err := client.Networks.GetNetworkNetworkHealthChannelUtilization(network.ID, &meraki_api.GetNetworkNetworkHealthChannelUtilizationQueryParams{})
we are still pulling one day's worth of data each time we run this - we should only pull data for the current collection period.
we talked offline a little about simplifying things here too, to ensure we only ever get one bucket per call (by specifying a maximum collection period no greater than the resolution of these metrics), were you able to try it out to verify it behaves as expected?
Before I resolved this comment we were looking for additional input; I believe you had @-mentioned someone and they never got back to us. I was looking for guidance on 10 minutes versus 60 minutes, and since there was no input I left it at the defaults. I can switch to a static 3600 seconds (1 hr) if you want, so it matches the wireless default.
}

if score, ok := devicePerformanceScores[serial]; ok {
	if score.HttpStatusCode == 204 {
would much prefer to just not report metrics if there's no data
lol ... I was like crap, I thought I fixed it ... I had already removed it at the data capture, so there will never be a 204 here, but since this is dead code, I will remove it.
ok, now fixed both locations. Will be in next code drop.
	metrics = append(metrics, metric)
}

if !port_encountered {
same question here as above, if there's no data here, should we bother reporting?
I am honestly 50/50. If the Meraki API responded with info, it seems like we should return what they sent. They could respond with 6 metrics and the loops empty; perhaps that will never happen, or it is not really pertinent info. Some calls return 2, 3, or 6 values. And assuming I got loss/latency working correctly now, I need to keep it for that one, because I combined two things there.
reportNetworkHealthChannelUtilization(reporter, org, devices, networkHealthUtilizations)

// Get and Report Organization Wireless Devices Channel Utilization
wireless_res, wireless_err := m.client.Devices.GetOrganizationWirelessDevicesChannelUtilizationByDevice(org, &meraki_api.GetOrganizationWirelessDevicesChannelUtilizationByDeviceQueryParams{})
this is still getting 7 days' worth of data every time
I believe the wireless devices returns only 1 hr (3600 seconds) by default ... https://developer.cisco.com/meraki/api-v1/get-organization-wireless-devices-channel-utilization-by-device/
See my comment on the network endpoint: should we change that one from 1 day to 3600 seconds, so the two are the same?

for _, item := range *uplink.Uplinks {
	metrics = append(metrics, mapstr.Union(metric, mapstr.M{
		"cellular.gateway.uplink.apn": item.Apn,
do these need to be named differently from other uplink fields, or are the MG uplinks a distinct concept?
AFAICT the MG uplink metadata is just a superset of the other uplink fields (except 'ip_assigned_by')
"uplink.interface"
"uplink.status"
"uplink.ip"
"uplink.gateway"
"uplink.public_ip"
"uplink.primary_dns"
"uplink.secondary_dns"
"uplink.ip_assigned_by"
------------------
"cellular.gateway.uplink.interface"
"cellular.gateway.uplink.status"
"cellular.gateway.uplink.ip"
"cellular.gateway.uplink.gateway"
"cellular.gateway.uplink.public_ip"
"cellular.gateway.uplink.dns1"
"cellular.gateway.uplink.dns2"
"cellular.gateway.uplink.apn"
"cellular.gateway.uplink.connection_type"
"cellular.gateway.uplink.iccid"
"cellular.gateway.uplink.model"
"cellular.gateway.uplink.provider"
"cellular.gateway.uplink.signal_stat.rsrp"
"cellular.gateway.uplink.signal_stat.rsrq"
"cellular.gateway.uplink.signal_type"
Fair point, I agree the fields look the same. However, I have already seen cases where Meraki has two distinct API calls that both return an IP address, and the values are not the same. In this case the calls also appear to be quite distinct, going to completely different code trees (client.Appliance.GetOrganizationApplianceUplinkStatuses() and client.CellularGateway.GetOrganizationCellularGatewayUplinkStatuses()). Unless I see the data side by side and it is exactly the same, I am not comfortable assuming the APIs return the same values, and even then, given two completely different calls, I am not sure I trust their API. If I return what Meraki returns and do not try to merge or combine it, then any issue is a Meraki issue and not a Metricbeat issue. As for the naming pattern, I was trying to put the object, "cellulargateway", in the name to help future debugging.
			networkHealthUtilizations = append(networkHealthUtilizations, networkHealthUtilization)
		}

	}
nit: we can exit the loop (break) once we have completed this block
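A small sketch of the early exit, assuming a hypothetical `Network` type with an ID and a list of product types (illustrative only, not the SDK's response type). Breaking out of the inner loop after the first "wireless" match also guards against processing the same network twice if the product type were ever listed more than once.

```go
package main

// Network is a hypothetical, trimmed-down stand-in for a network record.
type Network struct {
	ID           string
	ProductTypes []string
}

// wirelessNetworkIDs returns the IDs of networks that include the
// "wireless" product type, visiting each network at most once.
func wirelessNetworkIDs(networks []Network) []string {
	var ids []string
	for _, n := range networks {
		for _, productType := range n.ProductTypes {
			if productType == "wireless" {
				ids = append(ids, n.ID)
				break // one match per network is enough
			}
		}
	}
	return ids
}
```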
"uplink.loss_latancy.ip": lossLatencyMetric.IP, | ||
"@timestamp": lossLatency.Timestamp, | ||
"uplink.loss_latancy.loss_percent": lossLatency.LossPercent, | ||
"uplink.loss_latancy.latency_ms": lossLatency.LatencyMs, |
Suggested change:
- "uplink.loss_latancy.ip": lossLatencyMetric.IP,
- "@timestamp": lossLatency.Timestamp,
- "uplink.loss_latancy.loss_percent": lossLatency.LossPercent,
- "uplink.loss_latancy.latency_ms": lossLatency.LatencyMs,
+ "uplink.loss_latency.ip": lossLatencyMetric.IP,
+ "@timestamp": lossLatency.Timestamp,
+ "uplink.loss_latency.loss_percent": lossLatency.LossPercent,
+ "uplink.loss_latency.latency_ms": lossLatency.LatencyMs,
This pull request is now in conflicts. Could you fix it? 🙏
|
Proposed commit message
Added a Cisco Meraki module with several metricsets to Metricbeat.
Added the meraki module to x-pack/metricbeat/module/meraki
Added Metricsets to meraki module:
Please explain:
WHAT: metricsets for monitoring cisco meraki
WHY: Improve metricbeat to harvest more monitoring observable metrics
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
None to my knowledge.
Author's Checklist
How to test this PR locally
I had access to a local meraki system, and was able to test many of the metricsets.
Related issues
N/A
Use cases
N/A
Screenshots
N/A
Logs
N/A