Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x-pack/metricbeat/module/meraki: Add new module #40669

Closed
wants to merge 18 commits into from

Conversation

DanH-Semplicity
Copy link
Contributor

Proposed commit message

Added Cisco Meraki module with several metricsets to metricbeat.

Added meraki module to x-pack/metricbeat/modules/meraki

Added Metricsets to meraki module:

  • device_status
  • appliance_uplink_overview
  • appliance_uplink_status_and_ha
  • cellular_gateway_uplink_status
  • device_appliance_performance_score
  • device_uplink_loss_and_latency
  • license_overview
  • network_health_channel_utilization
  • wireless_device_channel_utilization

Please explain:

  • WHAT: metricsets for monitoring cisco meraki

  • WHY: Improve metricbeat to harvest more monitoring observable metrics

  • My code follows the style guidelines of this project

  • I have commented my code, particularly in hard-to-understand areas

  • I have made corresponding changes to the documentation

  • I have made corresponding change to the default configuration files

  • I have added tests that prove my fix is effective or that my feature works

  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Nome to my knowledge.

Author's Checklist

  • [create unit tests ]
  • [run code scans]
  • [test on meraki prod system]

How to test this PR locally

I had access to a local meraki system, and was able to test many of the metricsets.

Related issues

N/A

Use cases

N/A

Screenshots

N/A

Logs

N/A

@DanH-Semplicity DanH-Semplicity requested a review from a team as a code owner August 31, 2024 05:28
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 31, 2024
Copy link

cla-checker-service bot commented Aug 31, 2024

❌ Author of the following commits did not sign a Contributor Agreement:
d6e356a, c048f77, 2986d09, 9fe235e, 53ae668, c122fc8, 729094b, 0c5b27f, 0ffeed9, a9f809b, 9223215, d2e5848, 07819cd, 998936f, c1f3dd1, bd41dd0

Please, read and sign the above mentioned agreement if you want to contribute to this project

@botelastic
Copy link

botelastic bot commented Aug 31, 2024

This pull request doesn't have a Team:<team> label.

Copy link
Contributor

mergify bot commented Aug 31, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @DanH-Semplicity? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@DanH-Semplicity
Copy link
Contributor Author

I was asked to do an initial drop on the code tonight, however the code is not ready to be merged yet. I still have to work through unit tests, coding style, more testing on prod, etc.

@ishleenk17
Copy link
Contributor

I was asked to do an initial drop on the code tonight, however the code is not ready to be merged yet. I still have to work through unit tests, coding style, more testing on prod, etc.

@DanH-Semplicity : Thanks for the PR!
To get started, please make sure to sign the CLA.

Also, do you have access to Buildkite to review the errors?

For guidance on coding style and best practices, I recommend checking out the Vsphere module as that is being actively worked upon.

@shmsr shmsr self-requested a review September 2, 2024 10:06
@shmsr shmsr changed the title Enhancement - Add to Metricbeat the Meraki Module x-pack/metricbeat/meraki: Add new module Sep 2, 2024
@shmsr shmsr changed the title x-pack/metricbeat/meraki: Add new module x-pack/metricbeat/module/meraki: Add new module Sep 2, 2024
@ishleenk17
Copy link
Contributor

/test

@ycombinator
Copy link
Contributor

@ishleenk17 Thanks for looking into this PR. Going forward, will this module be owned by @elastic/obs-infraobs-integrations, similar to the vSphere module you mentioned earlier for implementation best practices? If so, this PR should contain a CODEOWNERS entry like so as well:

/metricbeat/module/vsphere @elastic/obs-infraobs-integrations

Copy link
Contributor

@tommyers-elastic tommyers-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great progress.

i think a lot of the comments here are around consistency in metadata field names which would be solved by joining all this data in a single metricset called something like "device_health". this way we can combine all per-device metrics in a single place, and have just one call to GetOrganizationDevices per collection loop (same goes for uplink metrics/statuses). this should greatly reduce code complexity and result in fewer API calls. in addition this would remove a tonne of boilerplate and repeated code.

release: beta
description: >
appliance_uplink_overview
fields:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we eventually need to include the proper mappings here and throughout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I thought they were getting auto-updated, I will make sure they are mapped.

@DanH-Semplicity DanH-Semplicity marked this pull request as draft September 5, 2024 17:48
@DanH-Semplicity
Copy link
Contributor Author

Converting to Draft, while I refactor the code to single metricset.

@ishleenk17
Copy link
Contributor

@ishleenk17 Thanks for looking into this PR. Going forward, will this module be owned by @elastic/obs-infraobs-integrations, similar to the vSphere module you mentioned earlier for implementation best practices? If so, this PR should contain a CODEOWNERS entry like so as well:

/metricbeat/module/vsphere @elastic/obs-infraobs-integrations

Thats right, @elastic/obs-infraobs-integrations would become the codeowners.

@DanH-Semplicity
Copy link
Contributor Author

DanH-Semplicity commented Sep 7, 2024

I am still working on the code, but I lost power for 5 hours this afternoon, and so I wanted to commit what I had completed thus far. Still working on review comments and I have to add two more meraki metric integrations for interfaces and tunnels, fix fields.yml, etc, etc .... but I wanted to get code drop, in case I lose power again.

Copy link
Contributor

mergify bot commented Sep 11, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 11, 2024
@v1v v1v removed the backport-v8.x label Sep 11, 2024
metric["network.health.channel.radio.wifi0.utilizationNon80211"] = wifi0.UtilizationNon80211
metric["network.health.channel.radio.wifi0.utilizationTotal"] = wifi0.UtilizationTotal
wifi0_encountered = true
metrics = append(metrics, metric)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these append calls here and line 86 supposed to be here as well as the call on line 90?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see it's if neither of these blocks was entered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't have utilization metrics (i.e. neither of these blocks were entered), why bother reporting any metric events at all?

Copy link
Contributor Author

@DanH-Semplicity DanH-Semplicity Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at lines 46 to 60, it is possible they returned some network health, but the loops were nil. That was my logic, there is very little data in this specific case, if the for loops do not have data. I actually used this in a few locations, where there is sometimes a little or lot of data before a looping structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some case where is probably warranted due to several values being returned, but perhaps the for loop is nil ... if you search on _encountered you can see a few other spots.

for _, network := range *networks {
for _, product_type := range network.ProductTypes {
if product_type == "wireless" {
networkHealthUtilization, res, err := client.Networks.GetNetworkNetworkHealthChannelUtilization(network.ID, &meraki_api.GetNetworkNetworkHealthChannelUtilizationQueryParams{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are still pulling one days worth of data each time we run this - we should only pull data for the current collection period.

we talked offline a little about simplifying things here too, to ensure we only ever get one bucket per call (by specifying a maximum collection period no greater than the resolution of these metrics), were you able to try it out to verify it behaves as expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I resolved comment, we were looking for additional input, on comments, I believe you had @ asked someone and they never got back to us ... I was looking for guidance on 10 minutes or 60 minutes, since no input, I left it at defaults. I can switch to static 3600 seconds (1 hr) if you want so it matches the wireless default. ???

}

if score, ok := devicePerformanceScores[serial]; ok {
if score.HttpStatusCode == 204 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would much prefer to just not report metrics if there's no data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol ... I was like crap, I thought I fixed it ... and I had removed it on the data capture, so there will never be a 204 here, but since dead code, I will remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, now fixed both locations. Will be in next code drop.

metrics = append(metrics, metric)
}

if !port_encountered {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here as above, if there's no data here, should we bother reporting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am honestly 50/50 ... if the meraki api responded with info, it seems like we should return what they sent. They could respond with 6 metrics, with the loops empty, perhaps it will never happen or yea not really pertinent info. I some return 2, 3, 6 and assuming I got loss latency working correctly now, that one I need to keep it, cuz I combined two things.

reportNetworkHealthChannelUtilization(reporter, org, devices, networkHealthUtilizations)

// Get and Report Organization Wireless Devices Channel Utilization
wireless_res, wireless_err := m.client.Devices.GetOrganizationWirelessDevicesChannelUtilizationByDevice(org, &meraki_api.GetOrganizationWirelessDevicesChannelUtilizationByDeviceQueryParams{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still getting 7 days worth of data every time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the wireless devices returns only 1 hr (3600 seconds) by default ... https://developer.cisco.com/meraki/api-v1/get-organization-wireless-devices-channel-utilization-by-device/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on network, should we set that from 1 day to 3600 seconds, so these are both the same ...


for _, item := range *uplink.Uplinks {
metrics = append(metrics, mapstr.Union(metric, mapstr.M{
"cellular.gateway.uplink.apn": item.Apn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these need to be named differently from other uplink fields, or are the MG uplinks a distinct concept?

AFAICT the MG uplink metadata is just a superset of the other uplink fields (except 'ip_assigned_by')

"uplink.interface"
"uplink.status"
"uplink.ip"
"uplink.gateway"
"uplink.public_ip"
"uplink.primary_dns"
"uplink.secondary_dns"
"uplink.ip_assigned_by"

------------------

"cellular.gateway.uplink.interface"
"cellular.gateway.uplink.status"
"cellular.gateway.uplink.ip"
"cellular.gateway.uplink.gateway"
"cellular.gateway.uplink.public_ip"
"cellular.gateway.uplink.dns1"
"cellular.gateway.uplink.dns2"

"cellular.gateway.uplink.apn"
"cellular.gateway.uplink.connection_type"
"cellular.gateway.uplink.iccid"
"cellular.gateway.uplink.model"
"cellular.gateway.uplink.provider"
"cellular.gateway.uplink.signal_stat.rsrp"
"cellular.gateway.uplink.signal_stat.rsrq"
"cellular.gateway.uplink.signal_type"

Copy link
Contributor Author

@DanH-Semplicity DanH-Semplicity Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I agree the fields looks the same. However, I have already seen where meraki has two unique api calls and same ip address and it is not the same. Also in this case it does appear to be a very unique meraki API call, to completely different code trees (client.Appliance.GetOrganizationApplianceUplinkStatuses() and client.CellularGateway.GetOrganizationCellularGatewayUplinkStatuses()) ... Unless I see the data side by side and returning the exact same data, I am not sure I feel comfortable, assuming the APIs are returning the same values. And even then given two completely different calls, I am not sure I trust their API. If I return what Meraki returns and do not try to merge / combine it, then if there is an issue it is Meraki issue and not MB issue. For the naming pattern ... I was trying to do Object, "cellulargateway" in naming for future debug.

networkHealthUtilizations = append(networkHealthUtilizations, networkHealthUtilization)
}

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can exit the loop (break) once we have completed this block

Comment on lines +64 to +67
"uplink.loss_latancy.ip": lossLatencyMetric.IP,
"@timestamp": lossLatency.Timestamp,
"uplink.loss_latancy.loss_percent": lossLatency.LossPercent,
"uplink.loss_latancy.latency_ms": lossLatency.LatencyMs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"uplink.loss_latancy.ip": lossLatencyMetric.IP,
"@timestamp": lossLatency.Timestamp,
"uplink.loss_latancy.loss_percent": lossLatency.LossPercent,
"uplink.loss_latancy.latency_ms": lossLatency.LatencyMs,
"uplink.loss_latency.ip": lossLatencyMetric.IP,
"@timestamp": lossLatency.Timestamp,
"uplink.loss_latency.loss_percent": lossLatency.LossPercent,
"uplink.loss_latency.latency_ms": lossLatency.LatencyMs,

Copy link
Contributor

mergify bot commented Sep 17, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b dans_final_factor upstream/dans_final_factor
git merge upstream/main
git push upstream dans_final_factor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify enhancement needs_team Indicates that the issue/PR needs a Team:* label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants