-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add flapping detection algorithm to link traps #244
Add flapping detection algorithm to link traps #244
Conversation
The original code has an impressive amount of redundancies when it comes to event generation. I hope to factor those out once I'm closing in on feature parity. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/link-traps-reschedule #244 +/- ##
=================================================================
+ Coverage 98.88% 98.92% +0.04%
=================================================================
Files 51 52 +1
Lines 6882 7056 +174
=================================================================
+ Hits 6805 6980 +175
+ Misses 77 76 -1 ☔ View full report in Codecov by Sentry. |
d2bf37d
to
687c565
Compare
Quality Gate passedIssues Measures |
79dfe92
to
5275908
Compare
acf44fe
to
a65efaf
Compare
This is an initial reinterpretation of what Zino 1 is doing to track flapping states, and is still subject to change. This suggests keeping a separate model for flapping state tracking, so that algorithm methods can be concentrated in that model. Flapping states could also potentially be tracked directly on Port model objects, but we'll try this version first.
These have been implemented in the FlappingStates class instead.
Zino 1 uses it, so we need it too.
a65efaf
to
405339a
Compare
f47a6f4
to
ef688f8
Compare
For quick value lookups where there may be no pre-existing data (in which case we return 0 values)
The logic is more or less kept from Zino 1, although I don't yet understand the concept where an interface can be considered flapping even if we still have no flap tracking data for it...
When flapping stops, the final flap count must be recorded in the event.
This is used by Zino 1, but was not yet allowed in the Zino 2 models
405339a
to
f047b38
Compare
This roughly follows the Zino 1 implementation, but is not yet feature complete.
If there is no flapping state tracked for an interface, it is definitely not considered to be flapping yet. Let other parts of the code initialize flap tracking data for an interface once traps start coming in. This came after discussions with Håvard E about the original Tcl code, which works on a confusing set of global variables.
After talking to Håvard E to understand the intent and flow of the original code better, I felt the need to add some comments and to simplify some logic.
Imports that are only necessary for type annotations should be guarded.
The original Zino code will sometimes alter already closed events. Apparently, the reasoning goes that operators sometimes prematurely close events for flapping ports (which Håvard E argues that we should maybe disallow in the future). The original Zino therefore takes care to update both a closed flapping event and the currently open flapping event whenever the link trap code handles a flapping state. A limitation I immediately see is that multiple closed events could exist for a flapping state (until they are expired/scavenged), but we can only index one of them. As with the original code, there is a bias towards only indexing the latest event. As a side note: Zino 1 used the terms "pre-closed" and "closed" for events. The former refers to events that have been given a state of CLOSED by an operator. The latter refers to events that have been evicted completely from the running state.
As the original Zino does.
Resolved after a discussion with Håvard. When in the middle of a flap storm, we should not commit the event on every update of the flap counter, as that would also create a storm of update notifications to connected clients. This just modifies the existing event object in-place.
This was resolved with Håvard. ifDescr or ifAlias is not normally sent with a trap, and Håvard can think of no good reason for needing to keep a trap-specific mapping of ifindexes to ifdescrs or ifaliases. Which means we can keep using our existing mappings and ignore these comments.
We could sometimes get a trap from a device that has been identified in the running state, but that has been removed from the polldevs config. There is no point to scheduling a poll from such a device.
Tuples as dict keys cannot be serialized to JSON by Pydantic without custom logic. I.e. it serializes to a string representation, with tuple elements separated by commas, but Pydantic is unable to deserialize this representation as a tuple, causing Zino to crash when loading stored state. This annotates the PortIndex type with modifiers that let's Pydantic successfully serialize and deserialize PortIndex dict keys to/from JSON
Flapping state stats were constantly being reset every time an event was updated, due to a bug in the implementation. This renames `FlappingState.flapping` in an attempt to clarify that this is a flag used to indicate that a flapping state has been declared (i.e. an event has likely been created), while the `is_flapping()` method is only used to verify that the current stats qualify as flapping and that a flapping state should/could be established.
Updates/creates docstrings and cleans up a few small things after increased understanding of what the original flapping code is actually tracking.
f047b38
to
1ac3b25
Compare
All existing flap stats should be aged periodically, to ensure flapping states are cleared after traps stop coming in.
The loop may delete FlappingState objects from the iterated dict, which would raise an Exception.
47bcece
to
50340a1
Compare
50340a1
to
241b45e
Compare
We will need it on other test modules that are upcoming.
This factors out `stabilize_flapping_state()` from `age_single_interface_flapping_state()` in order to separate concerns and make things more testable.
The submitted interface argument may not have any flap stats, but there is no need for the implementation to crash if it doesn't. It's safe to quietly ignore it, as this means the interface is already "unflapped".
ea6926e
to
a3132e1
Compare
This belongs in the `FlapState` class itself.
It may be handy to let the caller access the created/updated FlapState object as soon as it has updated it.
Now that the logic is better understood, this factors out the link transition code that updates events for interfaces in either a flapping state or a non-flapping state. There were some redundancies that were copied directly from Zino 1, these have been factored out. Also, the code was made slightly more readable by fetching the associated `FlapState` objects if they exist. Additionally, this fixes a discovered problem in the first version of the code, in where a transition out of a flapping state wasn't handled properly, because the port was looked up incorrectly in the index of flapping states.
Tests more of the behavioral aspects of the link transition handler, focusing on how it handles events and flapping states.
Incomplete implementation of flapping detection algorithms for link traps.
This closes #144 (and completes #232)
This builds on #241 and is dependent on that being merged first, keeping in Draft mode until then.