Background:
IPIs are sent from one core to another on an "asnd" to a receive end-point that resides on another core.
When an IPI is triggered, the kernel "top-half" handling essentially goes through all of the associated rings. Each core has N rings associated with it (effectively N-1, one per other core), so in total there are N x N rings.
When a core receives an IPI, the top-half goes through all N-1 of its rings and processes the incoming "asnd" requests. We have pair-wise batching logic that tries to batch requests and sends an IPI only if the receiving (1-to-1) ring has <= 2 entries.
Processing an IPI: dequeue an entry from the ring, look up the receive end-point on that core, and call tcap_higher_prio (a partial order) to see if we should deliver the IPI immediately.
An obvious question: if we received an IPI from a specific core, why process all N rings? Unfortunately, on x86 there is no way to know which core sent the IPI, so we end up looking into every ring for data.
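Roughly, the mechanism looks like the sketch below. All identifiers (ipi_ring, asnd_req, arch_send_ipi, deliver, the tcap_higher_prio signature, NUM_CPU, RING_SZ) are illustrative placeholders under my reading of the description above, not the kernel's actual API.

```c
#include <stdbool.h>

#define NUM_CPU 8
#define RING_SZ 64

struct asnd_req { void *rcv_ep; };       /* handle to the receive end-point */

/* One single-producer/single-consumer ring per (sender, receiver) pair:
 * rings[src][dst] is written only by src and drained only by dst. */
struct ipi_ring {
	struct asnd_req buf[RING_SZ];
	volatile unsigned head, tail;
};
static struct ipi_ring rings[NUM_CPU][NUM_CPU];

static unsigned ring_size(struct ipi_ring *r) { return r->tail - r->head; }

static bool
ring_dequeue(struct ipi_ring *r, struct asnd_req *out)
{
	if (ring_size(r) == 0) return false;
	*out = r->buf[r->head % RING_SZ];
	r->head++;
	return true;
}

/* Placeholders for pieces outside the scope of this sketch: */
static void arch_send_ipi(int dst) { (void)dst; }                 /* platform IPI */
static bool tcap_higher_prio(void *rcv_ep) { (void)rcv_ep; return true; }
static void deliver(void *rcv_ep, bool now) { (void)rcv_ep; (void)now; }

/* Sender side ("asnd" to a remote core): enqueue on the pair-wise ring, and
 * interrupt the destination only if the ring was nearly empty -- otherwise
 * the destination's top-half will batch this entry with the ones it is
 * already draining. */
static void
asnd_remote(int src, int dst, struct asnd_req req)
{
	struct ipi_ring *r = &rings[src][dst];

	r->buf[r->tail % RING_SZ] = req;     /* (overflow handling elided) */
	r->tail++;
	if (ring_size(r) <= 2) arch_send_ipi(dst);
}

/* Receive-side top-half: x86 does not tell us which core raised the IPI, so
 * we scan all N-1 incoming rings.  Note the inner loop: each ring is drained
 * until empty, which is the unbounded behavior discussed below. */
static void
ipi_top_half(int self)
{
	struct asnd_req req;

	for (int src = 0; src < NUM_CPU; src++) {
		if (src == self) continue;
		while (ring_dequeue(&rings[src][self], &req)) {
			/* deliver immediately only if the end-point's tcap wins
			 * the (partial-order) priority comparison */
			deliver(req.rcv_ep, tcap_higher_prio(req.rcv_ep));
		}
	}
}
```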
Problem:
The top-half processing is "unbounded" and runs with interrupts disabled in a non-preemptive kernel.
The receiving core, let's say core 1, goes through each ring as described above. If it finds an entry from, say, core 0, it starts processing it. While it is processing that entry, if core 0 enqueues another request on the ring, core 1 will go on to process that entry as well. In the worst case this could go on forever, thereby:
1. Starving requests from the other cores to core 1, in this example.
2. Starving other interrupt processing and any user-level applications on core 1.
Solutions:
1. A fixed limit on how many requests are processed per top-half invocation, per core. This is probably the simplest option, but could complicate IPI batching (see the sketch after this list).
2. Limit processing not to a fixed count but to the ring's size at entry to the top-half. This is less simple, probably has more races, and makes IPI batching even more complicated.
3. If we didn't have IPI batching, we could process only one request per IPI top-half invocation. But the overhead would be too high, and because the top-half runs at the highest priority, I don't see how this solves either problem 1 or 2, even though processing a single IPI entry would be bounded!
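As a rough illustration of options 1 and 2, the top-half from the earlier sketch could bound its work either with a fixed per-ring budget or by snapshotting each ring's size on entry. This reuses the placeholder types and helpers from the first sketch, BUDGET is a made-up tuning knob, and it glosses over the interaction with the <= 2 batching check (a sender may have suppressed its IPI for entries we leave behind).

```c
/* Variant of ipi_top_half() from the earlier sketch (same rings/helpers).
 * BUDGET is a hypothetical tuning knob, not an existing kernel constant. */
#define BUDGET 8   /* option 1: fixed number of requests per ring per top-half */

static void
ipi_top_half_bounded(int self)
{
	struct asnd_req req;

	for (int src = 0; src < NUM_CPU; src++) {
		if (src == self) continue;

		/* Option 2 would instead snapshot the ring size here:
		 *   unsigned budget = ring_size(&rings[src][self]);
		 * so we only process what was pending when the top-half started. */
		for (unsigned n = 0; n < BUDGET; n++) {
			if (!ring_dequeue(&rings[src][self], &req)) break;
			deliver(req.rcv_ep, tcap_higher_prio(req.rcv_ep));
		}
		/* Entries beyond the budget stay in the ring; something (e.g. a
		 * self-IPI or a later re-check) must guarantee they get picked
		 * up, since the sender may have skipped its IPI due to batching. */
	}
}
```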
Ideas??
Thoughts (not related to the problem):
Can we somehow prioritize cores on a receiving core? Suppose real-time processing is pinned to cores 0 and 1, and best-effort processing to the remaining cores. Perhaps prioritizing core 0's ring on core 1, and core 1's ring on core 0, during IPI processing would be useful? This does mean processing requests from the other cores asynchronously. I'm not sure that's worth the complexity, and there is probably little to gain, especially with the rate-limiting mechanism we already have. (And of course there is the tcap_higher_prio check, if the RT end-point has set a higher priority, to favor other RT requests over BE requests.)
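The simplest form of this I can imagine is a per-core scan order for the rings, as in the sketch below. It reuses the placeholder rings and helpers from the first sketch; the RT-core assignment (cores 0 and 1) is just the example from the paragraph above, and the "preferred peer" logic is purely illustrative.

```c
/* Drain the preferred (RT) peer's ring before the other cores' rings.
 * Reuses rings/ring_dequeue/deliver/tcap_higher_prio from the first sketch. */
static void
ipi_top_half_prioritized(int self)
{
	/* Hypothetical preference: core 0 favors core 1 and vice versa; the
	 * best-effort cores simply look at the RT core 0's ring first. */
	int rt_peer = (self == 0) ? 1 : 0;
	struct asnd_req req;

	/* Preferred peer's ring first... */
	while (ring_dequeue(&rings[rt_peer][self], &req))
		deliver(req.rcv_ep, tcap_higher_prio(req.rcv_ep));

	/* ...then the remaining cores' rings in index order. */
	for (int src = 0; src < NUM_CPU; src++) {
		if (src == self || src == rt_peer) continue;
		while (ring_dequeue(&rings[src][self], &req))
			deliver(req.rcv_ep, tcap_higher_prio(req.rcv_ep));
	}
}
```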