-
Notifications
You must be signed in to change notification settings - Fork 13
[cleanup] Common blocks related to VFAT sync checking #163
Comments
I'm wondering if a similar function/functor couldn't be used for automatic VAT mask determination instead of using a However, I think it should be possible to drop the VFAT masks parameter altogether without sacrificing the functionalities. If the VFATs are included into a scan only if both the
Since the number of scanned VFATs wouldn't be known in advance, the calling code would have to be resilient enough. I think it is already the case. Regarding you explicit question, I believe the VFAT synchronization should checked once outside of the loop in the general case. If the synchronization is lost during the scan, an exception (i.e. failing read or write) would be thrown anyway. |
@lpetre-ulb there's such function: Line 34 in c80e012
and Lines 19 to 32 in c80e012
The vfatmask parameter is required in case you want to force it (e.g. scan only certain chips instead of all of them, which may lead to much longer scan time or to code crash because of bad sync) |
@mexanick, indeed. However, this does not mean that automatic VFAT mask is impossible. My proposal was to implement a new RPC method like this one: std::uint32_t getVFATMask::operator()(std::uint32_t ohMask) const; which does the following:
With this proposal, the scans would be performed only on properly synchronized VFATs which are in run mode. If a script needs to force a VFAT mask, it could use a |
I totally agree with your proposal of new function/functor. In some sort this functionality is already implemented as I pointed above (+ some stuff in calibration routines), but the implementation could be sparse and inconsistent. I.e. my point was that new development isn't required, but rather refactoring is needed. |
I want to flesh out three scenarios:
Whatever solution we end up with will need to be able to handle (at least) these modes. |
I must admit that I focused on the software design more than on the ASIC itself. I also don't know if setting the VFATs out of run mode actually protect them. However, my proposal is to scan only, but all, the VFATs in run mode and without synchronization error (that should be implied). So, the proposal matches your requirement. One of the issues with turning on and off the VFATs is that this must be done very carefully within the RPC modules. The VFATs should be in the state state at the end of a scan. Another issue is that related to scripts and/or procedures involving multiple scans. What would be the rationale for turning on and off the VFATs between each scan? Since it was shown that the noise level changes when you set in and off of run mode the VFATs, I think it is preferable to keep them in run mode during the trimming for example. Brian also observed unexpected behavior after setting a VFAT in run mode: ctp7_modules/src/calibration_routines.cpp Line 1213 in c80e012
So, I believe this should be the responsibility of the script and/or online software. In fact, this is part of the bigger question of the front-end state management. For me, the RPC scan functions should modify as little as possible the front-end state and always return it configured as it was before the scan. In my opinion, it is the easiest and safest way to track the state of the front-end and keep it consistent between scans. A scan RPC method is not a configuration RPC method. Therefore, the methods could be split into two sets: the ones which modify the front-end state (
Okay, so follow-up question on the production scenario. What should happen if a VFAT is not in a good state? There is already one VFAT with the issue at P5; hopefully it will be resolved 🤞. But, can it be guaranteed that this will never happen after commissioning? Should the behavior be best effort or crash the scan (as soon as possible)? |
I vote for "best effort" - we don't want to loose a precious operations (and completely valid information we are getting from "good-behaving" parts of the front-end) because some (hopefully small) portion of the system went into a wrong state. |
For production, we will be operating under the assumption that if we say a VFAT is working, then for all intents and purposes, the VFAT should work, and if there is a problem, an error condition should be raised. I.e., if we're configuring for global data taking and one of the VFATs is not in sync, either we take an automatic action to correct this (we should be considering this), or we go to an error state, such that an updated configuration could be made to mask this VFAT. We definitely want to avoid the situation where the SW has a problem with some HW and just carries on, ignoring that HW, even if that information finds its way into a log somewhere. These sorts of operational and functional requirements need to be considered at every step in the design process. N.B. also during operations, we'll be getting periodic |
I fully agree on the automatic recovering. However, should the system go to an error state in case, for example, a single VFAT looses its synchronization? Ot should it be only notified to the shifter/expert while continuing to take data. One could consider to use a best effort strategy, while reporting any error to the relevant people and moving to a degraded/error state. I'm sure there are already many parameters other than VFAT synchronization to monitor during operation. Automatically masking the VFATs at the RPC module level is maybe not the way to go in this case. However, I'm convinced that the calling software should be able to retrieve which VFAT had an issue. Then, it could mask the VFAT (if it is the appropriate action) and move to an error/degraded state while continuing to take data with the remaining healthy system.
Hence, my question. ;) I also believe there is not a single good answer. For example, in the CTP7 firmware, a VFAT is automatically masked from the DAQ event builder if it is detected as out-of-sync. This avoids timeouts which reduce the bandwidth and possibly other damaging consequences (e.g. broken event building). Regarding the implementation of your production vs test stand requirement, I guess that we cannot make the RPC modules stateful and that we should also avoid using any configuration file? Therefore, the simplest way is probably to not perform any automatic VFAT mask determination in the RPC module (and maybe not even a synchronization check) and rely on the higher level software to implement the desired behavior. There would be 3 possible parameter sets:
I've excluded ( |
Brief summary of issue
As started in #160 and expanded in #162, this is another cleanup operation, related to the following types of block (updated for new framework, but functionally unchanged):
I was looking to see if this entire check could be moved into an updated
vfatSyncCheck(ohN, vfatMask)
, however, we will probably always wantvfatSyncCheck
to return the syncMask, even in the event that some VFATs are not sync'd.This means that we can't
throw
the exception fromvfatSyncCheck
since we need the return value (goodVFATs
).If this is the case, then we still would have to put this check into every function that wants to make the promise that it operated on all specified VFATs...
The second place this type of check shows up is inside loops over the VFATs
Here I would raise the question of whether the call to
vfatSyncCheck
needs to be inside the loop or not (though it may vary on a case-by-case basis, depending on what is happening inside the loop)Types of issue
The text was updated successfully, but these errors were encountered: