Hey @PeaBrane, thanks a lot for the suggestion. A long time ago we had such a transformation in Tonic, see this commit: https://github.com/neuromorphs/tonic/blob/5e0fa64b8a78adcdfca6629c62fd9b2fceb5a24c/spike_data_augmentation/functional/st_transform.py

When it comes to augmentations, there's a reason why I only used the torchvision methods. When training networks in PyTorch, everything is based on frames, so converting to frames is a fixed step in the pipeline. Applying event-based augmentations before frames are created means that the ToFrame transform has to be re-applied after every event-based augmentation, which slows down the whole training pipeline. So for performance reasons, I stuck with torchvision, which works on frames.

However, I think it would be great to have affine transforms included again. There is actually a way to cache augmented samples if one really wanted to, and beyond augmentation, it would also be useful for general non-augmentation pipelines. So I think it would be useful, and I like the approach you suggested.
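For illustration, a minimal sketch of where such an event-level augmentation would sit relative to ToFrame in a training pipeline. RandomAffine below is the proposed transform (it does not exist in Tonic yet, hence commented out), and the dataset and bin count are just placeholders:

```python
import tonic
import tonic.transforms as T

# Placeholder dataset; any tonic dataset exposing a sensor_size attribute works.
sensor_size = tonic.datasets.NMNIST.sensor_size  # (34, 34, 2)

transform = T.Compose([
    # RandomAffine(sensor_size=sensor_size, degrees=(-10, 10)),  # proposed event-level augmentation
    T.ToFrame(sensor_size=sensor_size, n_time_bins=10),  # frames are built from the (augmented) events
])

dataset = tonic.datasets.NMNIST(save_to="./data", train=True, transform=transform)
```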
-
This is a feature request that I can potentially work on. The idea is a RandomAffine transform (analogous to the torchvision one) that performs data augmentation directly on events, combining rotation, translation, scaling, and shearing in a single operation. This has the added benefit of minimizing data loss due to boundary effects: if you do a rotation followed by a translation as separate operations, both will push some events out of bounds, whereas with a single affine transformation this out-of-bounds loss only occurs once.
Note that unlike the torchvision counterpart, this would perform the transformation directly on the events, which in some sense is simpler: it amounts to multiplying the spatial coordinates of the events by the affine matrix in homogeneous coordinates. The transformation can be applied before binning (i.e. before converting events to frames) to preserve more spatio-temporal information.
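As a rough sketch of what this could look like (my own assumptions, not a final implementation): events are a structured numpy array with "x" and "y" fields, sensor_size is (width, height), and the function name is a placeholder.

```python
import numpy as np


def affine_transform_events(events, sensor_size, angle_deg=0.0,
                            translate=(0.0, 0.0), scale=1.0, shear_deg=0.0):
    """Rotate/scale/shear event coordinates about the sensor centre, then translate."""
    cx, cy = (sensor_size[0] - 1) / 2, (sensor_size[1] - 1) / 2
    a, s = np.deg2rad(angle_deg), np.deg2rad(shear_deg)

    # Compose rotation, shear and scale into one 2x2 matrix; translation is added afterwards.
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    shear = np.array([[1.0, np.tan(s)], [0.0, 1.0]])
    mat = scale * rot @ shear

    coords = np.stack([events["x"] - cx, events["y"] - cy]).astype(float)
    new = mat @ coords
    x = new[0] + cx + translate[0]
    y = new[1] + cy + translate[1]

    # Keep only events that remain on the sensor; the boundary loss happens once.
    mask = (x >= 0) & (x < sensor_size[0]) & (y >= 0) & (y < sensor_size[1])
    events = events[mask].copy()
    events["x"] = x[mask].round().astype(events["x"].dtype)
    events["y"] = y[mask].round().astype(events["y"].dtype)
    return events
```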
In our recent work (preprint here), accepted to the CVPR 2024 AIS workshop, this transformation was central in taking our network from not being able to train at all to achieving >99% accuracy on the eye-tracking challenge. (In fact, one of the reviewers requested that we try to integrate this transformation into tonic, which is why I'm opening this discussion.) The code snippet where we apply the transformation is here; it's a bit dirty because we didn't have time to clean it up given the workshop challenge deadline.
My idea for this contribution (PR), largely following the torchvision structure:

1. add an affine.py to the functional directory,
2. add a RandomAffine class in transforms.py,
3. add some tests and, if needed, some visualizations here.
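To make point 2 concrete, a rough sketch of what the wrapper class could look like, mirroring how torchvision.transforms.RandomAffine samples its parameters (degrees, translate, scale, shear). Names and defaults are placeholders, and it delegates to the functional helper sketched above (affine_transform_events):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RandomAffine:
    """Randomly sample affine parameters and apply them directly to events."""

    sensor_size: tuple
    degrees: tuple = (-10, 10)       # rotation range in degrees
    translate: tuple = (0.1, 0.1)    # max translation as a fraction of width/height
    scale: tuple = (0.9, 1.1)        # scaling range
    shear: tuple = (-5, 5)           # shear range in degrees

    def __call__(self, events):
        angle = np.random.uniform(*self.degrees)
        tx = np.random.uniform(-self.translate[0], self.translate[0]) * self.sensor_size[0]
        ty = np.random.uniform(-self.translate[1], self.translate[1]) * self.sensor_size[1]
        scale = np.random.uniform(*self.scale)
        shear = np.random.uniform(*self.shear)
        # Delegate to the functional implementation sketched earlier.
        return affine_transform_events(
            events, self.sensor_size, angle, (tx, ty), scale, shear
        )
```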
Let me know what you think; if you think this would be useful, I'll start working on it.