Overlap GCM and AES operations (IDFGH-13563) #14452
Comments
Hi @Harshal5, I'm currently using an ESP32-S3 with hardware AES and software GCM. I'm mostly interested in decryption speed, but both operations could benefit from pipelining the AES and GCM stages. Your understanding of the pipeline timing is correct.
@bryghtlabs-richard Thanks for confirming, got it!
I think the proposed … Another approach could be to save some "waiting" time during the AES operation: we start the AES operation, then start the GCM operation just before we check for AES … I am not sure whether these approaches would achieve considerable performance improvements.
Sorry, I did not mean to propose a multithreading solution, but rather pipelining the hardware AES and software GCM, like your second approach. However, I don't think we would want to check for …
Currently, the API … During the AES operation, we generate the DMA descriptor list to cover the complete "length" bytes of data instead of chunking the data into small blocks. So once we start the AES operation, the AES peripheral completes its operation over all "length" bytes and only then is ready to accept new input data. Chunking into small blocks would add the overhead of recreating the DMA descriptor list again and again.
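As a rough illustration of the descriptor overhead described above, the sketch below links a single descriptor chain over a whole buffer. The `dma_desc_t` struct, `build_desc_chain()` and the `max_chunk` limit are hypothetical simplifications, not the ESP-IDF descriptor layout or driver API. Building the chain once costs ceil(length / max_chunk) descriptors, whereas chunking per 16-byte AES block would repeat the setup for every block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DMA descriptor (NOT the real ESP-IDF layout). */
typedef struct dma_desc {
    const uint8_t *buf;    /* start of this chunk */
    size_t len;            /* bytes in this chunk */
    struct dma_desc *next; /* next descriptor, or NULL at end of chain */
} dma_desc_t;

/* Link descriptors covering `len` bytes of `data`, at most `max_chunk`
 * bytes each. Returns the number of descriptors used, or 0 if `descs`
 * (capacity `max_descs`) is too small or `len` is 0. */
size_t build_desc_chain(dma_desc_t *descs, size_t max_descs,
                        const uint8_t *data, size_t len, size_t max_chunk)
{
    assert(max_chunk > 0);
    size_t count = 0, off = 0;
    while (off < len) {
        if (count == max_descs) {
            return 0; /* not enough descriptors for this buffer */
        }
        size_t chunk = (len - off < max_chunk) ? (len - off) : max_chunk;
        descs[count].buf = data + off;
        descs[count].len = chunk;
        descs[count].next = NULL;
        if (count > 0) {
            descs[count - 1].next = &descs[count]; /* link previous to this one */
        }
        off += chunk;
        count++;
    }
    return count;
}
```

For example, with an assumed `max_chunk` of 4092, a 10000-byte buffer needs 3 descriptors (4092 + 4092 + 1816 bytes), while chunking the same buffer per 16-byte block would mean 625 separate single-block setups.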
We wouldn't want DMA descriptors per AES block, but I was hoping that the AES core's input and output could be configured separately (DMA vs. CPU access). However, I found the AES core documentation, and DMA is supported only for both input and output, or not at all (CPU input and output). So, if it's possible to gain time by overlapping, it would need to be done without DMA (which may be slower overall).
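A minimal sketch of the CPU-fed (non-DMA) overlap for GCM encryption, assuming a hypothetical AES unit that accepts one block via `aes_unit_start()` and returns it via `aes_unit_read()`. Both are software stand-ins with a toy XOR "cipher", not AES and not an ESP-IDF API. While the unit works on the counter block for block i+1, the CPU XORs block i and folds it into GHASH. Only the data phase is shown (AAD, the length block and the final tag are omitted), and a 96-bit IV with data counters starting at 2 is assumed, as in GCM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-fed AES unit (no DMA). This software mock stands in for
 * the peripheral; its "cipher" is a toy XOR with the key, NOT AES. */
typedef struct {
    uint8_t key[16];
    uint8_t result[16];
    int busy;
} aes_unit_t;

/* Write one input block and trigger the unit; returns immediately. */
static void aes_unit_start(aes_unit_t *u, const uint8_t in[16])
{
    for (int j = 0; j < 16; j++) u->result[j] = in[j] ^ u->key[j]; /* toy, not AES */
    u->busy = 1;
}

/* Wait for the unit to finish, then read its output block. */
static void aes_unit_read(aes_unit_t *u, uint8_t out[16])
{
    assert(u->busy); /* a block must be in flight */
    memcpy(out, u->result, 16);
    u->busy = 0;
}

/* GCM counter block for a 96-bit IV: IV || 32-bit big-endian counter. */
static void make_ctr(uint8_t ctr[16], const uint8_t iv[12], uint32_t n)
{
    memcpy(ctr, iv, 12);
    ctr[12] = (uint8_t)(n >> 24); ctr[13] = (uint8_t)(n >> 16);
    ctr[14] = (uint8_t)(n >> 8);  ctr[15] = (uint8_t)n;
}

/* x <- x * h in GF(2^128), bit order as in NIST SP 800-38D. */
static void gf128_mul(uint8_t x[16], const uint8_t h[16])
{
    uint8_t z[16] = {0}, v[16];
    memcpy(v, h, 16);
    for (int i = 0; i < 128; i++) {
        if (x[i / 8] & (0x80 >> (i % 8))) {
            for (int j = 0; j < 16; j++) z[j] ^= v[j];
        }
        int lsb = v[15] & 1;
        for (int j = 15; j > 0; j--) v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
        v[0] >>= 1;
        if (lsb) v[0] ^= 0xE1; /* reduce by R = 11100001 || 0^120 */
    }
    memcpy(x, z, 16);
}

/* One GHASH step over a ciphertext block: y <- (y ^ c) * h. */
static void ghash_block(uint8_t y[16], const uint8_t h[16], const uint8_t c[16])
{
    for (int j = 0; j < 16; j++) y[j] ^= c[j];
    gf128_mul(y, h);
}

/* Back to back: finish all AES-CTR blocks, then run GHASH over them. */
static void encrypt_sequential(aes_unit_t *u, const uint8_t iv[12], const uint8_t h[16],
                               const uint8_t *in, uint8_t *out, size_t nblocks, uint8_t y[16])
{
    uint8_t ctr[16], ks[16];
    for (size_t i = 0; i < nblocks; i++) {
        make_ctr(ctr, iv, (uint32_t)(i + 2)); /* data counters start at 2 */
        aes_unit_start(u, ctr);
        aes_unit_read(u, ks); /* on real hardware the CPU would idle here */
        for (int j = 0; j < 16; j++) out[16 * i + j] = in[16 * i + j] ^ ks[j];
    }
    for (size_t i = 0; i < nblocks; i++) ghash_block(y, h, out + 16 * i);
}

/* Overlapped: AES on counter block i+1 runs while the CPU finishes block i. */
static void encrypt_pipelined(aes_unit_t *u, const uint8_t iv[12], const uint8_t h[16],
                              const uint8_t *in, uint8_t *out, size_t nblocks, uint8_t y[16])
{
    uint8_t ctr[16], ks[16];
    if (nblocks == 0) return;
    make_ctr(ctr, iv, 2);
    aes_unit_start(u, ctr); /* prime the pipeline with block 0 */
    for (size_t i = 0; i < nblocks; i++) {
        aes_unit_read(u, ks); /* keystream for block i */
        if (i + 1 < nblocks) {
            make_ctr(ctr, iv, (uint32_t)(i + 3));
            aes_unit_start(u, ctr); /* unit now works on block i+1 ... */
        }
        for (int j = 0; j < 16; j++) out[16 * i + j] = in[16 * i + j] ^ ks[j];
        ghash_block(y, h, out + 16 * i); /* ... while the CPU hashes block i */
    }
}
```

Both functions produce the same ciphertext and GHASH state; the pipelined one only changes when each AES block is started relative to the GHASH work.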
I agree we should not change the behaviour of the API; the overlapping would occur only within mbedtls_gcm_update() calls.
Yes, for now the second approach mentioned above is the only one I can think of that could increase the performance of AES-GCM operations.
But I am not sure this approach is desirable, given that it intermixes modules in the code (GCM references in the AES module) and gives only a slight performance improvement. Edit: Also, revisiting the above approach: when the input and output buffers are the same, the AES and GCM operations would not be mutually exclusive, so I think the approach could fail.
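The in-place problem follows from what GHASH reads during decryption: it hashes the ciphertext, i.e. the input buffer. With a separate output buffer, AES-CTR can write plaintext while the CPU hashes the input; when input and output alias, the AES output can overwrite ciphertext that GHASH has not consumed yet. One way to keep an overlapped path safe is to detect aliasing and fall back, as in this sketch. `buffers_overlap()`, `choose_decrypt_path()` and `gcm_path_t` are hypothetical names, not part of mbedTLS or ESP-IDF, and the check is conservative: any overlap selects the sequential path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [a, a + len) and [b, b + len) share at least one byte.
 * Addresses are compared as integers because relational comparison of
 * pointers into different objects is undefined in C. */
static int buffers_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    if (len == 0) {
        return 0;
    }
    return pa < pb + len && pb < pa + len;
}

typedef enum { GCM_PATH_SEQUENTIAL, GCM_PATH_OVERLAPPED } gcm_path_t;

/* Hypothetical dispatch for GCM decryption. GHASH reads the ciphertext
 * from `input`, so it may only run while AES-CTR writes `output` if the
 * two buffers cannot alias. Any overlap selects the sequential path. */
static gcm_path_t choose_decrypt_path(const uint8_t *input, const uint8_t *output, size_t len)
{
    assert(len == 0 || (input != NULL && output != NULL));
    return buffers_overlap(input, output, len) ? GCM_PATH_SEQUENTIAL : GCM_PATH_OVERLAPPED;
}
```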
Closing this issue for now; please feel free to reopen if needed.
Is your feature request related to a problem?
I'd like my TLS transfers to be faster. My system spends a lot of time doing GCM or AES operations after the session is established.
Describe the solution you'd like.
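It may be worth looking into overlapping AES and GCM operations on ESP32 systems. Some terms: N is the number of 16-byte blocks, tAES is the time to process one block in the AES unit, and tGCM is the time for the GCM step on one block.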
Currently, each operation completes before the other starts, so the total time is N * tAES + N * tGCM.
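The sequential schedule can be written as a small cost model. This is a sketch under simplifying assumptions: each AES block costs a fixed `t_aes` (tAES above), each GCM block costs a fixed `t_gcm` (tGCM) on the CPU, and DMA setup overhead is ignored. The `timeline_t` type and helper names are hypothetical.

```c
#include <assert.h>

/* Cost model in arbitrary time units (e.g. CPU cycles). */
typedef struct {
    long cpu_now;     /* current time on the CPU timeline */
    long aes_done_at; /* time at which the in-flight AES block finishes */
} timeline_t;

/* CPU hands one block to the AES unit; it finishes t_aes later. */
static void aes_start(timeline_t *t, long t_aes) { t->aes_done_at = t->cpu_now + t_aes; }

/* CPU blocks until the AES unit reports done. */
static void aes_wait(timeline_t *t)
{
    if (t->aes_done_at > t->cpu_now) {
        t->cpu_now = t->aes_done_at;
    }
}

/* CPU spends t_gcm on one GCM block. */
static void gcm_block(timeline_t *t, long t_gcm) { t->cpu_now += t_gcm; }

/* Current behaviour: all AES blocks first, then all GCM blocks. */
static long sequential_time(long n, long t_aes, long t_gcm)
{
    assert(n >= 0);
    timeline_t t = {0, 0};
    for (long i = 0; i < n; i++) {
        aes_start(&t, t_aes);
        aes_wait(&t); /* nothing else to do while the AES unit runs */
    }
    for (long i = 0; i < n; i++) {
        gcm_block(&t, t_gcm);
    }
    return t.cpu_now;
}
```

For N = 4, t_aes = 10 and t_gcm = 7 it returns 68 = 4 * 10 + 4 * 7.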
Instead, on other systems I have been able to overlap AES and GCM block operations, so the total time is tAES + (N-1) * max(tAES, tGCM) + tGCM. This requires the CPU to feed the AES unit periodically, which adds some overhead compared to batching AES operations. It's best if the AES unit can be configured to take input from the CPU but deliver output via DMA. Then the CPU can wait for an input-ready signal for each block, and wait for the done signal only at the end of the transfer.
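The overlapped schedule can be modeled the same way. In this sketch (hypothetical names, fixed per-block costs, CPU feeding overhead ignored) the CPU starts AES on block i+1, performs GCM on block i, then waits for the AES unit, so each middle step costs max(tAES, tGCM).

```c
#include <assert.h>

/* Cost model in arbitrary time units (e.g. CPU cycles). */
typedef struct {
    long cpu_now;     /* current time on the CPU timeline */
    long aes_done_at; /* time at which the in-flight AES block finishes */
} timeline_t;

/* CPU hands one block to the AES unit; it finishes t_aes later. */
static void aes_start(timeline_t *t, long t_aes) { t->aes_done_at = t->cpu_now + t_aes; }

/* CPU blocks until the AES unit reports done. */
static void aes_wait(timeline_t *t)
{
    if (t->aes_done_at > t->cpu_now) {
        t->cpu_now = t->aes_done_at;
    }
}

/* CPU spends t_gcm on one GCM block. */
static void gcm_block(timeline_t *t, long t_gcm) { t->cpu_now += t_gcm; }

/* Overlapped schedule: AES on block i+1 runs while the CPU does GCM on block i. */
static long pipelined_time(long n, long t_aes, long t_gcm)
{
    assert(n >= 0);
    timeline_t t = {0, 0};
    if (n == 0) {
        return 0;
    }
    aes_start(&t, t_aes);
    aes_wait(&t); /* block 0 has no GCM work to hide behind */
    for (long i = 0; i < n; i++) {
        if (i + 1 < n) {
            aes_start(&t, t_aes); /* start AES on block i+1 */
        }
        gcm_block(&t, t_gcm); /* GCM on block i overlaps that AES work */
        if (i + 1 < n) {
            aes_wait(&t); /* AES block i+1 must finish before the next step */
        }
    }
    return t.cpu_now;
}
```

For N = 4, t_aes = 10 and t_gcm = 7 it returns 47 = 10 + 3 * 10 + 7, compared with 4 * 10 + 4 * 7 = 68 when the stages run back to back.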
Describe alternatives you've considered.
No response
Additional context.
I've tuned the GCM loop for Xtensa, but if this works it could save considerably more time.