PR #33 with changed vbv-bufsize causes stuttering on RPi 3 #478
If the libx264 encoder is configured to use >4 slices, the existing bufsize (bitrate / framerate) is set too low, causing quality degradation and preventing the encoder from attaining the target bitrate. A previous commit set this to a larger value (bitrate / 2), but this seemed to cause issues with the RPi 3 hardware decoder; see LizardByte#478. Ref: https://trac.ffmpeg.org/wiki/Limiting%20the%20output%20bitrate "Specifying too small -bufsize would cause ffmpeg to degrade the output image quality, because it would have to (frequently) conform to the limitations and would not have enough of a free space to use some optimizations (for example, optimizations based on the frame repetitions and similar), because the buffer would not contain enough frames for the optimizations to be effective."
If the libx264 encoder is configured to use >4 slices, the existing bufsize (bitrate / framerate) is set too low, causing quality degradation and preventing the encoder from attaining the target bitrate. A previous commit set this to a larger value (bitrate / 10), but this seemed to cause issues with the RPi 3 hardware decoder; see LizardByte#478. Ref: https://trac.ffmpeg.org/wiki/Limiting%20the%20output%20bitrate "Specifying too small -bufsize would cause ffmpeg to degrade the output image quality, because it would have to (frequently) conform to the limitations and would not have enough of a free space to use some optimizations (for example, optimizations based on the frame repetitions and similar), because the buffer would not contain enough frames for the optimizations to be effective."
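To put these bufsize formulas in perspective, here is a rough, purely illustrative sketch (not Sunshine code) that expresses each one as a number of average-sized frames at the 12.5 Mbps / 60 FPS setting used in this issue. It assumes bitrate and bufsize are both in bits, as with libavcodec's `bit_rate` and `rc_buffer_size`, and that every frame is average-sized; real frames vary a lot (keyframes, scene changes), which is why a one-frame buffer leaves the encoder so little room.

```python
def buffer_in_frames(bufsize_bits: float, bitrate_bps: float, fps: float) -> float:
    """Number of average-sized frames that fit in a VBV buffer of bufsize_bits."""
    per_frame_bits = bitrate_bps / fps  # average size of one frame at this bitrate
    return bufsize_bits / per_frame_bits


BITRATE = 12_500_000  # 12.5 Mbps, the client setting used in this issue
FPS = 60

# bufsize formulas mentioned in the commit messages
candidates = {
    "bitrate / framerate": BITRATE / FPS,
    "bitrate / 10": BITRATE / 10,
    "bitrate / 2": BITRATE / 2,
}

for name, bufsize in candidates.items():
    frames = buffer_in_frames(bufsize, BITRATE, FPS)
    seconds = bufsize / BITRATE  # how long the buffer lasts at the target bitrate
    print(f"{name:>20}: {bufsize / 1000:8.1f} kbit, {frames:4.1f} frames, {seconds * 1000:5.1f} ms")
```

In these terms, bitrate / framerate holds exactly one average frame (about 16.7 ms at 60 FPS), bitrate / 10 holds six frames (100 ms), and bitrate / 2 holds thirty frames (500 ms).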
Hi, can you check whether #482 fixes your issue? Keep in mind that the PR also reverts the change to the bitrate calculation, so you may need to decrease the bitrate in your client to compensate. I've set the PR to draft status, so I'm not sure whether it will produce build artifacts you can test with if you can't run a local build.
I'm running the PR now; it will generate artifacts that can be used for testing.
Thanks for the prompt proposal. I'm not sure, though: are there bitrate and rc_buffer_size values that fit all needs?
My understanding of … I would suggest that you test 0.11.1 vs. this PR on a Windows or Linux desktop client configured to use the same bitrate as your RPi 3, while monitoring the average incoming bandwidth. It's possible that 0.11.1 works better simply because you are not attaining the client-requested bitrate you're expecting (which is what the … I have a headless RPi 3 that I can make available for testing, but it would help if you could tell me more about your setup (e.g. Raspberry Pi OS version, whether you're using the 64-bit or legacy build, whether you run Moonlight Embedded from the desktop or as a RetroPie packaged build, etc.). If you can run a build, perhaps try …
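As a rough way to do that comparison, the sketch below turns two readings of a cumulative byte counter into an average bitrate and compares it with the bitrate requested in the client. Where the counter comes from is an assumption left to you: for example, the received-bytes column for the network interface in `/proc/net/dev` on a Linux client, or the sent-bytes counter of the host's network adapter. The function names are illustrative only.

```python
def average_mbps(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Average throughput in Mbit/s between two readings of a byte counter."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def fraction_of_target(measured_mbps: float, target_kbps: int) -> float:
    """Measured bitrate as a fraction of the client-requested bitrate (in kbps)."""
    return measured_mbps / (target_kbps / 1000)


# Example: 6,250,000 bytes received over 10 seconds with 12500 kbps requested
measured = average_mbps(0, 6_250_000, 10.0)
print(f"{measured:.1f} Mbps, {fraction_of_target(measured, 12_500):.0%} of target")  # 5.0 Mbps, 40% of target
```

An interface counter also includes audio, input, and any other traffic on that interface, so treat the result as an upper bound for the video stream alone.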
I've done more testing. With 16 threads (worst-case scenario), capturing 1080p @ 12.5 Mbps / 60 FPS:
I tested other intermediate values, and the throttling effect occurs to some degree from 1.1x to 1.4x, so I believe this is the best we can do. Since the 1.5x value seems to get us closer to the target bitrate than 2.0x without noticeable throttling in the worst case of 16 threads, I've updated the PR. If you still have problems with the new version of the PR, keep in mind that your actual bitrate on 0.11.1 could have been peaking as low as ~7 Mbps (depending on your thread count), so you may still have to reduce the bitrate in your client further.
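The bufsize selection discussed here can be sketched roughly as follows. This is a hypothetical helper, not Sunshine's code: the >4 slice condition comes from the commit message and the 1.5x factor from the comment above, while the function name, the `software` flag, and the integer rounding are assumptions, and Sunshine's actual condition may differ.

```python
def choose_rc_buffer_size(bitrate_bps: int, fps: int, slices: int, software: bool) -> int:
    """Hypothetical helper: VBV buffer size in bits for a CBR stream.

    The baseline is one average frame (bitrate / fps). For software (libx264)
    encoding with more than 4 slices, the buffer is enlarged to 1.5 frames.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if software and slices > 4:
        # 1.5x one frame, kept in integer math: bitrate * 3 / (fps * 2)
        return bitrate_bps * 3 // (fps * 2)
    return bitrate_bps // fps


if __name__ == "__main__":
    # 12.5 Mbps at 60 FPS; treating the 16 threads tested above as 16 slices is an assumption
    print(choose_rc_buffer_size(12_500_000, 60, slices=16, software=True))  # 312500
    print(choose_rc_buffer_size(12_500_000, 60, slices=4, software=True))   # 208333
```

Integer arithmetic avoids float rounding; under these assumptions the buffer grows from 208,333 to 312,500 bits at 12.5 Mbps / 60 FPS once the slice count exceeds 4.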
I have also done some testing with https://github.com/LizardByte/Sunshine/actions/runs/3415758658 and was able to achieve the same smooth stream as with 0.11.1.
Have you tried monitoring bandwidth to confirm the real average incoming bitrate on 0.11.1? That would be a good first step. Regardless, I agree that the Pi 3 should be capable of much better performance. Perhaps your client is not using hardware decoding for some reason?
I was not able to monitor the bandwidth directly on the RPi 3 due to a lack of tools in the Recalbox buildroot. However, from my Windows 10 host I see ~5 Mbps upload to the RPi 3 on both 0.11.1 and the DEV versions. CPU utilization on the RPi 3 is ~2% in both cases.
Before closing the issue, I have one more question:
Please don't close the issue. We will close it automatically when it's resolved. For now, this fix is not even part of the Sunshine code base.
This issue has been fixed and will be available in the next release.
Is there an existing issue for this?
Is your issue described in the documentation?
Describe the Bug
Moonlight Embedded 2.5.3 runs smoothly on my Raspberry Pi 3 with Sunshine 0.11.1 using the following settings:
The default bitrate for 1080p@60fps is 20000 kbps, which causes delays on my RPi 3 client, whereas 12500 kbps gives a good picture and minimal delay in sound and controls.
With Sunshine 0.12.0 through 0.15.0, I observe stuttering every few seconds, usually when the scene changes or a button is pressed.
I suspect the issue is caused by #33.
I do not see this stuttering on other clients, such as Moonlight Qt on Windows 7 at 1080p, 60 fps, and 20000 kbps.
Looking at the changes in that PR, there might be some bitrate starvation or an overshoot in rc_buffer_size:
Before:
Result:
After:
Result:
Could the above changes to the formula cause the stuttering I observe?
Would it be possible to add options to tune the above parameters for such constrained scenarios?
Expected Behavior
No negative impact after the PR.
Additional Context
Client:
Recalbox 7.2.2
Raspberry Pi 3
Moonlight Embedded 2.5.3
Host Operating System
Windows
Operating System Version
Windows 10 21H2 (Patched 10/2022)
Architecture
64 bit
Sunshine commit or version
0.15.0
Package
Windows - portable
GPU Type
none (software encoding)
GPU Model
UHD 750
GPU Driver/Mesa Version
31.0.101.3790
Capture Method (Linux Only)
No response
Relevant log output
Not sure if there is anything relevant.