-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rendered MIDI file's audio export files are too long #1408
Comments
Yes, the culprit is #1159. Extending the playing in fluid_midi.c by (Code snippet from fluid_midi.c of fluidsynth 2.3.1)
causes the problem. Since 2.3.1 there have been further changes in this area which need to be validated. The demo MIDI file Dance.mid is not only causing the problem. I tested other MIDI files, too. For all the tested MIDI files the duration is not properly determined. Some are 12 seconds longer, some 18 seconds, some even 1/2 minute like Dance.mid. |
For the MIDI file Dance.mid I did a deep dive in fluid_player_callback. Here are my findings: At FLUID_PLAYER_DONE: msec = 94000 = 1:34 which is the correct end of the MIDI file Looks like that the criteria "no voices active" isn't the right criteria to keep player running. My 2 cents... |
For completeness because the link above to download the MIDI file Dance.mid is off-line meanwhile. Dance.mid can be downloaded as part of the GeneralUse GS 2.0.0 package from here. |
The issue is a general fluidsynth MIDI Player one not only relevant for file export. The basic MIDI Player from here (w/o reverb and chorus) plays for 2:05 long and not for 1:36. File rendering for audio export just makes the bug obvious. |
I found some time to look into this. The problem is that a sample "Standard Tom 5" of the General User v2 soundfont is playing for about 35 seconds in release phase. The change introduced in #1159 consciously waits for all voices to stop before quitting playback. But since the release phase is so long, it takes a while to get there. Unfortunately, I don't see a quick solution to this. A proper way to deal with this would be to measure the amplitude of the generated audio at the very end, and if it falls below a certain amount, send an ALL_SOUND_OFF event to the synth. Also, a few general thoughts: I cannot recommend using fluidsynth's MIDI player in any serious software. Originally, the player was intended for quick audio rehearsals from the commandline. It has grown since then. Various contributors have made changes, yet there is not a single unit test for it. |
Thank you for your precise analysis. Changing something the the soundfont General User v2 won't probably help because there may be other soundfont with a similar behavior.
I understand. Therefore I recommend to add a similar statement ...
... either here or here or another adequate place. I leave it up to you @derselbst either to close #1408 or leave it open. |
One possible solution to this would be to add a configurable The GeneralUser GS Standard Tom 5 is 2.3 seconds long and has a loop at the end. For melodic presets, I would handle this scenario by using the hold envelope phase for 2.3 seconds and then the decay phase would fade the loop out afterward. However, for percussion presets, many MIDI files use zero-length notes, so any and all sound shaping happens in the release phase. Too short of a release phase, and the sample fades out too quickly and gets buried in a mix. The 40 second release time on the Standard Tom 5 is to keep that primary 2.3 seconds of sample playing reasonably loudly, but is mostly spent on a barely audible loop. This doesn't just cause a delay to the end of the audio file rendering, but also unnecessarily chews up polyphony during song playback. I will look at a more elegant solution to this problem for the next GeneralUser GS revision. |
Thanks Chris for the insights. I can absolutely comprehend your approach.
The contributor who first brought up this issue and had the same idea. I was and I am still against introducing additional parameters, because they don't "fix" this issue. I.e. you either end up ending the player too early or too late. It can and should be solved dynamically, depending on the generated amplitude at the very end. |
I honestly don't know what the original reporter expects fluidsynth to do. what would you define as the duration of a MIDI file?
Now this approach begs the question: what is defined as silence?Perceived silenceAccording to @ReinholdH, it should not last until there actually is silence, but instead until it's inaudible. Because, when zooming into the waveform, you can see that: But that approach of killing a voice when it's not actually silent would lead to problems: This MIDI file is super silent. It uses velocities around 10 for the subtle piano and strings, and to actually hear it properly, you have to set the gain to around 50 (!). Playing this file with the stock gain leads to pretty much no sound, when that's actually not the case. SF2 silenceSF2 clearly defines what silence is. See section 9.1.7:
Fluidsynth follows that and stops the voice when such attenuation occurs, i.e. after the 40 seconds, hence the long export. So this is not a bug IMO. Fluidsynth correctly waits for the voices to be silent in the sf2 terms and then ends the file. It's up to the external implementation to trim the file until the user thinks that it's silent. |
Of course, a final solution needs to work for super silent midi files with low velocities as well. Here is the motivation for the report of this issue: The images below show the duration of the song Dance.mid from here for various soundfont. The length of the song strongly depends on the used soundfont. We can now debate whether this is a bug nor not. All in all, in turns down to what the purpose of the fluidsynth's MIDI player is. If the objective of the fluidsynth's MIDI player is to validate and test the behavior of a certain combination of a MIDI file with a soundfont, whether it fulfills the sf2 spec etc, in a kind of clean room then, of course, the observation is not a bug. But if the objective of the fluidsynth's MIDI player is to support musicians in their various musical activities independent of what the soundfonts and settings are then a dependency of the duration of a song on the used soundfont is an issue. No end user (musician) understands this dependency. No musician understands that when a song is inaudible any longer that the song is not yet finished. For example an applause to a rock song starts earlier than to a classical symphony. But also a classical symphony is finished when it is inaudible for humans. Whether the observation is a bug or enhancement I don't care but wanted to bring up my observation after #1159 was introduced. (btw. #1159 in a playlist only effects the last MIDI file). IMO the statement
would mean that a musician has to modify his/her song (MIDI file) based on what soundfont is used in fluidsynth, and may modify the song for other soundfonts or external MIDI devices. I am of an opposite opinion. As for now I agree to the statement of @derselbst that
I hope my report can help to further improve fluidsynth (which is a brilliant and professionally managed software - many thanks to @derselbst and everybody who contributes)'s MIDI player. |
In my case, I always need to edit the rendered audio anyway to trim out silence at the beginning and adjust volume, so also having to trim the end is a non-issue to me. |
A proper way of addressing this issue might be the following:
The advantage of using something like libebur128 would be that multiple loudness measurements are available out of the box. One could easily tweak around to see what works best. The obvious disadvantage would be yet another dependency. In case that's not desired, one might duplicate the logic of whatever loudness metrics has been decided for in the end. |
FluidSynth version
fluidsynth 2.3.1 and higher
Describe the bug
Rendered MIDI file's audio export files are too long
Expected behavior
This is a regression which came in with fluidsynth 2.3.1. The MIDI file Dance.mid is 1:36 long. But with fuidsynth 2.3.1 onwards an audio export file is 2:05 long. Between 1:36 and 2:05 there is silence.
At a first glance my gut feeling tells me that the regression came in with #1159 in fluidsynth 2.3.1 but I need to further investigate.
Steps to reproduce
Render the file Dance.mid from here into a wav file. Listen the exported wav files "dance - 16bit.wav" and "dance - 32bit.flac" from here
Additional context
Between 1:36 and 2:05 there is silence. In previous fluidsynth versions before 2.3.1, the exported wav files have the correct duration. The demos were created by es20490446e as part of #1405
The text was updated successfully, but these errors were encountered: