This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

Draw loop should not be destroying and recreating framebuffers each frame #20

Open

ghost opened this issue Oct 13, 2017 · 7 comments

ghost commented Oct 13, 2017

In Tutorial04 onwards, the render loop destroys and recreates framebuffers every frame. Creation and destruction of objects in Vulkan are not lightweight operations - they are expected to have a sizeable runtime cost, and indeed do on some implementations.
In fact, this is documented in the Vulkan spec itself, in the second paragraph here: https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#fundamentals-objectmodel-lifetime

The offending lines are here in Tutorial04, for instance:

```cpp
bool Tutorial04::CreateFramebuffer( VkFramebuffer &framebuffer, VkImageView image_view ) {
```

These objects should be created once and re-used, the same as all other objects in a robust Vulkan application.
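
To make the concern concrete, the pattern amounts to something like this (a generic sketch, not the tutorial's literal code; all names are illustrative):

```cpp
#include <vulkan/vulkan.h>

// The per-frame pattern being criticized: the framebuffer is destroyed and
// rebuilt on every draw, even though the render pass, extent and swapchain
// image view are usually unchanged from one frame to the next.
void DrawFrame( VkDevice device, VkRenderPass render_pass, VkImageView image_view,
                VkExtent2D extent, VkFramebuffer &framebuffer ) {
  if( framebuffer != VK_NULL_HANDLE ) {
    vkDestroyFramebuffer( device, framebuffer, nullptr );
  }

  VkFramebufferCreateInfo create_info = {};
  create_info.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
  create_info.renderPass      = render_pass;
  create_info.attachmentCount = 1;
  create_info.pAttachments    = &image_view;
  create_info.width           = extent.width;
  create_info.height          = extent.height;
  create_info.layers          = 1;
  vkCreateFramebuffer( device, &create_info, nullptr, &framebuffer );

  // ... record command buffer, submit, present ...
}
```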

Ekzuzy (Collaborator) commented Oct 13, 2017

Well, that depends. Creation and destruction of framebuffers in each frame is done on purpose:

  • as far as I know, framebuffers are not so heavy, so the cost of their creation is not too big (or at least it is much lower than, for example, for pipelines)
  • we may not know up front what images/render targets will be required for a render pass, so creating a framebuffer just before it is used is much easier
  • there may be many different combinations of images/render targets involved in rendering, so we may need to create plenty of framebuffer objects
  • if swapchain images are used as render targets (which is quite common), every time the swapchain is recreated we also need to recreate all the framebuffers in which they are used

All this leads to the most important factor: creating framebuffers up front makes the code much harder to maintain, read and understand. We can't create all the framebuffers at the initialization phase, because some of them depend on the swapchain. And we also need to recreate the framebuffers that use swapchain images every time the swapchain is recreated. But we don't need to recreate all of them (though we can, of course). Such code isn't too good for a tutorial, so this was the main reason I didn't do it.

But of course, if we want to improve the performance of our application, we should create all the resources we can before the main rendering loop. We just can't forget that this is a trade-off between performance and the factors mentioned above (code readability, complexity, maintainability).

ghost (Author) commented Oct 13, 2017

Firstly, the assumption that it's cheap probably holds for desktop GPUs, but on mobile it doesn't quite work - framebuffer creation is expensive on at least Imagination's implementation, and for good reason. I highly suspect that other mobile vendors have similarly high creation costs for similar underlying reasons.

That said, I don't necessarily take issue with you lazily creating framebuffers as you need them - that seems completely reasonable, albeit not ideal. It's more the fact that you completely destroy one and then allocate a new one - in the tutorials you are mostly just recreating the same framebuffer over and over again. In your tutorials, for instance, it seems you're just creating framebuffers for the swapchain, and the swapchain images are largely static unless you resize the window, which happens far less often than you draw a new frame.

A simple fix for this would be to only destroy and recreate the framebuffers when the swapchain signals that there are new swapchain images (something that is an explicit return code IIRC?).
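
Roughly (a minimal sketch; the surrounding state and the RecreateSwapchainAndFramebuffers() helper are hypothetical, assuming one framebuffer was created per swapchain image when the swapchain was created):

```cpp
#include <vector>
#include <vulkan/vulkan.h>

// Assumed state, created once at initialization (hypothetical names):
extern VkDevice                   Device;
extern VkSwapchainKHR             Swapchain;
extern VkSemaphore                ImageAvailableSemaphore;
extern std::vector<VkFramebuffer> Framebuffers;
bool RecreateSwapchainAndFramebuffers();

// Minimal sketch: framebuffers are kept across frames and only rebuilt when
// vkAcquireNextImageKHR reports that the swapchain itself is out of date.
bool Draw() {
  uint32_t image_index;
  VkResult result = vkAcquireNextImageKHR( Device, Swapchain, UINT64_MAX,
                                           ImageAvailableSemaphore,
                                           VK_NULL_HANDLE, &image_index );
  if( result == VK_ERROR_OUT_OF_DATE_KHR ) {
    // The swapchain images are gone - this is the only point where the
    // framebuffers that reference them need to be destroyed and recreated.
    return RecreateSwapchainAndFramebuffers();
  }
  // (VK_SUBOPTIMAL_KHR still allows presenting; many apps recreate afterwards.)

  // Reuse the framebuffer created once for this swapchain image.
  VkFramebuffer framebuffer = Framebuffers[image_index];

  // ... record the command buffer using 'framebuffer', submit, present ...
  return true;
}
```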

If you have anything more dynamic and complex, then a general purpose solution would be some sort of "framebuffer cache", because for the most part you're causing a lot of redundant work to happen on some implementations.

Ekzuzy (Collaborator) commented Oct 13, 2017

I agree with You. But still, I do it to simplify the code - this approach makes the code much cleaner and easier to read and understand. I just create framebuffers and destroy them after several frames. It also fits the presented way of managing a number of independent "virtual frames". Recreating them along with the swapchain, or adding even more advanced logic, for example to check whether such a framebuffer already exists, isn't necessary to explain the presented topics (especially at this rather low level of complexity). It just clutters the code and draws the reader's attention away from more important things.

But You just brought up an interesting idea. Not for the tutorials, as in those I just want to explain the very basics of the Vulkan API. But for other use cases, the mentioned cache sounds really handy. For example, a cache storing an arbitrary number (e.g. 20) of framebuffers used during rendering. If we need a 21st framebuffer, the least recently used one just gets destroyed. Is this something You were thinking about?
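
For illustration, a very rough sketch of such a cache (simplified: the key here is just the image view; a real cache would also have to key on the render pass, all attachments and the dimensions):

```cpp
#include <list>
#include <unordered_map>
#include <vulkan/vulkan.h>

// Very rough LRU framebuffer cache, capped at MaxEntries entries.
class FramebufferCache {
public:
  FramebufferCache( VkDevice device, size_t max_entries = 20 )
    : Device( device ), MaxEntries( max_entries ) {}

  VkFramebuffer Get( VkImageView view, VkRenderPass render_pass, VkExtent2D extent ) {
    auto found = Entries.find( view );
    if( found != Entries.end() ) {
      // Cache hit: mark as most recently used and reuse the existing framebuffer.
      Lru.splice( Lru.begin(), Lru, found->second.LruPosition );
      return found->second.Framebuffer;
    }
    if( Entries.size() >= MaxEntries ) {
      EvictLeastRecentlyUsed();
    }
    VkFramebufferCreateInfo create_info = {};
    create_info.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    create_info.renderPass      = render_pass;
    create_info.attachmentCount = 1;
    create_info.pAttachments    = &view;
    create_info.width           = extent.width;
    create_info.height          = extent.height;
    create_info.layers          = 1;
    VkFramebuffer framebuffer = VK_NULL_HANDLE;
    vkCreateFramebuffer( Device, &create_info, nullptr, &framebuffer );
    Lru.push_front( view );
    Entries[view] = { framebuffer, Lru.begin() };
    return framebuffer;
  }

private:
  void EvictLeastRecentlyUsed() {
    VkImageView victim = Lru.back();
    // The GPU must no longer be using this framebuffer (e.g. wait on the fence
    // of the frame that last used it) before it can be destroyed.
    vkDestroyFramebuffer( Device, Entries[victim].Framebuffer, nullptr );
    Entries.erase( victim );
    Lru.pop_back();
  }

  struct Entry {
    VkFramebuffer                    Framebuffer;
    std::list<VkImageView>::iterator LruPosition;
  };

  VkDevice                               Device;
  size_t                                 MaxEntries;
  std::list<VkImageView>                 Lru;
  std::unordered_map<VkImageView, Entry> Entries;
};
```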

ghost (Author) commented Oct 16, 2017

> I agree with You. But still, I do it to simplify the code - this approach makes the code much cleaner and easier to read and understand.

I guess that's reasonable, but with that being the case, would you mind putting in a note about how it's just done for clarity of the rest of the code, and is definitely not best practice? We had a developer thinking this was just something they should do in their app, or was reasonable to do, with hefty performance consequences.

> But for other use cases, the mentioned cache sounds really handy. For example, a cache storing an arbitrary number (e.g. 20) of framebuffers used during rendering. If we need a 21st framebuffer, the least recently used one just gets destroyed. Is this something You were thinking about?

That'd be the simplest way to do it, though with knowledge of when the swapchain is changed you could probably do something more fixed - i.e. create and destroy a set of framebuffers when the swapchain is modified. In an app following some notion of "best practices" for Vulkan, dynamic object creation should be both extremely limited and done on a separate thread, negating the need for such a cache.
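
For the swapchain case, that could look roughly like this (a sketch; names are illustrative, not taken from the tutorials):

```cpp
#include <vector>
#include <vulkan/vulkan.h>

// Sketch: one framebuffer per swapchain image, rebuilt only when the swapchain
// itself is recreated (window resize, VK_ERROR_OUT_OF_DATE_KHR, etc.), not per frame.
void RecreateSwapchainFramebuffers( VkDevice device, VkRenderPass render_pass,
                                    VkExtent2D extent,
                                    std::vector<VkImageView> const &image_views,
                                    std::vector<VkFramebuffer> &framebuffers ) {
  // Destroy the framebuffers that referenced the old swapchain images.
  for( VkFramebuffer framebuffer : framebuffers ) {
    vkDestroyFramebuffer( device, framebuffer, nullptr );
  }
  framebuffers.assign( image_views.size(), VK_NULL_HANDLE );

  // Create one framebuffer per new swapchain image view.
  for( size_t i = 0; i < image_views.size(); ++i ) {
    VkFramebufferCreateInfo create_info = {};
    create_info.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    create_info.renderPass      = render_pass;
    create_info.attachmentCount = 1;
    create_info.pAttachments    = &image_views[i];
    create_info.width           = extent.width;
    create_info.height          = extent.height;
    create_info.layers          = 1;
    vkCreateFramebuffer( device, &create_info, nullptr, &framebuffers[i] );
  }
}
```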

Ekzuzy (Collaborator) commented Oct 16, 2017

Information about the consequences of such an approach, and the reason why it was chosen for the tutorial, definitely wouldn't hurt ;-). I will add a note about it somewhere in the tutorial.
But!

> We had a developer thinking this was just something they should do in their app, or was reasonable to do, with hefty performance consequences.

I've talked about it with several people and most of them agreed that such an approach is reasonable. But this may come from the fact that we (or at least I) were thinking more about desktop GPUs. I don't have any experience with mobile development or mobile GPUs, so I didn't know that the impact of such an approach may be much more severe on mobile platforms.

I will add a paragraph about it in the 4th tutorial to clarify things a bit. Thanks for pointing this out!

szihs commented Oct 18, 2017

Hello Tobias,
I agree that object creation/destruction should be kept out of the performance-critical path.
However, I am interested to know why desktop and mobile GPUs would differ on, say, this particular API:

> Firstly, the assumption that it's cheap probably holds for desktop GPUs, but on mobile it doesn't quite work - framebuffer creation is expensive on at least Imagination's implementation, and for good reason. I highly suspect that other mobile vendors have similarly high creation costs for similar underlying reasons.

Is it related to TBDR vs. immediate-mode rendering? If so, would you be kind enough to explain?

ghost (Author) commented Oct 18, 2017

> Is it related to TBDR vs. immediate-mode rendering? If so, would you be kind enough to explain?

Kind of. I mean, these days all modern GPUs do tiling to some degree, but if we are talking about "traditional" tilers (e.g. PowerVR, Mali), these have to set up an acceleration structure for the binned geometry at some point - which may be done either at framebuffer setup or render pass creation. There are additional things that other drivers may do which I don't fully understand - e.g. I believe Qualcomm has a moderate-to-high cost for framebuffer setup, and I'm not 100% sure why. But it almost certainly boils down to the fact that "it's not just doing a straightforward render".
