
How to generate 3D assets with more faces? #58

Open
supersyz opened this issue Dec 12, 2024 · 30 comments
Labels
good first issue Good for newcomers

Comments

@supersyz

Hi, I really appreciate your great open-source work!
I notice the output objects have a small number of faces. How can I generate 3D assets with more faces?

@kitcheng

In the example.py and app.py files, search for the term "simplify", and set it to a fixed value of 0.

This way, it will not reduce the number of faces.

Image
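For reference, this is the simplify argument passed to the GLB export in example.py; a sketch following the repo's README, so the exact defaults in your copy may differ:

from trellis.utils import postprocessing_utils

# Setting simplify to 0 keeps every face of the extracted mesh;
# the default (~0.95) drops roughly 95% of the faces before export.
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],   # appearance (Gaussians) from pipeline.run
    outputs['mesh'][0],       # geometry from pipeline.run
    simplify=0.0,
    texture_size=1024,
)
glb.export("sample.glb")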

@JackDainzh

Better to just edit the Simplify Gradio slider in app.py so it can reach the desired minimum.
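Something along these lines; the stock slider's exact range, default, and step in app.py may differ, so treat the values here as assumptions:

import gradio as gr

# Widening the slider's minimum to 0.0 lets the UI disable simplification
# entirely; the stock slider starts well above 0.
mesh_simplify = gr.Slider(0.0, 0.98, label="Simplify", value=0.95, step=0.01)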

@visualbruno

I changed app.py: added the number of vertices, increased the sampling steps to 100, changed the mesh simplification slider range to 0–0.98, and increased the max texture size to 4096.

Image

Download my app.py file here

@realisticdreamer114514

realisticdreamer114514 commented Dec 14, 2024

@visualbruno what guidance strengths are the best for details/fidelity in the 2 stages? Do we really need 100 steps for this level of detail?

@visualbruno

> @visualbruno what guidance strengths are the best for details/fidelity in the 2 stages? Do we really need 100 steps for this level of detail?

Hi. I'm not sure whether 100 steps is better than 50; sometimes 20 is not good enough.
When the guidance strength is 10 in both stages, it follows the image more closely.

Check the screenshots:
1st screenshot: guidance strength of 0
2nd screenshot: guidance strength of 10

In the 2nd screenshot, the fidelity is very good.

Guidance Strength of 0 Guidance Strength of 10

@cjjkoko

cjjkoko commented Dec 15, 2024

> Hi. I'm not sure whether 100 steps is better than 50; sometimes 20 is not good enough. When the guidance strength is 10 in both stages, it follows the image more closely. […]

My params:
simplify: 0.7
texture_size: 2048

seed=20,
sparse_structure_sampler_params={
    "steps": 100,
    "cfg_strength": 7.5,
},
slat_sampler_params={
    "steps": 100,
    "cfg_strength": 3.5,
},

It works fine, but I'm still working on better parameters, including modifying some that aren't exposed.
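For context, a sketch of where these values plug in, following the structure of the repo's example.py (pipeline.run for the samplers, postprocessing_utils.to_glb for simplify and texture_size):

outputs = pipeline.run(
    image,
    seed=20,
    sparse_structure_sampler_params={"steps": 100, "cfg_strength": 7.5},
    slat_sampler_params={"steps": 100, "cfg_strength": 3.5},
)
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0], outputs['mesh'][0],
    simplify=0.7,
    texture_size=2048,
)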

@visualbruno

I think the biggest issue is the input picture resolution, which is resized to 518x518 in trellis_image_to_3d.py.
I tested with a higher resolution like 2058x2058, but the result was horrible. They probably trained the model at this low resolution.
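The change being described is roughly of this shape. This is an illustrative sketch only; the actual resize lives inside the pipeline's image preprocessing, and the exact lines and names in trellis_image_to_3d.py differ:

from PIL import Image

# Illustrative: the conditioning image is resized to a fixed resolution
# (518x518 by default). Raising this constant (e.g. to 770 or 2058) is the
# experiment described above.
COND_RESOLUTION = 518

def preprocess(image: Image.Image) -> Image.Image:
    return image.resize((COND_RESOLUTION, COND_RESOLUTION), Image.Resampling.LANCZOS)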

@cjjkoko

cjjkoko commented Dec 16, 2024

> I think the biggest issue is the input picture resolution, which is resized to 518x518 in trellis_image_to_3d.py. […]

Image
Image
Maybe postprocessing can give a great result.

@realisticdreamer114514

@cjjkoko What kind of postprocessing do you use?

@cjjkoko

cjjkoko commented Dec 16, 2024

> @cjjkoko What kind of postprocessing do you use?

trimesh and open3d. I use Laplacian smoothing.
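A minimal sketch of that kind of postprocessing, assuming a GLB exported by TRELLIS at sample.glb (not the exact script used here):

import numpy as np
import trimesh
import open3d as o3d

mesh = trimesh.load("sample.glb", force="mesh")

# Option 1: trimesh's Laplacian filter (smooths the vertices in place).
trimesh.smoothing.filter_laplacian(mesh, lamb=0.5, iterations=5)

# Option 2: Open3D's Laplacian filter on the same geometry.
o3d_mesh = o3d.geometry.TriangleMesh(
    o3d.utility.Vector3dVector(np.asarray(mesh.vertices, dtype=np.float64)),
    o3d.utility.Vector3iVector(np.asarray(mesh.faces, dtype=np.int32)),
)
o3d_mesh = o3d_mesh.filter_smooth_laplacian(number_of_iterations=5)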

@realisticdreamer114514

realisticdreamer114514 commented Dec 16, 2024

> trimesh and open3d

These only smooth the output meshes without trying to improve the quality during generation.
You can see that even with a high-resolution input image and a high texture size, e.g. 4096, much of the detail in the final meshes' texture map is lost when it should be preserved (I tested with some character cosplay photos and found this out). It might be better if someone with the GPU power trained/finetuned the I23D model to work at an input image resolution of, say, 770^2 or 1036^2, since, as visualbruno points out, the pipeline is set to the resolution the official model was trained on (518^2), and this downsizing might explain the detail loss.

@cjjkoko

cjjkoko commented Dec 16, 2024

> These only smooth the output meshes without trying to improve the quality during generation. […]

Yes, but there is no specific date for the training.

@realisticdreamer114514

> there is no specific date for the training

Even at the current default resolution, the official I23D checkpoint seems quite undertrained (not sure if that's the right way to put it), so it doesn't adhere to the input image closely enough and tends to distort details that are still clear after downscaling. Finetuning on this framework can't come soon enough...

@cjjkoko

cjjkoko commented Dec 18, 2024

> Even at the current default resolution, the official I23D checkpoint seems quite undertrained, so it doesn't adhere to the input image closely enough. […]

Emmm, I expect big breaking updates in the next release. At this point, you can only adjust the seed to fit each image; I am currently doing this and it is very painful.

@visualbruno

I played with many parameters, like the input image resizing, the number of sampling steps, the texture resolution, and the "number of views" used in postprocessing, so I modified all the main scripts and app.py to expose these parameters.

The best result I got is with:

  • Input image resized to 770 instead of 518.
  • Number of sampling steps: 500
  • Texture resolution: 2048
  • Postprocessing "number of views": 120 instead of 100 (it removes some of the artifacts on the texture)

For sure, with these values, it takes much more time to generate the model.

I tried to increase the input image resolution to 1036 and above, but the results were worse.
I tried 800 and 1000 sampling steps, but that did not improve the result either.
A texture resolution of 4096 is not better than 2048.
I tried 200 for the "number of views" in postprocessing; it did not improve the texture much and the rendering time was multiplied by 10.

I tested with 2D anime pictures and they never render very well, probably because this kind of picture is flat and lacks relief and depth.
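For orientation, a sketch of where those four knobs live, assuming the layout of example.py and postprocessing_utils.py; the input resize and the view count are not exposed as arguments by default, so those two edits go directly into the source files:

# Sampling steps (guidance strengths left at their defaults).
outputs = pipeline.run(
    image,   # the 770x770 resize means editing the hard-coded 518 in the pipeline
    sparse_structure_sampler_params={"steps": 500},
    slat_sampler_params={"steps": 500},
)

# Texture resolution, with simplification disabled to keep all faces.
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0], outputs['mesh'][0],
    simplify=0.0,
    texture_size=2048,
)

# The postprocessing "number of views" is the nviews argument of
# render_multiview inside postprocessing_utils (100 by default); raising it
# to 120 means editing that call directly.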

With Marlin from Seven Deadly Sins, Input Picture:
Image
Result:
Image

With Cleopatra, Input Picture:
Image
Result:
Image

With Knight, Input Picture:
Image
Result:
Image

@QuantumLight0

> I played with many parameters, like the input image resizing, the number of sampling steps, the texture resolution, and the "number of views" used in postprocessing. […]

I believe the model is undertrained for anime models. Anime models have flat normals and no depth, so I believe a model needs to be trained on anime models' faces in order for it to understand the faces. I do believe, however, that multidiffusion has potential in this area, and I will provide a sample of why I think so.
https://github.com/user-attachments/assets/bd2bf57b-0c55-42ad-b28c-2f8b2e5d84f9
Image
Image
Image

@visualbruno

@QuantumLight0 What parameters did you use to generate this model?

@QuantumLight0

> @QuantumLight0 What parameters did you use to generate this model?
I set everything to max

Image

@visualbruno

@QuantumLight0 I did not see that they updated the repository with the multi-image algorithm. I will play with it.
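A minimal sketch of the multi-image path, assuming the run_multi_image entry point from the repo's multi-image example (check example_multi_image.py for the exact signature and modes):

from PIL import Image

images = [
    Image.open("front.png"),
    Image.open("side.png"),
    Image.open("back.png"),
]
outputs = pipeline.run_multi_image(
    images,
    seed=1,
    sparse_structure_sampler_params={"steps": 12, "cfg_strength": 7.5},
    slat_sampler_params={"steps": 12, "cfg_strength": 3.0},
)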

@cjjkoko

cjjkoko commented Dec 25, 2024

Any new breakthroughs ?

@YuDeng added the good first issue label Dec 25, 2024
@visualbruno

visualbruno commented Dec 29, 2024

> Any new breakthroughs?

Hi @cjjkoko,
After many tests, I find that it's great for 3D modeling of almost anything, but the texturing is not excellent because it lacks a lot of detail.
For the moment, it's very good if you need to generate any kind of object, house, or plant, but not for people or characters.

Plant:
Input Picture:
Image
3D (Simplify = 0.0):
Image
Video Sample:

sample.mp4

House:
Input Picture:
Image
3D (Simplify = 0.0):
Image
Video Sample:

sample.mp4

Chest:
Input Picture:
Image
3D (Simplify = 0.5):
Image
Video Sample:

sample.mp4

@cjjkoko

cjjkoko commented Dec 30, 2024

> Hi @cjjkoko. After many tests, I find that it's great for 3D modeling of almost anything, but the texturing is not excellent because it lacks a lot of detail. […]

There is no doubt that Trellis's framework is great; it may be that the training dataset sample is too small. I am now looking for a way to do secondary processing. And for human processing, I'm using open-source solutions like PSHuman.

@visualbruno

Hi @cjjkoko

About PSHuman, are you on Windows? I tried to install it, but it crashes when I install the requirements.

@cjjkoko

cjjkoko commented Dec 31, 2024

> Hi @cjjkoko. About PSHuman, are you on Windows? I tried to install it, but it crashes when I install the requirements.

No, I run it on Ubuntu.

@randall-peakey-com

For those wanting higher-quality textures, I suggest looking in postprocessing_utils.py.

Find the line:
observations, extrinsics, intrinsics = render_multiview(app_rep, resolution=1024, nviews=100)
and change it to:
observations, extrinsics, intrinsics = render_multiview(app_rep, resolution=texture_size, nviews=100)

Obviously, note that your GPU memory usage will grow quite a bit.
I had to find a few places to del objects + torch.cuda.empty_cache(), and reduce the number of nviews, to get 4K.
I'm sure there is a bunch more that could be optimized, but this worked for my hardware and needs:
nviews = 100 if texture_size < 4096 else 35
observations, extrinsics, intrinsics = render_multiview(app_rep, resolution=texture_size, nviews=nviews)
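The cleanup pattern mentioned above looks like this; where exactly to place it inside postprocessing_utils.py depends on your edits, so treat the variable names as illustrative:

import torch

# Once the rendered views have been consumed by the texture bake, drop the
# big tensors and let PyTorch return cached CUDA blocks to the allocator.
del observations, extrinsics, intrinsics
torch.cuda.empty_cache()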

@SWAPv1

SWAPv1 commented Dec 31, 2024

Can you share example images from before and after the higher-quality texture settings?

@cjjkoko

cjjkoko commented Jan 1, 2025

> For those wanting higher-quality textures, I suggest looking in postprocessing_utils.py. […]

I tried resolution=2048 and nviews=360, but the results didn't improve significantly.

@visualbruno

> For those wanting higher-quality textures, I suggest looking in postprocessing_utils.py. […]
> I tried resolution=2048 and nviews=360, but the results didn't improve significantly.

I played a bit with these parameters, and I did not find any improvement from increasing the "view resolution" to 2048 instead of the default 1024.

I have a 3080 16 GB; with this parameter and the "number of views", everything is put in VRAM, so with a resolution of 1024 I can run 150 views, but with a resolution of 2048 I can only run 50 views.

I also played with the number of optimization steps in the "bake_texture" function in postprocessing_utils.py. By default, it is 2500.
I found that below 1000 the quality is worse, but above 2000 there is no difference.

I think we can stick to a texture_resolution of 2048 and a view_resolution of 1024.
I will continue to play with the number of views and the number of optimization steps.
Image

@cjjkoko

cjjkoko commented Jan 1, 2025

> I played a bit with these parameters, and I did not find any improvement from increasing the "view resolution" to 2048 instead of the default 1024. […]

Yes. So, for now, I've given up on tuning through parameter changes. I am currently looking for better ways to subdivide, smooth, etc., to process the model.
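A minimal sketch of that kind of secondary processing (subdivide, then smooth) with trimesh; not the exact scripts used in this thread, and UVs/textures may need re-baking afterwards:

import trimesh

mesh = trimesh.load("sample.glb", force="mesh")

# Each subdivision pass splits every triangle into four: more faces, but no
# new detail.
mesh = mesh.subdivide()

# Taubin smoothing softens faceting with less shrinkage than a plain
# Laplacian filter.
trimesh.smoothing.filter_taubin(mesh, lamb=0.5, iterations=10)

mesh.export("sample_refined.glb")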

@randall-peakey-com

For clarification, regarding the render_multiview texture_size adjustment: this isn't going to give a 2x or 4x improvement, but it will be an improvement.

I started with a 4K input image, adjusted the texture_size on render_multiview, and increased the optimization steps.
I'm using the output in VR, and even the small difference is pretty clear.
YMMV
