BPBNet #19

Open
GanyongMo opened this issue Jan 30, 2024 · 10 comments

@GanyongMo
Contributor

Hello Henry,

I am confused about model training for BPBNet. Do you have any idea what might be causing the following issues?

I followed the command:
python3.6 train_BPXnet.py --X_is 'B' --slp 'mixedreal' --train_only_betanet
It raised the error:
File "../lib_py/tensorprep_lib_bp.py", line 151, in prep_reconstruction_gt
x[im_ct, start_map_idx, :, :] = dat['mesh_depth'][entry].astype(np.float32)
KeyError: 'mesh_depth'

Also, I skipped the first step, ran the second and third steps, and then followed the fourth step for BPBNet:
python train_BPXnet.py --X_is 'B' --mod 2 --slp 'mixedreal' --v2v
It raised the error:
File "../lib_py/tensorprep_lib_bp.py", line 170, in prep_reconstruction_input_est
x[im_ct, start_map_idx, :, :] = dat['pimg_est'][entry].astype(np.float32)
KeyError: 'pimg_est'

The program seems to read the dataset correctly, but I don't know what I missed. Is it necessary to get the BodyPressureSD addendum dataset (148G) to set up training for BPBNet?
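
Both tracebacks come from indexing `dat` with a key it does not contain. A minimal sketch of a guard that reports which expected keys are absent before tensor preparation starts; the helper name `missing_keys` and the toy `dat` dictionary are hypothetical, and only the key names `mesh_depth` and `pimg_est` come from the tracebacks above:

```python
def missing_keys(dat, required):
    """Return the required keys that are absent from the loaded data dict."""
    return [key for key in required if key not in dat]


# Toy stand-in for a loaded data split (hypothetical contents).
dat = {"images": [[0.0]]}

absent = missing_keys(dat, ["mesh_depth", "pimg_est"])
if absent:
    # Listing the available keys shows what the loader actually produced.
    print("missing keys:", absent, "| available:", sorted(dat))
```

Running a check like this right after the data is loaded would show whether the keys are missing from the files themselves or are dropped somewhere between loading and `prep_reconstruction_gt`.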

Thanks in advance.

Best Regards,
Ganyong

@henryclever
Contributor

Hi Ganyong,

I'll have to try this again and repro your issue to see - I haven't looked at this code in some time and don't have enough information to know exactly what is causing it. The steps should work out of the box if you have the data downloaded in the correct folder. Are you sure you downloaded all the data (except the addendum)?

You definitely don't need the addendum dataset for training this.

Did you try it with --X_is 'W'? if so did that work?

@GanyongMo
Contributor Author

> Hi Ganyong,
>
> I'll have to try this again and repro your issue to see - I haven't looked at this code in some time and don't have enough information to know exactly what is causing it. The steps should work out of the box if you have the data downloaded in the correct folder. Are you sure you downloaded all the data (except the addendum)?
>
> You definitely don't need the addendum dataset for training this.
>
> Did you try it with --X_is 'W'? if so did that work?

Hi Henry,

So glad to hear from you!!

Yes, I have already downloaded all the data except the addendum. That is why I asked whether I need it for the basic training of both the black-box and white-box networks: the addendum is very large and I am afraid I cannot download it in my situation. That is clear to me now.

I am training with --X_is 'W' now and am at step 2 of 4. I will post an update here once it is done.

For the problem I hit previously (--X_is 'B'), I am sure the dataset contains the corresponding 'mesh_depth' and 'pimg_est' entries, but when the function accesses the data, it reports that these two keys are missing.

By the way, checking the code is a bit painful (time-consuming) because VSCode cannot run the debugger with python==3.6 (for the moment VSCode only supports Python 3.8 or higher). Do you have any suggestions? (If time allows, I would like to try configuring the environment with Python 3.8 so that the code runs correctly.)

Anyway, let's keep in touch.

Best Regards,
Ganyong

@GanyongMo
Contributor Author

> By the way, it is a little bit suffering to check the codes (time-consuming) because the VSCode cannot run the debugger with python==3.6 (for the moment the VSCode just supports python3.8 or higher version), do you have any idea for it? (if time allowed, I would like to try to configure the environment in python3.8 for the code running correctly).

I have solved this problem; the debugger environment just needed to be configured appropriately.
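
Independent of the editor, Python 3.6 ships with `pdb`, which can open a prompt at the frame where an exception was raised. A minimal sketch of that alternative (it is not the VSCode configuration mentioned above); `run_with_postmortem` and `failing_step` are hypothetical names used only for illustration:

```python
import pdb
import sys


def run_with_postmortem(fn, debugger=pdb.post_mortem):
    """Call fn(); on any exception, open the debugger at the failing frame."""
    try:
        return fn()
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled.
        debugger(sys.exc_info()[2])
        raise


def failing_step():
    dat = {}  # hypothetical stand-in for the loaded dataset
    return dat["mesh_depth"]
```

Calling `run_with_postmortem(failing_step)` from a terminal would stop at the `dat["mesh_depth"]` line, where `p sorted(dat)` prints the keys that are actually present.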

> Did you try it with --X_is 'W'? if so did that work?

It is working for me. I will try again to figure out what the problem is in the "--X_is 'B'" case. Thank you so much!

Best Regards,
Ganyong

@henryclever
Contributor

Ah! Glad you got the problem with the debugger and environment solved.

I'm glad --X_is 'W' is working. Please let me know if you have this issue again with --X_is 'B'! If it is a bug in the code I will make sure it gets fixed for you as soon as possible.

-Henry

@GanyongMo
Contributor Author

> Please let me know if you have this issue again with --X_is 'B'! If it is a bug in the code I will make sure it gets fixed for you as soon as possible.

Hi, unfortunately I tried again today and the problem is still there, the same as described above (--X_is 'B'). I am not sure whether you see the same problem when you run it. I am also trying to figure it out.

-- Ganyong

@henryclever
Contributor

OK - I'm downloading the data now and will try it. Could you send me the contents of the danaLab data? It's on a computer from my old lab that I no longer have access to. It may be quicker for me to get the files from you than to request them from AC Lab again. Send them to [email protected] (either directly or through a link, I don't mind). Otherwise, let me know and I'll request them from AC Lab.

Thanks!

@henryclever
Contributor

@GanyongMo, thanks for sending.

There is definitely a bug in step 1: I repro'd it and found the same issue. As an interim workaround, just train betanet for the "B" network using the "W" flag in step 1. In practice the betanet trains the same way, so this makes no difference, but the bug should be corrected and I will fix it.
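
A minimal sketch of the workaround as a command builder, assuming the step 1 command for "W" is the one quoted at the top of this issue with only the `--X_is` value changed. The helper `step1_command` is hypothetical, and whether the saved betanet needs to be renamed or moved before step 4 is not stated in this thread:

```python
import shlex


def step1_command(x_is, workaround=True):
    """Build the step 1 (betanet-only) command for the 'B' or 'W' network."""
    # Workaround from this thread: train betanet with the 'W' flag even when
    # the target network is 'B', because step 1 with 'B' hits the KeyError.
    flag = "W" if (workaround and x_is == "B") else x_is
    return ["python3.6", "train_BPXnet.py", "--X_is", flag,
            "--slp", "mixedreal", "--train_only_betanet"]


# Print the command line; the list form can also be passed to subprocess.run.
print(" ".join(shlex.quote(arg) for arg in step1_command("B")))
```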

By the way I'm using the following package versions:

python 3.6.9
numpy 1.20.3
trimesh 3.8.19
pyrender 0.1.45
pillow 8.1.0
sudo apt install libjpeg-dev zlib1g-dev
matplotlib 3.3.4
torch 1.7.1
torchvision 0.8.2
chumpy 0.70
opencv-python 4.5.1.48
scikit-learn 0.23.2
open3d-python 0.7.0.0
imutils 0.5.4
camera 1.3.0
imageio 2.9.0
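
A minimal sketch for comparing an environment against these pins. `PINNED` holds only a subset of the list above, and the helper names `installed_version` and `version_mismatches` are hypothetical; the lookup uses `importlib.metadata` where available and falls back to `pkg_resources` on Python 3.6:

```python
PINNED = {
    "numpy": "1.20.3",
    "trimesh": "3.8.19",
    "pyrender": "0.1.45",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.6/3.7 (part of setuptools)
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in sorted(version_mismatches(PINNED).items()):
    print(f"{name}: wanted {wanted}, found {found}")
```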

I just ran into the following error: "RuntimeError: CUDA error: no kernel image is available for execution on the device", and I need to get past this to get step 2 working so I can get to step 4. What CUDA driver version are you using (e.g. did you have to downgrade to...)?

Henry

@henryclever
Contributor

Maybe if you are using the latest CUDA headers you can send me your versions and I can try with those instead of my old ones? My computer has 535 installed on an A6000.

@GanyongMo
Contributor Author

GanyongMo commented Feb 9, 2024

@henryclever

> I just ran into the following error: "RuntimeError: CUDA error: no kernel image is available for execution on the device" -- and I need to get past this to get step 2 working so I can get to step 4

Yep, I got the same error at the beginning. The cause was that the installed torch and CUDA versions were not compatible. The packages and versions I am using are listed in the attached file:

bodypressure.yaml

> maybe if you are using latest cuda headers you can send me your versions and I can try with those instead of my old ones?

These are the CUDA version and the corresponding PyTorch build that I am using; installing them was also what resolved the RuntimeError above for me.

CUDA 11.1
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

the link: https://pytorch.org/get-started/previous-versions/
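
The "no kernel image" error generally means the installed PyTorch build contains no compiled kernels for the GPU's compute capability. A minimal sketch of a check for that, under the simplifying assumption that an `sm_XY` binary runs on a device with the same major version and an equal or higher minor version; PTX (`compute_XY`) entries are ignored, and `parse_sm` and `has_kernel_image` are hypothetical helper names:

```python
def parse_sm(arch):
    """Turn an arch string such as 'sm_86' into (8, 6)."""
    digits = "".join(ch for ch in arch.split("_")[1] if ch.isdigit())
    return int(digits[:-1]), int(digits[-1])


def has_kernel_image(capability, arch_list):
    """Heuristic: True if some compiled sm_XY entry matches the device's
    major version with a minor version no higher than the device's."""
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as 'compute_80'
        a_major, a_minor = parse_sm(arch)
        if a_major == major and a_minor <= minor:
            return True
    return False


try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    # get_arch_list may be missing from older PyTorch releases, hence getattr.
    archs = getattr(torch.cuda, "get_arch_list", lambda: [])()
    if archs:
        print(cap, archs, has_kernel_image(cap, archs))
    else:
        print(cap, "arch list unavailable in this PyTorch build")
```

An A6000 reports compute capability (8, 6), so a build whose arch list stops at `sm_75` would fail this check, while one that includes `sm_80` or `sm_86` would pass.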

In the end, I think I can now run the commands for all 4 steps successfully, after building a specific branch for the condition "X_is 'B' and mod = 2". I am running them now to verify the results. If possible, I can share it with you so you can check whether it is on the right track when you are available (I could create a branch on the GitHub repo or use some other way; either is fine for me, so let me know which is better for you).

--Ganyong

@henryclever
Contributor

Sure! happy to check - just create a branch and I'll take a look. :)

-Henry
