insufficient shared memory in training phase #34

Open
LoveSiameseCat opened this issue May 3, 2023 · 3 comments

@LoveSiameseCat

LoveSiameseCat commented May 3, 2023

Hi,
I ran into the same problem as #24 when I tried to train this model. I saved all the images in '.jpg' format. However, RAM usage increased linearly during training (even when I only iterated over the training data and did nothing else). Eventually, the system reported "RuntimeError: DataLoader worker (pid 11676) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit."
I have tried everything I could think of and still cannot fix this bug. Can you give me some advice?
BTW, I found that the variable "out_view" in AbstractDataset._get_jpeg_info is not used. I'm curious what it does.
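
For anyone trying to reproduce the growth described above, here is a minimal sketch of iterating over the data alone while sampling the process's peak resident memory. It is not code from this repository: `peak_rss_mib` and `track_memory` are hypothetical helper names, and the sketch assumes a Unix system, since it relies on the standard-library `resource` module. It only sees the current process, so it is probably most useful with the loader configured to load in the main process (for a PyTorch `DataLoader`, `num_workers=0`).

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_memory(iterable, every=100, max_steps=None):
    """Iterate over `iterable`, discard every item, and sample peak RSS.

    Returns a list of (step, peak_rss_in_mib) tuples, one per `every` steps.
    """
    samples = []
    for step, _ in enumerate(iterable, start=1):
        if step % every == 0:
            samples.append((step, peak_rss_mib()))
        if max_steps is not None and step >= max_steps:
            break
    return samples


if __name__ == "__main__":
    # Stand-in iterable; replace with your own loader, for example
    # DataLoader(dataset, num_workers=0), to exercise the dataset code.
    for step, mib in track_memory(range(1000), every=200):
        print(f"step {step}: peak RSS {mib:.1f} MiB")
```

If the sampled values keep climbing across the whole epoch instead of flattening out, that points at the data-loading path rather than the model or the optimizer.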

@CauchyComplete
Collaborator

Did you use the same environment as described in requirements.txt? RAM usage increases linearly at the beginning and then levels off at a certain point.

@LoveSiameseCat
Author

LoveSiameseCat commented May 16, 2023

Thank you for your response. I created a new environment on a 2080 Ti machine, but the memory leak is still there. While trying to fix it, I found that the problem may be caused by the 'jpegio' package: when I comment out the jpegio operation, the issue disappears. However, I think the issue depends on the device, since it went away after I switched to another server.
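
The "comment it out and see" experiment can also be run the other way round, by calling only the suspected function in a loop and checking whether memory grows. The following is a rough sketch under stated assumptions: `rss_growth_mib` is a hypothetical helper, not part of CAT-Net or jpegio, and it again uses the Unix-only `resource` module. Because it measures peak RSS, it can show growth but not memory being released.

```python
import gc
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def rss_growth_mib(fn, repeats=1000, warmup=10):
    """Call `fn()` repeatedly and return how much peak RSS grew, in MiB."""
    # Warm-up calls absorb one-time costs such as imports and caches.
    for _ in range(warmup):
        fn()
    gc.collect()
    before = peak_rss_mib()
    for _ in range(repeats):
        fn()
    # Collect Python-level garbage so that leftover growth is more likely
    # to come from native code than from uncollected Python objects.
    gc.collect()
    return peak_rss_mib() - before


if __name__ == "__main__":
    # Stand-in workload. To test the suspected call, pass something like
    # `lambda: jpegio.read("sample.jpg")`; "sample.jpg" is a placeholder
    # path, and jpegio.read is assumed to be the call your dataset uses.
    print(f"growth: {rss_growth_mib(lambda: sum(range(1000))):.2f} MiB")
```

Growth that scales with `repeats` on one server but stays near zero on another would be consistent with the device-dependent behavior described above.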

@CauchyComplete
Collaborator

Thanks for your report. I'm almost certain that I'll replace jpegio with another package in future work. I think jpegio is not stable enough. If anyone knows another library that supports the extraction of raw DCT coefficients, please let us know!
I tested CAT-Net on both Windows and Linux but didn't run into similar problems. As you reported, it's probably a device-dependent error...
