
splatfacto method in Colab broken? #3455

Closed

fasteinke opened this issue Sep 29, 2024 · 10 comments

Comments


fasteinke commented Sep 29, 2024

Describe the bug
Running the demo.ipynb fails to start training

To Reproduce
Steps to reproduce the behavior:

  1. Select an example dataset - here, desolation
  2. Paste the simplest command into the xterm: ns-train splatfacto --data data/nerfstudio/desolation
  3. Training reaches the "setting up CUDA, this will take a few minutes" stage; it cycles for quite a while and then throws an error
  4. The xterm then goes haywire and constantly prompts for input, so the error message is lost

Previous attempts to use this method at least started training; now it has problems even earlier.
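
For context, the command sequence looks roughly like the sketch below; the ns-download-data invocation is my assumption of how the demo notebook fetches the example capture, while the ns-train command is the one quoted in step 2.

# Hypothetical reconstruction of the two steps above (download flag assumed,
# train command as reported):
ns-download-data nerfstudio --capture-name=desolation
ns-train splatfacto --data data/nerfstudio/desolation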

fasteinke (Author)

That was fast!! I'm impressed ...

fasteinke (Author)

Okay, now I'm confused ... these look to be files to allow me to run locally. But my issue is with how the notebook runs in Colab - do the files there need to be altered in some fashion?

nerfstudio-project deleted a comment Sep 29, 2024
brentyi (Collaborator) commented Sep 29, 2024

(deleted comment because malware, unfortunately I don't have experience with Colab so not the best person to help with the actual issue)

fasteinke (Author)

That's very nasty!!! ... Looks like I need to be on the ball with regard to GitHub responses - not something I was aware was happening ...

fasteinke (Author)

To flesh this out, I tried running it with the splatfacto-big method, just in case ...

Same error:

[03:15:38] Saving config to: outputs/desolation/splatfacto/2024-10-01_031537/config.yml experiment_config.py:136
Saving checkpoints to: outputs/desolation/splatfacto/2024-10-01_031537/nerfstudio_models trainer.py:142
Auto image downscale factor of 2 nerfstudio_dataparser.py:484
load_3D_points is true, but the dataset was processed with an outdated ns-process-data that didn't convert colmap points to .ply! Update the colmap
dataset automatically? [y/n]: y
Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 233M/233M [00:00<00:00, 267MB/s]
╭─────────────── viser ───────────────╮
│ HTTP      │ http://0.0.0.0:7007     │
│ Websocket │ ws://0.0.0.0:7007       │
╰─────────────────────────────────────╯
[03:16:08] Caching / undistorting eval images full_images_datamanager.py:230
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
No Nerfstudio checkpoint to load, so training from scratch.
Disabled comet/tensorboard/wandb event writers
[03:16:12] Caching / undistorting train images full_images_datamanager.py:230
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 559.3889
VanillaPipeline.get_train_loss_dict: 559.3837
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gsplat/cuda/_backend.py", line 83, in
from gsplat import csrc as _C
ImportError: cannot import name 'csrc' from 'gsplat' (/usr/local/lib/python3.10/dist-packages/gsplat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '10']' returned non-zero exit status 1.

brentyi (Collaborator) commented Oct 1, 2024

It seems like gsplat is not installing/building correctly on Colab? Related: nerfstudio-project/gsplat#315

It's also possible that recent changes to gsplat will help; it's pinned to 1.3.0 in nerfstudio, but since nerfstudio-project/gsplat#365 was merged there are now pre-built wheels.

cc @liruilong940607 but I think he's very busy these days + also doesn't use Colab.
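
For anyone trying this in the meantime, a minimal sketch of swapping in a pre-built wheel from a Colab cell, assuming the docs.gsplat.studio/whl index and the 1.4.0 pin that come up later in this thread:

# Hypothetical extra Colab cell (not part of the stock demo.ipynb): replace the
# source-built gsplat 1.3.0 with a pre-built wheel from the gsplat wheel index.
!pip uninstall -y gsplat
!pip install gsplat==1.4.0 --index-url https://docs.gsplat.studio/whl

If the wheel doesn't match the installed torch/CUDA build, an ABI mismatch (undefined symbol) error like the one reported further down is the likely result.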

liruilong940607 (Contributor) commented Oct 1, 2024

Thanks for looping me in @brentyi !

I did a quick test on Colab (T4 GPU) and I was able to install the latest gsplat on it. So it might just be an issue in the previous version (though I can't think of what might cause this).

The Colab: https://colab.research.google.com/drive/10HVUf6e8_pRrMj4cmQ5Xepoq6BdkJkav?usp=sharing

fasteinke (Author) commented Oct 2, 2024

Thanks for the input ... some progress made ...

I added a cell to demo.ipynb, following the "Install Nerfstudio and Dependencies" cell:

!pip install gsplat==1.4.0 --index-url https://docs.gsplat.studio/whl

which appeared to work; it uninstalled 1.3.0 and installed 1.4.0.

But this time there was a different error:

...
Trainer.train_iteration: 501.1180
VanillaPipeline.get_train_loss_dict: 501.1118
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gsplat/cuda/_backend.py", line 83, in
from gsplat import csrc as _C
ImportError: /usr/local/lib/python3.10/dist-packages/gsplat/csrc.so: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorESt8optionalIN3c1010ScalarTypeEES5_INS6_6LayoutEES5_INS6_6DeviceEES5_IbES5_INS6_12MemoryFormatEE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '10']' returned non-zero exit status 1.

liruilong940607 (Contributor)

Hey, installing gsplat's prebuilt wheels works fine for me, see:

https://colab.research.google.com/drive/10HVUf6e8_pRrMj4cmQ5Xepoq6BdkJkav?usp=sharing

You need to figure out the torch and CUDA versions on the system and choose the correct prebuilt wheel for gsplat.
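
A minimal sketch of doing that from Colab cells; the versioned index URL pattern (e.g. a pt21cu121 suffix) is an assumption about how the gsplat wheel index is organized and should be checked against the gsplat docs:

# Hypothetical Colab cells: check the installed torch / CUDA combination, then
# install the gsplat wheel built against that same combination.
!python -c "import torch; print(torch.__version__, torch.version.cuda)"
# e.g. "2.1.2+cu121 12.1" -> assumed matching index suffix pt21cu121 (verify):
!pip install gsplat --index-url https://docs.gsplat.studio/whl/pt21cu121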

fasteinke (Author) commented Oct 3, 2024

Thanks!!! ... I had a misunderstanding about using "pip install ... --index-url ..." - so, on the next round I installed the correct version, and the processing kicked off nicely ...
