Training failed #110

Open
Arseny5 opened this issue Aug 23, 2024 · 1 comment
Comments

Arseny5 commented Aug 23, 2024

Hello! Could you please help me fix this error?

Epoch 1: : 0batch [00:00, ?batch/s]

Traceback (most recent call last):
  File "tasks/run.py", line 15, in <module>
    run_task()
  File "tasks/run.py", line 10, in run_task
    task_cls.start()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/tasks/base_task.py", line 257, in start
    trainer.fit(task)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 580, in fit
    self.run_pretrain_routine(model)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 673, in run_pretrain_routine
    self.train()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1448, in train
    self.run_training_epoch()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1482, in run_training_epoch
    output = self.run_training_batch(batch, batch_idx)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1604, in run_training_batch
    loss = optimizer_closure()
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1569, in optimizer_closure
    output = self.training_forward(
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/utils/pl_utils.py", line 1678, in training_forward
    output = self.model.training_step(*args)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/tasks/base_task.py", line 128, in training_step
    loss_ret = self._training_step(sample, batch_idx, optimizer_idx)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/task.py", line 57, in _training_step
    log_outputs = self.run_model(self.model, sample)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/diffsinger_task.py", line 299, in run_model
    output = model(txt_tokens, mel2ph=mel2ph, spk_embed=spk_embed,
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/usr/diff/shallow_diffusion_tts.py", line 236, in forward
    ret = self.fs2(txt_tokens, mel2ph, spk_embed, ref_mels, f0, uv, energy,
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/nfs/ai-arseny/DiffSinger/modules/diffsinger_midi/fs2.py", line 63, in forward
    midi_dur_embedding = self.midi_dur_layer(kwargs['midi_dur'][:, :, None])  # [B, T, 1] -> [B, T, H]
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/diffsinger_mooninriver/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
@godofecht

Are your CUDA drivers compatible with your PyTorch build and GPU?
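
One way to look into the compatibility question is to compare the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. The sketch below is only a diagnostic aid, not a confirmed fix for this issue: whether an architecture mismatch is the actual cause of the `CUBLAS_STATUS_EXECUTION_FAILED` error here is an assumption. The helper names `is_arch_supported` and `report` are hypothetical, and `torch.cuda.get_arch_list` may be missing in older PyTorch releases, so the code guards for that.

```python
import re

# Matches entries such as "sm_75" (compiled binary) or "compute_70" (PTX).
_ARCH = re.compile(r"(sm|compute)_(\d+)(\d)")


def is_arch_supported(capability, arch_list):
    """Hypothetical helper: can a GPU with this (major, minor) compute
    capability run code built for any entry in arch_list?

    Rules applied: an "sm_XY" binary runs on a GPU with the same major
    version and an equal or higher minor version; a "compute_XY" PTX entry
    can be JIT-compiled for any equal or newer GPU.
    """
    major, minor = capability
    for arch in arch_list:
        m = _ARCH.fullmatch(arch)
        if m is None:
            continue  # ignore entries this sketch does not understand
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm":
            if a_major == major and a_minor <= minor:
                return True
        elif (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Summarize the local setup; returns a message instead of raising."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "torch.cuda.is_available() is False: check the driver and CUDA build."
    cap = torch.cuda.get_device_capability(0)
    # get_arch_list is absent in older PyTorch versions (assumption: fall back to empty).
    archs = torch.cuda.get_arch_list() if hasattr(torch.cuda, "get_arch_list") else []
    lines = [
        f"torch {torch.__version__}, built for CUDA {torch.version.cuda}",
        f"GPU: {torch.cuda.get_device_name(0)}, compute capability {cap}",
        f"compiled architectures: {archs or 'unknown (older PyTorch)'}",
    ]
    if archs:
        ok = is_arch_supported(cap, archs)
        lines.append("architecture supported" if ok else "architecture NOT supported by this build")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

If the report says the architecture is not supported, installing a PyTorch build compiled for a CUDA version that covers the GPU is the usual next step. Running `nvidia-smi` shows the driver version and the highest CUDA version the driver supports.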
