Colab: Error when training new model #247

Open
siflueckiger opened this issue Jan 9, 2022 · 3 comments

Comments

@siflueckiger

Hello. When I try to train a new model on Google Colab, I run into the following error:

Training new model w/ 3-layer, 128-cell LSTMs
Training on 125,286 character sequences.
Epoch 1/20

---------------------------------------------------------------------------

UnknownError                              Traceback (most recent call last)

<ipython-input-8-766a9f967633> in <module>()
     18     max_length=model_cfg['max_length'],
     19     dim_embeddings=100,
---> 20     word_level=model_cfg['word_level'])

4 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57     ctx.ensure_initialized()
     58     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 59                                         inputs, attrs, num_outputs)
     60   except core._NotOkStatusException as e:
     61     if name is not None:

UnknownError:    Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[model_2/rnn_1/PartitionedCall]] [Op:__inference_train_function_10520]

Function call stack:
train_function -> train_function -> train_function

I already tried the command !kill -9 -1 from another issue. It didn't work.
Can anybody help me? Thanks.

@Fqlox

Fqlox commented Jan 17, 2022

Hi, did you try not running the %tensorflow_version 1.x block, since the project now uses TensorFlow 2.1?

@mocallito

> Hi, did you try not running the %tensorflow_version 1.x block, since the project now uses TensorFlow 2.1?

Dude, it worked thx

@mocallito mocallito mentioned this issue Jan 22, 2022
@ghost

ghost commented Jan 29, 2022

Note: for me, the suggestion quoted below didn't work on its own; I also had to do a factory reset of the runtime.

> Hi, did you try not running the %tensorflow_version 1.x block, since the project now uses TensorFlow 2.1?
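
For anyone checking whether the advice above took effect, here is a minimal sketch of a cell to run after skipping the %tensorflow_version 1.x block and resetting the runtime. It only reports which TensorFlow version is loaded and whether a GPU is visible. The helper names major_version and describe_runtime are hypothetical and not part of the project's notebook, and the assumption that the notebook needs a 2.x build with a GPU comes only from the comments in this thread.

```python
def major_version(version: str) -> int:
    """Return the major component of a version string such as '2.1.0'."""
    return int(version.split(".")[0])


def describe_runtime(version: str, gpu_count: int) -> str:
    """Summarize the runtime against what this thread suggests (TF 2.x plus a GPU)."""
    if major_version(version) < 2:
        return (
            f"TensorFlow {version} is loaded: skip the %tensorflow_version 1.x "
            "cell, reset the runtime, and run the notebook again."
        )
    if gpu_count == 0:
        return f"TensorFlow {version} is loaded, but no GPU is visible."
    return f"TensorFlow {version} is loaded with {gpu_count} GPU(s) visible."


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        # tf.config.list_physical_devices is not present in older builds,
        # so look it up defensively and fall back to "no GPUs found".
        list_devices = getattr(tf.config, "list_physical_devices", None)
        gpus = list_devices("GPU") if list_devices else []
        print(describe_runtime(tf.__version__, len(gpus)))
```

If the report still shows a 1.x version after skipping the cell, that is consistent with the last comment: the earlier version stays loaded until the runtime is reset.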
