Some weights of the model checkpoint at /home/sunpeng/AXJ/MRC/bert/bert-base-uncased were not used when initializing BertQueryNER: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight']
This IS expected if you are initializing BertQueryNER from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertQueryNER from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertQueryNER were not initialized from the model checkpoint at /home/sunpeng/AXJ/MRC/bert/bert-base-uncased and are newly initialized: ['span_embedding.classifier1.weight', 'end_outputs.bias', 'span_embedding.classifier2.weight', 'span_embedding.classifier2.bias', 'end_outputs.weight', 'span_embedding.classifier1.bias', 'start_outputs.bias', 'start_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: Checkpoint directory /home/sunpeng/AXJ/MRC/outputs/ace2005/warmup0lr2e-5_drop0.3_norm1.0_weight0.1_warmup0_maxlen128 exists and is not empty with save_top_k != 0.All files in this directory will be deleted when a checkpoint is saved!
warnings.warn(*args, **kwargs)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0,1]
Using native 16bit precision.
Some weights of the model checkpoint at /home/sunpeng/AXJ/MRC/bert/bert-base-uncased were not used when initializing BertQueryNER: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
This IS expected if you are initializing BertQueryNER from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertQueryNER from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertQueryNER were not initialized from the model checkpoint at /home/sunpeng/AXJ/MRC/bert/bert-base-uncased and are newly initialized: ['end_outputs.weight', 'span_embedding.classifier2.weight', 'end_outputs.bias', 'start_outputs.bias', 'span_embedding.classifier1.weight', 'span_embedding.classifier1.bias', 'span_embedding.classifier2.bias', 'start_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using native 16bit precision.
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/2
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
Traceback (most recent call last):
File "/home/sunpeng/AXJ/MRC//train/mrc_ner_trainer.py", line 429, in <module>
main()
File "/home/sunpeng/AXJ/MRC//train/mrc_ner_trainer.py", line 416, in main
trainer.fit(model)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
result = fn(self, *args, **kwargs)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1058, in fit
results = self.accelerator_backend.spawn_ddp_children(model)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 123, in spawn_ddp_children
results = self.ddp_train(local_rank, mp_queue=None, model=model, is_master=True)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 164, in ddp_train
self.trainer.is_slurm_managing_tasks
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 908, in init_ddp_connection
torch_distrib.init_process_group(torch_backend, rank=global_rank, world_size=world_size)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
barrier()
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370128159/work/torch/lib/c10d/ProcessGroupNCCL.cpp:31, unhandled cuda error, NCCL version 2.7.8
Traceback (most recent call last):
File "/home/sunpeng/AXJ/MRC/train/mrc_ner_trainer.py", line 429, in <module>
main()
File "/home/sunpeng/AXJ/MRC/train/mrc_ner_trainer.py", line 416, in main
trainer.fit(model)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
result = fn(self, *args, **kwargs)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1046, in fit
self.accelerator_backend.train(model)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 57, in train
self.ddp_train(process_idx=self.task_idx, mp_queue=None, model=model)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_backend.py", line 164, in ddp_train
self.trainer.is_slurm_managing_tasks
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 908, in init_ddp_connection
torch_distrib.init_process_group(torch_backend, rank=global_rank, world_size=world_size)
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
barrier()
File "/home/sunpeng/.conda/envs/pytorch_gpu/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370128159/work/torch/lib/c10d/ProcessGroupNCCL.cpp:31, unhandled cuda error, NCCL version 2.7.8
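Both ranks fail at the first `barrier()` inside `init_process_group`, and "unhandled cuda error" does not say which CUDA call failed. A possible next step is to rerun with NCCL's own logging enabled. The sketch below builds such an environment using only the standard library. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are documented NCCL environment variables; the helper name `nccl_debug_env` and its parameters are hypothetical and not part of this repository, PyTorch, or PyTorch Lightning.

```python
import os


def nccl_debug_env(base=None, disable_p2p=False):
    """Return a copy of an environment with NCCL diagnostics switched on.

    Hypothetical helper for illustration only.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    # Ask NCCL to log the underlying CUDA/system error instead of only
    # reporting "unhandled cuda error".
    env["NCCL_DEBUG"] = "INFO"
    if disable_p2p:
        # Optional experiment: turn off GPU peer-to-peer transport to check
        # whether direct GPU-to-GPU communication is what fails.
        env["NCCL_P2P_DISABLE"] = "1"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when relaunching the training script. The same variables can also be exported in the shell before starting the run.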