Llama2 ExpectedMoreSplits Exception #57

Open
time-less-ness opened this issue Jun 17, 2024 · 1 comment

time-less-ness commented Jun 17, 2024

Command:

python main.py --model /data/Llama-2-7b-chat-hf/ --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama_2_7b/unstructured/wanda/

Result:

  File "Dev/wanda/main.py", line 110, in <module>
    main()
  File "Dev/wanda/main.py", line 69, in main
    prune_wanda(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "Dev/wanda/lib/prune.py", line 132, in prune_wanda
    dataloader, _ = get_loaders("c4",nsamples=args.nsamples,seed=args.seed,seqlen=model.seqlen,tokenizer=tokenizer)
  File "Dev/wanda/lib/data.py", line 73, in get_loaders
    return get_c4(nsamples, seed, seqlen, tokenizer)
  File "Dev/wanda/lib/data.py", line 43, in get_c4
    traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train')
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/load.py", line 1791, in load_dataset
    builder_instance.download_and_prepare(
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/builder.py", line 891, in download_and_prepare
    self._download_and_prepare(
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
    raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}
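
For readers hitting the same error: the last two frames of the traceback show that `verify_splits` raises when the set of expected splits contains names missing from the recorded splits. Here only a `train` data file is passed, so `validation` is reported as missing. The sketch below reproduces that set difference with local stand-ins (the `ExpectedMoreSplits` class and `verify_splits` function here are simplified copies written for illustration, not the `datasets` implementations). The `load_c4_train_shard` helper is a hypothetical name, and its body is an untested guess at a workaround: it assumes an installed `datasets` version that accepts `verification_mode="no_checks"`, and it drops the `'allenai--c4'` config name on the assumption that the recorded split metadata comes from that config.

```python
# Local stand-in for datasets.utils.info_utils.ExpectedMoreSplits (illustration only).
class ExpectedMoreSplits(Exception):
    pass


def verify_splits(expected_splits, recorded_splits):
    """Simplified copy of the check on the last traceback frame."""
    missing = set(expected_splits) - set(recorded_splits)
    if missing:
        raise ExpectedMoreSplits(str(missing))


# The dataset metadata expects both splits, but only 'train' was prepared.
expected = {"train": None, "validation": None}
recorded = {"train": None}

try:
    verify_splits(expected, recorded)
    message = ""
except ExpectedMoreSplits as err:
    message = str(err)

print(message)  # {'validation'}


def load_c4_train_shard():
    """Hypothetical workaround, not verified against this environment.

    Assumes a `datasets` release that supports `verification_mode`;
    the import is local so the sketch above runs without the library.
    """
    from datasets import load_dataset

    return load_dataset(
        "allenai/c4",
        data_files={"train": "en/c4-train.00000-of-01024.json.gz"},
        split="train",
        verification_mode="no_checks",  # skip the split/checksum verification
    )
```

If this approach applies, the equivalent change would go in `get_c4` in `lib/data.py` (line 43 in the traceback), where the `load_dataset` call is made.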

In case it matters, the Llama-2-7b-chat-hf folder:

Llama-2-7b-chat-hf/
total 39509433
-rw-r--r-- 1 usr grp         21 Jun 17 14:06 added_tokens.json
-rw-r--r-- 1 usr grp        583 Jun 17 14:06 config.json
-rw-r--r-- 1 usr grp        200 Jun 17 14:06 generation_config.json
-rw-r--r-- 1 usr grp       7020 Jun 17 14:06 LICENSE.txt
-rw-r--r-- 1 usr grp 9976576152 Jun 17 14:15 model-00001-of-00002.safetensors
-rw-r--r-- 1 usr grp 3500296424 Jun 17 14:09 model-00002-of-00002.safetensors
-rw-r--r-- 1 usr grp      26788 Jun 17 14:06 model.safetensors.index.json
-rw-r--r-- 1 usr grp 9877989586 Jun 17 14:13 pytorch_model-00001-of-00003.bin
-rw-r--r-- 1 usr grp 9894801014 Jun 17 14:14 pytorch_model-00002-of-00003.bin
-rw-r--r-- 1 usr grp 7180990649 Jun 17 14:12 pytorch_model-00003-of-00003.bin
-rw-r--r-- 1 usr grp      26788 Jun 17 14:06 pytorch_model.bin.index.json
-rw-r--r-- 1 usr grp      10148 Jun 17 14:06 README.md
-rw-r--r-- 1 usr grp        435 Jun 17 14:06 special_tokens_map.json
-rw-r--r-- 1 usr grp        746 Jun 17 14:06 tokenizer_config.json
-rw-r--r-- 1 usr grp    1842764 Jun 17 14:06 tokenizer.json
-rw-r--r-- 1 usr grp     499723 Jun 17 14:06 tokenizer.model
-rw-r--r-- 1 usr grp       4766 Jun 17 14:06 USE_POLICY.md