errors when executing script for generating the binarized data #14

Open
MeWannaSleep opened this issue Jul 2, 2023 · 7 comments


@MeWannaSleep

Steps to reproduce the error:
1. git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git && pip install -e .
This did not work. When I ran git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git on its own, then cd DA-Transformer and pip install -e ., it worked fine.
2. I tried to use the script in the README to generate the binarized data:

input_dir=path/to/raw_data        # directory of pre-processed text data
data_dir=path/to/binarized_data   # directory of the generated binarized data
src=src                           # source suffix
tgt=tgt                           # target suffix
fairseq-datpreprocess --source-lang ${src} --target-lang ${tgt} \
    --trainpref ${input_dir}/train --validpref ${input_dir}/valid --testpref ${input_dir}/test \
    --src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt \
    --destdir ${data_dir} --workers 32 \
    --user-dir fs_plugins --task translation_dat_task [--seg-tokens 32]

# seg-tokens should be set to 32 when you use pre-trained models.
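Before looking at the environment, it can help to confirm that the inputs the script expects actually exist. This is a minimal sketch under stated assumptions: it assumes fairseq's usual naming, where --trainpref ${input_dir}/train with --source-lang src reads ${input_dir}/train.src, and that the dictionaries sit at ${input_dir}/dict.src.txt and ${input_dir}/dict.tgt.txt as in the script. missing_inputs is a hypothetical helper, not part of fairseq or DA-Transformer.

```python
from pathlib import Path


def missing_inputs(input_dir, src="src", tgt="tgt", splits=("train", "valid", "test")):
    """Return the expected input files that do not exist under input_dir.

    Assumption: fairseq-style naming, i.e. --trainpref ${input_dir}/train plus
    --source-lang src means ${input_dir}/train.src, and the dictionaries are
    ${input_dir}/dict.src.txt and ${input_dir}/dict.tgt.txt (as in the script).
    """
    root = Path(input_dir)
    # One text file per split and per language suffix.
    expected = [root / f"{split}.{lang}" for split in splits for lang in (src, tgt)]
    # The two dictionaries passed via --src-dict / --tgt-dict.
    expected += [root / f"dict.{lang}.txt" for lang in (src, tgt)]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    import sys

    directory = sys.argv[1] if len(sys.argv) > 1 else "path/to/raw_data"
    missing = missing_inputs(directory)
    for path in missing:
        print("missing:", path)
    print("all inputs present" if not missing else f"{len(missing)} file(s) missing")
```

Run it as python check_inputs.py path/to/raw_data (the file name is arbitrary). An empty result only rules out missing files; it says nothing about the pkg_resources error discussed below.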

I don't know what's going wrong. Please help.

@hzhwcmhf
Member

hzhwcmhf commented Jul 2, 2023

For the first problem, you are right. I will fix the script in the README.
For the second problem, I don't know the exact reason, but I guess it may be caused by a corrupted environment. You can try pip uninstall fairseq, run some other Python programs to make sure the environment is working well, and then run pip install -e . in DA-Transformer. Or, more simply, create a new environment and reinstall all the packages.

@MeWannaSleep
Author

@hzhwcmhf Thanks for your reply. I followed your first suggestion:
1. pip uninstall fairseq
2. Ran some other Python programs to verify the environment:
3. pip install -e .

Running the script again still gives the same error.

I don't quite understand the second suggestion. Should I simply create a conda environment and then install fairseq with pip install fairseq instead of pip install -e .?

@hzhwcmhf
Member

hzhwcmhf commented Jul 3, 2023

Create a conda environment, install PyTorch (and check that it works), clone this repo, and run pip install -e . inside it.
Then, run fairseq-datpreprocess without arguments to see if there is any error.
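The PyTorch check in the steps above could be scripted roughly as follows. This is a minimal sketch: check_setup is a hypothetical helper that imports torch, runs a tiny tensor operation, and looks up fairseq-datpreprocess on PATH with shutil.which. It does not run the preprocessing command itself.

```python
import importlib
import shutil


def check_setup(cli_names=("fairseq-datpreprocess",)):
    """Report whether torch works and whether the given commands are on PATH."""
    report = {}
    try:
        torch = importlib.import_module("torch")
        total = float(torch.ones(2).sum())  # tiny computation to prove torch runs
        report["torch"] = f"ok (version {torch.__version__}, ones(2).sum() = {total})"
    except Exception as exc:  # ImportError, or a broken binary install
        report["torch"] = f"FAILED: {exc!r}"
    for name in cli_names:
        # shutil.which returns None when the command cannot be found.
        report[name] = shutil.which(name) or "not found on PATH"
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

If torch works and the command is found, running fairseq-datpreprocess without arguments is the next step, as suggested above.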

@MeWannaSleep
Author

@hzhwcmhf I still get the same error.

@hzhwcmhf
Member

hzhwcmhf commented Jul 3, 2023

I am not sure where the problem is. The error is raised in a module named pkg_resources, which this project does not use. As far as I know, this indicates that a package manager (such as pip or conda) is corrupted. Alternatively, you may have multiple Python installations and have not set the environment variables correctly.

You can try installing the original fairseq (see https://github.com/facebookresearch/fairseq#requirements-and-installation) first and running fairseq-preprocess to see whether there is any error. If the error persists, you may want to check your Python or conda installation.
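To check for the multiple-installation case mentioned above, one option is to print where the running interpreter lives and which python, pip, and fairseq-datpreprocess the shell resolves. A minimal sketch, with interpreter_report as a hypothetical helper:

```python
import os
import shutil
import sys


def interpreter_report():
    """Collect where the running interpreter and the tools on PATH come from."""
    report = {
        "sys.executable": sys.executable,  # the interpreter actually running
        "sys.prefix": sys.prefix,          # root of its installation or environment
        "CONDA_PREFIX": os.environ.get("CONDA_PREFIX", "<unset>"),
        "PYTHONPATH": os.environ.get("PYTHONPATH", "<unset>"),
    }
    for tool in ("python", "pip", "fairseq-datpreprocess"):
        # What the shell would run for each command name.
        report[f"which {tool}"] = shutil.which(tool) or "<not found>"
    return report


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:>28}: {value}")
```

If which pip or which fairseq-datpreprocess points outside sys.prefix, or PYTHONPATH is set, the shell is probably mixing packages from more than one installation.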

@MeWannaSleep
Author

@hzhwcmhf I installed the original fairseq, and fairseq-preprocess works fine.
I rechecked the error message and found that the error originated from import dag_loss:

from torch.utils.cpp_extension import load

Maybe something is wrong with gcc?
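Importing torch.utils.cpp_extension does not compile anything by itself; gcc only comes into play when load() is actually called. One way to separate an environment problem from a compiler problem is to import the modules involved one at a time. A minimal sketch, with try_imports as a hypothetical helper and the module list taken from the traceback discussed in this thread:

```python
import importlib


def try_imports(module_names):
    """Import each module in order and record 'ok' or the exception it raised."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # includes ImportError and parsing errors
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # pkg_resources is tried on its own first: if it already fails here,
    # the problem is in the installed packages, not in dag_loss or gcc.
    for name, status in try_imports(
        ["pkg_resources", "torch", "torch.utils.cpp_extension"]
    ).items():
        print(f"{name}: {status}")
```

If pkg_resources already fails on its own, compiling dag_loss is never reached, so gcc is unlikely to be the cause.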

@hzhwcmhf
Member

hzhwcmhf commented Jul 3, 2023

If you uninstall fairseq or this project, will from torch.utils.cpp_extension import load produce an error?
I still think that the problem is not caused by this project. Maybe there is a dependency file of an installed package containing PyYAML (>=5.1.*), which is not correctly parsed by pkg_resources.
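The PyYAML (>=5.1.*) hypothesis can be tested directly. PEP 440 only allows a trailing .* with == and !=, so a requirement such as PyYAML (>=5.1.*) is invalid, and newer versions of packaging (which pkg_resources vendors) refuse to parse it. The sketch below uses importlib.metadata from the standard library to list installed distributions whose declared requirements use .* with an ordering operator. The regex is a heuristic assumption, not the exact grammar packaging enforces, and find_bad_requirements and scan_installed are hypothetical helpers.

```python
import re
from importlib import metadata

# PEP 440 permits a trailing ".*" only with == and !=. A specifier such as
# ">=5.1.*" is invalid, and recent packaging/pkg_resources versions reject it.
# Heuristic: an ordering operator followed by a version ending in ".*".
BAD_WILDCARD = re.compile(r"(>=|<=|>|<|~=)\s*[^,;()\s]*\.\*")


def find_bad_requirements(requirement_lines):
    """Return the requirement strings that use '.*' with an ordering operator."""
    return [line for line in requirement_lines if BAD_WILDCARD.search(line)]


def scan_installed():
    """Map 'name version' of each installed distribution to its offending requirements."""
    hits = {}
    for dist in metadata.distributions():
        bad = find_bad_requirements(dist.requires or [])
        if bad:
            label = f"{dist.metadata.get('Name', '?')} {dist.version}"
            hits[label] = bad
    return hits


if __name__ == "__main__":
    for dist, reqs in scan_installed().items():
        print(dist, "->", reqs)
```

Upgrading or uninstalling a distribution reported here is one way to get past the error; pinning an older setuptools is another commonly reported workaround. Treat both as things to try rather than a confirmed fix for this issue.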
