Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rui/ds1000 #44

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open

rui/ds1000 #44

wants to merge 34 commits into from

Conversation

rays1024
Copy link
Collaborator

No description provided.

.DS_Store Outdated Show resolved Hide resolved
@niansong1996
Copy link
Contributor

@rays1024 Any updates on this? btw, the title for your PR is kind of confusing. If you are working on two different things (length calculator and ds-1000 dataset), you should have two separate branches and two separate PRs accordingly.

@rays1024
Copy link
Collaborator Author

@rays1024 Any updates on this? btw, the title for your PR is kind of confusing. If you are working on two different things (length calculator and ds-1000 dataset), you should have two separate branches and two separate PRs accordingly.

Sorry I have been pretty busy with projects and assignments, but I will work on the issue and try to have an update by Friday. I'll also make a new PR for length_calculator as well. Thanks for the reminder!

@niansong1996
Copy link
Contributor

I see, thanks for the update. Let me know if anything changes.

@rays1024 rays1024 changed the title [wip] added length_calculator [wip] DS1000 dataset Jun 28, 2023
@rays1024
Copy link
Collaborator Author

Updating this pull request's name to "DS1000 dataset" to avoid confusion. Updates on length_calculator will be made on another branch/PR

Copy link
Contributor

@niansong1996 niansong1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested the actual functionality but the executors and datasets look good in general.

Made some comments, can you fix the raised issue?

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put this file in the data directory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want DS-1000's license and readme files in our repo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll delete the license, but the readme file is the instructions for evaluation and creating jsonl file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the testing harness for DS-1000?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this script with the zip file are needed for evaluation. The DS-1000 evaluation requires having the original data so that the problems could be loaded into a DS1000Problem class. This class is then used for evaluation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zip files should not be checked in (i.e., committed to online repo) unless there is a strong reason in doing so

Copy link
Collaborator Author

@rays1024 rays1024 Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on removing the need for the original problem data, but these data in the zip file are indeed needed for evaluation.

for key in keys:
new_dict[key] = dictionary[key]

with open("ds1000.jsonl", 'a') as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this appending lines (i.e., using open mode a)?

Copy link
Collaborator Author

@rays1024 rays1024 Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each iteration of the for loop will append one dictionary as a new line to the jsonl file, which is why I used 'a' here. If other ways are commonly used, I can definitely change that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably put this file in analysis or utils or preprocessing

@rays1024 rays1024 changed the title [wip] DS1000 dataset DS1000 dataset Jul 9, 2023
@rays1024 rays1024 changed the title DS1000 dataset rui/ds1000 Jul 9, 2023
@rays1024
Copy link
Collaborator Author

Using jsonargparse==4.15.0 would resolve the previous problems. Still working on fixing the directory problem when evaluating 208 or more problems.

@rays1024
Copy link
Collaborator Author

Error with torch tensor occurred when evaluating incoder-1b, same as this discussion. I added incoder-1b to the if statement at line 159 of seq2seq_model.py to fix this issue

@rays1024
Copy link
Collaborator Author

fixed result saving issue by adding a safe executing wrapper in the DS1000Executor class. The wrapper uses execute in safe_execution_util as a template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants