Restrict the training set of the StepsToReproduce model only to defects #792

Closed
marco-c opened this issue Jul 26, 2019 · 7 comments · Fixed by #3897

Labels
good-first-bug Good for newcomers

Comments

@marco-c
Collaborator

marco-c commented Jul 26, 2019

STRs (steps to reproduce) don't apply to enhancements or tasks, so the StepsToReproduce model should be trained only on defects.

@sosa-e

sosa-e commented Apr 10, 2023

Hello! I am new to open source and am currently working on this bug after following the steps outlined in #1092. I read through #817 and have some questions. May I post questions in this issue?

@suhaibmujahid
Member

> Hello! I am new to open source and am currently working on this bug after following the steps outlined in #1092. I read through #817 and have some questions. May I post questions in this issue?

Welcome, @sosa-e! You can post questions related to the issues here. Also, you can chat with us in the bugbug Matrix room.

@naoya2000
Contributor

naoya2000 commented Dec 3, 2023

Hi! I'm trying to work on this issue. Following the README.md, I started by training the StepsToReproduce model with the trainer.py script, running python3 -m scripts.trainer stepstoreproduce in the bugbug directory.

However, I got the following result stating that the imblearn.pipeline.Pipeline model isn't supported by TreeExplainer. Can I safely ignore this, or do I have to resolve this issue first before working on the rest of the issue? Thanks!

(base) naoyaokamoto@Naoyas-MacBook-Air-393 bugbug % python3 -m scripts.trainer stepstoreproduce
2023-12-03 15:31:40,852:INFO:numexpr.utils:NumExpr defaulting to 8 threads.
2023-12-03 15:31:46,253:INFO:bugbug.db:Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.bugbug.data_bugs.latest/artifacts/public/bugs.json.zst to data/bugs.json.zst
data/bugs.json.zst  : 2328191107 bytes                                         
2023-12-03 15:32:57,619:INFO:__main__:Training *stepstoreproduce* model
2023-12-03 15:33:04,232:INFO:bugbug.models.stepstoreproduce:29 bugs have no steps to reproduce
2023-12-03 15:33:04,233:INFO:bugbug.models.stepstoreproduce:4693 bugs have steps to reproduce
2023-12-03 15:33:13,319:INFO:bugbug.model:X: (4722, 4), y: (4722,)
2023-12-03 15:33:18,377:INFO:bugbug.model:Cross Validation scores:
2023-12-03 15:33:18,377:INFO:bugbug.model:Accuracy: f0.6924007482851798 (+/- 0.022998486956704086)
2023-12-03 15:33:18,377:INFO:bugbug.model:Precision: f0.9976212992231602 (+/- 0.0026638832893729376)
2023-12-03 15:33:18,377:INFO:bugbug.model:Recall: f0.692237303345579 (+/- 0.023732875522107115)
2023-12-03 15:33:18,377:INFO:bugbug.model:X_train: (4249, 4), y_train: (4249,)
2023-12-03 15:33:18,377:INFO:bugbug.model:X_test: (473, 4), y_test: (473,)
2023-12-03 15:33:19,435:INFO:bugbug.model:Number of features: 75318
2023-12-03 15:33:19,436:INFO:bugbug.model:Model trained
Traceback (most recent call last):
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/trainer.py", line 145, in <module>
    main()
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/trainer.py", line 141, in main
    retriever.go(args)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/trainer.py", line 41, in go
    metrics = model_obj.train(limit=args.limit)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/bugbug/model.py", line 402, in train
    explainer = shap.TreeExplainer(self.clf)
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/site-packages/shap/explainers/_tree.py", line 166, in __init__
    self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/site-packages/shap/explainers/_tree.py", line 1155, in __init__
    raise InvalidModelError("Model type not yet supported by TreeExplainer: " + str(type(model)))
shap.utils._exceptions.InvalidModelError: Model type not yet supported by TreeExplainer: <class 'imblearn.pipeline.Pipeline'>

@naoya2000
Contributor

naoya2000 commented Dec 3, 2023

It turns out that there is an issue: I tried to use the model to classify a certain bug, but it didn't work. However, it seems that some steps in the Discussion may help, so I will try those first. Thanks!

Edited: Sorry, the steps in the Discussion didn't resolve this issue.

(base) naoyaokamoto@Naoyas-MacBook-Air-393 bugbug % python3 -m scripts.bug_classifier stepstoreproduce --bug-id 1635837                                        
2023-12-03 18:38:17,677:INFO:numexpr.utils:NumExpr defaulting to 8 threads.
2023-12-03 18:38:19,688:INFO:__main__:stepstoreproducemodel does not exist. Downloading the model....
2023-12-03 18:38:19,703:INFO:bugbug.utils:Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.bugbug.train_stepstoreproduce.latest/artifacts/public/stepstoreproducemodel.tar.zst...
tar: Error inclusion pattern: Failed to open 'zstdmt'
2023-12-03 18:38:21,680:WARNING:bugbug.utils:Command '['tar', '-I', 'zstdmt', '-xf', 'stepstoreproducemodel.tar.zst']' returned non-zero exit status 1.. Falling back to zstandard API, which could be slower.
Traceback (most recent call last):
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/naoyaokamoto/mambaforge/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/bug_classifier.py", line 80, in <module>
    main()
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/bug_classifier.py", line 76, in main
    classify_bugs(args.model, args.bug_id)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/scripts/bug_classifier.py", line 32, in classify_bugs
    model = model_class.load(model_file_name)
  File "/Users/naoyaokamoto/Documents/GitHub/bugbug/bugbug/model.py", line 593, in load
    model.clf.named_steps["estimator"].load_model(xgboost_model_path)
AttributeError: 'XGBClassifier' object has no attribute 'named_steps'

@marco-c
Collaborator Author

marco-c commented Dec 3, 2023

> However, I got the following result stating that the imblearn.pipeline.Pipeline model isn't supported by TreeExplainer. Can I safely ignore this, or do I have to resolve this issue first before working on the rest of the issue? Thanks!

@suhaibmujahid maybe a regression from your recent changes?

@suhaibmujahid
Member

> Can I safely ignore this

@naoya2000 yes, you can ignore it. Until the issue is fixed, you can disable the importance calculation by setting self.calculate_importance to False in the model constructor (do not commit this in your PR).

> maybe a regression from your recent changes?

@marco-c Yes, it is a regression. I filed #3880 to follow up on this.
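
For readers following along, the local workaround mentioned above could look roughly like the sketch below. It is only an illustration: the `Model` class here is a hypothetical stand-in, not bugbug's actual base class, and the only name taken from this thread is the `calculate_importance` attribute. The real `train` method does much more; the point is simply that the explainer step is skipped when the flag is off.

```python
class Model:
    """Hypothetical stand-in for a base model class (not bugbug's real code)."""

    def __init__(self):
        # Assumed default: feature importance is computed after training.
        self.calculate_importance = True

    def train(self):
        metrics = {"trained": True}
        if self.calculate_importance:
            # In the traceback above, this is roughly where
            # shap.TreeExplainer(self.clf) is called and fails.
            metrics["importance"] = "computed"
        return metrics


class StepsToReproduceModel(Model):
    def __init__(self):
        super().__init__()
        # Local workaround only; do not commit this change in a PR.
        self.calculate_importance = False


metrics = StepsToReproduceModel().train()
print("importance" in metrics)  # False
```

With the flag set to False, training finishes without reaching the explainer step, which is enough to keep working on the rest of the issue locally.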

@naoya2000
Contributor

I just want to check that I'm on the right track here. Basically, I need to make sure that the only bugs in the training set are those with type set to defect. So I think I would need to go into stepstoreproduce.py in the models folder and, in the get_labels function, add a check that the bug type is defect. Or should I be focusing on trainer.py rather than stepstoreproduce.py?
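
A minimal sketch of the kind of check described above, written as a standalone function so it runs without bugbug installed. Everything here is an assumption for illustration: `label_defects_only` and `sample_bugs` are hypothetical names, bugs are assumed to be plain dicts with a `type` field whose value is `defect`, `enhancement`, or `task`, and the `cf_has_str` field is assumed to carry the label. It is not the code that was merged for this issue.

```python
from typing import Iterable


def label_defects_only(bugs: Iterable[dict]) -> dict[int, int]:
    """Map bug id -> 1 (has STR) or 0 (no STR), keeping defects only."""
    classes: dict[int, int] = {}
    for bug in bugs:
        # Skip enhancements and tasks: STR do not apply to them.
        if bug.get("type") != "defect":
            continue
        has_str = bug.get("cf_has_str")  # assumed field name
        if has_str == "yes":
            classes[int(bug["id"])] = 1
        elif has_str == "no":
            classes[int(bug["id"])] = 0
        # Any other value leaves the bug unlabeled.
    return classes


# Hypothetical sample data, not real Bugzilla bugs.
sample_bugs = [
    {"id": 1, "type": "defect", "cf_has_str": "yes"},
    {"id": 2, "type": "enhancement", "cf_has_str": "no"},
    {"id": 3, "type": "task", "cf_has_str": "yes"},
    {"id": 4, "type": "defect", "cf_has_str": "no"},
    {"id": 5, "type": "defect", "cf_has_str": "---"},
]
print(label_defects_only(sample_bugs))  # {1: 1, 4: 0}
```

Filtering where the labels are built, rather than in trainer.py, keeps the restriction specific to this one model, since the training script is shared by all models.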
