Melanoma task #310
…e. also delete the tarball after done using
Bumps [JamesIves/github-pages-deploy-action](https://github.com/jamesives/github-pages-deploy-action) from 4.6.8 to 4.6.9. - [Release notes](https://github.com/jamesives/github-pages-deploy-action/releases) - [Commits](JamesIves/github-pages-deploy-action@v4.6.8...v4.6.9) --- updated-dependencies: - dependency-name: JamesIves/github-pages-deploy-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
It looks like there are still some small cleanup tasks to be done (maybe I reviewed too early), but I already left some comments. Looking good!
R/TaskClassif_melanoma.R
Outdated
#'
#' @references
#' `r format_bib("melanoma2021")`
#' @examplesIf torch::torch_is_installed()
I don't think the `@examplesIf` guard is required here, as printing the task does not create any tensors.
compressed_tarball_file_name = "hf_ISIC_2020_small.tar.gz"
compressed_tarball_path = file.path(path, compressed_tarball_file_name)
curl::curl_download(paste0(base_url, compressed_tarball_file_name), compressed_tarball_path)
Because curl is in Suggests, we should run `mlr3misc::require_namespaces("curl")` beforehand so users get a good error message when they don't have it installed.
But we should just write `require_namespaces()` without the `mlr3misc::` prefix, right?
yes!
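To illustrate the pattern discussed in this thread, here is a minimal base-R sketch of "check the suggested package first, then fail with a readable message". The helper name `require_suggested()` is hypothetical and only stands in for `require_namespaces()`; it is not the mlr3misc implementation.

```r
# Hypothetical stand-in for require_namespaces(): stop early with a clear
# message if a package listed under Suggests is not installed.
require_suggested = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this step but is not installed.", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Passes silently for a package that ships with every R installation.
require_suggested("stats")
```

In the download function, such a check would go immediately before the `curl::curl_download()` call, so the user sees the message above instead of a less specific namespace-loading error.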
R/TaskClassif_melanoma.R
Outdated
old = c("image", "patient", "anatom_site_general"),
new = c("image_name", "patient_id", "anatom_site_general_challenge")
)[, split := "test"]
metadata = rbind(training_metadata, test_metadata, fill = TRUE)
what is being filled here?
attic/03-process_melanoma.R
Outdated
@@ -0,0 +1,84 @@
library(data.table)
The attic is more for code that was discarded for now but might be used again in the future (I haven't cleaned up in a while though :D).
If it is about how to create data, I would probably rather put it into `data-raw`. But as we already document on Hugging Face how it was created, we don't really need it in this repository, I think.
R/TaskClassif_melanoma.R
Outdated
training_metadata = data.table::fread(here::here(path, training_metadata_file_name))

test_metadata_file_name = "ISIC_2020_Test_Metadata.csv"
test_metadata = data.table::fread(here::here(path, test_metadata_file_name))
Don't use `here` here.
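A minimal sketch of the alternative, assuming the concern is that `here::here()` resolves paths against a project root rather than the directory passed in as `path` (the review does not spell out the reason). The directory below is a temporary stand-in, not the package's actual cache location.

```r
# Stand-in for the download directory; in the PR, `path` is supplied by the caller.
path = tempdir()
test_metadata_file_name = "ISIC_2020_Test_Metadata.csv"

# file.path() joins the components directly, with no project-root lookup.
test_metadata_path = file.path(path, test_metadata_file_name)
```

The resulting `test_metadata_path` can be handed to `data.table::fread()` in place of the `here::here(...)` call.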
old = c("image", "patient", "anatom_site_general"),
new = c("image_name", "patient_id", "anatom_site_general_challenge")
)[, split := "test"]
metadata = rbind(training_metadata, test_metadata)
I think `fill = TRUE` was here because we want to fill the response variable of the test data with `NA`; will double-check to confirm.
ok makes sense, maybe add a comment
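For reference, a small sketch of what `fill = TRUE` does when the two tables do not share all columns. The tables and the `target` column are toy data made up for illustration, not the actual ISIC metadata.

```r
library(data.table)

# Toy stand-ins: the "test" table lacks the response column.
training_metadata = data.table(image_name = c("img_1", "img_2"), target = c(0L, 1L))
test_metadata = data.table(image_name = "img_3")

# fill = TRUE pads columns that are missing from one table with NA,
# so the test rows get an NA response instead of rbind() raising an error.
metadata = rbind(training_metadata, test_metadata, fill = TRUE)
```

A one-line comment like the one above the `rbind()` call would address the request to document why `fill = TRUE` is there.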
And yes, I would delete the individual files on Hugging Face.
…y_imagenet. Probably need to construct full file paths first
@@ -83,7 +83,7 @@ load_task_melanoma = function(id = "melanoma") {
cached_constructor = function(backend) {
data = cached(constructor_melanoma, "datasets", "melanoma")$data

data[, benign_malignant := factor(benign_malignant, levels = c("benign", "malignant"))]
data[, outcome := factor(outcome, levels = c("benign", "malignant"))]
The static analyzer will not know that `outcome` is a valid variable here (this is generally an issue with NSE). You can use `outcome := factor(get("outcome"), ...)` as a workaround.
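A minimal sketch of the suggested workaround on toy data (the table below is made up; only the `outcome` column name and the factor levels come from the diff):

```r
library(data.table)

# Toy stand-in for the cached melanoma data.
data = data.table(outcome = c("benign", "malignant", "benign"))

# get("outcome") looks the column up by name at run time, so static code
# checks do not see a bare, apparently undefined symbol `outcome`.
data[, outcome := factor(get("outcome"), levels = c("benign", "malignant"))]
```

The result is the same as with the bare column name; only the way the column is referenced changes.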
https://huggingface.co/datasets/carsonzhang/ISIC_2020_small
Should we delete the individual files on Hugging Face?