-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full repository review for VRE on-boarding of @benelot by @stepaza #1
base: empty
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am done with my first review without having run the code. So this is very preliminary and I only have a limited understanding as of now.
@@ -0,0 +1,299 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full code repository review
All comments to it are inline to the repository. Please scroll on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest a clean up of the code before anything else is done:
- Migration into datascience cookiecutter template
- Code compliance
- Clean out unused code (it is still in the version control)
- Switch to english wording (I am a bit irritated by the german words that leaked into the function names etc. I would name all entities in proper english and make a clean cut between the database names and the code names. Every idea of selling a product in my opinion is diminished by language-dependent code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I get access to the Atelier_DS? Assume I am a noob on all the CDWH stuff, tell me everything, I will tell you what I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who designed the dataset? It seems to pull very specific data, so I would not know how to extend it on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we proceed from here?
@@ -0,0 +1,4 @@ | |||
# Sphinx build info version 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove the docs build from the repository to reduce the clutter
@@ -0,0 +1,258 @@ | |||
******************** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patient Test Dataset
Is this for development vs. the full dataset for validation?
This folder contains important functions for loading data from SQL into CSV, thereby preparing the "raw" data used for | ||
building the actual network models. | ||
|
||
Most importantly, this folder also contains the file ``Update_Model.sh``, which is a bash script controlling `all steps` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably review that script.
@@ -0,0 +1,72 @@ | |||
BWART,Text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stepaza: Quickly explain the original idea of the content of 'Unused'.
@@ -0,0 +1,133 @@ | |||
# -*- coding: utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Model classes
|
||
|
||
|
||
# logging.info(f"##################################################################################") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stepaza That is a lot of unused, commented code here.
write_file.write(csv_sep.join(data_list) + '\n') | ||
print('\nSuccessfully collected patient information !\n') | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many trailing line feeds....
import itertools | ||
import random | ||
import json | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stepaza Can this be deleted?
@@ -0,0 +1,22 @@ | |||
# Small script to test the patient import defined in the HDFS_data_loader.py script | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stepaza Is this a user-driven unit test? Could we make a unit test out of it?
This is an artificially created full repository review. If you ever need to do the same, I added the procedure to do it in on the wiki: https://github.com/IDSC-io/idsc-data-science-wiki/wiki/Git#conduct-a-full-repository-review-on-github