Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full repository review for VRE on-boarding of @benelot by @stepaza #1

Open
wants to merge 11 commits into
base: empty
Choose a base branch
from

Conversation

benelot
Copy link
Member

@benelot benelot commented Aug 12, 2019

This is an artificially created full repository review. If you ever need to do the same, I added the procedure to do it in on the wiki: https://github.com/IDSC-io/idsc-data-science-wiki/wiki/Git#conduct-a-full-repository-review-on-github

@benelot benelot changed the title Full repository review for handover from @stepaza to @benelot within the DS team Full repository review for VRE on-boarding of @benelot by @stepaza Aug 12, 2019
Copy link
Member Author

@benelot benelot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am done with my first review without having run the code. So this is very preliminary and I only have a limited understanding as of now.

@@ -0,0 +1,299 @@

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full code repository review

All comments to it are inline to the repository. Please scroll on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a clean up of the code before anything else is done:

  • Migration into datascience cookiecutter template
  • Code compliance
  • Clean out unused code (it is still in the version control)
  • Switch to english wording (I am a bit irritated by the german words that leaked into the function names etc. I would name all entities in proper english and make a clean cut between the database names and the code names. Every idea of selling a product in my opinion is diminished by language-dependent code.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I get access to the Atelier_DS? Assume I am a noob on all the CDWH stuff, tell me everything, I will tell you what I know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who designed the dataset? It seems to pull very specific data, so I would not know how to extend it on the fly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we proceed from here?

@@ -0,0 +1,4 @@
# Sphinx build info version 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the docs build from the repository to reduce the clutter

@@ -0,0 +1,258 @@
********************
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patient Test Dataset

Is this for development vs. the full dataset for validation?

This folder contains important functions for loading data from SQL into CSV, thereby preparing the "raw" data used for
building the actual network models.

Most importantly, this folder also contains the file ``Update_Model.sh``, which is a bash script controlling `all steps`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably review that script.

@@ -0,0 +1,72 @@
BWART,Text
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepaza: Quickly explain the original idea of the content of 'Unused'.

@@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model classes




# logging.info(f"##################################################################################")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepaza That is a lot of unused, commented code here.

write_file.write(csv_sep.join(data_list) + '\n')
print('\nSuccessfully collected patient information !\n')


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many trailing line feeds....

import itertools
import random
import json

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepaza Can this be deleted?

@@ -0,0 +1,22 @@
# Small script to test the patient import defined in the HDFS_data_loader.py script

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepaza Is this a user-driven unit test? Could we make a unit test out of it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants