What is Cdiscount starter?

This is a ready-to-use, end-to-end sample solution for the currently running Kaggle Cdiscount challenge.

It covers data loading and augmentation, model training (with many different architectures), ensembling, and submission generation.

How to run Cdiscount starter?

Installation

Install the requirements
```
pip install -r requirements.txt
```
Install neptune by running
```
pip install neptune-cli
```
Finish neptune installation by running
```
neptune login
```
Finally, open neptune and create a project named cdiscount. Note the project key, because you will need it later (most likely it is CDIS).

Now, you are ready to run the code and train some models...

Run code

A remark about the competition data: we have uploaded the data to the neptune platform. It is available in the /public/Cdiscount directory. We also created a meta_data file for the large .bson files in the /public/Cdiscount/meta directory, which makes processing them much faster.
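
The idea behind such a meta file can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes the meta data is an index of byte offsets and lengths, one entry per BSON document, and the function names `build_offset_index` and `read_document_bytes` are hypothetical. It relies only on the fact that every BSON document starts with its total length stored as a little-endian 32-bit integer.

```python
import struct


def build_offset_index(bson_path):
    """Scan a .bson file once and return a list of (offset, length) pairs.

    Only the 4-byte length prefix of each document is read, so the
    scan never decodes product data or images.
    """
    index = []
    with open(bson_path, "rb") as f:
        offset = 0
        while True:
            header = f.read(4)
            if len(header) < 4:  # end of file
                break
            (length,) = struct.unpack("<i", header)
            if length < 5:  # the smallest valid BSON document is 5 bytes
                raise ValueError(f"corrupt document at offset {offset}")
            index.append((offset, length))
            offset += length
            f.seek(offset)  # jump straight to the next document
    return index


def read_document_bytes(bson_path, offset, length):
    """Return the raw bytes of one document from a stored (offset, length) pair."""
    with open(bson_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

With an index like this, a data loader can jump directly to any product instead of iterating over the whole file. The raw bytes still have to be decoded with a BSON library of your choice.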

You can run this end-to-end solution in two ways:

If you wish to work on your own machine you can run
```
neptune run experiment_manager.py -- run_pipeline
```

Deploying to the cloud via neptune is just as easy. Simply run
```
source run_neptune_command.sh
```

A more advanced option is to run

    neptune send experiment_manager.py \
      --config experiment_config.yaml \
      --pip-requirements-file neptune_requirements.txt \
      --project-key CDIS \
      --environment keras-2.0-gpu-py3 \
      --worker gcp-gpu-medium \
      -- run_pipeline

Collect results and upload to Kaggle

Navigate to /output/project_data/submissions, get your submission file, upload it to Kaggle and check your rank in the competition!
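
For orientation, the last two stages (ensembling and submission generation) can be sketched as below. This is an illustration under stated assumptions rather than the code in `postprocessing.py`: it averages class probabilities from several models and writes a CSV with `_id` and `category_id` columns, which is assumed here to be the layout Kaggle expects (check the competition's sample submission). The helper names and the input layout are hypothetical.

```python
import csv


def average_ensemble(model_probabilities):
    """Average per-class probabilities from several models.

    model_probabilities: list (one entry per model) of dicts mapping a
    product _id to a list of class probabilities. All models must use
    the same class order and cover the same products.
    """
    n_models = len(model_probabilities)
    averaged = {}
    for product_id in model_probabilities[0]:
        rows = [probs[product_id] for probs in model_probabilities]
        averaged[product_id] = [sum(col) / n_models for col in zip(*rows)]
    return averaged


def write_submission(averaged, class_ids, path):
    """Write one row per product with its most probable category."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["_id", "category_id"])  # assumed header
        for product_id in sorted(averaged):
            probs = averaged[product_id]
            best = max(range(len(probs)), key=probs.__getitem__)
            writer.writerow([product_id, class_ids[best]])
```

Plain averaging is only the simplest ensembling scheme; weighted averages or stacking fit into the same two-step structure.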

Advanced options

custom data directories

If you do not wish to use the default data directories, you can specify custom paths in data_config.yaml:

    raw_data_dir: /public/Cdiscount
    meta_data_dir: /public/Cdiscount/meta
    meta_data_processed_dir: /output/project_data/meta_processed
    models_dir: /output/project_data/models
    predictions_dir: /output/project_data/predictions
    submissions_dir: /output/project_data/submissions
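
A script that consumes these paths might look like the sketch below. It is illustrative only: `load_flat_config` and `ensure_output_dirs` are hypothetical helpers, the parser handles just flat `key: value` lines like the ones above (a YAML library would be the usual choice for anything more complex), and the choice of which directories to create is an assumption.

```python
import os


def load_flat_config(path):
    """Parse a file of flat `key: value` lines into a dict of strings."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            key, _, value = line.partition(":")  # split on the first colon only
            config[key.strip()] = value.strip()
    return config


def ensure_output_dirs(config, keys=("meta_data_processed_dir", "models_dir",
                                      "predictions_dir", "submissions_dir")):
    """Create the output directories named in the config if they are missing."""
    for key in keys:
        os.makedirs(config[key], exist_ok=True)
```

Creating the output directories up front avoids a failure at the very end of a long training run, when the first model or submission file is written.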

data sampling

Since the dataset is very large, we suggest sampling the training dataset down to a manageable size. Something like the 1000 most common categories and 1000 images per category seems reasonable to start with. You can tweak these values however you want in the experiment_config.yaml file:

    properties:
      - key: top_categories
        value: 100
      - key: images_per_category
        value: 100
      - key: epochs
        value: 10
      - key: pipeline_name
        value: InceptionPipeline
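
To make the two sampling properties concrete, here is a minimal sketch of what such sampling could look like. It is not taken from `preprocessing.py`; the function name, the `(image_id, category_id)` record layout, and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_training_set(records, top_categories, images_per_category, seed=0):
    """Keep the most common categories and cap the images taken from each.

    records: iterable of (image_id, category_id) pairs.
    Returns a list of (image_id, category_id) pairs.
    """
    records = list(records)
    counts = Counter(category for _, category in records)
    kept = {category for category, _ in counts.most_common(top_categories)}

    # Group the image ids of the kept categories.
    by_category = defaultdict(list)
    for image_id, category in records:
        if category in kept:
            by_category[category].append(image_id)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for category, image_ids in by_category.items():
        k = min(images_per_category, len(image_ids))
        for image_id in rng.sample(image_ids, k):
            sample.append((image_id, category))
    return sample
```

Categories with fewer images than the cap simply contribute everything they have, so the resulting set can still be imbalanced toward the largest categories.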

hyperparameter space search

If you would like to search the hyperparameter space, neptune can do this for you. Check out hyperparameter optimization.
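
If you only want to see which combinations a search would cover, a plain grid can be enumerated locally. The sketch below is not neptune's optimizer and does not use its API; it is a generic grid enumeration over hypothetical value ranges for the properties found in experiment_config.yaml.

```python
from itertools import product


def grid(space):
    """Yield one dict per combination of the values in `space`."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


# Hypothetical ranges; pick values that suit your hardware and time budget.
search_space = {
    "top_categories": [100, 1000],
    "images_per_category": [100, 1000],
    "epochs": [10],
}

for params in grid(search_space):
    print(params)
```

The number of runs is the product of the list lengths, so the grid grows quickly as you add values.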

training without neptune

We give you the option to run this code without neptune. The transition is seamless, just follow these steps:

1. Download the competition data to some folder your_raw_data_dir.
2. Specify the data directories in data_config.yaml.
3. Run the Python code:

       python experiment_manager.py run_pipeline

Final remarks

Please feel free to modify this code in order to improve your score. Add new models, pre- and post-processing routines or ensembling methods.

Have fun competing on this Kaggle challenge!
