Metrics evaluation dataset #10

Open · wants to merge 16 commits into base: launch

Conversation

leriomaggio
Member

This PR introduces a new EvaluationDataSource in the backend, created to calculate the metrics for the learning machine.
This special data source holds a reference to the original FER validation set, with 60 selected samples used for model prediction.

These samples have been chosen as the result of a data-driven analysis, thoroughly documented in a dedicated notebook which is also part of this PR (please see notebooks/FER_AutoEncoder.ipynb).

The PR also adds a new LearningMachine model (i.e. VGG13-FER), which is the DL model originally used on the FERPlus dataset.
This model is currently under training; meanwhile, the whole infrastructure is available and integrated in the backend for potential future use.
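
For reference, a minimal PyTorch sketch of a VGG13-style backbone loosely following the FERPlus paper; the class name `VGG13FERSketch`, layer sizes, dropout rates, and number of classes are illustrative assumptions rather than the model shipped in this PR:

```python
import torch.nn as nn


class VGG13FERSketch(nn.Module):
    """Illustrative VGG13-style backbone for 48x48 grayscale FER images."""

    def __init__(self, num_classes: int = 8, dropout: float = 0.25):
        super().__init__()
        cfg = [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M"]
        layers, in_ch = [], 1
        for v in cfg:
            if v == "M":
                # downsample, then regularise between conv blocks
                layers += [nn.MaxPool2d(2), nn.Dropout(dropout)]
            else:
                layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
                in_ch = v
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 3 * 3, 1024),  # 48x48 input halved four times -> 3x3 feature maps
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```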

New VGG13-FER-adapted model, as per the original FERPlus paper.
The model is currently in training, and it is ready to switch to the new FERPlus dataset.
Meanwhile, the whole infrastructure within the backend is in place and working.
A new instance of `EvaluationDataSource` is now available, to be used specifically for metric calculation (i.e. accuracy).
The new special data source includes a list of selected samples for each class (10 each) that will be consistently used *only* to evaluate the online performance of the model (i.e. the learning machine).
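
A minimal sketch of how such an evaluation data source could look, assuming a plain in-memory wrapper around the validation set; the class and attribute names below are illustrative and do not mirror the backend's actual API:

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class EvaluationDataSourceSketch:
    """Holds a fixed pool of validation samples used only for metric calculation."""

    validation_images: np.ndarray    # (N, H, W) images from the FER validation set
    validation_labels: np.ndarray    # (N,) ground-truth emotion labels
    selected_indices: Sequence[int]  # 60 indices in total: 10 per emotion class

    @property
    def evaluation_sample(self) -> np.ndarray:
        return self.validation_images[list(self.selected_indices)]

    @property
    def ground_truth(self) -> np.ndarray:
        return self.validation_labels[list(self.selected_indices)]
```
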
…e selection

This notebook contains all the code and analysis performed to identify and select the candidate faces (from the validation set) to be used for metric computation.
The (data-driven) strategy relies on an AutoEncoder trained on the training set in an unsupervised fashion.
The representation generated in the latent space is then passed to a UMAP dimensionality-reduction model, to project samples onto an embedding space (one for each data partition).

For each generated embedding, a centroid is computed for each class, and the 10 samples closest to each centroid are identified.
These are the chosen candidate samples to be used in metric evaluation.
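
A condensed sketch of that selection strategy, assuming the autoencoder's latent codes are already available as a NumPy array; the umap-learn package is used for the projection, and all function and variable names are illustrative:

```python
import numpy as np
import umap  # pip install umap-learn


def select_candidates(latent_codes: np.ndarray, labels: np.ndarray, per_class: int = 10) -> np.ndarray:
    """Project latent codes with UMAP, then pick the samples closest to each class centroid.

    Returns indices into the original data partition (global, not per-class, indices).
    """
    embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(latent_codes)
    selected = []
    for cls in np.unique(labels):
        class_idx = np.where(labels == cls)[0]         # global indices for this class
        class_points = embedding[class_idx]
        centroid = class_points.mean(axis=0)
        dists = np.linalg.norm(class_points - centroid, axis=1)
        closest_local = np.argsort(dists)[:per_class]  # ranking *within* the class subset
        selected.extend(class_idx[closest_local])      # map back to global indices
    return np.asarray(selected)
```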
@leriomaggio leriomaggio requested a review from OliverDavis March 2, 2022 15:27
@leriomaggio leriomaggio changed the title Metrics evaluation dataset [DRAFT] Metrics evaluation dataset Mar 4, 2022
@OliverDavis
Contributor

This is brilliant! Working perfectly here. Are we ready to merge?

@leriomaggio
Member Author

leriomaggio commented Mar 4, 2022

Not yet.

I moved the PR to draft as I am now finalising several changes I have worked on over the last couple of days, which I would also like to include in this PR.

These include:

  • a new model for the learning machine (already trained, with several checkpoints saved)
  • integration of the new FERPlus dataset as a backend data source
  • (therefore) a new AutoEncoder and a brand-new pool of candidate faces for metric calculation (selected from FERPlus)

I was just checking the last couple of things, and how the pretrained weights can be used, if needed.

Will push the latest changes soon.

In the previous implementation, indices of samples were chosen locally (i.e. per emotion) and not globally (i.e. over the whole dataset), resulting in a wrong overall sample selection.
This has been fixed, and the corresponding notebook has been rerun, so it now contains the correct sample references.
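
To illustrate the bug, a toy sketch of the difference (NumPy only, hypothetical values): positions chosen within a per-class subset must be mapped back through the class's own index array to become valid dataset-wide indices.

```python
import numpy as np

labels = np.array([0, 1, 0, 1, 1, 0])   # toy emotion labels
class_idx = np.where(labels == 1)[0]    # global indices of class 1 -> [1, 3, 4]

local_pick = np.array([0, 2])           # positions chosen *within* the class subset
wrong = local_pick                      # bug: local positions used as dataset-wide indices
right = class_idx[local_pick]           # fix: map back to global indices -> [1, 4]
```
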
The new FERPlus dataset extends the previous FER dataset by using the new and improved labels from the FER+ dataset release.
Further details can be found in the module documentation.
To integrate the dataset within the backend, the majority-label strategy has been adopted for model training and evaluation.
The FERPlus dataset is now fully integrated into the backend as an available source.
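
A sketch of what a majority-label strategy amounts to, assuming per-image annotator vote counts as distributed with FER+ (one count per emotion column); the emotion list and the tie-breaking rule below are assumptions, not necessarily the backend's exact implementation:

```python
import numpy as np

EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]


def majority_label(vote_counts: np.ndarray) -> int:
    """Return the index of the emotion with the most annotator votes.

    vote_counts: (8,) array of votes for a single image; ties resolve to the
    first emotion in EMOTIONS order (an illustrative assumption).
    """
    return int(np.argmax(vote_counts))


# e.g. 6 votes for "happiness", 2 for "neutral", 2 for "surprise"
print(EMOTIONS[majority_label(np.array([2, 6, 2, 0, 0, 0, 0, 0]))])  # -> happiness
```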

FERPlus is now available as a fully fledged dataset (to be used with the frontend), as well as
plugged in as the metric dataset for evaluation.
Corresponding functions and settings have been updated accordingly.

Note: the sample selection for the FERPlus **Metric** dataset has been done using an approach similar
to the one used for FER (further details in the next commit, with the attached notebook)
…ataset.

In this notebook, the auto-encoder model resembles the convolutional structure of a new model that will be used
as an alternative backbone for the learning machine (subsequently named `VGGFERNet`).
…hts has been added too to enable pre-training.
The previous implementation was mistakenly considering the original samples on the board for evaluation, despite already using the metric dataset.
The new implementation further extends on that point by better integrating the selected `EvaluationDataset`, using its corresponding `evaluation_sample` and `ground_truth` properties for accuracy calculation.
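
A minimal sketch of that accuracy calculation; the data-source interface (an `evaluation_sample` array plus a `ground_truth` array) is taken from the description above, while `model_predict` is a hypothetical prediction callable:

```python
import numpy as np


def accuracy_on_metric_dataset(model_predict, evaluation_dataset) -> float:
    """Compute accuracy using only the fixed metric dataset, never the board samples.

    model_predict: callable mapping an array of images to an array of predicted labels.
    evaluation_dataset: object exposing `evaluation_sample` and `ground_truth` arrays.
    """
    predictions = np.asarray(model_predict(evaluation_dataset.evaluation_sample))
    truth = np.asarray(evaluation_dataset.ground_truth)
    return float(np.mean(predictions == truth))
```
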
@leriomaggio leriomaggio changed the title [DRAFT] Metrics evaluation dataset Metrics evaluation dataset Mar 4, 2022
@leriomaggio
Member Author

OK @OliverDavis, this should now be ready for testing and review.

Please note that I've changed the defaults for the backend in this PR, so it now uses the new LearningMachine model and the new FER+ dataset.
This means that it will take a few seconds to download the new dataset the first time it starts.
Please keep an eye on the logs in the console for the backend :)

The pretrained option for the model should still be disabled, although it is fully available at different checkpoints (see learning_machine.py#L335), corresponding to different "starting" grades of learning for the model.
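
For reference, a hedged sketch of how one of those checkpoints could be loaded in PyTorch, reusing the `VGG13FERSketch` class sketched earlier in this thread; the checkpoint path and the assumption that the file stores a plain `state_dict` are illustrative, not the backend's actual loading code:

```python
import torch

# Hypothetical checkpoint path; the real files are referenced from learning_machine.py
CHECKPOINT_PATH = "checkpoints/vgg13_fer_epoch_20.pt"

model = VGG13FERSketch()  # illustrative model from the earlier sketch
state_dict = torch.load(CHECKPOINT_PATH, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode for metric evaluation
```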

The final performance of this model is also quite decent on a full-scale training run:

Training:   ACC: 0.934; MCC: 0.911
Validation: ACC: 0.802; MCC: 0.732
Test:       ACC: 0.821; MCC: 0.755
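
A small sketch of how those two metrics can be computed with scikit-learn, given arrays of true and predicted labels (the numbers above come from the actual training run, not from this snippet):

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef


def report_split(name, y_true, y_pred):
    """Print accuracy and Matthews correlation coefficient for one data split."""
    acc = accuracy_score(y_true, y_pred)
    mcc = matthews_corrcoef(y_true, y_pred)
    print(f"{name}: ACC: {acc:.3f}; MCC: {mcc:.3f}")
```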

Please feel free to let me know if there's anything that's not clear or requires a fix :)

Once this is merged, we might want to discuss how we want to save annotations.

TY

@OliverDavis
Contributor

I'm getting this when I run the backend:

File "app.py", line 6, in
from endpoints import faces, get_face, annotate, test_face
File "/.../The-Learning-Machine/backend/endpoints.py", line 4, in
from numpy.typing import ArrayLike
ModuleNotFoundError: No module named 'numpy.typing'

Am I missing something from the Conda environment? (Or just missing something :-)

@leriomaggio
Member Author

leriomaggio commented May 30, 2022

I think this is probably because you don't have the nptyping package installed within your env (or it could not be found for some reason).

That could be easily fixed with the following command:

pip install nptyping

Please let me know if that worked :)

@OliverDavis
Contributor

Thanks! Solved with pip install -U numpy, updating numpy to version > 1.20.

@OliverDavis
Contributor

I've tried training this for a while, and I'm not sure it's learning anything at the moment: I'm stuck around 0.16 on the graph. Is that random chance with six categories? Very occasionally I get a jump up or down, but it quickly returns to 0.16. The donut plots change, but they remain identical for all the faces (i.e. all of them change in the same way). This might be expected behaviour, if training is very, very slow, although it seemed faster with the previous version.
