Since the inception of LeNet-5, Yann LeCun's pioneering Convolutional Neural Network for handwritten digit recognition (1998), neural networks have come a long way in classifying 2D images and extracting their features. However, since we live in a 3D world, it is natural to ask whether we can extract and classify the features of 3D shapes as well. This turns out to be considerably harder, given the exponential increase in complexity that the third dimension brings. To perform this classification efficiently, we propose an ensembled encoderless-decoder and classifier neural network that learns the latent encoding vectors for point-cloud pair transformations. This ensemble extracts the feature descriptors of the transformations without a parametric encoder or a lossy 2D projection. We also empirically demonstrate the effectiveness of our ensemble model through a benchmark comparison on a 3D point cloud classification task on the PointNet and ModelNet datasets. Finally, we discuss the applicability of our 3D shape descriptor learners and classifiers as vital preprocessing steps for downstream 3D object classification, segmentation, and database retrieval tasks.
For training and testing our model, we use the PointNet and ModelNet datasets.
Data can be acquired from the PointNet and ModelNet websites.
We convert all 3D data to the `.npy` 3D point cloud format. See `kaolin` for converting ModelNet `.off` files to `.npy` files.
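For illustration, a minimal conversion sketch is shown below. It assumes a recent kaolin release that exposes `kaolin.io.off.import_mesh` and `kaolin.ops.mesh.sample_points` (APIs differ across kaolin versions), and the file names are placeholders:

```python
import numpy as np
import kaolin

def off_to_npy(off_path, npy_path, num_points=1024):
    # Load the triangle mesh from the .off file (assumed kaolin API).
    mesh = kaolin.io.off.import_mesh(off_path)
    vertices = mesh.vertices.unsqueeze(0).float()   # sample_points expects a batch dim
    points, _ = kaolin.ops.mesh.sample_points(vertices, mesh.faces, num_points)
    np.save(npy_path, points.squeeze(0).numpy())    # saved as a (num_points, 3) array

# off_to_npy("chair_0001.off", "chair_0001.npy")    # placeholder file names
```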
- Uses four AutoDecoder nets of different sizes to capture the shallow- and deeper-level features from the original point cloud that are used for learning the latent transformation encoding between the point-cloud pairs.
- Uses five similar CompNets initialized with different weights to make them independent (a rough sketch of combining such an ensemble follows this list).
- The AutoDecoder Ensemble Net performs significantly better than the single shallow-layered CompNet.
- Uses four CompNets of different sizes to capture the general and deeper-level features.
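As a rough sketch of how such a CompNet ensemble could be combined at inference time (the layer sizes and the output-averaging scheme here are illustrative assumptions, not the repository's exact architecture):

```python
import torch
import torch.nn as nn

class CompNet(nn.Module):
    # Illustrative comparison net: maps a latent transformation vector
    # to a same-class/different-class logit.
    def __init__(self, encoding_size=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(encoding_size, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, z):
        return self.net(z)

def ensemble_predict(members, z):
    # Average the sigmoid outputs of the independently initialized members.
    return torch.stack([torch.sigmoid(m(z)) for m in members]).mean(dim=0)

members = [CompNet() for _ in range(5)]  # five CompNets with different random inits
```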
After converting all 3D data into `.npy` point clouds, we create `DataLoaders` that feed batches of point-cloud pairs of the same and different classes to the AutoDecoder for training.
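A minimal sketch of such a paired loader, assuming the converted `.npy` files have been grouped by class (the `PointCloudPairs` class and the 50/50 same/different sampling are illustrative):

```python
import random
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class PointCloudPairs(Dataset):
    """Yields (source, target, same_class) pairs of .npy point clouds."""
    def __init__(self, files_by_class):
        # files_by_class: dict mapping class name -> list of .npy file paths
        self.files_by_class = files_by_class
        self.classes = list(files_by_class)
        self.index = [(c, p) for c, paths in files_by_class.items() for p in paths]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        cls, path = self.index[i]
        same = random.random() < 0.5                   # 50/50 same vs. different class
        other = cls if same else random.choice([c for c in self.classes if c != cls])
        other_path = random.choice(self.files_by_class[other])
        src = torch.from_numpy(np.load(path)).float()  # (N, 3) point cloud
        tgt = torch.from_numpy(np.load(other_path)).float()
        return src, tgt, torch.tensor(float(same))

# loader = DataLoader(PointCloudPairs(files_by_class), batch_size=32, shuffle=True)
```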
We first create an embedding or encoding layer that represents the latent transformation encoding vectors between point-cloud pairs of the same and different classes. Given the embedding vector and a source point-cloud, the AutoDecoder receives two more sample point-clouds: one of the same class and one of a different class.
We train the AutoDecoder's layer weights by learning the geometric transformation from the source point-cloud to a target point-cloud of the same or a different class. This learning proceeds iteratively with the Adam optimizer, minimizing the Chamfer loss between the transformed source and the target point cloud.
The information about the geometric transformation is stored in the embedding, i.e., the latent vector, which is returned once training is done.
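A condensed sketch of this auto-decoder optimization, with a hypothetical displacement-style decoder and a `cdist`-based Chamfer term standing in for the actual implementation:

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    # Symmetric Chamfer loss between batched clouds a: (B, N, 3), b: (B, M, 3).
    d2 = torch.cdist(a, b).pow(2)                      # (B, N, M) squared distances
    return d2.min(dim=2).values.mean(dim=1) + d2.min(dim=1).values.mean(dim=1)

class Decoder(nn.Module):
    # Hypothetical decoder: displaces the source cloud conditioned on the latent z.
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, src, z):
        zz = z.unsqueeze(1).expand(-1, src.shape[1], -1)  # broadcast z to every point
        return src + self.mlp(torch.cat([src, zz], dim=-1))

num_pairs, dim = 10_000, 256                           # illustrative sizes
decoder = Decoder(dim)
latents = nn.Embedding(num_pairs, dim)                 # one learnable code per pair
opt = torch.optim.Adam(list(decoder.parameters()) + list(latents.parameters()), lr=1e-3)

for src, tgt, pair_idx in loader:                      # paired loader, extended with a pair index
    z = latents(pair_idx)
    loss = chamfer(decoder(src, z), tgt).mean()        # fit transformed source to target
    opt.zero_grad()
    loss.backward()                                    # gradients flow into z and the decoder
    opt.step()
```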
The latent vector generated by the AutoDecoder can be supplied to a classifier that learns whether the latent encoding represents a point-cloud transformation within the same class or between different classes. This class-transformation information can then be used for classifying the 3D object.
Latent vectors that represent transformations between point-clouds of the same class yield a higher similarity score, and those between point-clouds of different classes a lower one.
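A minimal sketch of training such a classifier on the learned latent vectors (the two-layer architecture and `latent_loader` are illustrative assumptions, not the repo's exact setup):

```python
import torch
import torch.nn as nn

# Binary labels: y = 1 (same-class transformation) / 0 (different-class).
compnet = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))
criterion = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(compnet.parameters(), lr=1e-3)

for z, y in latent_loader:            # hypothetical loader of (latent vector, label) batches
    loss = criterion(compnet(z).squeeze(1), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# sigmoid(compnet(z)) then acts as the similarity score: high for same-class
# transformations, low for different-class ones.
```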
We minimize the symmetric Chamfer loss between point-cloud pairs of the same and different classes to obtain the representative latent transformation encoding vector.
The Chamfer loss is easily differentiable for back-propagation and is robust to outliers in the point cloud.
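For reference, the symmetric Chamfer distance between two point clouds $S_1$ and $S_2$ is commonly written as (the implementation's exact normalization may differ):

```latex
d_{\mathrm{CD}}(S_1, S_2)
  = \frac{1}{\lvert S_1 \rvert} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2^2
  + \frac{1}{\lvert S_2 \rvert} \sum_{y \in S_2} \min_{x \in S_1} \lVert x - y \rVert_2^2
```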
One of the main hyper-parameters, the encoding size of the latent vector, is set to 256. All other neural network parameters are reported in the supplementary material.
- CPU: Intel(R) Core(TM) i9-9900K @ 3.60 GHz
- Architecture: 64-bit
- RAM: 62 GiB system memory
- GPU: GeForce RTX 2080 (7979 MiB VRAM)
Note: The accuracies have an error of 1 percentage point between different runs
Figure 1: ROC curves for the different models on the PointNet7 classification task. Panels: Single CompNet, Ensemble CompNet, Logistic Regression, Naive Bayes, Decision Trees, and Random Forest Classifier.
Note: The accuracies have an error of 1 percentage point between different runs
Training times (AutoDecoder training time in parentheses):

| | Single CompNet | Ensemble CompNet |
|---|---|---|
| Single AutoDecoder (5m 24s) | 25.4s | 1m 12.2s |
| Ensemble AutoDecoder (8m 24s) | 25.2s | 14.4s |
Note: The accuracies have an error of 1 percentage point between different runs
Note: Results are for the neural network models only
[Figure: Single CompNet (Full PointNet) and Ensemble CompNet (Full PointNet) results]
AutoDecoder training time was 11m 45s; Single CompNet and Ensemble CompNet training times were 26.6s and 13.1s, respectively.
Note: The accuracies have an error of 2 percentage points between different runs
Training times (AutoDecoder training time in parentheses):

| | Single CompNet | Ensemble CompNet |
|---|---|---|
| Single AutoDecoder (5m 59s) | 10.0s | 27.7s |
| Ensemble AutoDecoder (8m 41s) | 9.83s | 27.2s |
Note:
- The accuracies have an error of 3 percentage points between different runs.
- The dataset contains samples from only 33 classes for training and testing, as the conversion module (`kaolin`) we used could not resolve some of the `.off` files in the ModelNet40 dataset. The classes used are listed in the supplementary material.
Training times (AutoDecoder training time in parentheses):

| | Single CompNet | Ensemble CompNet |
|---|---|---|
| Single AutoDecoder (11m 39s) | 47.1s | 2m 9s |
| Ensemble AutoDecoder (16m 51s) | 47.8s | 2m 12s |
In general, whether the AutoDecoder is a single neural network or an ensemble, the ensemble Comparison Nets (CompNets) always outperform the single CompNets on same-class and different-class classification tasks. Across all of the datasets used, switching to an ensemble classifier network yields an average improvement of 10% in accuracy.
We also show that our neural network classification models (Single and Ensemble CompNets) always outperform all the classical supervised machine learning models in accuracy, F1-score, and Receiver Operating Characteristic Area Under the Curve (ROC-AUC) on the 3D object classification task (see Table 1).
The training times across the different datasets also show that our method computes latent encodings and performs classification quickly on the hardware described above.
However, the efficacy of an ensemble AutoDecoder network was not observed for all datasets; on PointNet7 in particular, the total accuracy and related metrics actually decreased. One reason for this reduced accuracy of the Ensemble AutoDecoder Net could be the highly imbalanced training set of the PointNet7 dataset.
Note: Anaconda can also be used instead of `venv` to create the virtual environment.

```
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
```
To start a new Jupyter Notebook kernel based on the current virtual environment and run a new Jupyter Notebook instance:

```
$ python -m ipykernel install --user --name ENV_NAME --display-name "ENV_DISPLAY_NAME"
$ jupyter notebook
```
All information needed to run the ensemble neural network can be found in `autodecoder_ensemble_net`.
The `geotnf` package and `LoaderFish.py` files are based on implementations from PR-Net (Wang et al., 2019).
All references are listed in the References section of *Ensemble AutoDecoder CompNets for 3D Point Cloud Pair Encoding and Classification.pdf*.