Cannot reproduce MPJPE results in Table 4 #5

Open
victkid opened this issue Sep 23, 2019 · 22 comments

@victkid

victkid commented Sep 23, 2019

Hi,

I'm trying to calculate the MPJPE for camera 2 with no confidence threshold. In the paper, the MPJPE for camera 2 is 7.72, but I get a number around 33. The following is my approach; please point out where I went wrong.

Step 1: use linear interpolation to resize the test images from 260x346 to 260x344.

Question:
The image produced by the Matlab code has shape 260x346, but the input shape to the neural network is 260x344. How exactly do you remove the 2 pixel columns?

Step 2: load the network weights from DHP_CNN.model.

Step 3: generate predictions on the test image set. The outputs have shape (n_samples, 260, 344, n_joints).

Step 4: use linear interpolation to resize the prediction outputs to (n_samples, 260, 346, n_joints) to match the ground-truth data.

Step 5: calculate the Euclidean distance between prediction and ground truth for each joint and each sample, then take the average (a short sketch follows below).
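In code, step 5 is roughly the following (a minimal sketch; y_pred_coords and y_true_coords are hypothetical arrays of predicted and ground-truth 2D joint coordinates):

import numpy as np

def mpjpe(y_pred_coords, y_true_coords):
    # Per-joint Euclidean distance, averaged over joints and samples.
    # Both inputs have shape (n_samples, n_joints, 2), in pixel coordinates.
    dists = np.linalg.norm(y_pred_coords - y_true_coords, axis=-1)  # (n_samples, n_joints)
    return dists.mean()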

@ShallowWill

I have the same problem. @enrico-c @tobidelbruck

I am not familiar with Keras, so I did not load the '*.model' file provided by the authors. Instead I used Stacked Hourglass Networks, which I think are more advanced than the authors' model, to try to reproduce the results in Table 4. When testing on data from camera 2, the MPJPE is around 36, which is much higher than the reported 7.72. I have nearly given up working on this dataset; the authors already achieved results that I cannot surpass.

@victkid
Author

victkid commented Sep 26, 2019

Hi ShallowWill,

I tried different networks as well. The best-performing ones have an MPJPE of around 35. I drew the outputs on the images, and they look similar to the benchmark outputs. I think it's very likely that we calculate the MPJPE differently than the authors.
@enrico-c @tobidelbruck Please provide some help here.

@enrico-c
Contributor

Hi,
Thanks for your question.
It is easier to help with the issue if you provide a code example with your procedure.
As for your approach, @victkid: in our case we cropped the two rightmost pixel columns after generating the groundtruth heatmaps.
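For illustration, a minimal sketch of that cropping (dummy arrays; the point is to drop the last two pixel columns rather than interpolate, so pixel coordinates stay aligned between frames, heatmaps and labels):

import numpy as np

# Dummy arrays standing in for a DVS frame and its groundtruth heatmaps
frame = np.zeros((260, 346), dtype=np.float32)
heatmaps = np.zeros((260, 346, 13), dtype=np.float32)

# Drop the two rightmost pixel columns (346 -> 344) instead of resizing
frame_cropped = frame[:, :344]
heatmaps_cropped = heatmaps[:, :344, :]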

I would not expect a CNN trained on RGB frames to perform well on DVS frames without any fine-tuning.

@victkid
Author

victkid commented Sep 30, 2019

@enrico-c Thank you for your reply. Below is the code I use to calculate the MPJPE. The MPJPE I get is 30.84.

import numpy as np
import keras

model_name = "./data/weights/benchmark/DHP_CNN.model"
model = keras.models.load_model(model_name)

test_data = "/media/data/DHP19/train_test_data/x_test_2.npy"   # test data path for camera 2
test_label = "/media/data/DHP19/train_test_data/y_test_2.npy"  # test label path for camera 2
x_test = np.load(test_data)   # shape: (n_samples, 260, 346), n_samples = 23583
y_test = np.load(test_label)  # shape: (n_samples, n_joints, 2)
n_samples = len(x_test)
n_joints = 13

x_test = x_test[:, :, :-2]  # crop the two rightmost columns, shape: (n_samples, 260, 344)
joint_dists = np.zeros((n_samples, n_joints))  # per-joint distance for every sample
for i in range(len(x_test)):
    y_i = model.predict(x_test[i][np.newaxis, ..., np.newaxis])[0]  # heatmaps, shape: (260, 344, 13)
    y_i_reshape = np.reshape(y_i, [-1, n_joints])  # shape: (260 * 344, 13)
    pixel_index = np.unravel_index(np.argmax(y_i_reshape, axis=0), (260, 344))  # pixel location of each heatmap's maximum
    y_pred_i = np.array([pixel_index[1], pixel_index[0]]).transpose()  # (u, v) order, shape: (13, 2)
    dist = paired_distances(y_pred_i, y_test[i])  # Euclidean distance for each joint, shape: (13,)
    joint_dists[i] = dist

mean_dist = np.mean(joint_dists)
print("mean dist: ", mean_dist)

@ShallowWill

Hi @victkid, is the paired_distances() function embedded in the '.model' file? How do you define this function? Could you please share some code for it?

@victkid
Author

victkid commented Oct 3, 2019

Hi @ShallowWill, sorry for the confusion. paired_distances() is a function provided by sklearn. You can import it with
from sklearn.metrics.pairwise import paired_distances
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.paired_distances.html
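A minimal usage example (toy arrays, just to show the expected input shapes and output):

import numpy as np
from sklearn.metrics.pairwise import paired_distances

pred = np.array([[10.0, 20.0], [30.0, 40.0]])  # two predicted (u, v) joint locations
gt = np.array([[13.0, 24.0], [30.0, 40.0]])    # the matching ground-truth locations
print(paired_distances(pred, gt))              # row-wise Euclidean distances: [5. 0.]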

@enrico-c
Contributor

enrico-c commented Oct 4, 2019

Hi, I cannot run the code as it is, because I don't have your .npy arrays. But I tried your code on one of the .h5 files and I obtained the same result as with my own code.

Also, the results I obtained from the repo for all the samples are in line with the results in table 4 (*)(see bottom). So I think the issue is in the way the .npy arrays are created.

These are the things I would suggest:

  • make sure the 3D labels are flipped along the v direction (vertical) when projecting in 2D (as done in the heatmap generation example)
  • make sure to calculate MPJPE for cameras 2/3 only
  • the channel index in the generated .h5 file is not equivalent to the camera index (even if for camera 2 the index is 3, and for camera 3 the index is 2, so this should not be the problem)
  • make sure to keep the same order of u,v pixels in y_test and y_pred_i (a short sketch of the v-flip and the u,v ordering follows this list)
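For illustration, a minimal sketch of the first and last checks (the arrays are hypothetical, and whether the flip is H - v or H - 1 - v should be verified against the heatmap generation example):

import numpy as np

H, W = 260, 344  # sensor height and (cropped) width in pixels

# Hypothetical (n_joints, 2) array of 2D-projected labels stored as (u, v)
uv_labels = np.array([[100.0, 50.0], [200.0, 130.0]])
uv_labels[:, 1] = H - uv_labels[:, 1]  # flip along the vertical (v) direction

# argmax/unravel_index on a heatmap return (row, col) = (v, u), so swap to (u, v)
heatmap = np.zeros((H, W))
v_idx, u_idx = np.unravel_index(np.argmax(heatmap), (H, W))
pred_uv = np.array([u_idx, v_idx])  # keep the same (u, v) order as in the labels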

If you provide a complete example to run on one of the .h5 generated files (frames + 3D labels) I can have a closer look at your code.

(*): There is one notable difference between the code in the repo and the code used to generate the numbers in Table 4: in the background filter I had included the central pixel. This is generally not wanted; it was a mistake on my part and I am committing a fix for it.
However, the 2D MPJPE results I obtained for all test samples are about 8 pixels even with this bug in the background filter.

@WillCheung2016

WillCheung2016 commented Oct 25, 2019

Hi @enrico-c ,

I have the same problem as @victkid. I made a sample Python program to reproduce the result. I tried it on the provided Keras model file DHP_CNN.model, using S12_session5_mov4_7500events as the test recording. Please see the code below.

import numpy as np
import glob
from keras.models import load_model
import keras.backend as K

def mse2D(y_true, y_pred):
    mean_over_ch = K.mean(K.square(y_pred - y_true), axis=-1)
    mean_over_w = K.mean(mean_over_ch, axis=-1)
    mean_over_h = K.mean(mean_over_w, axis=-1)
    return mean_over_h

def compute_error(images, labels, model):
    '''
    This function computes the MPJPE error
    :param images: DHP image data stored as a numpy array of shape [number_of_frames, height, width]
    :param labels: DHP joint coordinates data stored as a numpy array of shape [number_of_frames, number_of_joints, 2]
    :param model: keras model object
    :return: MPJPE error
    '''
    num_frames = images.shape[0]
    pred_pts_mtx = np.zeros(shape=(num_frames, 13, 2))
    for i in range(num_frames):
        image = images[i, :, :-2]
        heatmaps = model.predict(image[np.newaxis, :, :, np.newaxis])
        heatmaps = heatmaps[0, :, :, :]
        pts = np.zeros(shape=(13, 2))
        for j in range(13):
            hm = heatmaps[:, :, j]
            score = hm.max()  # Find the maximum score, which indicates the predicted joint location
            pt_y, pt_x = np.where(hm == score)  # Find the corresponding coordinates
            pt_y = pt_y[0]
            pt_x = pt_x[0]
            pts[j, 0] = pt_x
            pts[j, 1] = pt_y
        pred_pts_mtx[i, :, :] = pts
    difference_mtx = pred_pts_mtx - labels.astype('float32')
    error = np.sum(np.sqrt(np.sum(np.square(difference_mtx), axis=2))) / (13. * num_frames)
    print('MPJPE error: ', error)
    return error

model = load_model('DHP_CNN.model', custom_objects={'mse2D': mse2D})
model.summary()
cam_ind = 2
chosen_image_file = 'S12_session5_mov4_7500events_img.npy'
chosen_label_file = 'S12_session5_mov4_7500events_uv.npy'
images = np.load(chosen_image_file)[:, :, :, cam_ind-1]
labels = np.load(chosen_label_file)[:, :, :, cam_ind-1]
error = compute_error(images, labels, model)
I got an MPJPE of 41.115 for this test file. Since it was tested on just one file, the error is not a reflection of the overall model performance, but it is still much larger than 8. Could you use my code to test the models you have? If I made mistakes in my code, please let me know.

Thank you very much.
Shengdong

@enrico-c
Contributor

Hi @WillCheung2016 ,
As in the previous case, it is not specified how your numpy files are generated:

chosen_image_file = 'S12_session5_mov4_7500events_img.npy'
chosen_label_file = 'S12_session5_mov4_7500events_uv.npy'

I do not have these files, so I cannot run your code.
I can help solve the issue if you provide an example using the h5 files that are generated by the code in this repo.

@WillCheung2016

Hi @enrico-c ,

Apologies. I forgot to attach the .npy files. Please find them here in the zip file.

data_files.zip

I just read the data from the .h5 files and saved them as .npy files. Could you run the model you provided on this data and compute the error with your error metric? If you end up with the same number, then I know that the way I computed the errors is consistent with yours. Thanks a lot.

@enrico-c
Contributor

Hi @WillCheung2016 ,

thanks for clarifying, I tried and I get a similar result.
The reason is that you are using cam_ind=2 and loading channel cam_ind-1. That view is one of the two side views, while the provided model was trained on the 2 front views only.
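For illustration, a minimal sketch of the channel selection, using the camera-to-channel mapping stated earlier in this thread (camera 2 is stored at channel index 3, camera 3 at channel index 2); the array layouts below are assumptions, not the repo's exact format:

# Mapping from DHP19 camera number to channel index in the generated files,
# as stated earlier in this thread (the two front views used by the provided model)
CAM_TO_CHANNEL = {2: 3, 3: 2}

def select_camera(frames, labels, cam):
    # frames: (n_frames, H, W, n_channels); labels: (n_frames, n_joints, 2, n_channels)
    ch = CAM_TO_CHANNEL[cam]
    return frames[..., ch], labels[..., ch]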

@WillCheung2016

Hi @enrico-c ,

Thank you very much for the confirmation. I repeated the experiment using the Keras model downloaded from the repo on data files generated by the Matlab scripts. The groundtruth joint locations were computed with the code in the IPython notebook for heatmap generation. This time I used the data from camera 3.
[image: predicted joint locations]

[image: groundtruth joint locations]

The top one shows the joint predictions of the model, and the one below shows the groundtruth joint locations. The MPJPE error is 16.035, which is much better than the previous number, but still relatively large. I attach the h5 files here.

h5_data.zip

Also, I have a question about the CNN model architecture. Below is the architecture of the keras model in the repo:

Layer (type)                   Output Shape            Param #
=================================================================
input_1 (InputLayer)           (None, 260, 344, 1)     0
conv1 (Conv2D)                 (None, 260, 344, 16)    144
activation_1 (Activation)      (None, 260, 344, 16)    0
pool1 (MaxPooling2D)           (None, 130, 172, 16)    0
conv2a (Conv2D)                (None, 130, 172, 32)    4608
activation_2 (Activation)      (None, 130, 172, 32)    0
conv2b (Conv2D)                (None, 130, 172, 32)    9216
activation_3 (Activation)      (None, 130, 172, 32)    0
conv2d (Conv2D)                (None, 130, 172, 32)    9216
activation_4 (Activation)      (None, 130, 172, 32)    0
pool2 (MaxPooling2D)           (None, 65, 86, 32)      0
conv3a (Conv2D)                (None, 65, 86, 64)      18432
activation_5 (Activation)      (None, 65, 86, 64)      0
conv3b (Conv2D)                (None, 65, 86, 64)      36864
activation_6 (Activation)      (None, 65, 86, 64)      0
conv3c (Conv2D)                (None, 65, 86, 64)      36864
activation_7 (Activation)      (None, 65, 86, 64)      0
conv3d (Conv2D)                (None, 65, 86, 64)      36864
activation_8 (Activation)      (None, 65, 86, 64)      0
conv3_up (Conv2DTranspose)     (None, 130, 172, 32)    18432
activation_9 (Activation)      (None, 130, 172, 32)    0
conv4a (Conv2D)                (None, 130, 172, 32)    9216
activation_10 (Activation)     (None, 130, 172, 32)    0
conv4b (Conv2D)                (None, 130, 172, 32)    9216
activation_11 (Activation)     (None, 130, 172, 32)    0
conv4c (Conv2D)                (None, 130, 172, 32)    9216
activation_12 (Activation)     (None, 130, 172, 32)    0
conv4d (Conv2D)                (None, 130, 172, 32)    9216
activation_13 (Activation)     (None, 130, 172, 32)    0
conv4_up (Conv2DTranspose)     (None, 260, 344, 16)    4608
activation_14 (Activation)     (None, 260, 344, 16)    0
conv5a (Conv2D)                (None, 260, 344, 16)    2304
activation_15 (Activation)     (None, 260, 344, 16)    0
conv5d (Conv2D)                (None, 260, 344, 16)    2304
activation_16 (Activation)     (None, 260, 344, 16)    0
pred_cube (Conv2D)             (None, 260, 344, 13)    1872
activation_17 (Activation)     (None, 260, 344, 13)    0
=================================================================

The first layer after the input layer is a convolutional layer, but according to Table 3 in the paper, the spatial resolution should be halved by a max-pooling layer as the first layer. In addition, there should be two convolutional layers between the two max-pooling layers, whereas the model in the repo has 3. Could you please let me know if I did something wrong or misunderstood something, or whether the model in the repo differs from the one described in the paper? Thank you very much.

@WillCheung2016

Hi @enrico-c ,

Hope you are doing well. I am currently writing a paper that will cite your CVPR paper, and I would like to use the DHP19 dataset you proposed. I hope to make a fair comparison between your model and the model we are proposing, so your clarification of the model architecture and the performance metrics would be much appreciated. I look forward to your reply.

@enrico-c
Contributor

enrico-c commented Nov 6, 2019

Hi @WillCheung2016 ,

The results reported in Table 4 are calculated as the average of all frames in all recordings of the 5 test subjects. It would be great if you could verify the total averages on your side. Also, please make sure to pull the latest version of the repo, for proper behavior of the background filter.
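For illustration, a minimal sketch of that frame-weighted averaging (the array names are hypothetical; every recording contributes all of its frames to one overall mean, rather than one MPJPE per recording being averaged):

import numpy as np

def mpjpe_over_recordings(recordings):
    # recordings: list of (pred, gt) pairs, each of shape (n_frames, n_joints, 2)
    per_joint = [np.linalg.norm(pred - gt, axis=-1) for pred, gt in recordings]
    all_dists = np.concatenate(per_joint, axis=0)  # stack frames from all recordings
    return all_dists.mean()  # average over all frames and joints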

As a note, session5_mov4 is "Circle right hand" with only one moving limb. Static limbs are known to be an issue in the current implementation, particularly for instantaneous predictions.

About the architecture: in Table 3, by "layer" we mean Conv + ReLU + (if present) max pooling; e.g., layer 1 is composed of Conv + ReLU + pooling. I hope this clarifies the question.
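For illustration, a minimal Keras sketch of one such "layer" (the helper name is hypothetical; kernel size 3 and the absence of a bias are inferred from the parameter counts in the summary above, e.g. 3·3·1·16 = 144 for conv1):

from keras.layers import Input, Conv2D, Activation, MaxPooling2D
from keras.models import Model

def table3_layer(x, filters, pool=True, name="layer1"):
    # One "layer" in the Table 3 sense: Conv + ReLU + (optional) max pooling
    x = Conv2D(filters, 3, padding="same", use_bias=False, name=name + "_conv")(x)
    x = Activation("relu", name=name + "_relu")(x)
    if pool:
        x = MaxPooling2D(2, name=name + "_pool")(x)
    return x

inp = Input(shape=(260, 344, 1))
out = table3_layer(inp, 16)  # (130, 172, 16), matching conv1/activation_1/pool1 above
Model(inp, out).summary()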

@ShallowWill

Hi all, has anybody successfully reproduced the results from the paper yet?

@victkid
Author

victkid commented Nov 20, 2019

Hi @enrico-c, it looks like my evaluation is implemented correctly, but I still couldn't reproduce your results. Could you provide one of your h5 files, so that I can better narrow down the problem?

@enrico-c
Contributor

Hi @victkid ,

Thanks for your comment. Could you please be more specific: what do you mean by "it looks like my evaluation is implemented correctly"?

@victkid
Author

victkid commented Nov 22, 2019

Sorry for the confusion. I meant: could you provide one of your h5 files and your evaluation code, so that I can use the same code to evaluate the model I trained? The h5 data would help me check whether my preprocessing is implemented correctly. Thank you.

@enrico-c
Contributor

enrico-c commented Dec 4, 2019

Hi @victkid ,

please excuse my late reply. The .h5 files generated from the repo are what you need. Could you generate the files with Matlab?

About the evaluation code, I just uploaded a notebook for 2D evaluation, as well as triangulation and 3D evaluation, for a single frame: Eval_2D_triangulation_and_3D_tutorial.ipynb
Hopefully this clarifies the remaining doubts about how to evaluate the predictions.

@WillCheung2016

Hi @enrico-c,

Thank you very much for your help; the IPython notebook was very useful. However, I still have one question: how did you calculate the MPJPE over all the h5 files in Table 4? Did you calculate the MPJPE for each frame of a file, sum them, and divide by the total number of frames? Or did you calculate the MPJPE for each h5 file, sum those, and divide by the number of h5 files? Or some other method?

Thank you.

@ruitaoleng

Hi @enrico-c @ShallowWill @victkid @WillCheung2016 ,
I also tried to train the network described in the paper, but failed to make it converge. I don't know whether it's because I didn't process the accumulated frames correctly or I misunderstood how the 17-layer network was trained. Could you share your main code for training the network?
Thanks

@enrico-c
Contributor

enrico-c commented Mar 2, 2020

Hi @ruitaoleng !
Thanks for your comment. Currently we do not plan to release the complete DHP19 training code; however, if you have specific questions we can help you fix your training setup.
You could open a new issue specifically for the training setup, as this one was originally about reproducing the results in Table 4.
Thank you!
