Cannot reproduce MPJPE results in Table 4 #5

Open
victkid opened this issue Sep 23, 2019 · 22 comments

@victkid

victkid commented Sep 23, 2019

Hi,

I'm trying to calculate the MPJPE for camera 2 with no confidence threshold. In the paper, the MPJPE for camera 2 is 7.72, but I get a number around 33. The following is my approach; please point out where I went wrong.

Step 1: use linear interpolation to resize the test images from 260x346 to 260x344.

Question:
The image produced by the Matlab code has shape 260x346, but the input shape to the neural network is 260x344. How exactly do you remove the 2 pixel columns?

Step 2: load the network weights from DHP_CNN.model.

Step 3: generate predictions on the test image set. The outputs have shape (n_samples, 260, 344, n_joints).

Step 4: use linear interpolation to resize the prediction outputs to (n_samples, 260, 346, n_joints) to match the ground-truth data.

Step 5: calculate the Euclidean distance between prediction and ground truth for each joint and each sample, then take the average (a short sketch follows below).
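In code, step 5 is roughly the following (a minimal sketch; y_pred_coords and y_true_coords are hypothetical arrays of predicted and ground-truth 2D joint coordinates):

import numpy as np

def mpjpe(y_pred_coords, y_true_coords):
    # Per-joint Euclidean distance, averaged over joints and samples.
    # Both inputs have shape (n_samples, n_joints, 2), in pixel coordinates.
    dists = np.linalg.norm(y_pred_coords - y_true_coords, axis=-1)  # (n_samples, n_joints)
    return dists.mean()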

@ShallowWill

I have the same problem. @enrico-c @tobidelbruck

I am not familiar with Keras, so I did not load the '*.model' file provided by the authors. Instead I used Stacked Hourglass Networks, which I think are more advanced than the authors' model, to try to reproduce the results in Table 4. When testing on data from camera 2, the MPJPE is around 36, which is much higher than the reported 7.72. I have nearly given up working on this dataset; the authors already achieved results that I cannot surpass.

@victkid
Author

victkid commented Sep 26, 2019

Hi ShallowWill,

I tried different networks as well. The best-performing ones have an MPJPE of around 35. I drew the outputs on the images, and they look similar to the benchmark outputs. I think it's very likely that we calculate the MPJPE differently than the authors.
@enrico-c @tobidelbruck Please provide some help here.

@enrico-c
Contributor

Hi,
Thanks for your question.
It is easier to help with the issue if you provide a code example with your procedure.
As for your approach, @victkid: in our case we cropped the two rightmost pixel columns after generating the groundtruth heatmaps.
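For illustration, a minimal sketch of that cropping (dummy arrays; the point is to drop the last two pixel columns rather than interpolate, so pixel coordinates stay aligned between frames, heatmaps and labels):

import numpy as np

# Dummy arrays standing in for a DVS frame and its groundtruth heatmaps
frame = np.zeros((260, 346), dtype=np.float32)
heatmaps = np.zeros((260, 346, 13), dtype=np.float32)

# Drop the two rightmost pixel columns (346 -> 344) instead of resizing
frame_cropped = frame[:, :344]
heatmaps_cropped = heatmaps[:, :344, :]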

I would not expect a CNN trained on RGB frames to perform well on DVS frames without any fine-tuning.

@victkid
Author

victkid commented Sep 30, 2019

@enrico-c Thank you for your reply. Below is the code I use to calculate the MPJPE. The MPJPE I get is 30.84.

import numpy as np
import keras

model_name = "./data/weights/benchmark/DHP_CNN.model"
model = keras.models.load_model(model_name)

test_data = "/media/data/DHP19/train_test_data/x_test_2.npy"   # test data path for camera 2
test_label = "/media/data/DHP19/train_test_data/y_test_2.npy"  # test label path for camera 2
x_test = np.load(test_data)   # shape: (n_samples, 260, 346), n_samples = 23583
y_test = np.load(test_label)  # shape: (n_samples, n_joints, 2)
n_samples = len(x_test)
n_joints = 13

x_test = x_test[:, :, :-2]  # crop the two rightmost columns, shape: (n_samples, 260, 344)
joint_dists = np.zeros((n_samples, n_joints))  # per-joint distance for every sample
for i in range(len(x_test)):
    y_i = model.predict(x_test[i][np.newaxis, ..., np.newaxis])[0]  # heatmaps, shape: (260, 344, 13)
    y_i_reshape = np.reshape(y_i, [-1, n_joints])  # shape: (260 * 344, 13)
    pixel_index = np.unravel_index(np.argmax(y_i_reshape, axis=0), (260, 344))  # pixel location of each heatmap's maximum
    y_pred_i = np.array([pixel_index[1], pixel_index[0]]).transpose()  # (u, v) order, shape: (13, 2)
    dist = paired_distances(y_pred_i, y_test[i])  # Euclidean distance for each joint, shape: (13,)
    joint_dists[i] = dist

mean_dist = np.mean(joint_dists)
print("mean dist: ", mean_dist)

@ShallowWill

Hi @victkid, is the paired_distances() function embedded in the '.model' file? How do you define this function? Could you please share some code for it?

@victkid
Author

victkid commented Oct 3, 2019

Hi @ShallowWill, sorry for the confusion. paired_distances() is a function provided by sklearn. You can import it with
from sklearn.metrics.pairwise import paired_distances
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.paired_distances.html
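A minimal usage example (toy arrays, just to show the expected input shapes and output):

import numpy as np
from sklearn.metrics.pairwise import paired_distances

pred = np.array([[10.0, 20.0], [30.0, 40.0]])  # two predicted (u, v) joint locations
gt = np.array([[13.0, 24.0], [30.0, 40.0]])    # the matching ground-truth locations
print(paired_distances(pred, gt))              # row-wise Euclidean distances: [5. 0.]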

@enrico-c
Contributor

enrico-c commented Oct 4, 2019

Hi, I cannot run the code as it is, because I don't have your .npy arrays. But I tried your code on one of the .h5 files and I obtained the same result as with my own code.

Also, the results I obtained from the repo for all the samples are in line with the results in table 4 (*)(see bottom). So I think the issue is in the way the .npy arrays are created.

These are the things I would suggest:

  • make sure the 3D labels are flipped along the v direction (vertical) when projecting in 2D (as done in the heatmap generation example)
  • make sure to calculate MPJPE for cameras 2/3 only
  • the channel index in the generated .h5 file is not equivalent to the camera index (even if for camera 2 the index is 3, and for camera 3 the index is 2, so this should not be the problem)
  • make sure to keep the same order of u,v pixels in y_test and y_pred_i (a short sketch of the v-flip and the u,v ordering follows this list)
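For illustration, a minimal sketch of the first and last checks (the arrays are hypothetical, and whether the flip is H - v or H - 1 - v should be verified against the heatmap generation example):

import numpy as np

H, W = 260, 344  # sensor height and (cropped) width in pixels

# Hypothetical (n_joints, 2) array of 2D-projected labels stored as (u, v)
uv_labels = np.array([[100.0, 50.0], [200.0, 130.0]])
uv_labels[:, 1] = H - uv_labels[:, 1]  # flip along the vertical (v) direction

# argmax/unravel_index on a heatmap return (row, col) = (v, u), so swap to (u, v)
heatmap = np.zeros((H, W))
v_idx, u_idx = np.unravel_index(np.argmax(heatmap), (H, W))
pred_uv = np.array([u_idx, v_idx])  # keep the same (u, v) order as in the labels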

If you provide a complete example to run on one of the .h5 generated files (frames + 3D labels) I can have a closer look at your code.

(*): There is one notable difference between the code in the repo and the code used to generate the numbers in Table 4: in the background filter I had included the central pixel. This is generally not wanted; it was a mistake on my part and I am committing a fix for it.
However, the 2D MPJPE results I obtained for all test samples are about 8 pixels even with this bug in the background filter.

@WillCheung2016

WillCheung2016 commented Oct 25, 2019

Hi @enrico-c ,

I have the same problem as @victkid. I made a sample Python program to reproduce the result. I tried it on the provided Keras model file DHP_CNN.model, using S12_session5_mov4_7500events as the test recording. Please see the code below.

import numpy as np
import glob
from keras.models import load_model
import keras.backend as K

def mse2D(y_true, y_pred):
    mean_over_ch = K.mean(K.square(y_pred - y_true), axis=-1)
    mean_over_w = K.mean(mean_over_ch, axis=-1)
    mean_over_h = K.mean(mean_over_w, axis=-1)
    return mean_over_h

def compute_error(images, labels, model):
    '''
    This function computes the MPJPE error
    :param images: DHP image data stored as a numpy array of shape [number_of_frames, height, width]
    :param labels: DHP joint coordinates data stored as a numpy array of shape [number_of_frames, number_of_joints, 2]
    :param model: keras model object
    :return: MPJPE error
    '''
    num_frames = images.shape[0]
    pred_pts_mtx = np.zeros(shape=(num_frames, 13, 2))
    for i in range(num_frames):
        image = images[i, :, :-2]
        heatmaps = model.predict(image[np.newaxis, :, :, np.newaxis])
        heatmaps = heatmaps[0, :, :, :]
        pts = np.zeros(shape=(13, 2))
        for j in range(13):
            hm = heatmaps[:, :, j]
            score = hm.max()  # Find the maximum score, which indicates the predicted joint location
            pt_y, pt_x = np.where(hm == score)  # Find the corresponding coordinates
            pt_y = pt_y[0]
            pt_x = pt_x[0]
            pts[j, 0] = pt_x
            pts[j, 1] = pt_y
        pred_pts_mtx[i, :, :] = pts
    difference_mtx = pred_pts_mtx - labels.astype('float32')
    error = np.sum(np.sqrt(np.sum(np.square(difference_mtx), axis=2))) / (13. * num_frames)
    print('MPJPE error: ', error)
    return error

model = load_model('DHP_CNN.model', custom_objects={'mse2D': mse2D})
model.summary()
cam_ind = 2
chosen_image_file = 'S12_session5_mov4_7500events_img.npy'
chosen_label_file = 'S12_session5_mov4_7500events_uv.npy'
images = np.load(chosen_image_file)[:, :, :, cam_ind-1]
labels = np.load(chosen_label_file)[:, :, :, cam_ind-1]
error = compute_error(images, labels, model)
I got an MPJPE of 41.115 for this test file. Since it was tested on just one file, the error is not a reflection of the overall model performance, but it is still much larger than 8. Could you use my code to test the models you have? If I made mistakes in my code, please let me know.

Thank you very much.
Shengdong

@enrico-c
Contributor

Hi @WillCheung2016 ,
As in the previous case, it is not specified how your numpy files are generated:

chosen_image_file = 'S12_session5_mov4_7500events_img.npy'
chosen_label_file = 'S12_session5_mov4_7500events_uv.npy'

I do not have these files, so I cannot run your code.
I can help solve the issue if you provide an example using the h5 files that are generated by the code in this repo.

@WillCheung2016

Hi @enrico-c ,

Apologies. I forgot to attach the .npy files. Please find them here in the zip file.

data_files.zip

I just read the data from the .h5 files and saved them as .npy files. Could you run the model you provided on this data and compute the error with your error metric? If you end up with the same number, then I know that the way I computed the errors is consistent with yours. Thanks a lot.

@enrico-c
Contributor

Hi @WillCheung2016 ,

thanks for clarifying, I tried and I get a similar result.
The reason is that you are using cam_ind=2 and loading channel cam_ind-1. That view is one of the two side views, while the provided model was trained on the 2 front views only.
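For illustration, a minimal sketch of the channel selection, using the camera-to-channel mapping stated earlier in this thread (camera 2 is stored at channel index 3, camera 3 at channel index 2); the array layouts below are assumptions, not the repo's exact format:

# Mapping from DHP19 camera number to channel index in the generated files,
# as stated earlier in this thread (the two front views used by the provided model)
CAM_TO_CHANNEL = {2: 3, 3: 2}

def select_camera(frames, labels, cam):
    # frames: (n_frames, H, W, n_channels); labels: (n_frames, n_joints, 2, n_channels)
    ch = CAM_TO_CHANNEL[cam]
    return frames[..., ch], labels[..., ch]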

@WillCheung2016

Hi @enrico-c ,

Thank you very much for the confirmation. I repeated the experiment using the Keras model downloaded from the repo on data files generated by the Matlab scripts. The groundtruth joint locations were computed with the code in the IPython notebook for heatmap generation. This time I used the data from camera 3.
[image: predicted joint locations]

[image: groundtruth joint locations]

The top one shows the joint predictions of the model, and the one below shows the groundtruth joint locations. The MPJPE error is 16.035, which is much better than the previous number, but still relatively large. I attach the h5 files here.

h5_data.zip

Also, I have a question about the CNN model architecture. Below is the architecture of the keras model in the repo:

Layer (type)                   Output Shape            Param #
=================================================================
input_1 (InputLayer)           (None, 260, 344, 1)     0
conv1 (Conv2D)                 (None, 260, 344, 16)    144
activation_1 (Activation)      (None, 260, 344, 16)    0
pool1 (MaxPooling2D)           (None, 130, 172, 16)    0
conv2a (Conv2D)                (None, 130, 172, 32)    4608
activation_2 (Activation)      (None, 130, 172, 32)    0
conv2b (Conv2D)                (None, 130, 172, 32)    9216
activation_3 (Activation)      (None, 130, 172, 32)    0
conv2d (Conv2D)                (None, 130, 172, 32)    9216
activation_4 (Activation)      (None, 130, 172, 32)    0
pool2 (MaxPooling2D)           (None, 65, 86, 32)      0
conv3a (Conv2D)                (None, 65, 86, 64)      18432
activation_5 (Activation)      (None, 65, 86, 64)      0
conv3b (Conv2D)                (None, 65, 86, 64)      36864
activation_6 (Activation)      (None, 65, 86, 64)      0
conv3c (Conv2D)                (None, 65, 86, 64)      36864
activation_7 (Activation)      (None, 65, 86, 64)      0
conv3d (Conv2D)                (None, 65, 86, 64)      36864
activation_8 (Activation)      (None, 65, 86, 64)      0
conv3_up (Conv2DTranspose)     (None, 130, 172, 32)    18432
activation_9 (Activation)      (None, 130, 172, 32)    0
conv4a (Conv2D)                (None, 130, 172, 32)    9216
activation_10 (Activation)     (None, 130, 172, 32)    0
conv4b (Conv2D)                (None, 130, 172, 32)    9216
activation_11 (Activation)     (None, 130, 172, 32)    0
conv4c (Conv2D)                (None, 130, 172, 32)    9216
activation_12 (Activation)     (None, 130, 172, 32)    0
conv4d (Conv2D)                (None, 130, 172, 32)    9216
activation_13 (Activation)     (None, 130, 172, 32)    0
conv4_up (Conv2DTranspose)     (None, 260, 344, 16)    4608
activation_14 (Activation)     (None, 260, 344, 16)    0
conv5a (Conv2D)                (None, 260, 344, 16)    2304
activation_15 (Activation)     (None, 260, 344, 16)    0
conv5d (Conv2D)                (None, 260, 344, 16)    2304
activation_16 (Activation)     (None, 260, 344, 16)    0
pred_cube (Conv2D)             (None, 260, 344, 13)    1872
activation_17 (Activation)     (None, 260, 344, 13)    0
=================================================================

The first layer after the input layer is a convolutional layer, but according to Table 3 in the paper, the spatial resolution should be halved by a max-pooling layer as the first layer. In addition, there should be two convolutional layers between the two max-pooling layers, whereas the model in the repo has 3. Could you please let me know if I did something wrong or misunderstood something, or whether the model in the repo differs from the one described in the paper? Thank you very much.

@WillCheung2016

Hi @enrico-c ,

Hope you are doing well. I am currently writing a paper that will cite your CVPR paper, and I would like to use the DHP19 dataset you proposed. I hope to make a fair comparison between your model and the model we are proposing, so your clarification of the model architecture and the performance metrics would be much appreciated. I look forward to your reply.

@enrico-c
Contributor

enrico-c commented Nov 6, 2019

Hi @WillCheung2016 ,

The results reported in Table 4 are calculated as the average of all frames in all recordings of the 5 test subjects. It would be great if you could verify the total averages on your side. Also, please make sure to pull the latest version of the repo, for proper behavior of the background filter.
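For illustration, a minimal sketch of that frame-weighted averaging (the array names are hypothetical; every recording contributes all of its frames to one overall mean, rather than one MPJPE per recording being averaged):

import numpy as np

def mpjpe_over_recordings(recordings):
    # recordings: list of (pred, gt) pairs, each of shape (n_frames, n_joints, 2)
    per_joint = [np.linalg.norm(pred - gt, axis=-1) for pred, gt in recordings]
    all_dists = np.concatenate(per_joint, axis=0)  # stack frames from all recordings
    return all_dists.mean()  # average over all frames and joints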

As a note, session5_mov4 is "Circle right hand" with only one moving limb. Static limbs are known to be an issue in the current implementation, particularly for instantaneous predictions.

About the architecture: in Table 3, by "layer" we mean Conv + ReLU + (if present) max pooling; e.g., layer 1 is composed of Conv + ReLU + pooling. I hope this clarifies the question.
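For illustration, a minimal Keras sketch of one such "layer" (the helper name is hypothetical; kernel size 3 and the absence of a bias are inferred from the parameter counts in the summary above, e.g. 3·3·1·16 = 144 for conv1):

from keras.layers import Input, Conv2D, Activation, MaxPooling2D
from keras.models import Model

def table3_layer(x, filters, pool=True, name="layer1"):
    # One "layer" in the Table 3 sense: Conv + ReLU + (optional) max pooling
    x = Conv2D(filters, 3, padding="same", use_bias=False, name=name + "_conv")(x)
    x = Activation("relu", name=name + "_relu")(x)
    if pool:
        x = MaxPooling2D(2, name=name + "_pool")(x)
    return x

inp = Input(shape=(260, 344, 1))
out = table3_layer(inp, 16)  # (130, 172, 16), matching conv1/activation_1/pool1 above
Model(inp, out).summary()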

@ShallowWill

Hi all, has anybody successfully reproduced the results from the paper yet?

@victkid
Author

victkid commented Nov 20, 2019

Hi @enrico-c, it looks like my evaluation is implemented correctly, but I still couldn't reproduce your results. Could you provide one of your h5 files, so that I can better narrow down the problem?

@enrico-c
Contributor

Hi @victkid ,

Thanks for your comment. Could you please be more specific: what do you mean by "it looks like my evaluation is implemented correctly"?

@victkid
Author

victkid commented Nov 22, 2019

Sorry for the confusion. I meant: could you provide one of your h5 files and your evaluation code, so that I can use the same code to evaluate the model I trained? The h5 data would help me check whether my preprocessing is implemented correctly. Thank you.

@enrico-c
Contributor

enrico-c commented Dec 4, 2019

Hi @victkid ,

please excuse my late reply. The .h5 files generated from the repo are what you need. Could you generate the files with Matlab?

About the evaluation code, I just uploaded a notebook for 2D evaluation, as well as triangulation and 3D evaluation, for a single frame: Eval_2D_triangulation_and_3D_tutorial.ipynb
Hopefully this clarifies the remaining doubts about how to evaluate the predictions.

@WillCheung2016

Hi @enrico-c,

Thank you very much for your help; the IPython notebook was very useful. However, I still have one question: how did you calculate the MPJPE over all the h5 files in Table 4? Did you calculate the MPJPE for each frame of a file, sum them, and divide by the total number of frames? Or did you calculate the MPJPE for each h5 file, sum those, and divide by the number of h5 files? Or some other method?

Thank you.

@ruitaoleng

Hi @enrico-c @ShallowWill @victkid @WillCheung2016 ,
I also tried to train the network described in the paper, but failed to make it converge. I don't know whether it's because I didn't process the accumulated frames correctly or I misunderstood how the 17-layer network was trained. Could you share your main code for training the network?
Thanks

@enrico-c
Contributor

enrico-c commented Mar 2, 2020

Hi @ruitaoleng !
Thanks for your comment. Currently we do not plan to release the complete DHP19 training code; however, if you have specific questions we can help you fix your training setup.
You could open a new issue specifically for the training setup, as this one was originally about reproducing the results in Table 4.
Thank you!
