
Questions about training pipeline #30

Open
tholmb opened this issue Nov 6, 2020 · 2 comments

Comments


tholmb commented Nov 6, 2020

First of all, thanks for sharing this great project! I have tried to implement your mobilePydnet network, but I cannot quite reach the same results as the pre-trained model. For that reason I have several questions about the model, the loss, the data and the training itself.

  1. Did you initialize the weights and biases using some particular initialization strategy, or did you just use the default initialization of the convolution layers?

  2. Did you use any data augmentation like flipping, rotating, random cropping or blurring?

  3. You mentioned here in the issues section that the range of your input and output images is [0, 255]. Does that mean that during training, when you load the input image and ground truth as float32, you don't normalize them, for example by dividing by 255 to get the range [0, 1]?

  4. The loss is described in the paper as a weighted sum over scales, L_total = Σ_s w_s · L_s, where one coefficient is fixed to 1 and the per-scale weight w_s goes from 0.5, 0.25, 0.125 (if I understood correctly you just used 3 different scales). Here is the Python code for calculating the loss, but I'm not sure if I am missing something:
    [Screenshot from 2020-11-06 14-48-46: the loss implementation]
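
For reference, here is a minimal sketch of how such a weighted multi-scale L1 loss could look in TensorFlow 1.x. The function name, the plain L1 data term and the per-scale weights (0.5, 0.25, 0.125) are assumptions taken from the description above, not the code in the screenshot or the authors' actual loss:

import tensorflow as tf

def multi_scale_l1_loss(predictions, gt_depth, scale_weights=(0.5, 0.25, 0.125)):
    # `predictions`: list of predicted depth maps, one per scale (highest resolution first).
    # `gt_depth`: full-resolution ground-truth depth, shape [batch, H, W, 1].
    total_loss = 0.0
    for pred, weight in zip(predictions, scale_weights):
        # Resize the ground truth to this scale's resolution before taking the L1 term.
        h, w = pred.get_shape().as_list()[1:3]
        gt_resized = tf.image.resize_images(gt_depth, [h, w])
        l1_term = tf.reduce_mean(tf.abs(pred - gt_resized))
        total_loss += weight * l1_term
    return total_loss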


tholmb commented Nov 16, 2020

I found the answer to question 1 in the provided code: the convolutional kernels are initialized with Xavier initialization and the biases with a truncated normal initialization (mean 0.0 and std 1.0).

# Conv2D
weights = tf.get_variable(
    "weights",
    kernel_shape,
    # Xavier (Glorot) initialization for the convolution kernels
    initializer=tf.contrib.layers.xavier_initializer(),
    dtype=tf.float32,
)
biases = tf.get_variable(
    "biases",
    bias_shape,
    # Truncated normal initialization (defaults: mean 0.0, std 1.0)
    initializer=tf.truncated_normal_initializer(),
    dtype=tf.float32,
)
However, I have a new question about the network architecture.

  1. In the original Pydnet, get_disp extracts the depth map by means of a sigmoid operator, but in your network the sigmoids are replaced by convolutions that output 1 channel. Is that really how it works?
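
For comparison, a minimal sketch of the two output heads in TensorFlow 1.x, using tf.layers.conv2d for brevity instead of the repository's tf.get_variable-based convolution helper; the function names are placeholders, not the repository's exact code:

import tensorflow as tf

# Original Pydnet-style head: a 1-channel convolution squashed by a sigmoid,
# so the predicted map is bounded to (0, 1).
def get_disp_sigmoid(features):
    disp = tf.layers.conv2d(features, filters=1, kernel_size=3, padding="same")
    return tf.nn.sigmoid(disp)

# Variant asked about above: the raw 1-channel convolution output, left unbounded.
def get_disp_linear(features):
    return tf.layers.conv2d(features, filters=1, kernel_size=3, padding="same")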


sieme97 commented Dec 8, 2020

Hi, how did your training go?
