Paddings must be non-negative: 0 -7 [[{{node Pad_2}}]] [Op:IteratorGetNext] #28

Open
simpad2409 opened this issue Jul 1, 2021 · 9 comments

Comments

@simpad2409

Hello everyone,
I tried to run the finetuning of DETR with the dataset indicated in the tutorial ("hardhat") and it seems to work fine.
Now I have replaced the initial dataset with one of my interest, namely "Crowd Human".
Suddenly, however, I get this error (sometimes it even manages to get to validation step 3 or 4):

`Load weights from weights/detr/detr.ckpt
Model: "detr_finetuning"

Layer (type)                    Output Shape           Param #     Connected to
input_2 (InputLayer)            [(None, None, None,    0
detr (Functional)               (6, None, 100, 256)    41449152    input_2[0][0]
pos_layer (Sequential)          (6, None, 100, 4)      132612      detr[0][0]
cls_layer (Dense)               (6, None, 100, 2)      514         detr[0][0]
tf_op_layer_strided_slice_6 (Te [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_5 (Te [(None, 100, 2)]       0           cls_layer[0][0]
tf_op_layer_strided_slice_8 (Te [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_7 (Te [(None, 100, 2)]       0           cls_layer[0][0]
tf_op_layer_strided_slice_10 (T [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_9 (Te [(None, 100, 2)]       0           cls_layer[0][0]
tf_op_layer_strided_slice_12 (T [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_11 (T [(None, 100, 2)]       0           cls_layer[0][0]
tf_op_layer_strided_slice_14 (T [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_13 (T [(None, 100, 2)]       0           cls_layer[0][0]
tf_op_layer_strided_slice_4 (Te [(None, 100, 4)]       0           pos_layer[0][0]
tf_op_layer_strided_slice_3 (Te [(None, 100, 2)]       0           cls_layer[0][0]

Total params: 41,582,278
Trainable params: 41,476,038
Non-trainable params: 106,240


/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/loss/compute_map.py:101: RuntimeWarning: invalid value encountered in true_divide
overlaps = intersections / union
Validation step: [0], ce: [0.78] giou : [1.13] l1 : [0.93] time : [0.00]
Traceback (most recent call last):
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 2102, in execution_mode
yield
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 758, in _next_internal
output_shapes=self._flat_output_shapes)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2610, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -7
[[{{node Pad_2}}]] [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "finetune_crowd_human.py", line 99, in
run_finetuning(config)
File "finetune_crowd_human.py", line 79, in run_finetuning
training.eval(detr, valid_dt, config, class_names, evaluation_step=100)
File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/training.py", line 72, in eval
for val_step, (images, t_bbox, t_class) in enumerate(valid_dt):
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 736, in next
return self.next()
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 772, in next
return self._next_internal()
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 764, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/usr/lib/python3.6/contextlib.py", line 99, in exit
self.gen.throw(type, value, traceback)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 2105, in execution_mode
executor_new.wait()
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -7
[[{{node Pad_2}}]]
`

What does it mean, and how can I solve the problem?

Thank you so much in advance. Greetings, Simone.

@PhanTask
Contributor

PhanTask commented Jul 1, 2021

This usually means that some samples in your dataset contain more than 100 objects (107 in this case). That is why the error reports -7: the labels are padded to a fixed length of 100, and going from 107 objects to 100 would require a negative padding.

One solution would be to increase both the number of object queries and the padding length, which are defined in the DETR class and the pad_labels function, respectively.
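
The arithmetic behind the error message can be sketched in plain Python. This is only an illustration under the assumption that labels are padded up to a fixed length equal to the number of object queries; `padding_amount` and `pad_labels_sketch` are hypothetical helpers, not the repository's actual `pad_labels` implementation.

```python
def padding_amount(num_objects, max_objects=100):
    """Number of filler rows needed to reach max_objects (hypothetical helper)."""
    return max_objects - num_objects


def pad_labels_sketch(labels, max_objects=100, fill=-1):
    """Pad a list of labels to a fixed length, mimicking the failure mode.

    Assumption: the real pipeline pads at the end only, which would explain
    the "0 -7" pair (0 rows before, -7 rows after) in the error message.
    """
    pad = padding_amount(len(labels), max_objects)
    if pad < 0:
        # A pad op cannot remove rows, so a negative amount is rejected.
        raise ValueError(f"Paddings must be non-negative: 0 {pad}")
    return labels + [fill] * pad


print(padding_amount(30))   # 70 filler rows for an image with 30 objects
print(padding_amount(107))  # -7, the value reported in the traceback
```

An image with 107 annotated objects therefore fails no matter how the rest of the batch looks, which is consistent with the error appearing only after a few validation steps.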

@simpad2409
Author

@PhanTask Thanks so much for the answer... Yes, apparently the problem seems to be just that! I tried to change the parameters you said, but I get this error:

Load weights from weights/detr/detr.ckpt
Traceback (most recent call last):
File "finetune_crowd_human.py", line 99, in
run_finetuning(config)
File "finetune_crowd_human.py", line 49, in run_finetuning
detr = build_model(config)
File "finetune_crowd_human.py", line 41, in build_model
detr = get_detr_model(config, include_top=False, nb_class=2, weights="detr", num_decoder_layers=6, num_encoder_layers=6)
File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/networks/detr.py", line 175, in get_detr_model
hs = transformer(input_proj(x), masks, query_embed(None), pos_encoding)[0]
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 982, in call
self._maybe_build(inputs)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2643, in _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/networks/custom_layers.py", line 64, in build
initializer=tf.keras.initializers.GlorotUniform(), trainable=True)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 614, in add_weight
caching_device=caching_device)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 724, in _add_variable_with_custom_getter
name=name, shape=shape)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 791, in _preload_simple_restoration
checkpoint_position=checkpoint_position, shape=shape)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 75, in init
self.wrapped_value.set_shape(shape)
File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1209, in set_shape
(self.shape, shape))
ValueError: Tensor's shape (100, 256) is not compatible with supplied shape (500, 256)

It seems we need to change something else too...

Since you have been so kind, I'd like to take this opportunity to ask another question: once fine-tuning is done this way, where can I find the "retrained" model so that I can use it?

Thanks in advance for your availability!

@PhanTask
Contributor

PhanTask commented Jul 3, 2021

@simpad2409 I see you changed the number of object queries from 100 to 500. Note that doing so changes the structure of the transformer, so you can no longer load the pretrained DETR weights (they were trained for 100 queries, hence the shape mismatch). In other words, you cannot use the provided DETR weights for fine-tuning directly.
This means that in the call detr = get_detr_model(config, include_top=False, nb_class=2, weights="detr", num_decoder_layers=6, num_encoder_layers=6), weights="detr" should be weights=None.

However, you can still preload weights for the layers that are not affected by the number of object queries, such as the ResNet backbone. You can extract that part of the weights and preload it into your backbone.

To save and load the fine-tuned model, you can use the model.save_weights() and model.load_weights() functions.

@thibo73800
Contributor

If you really want to be able to handle more than 100 objects the best move might be to extend the num_queries while keeping the weights of the first 100 pretrained queries.
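
One way to picture this suggestion is as an operation on the query embedding matrix alone. The sketch below uses NumPy and is not code from the repository: `extend_query_embeddings` is a hypothetical helper, the (100, 256) and (500, 256) shapes are taken from the traceback above, and the Glorot-uniform initialization of the new rows is an assumption based on the initializer shown there. How the resulting array is assigned back to the model depends on the repository's layer names and is left out.

```python
import numpy as np


def extend_query_embeddings(pretrained, new_num_queries, seed=0):
    """Return a (new_num_queries, dim) matrix whose first rows are `pretrained`.

    Hypothetical helper: rows beyond the pretrained ones are freshly
    initialized with a Glorot-uniform range for the new shape.
    """
    old_num_queries, dim = pretrained.shape
    if new_num_queries < old_num_queries:
        raise ValueError("new_num_queries must be >= the pretrained query count")

    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (new_num_queries + dim))  # Glorot-uniform bound
    extended = rng.uniform(-limit, limit, size=(new_num_queries, dim))
    extended = extended.astype(pretrained.dtype)

    # Keep the pretrained queries untouched in the first rows.
    extended[:old_num_queries] = pretrained
    return extended


# Stand-in for the pretrained (100, 256) query embeddings.
pretrained = np.zeros((100, 256), dtype=np.float32)
extended = extend_query_embeddings(pretrained, 500)
print(extended.shape)  # (500, 256)
```

The added queries start untrained, so some fine-tuning would still be needed before they produce useful detections.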

It might be a good idea to look at the distribution of the number of objects per image in the CrowdHuman dataset to decide what to do.
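
A minimal sketch of how such a distribution could be computed from an annotation CSV, using only the standard library. The layout is an assumption: one row per bounding box and a `filename` column identifying the image. Check the header of your own `_annotations.csv` before relying on it. `objects_per_image` and `summarize` are hypothetical helper names.

```python
import csv
from collections import Counter
from statistics import mean, median


def objects_per_image(csv_lines, filename_column="filename"):
    """Count annotation rows per image (assumes one row per bounding box)."""
    reader = csv.DictReader(csv_lines)
    return Counter(row[filename_column] for row in reader)


def summarize(counts, max_objects=100):
    """Basic statistics over the per-image object counts."""
    values = sorted(counts.values())
    if not values:
        raise ValueError("no annotations found")
    return {
        "images": len(values),
        "mean": mean(values),
        "median": median(values),
        "max": values[-1],
        "over_limit": sum(v > max_objects for v in values),
    }


# Tiny in-memory example; the column names here are assumed, not confirmed.
sample = [
    "filename,width,height,class,xmin,ymin,xmax,ymax",
    "a.jpg,640,480,person,10,10,50,80",
    "a.jpg,640,480,person,60,20,90,100",
    "b.jpg,640,480,person,5,5,40,60",
]
print(summarize(objects_per_image(sample)))
```

For a real file, pass an open file handle instead of the list, for example `open("test/_annotations.csv", newline="")`. Images without any annotation rows do not appear in the counts.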

@simpad2409
Author

@PhanTask @thibo73800 Thanks to both of you for answering me. You are really kind! Below is a graph showing the number of people for each image (and therefore gt) of the CrowdHuman dataset.
What do you think? Is it possible to increase the number of queries? If so, how?

Thanks again to both of you!

[Image: screenshot of the histogram of people per image in CrowdHuman, taken 2021-07-05 at 12:28:25]

@thibo73800
Contributor

Based on this histogram, the mean might be roughly 30, while the median should be around 20. So the question is: do you want your model to be able to predict more than 100 people (even though this is rarely the case)? Or do you want your model to always detect the 25 or 50 most obvious people?

@simpad2409
Author

@thibo73800 I would like to build a detector that can correctly detect as many people as possible in a scene, in order to estimate the distances between them and build a social-distancing system for Covid-19.
I'm not sure I explained the idea clearly...

@PhanTask
Contributor

PhanTask commented Jul 5, 2021

@simpad2409 My feeling is that the detection results with more than 100 people in one view, considering the image resolution, are not very reliable. Even COCO metrics only calculate the accuracy of the top 100 detection results. I would suggest you either skip these samples with more than 100 people (treated as outliers) or only keep the top 100 most visually salient objects in these samples.
A third option would be patching your data into smaller patches that contain fewer objects if you really want to detect more than 100 people in one sample.
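
The first two options can be sketched in a few lines of plain Python. The helper names are hypothetical, boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and box area is used here only as a rough stand-in for "visually salient", which the thread does not define.

```python
def box_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; degenerate boxes count as 0."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def drop_crowded(samples, max_objects=100):
    """Option 1: skip images that have more than max_objects boxes."""
    return {
        name: boxes
        for name, boxes in samples.items()
        if len(boxes) <= max_objects
    }


def keep_largest(boxes, max_objects=100):
    """Option 2: keep only the max_objects largest boxes of one image.

    Assumption: larger boxes are treated as the more salient ones.
    """
    return sorted(boxes, key=box_area, reverse=True)[:max_objects]


# Small demonstration with a limit of 2 objects per image.
samples = {
    "crowd.jpg": [(0, 0, 10, 10), (0, 0, 2, 2), (0, 0, 5, 5)],
    "pair.jpg": [(0, 0, 4, 4), (1, 1, 3, 3)],
}
print(list(drop_crowded(samples, max_objects=2)))       # ['pair.jpg']
print(keep_largest(samples["crowd.jpg"], max_objects=2))  # [(0, 0, 10, 10), (0, 0, 5, 5)]
```

Either step would run before the labels are padded, so every remaining sample fits within the 100 object queries.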

@simpad2409
Author

@PhanTask @thibo73800 Thanks for the reply. :)
I followed the advice and kept only the images with at most 100 people. THANK YOU!
I have another question... I started fine-tuning on my CrowdHuman dataset to improve DETR's detection of people. I have a validation set of about 3000 images; should I pass all of it? I do it like this:

valid_dt, _ = load_tfcsv_dataset(
config, 3000, augmentation=False, ann_file="test/_annotations.csv", img_dir="test")

(I noticed, however, that at each validation ALL 3000 images are seen by the network, rather than just a few at a time, with a different subset for each validation at the end of each epoch.)

Am I doing something wrong?
What would you advise for a "good" fine-tuning for detecting people? Thank you so much.
