Incompatible shapes of loaded weights and Model Layer #26 (named "rpn_out_class") for VGG while testing #7

Avani1994 opened this issue Jul 12, 2020 · 5 comments



Hi,
I was trying to use the trained weights from model_path, but I get the following error when loading them:

ValueError: Layer #26 (named "rpn_out_class"), weight <tf.Variable 'rpn_out_class_6/kernel:0' shape=(1, 1, 512, 1) dtype=float32> has shape (1, 1, 512, 1), but the saved weight has shape (9, 512, 1, 1).

Could you please help me debug this issue? I am stuck at this point!

My parameters are as follows:

anchor_box_scales = [64, 128, 256]  (also tried [128, 256, 512])
anchor_box_ratios = [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]
num_rois = 256
im_size = 300
num_anchors = len(anchor_box_scales) * len(anchor_box_ratios)
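For reference: in keras-frcnn the rpn_out_class layer is a 1x1 Conv2D with num_anchors filters, so its TensorFlow kernel shape is (1, 1, 512, num_anchors). A saved shape of (9, 512, 1, 1) therefore looks like a 9-anchor kernel stored in channels-first (Theano-style) ordering, while the model built at test time appears to expect only 1 anchor, which suggests the test-time config does not match the one used for training. A minimal sketch for dumping the shapes stored in the weights file so they can be compared against the freshly built model (the file name is a placeholder; use the actual model_path value):

```python
# Sketch: list every weight shape stored in a Keras HDF5 weights file.
# 'model_frcnn.hdf5' is a placeholder name. Layer/weight names may print
# as bytes depending on the h5py version.
import h5py

with h5py.File('model_frcnn.hdf5', 'r') as f:
    for layer_name in f.attrs['layer_names']:
        group = f[layer_name]
        for weight_name in group.attrs['weight_names']:
            print(layer_name, weight_name, group[weight_name].shape)
```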

Avani1994 (Author) commented Jul 12, 2020

I was able to get testing running by putting in the default parameters, but it seems like the network did not learn anything and is giving random, wrong results. I am trying to detect emojis in WhatsApp chat images. Do you think FRCNN won't work for small-object detection, or should I change something in training? I reduced the image size by half but forgot to reduce the anchor scales accordingly. Do you think that might be the cause, and should I retrain with the original image size of 600 and the default anchor scales? If not, can you please suggest something else that might be a good fit?
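As an illustration of keeping the resize and the anchors consistent, a hypothetical keras-frcnn-style config for ~24 px objects (these exact values are guesses, not tested settings):

```python
# Hypothetical config for very small (~24x24 px) objects in keras-frcnn.
# The values below are assumptions to illustrate the idea, not tested settings.
C.im_size = 600                     # keep the default shorter-side resize
C.anchor_box_scales = [16, 32, 48]  # bring the smallest anchor near 24 px
# Note: VGG16 downsamples by C.rpn_stride = 16, so a 24 px object covers
# only ~1.5 RPN feature-map cells, which is already at the RPN's limit.
```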

eleow (Owner) commented Jul 13, 2020

> I was able to get testing running by putting in the default parameters, but it seems like the network did not learn anything and is giving random, wrong results. I am trying to detect emojis in WhatsApp chat images. Do you think FRCNN won't work for small-object detection, or should I change something in training? I reduced the image size by half but forgot to reduce the anchor scales accordingly. Do you think that might be the cause, and should I retrain with the original image size of 600 and the default anchor scales? If not, can you please suggest something else that might be a good fit?

Yes, each trained model is specific to the parameters that were used in training. Hmm... with emoticons in chat images, I would think you could preprocess the images using colour or edge-detection algorithms plus some rules to form a mask. If the results are good, you might not need FRCNN at all; otherwise, the preprocessed image could be fed to FRCNN. What are your classes like? Are you trying to detect and classify the different emoticons, or just detect the presence of emoticons? How are you labelling the emoticons currently?
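A minimal sketch of that preprocessing idea, assuming WhatsApp-style bubble colours (the HSV ranges and file name are guesses to be tuned on real screenshots):

```python
# Sketch: mask the message bubbles (green outgoing, white incoming) so any
# later search only runs inside them. Colour ranges are assumptions.
# Uses the OpenCV 4.x findContours signature.
import cv2

img = cv2.imread('chat.png')                    # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

green = cv2.inRange(hsv, (35, 30, 150), (85, 255, 255))   # outgoing bubbles
white = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))     # incoming bubbles
mask = cv2.bitwise_or(green, white)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
bubbles = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
```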

Avani1994 (Author) commented Jul 13, 2020

Hey, thanks @eleow. You mean using just image processing rather than deep learning to detect the emojis? I think that would help with detection, but it would be really hard to classify the emoticons. I am trying to both detect and classify them. Currently I have around 88 classes; I am using the emojis' Unicode characters as class names:
Total classes: 88
[('bluetick', 6891), ('🤔', 767), ('😭', 748), ('😠', 733), ('😋', 720), ('😚', 717), ('😃', 712), ('😇', 701), ('👍', 415), ('😒', 411), ('\U0001f92a', 409), ('😆', 408), ('😄', 404), ('👎', 404), ('😅', 396), ('😦', 393), ('😓', 393), ('😎', 392), ('\U0001f97a', 390), ('😩', 390), ('😙', 387), ('😁', 385), ('🤗', 384), ('😕', 383), ('😔', 383), ('\U0001f91f', 382), ('😉', 381), ('😢', 379), ('😮', 378), ('😖', 378), ('😴', 377), ('😌', 377), ('😣', 376), ('😑', 376), ('🤕', 374), ('😧', 373), ('👌', 373), ('🤢', 372), ('🤓', 372), ('😂', 372), ('🤞', 370), ('😨', 370), ('✌️', 370), ('😰', 369), ('😗', 369), ('\U0001f92d', 368), ('😱', 368), ('😞', 366), ('🤥', 364), ('☺️', 364), ('😐', 363), ('😟', 361), ('🙁', 360), ('😊', 360), ('🤒', 358), ('😀', 358), ('\U0001f928', 357), ('😶', 353), ('😷', 352), ('☹', 351), ('🤤', 350), ('😲', 350), ('😫', 350), ('\U0001f929', 348), ('😥', 348), ('🤧', 346), ('😝', 346), ('👊', 346), ('😡', 345), ('\U0001f973', 344), ('😛', 344), ('😏', 344), ('\U0001f92e', 343), ('🙂', 343), ('😬', 343), ('🤣', 341), ('🙄', 341), ('\U0001f975', 340), ('\U0001f970', 340), ('\U0001f92b', 339), ('😜', 337), ('🤜', 328), ('😪', 328), ('🙃', 327), ('😍', 325), ('🤑', 323), ('😘', 323), ('😯', 320)]
I have not included all emojis in the classes, but in the future I plan to extend this list and also include Facebook/Instagram emoticons; for now I at least need a base to proceed from.

I am generating the training data myself; currently I have 1300 training images and 200 test images, and I can generate more if needed. The only special case is that the objects are all the same size, (24, 24), except the 'bluetick' class, which is (24, 15), and they are all very small. After successfully training the model on this synthetic dataset, I expect it to be accurate on real chat images.
I don't know what the best approach for this use case would be. Do you think deep learning won't give good results?

Let me attach a sample image and its annotations:

  • Image0 is one of the generated images with the bounding boxes omitted, and is used for training
  • Image1 is the same image annotated with bounding boxes, for your reference

I am not able to upload the annotations here, as the CSV format is not supported (only images are). But you get the idea: the coordinates are those of the rectangles drawn around each emoticon. For the blue tick I just use the class name 'bluetick'.

[image0: sample generated training image]
[image1: the same image annotated with bounding boxes]

Your suggestions would be really helpful, as this problem seems open-ended to me and I have not been able to narrow down which approaches to take.

eleow (Owner) commented Jul 13, 2020

Well, in my opinion, deep learning might not be the best approach. In your training set, for each class, e.g. ('😭', 748), all instances are basically identical, right? If you can guarantee that the emoticon size is constant, you might as well perform some form of pixel matching/similarity search with 24x24-pixel boxes over the image. To be more efficient, I would first get bounding boxes for the message content areas (the white and green rectangles) and search within those areas only.
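A minimal sketch of that pixel-matching search, using OpenCV template matching (the file names and the match threshold are assumptions):

```python
# Sketch: slide a 24x24 emoji template over the chat image and keep strong
# matches. File names and the 0.85 threshold are placeholders to tune.
import cv2
import numpy as np

img = cv2.imread('chat.png')
template = cv2.imread('emoji_sob_24x24.png')    # 24x24 reference emoji image

scores = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(scores >= 0.85)
hits = list(zip(xs, ys))    # top-left corners of candidate 24x24 boxes
```

Restricting img to the bubble regions found by the earlier masking step would cut the search cost further.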

Alternatively, if you have to use deep learning, classify all emoticons as a single class, and then label each detected emoticon with pixel matching, a similarity measure, etc.
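A sketch of that second stage, labelling each detected crop against a dictionary of reference emoji images (the templates dict is hypothetical):

```python
# Sketch: classify one detected crop by its best-matching reference emoji.
# 'templates' maps class name -> 24x24 BGR reference image (hypothetical).
import cv2

def classify_crop(crop, templates):
    crop = cv2.resize(crop, (24, 24))
    best_name, best_score = None, -1.0
    for name, ref in templates.items():
        # Same-size matchTemplate yields a single correlation score.
        score = cv2.matchTemplate(crop, ref, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```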

Avani1994 (Author) commented

Hmm, yeah, that makes sense, but the emoji size might not be constant in real chat images. It might be a little bigger; as you can see from chat images it won't be too big, but it won't be constant either!
