add OCR Decoding support - WIP #113
base: dev
        class_mapping: Dict[str, int],
        **_,
    ) -> np.ndarray:
        text_labels = None
`text_labels` can already be set to `np.zeros((len(annotations), ann.max_len))`, so there's no chance of returning `None`.
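
A minimal sketch of the suggestion above. The helper name `encode_text_labels`, the `max_len` argument, and the choice of index 0 for padding are assumptions made for illustration; the actual function and annotation fields in this PR may differ. The point is only that pre-allocating the array guarantees an `np.ndarray` return value even when there are no annotations.

```python
from typing import Dict, List

import numpy as np


def encode_text_labels(
    texts: List[str],
    class_mapping: Dict[str, int],
    max_len: int,
) -> np.ndarray:
    """Encode strings as a zero-padded integer array (hypothetical helper)."""
    # Pre-allocate so the return value is always an array, never None.
    # Index 0 is assumed to be reserved for padding / the CTC blank.
    text_labels = np.zeros((len(texts), max_len), dtype=np.int64)
    for i, text in enumerate(texts):
        # Truncate to max_len; characters missing from the mapping become 0.
        ids = [class_mapping.get(ch, 0) for ch in text[:max_len]]
        text_labels[i, : len(ids)] = ids
    return text_labels
```

With an empty list of texts this returns an array of shape `(0, max_len)` instead of `None`, so callers do not need a separate `None` check.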
    @@ -174,6 +174,7 @@ def _load_image_with_annotations(self, idx: int) -> Tuple[np.ndarray, Labels]:

            uuid = self.instances[idx]
            df = self.df.loc[uuid]
            print(df)
Forgotten debug `print`.
    def validate_text_value(
        value: str,
The type annotation of `value` seems to be incorrect.
Left some comments, otherwise looks good.
            @type is_train: bool
            """
            super(OCRAugmentation, self).__init__()
            self.transforms = A.Compose(
Is this a standard set of augmentations commonly used for OCR tasks, or how was it defined?
Also curious about this.
                    ],
                    p=0.2
                ),
                A.Compose(  # resize to image_size with aspect ratio, pad if needed
Resize and Normalize are already part of the default augmentations. Resize is always applied (you can control whether it keeps the aspect ratio), and Normalize is also appended to the list of augmentations (when used by luxonis-train; it can be deactivated through the config). So is this needed here?
            @param is_train: True if image is train. False if image is val/test.
            @type is_train: bool
            """
            super(OCRAugmentation, self).__init__()
Let's keep it as just `super().__init__()`. The arguments to `super` are a relic from Python 2.
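
To illustrate the point, a small standalone sketch with hypothetical class names (not taken from this PR): in Python 3 both spellings call the same parent initializer, so the zero-argument form is sufficient.

```python
class Base:
    def __init__(self) -> None:
        self.initialized = True


class LegacyStyle(Base):
    def __init__(self) -> None:
        # Python 2-compatible spelling: class and instance passed explicitly.
        super(LegacyStyle, self).__init__()


class ModernStyle(Base):
    def __init__(self) -> None:
        # Python 3 zero-argument form; the class and instance are inferred.
        super().__init__()
```

The explicit form also has to be updated by hand if the class is renamed, which the zero-argument form avoids.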
> Next change will be adding kpt label to the TEXT LabelType, even though the goal is only OCR recognition, on the data side it makes sense to create the annotation/LabelType from the beginning to support kpt annotations

What is meant by this? We already have `LabelType.KEYPOINTS`. We also plan to support nested annotations, so I think the final form for OCR + keypoints would be TEXT and KEYPOINTS nested within a BOUNDINGBOX.
        def set_global_metadata(self, metadata: Dict[str, Any]) -> None:
            self.global_metadata = metadata
Because of GCS datasets, I think we need a way to persist this to storage instead of keeping it only in memory. Perhaps we could use the existing `datasets.json` or `metadata` folder?
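
One possible shape for this, as a rough sketch only. The class `MetadataStore`, the file name `global_metadata.json`, and the getter are all hypothetical and not part of the existing code; the sketch writes to a local directory, and syncing that file to GCS would still have to go through the dataset's own storage layer, which is not shown here.

```python
import json
from pathlib import Path
from typing import Any, Dict


class MetadataStore:
    """Hypothetical helper that persists global metadata as JSON."""

    def __init__(self, metadata_dir: Path) -> None:
        # Assumed location: a single JSON file inside the metadata folder.
        self.path = Path(metadata_dir) / "global_metadata.json"

    def set_global_metadata(self, metadata: Dict[str, Any]) -> None:
        # Write to disk so the value outlives the current process.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(metadata))

    def get_global_metadata(self) -> Dict[str, Any]:
        # Return an empty dict when nothing has been stored yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())
```

A fresh `MetadataStore` pointed at the same directory reads back whatever an earlier instance wrote, which is the behavior the in-memory attribute cannot provide.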
This is a WIP to add CTC OCR recognition/decoding.
Conformity to the contribution guidelines will be fixed before closing.
The next change will add a kpt label to the TEXT LabelType. Even though the goal is only OCR recognition, on the data side it makes sense to create the annotation/LabelType to support kpt annotations from the beginning.
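
For context, the CTC decoding mentioned in the description can be sketched as greedy (best-path) decoding: take the most likely class at each time step, collapse consecutive repeats, and drop blanks. This is a generic illustration, not the implementation in this PR; the function name, the `idx_to_char` mapping, and the use of index 0 as the blank are assumptions.

```python
from typing import Dict, List

import numpy as np


def ctc_greedy_decode(
    logits: np.ndarray,
    idx_to_char: Dict[int, str],
    blank: int = 0,
) -> str:
    """Greedy CTC decoding for one sequence; logits has shape (T, num_classes)."""
    # Most likely class index at every time step.
    best_path = logits.argmax(axis=1)
    chars: List[str] = []
    previous = blank
    for idx in best_path:
        idx = int(idx)
        # Emit a symbol only when it is not blank and differs from the previous step.
        if idx != blank and idx != previous:
            chars.append(idx_to_char[idx])
        previous = idx
    return "".join(chars)
```

A blank between two identical symbols is what allows a doubled character to survive the collapse step, so the path `a a _ a b b _` decodes to `aab`.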