
Update scripts for metrics computation #443

Merged: 9 commits merged into main from metrics-script-updates on Dec 5, 2024


jonchang (Collaborator) commented Nov 27, 2024

Description

Various updates needed to compute benchmarking and metrics comparisons between the tesseract and tr-ocr backends. In particular:

  • Enable comparing tesseract against tr-ocr in the timing-based benchmarks
  • Make the accuracy-based benchmarking actually run by fixing relative imports, since the /ocr folder is not installed as a package
  • Fix the types of segments returned from the segmentation functions (they need to be uint8, not int64)
  • Use case-insensitive distance comparisons for metrics (tesseract can output both uppercase and lowercase letters, unlike tr-ocr)
  • Add tesseract options to the CLI for accuracy-based benchmarking and update the documentation accordingly
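The case-insensitive comparison from the metrics bullet can be illustrated with a minimal sketch. It assumes a plain character-level Levenshtein distance; the names `levenshtein`, `case_insensitive_distance`, and `character_error_rate` are hypothetical and not taken from the project's scripts. Both strings are case-folded before the distance is computed, so a tesseract transcription is not penalized purely for capitalization differences against the reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    # Keep only the previous row of the DP table, so memory is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def case_insensitive_distance(prediction: str, ground_truth: str) -> int:
    """Edit distance after case folding, so 'Hello' vs 'HELLO' scores 0."""
    return levenshtein(prediction.casefold(), ground_truth.casefold())


def character_error_rate(prediction: str, ground_truth: str) -> float:
    """Case-insensitive CER: edit distance divided by the ground-truth length."""
    if not ground_truth:
        # Convention chosen for this sketch: an empty reference is matched
        # perfectly only by an empty prediction.
        return 0.0 if not prediction else 1.0
    return case_insensitive_distance(prediction, ground_truth) / len(ground_truth)
```

For example, `levenshtein("Hello", "HELLO")` is 4, while `case_insensitive_distance("Hello", "HELLO")` is 0.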

Rendered link for documentation updates
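The segment dtype fix mentioned in the description can be sketched as a small helper, assuming segments are 2-D NumPy arrays whose values are already on the 0 to 255 scale; image and OCR libraries typically expect 8-bit grayscale input. The name `to_uint8_segment` is hypothetical. Clipping before the cast matters because NumPy's `astype` wraps out-of-range integers instead of saturating them.

```python
import numpy as np


def to_uint8_segment(segment: np.ndarray) -> np.ndarray:
    """Return an 8-bit grayscale version of a segment image.

    Assumes pixel values are on the 0-255 scale (e.g. an int64 array that
    resulted from arithmetic on an image). Out-of-range values are clipped
    rather than allowed to wrap around.
    """
    if segment.dtype == np.uint8:
        # Already the right type: return it unchanged, without copying.
        return segment
    # Clip first: a bare astype(np.uint8) on int64 would turn 256 into 0
    # and -1 into 255.
    return np.clip(segment, 0, 255).astype(np.uint8)
```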

Related Issues

Part of #422

Checklist

  • The title of this PR is descriptive and concise.
  • My changes follow the style guidelines of this project.
  • I have added or updated test cases to cover my changes.
  • I've let the team know about this PR by linking it in the review channel

jonchang force-pushed the metrics-script-updates branch from 270507c to 590802e on December 3, 2024 20:42
* add a nicer interface for turning on / off various run settings
* enable tesseract option for benchmarking
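The commits above mention an interface for turning run settings on and off and a tesseract option for benchmarking. One way to express that with the standard library's `argparse` is the sketch below; every flag name (`--model`, `--timing`, `--accuracy`, `--tesseract-config`) is a hypothetical placeholder, not the project's actual CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names here are hypothetical placeholders for illustration only.
    parser = argparse.ArgumentParser(description="Run OCR benchmarks (illustrative sketch).")
    parser.add_argument("--model", choices=["tr-ocr", "tesseract"], default="tr-ocr",
                        help="OCR backend to benchmark")
    # BooleanOptionalAction (Python 3.9+) generates paired --x / --no-x switches.
    parser.add_argument("--timing", action=argparse.BooleanOptionalAction, default=True,
                        help="run the timing-based benchmark")
    parser.add_argument("--accuracy", action=argparse.BooleanOptionalAction, default=True,
                        help="run the accuracy-based benchmark")
    parser.add_argument("--tesseract-config", default="",
                        help="extra configuration string for the tesseract backend")
    return parser
```

For example, `build_parser().parse_args(["--model", "tesseract", "--no-timing"])` selects the tesseract backend and skips the timing run while leaving the accuracy run enabled.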
jonchang force-pushed the metrics-script-updates branch from 097fd80 to 6549a7d on December 4, 2024 17:57
jonchang marked this pull request as ready for review on December 4, 2024 23:33
jonchang added this pull request to the merge queue on Dec 5, 2024
Merged via the queue into main with commit e49c696 Dec 5, 2024
2 checks passed
jonchang deleted the metrics-script-updates branch on December 5, 2024 17:08

jonchang (Collaborator, Author) commented Dec 5, 2024

Thanks @knguyenrise8!
