Parse raw model output to structured JSON object #19

Open
wants to merge 1 commit into
base: main
from

Conversation

romansinkus

  • Create hard-coded keys file for the example log book
  • Use hard-coded keys to parse raw model output from Florence model to create a structured JSON object

Note:

  • The parsing accuracy is still quite low. We will likely need to switch to a more general (not OCR-specific) model so that the output can be parsed into a JSON object accurately enough to be usable on the front end.
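
For reference, the key-based approach described above could look roughly like the sketch below. It is not the code in this PR: the function name `parse_by_keys`, the assumption that the keys file maps a field name to the label text printed in the log book, and the sample keys are all hypothetical.

```python
import re


def parse_by_keys(text, keys):
    """Split raw model output into fields, using known labels as delimiters.

    `keys` is assumed (hypothetically) to map field name -> label text.
    """
    # Find the first occurrence of every label that the model transcribed.
    hits = []
    for field, label in keys.items():
        match = re.search(re.escape(label), text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), match.end(), field))
    hits.sort()

    # Fields whose label was not found stay None.
    result = {field: None for field in keys}
    for i, (_, value_start, field) in enumerate(hits):
        # A value runs from the end of its label to the start of the next label.
        value_end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        result[field] = text[value_start:value_end].strip(" :\n\t")
    return result


# Made-up sample input, for illustration only.
sample_keys = {"date": "Date", "location": "Location", "observer": "Observer"}
parsed = parse_by_keys("Date: 12 May Location: North ridge", sample_keys)
```

In a sketch like this, a label that the model garbles is never matched, so its value is folded into the preceding field. That is one way low transcription accuracy turns into missing or misplaced fields.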

Contributor

@TonyLiu0226 TonyLiu0226 left a comment

Good work so far on parsing the model output into keys! As a temporary measure for the demo, please add the full generated text to the response if possible: after parsing into keys, a lot of the transcribed text ends up missing (likely, as mentioned, due to model performance).

@@ -16,6 +18,7 @@

 @app.route("/api/transcribe", methods=["POST"])
 def transcribe():
+    print("START OF ENDPOINT")
Contributor

remove this print

@@ -0,0 +1,18 @@
{
Contributor

For now this is okay, since this model's accuracy is low and we will also need to consider other database templates. Eventually, though, we will need a more comprehensive list of keys covering all subheadings in the log template.

-    return jsonify({"transcription": generated_text})
+    keys = load_keys("keys.json")
+    json_result = parse_florence_output(generated_text, keys)
+    return json_result
Contributor

Due to low model accuracy, could you please also include the full generated_text in the API's response for demo purposes? One option would be to return a jsonify or json.dumps result that combines json_result with generated_text.
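
Something along these lines might work. This is a rough sketch, not tested against this branch: `build_response` and the `"parsed"` key are hypothetical names, and I am assuming `json_result` is either a dict or a JSON string.

```python
import json


def build_response(generated_text, json_result):
    """Bundle the parsed fields with the untouched transcription."""
    # Accept either a JSON string or an already-decoded dict (assumption).
    parsed = json.loads(json_result) if isinstance(json_result, str) else json_result
    # "transcription" matches the key the endpoint returned before this change;
    # "parsed" is a placeholder name for the structured result.
    return {"transcription": generated_text, "parsed": parsed}


# In the endpoint this would become something like:
#     return jsonify(build_response(generated_text, json_result))
payload = json.dumps(build_response("raw text", {"date": "12 May"}))
```

Keeping the existing `"transcription"` key means the front end can keep reading the raw text exactly as it does now, while the structured fields arrive alongside it.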

2 participants