Some images get stuck in the NIC (staging and prod) #1235
Comments
Can you show us what the image (the note layer?) going into the NIC looks like?
Notes for the future: try running a strip (single stave) through the NIC (previously I have only reduced images down to half and switched models). Also try running several of the failing images through the interactive classifier (IC) to see what happens and whether they all share weird-glyph or no-glyph similarities.
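To make the "run a strip through the NIC" idea repeatable, here is a minimal sketch that computes crop boxes for equal-height horizontal strips of a page. It assumes equal-height strips are a good enough stand-in for staves, which may not hold for this manuscript; the function name `strip_boxes` and the page size are hypothetical. The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects, but the sketch itself does not depend on any imaging library.

```python
def strip_boxes(width, height, n_strips):
    """Return n_strips (left, upper, right, lower) boxes tiling a page top to bottom."""
    if n_strips < 1:
        raise ValueError("n_strips must be at least 1")
    # Integer boundaries chosen so every pixel row belongs to exactly one strip.
    edges = [i * height // n_strips for i in range(n_strips + 1)]
    return [(0, edges[i], width, edges[i + 1]) for i in range(n_strips)]


# Hypothetical page size; halving corresponds to n_strips=2.
halves = strip_boxes(width=3000, height=4200, n_strips=2)
# Nine strips as a rough guess at "one stave per strip" (an assumption).
staves = strip_boxes(width=3000, height=4200, n_strips=9)
```

Each box could then be cropped out and submitted on its own, so that a page which sticks can be narrowed down to the strip that triggers the problem.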
Update: I've tried both of those things!
Running a single staff: the NIC did NOT get stuck! However, I can't see the final result because
Running a problem folio through the IC: I ran folio 171 to peek. Folio 171 had plenty of glyphs, a lot of which were correctly identified as various neumes. So it wasn't all 'skip' classes! So I can't see from the IC what the problem might be, but more investigation could be useful.
Were both of these from 171? I also tested a couple of mine which failed; I did not see any kind of smoking gun at all, neither in the realm of weird glyphs nor in over-representation of any category.
Is the image above going into the NIC? Shouldn't the image basically have only notes? I see lots of text and background.
@fujinaga this is fairly expected for this manuscript and this folio; the majority of our folios separate just fine, but we had to accept a certain margin of error for middle folios, because the different parts of the manuscript are just so different from each other. Training more models and determining exactly which folios worked best with which models was becoming a very time-consuming task, so we eventually decided that it would be most efficient to keep our two models as they were. For imperfect layer separations like the one above, we taught the IC to discard fragments of text as much as possible, and the rest can be easily corrected in Neon.
Thank you for the excellent debugging!
I ran the remaining images that fail in the NIC, and that I hadn't previously tested, through the IC; they processed successfully. Nothing especially unusual jumped out in any of the categories, nor was any one category heavily over-represented, or at least nothing that doesn't also show up in the successful pages. This is likely something within the NIC, and I'll make a separate issue for it. I'm going to resume testing a 4-layer training approach, with layer four as noise (pages, cover, shadows, etc.), and see what we can do with that until background removal is back.
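The over-representation check described above was done by eye in the IC. A minimal sketch of how it could be made mechanical is below, assuming the glyph class labels for a failing page and for a set of successful pages can be exported as plain lists of strings. The function names, the thresholds, and the sample labels are all hypothetical and are not taken from the IC or NIC.

```python
from collections import Counter


def class_shares(labels):
    """Map each class name to its fraction of all glyphs on the page(s)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def over_represented(failing, passing, ratio=2.0, min_share=0.05):
    """Flag classes whose share in `failing` exceeds `ratio` times their share in `passing`.

    `ratio` and `min_share` are arbitrary illustrative thresholds.
    """
    fail_shares = class_shares(failing)
    pass_shares = class_shares(passing)
    flagged = {}
    for name, share in fail_shares.items():
        baseline = pass_shares.get(name, 0.0)
        if share >= min_share and share > ratio * baseline:
            flagged[name] = (share, baseline)
    return flagged


# Made-up labels purely for illustration, not real classifier output.
failing_page = ["punctum"] * 6 + ["skip"] * 4
passing_pages = ["punctum"] * 7 + ["skip"] * 1 + ["clivis"] * 2
flagged = over_represented(failing_page, passing_pages)
```

An empty result for every failing page would support the observation above that the class distribution is not what distinguishes the pages that stick.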
While processing groups of images through the E2E workflow, occasionally the odd image gets stuck on the NIC step and cannot proceed. I retested these images in another batch of images, as well as on their own and with different models, and repeated these steps on production as well as staging. I also halved the images just in case, and they still got stuck at the NIC. The images, while they are from the same section of MS73, are notably different from one another (my initial thought was that the failing images were those with poor parchment, bad ink, etc., but the failures do not seem to be consistent in this respect).
@JoyfulGen I believe you also have a few images which do this as well.
I have attached two of the error-producing images below (MS073_108, 110), as well as the relevant screenshot.