Some images get stuck in the NIC (staging and prod) #1235

kyrieb-ekat · 2024-12-05T15:46:54Z

While processing groups of images through the E2E workflow, occasionally the odd image gets stuck on the NIC step and cannot proceed. I retested these images in another batch of images, as well as on their own and with different models, and repeated these steps on production as well as staging. I also halved the images just in case, and still hit the NIC sticking point. The images, while they are from the same section of MS73, are notably different from one another (my initial thought was that the failing images were ones with poor parchment, bad ink, etc., but the failures do not seem to be consistent in this respect).

@JoyfulGen I believe you also have a few images which do this.

I have attached two of the error-producing images below (MS073_108, 110), as well as the relevant screenshot.

fujinaga · 2024-12-06T01:35:36Z

Can you show us what the image (note layer?) going into NIC looks like?

kyrieb-ekat · 2024-12-06T03:27:37Z

Here's 108! It looks like it's combined the text and music layer, but the notes came through better than I thought they would for being so smeary.

kyrieb-ekat · 2024-12-06T14:36:07Z

Notes for the future: try running a strip (single stave) through NIC (previously I have only reduced down to half an image and switched models); also try running several of the failing images through the interactive classifier to see what happens and whether they all share weird-glyph or no-glyph similarities.

JoyfulGen · 2024-12-09T03:33:59Z

Update: I've tried both of those things!

Running a single staff: the NIC did NOT get stuck! However, I can't see the final result because Miyao failed: all the staff lines ended up in layer 3. Oh well. I did put the same staff through the IC and saw that every single glyph had been put in a 'skip' category.

Running a problem folio through the IC: I ran folio 171 to take a peek. Folio 171 had plenty of glyphs, a lot of which were correctly identified as various neumes. So it wasn't all 'skip' classes! So... I can't see from the IC what the problem might be. But more investigation could be useful!

kyrieb-ekat · 2024-12-09T22:11:29Z

Were both of these from 171?

I also tested a couple of mine which failed; I did not see any kind of smoking gun at all, neither in the realm of weird glyphs nor in over-representation of a category, and so on.

fujinaga · 2024-12-10T00:19:12Z

Is the image above going into the NIC? Shouldn't the image basically have only notes? I see lots of text and background.

JoyfulGen · 2024-12-10T01:11:11Z

@fujinaga this is fairly expected for this manuscript and this folio; the majority of our folios separate just fine, but we had to accept a certain margin of error for middle folios, because the different parts of the manuscript are just so different from each other. Training more models and determining exactly which folios worked best with which models was becoming a very time-consuming task, so we eventually decided that it would be most efficient to keep our two models as they were. For imperfect layer separations like the one above, we taught the IC to discard fragments of text as much as possible, and the rest can be easily corrected in Neon.

kyrieb-ekat · 2024-12-12T19:49:54Z

I have attempted a variety of runs, starting with smaller strips (without edges) which processed successfully on staging and prod, scaling all the way up to a full page (which failed as expected). I stepped down to just cropping out the edges (no pages from the following parts of the manuscript, no binding, no cover), and the images ran successfully. I then used the same image but with only one portion of edge available and it failed.

I attempted a workflow wherein I included the resize job, and it also failed using the entire/intact image; it succeeded with the no-edges version.

The produced layers all look very, very similar to their successful counterparts. I very much think that the presence of the edges/book elements is causing some kind of issue, either in the category size (a "skip" category becoming very, very large) or in the connected component step getting stumped by these very large items.
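One way to check the "very large items" hypothesis outside Rodan would be to measure connected component sizes directly on a binarized layer. The following is only a minimal sketch: it assumes the layer has already been loaded as a list of rows of 0/1 values, it uses 4-connectivity (which connectivity the NIC and IC actually use is not stated in this thread), and `component_sizes`, `oversized`, and the `factor` threshold are hypothetical names and values chosen for illustration.

```python
from collections import deque

def component_sizes(grid):
    """Return pixel counts of 4-connected foreground components, largest first."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)

def oversized(sizes, factor=10):
    """Sizes more than `factor` times the median size (arbitrary threshold)."""
    if not sizes:
        return []
    median = sorted(sizes)[len(sizes) // 2]
    return [s for s in sizes if s > factor * median]

# Toy layer: an L-shaped "page edge" along the top and left, plus four small glyphs.
page = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
sizes = component_sizes(page)
print(sizes, oversized(sizes))
```

Comparing the size list for a "whole" layer against its "cropped" counterpart would show whether the edges really do produce a handful of components far larger than any glyph.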

At some point it might be worth thinking about some kind of page layout detector, which will automatically crop any elements of a folio image outside of the ruling, or something.
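A much cruder stand-in for such a detector, shown here only as a sketch, is to delete every foreground component that touches the image frame, on the assumption that neighbouring pages, binding, and cover run off the edge of the photo while notation does not. This is not a layout detector and not the pre-Rodan algorithm; `drop_border_components` is a hypothetical helper, and the input is assumed to be a binarized layer given as rows of 0/1 values.

```python
from collections import deque

def drop_border_components(grid):
    """Return a copy of grid with every 4-connected component touching the frame erased."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    cleaned = [row[:] for row in grid]
    queue = deque()
    # Seed the flood fill with every foreground pixel on the outer frame.
    for r in range(rows):
        for c in range(cols):
            on_frame = r in (0, rows - 1) or c in (0, cols - 1)
            if on_frame and cleaned[r][c]:
                cleaned[r][c] = 0
                queue.append((r, c))
    # Erase everything connected to those seeds.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and cleaned[ny][nx]:
                cleaned[ny][nx] = 0
                queue.append((ny, nx))
    return cleaned

# Toy layer: an L-shaped edge along the top and left, plus four small glyphs.
layer = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
cleaned_layer = drop_border_components(layer)
for row in cleaned_layer:
    print(row)
```

The obvious limitation is that anything legitimately reaching the frame, such as a staff line that runs off the photo, would be erased too, so this is better suited to testing the edge hypothesis than to production use.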

In sum: I'm feeling comfortable in saying this is something related to the edges in the images, perhaps in 1) the memory taken up by a category (skip) or 2) connected component size/processing.

Below are three samples of the "whole", "cropped", and "single edge" layers I fed through.

fujinaga · 2024-12-12T23:13:50Z

Thank you for the excellent debugging!
To determine whether the cause is "2) connected component size/processing" or not, can you try the images with the border that fail on NIC on the IC as well? Both NIC and IC should be using the same connected component (CC) analysis algorithm. If an image fails on both NIC and IC, then it's very likely the CC analysis. If it only fails on NIC, then there's something else wrong with NIC and we should create a separate issue.
At one time, pre-Rodan, we did develop a border-removal algorithm, but I'd prefer that we let Paco handle that. Consider creating the "noise layer" in the iterative document analysis step.
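The NIC/IC comparison above amounts to a small decision table, sketched here for whoever runs the test. The function name `triage` and the wording of the returned strings are made up for illustration; only the first two outcomes come from this comment, and the last two are placeholders for cases the comment does not cover.

```python
def triage(fails_on_nic: bool, fails_on_ic: bool) -> str:
    """Map the outcome of running the same bordered image through NIC and IC to a next step."""
    if fails_on_nic and fails_on_ic:
        # Both jobs are said to share the same CC analysis algorithm.
        return "likely connected component (CC) analysis"
    if fails_on_nic:
        return "something else wrong in NIC; create a separate issue"
    if fails_on_ic:
        # Not discussed in this thread.
        return "IC-only failure; not covered here"
    return "failure not reproduced"

print(triage(fails_on_nic=True, fails_on_ic=False))
```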

kyrieb-ekat · 2024-12-13T17:12:30Z

I ran the remaining NIC-failing images that I hadn't previously tested in the IC through the IC; they processed successfully. Nothing especially unusual jumped out in any of the categories, nor was any one category heavily over-represented, or at least nothing which doesn't also show up in the successful pages. This is likely something within the NIC, and I'll make a separate issue for it.

I'm going to resume testing a 4-layer training approach, with layer four as noise (pages, cover, shadows, etc) and see what we can do with that until background removal is back.

kyrieb-ekat added the Priority: LOW label Dec 5, 2024

JoyfulGen added priority: Medium Priority: LOW and removed Priority: LOW priority: Medium labels Dec 5, 2024

kyrieb-ekat mentioned this issue Dec 13, 2024

The NIC has a bug #1238

Open
