Fix low quality BWB imports (when no ISBN or unlikely dates) #10157
Comments
Isn't this what issue #9440, with the flow chart, was meant to address?
@hornc yes, both are addressing the same thing. Can you help us project-manage the effort based on the flow chart by providing a
Needs: Breakdown
I believe the list of rules looks something like:
I do see these seemingly solved:
This issue simply deals with the first two cases: if there is no ISBN, don't import; if there is an ISBN but the date is pre-1966, don't import.
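For illustration, the two cases above could be sketched roughly as follows. This is a minimal sketch, not the actual import code: it assumes a record is a plain dict with `isbn_10` / `isbn_13` lists and a free-text `publish_date` string, and the helper names `extract_year` and `should_import` are hypothetical. It also assumes a record with an ISBN but no parseable date is let through, which the thread does not settle.

```python
import re

# Cutoff year taken from the discussion above.
MIN_PUBLISH_YEAR = 1966


def extract_year(publish_date):
    """Return the first four-digit number in publish_date as an int, or None."""
    match = re.search(r"\b(\d{4})\b", publish_date or "")
    return int(match.group(1)) if match else None


def should_import(record):
    """Apply the two rules: no ISBN -> reject; ISBN but pre-1966 -> reject."""
    has_isbn = bool(record.get("isbn_10") or record.get("isbn_13"))
    if not has_isbn:
        return False
    year = extract_year(record.get("publish_date"))
    # Assumption: a missing or unparseable date does not block the import.
    if year is not None and year < MIN_PUBLISH_YEAR:
        return False
    return True
```

Keeping the check as a pure function over the record makes it easy to cover each branch with a unit test, which may help with the concern that broken rules are evading the tests.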
I was under the impression from what @seabelis said that the rules from #9440 must not be working as I thought they did, though the pre-1966 rule is a new requirement I am happy to add. It would be helpful to know which things are broken, because they must be evading the tests and it's not obvious to me which parts aren't working as intended. I think that may be more in the ballpark of @seabelis than @hornc, unless you, Charles, have also noticed ways the flowchart has been improperly implemented. I am more than happy to fix this in any way, but I am unsure what exactly is not working, aside from the prohibition against importing records that have an ISBN and a publish_date earlier than 1966.
I don't know whether the affiliate server is fetching things right now (is the service fully operational?). I'm not sure what happens to promise items if amz is down. I'm not sure we're rejecting them; in fact, I think the promise flow has several exemptions that allow it to proceed even if fields are missing.
Ah, I see. That is a mistake, then. It was meant to not import records that are incomplete (by the definition in #9440), full stop. I will try to get that fixed by Monday.
This was originally added as a way to ensure promise items were imported (even if data was incomplete) to enable scribes to find the book record. See openlibrary/openlibrary/catalog/add_book/__init__.py, lines 919 to 934 in 81681fb.
However, it seems like maybe this is the wrong call. There are a lot of competing concerns, for instance, we want to:
These requirements often compromise each other. Furthermore, a large number of promise items don't have a date, so even though I agree with @hornc, I think it's possible promise items would become somewhat worthless to our digitization process if we enforced every requirement in that list (i.e. many records only have title and ISBN). So we may want a compromise that still passes promise items through different validation, or we may need to do more to augment promise records with Amazon and Google Books. I think for now, the right path forward is to:
After this, we might consider...
Problem
Subtask of #9440
BWB promise records without ISBN seem to have unreliable data. Reject promise items that
Otherwise, so as not to block digitization, clear any other metadata fields (such as date, title, etc.) before importing.