Ingest Missing WV02_MSI_L1B granules from DB into Production from MCP #377

Open
15 of 18 tasks
jsrikish opened this issue Aug 13, 2024 · 2 comments
@jsrikish
Collaborator

jsrikish commented Aug 13, 2024

Ingest the granules in collection WV02_MSI_L1B into CBA Prod by discovering and ingesting them from the MCP account.

Note: Some of the WV02_MSI_L1B granules have already been ingested into Cumulus. Many other granules were missing because they had no entry in the DB; for those missing granules, checksums were calculated and inserted into the DB.
In the "collections/WV02_MSI_L1B___1_2022.json" file, "duplicateHandling": "skip" is set to avoid re-ingesting granules that have already been ingested.

The following steps will be repeated for each year from 2022 back to 2009, one year at a time, starting with 2022:

  • Checkout and pull main: git checkout main && git pull
  • Create new branch: git checkout -b iss377_missing_WV02_MSI_L1B
  • Set the following in the rule app/stacks/cumulus/resources/rules/WV02_MSI_L1B/v1/WV02_MSI_L1B___1_2022.json:
    • name: "WV02_MSI_L1B___1"
    • provider: "maxar"
    • meta.providerPathFormat: "'css/nga/WV02/1B/'yyyy/DDD"
    • meta.startDate: "2022-01-01T00:00:00Z"
    • meta.endDate: "2023-01-01T00:00:00Z"
  • Enter Docker with your environment (ex: DOTENV=.env.cba.prod make bash)
  • Replace the collection: cumulus collections replace --data app/stacks/cumulus/resources/collections/WV02_MSI_L1B___1.json
  • Replace the rule: cumulus rules replace --data app/stacks/cumulus/resources/rules/WV02_MSI_L1B/v1/WV02_MSI_L1B___1_2022.json
  • Enable the rule: cumulus rules enable --name WV02_MSI_L1B___1_2022
  • Run the rule: cumulus rules run --name WV02_MSI_L1B___1_2022
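
Because the same sequence is repeated for every year from 2022 back to 2009, the per-year values can be generated rather than retyped. The following is a minimal dry-run sketch that only prints the commands and date values. The function names (rule_name, rule_dates, print_year_commands) are hypothetical, and it assumes that each year's rule file follows the naming pattern shown for 2022 and that meta.endDate is always January 1 of the following year.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the per-year commands instead of executing them.
# Function names are hypothetical; paths follow the 2022 example in the steps.

COLLECTION="WV02_MSI_L1B___1"
RESOURCES="app/stacks/cumulus/resources"

# Rule name for a given year, e.g. WV02_MSI_L1B___1_2022
rule_name() {
  echo "${COLLECTION}_$1"
}

# Date window for a given year (assumption: one full calendar year per rule)
rule_dates() {
  local year="$1"
  echo "meta.startDate: \"${year}-01-01T00:00:00Z\""
  echo "meta.endDate: \"$((year + 1))-01-01T00:00:00Z\""
}

# The four CLI steps for a given year, printed rather than run
print_year_commands() {
  local year="$1" rule
  rule="$(rule_name "$year")"
  echo "cumulus collections replace --data ${RESOURCES}/collections/${COLLECTION}.json"
  echo "cumulus rules replace --data ${RESOURCES}/rules/WV02_MSI_L1B/v1/${rule}.json"
  echo "cumulus rules enable --name ${rule}"
  echo "cumulus rules run --name ${rule}"
}
```

For example, `for year in $(seq 2022 -1 2009); do print_year_commands "$year"; done` lists the commands for every year, newest first. Printing instead of executing leaves room to review each year's rule before it is run against Prod.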

Acceptance criteria

  • The MapRun of the DiscoverAndQueueGranules execution triggered by running the rule completes successfully
  • After some successful executions of IngestAndPublishGranules, thumbnails are visible in the Earthdata Search results (sort results with oldest first, as those will be the first ingested, and confirm that the URL for the thumbnail shows the hostname as data.csdap.earthdata.nasa.gov [note: csdap, not csda])
  • It is possible to download files in the file list for a granule shown in Earthdata Search (again, hostname should include csdap, not csda) -- Cognito auth should be triggered
  • After a few minutes (not more than 15 minutes?), granules and granule files can be found in Kibana Prod or this link for the correct time of the rule execution
  • All granules in WV02_MSI_L1B have been ingested into CBA Prod, with the possible exception of a small percentage of errors.
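
The hostname check in the criteria above (csdap, not csda) is easy to get wrong by eye, so it can be scripted. This is a minimal sketch; the function names url_host and check_host are hypothetical, and the URL you pass in is whatever thumbnail or file link you copy from Earthdata Search.

```shell
# Sketch: verify that a copied URL points at the expected hostname.
EXPECTED_HOST="data.csdap.earthdata.nasa.gov"

# Print the host part of a URL such as https://host/path
url_host() {
  local rest="${1#*://}"   # drop the scheme
  echo "${rest%%/*}"       # drop everything from the first "/" onward
}

# Succeed only when the URL's host is exactly the expected one
check_host() {
  [ "$(url_host "$1")" = "$EXPECTED_HOST" ]
}
```

Usage: `check_host "<copied URL>" && echo ok || echo "wrong host"`. An exact string comparison is used so that data.csda.earthdata.nasa.gov is rejected.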

To determine how many granules have been processed, first enter the Docker container:

DOTENV=.env.cba-prod make bash

In the container, run the following:

DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 -? status=completed

(Note: due to a Cumulus bug, the status is sometimes not updated properly. Try running the following commands to reconcile the numbers.)

DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0
DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 -? status=queued
DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 -? status=running
DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 -? status=completed
DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 -? status=failed

You should see output similar to the following:

...
RESPONSE: {
  statusCode: 200,
  body: '{"meta":{"name":"cumulus-api","stack":"cumulus-prod","table":"granule","limit":0,"page":1,"count":8592},"results":[]}',
  headers: {
    'x-powered-by': 'Express',
    'access-control-allow-origin': '*',
    'strict-transport-security': 'max-age=31536000; includeSubDomains',
    'content-type': 'application/json; charset=utf-8',
    'content-length': '114',
    etag: 'W/"72-O2wUXhu+Q9J1hqdDrb0fcsZeFHo"',
    date: 'Fri, 01 Dec 2023 21:29:19 GMT',
    connection: 'close'
  },
  isBase64Encoded: false
}
[]

In particular, look at the value of body and, within it, locate the value of "count" (8592 in the output above). The count should match the Earthdata Search granule count obtained in the very first step.
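
Reading "count" out of the debug output by hand is tedious when the command is run once per status. A minimal sketch of how it could be extracted follows. The helper names extract_count and counts_match are hypothetical, and the idea that the per-status counts should add up to the unfiltered total is an assumption about what "matching the numbers" means here.

```shell
# Sketch: pull the "count" value out of the DEBUG output read from stdin.
extract_count() {
  grep -o '"count":[0-9]*' | head -n 1 | cut -d: -f2
}

# Assumption: the per-status counts should sum to the unfiltered total.
# $1 = total count; remaining arguments = per-status counts
counts_match() {
  local total="$1" sum=0 n
  shift
  for n in "$@"; do
    sum=$((sum + n))
  done
  [ "$sum" -eq "$total" ]
}
```

Usage: `DEBUG=1 cumulus granules list -? collectionId=WV02_MSI_L1B___1 --limit=0 2>&1 | extract_count`. The `2>&1` is included in case the debug output is written to stderr, which is also an assumption.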

@hbparache
Collaborator

Maxar Data Conversion and Cumulus Ingest Tracking

  • Based on the discrepancy percentage, we will first focus on ingesting the years highlighted in green (over 30% discrepancy)
  • We will then run the checksum DAG for the other years and update the percentages based on the new DynamoDB entries
  • For years where the discrepancy is still "low" (less than 30%), we will wait until granules already ingested into EDC are deleted from MCP before re-ingesting them
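
The prioritization above can be sketched as a small calculation. The comment does not define how the discrepancy percentage is computed, so this sketch assumes it is the share of a year's granules that are not yet ingested. The function names and the sample numbers in the usage note are hypothetical and are not taken from the tracking sheet.

```shell
# Sketch, under the assumption that
#   discrepancy % = (total - ingested) * 100 / total   (integer division)
# $1 = total granules for the year, $2 = granules already ingested
discrepancy_pct() {
  local total="$1" ingested="$2"
  echo $(( (total - ingested) * 100 / total ))
}

# Succeeds when the year is over the 30% threshold mentioned above
needs_priority() {
  [ "$(discrepancy_pct "$1" "$2")" -gt 30 ]
}
```

Usage with made-up numbers: `discrepancy_pct 1000 600` prints 40, so `needs_priority 1000 600` succeeds and that year would be ingested first.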

@jsrikish
Collaborator Author

Year total # granules in Earthdata previous count
2022 555976 98950
2009 1007
