Date: 2024-06-25
Accepted
- Compliance
- Content
- CX
- Design
- Engineering
- Policy
- Product
- Process
- UX
- Migrating data from Census to GSA
- Logging and Transformation Tracking
- Data discrepancy compilation
- Completion of the ADR closes
The Census to GSA migration will require that some data be altered to improve its quality. The aim of this ADR is to establish a strategy for maintaining a complete record of any and all changes made, such that the original data can be reconstructed. In doing so, we are annotating where historical data fails to meet the validations in place with the FAC today.
-
Logging Migration Attempts (Success/Failure): Each migration run will result in some reports successfully migrating and others failing. It's important to record adequate details of both to filter out successfully migrated reports in subsequent iterations. To track this, we'll use a Django model similar to the following:
class ReportMigrationStatus(models.Model): audit_year = models.TextField(blank=True, null=True) dbkey = models.TextField(blank=True, null=True) run_datetime = models.DateTimeField(default=timezone.now) migration_status = models.TextField(blank=True, null=True)
audit_year
anddbkey
will store the AUDITYEAR and DBKEY, respectively.run_datetime
will record the migration attempt's timestamp, andmigration_status
will indicate whether the report was successfully migrated. -
MigrationErrorDetail Model: This model will link to
ReportMigrationStatus
via a foreign key relationship, enabling a direct association between a migration attempt and its corresponding error details.class MigrationErrorDetail(models.Model): report_migration_status_id = models.ForeignKey(ReportMigrationStatus, on_delete=models.CASCADE) tag = models.TextField(blank=True, null=True) exception_class = models.TextField(blank=True, null=True) detail = models.TextField(blank=True, null=True)
tag
will provide a tag for the error,exception_class
will provide the class of the error, 'detail' will contain the full description of the error. -
Logging Data Transformations (Change Logs): This logging is crucial for documenting any modifications to the data throughout the migration process. It is vital to record the state of the data both before and after transformation, along with any other pertinent details for auditing purposes. The
MigrationInspectionRecord
model is utilized to track these changes:class MigrationInspectionRecord(models.Model): audit_year = models.TextField(blank=True, null=True) dbkey = models.TextField(blank=True, null=True) report_id = models.TextField(blank=True, null=True) run_datetime = models.DateTimeField(default=timezone.now) finding_text = models.JSONField(blank=True, null=True) additional_uei = models.JSONField(blank=True, null=True) additional_ein = models.JSONField(blank=True, null=True) finding = models.JSONField(blank=True, null=True) federal_award = models.JSONField(blank=True, null=True) cap_text = models.JSONField(blank=True, null=True) note = models.JSONField(blank=True, null=True) passthrough = models.JSONField(blank=True, null=True) general = models.JSONField(blank=True, null=True) secondary_auditor = models.JSONField(blank=True, null=True)
Example
MigrationInspectionRecord
:{ "audit_year": "2022", "dbkey": "255585", "report_id": "2022-02-CENSUS-0000255585", "run_datetime": "2008-10-29 14:56:59", "general": [ { "census_data": [ { "value": "Non-profit", "column": "ENTITY_TYPE" } ], "gsa_fac_data": { "field": "entity_type", "value": "non-profit" }, "transformation_functions": [ "xform_entity_type" ] }, ... // More general field changes ], ... // More models from dissemination }
Here, every report being migrated will yield a single
MigrationInspectionRecord
, where each section field will hold the data transformation details related to that section.
We have determined it to be safe to ignore some generalized changes, such as upper/lowercasing, or using default keywords, to limit the amount of transformation data stored. Instead, these changes will be available for users to see in our documentation, while more field-specific and involved changes will be recorded in InspectionRecord
s. Transformations needed for each field is being tracked here.
The work of the migration takes place largely in census_historical_migration
in the repository. As of June 2024, that path can be found here:
https://github.com/GSA-TTS/FAC/tree/dda11bdfd31a000601e427379a3fac6ee9e7f1f8
- The
ReportMigrationStatus
model tracks migration status and makes it possible to compute the success-to-failure ratio after each iteration. Ideally, the failure rate should be 0% after all iterations for a given audit year. - The
MigrationInspectionRecord
model provides a comprehensive record of data changes for audit purposes.