Merge pull request #25 from OHDSI/Aug_2023_update
Aug 2023 update
dimshitc authored Oct 26, 2023
2 parents 0a4dcd5 + 866cd9a commit 1f07496
Showing 11 changed files with 244 additions and 478 deletions.
8 changes: 1 addition & 7 deletions docs/PREMIER/PREMIER.md
@@ -10,17 +10,11 @@ permalink: /docs/PREMIER
# Premier Hospitalization Database (PHD) ETL Documentation
This ETL documentation is in the process of being updated. Here is the [link](https://github.com/OHDSI/ETL-CDMBuilder/blob/master/man/PREMIER/Premier_ETL_CDM_V5_3.doc) to the current document.

This document reflects the requirements, assumptions, business rules, and transformations for the implementation of the Common Data Model Version 5.0 (CDM) as implemented by Alan A. Andryc and Stephen Fortin, Observational Health Data Analytics, Janssen Epidemiology, Research and Development.
This document reflects the requirements, assumptions, business rules, and transformations for the implementation of the Common Data Model Version 5.4 (CDM) as implemented by Alan A. Andryc and Stephen Fortin, Observational Health Data Analytics, Janssen Epidemiology, Research and Development.

The purpose of this document is to describe the ETL mapping of the licensed data from Premier Hospitalization Database (PHD) into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).

Premier Healthcare Database (PHD) is a nationally representative all-payer US hospital database that houses data on the inpatient and outpatient visits from non-profit, non-governmental and community and teaching hospitals and health systems. The data represent 1 in 5 inpatient hospital stays in the US. It is a visit-centric, billing database where each visit is linked with a unique billing record. The database contains information on medications, laboratory and diagnostic procedures, and diagnoses with day of service for medications and procedures.

# Source Data Mapping

![](/PREMIER/images/source_data_mapping.png)

The VISIT_OCCURRENCE table must be generated first because procedure occurrence, device exposure, condition occurrence, and drug exposure dates are subsequently generated using the visit start date. The start and end dates of each visit are derived from the maximum number of service days recorded during the visit, together with the days-from-prior value to ensure temporality. The service days for each visit are in the PATBILL table; for each visit, the maximum value of this field is taken. The days-from-prior data is in the READMIT table, from which we can calculate how many days each visit came after a prior visit. The visit logic anchors on the last day of the month in which the most recent visit occurred. The logic for transforming these dates is explained in the sections for each respective table.

# Janssen Note:
In 2020 a separate data set was delivered that was COVID-19 specific. This data set followed the same structure as the PHD data set previously licensed but included both the GEN_LAB and LAB_RESULTS tables. Upon expiration of the COVID-19 specific license the GEN_LAB and LAB_RESULTS tables were licensed in addition to the previously licensed full PHD data set.
2 changes: 1 addition & 1 deletion docs/PREMIER/Premier_Cdm_Source.md
@@ -20,6 +20,6 @@ The CDM_SOURCE table houses metadata about the version of the CDM that is popula
| CDM_ETL_REFERENCE | | | |
| SOURCE_RELEASE_DATE | | SELECT VERSION_DATE FROM [_Version] | Get from the raw source tables. |
| CDM_RELEASE_DATE | | SELECT CONVERT(VARCHAR(10), GETDATE(),102) | Get the date the run completes. |
| CDM_VERSION | | V5.3.1 | |
| CDM_VERSION | | V5.4 | |
| VOCABULARY_VERSION | | SELECT VOCABULARY_VERSION FROM VOCABULARY WHERE VOCABULARY_ID = 'None' | Taken from the Vocabulary loaded into the CDM. |

3 changes: 1 addition & 2 deletions docs/PREMIER/Premier_Condition_Era.md
@@ -21,5 +21,4 @@ All Condition Eras are recorded in the CONDITION_ERA table based on the followin
| CONDITION_CONCEPT_ID  | CONDITION_CONCEPT_ID  | Do not build a CONDITION_ERA where CONDITION_OCCURRENCE.CONDITION_CONCEPT_ID = 0  | |
| CONDITION_ERA_START_DATE  | CONDITION_START_DATE  | | The start date for the condition era constructed from the individual instances of condition occurrences. It is the start date of the very first chronologically recorded instance of the condition.  |
| CONDITION_ERA_END_DATE  | CONDITION_START_DATE  | | The end date for the condition era constructed from the individual instances of condition occurrences. It is the end date of the final continuously recorded instance of the condition.  |
| CONDITION_TYPE_CONCEPT_ID  | | Apply a 30-day persistence window and label as CONCEPT_ID 38000247 (Condition era - 30 days persistence window).    | Falls under CONCEPT_VOCABULARY_ID = 37 - OMOP Condition Occurrence Type.  |
| CONDITION_OCCURRENCE_COUNT  | | Sum up the number of CONDITION_OCCURRENCEs for this PERSON_ID and this CONCEPT_ID during the exposure window being built.  | |
| CONDITION_OCCURRENCE_COUNT  | | Sum up the number of CONDITION_OCCURRENCEs for this PERSON_ID and this CONCEPT_ID during the exposure window being built.  | |
8 changes: 5 additions & 3 deletions docs/PREMIER/Premier_Condition_Occurrence.md
@@ -18,8 +18,8 @@ The field mapping is performed as follows:
| --- | --- | --- | --- |
| CONDITION_OCCURRENCE_ID | - | System-generated | |
| PERSON_ID | PAT.MEDREC_KEY | | |
| CONDITION_CONCEPT_ID | PATICD_DIAG.ICD_CODEPATBILL.STD_CHG_CODE | For records from PATBILL.STD_CHG_CODE:QUERY: SOURCE TO STANDARD SELECT TARGET_VOCABULARY_IDFROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('JNJ_PMR_PROC_CHRG_CD')AND TARGET_DOMAIN_ID = 'Condition'For records from PATICD_DIAG.ICD_CODE:where ICD_VERSION=9QUERY: SOURCE TO STANDARDSELECT TARGET_CONCEPT_IDFROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('ICD9CM')AND TARGET_DOMAIN_ID = 'Condition'For records from PATICD_DIAG.ICD_CODE:where ICD_VERSION=10QUERY: SOURCE TO STANDARDSELECT TARGET_CONCEPT_IDFROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('ICD10CM')AND TARGET_DOMAIN_ID = 'Condition' | ICD9 diagnosis codes are mapped to SNOMED concepts |
| CONDITION_START_DATE | PATBILL.SERV_DAY VISIT_OCCURRENCE.VISIT_START_DATEORVISIT_OCCURRENCE.VISIT_START_DATE | If condition is from PATBILL use a combination of service day and visit start date unless the service day is greater than the end of the monthIf observation comes from PATICD_DIAG.ICD_CODE then use visit start date | If condition is from PATBILL use a combination of service day and visit start date unless the service day is greater than the end of the monthIf observation comes from PATICD_DIAG.ICD_CODE then use visit start date |
| CONDITION_CONCEPT_ID | PATICD_DIAG.ICD_CODE <br /> PATBILL.STD_CHG_CODE | For records from PATBILL.STD_CHG_CODE: <br /> QUERY: SOURCE TO STANDARD SELECT TARGET_VOCABULARY_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('JNJ_PMR_PROC_CHRG_CD') AND TARGET_DOMAIN_ID = 'Condition' <br /> <br /> For records from PATICD_DIAG.ICD_CODE where ICD_VERSION=9: <br /> QUERY: SOURCE TO STANDARD SELECT TARGET_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('ICD9CM') AND TARGET_DOMAIN_ID = 'Condition' <br /> <br /> For records from PATICD_DIAG.ICD_CODE where ICD_VERSION=10: <br /> QUERY: SOURCE TO STANDARD SELECT TARGET_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('ICD10CM') AND TARGET_DOMAIN_ID = 'Condition' | ICD9 diagnosis codes are mapped to SNOMED concepts |
| CONDITION_START_DATE | PATBILL.SERV_DATE VISIT_OCCURRENCE.VISIT_START_DATE OR VISIT_OCCURRENCE.VISIT_START_DATE | If the condition comes from PATICD_DIAG.ICD_CODE or PATBILL, use the visit start date | |
| CONDITION_START_DATETIME | - | NULL | |
| CONDITION_END_DATE | - | NULL | |
| CONDITION_END_DATETIME | - | NULL | |
@@ -28,11 +28,13 @@ The field mapping is performed as follows:
| PROVIDER_ID | PAT.ADMPHY | NULL | |
| VISIT_OCCURRENCE_ID | PAT.PAT_KEY | | |
| CONDITION_SOURCE_VALUE | PATICD_DIAG.ICD_CODE | | |
| CONDITION_SOURCE_CONCEPT_ID | | QUERY: SOURCE TO SOURCESELECT SOURCE_CONCEPT_IDFROM CTE_VOCAB_MAPWHERE SOURCE_VOCABULARY_ID IN ('ICD9CM', 'ICD10', 'ICD10CM')AND TARGET_VOCABULARY_ID IN ('ICD9CM', 'ICD10', 'ICD10CM') AND DOMAIN_ID='CONDITION' | |
| CONDITION_SOURCE_CONCEPT_ID | | QUERY: SOURCE TO SOURCE SELECT SOURCE_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('ICD9CM', 'ICD10', 'ICD10CM') AND TARGET_VOCABULARY_ID IN ('ICD9CM', 'ICD10', 'ICD10CM') AND DOMAIN_ID='CONDITION' | |
| CONDITION_STATUS_SOURCE_VALUE | PATICD_DIAG.ICD_PRI_SEC <br> PATBILL | | Records coming from PATICD_DIAG will have condition_status_source_value = 'A', 'P' or 'S'.<br><br> Records coming from PATBILL will have condition_status_source_value = 'From PATBILL - No information provided' |
| CONDITION_STATUS_CONCEPT_ID | PATICD_DIAG.ICD_PRI_SEC | Records from PATICD_DIAG: <br> ICD_PRI_SEC = A, then 32890 (admission diagnosis) <br> ICD_PRI_SEC = P, then 32902 (primary diagnosis) <br> ICD_PRI_SEC = S, then 32908 (secondary diagnosis) <br><br> Records from PATBILL: <br> Assign 32908 (secondary diagnosis)| |
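
The two condition status rows above can be read as a small lookup. The following is a minimal sketch of that reading, not the ETL's actual code: the function name `condition_status` is hypothetical, and raising an error for an unrecognized ICD_PRI_SEC value is an assumption, since the table does not say how such values are handled.

```python
from typing import Optional, Tuple

# Concept IDs as listed in the CONDITION_STATUS_CONCEPT_ID row.
ICD_PRI_SEC_TO_STATUS = {
    "A": 32890,  # admission diagnosis
    "P": 32902,  # primary diagnosis
    "S": 32908,  # secondary diagnosis
}
PATBILL_STATUS_SOURCE_VALUE = "From PATBILL - No information provided"
PATBILL_STATUS_CONCEPT_ID = 32908  # PATBILL records are assigned secondary diagnosis


def condition_status(source_table: str, icd_pri_sec: Optional[str] = None) -> Tuple[str, int]:
    """Return (condition_status_source_value, condition_status_concept_id)."""
    if source_table == "PATBILL":
        return PATBILL_STATUS_SOURCE_VALUE, PATBILL_STATUS_CONCEPT_ID
    if source_table == "PATICD_DIAG":
        if icd_pri_sec not in ICD_PRI_SEC_TO_STATUS:
            # Assumption: the documentation does not cover other values.
            raise ValueError(f"Unexpected ICD_PRI_SEC value: {icd_pri_sec!r}")
        return icd_pri_sec, ICD_PRI_SEC_TO_STATUS[icd_pri_sec]
    raise ValueError(f"Unexpected source table: {source_table!r}")
```

For example, `condition_status("PATICD_DIAG", "P")` yields the primary diagnosis concept, while any PATBILL record receives the fixed source value and the secondary diagnosis concept.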

## Change log:
* 2023.10.23:
+ SERV_DAY changed to SERV_DATE
* 2021.08.11:
    + Updated CONDITION_STATUS_CONCEPT_ID to leverage ICD_PRI_SEC and the standard status concepts. This replaced previous logic leveraging ICD_POA.
+ Added comments to CONDITION_SOURCE_VALUE.
2 changes: 1 addition & 1 deletion docs/PREMIER/Premier_Device_Exposure.md
@@ -26,7 +26,7 @@ The field mapping is as follows:
| | PATICD_PROC.ICD_CODE  | | |
| | PATICD_DIAG.ICD_CODE  | SELECT TARGET_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('HCPCS', 'ICD10CM', 'JNJ_PMR_PROC_CHRG_CD') AND TARGET_DOMAIN_ID IN ('Device')  | |
| | PATCPT.CPT_CODE   | | |
| DEVICE_EXPOSURE_START_DATE  | VISIT_OCCURRENCE.VISIT_END_DATE  or  VISIT_OCCURRENCE.VISIT_START_DATE   PATBILL.SERV_DAY  | | If the device is a CPT code or HCPCS code then discharge date is used as device date because the exact date is unknown. If the row is coming from PATBILL then a combination or admit date and service date is used.  |
| DEVICE_EXPOSURE_START_DATE  | VISIT_OCCURRENCE.VISIT_END_DATE  or  VISIT_OCCURRENCE.VISIT_START_DATE   PATBILL.SERV_DATE  | | If the device is a CPT code or HCPCS code then discharge date is used as device date because the exact date is unknown. If the row is coming from PATBILL then a combination of admit date and service date is used.  |
| DEVICE_EXPOSURE_START_DATETIME  || NULL  | |
| DEVICE_EXPOSURE_END_DATE  |    | | |
| DEVICE_EXPOSURE_END_DATETIME  || NULL  | |
37 changes: 19 additions & 18 deletions docs/PREMIER/Premier_Drug_Era.md
@@ -13,27 +13,28 @@ A Drug Era is defined as a span of time when the Person is assumed to be exposed
Drug exposures with DRUG_CONCEPT_ID = 0 should not be included in drug eras. The logic below is used to map DRUG_CONCEPT_IDs to ingredients.

```sql
SELECT DISTINCT ca.ANCESTOR_CONCEPT_ID /*ingredient level*/,
ca.DESCENDANT_CONCEPT_ID /*this is where you set the DRUG_EXPOSURE.DRUG_CONCEPT_ID to*/
FROM CONCEPT c1
JOIN CONCEPT_ANCESTOR ca
ON ca.DESCENDANT_CONCEPT_ID = c1.CONCEPT_ID
JOIN CONCEPT c2
ON c2.CONCEPT_ID = ca.ANCESTOR_CONCEPT_ID
AND c2.CONCEPT_VOCABULARY_ID = 8
AND c2.CONCEPT_LEVEL = 2
WHERE c1.CONCEPT_VOCABULARY_ID = 8
SELECT DISTINCT
    a.concept_id AS concept_id,            -- descendant: the DRUG_EXPOSURE.DRUG_CONCEPT_ID
    c.concept_id AS ingredient_concept_id  -- ancestor at the ingredient level
FROM CONCEPT c
JOIN CONCEPT_ANCESTOR ca
    ON ca.ancestor_concept_id = c.concept_id
    AND LOWER(c.standard_concept) = 's'
    AND LOWER(c.concept_class_id) = 'ingredient'
    -- Qualified with c because the second CONCEPT alias has the same column.
    AND (c.invalid_reason IS NULL OR c.invalid_reason = '')
JOIN CONCEPT a
    ON ca.descendant_concept_id = a.concept_id
```

Do not include records that cannot be mapped to the ingredient level. The DRUG_EXPOSURE_END_DATE is the DRUG_EXPOSURE_START_DATE.

The field mapping is as follows:

| Destination Field  | Source Field  | Applied Rule  | Comment  |
| DRUG_ERA_ID  | | System generated  | |
| PERSON_ID  | PERSON_ID  | | |
| DRUG_CONCEPT_ID  | DRUG_CONCEPT_ID  | Do no create DRUG_ERAs where the DRUG_EXPOSURE.DRUG_CONCEPT_ID is 0. Use the map above to map DRUG_EXPOSURE.DRUG_CONCEPT_ID to the ingredient level DRUG_CONCEPT_ID used in the DRUG_ERA.  | |
| DRUG_ERA_START_DATE  | DRUG_EXPOSURE_START_DATE  | | The start date for the drug era constructed from the individual instances of drug exposures. It is the start date of the very first chronologically recorded instance of utilization of a drug.  |
| DRUG_ERA_END_DATE  | DRUG_EXPOSURE.START_DATE  | | |
| DRUG_TYPE_CONCEPT_ID  || Apply a 30-day persistence window and label as CONCEPT_ID 38000182 (Drug era - 30 days persistence window).    | Falls under CONCEPT_VOCABULARY_ID = 36 as a Drug Exposure Type.  |
| DRUG_EXPOSURE_COUNT  || Sum up the number of DRUG_EXPOSURES for this PERSON_ID and this CONCEPT_ID during the exposure window being built.  | |
|Destination Field|Source Field|Applied Rule|Comment|
|--- |--- |--- |--- |
|DRUG_ERA_ID||System generated||
|PERSON_ID|PERSON_ID|||
|DRUG_CONCEPT_ID|DRUG_CONCEPT_ID|Do not create DRUG_ERAs where the DRUG_EXPOSURE.DRUG_CONCEPT_ID is 0. Use the map above to map DRUG_EXPOSURE.DRUG_CONCEPT_ID to the ingredient-level DRUG_CONCEPT_ID used in the DRUG_ERA.||
|DRUG_ERA_START_DATE|DRUG_EXPOSURE_START_DATE||The start date for the drug era constructed from the individual instances of drug exposures. It is the start date of the very first chronologically recorded instance of utilization of a drug.|
|DRUG_ERA_END_DATE|DRUG_EXPOSURE.START_DATE|||
|DRUG_EXPOSURE_COUNT|-|Sum up the number of DRUG_EXPOSURES for this PERSON_ID and this CONCEPT_ID during the exposure window being built.||
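
As a rough illustration of how the rules in this table could combine, the sketch below collapses exposures into eras per person and ingredient. It rests on several assumptions: the 30-day persistence window comes from the earlier revision of this table and may no longer apply, the gap is measured between consecutive exposure start dates (since each exposure's end date equals its start date), and the names `DrugEra`, `build_drug_eras`, and `ingredient_map` are hypothetical. `ingredient_map` stands in for the result of the ingredient query.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

# Assumption: 30-day persistence window, as in the earlier revision of the table.
PERSISTENCE_WINDOW = timedelta(days=30)


@dataclass
class DrugEra:
    person_id: int
    drug_concept_id: int  # ingredient-level concept
    drug_era_start_date: date
    drug_era_end_date: date
    drug_exposure_count: int


def build_drug_eras(
    exposures: Iterable[Tuple[int, int, date]],
    ingredient_map: Dict[int, List[int]],
) -> List[DrugEra]:
    """Collapse (person_id, drug_concept_id, start_date) rows into drug eras."""
    grouped: Dict[Tuple[int, int], List[date]] = {}
    for person_id, drug_concept_id, start in exposures:
        if drug_concept_id == 0:
            continue  # unmapped drugs never produce an era
        # Drugs with no ingredient-level ancestor are dropped as well.
        for ingredient in ingredient_map.get(drug_concept_id, []):
            grouped.setdefault((person_id, ingredient), []).append(start)

    eras: List[DrugEra] = []
    for (person_id, ingredient), starts in sorted(grouped.items()):
        starts.sort()
        era = DrugEra(person_id, ingredient, starts[0], starts[0], 1)
        for start in starts[1:]:
            if start - era.drug_era_end_date <= PERSISTENCE_WINDOW:
                # Within the window: extend the current era.
                era.drug_era_end_date = start
                era.drug_exposure_count += 1
            else:
                eras.append(era)
                era = DrugEra(person_id, ingredient, start, start, 1)
        eras.append(era)
    return eras
```

Because exposure end dates equal start dates in this source, an era's end date is simply the start date of its last exposure.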
4 changes: 2 additions & 2 deletions docs/PREMIER/Premier_Drug_Exposure.md
@@ -10,7 +10,7 @@ layout: default

The DRUG_EXPOSURE table will house records from PATBILL and PATCPT that have been mapped to the drug or metadata domain.

Administrations of drugs are recorded in the PATBILL table as standard charges. Premier captures the day of administration in the SERV_DAY field. DRUG_EXPOSURE_START_DATE is determined by adding the number of service days to the visit start day using VISIT_OCCURRENCE .VISIT_START_DATE and PATBILL.SERV_DAY. If the start date is greater than the end of the month, then it’s truncated to the end of month. Procedure drugs reside in the PATCPT table. DRUG_EXPOSURE_START_DATE for procedures is the last day of visit or VISIT_END_DATE, since dates for the administration of procedure drugs is not recorded, the assumption is made that the procedure occurred sometime before the end of the visit. DRUG_EXPOSURE_END_DATE cannot be determined because the patient is not followed each stay and days’ supply information is not available.
Administrations of drugs are recorded in the PATBILL table as standard charges. Premier captures the date of administration in the SERV_DATE field. Procedure drugs reside in the PATCPT table. Because dates for the administration of procedure drugs are not recorded, DRUG_EXPOSURE_START_DATE for these records is the last day of the visit (VISIT_END_DATE), on the assumption that the procedure occurred sometime before the end of the visit. DRUG_EXPOSURE_END_DATE cannot be determined because patients are not followed beyond each stay and days’ supply information is not available.
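
The date rules in the paragraph above can be sketched as a single function. This is an illustrative reading rather than the ETL's code: the function name `drug_exposure_dates` and its parameters are hypothetical, and the end date follows the field mapping, which sets DRUG_EXPOSURE_END_DATE equal to the start date.

```python
from datetime import date
from typing import Optional, Tuple


def drug_exposure_dates(
    source_table: str,
    visit_end_date: date,
    serv_date: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (drug_exposure_start_date, drug_exposure_end_date)."""
    if source_table == "PATBILL":
        if serv_date is None:
            # Assumption: a PATBILL row without SERV_DATE is treated as an error.
            raise ValueError("PATBILL rows require SERV_DATE")
        start = serv_date
    elif source_table == "PATCPT":
        # No administration date is recorded, so the visit end date is used.
        start = visit_end_date
    else:
        raise ValueError(f"Unexpected source table: {source_table!r}")
    # No days' supply information, so the end date equals the start date.
    return start, start
```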

Premier does not provide NDC codes for drugs that are administered during a visit. The PRESCRIBING_PROVIDER_ID is taken from the admitting provider id of the visit. Premier supplies both admitting and attending providers; because the two fields are similar, the admitting provider id is used. It cannot be determined whether the admitting provider actually prescribed the medication, but this is the only information available. Drug type is considered inpatient administration for all drugs, except those drugs that are procedures and come from PATCPT. Both HCPCS codes and CPT codes are available in PATCPT. The quantity of drugs administered is captured from the QTY field in PATBILL.

@@ -26,7 +26,7 @@ The field mapping is performed as follows:
| PERSON_ID | PAT.MEDREC_KEY | | |
| DRUG_CONCEPT_ID | PATCPT.CPT_CODE <br> PATBILL.STD_CHG_CODE | QUERY for PATCPT: SOURCE TO STANDARD: SELECT TARGET_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('CPT4', 'HCPCS') AND TARGET_DOMAIN_ID = 'Drug' <br><br> QUERY for PATBILL: SOURCE TO STANDARD: SELECT TARGET_CONCEPT_ID FROM CTE_VOCAB_MAP WHERE SOURCE_VOCABULARY_ID IN ('JNJ_PMR_DRUG_CHRG_CD') AND TARGET_DOMAIN_ID = 'Drug' | Include all concepts that map to a concept id of zero. |
| DRUG_EXPOSURE_START_DATE | PATBILL.SERV_DAY VISIT_OCCURRENCE.VISIT_START_DATE Or VISIT_OCCURRENCE.VISIT_END_DATE | If drug is from PATBILL use a combination of service day and visit start date unless the service day is greater than the end of the monthIf drug comes from PATCPT then use visit end date | |
| DRUG_EXPOSURE_START_DATE | PATBILL.SERV_DATE VISIT_OCCURRENCE.VISIT_START_DATE Or VISIT_OCCURRENCE.VISIT_END_DATE | If drug is from PATBILL use SERV_DATE <br>If drug comes from PATCPT then use visit end date | |
| DRUG_EXPOSURE_START_DATETIME | - | NULL | |
| DRUG_EXPOSURE_END_DATE | DRUG_EXPOSURE.DRUG_EXPOSURE_START_DATE | DRUG_EXPOSURE.DRUG_EXPOSURE_START_DATE | Now a required field. No info on days supply, so set same date as drug_exposure_start_date |
| DRUG_EXPOSURE_END_DATETIME | - | NULL | |