PUBG Dataset ETL with AWS

📖 Overview

This project practices the ETL process with the PlayerUnknown's Battlegrounds (PUBG) dataset. A dataset below 10,000 rows can often be handled easily on a single server; however, when a dataset grows past 1,000,000 rows, or even into the billions, distributed computation becomes necessary. This project uses three PUBG-related datasets from Kaggle (two of them with 13,000,000 rows). The ETL process loads the datasets into an AWS S3 bucket, creates an AWS EMR cluster, loads and transforms the data on the EMR cluster with pySpark, and writes the transformed data back to AWS S3 in CSV format. Finally, it loads the data from S3 into the final Fact and Dimension tables in AWS Redshift. The whole series of steps is orchestrated by Airflow. The structure of the Fact/Dimension tables is designed around the analytical queries they are expected to serve.

⚡︎ Data Source

🚩 Tech

  • Python
  • AirFlow
  • Docker
  • AWS S3
  • AWS EMR
  • AWS Redshift

→ How to Run

  1. You need your AWS CLI configuration ready (AWS credentials + EMR credentials).
  2. You need 🐳 Docker & docker-compose.
  3. Run the following command in the terminal where you cloned the repository:
    docker-compose -f docker-compose-LocalExecutor.yml up -d
  4. Add your "redshift" connection info in the Airflow Web UI (localhost:8080/admin -> Admin -> Connections).
  5. Assign your S3 bucket name to the "BUCKET_NAME" variable in "/dags/spark_submit_airflow.py".
  6. Assign your S3 bucket name to the "BUCKET_NAME" variable in "/dags/scripts/spark/spark-scipt.py".
  7. Create the S3 bucket with the name you specified for "BUCKET_NAME".
  8. Run the DAG named "spark_submit_airflow".
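
For steps 5–7, the bucket name is just a module-level variable; a minimal sketch (the bucket name below is a placeholder, use your own):

# In /dags/spark_submit_airflow.py and /dags/scripts/spark/spark-scipt.py
BUCKET_NAME = "my-pubg-etl-bucket"  # placeholder: must match the S3 bucket you create in step 7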

📘 General Description

The diagram above shows the overall ETL process used in this project. All workflows are controlled by Airflow. The raw dataset is stored in an AWS S3 bucket, and all data wrangling is handled by an AWS EMR cluster (mostly Spark-related work). The final Fact and Dimension tables are then created in AWS Redshift, which supports fast query speed and computation thanks to its columnar storage.

🗒 DAG and Tasks

(DAG graph image)

  • start_data_pipeline: DummyOperator marking the start of the DAG run
  • <script_to_s3, data_to_s3>: Load raw data and spark script to S3
  • create_emr_cluster: Create AWS EMR cluster for spark job
  • add_steps: Submit the list of steps the EMR cluster needs to run
  • watch_step: Check if the EMR cluster and the steps are successfully done
  • terminate_emr_cluster: Terminate the created EMR cluster after the job finishes
  • create_tables: Create Fact/Dimension tables in AWS Redshift
  • load_XXX_table: Load the output csv files from the EMR cluster in S3 into Redshift
  • check_data_quality: Check if the data is successfully stored in Redshift table
  • end_data_pipeline: DummyOperator to indicate the successful end of the DAG
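
A minimal sketch of how these tasks might be wired together with Airflow 1.10-style EMR operators (import paths, connection IDs, and the JOB_FLOW_OVERRIDES/SPARK_STEPS placeholders are illustrative assumptions; the actual DAG lives in /dags/spark_submit_airflow.py):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.contrib.sensors.emr_step_sensor import EmrStepSensor
from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator

JOB_FLOW_OVERRIDES = {"Name": "pubg-etl", "ReleaseLabel": "emr-5.34.0"}  # placeholder cluster config
SPARK_STEPS = []  # spark-submit step definitions; see the sketch in the AWS EMR section below

with DAG("spark_submit_airflow", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    start_data_pipeline = DummyOperator(task_id="start_data_pipeline")
    create_emr_cluster = EmrCreateJobFlowOperator(
        task_id="create_emr_cluster",
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id="aws_default",
        emr_conn_id="emr_default",
    )
    add_steps = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
        steps=SPARK_STEPS,
    )
    watch_step = EmrStepSensor(
        task_id="watch_step",
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
        step_id="{{ task_instance.xcom_pull(task_ids='add_steps', key='return_value')[-1] }}",
    )
    terminate_emr_cluster = EmrTerminateJobFlowOperator(
        task_id="terminate_emr_cluster",
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
    )

    # The Redshift tasks (create_tables, load_XXX_table, check_data_quality, end_data_pipeline) follow downstream.
    start_data_pipeline >> create_emr_cluster >> add_steps >> watch_step >> terminate_emr_cluster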

📊 Fact/Dimension Tables

The kill_log table acts as the FACT table. Each record represents one kill event during a match, and the details of the kill and the relevant players' info are stored in the DIMENSION tables.

  • Detailed information about the match itself (map, game_size, etc.) can be found by JOINING the fact table with the "match" table on the "match_id" key.
  • "killer_id" and "victim_id" are unique identifiers for a player in a specific match. They can be used as JOIN keys against the "player_id" column of the "player" table.
  • Detailed "timestamp" information can be retrieved by JOINING the "kill_log" table with the "time" table on the "timestamp" key.
  • Specific information about the weapon used in the kill can be found in the "weapon" dimension table, retrieved by JOINING the fact table with the "weapon" table.

🙋‍♂️ Query Example

The Most Used Weapon by Map Query

SELECT m.map AS Map,
       kl.weapon AS Weapon,
       COUNT(*) AS Num
FROM pubg.kill_log AS kl
LEFT JOIN pubg.match AS m ON kl.match_id = m.match_id
LEFT JOIN pubg.time AS t ON kl.timestamp = t.timestamp
LEFT JOIN pubg.weapon AS w ON kl.weapon = w.weapon
WHERE m.map in ('ERANGEL', 'MIRAMAR')
GROUP BY m.map, kl.weapon
ORDER BY m.map, Num DESC

By JOINING the Fact & Dimension tables, one can find the most used weapon per map.

✔︎ Reasons for the Tech Stacks

  • Often, when data engineering work is needed, a seamless workflow from Extract to Transform to Load is necessary. These three steps can be treated as a single data engineering job, and Airflow is one of the best tools to orchestrate them.
  • Airflow was chosen for orchestration because I was accustomed to working with Python, and Airflow is one of the most popular open-source pipeline frameworks, recognized by many developers on GitHub. This huge community enables quick troubleshooting.
  • Since AWS services share the same data centers, moving data between AWS services guarantees high speed and stability. Thus, AWS S3 was chosen for storage.
  • For data wrangling, Spark was used instead of Hadoop because Spark is faster, keeping intermediate data in memory instead of writing it to HDFS. For the Spark job itself, AWS EMR was used because it can be created and shut down easily from Airflow, supports Spark, and makes data transfer to and from AWS S3 easy.
  • Lastly, AWS Redshift was used to store the final Fact/Dimension tables because it loads data from AWS S3 very quickly via the COPY command. Although AWS Redshift is columnar storage, it is also PostgreSQL-compatible, so it offers both easy access and fast query speed.

Why AWS?

  • First, I wanted to build the data pipeline entirely with cloud services. There were options such as AWS, Google Cloud, and Azure, but, as with Airflow, AWS looked like it offered the most help in terms of market share and community size, so AWS was chosen.

Airflow

  • Among workflow management platforms such as Airflow, Oozie, and Luigi, Airflow was chosen because it uses Python, its UI is more intuitive, and it visualizes the dependencies between tasks as a tree or DAG so they are easy to follow. Its community is also the largest at the moment, which makes troubleshooting easier for a beginner.

AWS S3

  • AWS S3 is used both to store the raw data and to store the result tables produced by the Spark data transformation on AWS EMR. HDFS could also have been used to store the EMR output, but:
    1. I did not want to increase HDFS storage costs on the AWS EMR cluster.
    2. AWS services share the same data centers, so using S3 in place of HDFS adds little network overhead and stores the data at a low price.
      For these reasons, AWS S3 was chosen.

AWS EMR

  • The data wrangling could also have been done with SQL statements directly on the Airflow server, but:
    1. The raw data is large, about 2 GB.
    2. Airflow operators can create and terminate an AWS EMR cluster automatically, and moving data between AWS services (S3 -> EMR -> S3 -> Redshift) is simple and fast (see the step sketch after this list).
    3. Using AWS EMR only briefly for the data transformation does not cost much.
    4. EMR supports Spark, so fast data transformation with pySpark is possible.
    5. Spark keeps intermediate output in memory instead of on disk, which is why Spark was used instead of Hadoop.
      For these reasons, the data transformation was run on AWS EMR.
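
As mentioned in point 2, the steps themselves are plain EMR step definitions handed to EmrAddStepsOperator; a hedged sketch (the bucket name and the S3 key of the script uploaded by script_to_s3 are placeholders):

BUCKET_NAME = "my-pubg-etl-bucket"  # placeholder

SPARK_STEPS = [
    {
        "Name": "PUBG data transformation",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # lets EMR run an arbitrary command, here spark-submit
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://{}/scripts/spark-scipt.py".format(BUCKET_NAME),  # placeholder key of the uploaded script
            ],
        },
    },
]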

AWS Redshift

  • Various services such as AWS Aurora and RDS could serve as the data warehouse, but:
    1. Column-based Redshift shows faster performance than the other databases.
    2. It is PostgreSQL-compatible and can load data quickly from S3 with the COPY command.
    3. Redshift performs better on analytical (OLAP-style) queries.
      For these reasons, AWS Redshift was used as the data warehouse.


🤔 Struggle Points

S3 and Redshift

  • Putting S3 and Redshift in the same Region makes data transfer faster (they sit in the same data center).
  • When writing a COPY command, it matters a lot whether the file being loaded (JSON, CSV, Parquet) has a header row. If the columns are already defined in Redshift, be sure to add the IGNOREHEADER 1 option (a sketch of such a COPY command follows this list).
    • With IGNOREHEADER 1, Redshift reads only the values in order, without any column-name information, so the column order in the Redshift table definition matters.
  • When writing a COPY command, add the option TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'.
    • Without this option, timestamps are stored in Redshift in the default "yyyy-MM-dd HH:mm:ss.0" format.
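
A hedged sketch of such a COPY command, run here through Airflow's PostgresHook against the "redshift" connection added in the How to Run section (the S3 prefix, IAM role, and exact options are placeholders):

from airflow.hooks.postgres_hook import PostgresHook

copy_sql = """
COPY pubg.kill_log
FROM 's3://my-pubg-etl-bucket/output/kill_log/'                  -- placeholder: S3 prefix of the Spark CSV output
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'  -- placeholder IAM role
CSV
IGNOREHEADER 1                                                   -- skip the header row; column order must match the table
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS';                                -- match the timestampFormat used when writing from Spark
"""

PostgresHook(postgres_conn_id="redshift").run(copy_sql)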

AirFlow

  • The CustomOperator libraries and their usage differ between Airflow versions (v1, v2), so be careful. This project is based on v1.1.
  • The web UI occasionally becomes unresponsive; when it does, reboot the web server.

pySpark

  • Timestamp() formats

    • The to_timestamp() function can convert a 10-digit (or 13-digit) Unix timestamp OR a default-format timestamp string into the (yyyy-MM-dd HH:mm:ss) form.
    • df = kg_df.withColumn("ts", to_timestamp(kg_df.unix_ts / 1000))
      df = kg_df.withColumn("ts", to_timestamp(kg_df.default_ts, "yyyy-MM-dd'T'HH:mm:ssZ"))
    • For arithmetic between timestamps, it is easier to convert them to Unix timestamps first.
    • df = df.withColumn("added timestamp", to_timestamp(unix_timestamp(df.ts) + 1232))
    • to_timestamp(): converts [string / 10-digit long → timestamp type]
    • unix_timestamp(): converts [default timestamp format → Unix timestamp]
    • Unix-timestamp-related functions
  • Adding an index number column

    • pySpark does not provide the basic Pandas index-reset function
      df.reset_index(drop=False, inplace=True)
    • Instead, an index column (1, 2, ..., n) can be added by combining 'row_number()' with a 'Window' function.
    • window = Window.orderBy(kill_log_df.killer_id)
      kill_log_df = kill_log_df.withColumn('kill_log_id', row_number().over(window))
      • Window.orderBy() creates a windowSpec over a specific column.
      • row_number() assigns a sequential index number over that windowSpec.
  • When writing from pySpark to S3: df.write.csv("s3a://xxxxx", timestampFormat="....")

    • If the timestampFormat argument is not specified, timestamps are written in the default format, so set it explicitly when a particular format is needed, e.g. timestampFormat = "yyyy-MM-dd HH:mm:ss" (a combined sketch follows this list).
  • Be careful when handling 13-digit (millisecond) Unix timestamps.
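
Putting the timestamp notes together, a minimal self-contained sketch (column names, values, and the S3 path are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp, unix_timestamp

spark = SparkSession.builder.appName("timestamp-demo").getOrCreate()
df = spark.createDataFrame([(1624000000000,)], ["unix_ts"])  # 13-digit Unix timestamp (milliseconds)

df = df.withColumn("ts", to_timestamp(df.unix_ts / 1000))                  # ms -> s, then cast to timestamp
df = df.withColumn("ts_plus", to_timestamp(unix_timestamp(df.ts) + 1232))  # arithmetic via Unix seconds

# Without timestampFormat, the default format is written; set it explicitly when Redshift expects a fixed format.
df.write.csv("s3a://my-pubg-etl-bucket/output/demo/",  # placeholder path
             header=True,
             timestampFormat="yyyy-MM-dd HH:mm:ss")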

AWS EMR

  • When using Airflow's EmrTerminateJobFlowOperator to issue the EMR auto-termination command, the EMR release must be 5.34.0 or later.

Tables and PostgreSQL

  • The NUMERIC type can take Numeric(precision, scale) arguments: precision is the total number of digits (including the fractional part) and scale is the number of digits after the decimal point. So Numeric(5,2) covers -999.99 to 999.99. The default scale is 0, so if it is omitted, no fractional digits can be stored! (See the DDL sketch after this list.)
  • Try adding a distkey and sortkey.
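
For illustration, a hedged DDL sketch of the precision/scale behaviour (the table and column names are hypothetical, not the project's actual schema; it can be run via the same PostgresHook pattern shown in the Struggle Points section):

numeric_demo_sql = """
CREATE TABLE IF NOT EXISTS pubg.numeric_demo (
    dmg_ratio NUMERIC(5, 2),  -- precision 5, scale 2: covers -999.99 to 999.99
    kills     NUMERIC(5)      -- scale omitted, defaults to 0: fractional digits are rounded away
);
"""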

๐Ÿƒ Improvement to be done

  • Add a distribution style and sort key to the Redshift tables and verify the query performance (see the DDL sketch after this list).
    • A dist/sort key only pays off on a cluster with two or more nodes.
    • Why? The whole point of a distkey is to distribute data evenly across the nodes to reduce shuffling overhead; with a single node, the JOIN happens on that one node anyway, so there is no effect.
  • Connect a BI tool to the Redshift tables and run some analytics.
  • Instead of a full refresh (re-running the entire ETL on every DAG run), try backfilling based on the execution date.
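
For the distribution/sort key item, a hedged sketch of what that DDL might look like (column types and key choices are illustrative assumptions, not the project's current schema):

from airflow.hooks.postgres_hook import PostgresHook

create_dist_sql = """
CREATE TABLE pubg.kill_log_dist (
    kill_log_id BIGINT,
    match_id    VARCHAR(64),
    killer_id   VARCHAR(64),
    victim_id   VARCHAR(64),
    weapon      VARCHAR(64),
    "timestamp" TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (match_id)      -- co-locate rows of the same match to cut JOIN shuffling across nodes
SORTKEY ("timestamp");  -- speeds up time-range filters
"""

PostgresHook(postgres_conn_id="redshift").run(create_dist_sql)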
