Chain Data Contracts within single Data Product #5

Open
maikelpenz opened this issue Nov 20, 2024 · 1 comment

@maikelpenz

In my current setup, there are some steps between ingesting data and producing the final data product table.

For example, I don't treat intermediate tables as data products. Instead, I create data contracts for them for reference, and my "data products" are the one or more tables that directly deliver value to the customer. My structure follows the medallion architecture, as in the following example:

Data Product: Sales Silver
Input Port: Upstream Database
Output Ports: Ingestion, Bronze, Silver

All the output ports listed above are represented as data contracts, following this logical sequence:
Upstream Database → Ingestion → Bronze → Silver

It doesn’t make sense to create data products from the Ingestion and Bronze stages because those tables are not customer-facing.

What I wish were possible:

  • Define intermediate processes within a data product: Configure the data contracts involved in the data product, specifying the processing sequence.
  • Simplify output ports: By only setting the final table in the chain (e.g., "Silver") as the data product's output port, intermediate stages would remain part of the internal process rather than appearing as standalone products.
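
To make the request concrete, here is a minimal sketch of what such a definition could look like. It is not an existing Data Mesh Manager feature or schema: the `DataContract` and `DataProduct` classes and the `contract_chain` field are hypothetical names invented for illustration. The idea is that the data product holds its contracts in processing order, and only the last one is treated as the output port.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContract:
    """A contract describing one table in the pipeline (hypothetical model)."""
    name: str


@dataclass
class DataProduct:
    """Hypothetical data product with an ordered chain of contracts."""
    name: str
    input_port: str
    # Contracts in processing order; the last one is the customer-facing table.
    contract_chain: list = field(default_factory=list)

    @property
    def output_port(self) -> DataContract:
        """Only the final contract in the chain is exposed as an output port."""
        if not self.contract_chain:
            raise ValueError("contract_chain must contain at least one contract")
        return self.contract_chain[-1]

    @property
    def internal_contracts(self) -> list:
        """Every contract before the final one stays internal."""
        return self.contract_chain[:-1]

    def lineage(self) -> str:
        """Render the processing sequence, starting from the input port."""
        steps = [self.input_port] + [c.name for c in self.contract_chain]
        return " -> ".join(steps)


sales_silver = DataProduct(
    name="Sales Silver",
    input_port="Upstream Database",
    contract_chain=[
        DataContract("Ingestion"),
        DataContract("Bronze"),
        DataContract("Silver"),
    ],
)

print(sales_silver.lineage())
print(sales_silver.output_port.name)
```

With this shape, Ingestion and Bronze still have contracts for reference, but they never show up as standalone output ports.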
@maikelpenz changed the title from "Chain Data Product output ports" to "Chain Data Contracts within single Data Product" on Nov 20, 2024
@jochenchrist
Contributor

Hi @maikelpenz

Thanks for the feedback and this feature request.

Currently, I'd recommend these options:

  1. Consider raw and bronze tables as internal details of the data product. Do not define them as output ports. You can use assets (https://api.datamesh-manager.com/swagger/index.html#/Assets), a new feature, to assign these tables/views to a data product.
  2. If you want to have a data contract for your source data, define a proxy data product (Sales Raw / Sales Bronze) that is internal to the team. A feature to define the visibility of data products is in our backlog.
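
As a rough illustration of the first option, the sketch below tags each table of the Sales Silver product either as an internal asset or as an output port. The `Role` enum, the `Table` class, and `split_by_role` are hypothetical names made up for this example; they do not reflect the actual Assets API schema, which is documented at the Swagger link above.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ASSET = "asset"              # internal table, not customer-facing
    OUTPUT_PORT = "output_port"  # customer-facing table


@dataclass(frozen=True)
class Table:
    """One table belonging to a data product (hypothetical model)."""
    name: str
    role: Role


def split_by_role(tables):
    """Return (asset names, output port names), preserving input order."""
    assets = [t.name for t in tables if t.role is Role.ASSET]
    output_ports = [t.name for t in tables if t.role is Role.OUTPUT_PORT]
    return assets, output_ports


sales_tables = [
    Table("Ingestion", Role.ASSET),
    Table("Bronze", Role.ASSET),
    Table("Silver", Role.OUTPUT_PORT),
]

assets, output_ports = split_by_role(sales_tables)
print(assets)
print(output_ports)
```

Only Silver ends up in the output port list, while Ingestion and Bronze stay attached to the data product as internal assets.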

I hope this helps.
