Json schema dataset not showing lineage #12113

Open
lekhamaru opened this issue Dec 12, 2024 · 0 comments
Labels
bug Bug report


lekhamaru commented Dec 12, 2024

Describe the bug
I'm using the json-schema plugin to ingest JSON schemas into DataHub. The schemas are ingested successfully, but I cannot see any lineage between them. The Example2 schema refers to the Example1 schema, but that relationship does not appear in the Visualize Lineage view.

To Reproduce
Steps to reproduce the behavior:

  1. Example2 is a JSON schema file that refers to Example1. Both files are kept in the same directory.
  2. Used the following recipe file:
    pipeline_name: json_schema_ingestion
    source:
      type: json-schema
      config:
        path: "C:/datahub/Entity/"
        use_id_as_base_uri: true
        platform: TMFSchemaRegistry # e.g. schemaregistry

        platform_instance:

        stateful_ingestion:
          enabled: false # recommended to have this turned on

    # sink configs if needed

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"

  3. Used the command datahub ingest -c json-schema_recipe.yml to ingest the 2 schemas. I had to use an absolute path in $ref because the ingestion did not support relative paths.

Example1.schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "Example1.schema.json",
  "title": "Example1",
  "type": "object",
  "definitions": {
    "Example1": {
      "type": "object",
      "properties": {
        "address": {
          "type": "string",
          "description": "The address of the entity."
        },
        "city": {
          "type": "string",
          "description": "City of the entity."
        }
      },
      "required": ["address", "city"]
    }
  }
}

Example2.schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "Example2.schema.json",
  "title": "Example2",
  "type": "object",
  "allOf": [
    {
      "$ref": "file:///C:/datahub/Entity/Example1.schema.json#/definitions/Example1"
    }
  ],
  "properties": {
    "id": {
      "type": "string",
      "description": "The identifier of the entity."
    },
    "name": {
      "type": "string",
      "description": "Name of the entity."
    }
  },
  "required": ["id", "name"]
}

  4. After successful ingestion, I can see the 2 schemas in the DataHub UI.
  5. But when I try to visualize lineage, I cannot see it from either schema.
  6. pip install acryl-datahub[datahub-rest,json-schema] says Requirement already satisfied

Expected behavior
The lineage should be visible both upstream and downstream from both schemas.
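
To make the expected relationship concrete, here is a minimal sketch that lists the cross-file `$ref` links found in a directory of schemas. It uses only the Python standard library and does not call DataHub. The names `iter_refs` and `expected_edges` are hypothetical helpers made up for this illustration, and whether the json-schema source is meant to turn such references into lineage is not confirmed here.

```python
import json
from pathlib import Path, PurePosixPath
from urllib.parse import urldefrag, urlparse


def iter_refs(node):
    """Yield every "$ref" string found anywhere in a parsed JSON schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def expected_edges(schema_dir):
    """Return sorted (upstream_file, downstream_file) pairs implied by cross-file $refs.

    Hypothetical helper: files are matched by name only, which is an
    assumption that holds when all schemas sit in one directory.
    """
    edges = set()
    for schema_file in Path(schema_dir).glob("*.json"):
        schema = json.loads(schema_file.read_text(encoding="utf-8"))
        for ref in iter_refs(schema):
            target, _fragment = urldefrag(ref)
            if not target:
                continue  # "#/definitions/..." points inside the same file
            upstream = PurePosixPath(urlparse(target).path).name
            if upstream != schema_file.name:
                edges.add((upstream, schema_file.name))
    return sorted(edges)
```

Run against the directory from the recipe (`expected_edges("C:/datahub/Entity/")`), this should report Example1.schema.json as upstream of Example2.schema.json, which is the edge I expected to see in the lineage view.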

Screenshots
Image of Example1 in UI: [2 screenshots]

Image of Example2 in UI: [2 screenshots]

Desktop (please complete the following information):

  • OS: Windows
  • Browser: Chrome
  • DataHub quickstart version: acryl-datahub, version 0.14.1
    'py_version': '3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]',
    'py_exec_path': 'c:\Python\python.exe',
    'os_details': 'Windows-10-10.0.22631-SP0',

Additional context
Another issue is that I'm unable to use relative paths in $ref inside the JSON schemas; only absolute paths work.
Also, I can see only the raw form of the Example1 schema and not the tabular form. The same is true for Example2.
Is this the expected behavior?
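
As a stopgap for the relative-path problem, the schemas could be pre-processed before ingestion so that relative `$ref` values become absolute `file:///` URIs like the one used in Example2. The sketch below is only an illustration under stated assumptions: `absolutize_ref` and `absolutize_refs` are hypothetical helper names, the base directory is assumed to be a Windows drive-letter path as in the recipe, and nothing here is part of the json-schema plugin.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, urlparse


def absolutize_ref(ref, base_dir):
    """Turn a relative cross-file $ref into an absolute file:/// URI.

    Assumes base_dir is a Windows drive-letter path such as "C:/datahub/Entity/".
    """
    file_part, sep, fragment = ref.partition("#")
    if not file_part or urlparse(file_part).scheme:
        return ref  # same-file pointer, or already an absolute URI
    absolute_path = (PureWindowsPath(base_dir) / file_part).as_posix()
    # Keep "/" and the drive colon unescaped; percent-encode anything unusual.
    return "file:///" + quote(absolute_path, safe="/:") + sep + fragment


def absolutize_refs(node, base_dir):
    """Return a copy of a parsed schema with every relative $ref made absolute."""
    if isinstance(node, list):
        return [absolutize_refs(item, base_dir) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "$ref" and isinstance(value, str):
            result[key] = absolutize_ref(value, base_dir)
        else:
            result[key] = absolutize_refs(value, base_dir)
    return result
```

With this, a schema could be written with `"$ref": "Example1.schema.json#/definitions/Example1"` and rewritten to the absolute form just before running the ingestion. It only addresses the path format and is not expected to change the lineage behavior by itself.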

@lekhamaru lekhamaru added the bug Bug report label Dec 12, 2024