💄 Create a more intuitive `artifact.describe()` #2221

falexwolf · 2024-11-26T20:43:03Z

@chaichontat proposed the below in early February:

from rich import print
from rich.table import Column, Table
from rich.text import Text
from rich.tree import Tree

# Define consistent column widths
NAME_WIDTH = 25
TYPE_WIDTH = 25
VALUES_WIDTH = 40


def highlight_time(timestamp: str) -> Text:
    # Not defined in the proposal; assumed stand-in that styles a timestamp
    return Text(timestamp, style="cyan")


# Main artifact tree
artifact = Tree(Text("Artifact – h5ad", style="bold"), guide_style="dim")

general = artifact.add(Text("General", style="bold green"))
general.add(".uid = 'dP0F1fEQWtorhDaI0000'")
general.add(".storage = s3://...")
general.add(Text.assemble(".path = ", ("storage/", "dim"), " example_datasets/dataset2.h5ad"))
general.add(Text.assemble(".created_at = ", highlight_time("2024-11-25 14:59:38")))
general.add(Text(".transform = 'Towards pretty dataframes with features & params'", style="green"))

# Labels as a branch of artifact
labels = artifact.add(Text("Labels", style="bold red"))
labels_table = Table(
    Column("Name", style="", no_wrap=True, width=NAME_WIDTH),
    Column("Type", style="dim", no_wrap=True, width=TYPE_WIDTH),
    Column("Values", width=VALUES_WIDTH, no_wrap=True),
    show_header=False,
    box=None,
    pad_edge=False,
)
labels_table.add_row(".cell_types", Text("bio.CellType", style="dim"), "T cell, B cell")
labels_table.add_row(".ulabels", Text("ULabel", style="dim"), "DMSO, IFNG, Candidate marker study 2")
labels.add(labels_table)

# Features as a branch of artifact
features = artifact.add(Text("Features", style="dark_orange bold"))


# Create combined tables for each feature group that include the header
def create_feature_table(name: str, type_info: str, data: list) -> Table:
    table = Table(
        Column(Text.assemble((name, "purple"), (" ← " + type_info, "dim")), style="", no_wrap=True, width=NAME_WIDTH),
        Column("", style="dim", no_wrap=True, width=TYPE_WIDTH),
        Column("", width=VALUES_WIDTH, no_wrap=True),
        show_header=True,
        box=None,
        pad_edge=False,
    )
    for row in data:
        table.add_row(*row)
    return table


# External features
external_data = [
    ("study", Text("cat[ULabel]", style="dim"), "Candidate marker study 2"),
    ("duration_s", Text("float", style="dim"), "122.0"),
    ("temperature", Text("float", style="dim"), "22.6"),
]
features.add(create_feature_table("external", "3", external_data))

# Obs features
obs_data = [
    ("cell_type_by_model", Text("cat[bionty.CellType]", style="dim"), "B cell, T cell"),
    ("perturbation", Text("cat[ULabel]", style="dim"), "DMSO, IFNG"),
]
features.add(create_feature_table("obs", "2", obs_data))

# Var features
var_data = [
    ("CD38", Text("number", style="dim"), ""),
    ("CD4", Text("number", style="dim"), ""),
    ("CD8A", Text("number", style="dim"), ""),
]
features.add(create_feature_table("var", "3 [bionty.Gene]", var_data))

# Print the complete tree
print(artifact)

falexwolf · 2024-11-27T09:36:26Z

A few options. Design (1) is essentially the same as the last iteration yesterday night.

Source: https://lamin.ai/laminlabs/lamindata/transform/A4EupIQ7dEWx0001

Designs:
(1) Separate features vs. labels
(2) Separate dataset vs. annotations
(3) Group labels into another section "Relations"
(Current)

Zethson · 2024-11-27T11:10:08Z

Be careful with so many colors. They can render pretty differently on different terminals. The brighter or darker they are, the higher the risk that they will be unreadable. The dark red and dark blue that I see here are risky IMO.

Generally, the biggest win for me would be https://laminlabs.slack.com/archives/C04FPE8V01W/p1729856432777079 because even now, I think the output is nothing but confusing, especially to people who are new to lamin.

falexwolf · 2024-11-27T15:01:23Z

> Be careful with so many colors.

That's not yet the point. They will be worked on.

> Generally, the biggest win for me would be

The kind of tabular structure is the main question, in particular what to group. Quoting from our Slack thread.

So, which of the designs do you prefer? Sunny & I prefer Design 2.

Zethson · 2024-11-27T15:11:02Z

I'm not sure. The name datasets is a bit weird because that's the whole Artifact. data would be more correct maybe?

Annotations is great because we also use that term in the Curator flow. But yeah 2) seems to be the best.

falexwolf · 2024-11-27T18:12:20Z

> I'm not sure. The name datasets is a bit weird because that's the whole Artifact. data would be more correct maybe?

That's an interesting take and I'm surprised to see us misaligned on such a fundamental topic! To me, the dataset/artifact is the data on S3. The artifact object in LaminDB is a wrapper around it that contextualizes the metadata of that dataset/artifact. So, an artifact object in lamindb is not the artifact, it's just a wrapper.

Of course, in casual language you wouldn't make that distinction, but it's important that we use the same fine-grained definitions.

The problem with the term data is that it's too overloaded. Both the metadata and the data in the dataset are data. In fact, the dataset contains both actual numerical data and metadata in the form of identifiers.

falexwolf · 2024-11-28T08:49:35Z

Can you take this @sunnyosun? First thing would be to refactor all data gathering logic and separate it from the "printing logic". I already did this for the whole features part in my last PR with Claude's help:

✨ Support dtype = 'datetime' and improve annotating with, retrieving & removing feature values #2218

It'll be easy for the "labels" part.
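A minimal sketch of what separating data gathering from printing logic could look like for the "labels" part. All names here (`LabelsSummary`, `gather_labels`, `render_labels`) are hypothetical and not lamindb's actual API, and the input dict stands in for whatever the real queries return.

```python
import io
from dataclasses import dataclass, field

from rich.console import Console
from rich.text import Text
from rich.tree import Tree


@dataclass
class LabelsSummary:
    """Plain data for the "Labels" section (hypothetical, not a lamindb class)."""

    # Maps accessor name -> (type name, label values)
    rows: dict[str, tuple[str, list[str]]] = field(default_factory=dict)


def gather_labels(records: dict[str, tuple[str, list[str]]]) -> LabelsSummary:
    """Data gathering: returns plain Python data, no rendering involved."""
    return LabelsSummary(
        rows={name: (type_name, sorted(values)) for name, (type_name, values) in records.items()}
    )


def render_labels(summary: LabelsSummary) -> Tree:
    """Printing logic: turns the gathered data into a rich renderable."""
    branch = Tree(Text("Labels", style="bold"))
    for name, (type_name, values) in summary.rows.items():
        branch.add(Text.assemble(name, (f"  {type_name}  ", "dim"), ", ".join(values)))
    return branch


# Gather first, render second; the summary can be tested without a terminal.
summary = gather_labels({".cell_types": ("bio.CellType", ["T cell", "B cell"])})
buf = io.StringIO()
Console(file=buf, width=80).print(render_labels(summary))
print(buf.getvalue())
```

With this split, the layout question (which sections to group) only touches the rendering function, while the gathered data stays the same across designs.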

sunnyosun · 2024-11-29T09:38:09Z

> Can you take this @sunnyosun? First thing would be to refactor all data gathering logic and separate it from the "printing logic". I already did this for the whole features part in my last PR with Claude's help:
>
> ✨ Improve annotating with, retrieving & removing feature values #2218
>
> It'll be easy for the "labels" part.

Done here: #2225

falexwolf changed the title ~~💄 Create a more intuitive and appealing artifact.describe()~~ 💄 Create a more intuitive artifact.describe() Dec 1, 2024

falexwolf linked a pull request Dec 1, 2024 that will close this issue

💄 Create a more intuitive artifact.describe() #2236

Merged

sunnyosun closed this as completed in #2236 Dec 1, 2024
