💄 Create a more intuitive artifact.describe() #2221

Closed
falexwolf opened this issue Nov 26, 2024 · 9 comments · Fixed by #2236

Comments

@falexwolf
Member

falexwolf commented Nov 26, 2024

@chaichontat proposed the below in early February:

image

Internal Slack ref.

More internal Slack refs on this topic:

@falexwolf
Member Author

falexwolf commented Nov 26, 2024

Here is a first draft building on @chaichontat's proposal: https://lamin.ai/laminlabs/lamindata/transform/A4EupIQ7dEWx0000

Tentative outcome is:
image

Decided against the below approaches:

image image

Will keep iterating tomorrow.

@falexwolf
Member Author

falexwolf commented Nov 26, 2024

This is better:
image

Compare with current:
image

Code

import datetime

from rich.table import Column, Table
from rich.text import Text
from rich.tree import Tree


def highlight_time(iso: str) -> Text:
    # Interpret the timestamp as UTC and render it dimmed in the local time zone
    tz = datetime.datetime.now().astimezone().tzinfo
    res = (
        datetime.datetime.fromisoformat(iso)
        .replace(tzinfo=datetime.timezone.utc)
        .astimezone(tz)
        .strftime("%Y-%m-%d %H:%M:%S")
    )
    return Text(res, style="dim")

# Define consistent column widths

NAME_WIDTH = 25
TYPE_WIDTH = 25
VALUES_WIDTH = 40

# Main artifact tree

artifact = Tree(Text("Artifact – h5ad", style="bold"), guide_style="dim")

general = artifact.add(Text("General", style="bold green"))
general.add(".uid = 'dP0F1fEQWtorhDaI0000'")
general.add(".storage = s3://...")
general.add(Text.assemble(".path = ", ("storage/", "dim"), " example_datasets/dataset2.h5ad"))
general.add(Text.assemble(".created_at = ", highlight_time("2024-11-25 14:59:38")))
general.add(Text(".transform = 'Towards pretty dataframes with features & params'", style="green"))

# Labels as a branch of artifact

labels = artifact.add(Text("Labels", style="bold red"))
labels_table = Table(
    Column("Name", style="", no_wrap=True, width=NAME_WIDTH),
    Column("Type", style="dim", no_wrap=True, width=TYPE_WIDTH),
    Column("Values", width=VALUES_WIDTH, no_wrap=True),
    show_header=False,
    box=None,
    pad_edge=False,
)
labels_table.add_row(".cell_types", Text("bio.CellType", style="dim"), "T cell, B cell")
labels_table.add_row(".ulabels", Text("ULabel", style="dim"), "DMSO, IFNG, Candidate marker study 2")
labels.add(labels_table)

# Features as a branch of artifact

features = artifact.add(Text("Features", style="dark_orange bold"))

# Create combined tables for each feature group that include the header

def create_feature_table(name: str, type_info: str, data: list) -> Table:
    table = Table(
        Column(Text.assemble((name, "purple"), (" ← " + type_info, "dim")), style="", no_wrap=True, width=NAME_WIDTH),
        Column("", style="dim", no_wrap=True, width=TYPE_WIDTH),
        Column("", width=VALUES_WIDTH, no_wrap=True),
        show_header=True,
        box=None,
        pad_edge=False,
    )
    for row in data:
        table.add_row(*row)
    return table

# External features

external_data = [
    ("study", Text("cat[ULabel]", style="dim"), "Candidate marker study 2"),
    ("duration_s", Text("float", style="dim"), "122.0"),
    ("temperature", Text("float", style="dim"), "22.6"),
]
features.add(create_feature_table("external", "3", external_data))

# Obs features

obs_data = [
    ("cell_type_by_model", Text("cat[bionty.CellType]", style="dim"), "B cell, T cell"),
    ("perturbation", Text("cat[ULabel]", style="dim"), "DMSO, IFNG"),
]
features.add(create_feature_table("obs", "2", obs_data))

# Var features

var_data = [
    ("CD38", Text("number", style="dim"), ""),
    ("CD4", Text("number", style="dim"), ""),
    ("CD8A", Text("number", style="dim"), ""),
]
features.add(create_feature_table("var", "3 [bionty.Gene]", var_data))

# Print the complete tree

from rich import print
print(artifact)

@falexwolf
Member Author

falexwolf commented Nov 27, 2024

A few options. Design (1) is essentially the same as last night's final iteration.

Source: https://lamin.ai/laminlabs/lamindata/transform/A4EupIQ7dEWx0001

Designs

(1) Separate features vs. labels: image
(2) Separate dataset vs. annotations: image
(3) Group labels into another section "Relations": image
(Current): image

@Zethson
Member

Zethson commented Nov 27, 2024

Be careful with so many colors. They can render pretty differently on different terminals. The brighter or darker they are, the higher the risk that they will be unreadable. The dark red and dark blue that I see here are risky IMO.

Generally, the biggest win for me would be https://laminlabs.slack.com/archives/C04FPE8V01W/p1729856432777079 because even now, I find the output confusing, especially for people who are new to lamin.
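To make the color concern concrete, here is a minimal sketch of how a styled heading could be checked under more restrictive terminal settings. It assumes `rich` is installed; the `render` helper is hypothetical and not part of lamindb. It renders the same text once with the 8-color "standard" palette and once with color disabled.

```python
from io import StringIO

from rich.console import Console
from rich.text import Text


def render(text: Text, **console_kwargs) -> str:
    """Hypothetical helper: render `text` to a string with a configured Console."""
    buffer = StringIO()
    console = Console(file=buffer, force_terminal=True, width=60, **console_kwargs)
    console.print(text)
    return buffer.getvalue()


heading = Text("Features", style="bold green")

# 8-color "standard" palette: the terminal theme decides the actual shade.
ansi_output = render(heading, color_system="standard", no_color=False)

# Color disabled: the heading has to remain readable from weight alone.
plain_output = render(heading, color_system="standard", no_color=True)
```

Comparing the two strings (or printing them in a few different terminal themes) shows whether the layout still works once the colors are degraded or removed.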

@falexwolf
Member Author

falexwolf commented Nov 27, 2024

> Be careful with so many colors.

That's not yet the point. They will be worked on.

> Generally, the biggest win for me would be

The main question is the kind of tabular structure, in particular what to group. Quoting from our Slack thread:
image

So, which of the designs do you prefer? Sunny & I prefer Design 2.

@Zethson
Member

Zethson commented Nov 27, 2024

I'm not sure. The name datasets is a bit weird because that's the whole Artifact. data would be more correct maybe?

Annotations is great because we also use that term in the Curator flow. But yeah 2) seems to be the best.
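For readers who cannot see the screenshots, here is a rough sketch of what a Design (2) layout could look like with `rich`. Which entries go under "Dataset" and which under "Annotations" is an assumption made for illustration (features stored inside the file vs. metadata linked from outside); the screenshots in this thread are the authoritative version. The `build_design_2` function is hypothetical.

```python
from rich.text import Text
from rich.tree import Tree


def build_design_2() -> Tree:
    """Hypothetical two-section layout: dataset content vs. external annotations."""
    root = Tree(Text("Artifact", style="bold"), guide_style="dim")

    # Assumption: features measured inside the file go under "Dataset".
    dataset = root.add(Text("Dataset", style="bold"))
    dataset.add("obs: cell_type_by_model, perturbation")
    dataset.add("var: CD38, CD4, CD8A")

    # Assumption: externally linked features and labels go under "Annotations".
    annotations = root.add(Text("Annotations", style="bold"))
    annotations.add("study, duration_s, temperature")
    annotations.add(".cell_types, .ulabels")
    return root


design_2 = build_design_2()
```

Calling `rich.print(design_2)` shows the two top-level sections as sibling branches.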

@falexwolf
Member Author

> I'm not sure. The name datasets is a bit weird because that's the whole Artifact. data would be more correct maybe?

That's an interesting take and I'm surprised to see us misaligned on such a fundamental topic! To me, the dataset/artifact is the data on S3. The artifact object in LaminDB is a wrapper around it that contextualizes the metadata of that dataset/artifact. So, an artifact object in LaminDB is not the artifact; it's just a wrapper.

Of course in casual language you wouldn't make that distinction, but it's important that we use the same fine-grained definitions.

The problem with the term data is that it's too overloaded. Both the metadata and the data in the dataset are data. In fact, the dataset contains actual numerical data and metadata in the form of identifiers.

@falexwolf
Member Author

Can you take this @sunnyosun? First thing would be to refactor all data gathering logic and separate it from the "printing logic". I already did this for the whole features part in my last PR with Claude's help:

It'll be easy for the "labels" part.
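A minimal sketch of that separation for the labels part, assuming `rich` is installed. The names `EXAMPLE_LABELS`, `gather_labels`, and `render_labels` are hypothetical and are not lamindb's actual API; the dictionary stands in for whatever would be queried from the artifact.

```python
from rich.table import Table
from rich.tree import Tree

# Hypothetical stand-in for values that would be queried from an artifact.
EXAMPLE_LABELS = {
    ".cell_types": ("bionty.CellType", ["T cell", "B cell"]),
    ".ulabels": ("ULabel", ["DMSO", "IFNG"]),
}


def gather_labels(source: dict) -> list[tuple[str, str, str]]:
    """Data gathering: return plain (name, type, values) rows, no rich objects."""
    return [
        (name, type_name, ", ".join(values))
        for name, (type_name, values) in source.items()
    ]


def render_labels(rows: list[tuple[str, str, str]]) -> Tree:
    """Printing logic: turn plain rows into a rich tree, no data access."""
    tree = Tree("Labels")
    table = Table(show_header=False, box=None, pad_edge=False)
    table.add_column("Name", no_wrap=True)
    table.add_column("Type", style="dim", no_wrap=True)
    table.add_column("Values")
    for row in rows:
        table.add_row(*row)
    tree.add(table)
    return tree


rows = gather_labels(EXAMPLE_LABELS)
labels_tree = render_labels(rows)
```

Because `gather_labels` returns plain tuples, it can be tested without a terminal, and the same rows could feed a different renderer later.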

@sunnyosun
Member

> Can you take this @sunnyosun? First thing would be to refactor all data gathering logic and separate it from the "printing logic". I already did this for the whole features part in my last PR with Claude's help:
>
> It'll be easy for the "labels" part.

Done here: #2225

@falexwolf falexwolf changed the title from "💄 Create a more intuitive and appealing artifact.describe()" to "💄 Create a more intuitive artifact.describe()" Dec 1, 2024
@falexwolf falexwolf linked a pull request Dec 1, 2024 that will close this issue