Loading dataset from a different instance that does not have all schemas of the current instance errors #2117

Closed
Zethson opened this issue Oct 29, 2024 · 7 comments · Fixed by #2132

@Zethson
Member

Zethson commented Oct 29, 2024

Report

lamin init --storage ./test-perturbation --schema bionty,wetlab,findrefs

import lamindb as ln

ln.track("HIRTYxL3aZc70000")

adata = ln.Artifact.using("laminlabs/lamindata").get(uid="Xk7Qaik9vBLV4PKf0001").load()

results in

{
	"name": "ProgrammingError",
	"message": "relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^
",
	"stack": "---------------------------------------------------------------------------
UndefinedTable                            Traceback (most recent call last)
File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    104 else:
--> 105     return self.cursor.execute(sql, params)

UndefinedTable: relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^


The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[5], line 1
----> 1 adata = ln.Artifact.using(\"laminlabs/lamindata\").get(uid=\"Xk7Qaik9vBLV4PKf0001\").load()
      2 adata.obs.head(3)

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:994, in load(self, is_run_input, **kwargs)
    992     access_memory = load_to_memory(cache_path, **kwargs)
    993 # only call if load is successfull
--> 994 _track_run_input(self, is_run_input)
    995 return access_memory

File ~/PycharmProjects/lamindb/lamindb/core/_data.py:465, in _track_run_input(data, is_run_input, run)
    458             is_valid = True
    459         return (
    460             data.run_id != run.id
    461             and not data._state.adding  # this seems duplicated with data._state.db is None
    462             and is_valid
    463         )
--> 465     input_data = [data for data in data_iter if is_valid_input(data)]
    466     input_data_ids = [data.id for data in input_data]
    467 if input_data:

File ~/PycharmProjects/lamindb/lamindb/core/_data.py:457, in _track_run_input.<locals>.is_valid_input(data)
    450 else:
    451     # record is on another db
    452     # we have to save the record into the current db with
    453     # the run being attached to a transfer transform
    454     logger.important(
    455         f\"completing transfer to track {data.__class__.__name__}('{data.uid[:8]}') as input\"
    456     )
--> 457     data.save()
    458     is_valid = True
    459 return (
    460     data.run_id != run.id
    461     and not data._state.adding  # this seems duplicated with data._state.db is None
    462     and is_valid
    463 )

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:1107, in save(self, upload, **kwargs)
   1104     # ensure that the artifact is uploaded
   1105     self._to_store = True
-> 1107 self._save_skip_storage(**kwargs)
   1109 from lamindb._save import check_and_attempt_clearing, check_and_attempt_upload
   1111 using_key = None

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:1138, in _save_skip_storage(file, **kwargs)
   1136 def _save_skip_storage(file, **kwargs) -> None:
   1137     save_feature_sets(file)
-> 1138     super(Artifact, file).save(**kwargs)
   1139     save_feature_set_links(file)

File ~/PycharmProjects/lamindb/lamindb/_record.py:618, in save(self, *args, **kwargs)
    616     self_on_db.features = FeatureManager(self_on_db)
    617     self.features._add_from(self_on_db, transfer_logs=transfer_logs)
--> 618     self.labels.add_from(self_on_db, transfer_logs=transfer_logs)
    619 for k, v in transfer_logs.items():
    620     if k != \"run\":

File ~/PycharmProjects/lamindb/lamindb/core/_label_manager.py:208, in LabelManager.add_from(self, data, transfer_logs)
    204 for related_name, (_, labels) in get_labels_as_dict(
    205     data, instance=self._host._state.db
    206 ).items():
    207     labels = labels.all()
--> 208     if not labels.exists():
    209         continue
    210     # look for features

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/query.py:1288, in QuerySet.exists(self)
   1284 \"\"\"
   1285 Return True if the QuerySet would have any results, False otherwise.
   1286 \"\"\"
   1287 if self._result_cache is None:
-> 1288     return self.query.has_results(using=self.db)
   1289 return bool(self._result_cache)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/query.py:660, in Query.has_results(self, using)
    658 q = self.exists(using)
    659 compiler = q.get_compiler(using=using)
--> 660 return compiler.has_results()

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/compiler.py:1542, in SQLCompiler.has_results(self)
   1537 def has_results(self):
   1538     \"\"\"
   1539     Backends (e.g. NoSQL) can override this in order to use optimized
   1540     versions of \"query has any results.\"
   1541     \"\"\"
-> 1542     return bool(self.execute_sql(SINGLE))

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/compiler.py:1574, in SQLCompiler.execute_sql(self, result_type, chunked_fetch, chunk_size)
   1572     cursor = self.connection.cursor()
   1573 try:
-> 1574     cursor.execute(sql, params)
   1575 except Exception:
   1576     # Might fail for server-side cursors (e.g. connection closed)
   1577     cursor.close()

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:79, in CursorWrapper.execute(self, sql, params)
     78 def execute(self, sql, params=None):
---> 79     return self._execute_with_wrappers(
     80         sql, params, many=False, executor=self._execute
     81     )

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:92, in CursorWrapper._execute_with_wrappers(self, sql, params, many, executor)
     90 for wrapper in reversed(self.db.execute_wrappers):
     91     executor = functools.partial(wrapper, executor)
---> 92 return executor(sql, params, many, context)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:100, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
     98     warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)
     99 self.db.validate_no_broken_transaction()
--> 100 with self.db.wrap_database_errors:
    101     if params is None:
    102         # params default might be backend specific.
    103         return self.cursor.execute(sql)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/utils.py:91, in DatabaseErrorWrapper.__exit__(self, exc_type, exc_value, traceback)
     89 if dj_exc_type not in (DataError, IntegrityError):
     90     self.wrapper.errors_occurred = True
---> 91 raise dj_exc_value.with_traceback(traceback) from exc_value

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    103     return self.cursor.execute(sql)
    104 else:
--> 105     return self.cursor.execute(sql, params)

ProgrammingError: relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^
"
}

The get() call works, but load() raises the error above, and only when ln.track() has been called beforehand.

Version information

No response

@sunnyosun
Member

Do you mean if ln.track() wasn't run, load() works?

@Zethson
Member Author

Zethson commented Nov 5, 2024

Correct

sunnyosun assigned falexwolf and unassigned sunnyosun Nov 5, 2024
@sunnyosun
Member

@falexwolf I think this is related to the inter-instance tracking. Why is the artifact being saved in this case?

@falexwolf
Member

You can only load an artifact if it's saved; otherwise, there is no way to track lineage.

The bug here is independent of data lineage and instead comes from not being able to save the artifact. I thought that by now we were able to transfer artifacts across instances with mismatching schemas? I'm surprised this doesn't work.

I know this is hard to test, but we should add a test for a target instance whose set of schema modules is neither a strict superset nor a strict subset of the source instance's.
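
To make the edge case concrete, here is a minimal sketch of how the relation between two sets of schema modules could be classified in plain Python. The enum, the function name, and the example module sets are all hypothetical and are not part of lamindb; the point is only that the "neither superset nor subset" case falls outside the two strict orderings.

```python
from enum import Enum


class SchemaRelation(Enum):
    EQUAL = "equal"
    TARGET_SUPERSET = "target is a strict superset of source"
    TARGET_SUBSET = "target is a strict subset of source"
    NEITHER = "target is neither a superset nor a subset of source"


def compare_schema_modules(source: set[str], target: set[str]) -> SchemaRelation:
    """Classify how the target's schema modules relate to the source's."""
    if target == source:
        return SchemaRelation.EQUAL
    if target > source:  # strict superset
        return SchemaRelation.TARGET_SUPERSET
    if target < source:  # strict subset
        return SchemaRelation.TARGET_SUBSET
    # Partial overlap or fully disjoint module sets both end up here
    return SchemaRelation.NEITHER


# Hypothetical module sets, chosen only to illustrate the edge case
source_modules = {"bionty", "wetlab"}
target_modules = {"bionty", "findrefs"}
relation = compare_schema_modules(source_modules, target_modules)
print(relation.value)
```

A test along these lines would need one source and one target instance whose module sets land in the NEITHER branch.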

@falexwolf
Member

Looking at the line below from the traceback, I believe we don't in fact have a general problem, just a coverage problem for edge cases:

208     if not labels.exists():

Likely, this case isn't covered in the tests and this leads to the bug.
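
As an illustration of the kind of guard that would avoid querying a table the source database doesn't have, here is a hedged, library-free sketch. It is not the fix from #2132 and does not use lamindb's API: the function name, the mapping of related names to schema modules, and the example values are all assumptions made for this sketch.

```python
def split_label_relations(
    label_relations: dict[str, str],
    source_modules: set[str],
) -> tuple[list[str], list[str]]:
    """Separate label relations by whether the source instance has their module.

    label_relations maps a related name to the schema module that defines
    the label registry (hypothetical structure, for illustration only).
    """
    queryable: list[str] = []
    skipped: list[str] = []
    for related_name, module in label_relations.items():
        if module in source_modules:
            # The source database has this module's tables, so an
            # existence check on the label query is safe
            queryable.append(related_name)
        else:
            # Querying would raise "relation ... does not exist"
            skipped.append(related_name)
    return queryable, skipped


# Hypothetical example: the current instance knows about findrefs,
# but the source instance only has bionty
relations = {"cell_types": "bionty", "references": "findrefs"}
queryable, skipped = split_label_relations(relations, source_modules={"bionty"})
print(queryable, skipped)
```

Only the relations in the first list would then be passed on to a check like labels.exists().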

@sunnyosun
Member

But saving an artifact works without ln.track(), so I think the issue is in the tracking.

@sunnyosun
Member

This should be fixed, with tests added, in #2132.
