Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

what is the pipestat store attribute? #99

Closed
nsheff opened this issue Oct 16, 2023 · 10 comments
Closed

what is the pipestat store attribute? #99

nsheff opened this issue Oct 16, 2023 · 10 comments

Comments

@nsheff
Copy link
Contributor

nsheff commented Oct 16, 2023

the pipestat.PipestatManager has a store attribute:

https://github.com/pepkit/pipestat/blob/dev/pipestat/pipestat.py#L90

It appears to be being used to set/get items from a dict:

def __getitem__(self, key):
return self.store[self._keytransform(key)]
def __setitem__(self, key, value):
self.store[self._keytransform(key)] = value
def __delitem__(self, key):
del self.store[self._keytransform(key)]
def __iter__(self):
return iter(self.store)
def __len__(self):
return len(self.store)

I can't see where this is connected to the backends. What is this store for? Is this a relic that should be removed?

@nsheff nsheff added the question Further information is requested label Oct 16, 2023
@donaldcampbelljr
Copy link
Contributor

donaldcampbelljr commented Oct 16, 2023

This was actually added when we decided to change PipestatManger to inherit from MutableMapping and not dict. Essentially, the above code has pipestat act like a dict, but it is not technically a dict.

@nsheff
Copy link
Contributor Author

nsheff commented Oct 17, 2023

Ok, interesting.

What purpose does it serve? It is not connected to the back-end, so this seems just... superfluous.

@donaldcampbelljr
Copy link
Contributor

It is so that PipestatManager can store attributes like so:

self[STATUS_FILE_DIR] = mk_abs_via_cfg(flag_file_dir, self.config_path or self.file)
self.backend = FileBackend(
self[FILE_KEY],
record_identifier,
self[PIPELINE_NAME],
self[PIPELINE_TYPE],
self[SCHEMA_KEY],
self[STATUS_SCHEMA_KEY],
self[STATUS_FILE_DIR],
self[RESULT_FORMATTER],
self[MULTI_PIPELINE],
)

def record_identifier(self) -> str:
"""
Pipeline type: "sample" or "project"
:return str: pipeline type
"""
return self.get(RECORD_IDENTIFIER)

@nsheff
Copy link
Contributor Author

nsheff commented Oct 18, 2023

I see.

I propose we change this so that:

  1. For internal config metadata, we use a dedicated dict and don't use getitem syntax, so the above would be changed to: self.config[FILE_KEY] and self.config.get(RECORD_IDENTIFIER). Since this seems to be for internal use, I'm not sure we need the getitem convenience.
  2. We change the getitem syntax to actually be an alias for .retrieve, and setitem to be an alias for report. This would be a simplified form, less powerful than using the actual retrieve, but maybe more convenient.

To insert a result then, you'd use psm["my_result_key"] = my result_value.

Why? Because then a PipestateManager object is actually a Mapping of the values it is reporting to the database. This would allow me to use it in seqcol instead of my RDBDict object.

Alternatively, I could just write a little wrapper around PipestatManager that would do this.

@nsheff
Copy link
Contributor Author

nsheff commented Oct 24, 2023

any objection to this @donaldcampbelljr ?

@nsheff
Copy link
Contributor Author

nsheff commented Oct 24, 2023

Related to #97

@donaldcampbelljr
Copy link
Contributor

No objection. I think this is a cool idea. I can begin implementing it this week.

donaldcampbelljr added a commit that referenced this issue Oct 26, 2023
donaldcampbelljr added a commit that referenced this issue Oct 26, 2023
@donaldcampbelljr
Copy link
Contributor

Ok, I have added the discussed changes for this. I opted to use psm.cfg instead of psm.config.

Changes exist in branch dev_99_103_API_Refactor

@nsheff
Copy link
Contributor Author

nsheff commented Oct 26, 2023

excellent, thanks

@nsheff nsheff added likely-solved and removed question Further information is requested labels Oct 27, 2023
donaldcampbelljr added a commit that referenced this issue Dec 22, 2023
* Add get_records to pipestat_manager and add related test #75

* clean up signatures #72

* clean up docstrings

* remove superfluous get_table_name function

* Reduce get_orm and get_model into one function. #71

* Fix CLI to use record-identifier instead of sample-name

* Fix CLI to use pipeline_type #37

* modify get_records to return new structure #75

* fix html report generation using get_records

* add simple CLI test for reporting and retrieving #81

* Pipestat reader (#80)

* update version in prep for pipestat table

* add table function

* add initial stats function from looper, remove counter

* fixing _create_stats_summary and get_file_for_project

* fix sample_level stats reporting

* add object reporting

* remove redundancies

* lint

* add assertion for table generation to pre-existing test

* use pipeline_type when retrieving samples

* Fix stats table generation output to look better

* fix return types

* fix list vs List

* fix typo for stats table

* fix doc strings

* update func names for disambiguation

* fix key error issue in html_reports

* clean up

* adjust docstrings

* update docstrings

Co-authored-by: Nathan Sheffield <[email protected]>

* adjust docstrings, rename html_reports_pipestat.py to reports.py

* adjust LOGGER.info string

* simplify objs and stats generation

* remove old todos

* remove sample_name, add pipeline_name to project field definitions

* work towards using project_name instead of sample_name from project-level pipelines

* fix check record bug, add better error message for pipeline type.

* add checks for pipeline types, lint

* fix get_samples with pipeline type

* fix fetch_pipeline_results

* add basic tests for project_level, fix associated bugs

* allow table creation to use project_name if applicable

* add passing project name to file_backend

* refactor to use r_id as input

* update doc strings

* more refactoring of records vs samples to be more general

* move table generation to reports.py

* allow status file dir path to be associated with config path OR the file path to ensure looper compatibility.

* add output_dir as a parameter for placing reports and stat files. Otherwise defaults to results_file dir or config_file dir

* add obtaining schema_path from pipestat config for Looper compatibility

* added select function to main pipestat class

* added select_txt and select_distinct function to main pipestat class

* added get_one_record function to main pipestat class #70

* Removing CLI req that one must supply config and schema because the user can supply schema within the config. For Looper compatibility.

* Revert "added get_one_record function to main pipestat class #70"

This reverts commit b477060.

* Revert "added select_txt and select_distinct function to main pipestat class"

This reverts commit 4588f4f.

* Revert "added select function to main pipestat class"

This reverts commit ee98145.

* Add basic glossary page in the form of a table. Addresses: pepkit/looper#290

* remove redundant tag

* add RecordNotFoundError across both backends #74

* Add PipelineTypeNotSuppliedError to hide dbbackend information #74

* Initial work towards a fastapi implementation of pipestat reader
#22

* Initial work to separate classes #78

* working implementation of child classes for reporting using DBBACKEND. Manual testing confirmed. pytest Tests broken. #78

* implement report for filebackend

* fix inheritance issue with pipeline_type

* typo

* major refactoring, many tests still broken #78

* Fix signatures for report and explicitly state input arguments.

* remove pipeline type check

* add projectpipestamanager and refactor project_name to record_identifier

* fix get_status

* fix tests

* fix all tests #78

* add todo

* code cleanup and remove getting_table_name

* clean up

* partial removal of pipeline_type logic from DB backend

* remove pipeline_type input arguments from File Backend

* clean up

* add simple class to wrap SamplePipestatManager and ProjectPipestatManager

* change pipestatmanager to mutable mapping #78

* lowercase attributes #78

* polish PipestatBoss #78

* env_vars for pipeline_type

* update docs

* working proof of concept to retrieve results #22

* update readme and add getting output schema

* add else catch for outptu schema and fix typo

* add ability to run with uvicorn if calling reader.py directly.

* add __init__.py and main func

* add more endpoints #22

* update endpoints, add Query

* allow for more complex filtering using post and pydantic models #22

* Add image and file endpoints #22

* lint

* Add basic arg parser to pass absolute path to config file

* Add endpoint for get_records()

* clean up

* modify get_records to return new structure #75

* fix html report generation using get_records

* attempt global creation of psm, does not work

* fix creation of global psm

* change fetching by filetype

* add cli option for pipestat serve

* simplify and fix pipestat serve #22

* add host and port arguments #22

* add config error

* update readme

---------

Co-authored-by: nsheff <[email protected]>

* list to List

* fixed typing in python3.8

* update changelog

* fix table_name_bug

* "0.5.4a1" for alpha release

* "0.6.0a1" for alpha release

* Dev json schema (#87)

* first pass to allow for output_schema as a true JSON schema #85

* remove unused import

* simplify logic, extend _safe_pop_one_mapping to handles multiple keys

* disambiguation key vs keys in parsed schema

* make samples and project objects instead of arrays, add more test assertions, clean up docstrings.

* add status data to string representation of ParsedSchema

* add retrieve_distinct function for #70 and databio/bedhost#75

* add `pipestat link` functionality to create symlinks (#90)

* begin work on file inking via pipestat

* continue implementing link for an output_dir OR the results_file.yaml
#89

* implemented symlinks for filebackend given output_dir
#89

* implemented symlinks for filebackend using results.yaml, polish tests
#89

* fix typo, clean up

* more lcean up

* begin rework to be backend agnostic

* refactor to place in folders specific to a result_identifier

* add more complex types, add temp directory for better testing

* fix recursive finding of paths

* clean up, move to abstract class, confirm works for both backends

* remove unused test files

* add jinja2 to requirements-all.txt #91

* complex example with collision

* add warning

* add cli option

* remove unused functions

* update changelog

* clean up docstrings, remove unused variables, refactor link_dir

* clean up more docstrings

* remove unnecessary print statement to reduce verbosity

* Begin work on adding created and modified datetime, DateTime objects are not parsing correctly, tests broken #93

* fix datetime objects #93

* fix pipeline_name field beginning with underscore #94

* timestamp continuation, list_recent_results does NOT work

* fix doc string for schema

* add filtering by created and modified time #93

* update version for alpha release v0.6.0a2

* fix list reversing #93

* polish tests #93

* add filebackend list_recent_results, remove records broken #93

* fix remove records bug #93

* update doc strings #93

* update change log for list_recent_results

* begin work on retrieve_multiple #96

* add db backend, test broken #96

* fix db backend logic #96

* implement file backend #96

* fix docstrings

* some minor polishing

* update readme

* move pipestat reader per #22

* fix pipestat reader setup and manifest per #22

* Add optional dependencies via setup #22

* pipestat can now run without DB specific dependencies #22

* Some refactoring #64

* add other requirements to test

* refactor to have ParsedSchemaDB inherit from ParsedSchema for code reduction #64

* Clean up

* return total record count #75

* make initialization of file and DB backend into separate functions

* add decorator for checking dependencies

* polishing decorator for checking dependencies

* consolidate decorator and ancillary function into one

* first pass #99, some tests broken

* fix all tests #99

* refactor store to cfg #99

* fix bug with table path

* setitem is now a wrapper for report

* fix getitem implementation, add test #99

* Begin work on select_records, retrieve_one, retrieve_many #103

* Implement select_records, and cursor pagination #103

* clarify init message

* Update dbbackend.py

* add loading JSON and column.contains for filtering #103

* testing differences

* add or_ and and_ sqlmodel operators #103

* some comments

* steps toward smarter pytesting

* #103 -- added multi key filter

* typo

* polish changes to select_records and add tests. #103

* fix typo

* remove Union for python 3.8 compat

* move get_nested_column and define_sqlalchemy_type outside of select_records

* update doc strings

* Modify to be true JSONs chema with using $ref and $def for custom objects (image, file) #85

* Update pipestat/backends/abstract.py

Co-authored-by: Oleksandr <[email protected]>

* Update pipestat/backends/db_backend/dbbackend.py

Co-authored-by: Oleksandr <[email protected]>

* Update pipestat/backends/db_backend/dbbackend.py

Co-authored-by: Oleksandr <[email protected]>

* Update pipestat/pipestat.py

Co-authored-by: Oleksandr <[email protected]>

* Update pipestat/pipestat.py

Co-authored-by: Oleksandr <[email protected]>

* Update tests/data/sample_output_schema.yaml

Co-authored-by: Oleksandr <[email protected]>

* Polish PR, fix docstrings, add wrapper for remove

* remove unnecessary mutablemapping inheritance

* lint

* cleaning

* fixed failing test

* remove old, unused validate_schema function

* change ALL output_schemas to reflect JSON schema

* lint

* per request, remove select function as well as dynamic_filter

* add to do

* revert PipestatManager to a MutableMapping

* remove select_txt and select_distinct

* revert select_distinct for now

* remove unused backend funcs, tests broken

* implement list_recent_results wrapper (DBBackend only)

* implement list_recent_results wrapper (DBBackend only)

* some polish, begin implementing select_records

* more work towards select_records

* change operator func, add list comprehension for sets

* implement AND OR Logic and fix tests, add sorting

* fix time stamp filtering if NOT nested

* lint and fix db test

* attempt to pare down results based on columns, does not work.

* add column filtering for filebackend, add tests, add no filter condition retrieving all records on filebackend

* fix complex object filtering

* implement select_distinct for filebackend

* select_dsitinct now works on record identifiers for filebackend, polish tests

* fix retrieve_one and retrieve_many for filebackend

* fix timestamp test for filebackend

* use isinstance for type checking

* refactored for efficiency

* fix recursive schema after merge

* remove get_records, fix return type for select_records if given columns

* begin remove of get_one_record

* deleted get_one_record function

* fix get_status on dbbackend using select_records

* remove retrieve from dbbackend and reports

* replace retrieve with retrieve_one and begin replacing retrieve with select_records

* lint

* fix select_records filebackend bug, remove retrieve from tests

* finish removing retrieve function

* attempt re-implementing record not found error for retrieve_one and retrieve_many

* fix bugs with select_records and RecordNotFound, implement ColumnNotFound exception

* lint

* update changelog and version for 0.6.0a3 release

* refactoring based on ruff

* update cli and usage and api docs, lint

* update documentation focused on schemas

* add result_identifier as parameter to retrieve_one

* add result_identifier as parameter to retrieve_many

* __version__ = "0.6.0a4"

* Finish adding pytest.mark.skipif

* lint

* move general tests to test_pipestat, remove some redundancy

* implement clear_status pytest for filebackend #110

* polish test set_item, report_overwrite

* add pytest fixture val_dict

* add pytest fixture values_sample, values_project

* add pytest fixture values_complex_linking, move fixtures to conftest.py

* deleted unnecessary print

* add test_select_no_filter_limit

* black formatting

* add pytest fixture for ranges

* lint

* fix record_identifier being returned twice if it is an input column

* fix typo

* change retrieve_one output to no longer we wrapped by select_records output

* fix schema propert to return psm.cfg["_schema"].to_dict()

* fix schema propert to return psm.cfg["_schema"]

* add psm.schema.original_schema to parsed schema

* add r_id = record_identifier or self.record_identifier to retrieve_one

* fix majority of tests due to retrieve_one

* lint

* fix table creation bug with retrieve_one

* remove RecordNotFoundError because it breaks pypiper

* fix parsed_schema json schema parsing to be correct hierarchy

* update jupyter api docs

* update cli docs

* fix property access #113

* add retrieving multiple results using retrieve_one

* list to List

* add quickstart guide

* add psm.schema.resolved_schema

* remove unused function

* version bump to v0.6.0a5

* fix config_path property

* re-implement RecordNotFoundError

* be more selective about which exceptions to pass

* potential solution for #117

* version bump 0.6.0a6

* remove redundant pydantic

* fix pipestat reader dependency checking

* fix cli bug regarding env variables

* retrieve_one record_identifier defaults to None to allow for use with env variables, clean README.md

* further documentation polishing

* fix obtaining record_identifier from cfg file

* correct config.md

* correct pipestat_specification.md

* 0.6.0a7 prerelease

* Clarify return on select_records

* ensure requirements list yacman>= 0.9.2

* potential fix for #119

* change warning to debug for creating status table

* fix pipestat creating subdirectories before creating results file.

* First proof of concept for #120

* simply do replacement at beginning

* Fixed #106

* now functional, create aggregate_results.yaml which replaces self._data before reporting #120

* replace spaces in names with %20 during html link generation

* fix reported_objs bug

* expandpath for link_dir during pipestat link

* add tests for new aggregate_results functionality

* refactor db-backend to dbbackend

* fix stats and objects buttons for summary page #130

* fix stats table on summary page #130

* fix broken status if multi_results_files #130

* fix log and profile files #130

* fix tsv columns #130

* Add check for pipeline type during report generation #130

* refactor Sample to Record #130

* fix refactor for sample page

* fix stats.tsv and objects.yaml generation to incorporate pipeline types

* add project_objects page #130

* minor polishing

* remove temp test

* lint

* add passing looper's samples to pipestat summarize to populate table

* remove unused code

* version bump for pre-release v0.6.0a9

* use "object_type" for image and file types or fall back on using "type", add try except

* ensure project level objects are accurate on summary/index page

* remove project name from stats and object file names

* update changelog

* 0.6.0a10 bump for pre-release

* allow project-level figures to display as columns instead of rows

* 0.6.0a11 pre-release

* v0.6.0 release prep

* lint

---------

Co-authored-by: nsheff <[email protected]>
Co-authored-by: Khoroshevskyi <[email protected]>
@donaldcampbelljr
Copy link
Contributor

Closed via #92

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants