proposal: thing-model-catalog: Working With Multiple Thing Model Repositories #13

Open
alexbrdn opened this issue Nov 17, 2023 · 2 comments

alexbrdn commented Nov 17, 2023

[Description]

One of the requirements for the Thing Model Catalog (TMC) is support for multiple remote repositories (remotes) of different kinds for storing Thing Models, e.g. public vs. private, file system vs. Amazon S3 storage. This proposal describes the envisioned workflows for working with remotes and the functionality TMC needs to provide for them.

The TMC should support at least the following workflows. The list is not exhaustive.

  • Get up and running with a local repository
    Download the tm-catalog binary, set up a local folder as a [file] remote repository, and start pushing TMs to it.
  • Edit a TM
    Fetch a TM from a remote, edit the file, push back to the same remote.
  • Use a Git repository as a TM catalog
    Initialize a Git repository in a folder that is used as a file remote and push it to a remote Git server. Add that server as a TMC remote.
  • Shorten a TM for a particular use case
    Fetch a complete TM from the manufacturer's publicly available catalog, remove the affordances not needed for the use case, and push the result to a private catalog.
  • Search across multiple remotes
    Add a combination of local/remote/public/private repositories to tm-catalog config. Search for a TM or list versions of a TM across all of the remotes.
  • Merge one remote into another
    A subdivision of a larger organization may develop its own TMC and later decide to merge it into the central TMC maintained by the company.

[How]

The TMC should have a CLI command to manage remotes. This command should support CRUD operations on the list of remotes and store it in the config file. Parts of configuration (e.g. authentication secrets) may be left out of the config file and instead be derived from environment variables.
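
A minimal sketch of what such a remote store could look like, assuming a JSON config file and an environment variable per remote for its secret. The class name `RemoteConfig`, the field names `type` and `loc`, and the `TMC_REMOTE_<NAME>_TOKEN` naming scheme are all hypothetical and not taken from tm-catalog itself.

```python
import json
import os
from pathlib import Path


class RemoteConfig:
    """CRUD over a named list of remotes persisted in a JSON file (hypothetical layout)."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, remotes):
        self.path.write_text(json.dumps(remotes, indent=2), encoding="utf-8")

    def add(self, name, kind, location):
        remotes = self._load()
        if name in remotes:
            raise KeyError(f"remote {name!r} already exists")
        remotes[name] = {"type": kind, "loc": location}
        self._save(remotes)

    def get(self, name):
        remote = dict(self._load()[name])
        # The secret is never written to the file; it is read from the
        # environment on access (assumed naming scheme).
        env_key = f"TMC_REMOTE_{name.upper().replace('-', '_')}_TOKEN"
        token = os.environ.get(env_key)
        if token is not None:
            remote["auth_token"] = token
        return remote

    def update(self, name, **fields):
        remotes = self._load()
        remotes[name].update(fields)  # raises KeyError for an unknown remote
        self._save(remotes)

    def remove(self, name):
        remotes = self._load()
        del remotes[name]
        self._save(remotes)

    def names(self):
        return sorted(self._load())
```

Keeping the token lookup inside `get` means the file on disk can be committed or shared without leaking credentials, which is the motivation given above for leaving secrets out of the config file.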

The following kinds of remotes are envisioned and can be implemented in due time:

  • local filesystem
  • HTTP-accessible file storage
    allows configuring e.g. a remotely hosted Git repository as a remote for tm-catalog
  • tm-catalog server instance running remotely
    incidentally, this allows for easy sharding of large catalogs, for example if an organization decides to host a global TMC that would be the default for everyone, akin to Docker Hub.
  • cloud bulk storage, e.g. Amazon S3

The commands (APIs) that TMC provides through its CLI or REST API can be divided into three groups by their relation to remotes:

  1. Needs no remote, e.g. validate
  2. Requires exactly one remote, e.g. push
  3. Performs a federated action on all remotes, but may be limited to just one, e.g. list, versions, fetch
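
The three groups can be sketched as a small lookup that decides which remotes a command runs against. This is only an illustration: the names `RemoteNeed`, `COMMAND_NEEDS`, and `select_remotes` are hypothetical, and raising an error when a single-remote command is given no remote is an assumption about the desired behavior, not something the grouping itself prescribes.

```python
from enum import Enum


class RemoteNeed(Enum):
    NONE = "none"                # group 1
    EXACTLY_ONE = "exactly one"  # group 2
    FEDERATED = "federated"      # group 3


# Commands named in the proposal, mapped to their group.
COMMAND_NEEDS = {
    "validate": RemoteNeed.NONE,
    "push": RemoteNeed.EXACTLY_ONE,
    "list": RemoteNeed.FEDERATED,
    "versions": RemoteNeed.FEDERATED,
    "fetch": RemoteNeed.FEDERATED,
}


def select_remotes(command, configured, requested=None):
    """Return the remote names a command should act on."""
    need = COMMAND_NEEDS[command]
    if need is RemoteNeed.NONE:
        return []
    if requested is not None:
        if requested not in configured:
            raise ValueError(f"unknown remote {requested!r}")
        return [requested]
    if need is RemoteNeed.EXACTLY_ONE:
        # Assumption: no implicit fallback; the caller must name the remote.
        raise ValueError(f"{command} requires a remote to be named")
    # Federated commands fan out to every configured remote.
    return list(configured)
```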

The third group of commands, the federated actions, should be extended to support multiple remotes. Of special interest is the list command, which should perform a federated search across all remotes. For such a search to work, each kind of remote should either implement a search API that returns structured results (with rankings) or host a predefined search index at a known location. The rankings are necessary to merge search results from different remotes sensibly. A tm-catalog instance or cloud bulk storage will have its own search API. A file remote or a plain HTTP server serving a file directory, on the other hand, must include a search index.
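
To illustrate why rankings matter, here is a rough sketch of merging per-remote results into one ranked list. It assumes every remote returns a numeric score where higher is better and that scores are comparable across remotes, which is a strong assumption the proposal does not settle. `Hit` and `merge_results` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Hit:
    remote: str    # name of the remote the result came from
    tm_name: str   # identifier of the matching Thing Model
    score: float   # ranking; assumed comparable across remotes


def merge_results(per_remote, limit=None):
    """Merge ranked result lists from several remotes, best score first."""
    # Sort each list defensively so the k-way merge precondition holds
    # even if a remote returns unordered results.
    ranked = [
        sorted(hits, key=lambda h: h.score, reverse=True)
        for hits in per_remote.values()
    ]
    merged = heapq.merge(*ranked, key=lambda h: h.score, reverse=True)
    return list(islice(merged, limit))
```

Each hit keeps the name of its remote, so the same TM found in two catalogs shows up twice and the user can see where each copy lives.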

cc => @hadjian @EVO-Antoniazzi

Contributor

hadjian commented Dec 19, 2023

To document some points from our discussion this morning:

  • Get rid of the "default" remote concept and make the remote argument mandatory for commands that need it. Otherwise users may easily push to a public remote when they meant to push e.g. to a local directory. (accepted)
  • Allow on-the-fly remotes by specifying a URL instead of a remote alias. If a user wants to pull part of a catalog into a directory and has e.g. another catalog somewhere in their filesystem, it becomes awkward if every directory must first be configured as a remote. Also, a user may want to restrict a command to one remote, e.g. for listing. (accepted)
  • Introduce a pull command to recursively fetch all TMs from a catalog/namespace/versions. This makes it easy to build your own catalog from e.g. a hosted catalog that you cannot check out via Git. (accepted)
  • Introduce a flag that instructs fetch to write the file using the same directory layout as in the catalog being fetched from. Leave output to stdout as the default. (accepted)
  • Introduce tm-catalog-cli remote set-auth to make credential setup easier. (accepted)
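
Two of the points above, on-the-fly remotes and the fetch output flag, can be sketched roughly as follows. The function names, the `type`/`loc` fields, the supported URL schemes, and the example TM path are all assumptions made for illustration; the actual CLI may resolve these differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_remote(arg, configured):
    """Treat arg as a configured alias first, otherwise as a URL."""
    if arg in configured:
        return configured[arg]
    parsed = urlparse(arg)
    if parsed.scheme in ("http", "https"):
        return {"type": "http", "loc": arg}
    if parsed.scheme == "file":
        return {"type": "file", "loc": parsed.path}
    raise ValueError(f"{arg!r} is neither a known remote alias nor a supported URL")


def fetch_destination(tm_path, output_dir=None):
    """Return None to mean stdout, else the file path mirroring the catalog layout."""
    if output_dir is None:
        return None  # default: print the TM to stdout
    rel = PurePosixPath(tm_path)
    # Refuse paths that would escape the output directory.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe catalog path {tm_path!r}")
    return str(PurePosixPath(output_dir) / rel)
```

Checking aliases before URLs keeps configured remotes authoritative, so a remote cannot be shadowed by an argument that also happens to parse as a URL.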

@alexbrdn
Author

@hadjian, restricting the list, versions, and fetch commands to just one remote is already implemented.
