Api gui howto guide #27

Merged
merged 75 commits into from
Sep 7, 2023
Merged
Changes from 13 commits
Commits
Show all changes
75 commits
Select commit Hold shift + click to select a range
28e66eb
Add api_gui first draft
brynnz22 Aug 10, 2023
e310164
update api docs
brynnz22 Aug 10, 2023
ee331a7
fix metadata/find endpoints images
brynnz22 Aug 10, 2023
8cc7196
Add find example
brynnz22 Aug 10, 2023
720518e
fix typos, set reminders for next time
brynnz22 Aug 10, 2023
cb93664
fix typo
brynnz22 Aug 10, 2023
1d82689
Add get biosamples table image
brynnz22 Aug 11, 2023
c0e0c00
add activities endpoint parameter example and pic
brynnz22 Aug 11, 2023
a26eeb8
fix activities image typo
brynnz22 Aug 11, 2023
b522c34
take out table images and rearrange
brynnz22 Aug 11, 2023
ce83680
add study_id parameter and endpoint images
brynnz22 Aug 11, 2023
5757c94
try extra space between endpoint images
brynnz22 Aug 11, 2023
c6b6171
try line breaks
brynnz22 Aug 11, 2023
f848200
try line breaks again
brynnz22 Aug 11, 2023
4a8f983
line breaks...
brynnz22 Aug 11, 2023
68995c6
trying again...
brynnz22 Aug 11, 2023
663b055
last time...
brynnz22 Aug 11, 2023
f869027
Added find endpoints and images
brynnz22 Aug 11, 2023
c7878e3
add activity_id to parameter table
brynnz22 Aug 11, 2023
2fdac8b
add edits to find section
brynnz22 Aug 14, 2023
b25c032
add find example and images
brynnz22 Aug 14, 2023
7d09d6f
fix typo and remove italics
brynnz22 Aug 14, 2023
f101b2b
add space between find example and note
brynnz22 Aug 14, 2023
dbf294c
clean up
brynnz22 Aug 14, 2023
10e48a2
fix note image
brynnz22 Aug 14, 2023
5220250
fix example note image path
brynnz22 Aug 14, 2023
b2ebebd
indent caption
brynnz22 Aug 14, 2023
355b741
fix typo
brynnz22 Aug 14, 2023
3fabeaf
remove inconsistent example in find parameter table
brynnz22 Aug 14, 2023
08c479f
fix typo
brynnz22 Aug 14, 2023
88ed0c7
add metadata endpoints info
brynnz22 Aug 21, 2023
6c3b3df
update page_token param
brynnz22 Aug 21, 2023
925cf0c
add projections example
brynnz22 Aug 21, 2023
34202cc
fix typos
brynnz22 Aug 21, 2023
3f58656
finish up metadata endpoint table
brynnz22 Aug 22, 2023
973cc05
fix typos/line breaks
brynnz22 Aug 22, 2023
615efa5
add POST changsheet validate endpoint
brynnz22 Aug 22, 2023
59dcaec
add POST submit changesheet endpoint
brynnz22 Aug 22, 2023
3adb336
remove submission endpoint, add json urls endpoint
brynnz22 Aug 22, 2023
bba7936
add post changesheet edge case note
brynnz22 Aug 22, 2023
8ad88e7
fix typo
brynnz22 Aug 22, 2023
e36565d
add line break after changesheet steps
brynnz22 Aug 22, 2023
c839283
update line break
brynnz22 Aug 22, 2023
3b6ce3d
line break experiment
brynnz22 Aug 22, 2023
8203a01
line breaks again
brynnz22 Aug 22, 2023
0e18c6e
line breaks check again
brynnz22 Aug 22, 2023
4682ad6
add metadata endpoints
brynnz22 Aug 22, 2023
9293c53
add doc_id endpoints
brynnz22 Aug 22, 2023
88a4e6d
add metadata endpoint example
brynnz22 Aug 23, 2023
7b5437d
fix typos
brynnz22 Aug 23, 2023
537999f
fix more typos
brynnz22 Aug 23, 2023
ee94498
add cursor parameter
brynnz22 Aug 23, 2023
020c578
fix cursor image
brynnz22 Aug 23, 2023
c3b86cc
Update api_gui.md to napa identifiers
aclum Aug 25, 2023
482def9
Fix closing parentheses in api_gui.md
aclum Aug 25, 2023
f9b0f30
Update api_gui.md formatting
aclum Aug 25, 2023
673a540
Remove data_object/study endpoint example from api_gui.md
aclum Aug 25, 2023
9a95cdb
Update api_gui.md to remove legacy identifiers
aclum Aug 25, 2023
454fd99
Replace `<b/>` (self-closing bold) with `<br/>` (line break)
eecavanna Aug 25, 2023
a05a20e
Standardize letter casing and fix typos
eecavanna Aug 26, 2023
7372ee3
emphasize what API GUI provides - Donny's feedback
brynnz22 Aug 28, 2023
9eb3bc4
include openalex link for compact syntax
brynnz22 Aug 28, 2023
e22bbfc
clarify GET/Activites endpoint and split into separate sentence
brynnz22 Aug 28, 2023
b514ad4
add optional parameter info. for each endpoint type
brynnz22 Aug 28, 2023
8388514
add comma after case
brynnz22 Aug 28, 2023
2e2bdad
capitalize URL
brynnz22 Aug 28, 2023
f0f351a
add projection syntax clarification
brynnz22 Aug 28, 2023
6af0528
clarify sentence about collection_set
brynnz22 Aug 28, 2023
f7b1c1d
clarify biosamples study example heading
brynnz22 Aug 28, 2023
c96e9ee
remove all submitting and posting endpoints
brynnz22 Aug 28, 2023
8ec86c6
clarify filter syntax paragraph - Montana's suggestion
brynnz22 Aug 28, 2023
c18125b
fix typos
brynnz22 Aug 28, 2023
ccfe889
add fields parameter to find parameter table
brynnz22 Aug 28, 2023
c646594
fix some grammer
brynnz22 Aug 29, 2023
5511f03
add link to runtime docs
brynnz22 Aug 29, 2023
53 changes: 17 additions & 36 deletions docs/howto_guides/api_gui.md
@@ -1,14 +1,16 @@
# Using the NMDC API Graphical User Interface (GUI)

## Retrieving and Submitting Metadata using the ___Find___ and ___Metadata___ API Endpoints
## Retrieving Metadata using the ___Find___ and ___Metadata___ API Endpoints

Metadata describing NMDC data (e.g. studies, biosamples, data objects, etc.) may be retrieved or submitted with GET and POST requests, respectively, using the **[NMDC API Graphical User Interface (GUI)](https://api.microbiomedata.org/docs#/)**. The API GUI provides a user interface for programmatic access to the NMDC data portal without needing to use the Command Line.
Metadata describing NMDC data (e.g. studies, biosamples, data objects, etc.) may be retrieved with GET requests using the **[NMDC API Graphical User Interface (GUI)](https://api.microbiomedata.org/docs#/)**. The API GUI provides a guided user interface for direct access to the NMDC data portal. It allows for:
1. Performing highly granular and targeted queries directly. This is especially helpful for queries that the [NMDC Data Portal](https://data.microbiomedata.org/) does not yet support.
2. Exploring querying capabilities interactively. Responses include code snippets, namely the request `curl` command and URL, that can be reused in scripts for programmatic access (please see the examples below).

Requests can include various parameters to filter, sort, and organize the requested information. Attribute names in the parameters will vary depending on the collection. The required syntax of the parameters will also vary, depending on if it is a ___find___ or a ___metadata___ endpoint. ___Find___ endpoints are designed to use more compact syntax (for example, filtering biosamples for an "Ecosystem Category" of "Plants" would look like `ecosystem_category:Plants` using the `GET /biosamples` endpoint). While ___metadata___ endpoints use [MongoDB-like language querying](https://www.mongodb.com/docs/manual/tutorial/query-documents/) (e.g. the same filter would look like `{"ecosystem_category": "Plants"}` using the `GET /nmdcshema/{collection_name}` endpoint with `collection_name` set to `biosample_set`).
Requests can include various parameters to filter, sort, and organize the requested information. Attribute names in the parameters vary by collection, and the required parameter syntax depends on whether the endpoint is a ___find___ or a ___metadata___ endpoint. ___Find___ endpoints use a more [compact syntax](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists): for example, filtering biosamples for an "Ecosystem Category" of "Plants" looks like `ecosystem_category:Plants` in the `GET /biosamples` endpoint. ___Metadata___ endpoints instead use [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/): the same filter looks like `{"ecosystem_category": "Plants"}` in the `GET /nmdcschema/{collection_name}` endpoint with `collection_name` set to `biosample_set`.
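
The two filter syntaxes can be compared side by side by building the request URLs directly. The following is a minimal Python sketch using only the standard library. It assumes the API is served from `https://api.microbiomedata.org` (the host of the API GUI linked above), and the helper names `find_url` and `metadata_url` are hypothetical. It only constructs URLs and does not send any requests.

```python
import json
from urllib.parse import urlencode

# Assumed base URL: the host that serves the API GUI at /docs.
BASE_URL = "https://api.microbiomedata.org"


def find_url(entity: str, compact_filter: str) -> str:
    """Build a find-endpoint URL (e.g. GET /biosamples) using compact syntax."""
    return f"{BASE_URL}/{entity}?" + urlencode({"filter": compact_filter})


def metadata_url(collection_name: str, filter_doc: dict) -> str:
    """Build a GET /nmdcschema/{collection_name} URL whose filter is a
    MongoDB-like JSON document."""
    return f"{BASE_URL}/nmdcschema/{collection_name}?" + urlencode(
        {"filter": json.dumps(filter_doc)}
    )


# The same condition expressed in both syntaxes.
print(find_url("biosamples", "ecosystem_category:Plants"))
print(metadata_url("biosample_set", {"ecosystem_category": "Plants"}))
```

`urlencode` takes care of percent-encoding the colon, braces, and double quotes, so the filter strings can be written exactly as they would be typed into the API GUI.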

#### ___Find___ Endpoints

The [find endpoints](https://api.microbiomedata.org/docs#/find:~:text=Find%20NMDC-,metadata,-entities.) are provided with NMDC metadata entites already specified - where metadata about [studies](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Study.html), [biosamples](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Biosample.html), [data objects](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/DataObject.html), and [activities](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Activity.html) can be retrieved using GET requests.
The [find endpoints](https://api.microbiomedata.org/docs#/find:~:text=Find%20NMDC-,metadata,-entities.) each have an NMDC metadata entity already specified: metadata about [studies](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Study.html), [biosamples](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Biosample.html), [data objects](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/DataObject.html), and [activities](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Activity.html) can be retrieved using GET requests.

The applicable parameters of the ___find___ endpoints, with acceptable syntax and examples, are in the table below.

@@ -21,13 +23,14 @@ The applicable parameters of the ___find___ endpoints, with acceptable syntax an
| per_page | Specifies the number of results returned per page. Maximum allowed is 2,000 | Integer | `50` |
| cursor | A bookmark that lets a query resume where it left off. To use cursor paging, set the `cursor` parameter to `*` for the first request. The response's `meta` object will then include a `next_cursor` value, which can be passed as the `cursor` parameter to retrieve the next page of results ![next_cursor](../_static/images/howto_guides/api_gui/find_cursor.png) | String | `*` or `nmdc:sys0zr0fbt71` |
| group_by | Not yet implemented | Coming Soon | Not yet implemented |
| fields | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ess_dive_datasets` |
| study_id | The unique identifier of a study | Curie e.g. `prefix:identifier` | `nmdc:sty-11-34xj1150` |
| sample_id | The unique identifier of a biosample | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-w43vsm21` |
| data_object_id | The unique identifer of a data object | Curie e.g. `prefix:identifier` | `nmdc:dobj-11-7c6np651` |
| data_object_id | The unique identifier of a data object | Curie e.g. `prefix:identifier` | `nmdc:dobj-11-7c6np651` |
| activity_id | The unique identifier for an NMDC workflow execution activity | Curie e.g. `prefix:identifier` | `nmdc:wfmgan-11-hvcnga50.1`|<br/>
<br/>
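
Cursor paging can be scripted as a loop: request the first page with `cursor=*`, then keep passing the `next_cursor` value from the response's `meta` object until none is returned. The Python sketch below assumes each find response is a JSON object with a `results` list and a `meta` object, and that `next_cursor` is missing or `null` on the last page. The `fetch` argument is a hypothetical hook for the HTTP GET so the paging logic can be exercised without network access. The base URL is assumed, as above.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.microbiomedata.org"  # assumed base URL


def fetch_json(url: str) -> dict:
    """Plain HTTP GET that decodes a JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_find_results(
    entity: str,
    compact_filter: Optional[str] = None,
    per_page: int = 50,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every result from a find endpoint (e.g. "biosamples"), page by page.

    Assumes each response has a "results" list and a "meta" object whose
    "next_cursor" is missing or null once the last page has been returned.
    """
    cursor = "*"  # "*" starts cursor paging
    while cursor:
        params = {"per_page": per_page, "cursor": cursor}
        if compact_filter:
            params["filter"] = compact_filter
        page = fetch(f"{BASE_URL}/{entity}?{urlencode(params)}")
        yield from page.get("results", [])
        cursor = (page.get("meta") or {}).get("next_cursor")
```

For example, `for biosample in iter_find_results("biosamples", "ecosystem_category:Plants"): print(biosample["id"])` would walk through all matching biosamples, 50 per request.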

Each endpoint is unique and requires the applicable attribute names to be known in order to structure a query in a meaningful way.<br/>
Each endpoint is unique, and the applicable attribute names must be known in order to structure a meaningful query. Please note that parameters without a red `* required` label next to them are optional.<br/>
<br/>

![find get studies](../_static/images/howto_guides/api_gui/find_get_studies.png)
@@ -55,7 +58,7 @@ If the data object identifier is known, the metadata can be retrieved using the
<br/>

![find get activities](../_static/images/howto_guides/api_gui/find_get_activities.png)
The `GET /activities` endpoint is a general way to fetch metadata about various activities (e.g. metagenome assembly, natural organic matter analysis, library preparation, etc.). Any "slot" (a.k.a. attribute) for [WorkflowExecutionActivty](https://microbiomedata.github.io/nmdc-schema/WorkflowExecutionActivity/) or [PlannedProcess](https://microbiomedata.github.io/nmdc-schema/PlannedProcess/) may be used in the filter and sort parameters, including attributes for subclasses of `WorkflowExecutionActvity` and `PlannedProcess`, such as slots used in the `MetabolomicsAnalysisActivity` or `Extraction` class among others.<br/>
The `GET /activities` endpoint is a general way to fetch metadata about various activities (e.g. metagenome assembly, natural organic matter analysis, library preparation, etc.). Any "slot" (a.k.a. attribute) of the [WorkflowExecutionActivity](https://microbiomedata.github.io/nmdc-schema/WorkflowExecutionActivity/) or [PlannedProcess](https://microbiomedata.github.io/nmdc-schema/PlannedProcess/) classes may be used in the filter and sort parameters, including attributes of their subclasses. For example, attributes used in subclasses such as [MetabolomicsAnalysisActivity](https://microbiomedata.github.io/nmdc-schema/MetabolomicsAnalysisActivity/) (a subclass of `WorkflowExecutionActivity`) or [Extraction](https://microbiomedata.github.io/nmdc-schema/Extraction/) (a subclass of `PlannedProcess`) can be used as input criteria for the filter and sort parameters of this endpoint.<br/>
<br/>
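
When more than one condition is needed, the compact syntax described earlier is typically written as comma-separated `attribute:value` pairs. The Python sketch below assumes that the find endpoints combine comma-separated pairs in this way, and it combines the filter with the `fields` parameter to trim the response. The slot names (`execution_resource`, `started_at_time`) are assumed from the `WorkflowExecutionActivity` class and the value `NERSC-Perlmutter` is hypothetical; please confirm both against the schema pages linked above and the endpoint description in the API GUI.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.microbiomedata.org"  # assumed base URL


def compact_filter(conditions: dict) -> str:
    """Join attribute/value pairs into compact syntax: {"a": "x", "b": "y"} -> "a:x,b:y".

    Assumes the find endpoints combine comma-separated pairs and that no
    value itself contains a comma.
    """
    return ",".join(f"{attr}:{value}" for attr, value in conditions.items())


def activities_url(conditions: dict, fields: list) -> str:
    """Build a GET /activities URL with a compact filter and a fields list."""
    params = {"filter": compact_filter(conditions), "fields": ",".join(fields)}
    return f"{BASE_URL}/activities?{urlencode(params)}"


# Hypothetical slot value, for illustration only.
print(activities_url({"execution_resource": "NERSC-Perlmutter"}, ["id", "name", "started_at_time"]))
```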

![find get activities by activity id](../_static/images/howto_guides/api_gui/find_get_activities_activity_id.png)
@@ -83,43 +86,21 @@ For more information and to see more examples of __find__ endpoints outside of t

#### ___Metadata___ Endpoints

The [metadata endpoints](https://api.microbiomedata.org/docs#/metadata) can be used to get and filter metadata from collection set types (including studies, biosamples, activites, and data objects as discussed in the __find__ section), as well as validate and submit updates to existing metadata to the data portal as a spreadsheet or json file.
The [metadata endpoints](https://api.microbiomedata.org/docs#/metadata) can be used to retrieve and filter metadata from any collection (including the studies, biosamples, activities, and data objects discussed in the __find__ section).

The syntax for the filter parameter of the __metadata__ endpoints is slightly different than that of the __find__ endpoints using [MongoDB-like language querying](https://www.mongodb.com/docs/manual/tutorial/query-documents/) instead of the compact syntax the __find__ endpoints use. The applicable parameters of the __metadata__ endpoints, with acceptable syntax and examples, are in the table below.
Unlike the __find__ endpoints, which use compact syntax, the filter parameter of the __metadata__ endpoints uses [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/). The applicable parameters of the __metadata__ endpoints, with acceptable syntax and examples, are in the table below.

| Parameter | Description | Syntax | Example |
| :---: | :-----------: | :-------: | :---: |
| collection_name | The name of the collection to be queried. For a list of collection names please see the [Database class](https://microbiomedata.github.io/nmdc-schema/Database/) of the NMDC Schema | String | `biosample_set` |
| filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/). All strings should be in double quotation marks. | `{"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"}` |
| max_page_size | Specifies the maximum number of documents returned at a time | Integer | `25` |
| page_token | Specifies the token of the page to return. If unspecified, the first page is returned. To retrieve a subsequent page, the value received as the `next_page_token` from the bottom of the previous results can be provided as a `page_token`. ![next_page_token](../_static/images/howto_guides/api_gui/metadata_page_token_param.png) | String | `nmdc:sys0ae1sh583` |
| projection | Indicates the desired fields to be included in the response. Helpful for trimming down the returned results | Comma separated string of field names that correspond to a `collection_name`. | `name, ecosystem_type` |
| projection | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ecosystem_type` |
| doc_id | The unique identifier of the item being requested. For example, the identifier of a biosample or an extraction | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-ha3vfb58` |<br/>
<br/>
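
The example values in the table above can be combined into a single request. The Python sketch below serializes the `filter` document with `json.dumps`, which produces the double-quoted strings the syntax requires, and passes `projection` as a comma-separated string. The base URL and the `build_metadata_query` helper name are assumptions for illustration; the sketch only builds the URL.

```python
import json
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.microbiomedata.org"  # assumed base URL


def build_metadata_query(
    collection_name: str,
    filter_doc: Optional[dict] = None,
    projection: Optional[list] = None,
    max_page_size: Optional[int] = None,
    page_token: Optional[str] = None,
) -> str:
    """Build a GET /nmdcschema/{collection_name} URL; unset parameters are omitted."""
    params = {}
    if filter_doc is not None:
        # json.dumps emits valid JSON, so every string is double-quoted.
        params["filter"] = json.dumps(filter_doc)
    if projection:
        params["projection"] = ",".join(projection)
    if max_page_size is not None:
        params["max_page_size"] = max_page_size
    if page_token:
        params["page_token"] = page_token
    url = f"{BASE_URL}/nmdcschema/{collection_name}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(build_metadata_query(
    "biosample_set",
    filter_doc={"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"},
    projection=["name", "ecosystem_type"],
    max_page_size=25,
))
```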

The __metadata__ endpoints allow users to retrieve metadata from the data portal using the various `GET` endpoints that are slightly different than the __find__ endpoints, but some can be used similarily. They also include the ability to `POST` metadata changes to the data portal by allowing the validation and submission of change sheets or JSON files. Change sheets are spreadsheets that specify changes to be made to existing metadata in the portal, like updating, removing, or inserting values.<br/>
<br/>

![metadata post changesheets validate](../_static/images/howto_guides/api_gui/metadata_post_changesheets_validate.png)
A CSV or TSV file can be validated against the NMDC schema using the `POST /metadata/changesheets:validate` endpoint in order to update an already existing record in the portal. Please see an [example changesheet](https://github.com/microbiomedata/nmdc-runtime/blob/main/metadata-translation/notebooks/data/changesheet-without-separator3.tsv). The file should include four columns:
1. `id`: the identifier of the metadata object to be updated
2. `action`: the type of update to be performed. There are four actions:
- `insert`: inserts a new value
- `remove item`: removes the value from a specified attribute.
- `update`: replaces the existing value with a new value
- `delete`: removes the attribute entirely from the metadata document
3. `attribute`: the attribute (or field/slot) that will be updated (e.g. `name` or `ecosystem_category`, etc.)
4. `value`: the new value that will be inserted or that will replace the old value.

Please note that if changes are made to multivalued attributes that have a "structured" value, e.g. [air_temp_regm](https://microbiomedata.github.io/nmdc-schema/air_temp_regm/) has a range of [TextValue](https://microbiomedata.github.io/nmdc-schema/TextValue/), set the `value` to a variable, then set the `id` of the next line to the variable with the `value` set to what will get populated. See [example](https://github.com/microbiomedata/nmdc-runtime/issues/284#issuecomment-1686825159). For more information, please see [Authoring Changesheets](https://microbiomedata.github.io/nmdc-runtime/howto-guides/author-changesheets/).<br/>
<br/>

![metadata post json validate](../_static/images/howto_guides/api_gui/metadata_post_json_validate.png)
Metadata may also be validated in JSON format, which can be posted in the body of the request and validated against the NMDC schema using the `POST /metatdata/json:validate` endpoint before final submission to the portal.<br/>
<br/>

![metadata post json validate urls file](../_static/images/howto_guides/api_gui/metadata_post_validate_urls_file.png)
A text file of URLs that point to a JSON object may be supplied using the `POST /metadata/json:validate_urls_file` endpoint. This is helpful for validation of very large JSON metadata or if a user has a link to a JSON file but is not in an environment where it is convenient to download the file and then upload it to validate.<br/>
The __metadata__ endpoints allow users to retrieve metadata from the data portal through various `GET` endpoints. These differ slightly from the __find__ endpoints, although some can be used in similar ways. As with the __find__ endpoints, __metadata__ endpoint parameters that do not have a red `* required` label next to them are optional.<br/>
<br/>

![metadata get nmdcschema version](../_static/images/howto_guides/api_gui/metadata_get_nmdcschema_version.png)
@@ -131,7 +112,7 @@ To get the NMDC Database collection statistics, like the total count of records
<br/>

![metadata get collection name](../_static/images/howto_guides/api_gui/metadata_get_collection_name.png)
The `GET /nmdcschema/{collection_name}` endpoint is a general purpose way to retrieve metadata about a specified collection given user-provided filter and projection criteria. Please see the [Collection Names](https://microbiomedata.github.io/nmdc-schema/Database/) that may be retrieved. Please note that only one collection set may be retrieved at a time.<br/>
The `GET /nmdcschema/{collection_name}` endpoint is a general-purpose way to retrieve metadata about a specified collection given user-provided filter and projection criteria. Please see the [Collection Names](https://microbiomedata.github.io/nmdc-schema/Database/) that may be retrieved. Please note that metadata can be retrieved from only one collection at a time.<br/>
<br/>
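
Because results arrive one page at a time, retrieving a whole collection (or every match for a filter) means following `next_page_token` until it runs out. The Python sketch below assumes each page of this endpoint is a JSON object with a `resources` list and a `next_page_token` that is missing or `null` on the last page; please check a real response in the API GUI before relying on those key names. The `fetch` argument is a hypothetical hook for the HTTP GET, and the base URL is assumed.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.microbiomedata.org"  # assumed base URL


def fetch_json(url: str) -> dict:
    """Plain HTTP GET that decodes a JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_collection(
    collection_name: str,
    filter_doc: Optional[dict] = None,
    max_page_size: int = 25,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every document matching filter_doc, following next_page_token.

    Assumes each page has a "resources" list and a "next_page_token" that is
    missing or null on the last page.
    """
    page_token = None  # no token: the first page is returned
    while True:
        params = {"max_page_size": max_page_size}
        if filter_doc is not None:
            params["filter"] = json.dumps(filter_doc)
        if page_token:
            params["page_token"] = page_token
        page = fetch(f"{BASE_URL}/nmdcschema/{collection_name}?{urlencode(params)}")
        yield from page.get("resources", [])
        page_token = page.get("next_page_token")
        if not page_token:
            break
```

For example, `for study in iter_collection("study_set"): print(study["id"])` would visit every study, 25 per request.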

![metadata get doc_id](../_static/images/howto_guides/api_gui/metadata_get_doc_id.png)
@@ -142,7 +123,7 @@ If the identifier of the record is known, the `GET /nmdcshema/ids/{doc_id}` can
If both the identifier and the collection name of the desired record are known, the `GET /nmdcschema/{collection_name}/{doc_id}` endpoint can be used to retrieve the record. This endpoint also accepts an optional projection parameter to retrieve only the desired attributes of the record. Please note that only one record can be retrieved at a time using this method.<br/>
<br/>
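
The two single-record endpoints differ only in their URL paths. The Python sketch below builds both, keeping the colon of a CURIE such as `nmdc:bsm-11-ha3vfb58` unescaped and adding `projection` only to the collection-specific endpoint, as described above. The helper names and base URL are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.microbiomedata.org"  # assumed base URL


def doc_by_id_url(doc_id: str) -> str:
    """GET /nmdcschema/ids/{doc_id}: look up a record by identifier alone."""
    # Keep ":" as-is because CURIEs use it; other unsafe characters
    # (including "/") are percent-encoded.
    return f"{BASE_URL}/nmdcschema/ids/{quote(doc_id, safe=':')}"


def doc_in_collection_url(
    collection_name: str, doc_id: str, projection: Optional[list] = None
) -> str:
    """GET /nmdcschema/{collection_name}/{doc_id}, optionally with a projection."""
    url = f"{BASE_URL}/nmdcschema/{quote(collection_name, safe='')}/{quote(doc_id, safe=':')}"
    if projection:
        url += "?" + urlencode({"projection": ",".join(projection)})
    return url


print(doc_by_id_url("nmdc:bsm-11-ha3vfb58"))
print(doc_in_collection_url("biosample_set", "nmdc:bsm-11-ha3vfb58", ["name", "ecosystem_type"]))
```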

#### Metadata Endpoints Example 1: Get all of the biosamples part of the 1000 Soils Research Campaign Study sampled from Colorado
#### Metadata Endpoints Example 1: Get all of the biosamples that are part of the 1000 Soils Research Campaign Study sampled from Colorado

1. Click the drop-down arrow on the right side of the **`GET /nmdcschema/{collection_name}`** endpoint
![metadata example step1](../_static/images/howto_guides/api_gui/metadata_example_step1.png)
@@ -153,7 +134,7 @@ If both the identifier and the collection name of the desired record is known, t
4. Enter the parameters in the **`GET /nmdcschema/{collection_name}`** endpoint. For this example, we will input `biosample_set` into the **collection_name** parameter and `{"part_of": "nmdc:sty-11-28tm5d36", "geo_loc_name.has_raw_value": {"$regex": "Colorado"}}` into the **filter** parameter. See the [Biosample Class](https://microbiomedata.github.io/nmdc-schema/Biosample/) in the NMDC Schema to view the applicable biosample attributes (slots); for this example, they are `part_of` and `geo_loc_name.has_raw_value`. Note that `$regex` performs a regular-expression match, so any `geo_loc_name.has_raw_value` that contains the word "Colorado" is matched, not only a value that equals it exactly.
5. Click **Execute**
![metadata example step4](../_static/images/howto_guides/api_gui/metadata_example_step4and5.png)
6. View the results in JSON format, available to download by clicking **Download**; or copy the results by clicking the clipboard icon in the bottom right corner of the response. In this case two studies were retrieved. Note that the curl and request URL are provided as well.
6. View the results in JSON format. The results can be downloaded by clicking **Download** or copied by clicking the clipboard icon in the bottom right corner of the response. In this case, two biosamples were retrieved. Note that the `curl` command and request URL are provided as well.
![metadata example step6](../_static/images/howto_guides/api_gui/metadata_example_step6.png)
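
To make the filter from step 4 concrete, the Python sketch below evaluates the same two conditions locally against plain dictionaries. Dot notation descends into nested objects, `part_of` is treated as a list of study identifiers (MongoDB matches an equality condition against any element of an array field), and `$regex` is an unanchored regular-expression search. This only illustrates the meaning of the query under those assumptions; it is not how the API evaluates it, and the sample records are hypothetical.

```python
import re
from typing import Any


def get_path(doc: dict, dotted: str) -> Any:
    """Follow a dotted path such as "geo_loc_name.has_raw_value"; None if missing."""
    value: Any = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches_example1(doc: dict) -> bool:
    """Local mimic of the filter:
    {"part_of": "nmdc:sty-11-28tm5d36",
     "geo_loc_name.has_raw_value": {"$regex": "Colorado"}}
    """
    part_of = get_path(doc, "part_of")
    values = part_of if isinstance(part_of, list) else [part_of]
    in_study = "nmdc:sty-11-28tm5d36" in values
    raw = get_path(doc, "geo_loc_name.has_raw_value")
    in_colorado = isinstance(raw, str) and re.search("Colorado", raw) is not None
    return in_study and in_colorado


# Hypothetical records, for illustration only.
records = [
    {"id": "x1", "part_of": ["nmdc:sty-11-28tm5d36"],
     "geo_loc_name": {"has_raw_value": "USA: Colorado, Boulder"}},
    {"id": "x2", "part_of": ["nmdc:sty-11-28tm5d36"],
     "geo_loc_name": {"has_raw_value": "USA: Oregon"}},
    {"id": "x3", "part_of": ["nmdc:sty-11-other"],
     "geo_loc_name": {"has_raw_value": "USA: Colorado"}},
]
print([r["id"] for r in records if matches_example1(r)])  # ['x1']
```

Both MongoDB's `$regex` and Python's `re.search` are case-sensitive by default; in the API filter, a case-insensitive match can be requested with `{"$regex": "colorado", "$options": "i"}`.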

