Support multi valued facets (#29)
Cito authored Aug 6, 2024
1 parent 1c49b9e commit 9432e96
Showing 19 changed files with 500 additions and 204 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -48,7 +48,7 @@ repos:
       - id: no-commit-to-branch
         args: [--branch, dev, --branch, int, --branch, main]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.5.5
+    rev: v0.5.6
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
5 changes: 3 additions & 2 deletions .readme_generation/description.md
@@ -15,7 +15,8 @@ occur in these embedded classes, too.
 Along with the hits, facet options are reported that can be used to filter down the hits by
 performing the same search query again but with specific facet selections being set.

-The search endpoint supports pagination to deal with large hit lists. Facet options can
-help avoid having to rely on this feature by filtering down the number of hits to a single page.
+The search endpoint supports pagination to deal with a large number of search results.
+Facet options can help avoid having to rely on this feature by filtering down the number
+of hits to a single page.

 For more information see the OpenAPI spec linked below.
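The trade-off described in this paragraph can be made concrete with a small sketch. It assumes simple offset paging through hypothetical `skip` and `limit` parameters (the real request parameters are defined in the service's OpenAPI spec, not here), and `paginate` and `page_count` are illustrative names. It reuses records from the `FilteringTests.json` fixture added in this commit: five hits need three pages of two, while selecting the `dog` species option leaves two hits that fit on one page.

```python
import math

# Records from the FilteringTests.json fixture (only the fields used here).
ITEMS = [
    {"id_": "1", "name": "Jack", "species": "monkey"},
    {"id_": "2", "name": "Bruiser", "species": "dog"},
    {"id_": "3", "name": "Lady", "species": "dog"},
    {"id_": "4", "name": "Garfield", "species": "cat"},
    {"id_": "5", "name": "Flipper", "species": "dolphin"},
]


def paginate(hits: list[dict], skip: int, limit: int) -> list[dict]:
    """Return one page of hits (hypothetical skip/limit semantics)."""
    return hits[skip : skip + limit]


def page_count(total: int, limit: int) -> int:
    """Number of pages needed to show `total` hits with `limit` hits per page."""
    return math.ceil(total / limit)


all_hits = ITEMS
# Stand-in for re-running the search with the species=dog facet selected.
dog_hits = [hit for hit in ITEMS if hit["species"] == "dog"]

print(page_count(len(all_hits), limit=2))  # 3
print(page_count(len(dog_hits), limit=2))  # 1
print([hit["name"] for hit in paginate(all_hits, skip=2, limit=2)])
```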
9 changes: 5 additions & 4 deletions README.md
@@ -24,8 +24,9 @@ occur in these embedded classes, too.
 Along with the hits, facet options are reported that can be used to filter down the hits by
 performing the same search query again but with specific facet selections being set.

-The search endpoint supports pagination to deal with large hit lists. Facet options can
-help avoid having to rely on this feature by filtering down the number of hits to a single page.
+The search endpoint supports pagination to deal with a large number of search results.
+Facet options can help avoid having to rely on this feature by filtering down the number
+of hits to a single page.

 For more information see the OpenAPI spec linked below.

@@ -317,11 +318,11 @@ The service requires the following configuration parameters:

 - **`description`** *(string, required)*: A brief description of the resource type.

-- **`facetable_fields`** *(array)*: A list of the facetable fields for the resource type (leave empty to not use faceting). Default: `[]`.
+- **`facetable_fields`** *(array)*: A list of the facetable fields for the resource type (leave empty to not use faceting, use dotted notation for nested fields). Default: `[]`.

   - **Items**: Refer to *[#/$defs/FieldLabel](#%24defs/FieldLabel)*.

-- **`selected_fields`** *(array)*: A list of the returned fields for the resource type (leave empty to return all). Default: `[]`.
+- **`selected_fields`** *(array)*: A list of the returned fields for the resource type (leave empty to return all, use dotted notation for nested fields). Default: `[]`.

   - **Items**: Refer to *[#/$defs/FieldLabel](#%24defs/FieldLabel)*.
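The updated `facetable_fields` and `selected_fields` descriptions say that nested fields are addressed with dotted notation, as in the `object.type` key of the test configuration. A minimal sketch of how such a key can be resolved against a plain Python dict, assuming each dot descends one level into a nested object; `resolve_dotted` is a hypothetical helper, and unlike MongoDB's dotted paths it does not step into arrays of embedded documents.

```python
from typing import Any


def resolve_dotted(document: dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as "object.type" into nested dicts.

    Hypothetical helper: returns None when a segment is missing or when a
    non-dict value is reached before the last segment. Lists are not traversed.
    """
    value: Any = document
    for segment in dotted_key.split("."):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


# One item shaped like the updated nested test data.
hotel = {
    "category": "hotel",
    "city": "Miami",
    "id_": "1HotelAlpha-id",
    "object": {"id_": "HotelAlphaObject", "type": "piano"},
}

print(resolve_dotted(hotel, "object.type"))  # piano
print(resolve_dotted(hotel, "city"))  # Miami
print(resolve_dotted(hotel, "object.color"))  # None
```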
4 changes: 2 additions & 2 deletions config_schema.json
@@ -31,7 +31,7 @@
         },
         "facetable_fields": {
           "default": [],
-          "description": "A list of the facetable fields for the resource type (leave empty to not use faceting)",
+          "description": "A list of the facetable fields for the resource type (leave empty to not use faceting, use dotted notation for nested fields)",
           "items": {
             "$ref": "#/$defs/FieldLabel"
           },
@@ -40,7 +40,7 @@
         },
         "selected_fields": {
           "default": [],
-          "description": "A list of the returned fields for the resource type (leave empty to return all)",
+          "description": "A list of the returned fields for the resource type (leave empty to return all, use dotted notation for nested fields)",
           "items": {
             "$ref": "#/$defs/FieldLabel"
           },
215 changes: 118 additions & 97 deletions lock/requirements-dev.txt

Large diffs are not rendered by default.

13 changes: 7 additions & 6 deletions lock/requirements.txt
@@ -53,9 +53,9 @@ async-timeout==4.0.3 \
     # via
     # -c lock/requirements-dev.txt
     # aiokafka
-attrs==23.2.0 \
-    --hash=sha256:935dc3b529c262f6cf76e50877d35a4bd3c1de194fd41f47a2b7ae8f19971f30 \
-    --hash=sha256:99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1
+attrs==24.1.0 \
+    --hash=sha256:377b47448cb61fea38533f671fba0d0f8a96fd58facd4dc518e3dac9dbea0905 \
+    --hash=sha256:adbdec84af72d38be7628e353a09b6a6790d15cd71819f6e9d7b0faa8a125745
     # via
     # -c lock/requirements-dev.txt
     # jsonschema
@@ -94,9 +94,9 @@ fastapi==0.111.1 \
     # via
     # -c lock/requirements-dev.txt
     # ghga-service-commons
-fastapi-cli==0.0.4 \
-    --hash=sha256:a2552f3a7ae64058cdbb530be6fa6dbfc975dc165e4fa66d224c3d396e25e809 \
-    --hash=sha256:e2e9ffaffc1f7767f488d6da34b6f5a377751c996f397902eb6abb99a67bde32
+fastapi-cli==0.0.5 \
+    --hash=sha256:d30e1239c6f46fcb95e606f02cdda59a1e2fa778a54b64686b3ff27f6211ff9f \
+    --hash=sha256:e94d847524648c748a5350673546bbf9bcaeb086b33c24f2e82e021436866a46
     # via
     # -c lock/requirements-dev.txt
     # fastapi
@@ -694,6 +694,7 @@ uvicorn==0.29.0 \
     # via
     # -c lock/requirements-dev.txt
     # fastapi
+    # fastapi-cli
     # ghga-service-commons
 uvloop==0.19.0 \
     --hash=sha256:0246f4fd1bf2bf702e06b0d45ee91677ee5c31242f39aab4ea6fe0c51aedd0fd \
5 changes: 3 additions & 2 deletions openapi.yaml
@@ -128,15 +128,15 @@ components:
         facetable_fields:
           default: []
           description: A list of the facetable fields for the resource type (leave
-            empty to not use faceting)
+            empty to not use faceting, use dotted notation for nested fields)
           items:
             $ref: '#/components/schemas/FieldLabel'
           title: Facetable Fields
           type: array
         selected_fields:
           default: []
           description: A list of the returned fields for the resource type (leave
-            empty to return all)
+            empty to return all, use dotted notation for nested fields)
           items:
             $ref: '#/components/schemas/FieldLabel'
           title: Selected Fields
@@ -177,6 +177,7 @@ components:
 info:
   contact:
     email: [email protected]
+    name: German Human Genome Phenome Archive (GHGA)
   license:
     name: Apache 2.0
   summary: A service for searching metadata artifacts and filtering results.
8 changes: 7 additions & 1 deletion src/mass/adapters/inbound/fastapi_/configure.py
@@ -26,7 +26,13 @@
 def get_configured_app(*, config: Config) -> FastAPI:
     """Create and configure a REST API application."""
     summary = metadata["Summary"]
-    author = metadata["Author"]
+    author = metadata.get("Author")
+    email = metadata["Author-email"]
+    if not author and email.endswith(">"):
+        # author is contained in Author-email
+        author, email = email.rsplit("<", 1)
+        author = author.strip().strip('"')
+        email = email[:-1]
     email = metadata["Author-email"]
     license = metadata["License"]
     title, summary = summary.split(" - ", 1)
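The new branch in `get_configured_app` covers package metadata that has no separate `Author` field and instead carries the name inside `Author-email`, in the form `"Name" <address>`. Pulled out as a hypothetical pure function (`split_author_email` is not part of the codebase), the same string handling can be exercised without installed package metadata; the names and addresses below are placeholders.

```python
from typing import Optional


def split_author_email(
    author: Optional[str], author_email: str
) -> tuple[Optional[str], str]:
    """Mirror the Author / Author-email handling from get_configured_app.

    Hypothetical helper: if no author is given and the email field looks like
    '"Name" <address>', split it into the bare name and the bare address.
    """
    email = author_email
    if not author and email.endswith(">"):
        # the author name is contained in Author-email
        author, email = email.rsplit("<", 1)
        author = author.strip().strip('"')  # drop whitespace, then quotes
        email = email[:-1]  # drop the closing ">"
    return author, email


print(split_author_email(None, '"Example Org" <team@example.org>'))
# ('Example Org', 'team@example.org')
print(split_author_email("Jane Doe", "jane@example.org"))
# ('Jane Doe', 'jane@example.org')
```

Values that do not end in `>` pass through unchanged, so a plain address with no author simply yields `None` as the name.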
60 changes: 50 additions & 10 deletions src/mass/adapters/outbound/utils.py
@@ -30,6 +30,11 @@
 }


+def name_from_key(key: str) -> str:
+    """Auto generate a suitable name from a key"""
+    return key.title().replace("_", " ")
+
+
 def pipeline_match_text_search(*, query: str) -> JsonObject:
     """Build text search segment of aggregation pipeline"""
     text_search = {"$text": {"$search": query}}
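`name_from_key` is what lets a facet omit its `name`, as the `species` facet in the `FilteringTests` test configuration does. Running the function as committed on a few keys shows the labels it produces, including two edge cases: a dotted key keeps its dot, and a trailing underscore, as in `id_`, turns into a trailing space. Giving nested or underscore-suffixed fields an explicit `name` avoids both.

```python
def name_from_key(key: str) -> str:
    """Auto generate a suitable name from a key (as added to utils.py)."""
    # str.title() capitalizes each run of letters; underscores become spaces
    return key.title().replace("_", " ")


for key in ["species", "has_object", "object.type", "id_"]:
    print(repr(name_from_key(key)))
# 'Species'
# 'Has Object'
# 'Object.Type'
# 'Id '
```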
@@ -45,18 +50,37 @@ def args_for_getfield(*, root_object_name: str, field_name: str) -> tuple[str, s
         specified_field = pieces[-1]
         prefix += "." + ".".join(pieces[:-1])

-    return (prefix, specified_field)
+    return prefix, specified_field


 def pipeline_match_filters_stage(*, filters: list[models.Filter]) -> JsonObject:
     """Build segment of pipeline to apply search filters"""
-    segment: dict[str, dict[str, list[str]]] = defaultdict(lambda: {"$in": []})
+    filter_values = defaultdict(list)
     for item in filters:
-        filter_key = "content." + str(item.key)
-        filter_value = item.value
-        segment[filter_key]["$in"].append(filter_value)
-
-    return {"$match": segment}
+        filter_values[item.key].append(item.value)
+    segment = []
+    for key, values in filter_values.items():
+        if key != "id_":
+            key = "content." + key
+        segment.append(
+            {
+                "$or": [
+                    {
+                        "$and": [
+                            {key: {"$not": {"$type": "array"}}},
+                            {key: {"$in": values}},
+                        ]
+                    },
+                    {
+                        "$and": [
+                            {key: {"$type": "array"}},
+                            {key: {"$elemMatch": {"$in": values}}},
+                        ]
+                    },
+                ]
+            }
+        )
+    return {"$match": {"$and": segment}}


 def pipeline_facet_sort_and_paginate(
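The rewritten `pipeline_match_filters_stage` groups the selected values by key, combines values for the same key with OR, combines different keys with AND, and keeps a document when the field is a non-array whose value is among the selected values, or an array with at least one element among them. A minimal in-memory sketch of that matching rule, run against the records of the new `FilteringTests.json` fixture; it works on flat dicts (no `content.` prefix, no dotted paths), and `matches` and `search` are illustrative names, not the MongoDB implementation.

```python
from collections import defaultdict
from typing import Any

# Records from tests/fixtures/test_data/FilteringTests.json.
ITEMS = [
    {"id_": "1", "name": "Jack", "species": "monkey", "eats": ["bananas"]},
    {"id_": "2", "name": "Bruiser", "species": "dog", "eats": ["dog food", "treats"]},
    {"id_": "3", "name": "Lady", "species": "dog", "eats": ["spaghetti", "meatballs"]},
    {
        "id_": "4",
        "name": "Garfield",
        "species": "cat",
        "eats": ["fish", "lasagna", "meatballs", "spaghetti", "treats"],
    },
    {"id_": "5", "name": "Flipper", "species": "dolphin", "eats": ["fish", "shrimp"]},
]


def matches(doc: dict[str, Any], filters: list[tuple[str, Any]]) -> bool:
    """OR across values of the same key, AND across different keys."""
    selected: dict[str, list[Any]] = defaultdict(list)
    for key, value in filters:
        selected[key].append(value)
    for key, values in selected.items():
        field = doc.get(key)
        if isinstance(field, list):
            # multi-valued field: at least one element must be selected
            if not any(element in values for element in field):
                return False
        elif field not in values:
            # single-valued (or missing) field: the value itself must be selected
            return False
    return True


def search(filters: list[tuple[str, Any]]) -> list[str]:
    return [doc["name"] for doc in ITEMS if matches(doc, filters)]


print(search([("eats", "meatballs")]))  # ['Lady', 'Garfield']
print(search([("species", "dog"), ("eats", "treats")]))  # ['Bruiser']
print(search([("species", "dog"), ("species", "cat")]))  # ['Bruiser', 'Lady', 'Garfield']
```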
@@ -74,8 +98,16 @@ def pipeline_facet_sort_and_paginate(
         prefix, specified_field = args_for_getfield(
             root_object_name="content", field_name=facet.key
         )
-
-        segment[facet.name] = [
+        name = facet.name
+        if not name:
+            name = name_from_key(facet.key)
+        segment[name] = [
+            {
+                "$unwind": {
+                    "path": f"{prefix}.{specified_field}",
+                    "preserveNullAndEmptyArrays": True,
+                }
+            },
             {
                 "$group": {
                     "_id": {"$getField": {"field": specified_field, "input": prefix}},
@@ -116,8 +148,16 @@ def pipeline_project(*, facet_fields: list[models.FieldLabel]) -> JsonObject:

     # add a segment for each facet to summarize the options
     for facet in facet_fields:
+        key = facet.key
+        name = facet.name
+        if not name:
+            name = name_from_key(key)
         segment["facets"].append(
-            {"key": facet.key, "name": facet.name, "options": f"${facet.name}"}
+            {
+                "key": key,
+                "name": name,
+                "options": f"${name}",
+            }
         )
     return {"$project": segment}
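The last two `utils.py` hunks change how facets are computed and reported. `$unwind` with `preserveNullAndEmptyArrays: True` now runs before `$group`, so each element of an array field is grouped as its own option while documents whose field is missing, null, or an empty array are kept rather than dropped. Both functions also fall back to `name_from_key` when a facet has no `name`, and that fallback has to resolve identically in both places, because the facet segment is keyed by the name and `$project` reads it back through `f"${name}"`. The sketch approximates both stages in plain Python with `collections.Counter` on the `FilteringTests` data. It assumes the truncated `$group` stage counts documents per value and reports kept-but-empty documents under `None`; `option_counts` and `facet_summary` are hypothetical names.

```python
from collections import Counter
from typing import Any, Optional

# Records from tests/fixtures/test_data/FilteringTests.json (fields used here).
ITEMS = [
    {"species": "monkey", "eats": ["bananas"]},
    {"species": "dog", "eats": ["dog food", "treats"]},
    {"species": "dog", "eats": ["spaghetti", "meatballs"]},
    {"species": "cat", "eats": ["fish", "lasagna", "meatballs", "spaghetti", "treats"]},
    {"species": "dolphin", "eats": ["fish", "shrimp"]},
]

# Facets as configured for FilteringTests; "species" has no explicit name.
FACET_FIELDS: list[dict[str, Optional[str]]] = [
    {"key": "species", "name": None},
    {"key": "eats", "name": "Food"},
]


def name_from_key(key: str) -> str:
    """Same fallback as utils.name_from_key."""
    return key.title().replace("_", " ")


def option_counts(docs: list[dict[str, Any]], key: str) -> Counter:
    """Approximate $unwind (keeping null/empty) followed by a count per value."""
    counts: Counter = Counter()
    for doc in docs:
        value = doc.get(key)
        if isinstance(value, list) and value:
            counts.update(value)  # one unwound document per array element
        elif isinstance(value, list) or value is None:
            counts[None] += 1  # empty array, null or missing: kept without a value
        else:
            counts[value] += 1  # a scalar unwinds to itself
    return counts


def facet_summary(
    docs: list[dict[str, Any]], facet_fields: list[dict[str, Optional[str]]]
) -> list[dict[str, Any]]:
    # stand-in for the facet segment: options stored under the resolved name
    by_name: dict[str, Counter] = {}
    for field in facet_fields:
        name = field["name"] or name_from_key(field["key"])
        by_name[name] = option_counts(docs, field["key"])
    # stand-in for $project: options read back through the same resolved name
    summary = []
    for field in facet_fields:
        name = field["name"] or name_from_key(field["key"])
        summary.append({"key": field["key"], "name": name, "options": by_name[name]})
    return summary


for facet in facet_summary(ITEMS, FACET_FIELDS):
    print(facet["key"], facet["name"], dict(facet["options"]))
```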
4 changes: 2 additions & 2 deletions src/mass/core/models.py
@@ -55,12 +55,12 @@ class SearchableClass(BaseModel):
     facetable_fields: list[FieldLabel] = Field(
         [],
         description="A list of the facetable fields for the resource type"
-        " (leave empty to not use faceting)",
+        " (leave empty to not use faceting, use dotted notation for nested fields)",
     )
     selected_fields: list[FieldLabel] = Field(
         [],
         description="A list of the returned fields for the resource type"
-        " (leave empty to return all)",
+        " (leave empty to return all, use dotted notation for nested fields)",
     )
16 changes: 12 additions & 4 deletions tests/fixtures/test_config.yaml
@@ -17,21 +17,21 @@
 db_connection_str: mongodb://localhost:27017
 db_name: metadata-store
 searchable_classes:
-  DatasetEmbedded:
+  NestedData:
     description: Dataset with embedded references.
     facetable_fields:
       - key: category
         name: Category
-      - key: field1
+      - key: city
         name: Field 1
-      - key: "has_object.type"
+      - key: "object.type"
         name: Object Type
     selected_fields:
       - key: id_
         name: ID
       - key: type
         name: Location Type
-      - key: "has_object.type"
+      - key: "object.type"
         name: Object Type
   EmptyCollection:
     description: An empty collection to test the index creation.
@@ -53,6 +53,14 @@ searchable_classes:
       - key: data
         name: Data
     selected_fields: []
+  FilteringTests:
+    description: Data for testing filtering on using single and multi-valued fields.
+    facetable_fields:
+      - key: species
+      - key: eats
+        name: Food
+    selected_fields:
+      - key: name
 resource_change_event_topic: searchable_resources
 resource_deletion_event_type: searchable_resource_deleted
 resource_upsertion_event_type: searchable_resource_upserted
51 changes: 51 additions & 0 deletions tests/fixtures/test_data/FilteringTests.json
@@ -0,0 +1,51 @@
+{
+  "items": [
+    {
+      "eats": [
+        "bananas"
+      ],
+      "id_": "1",
+      "name": "Jack",
+      "species": "monkey"
+    },
+    {
+      "eats": [
+        "dog food",
+        "treats"
+      ],
+      "id_": "2",
+      "name": "Bruiser",
+      "species": "dog"
+    },
+    {
+      "eats": [
+        "spaghetti",
+        "meatballs"
+      ],
+      "id_": "3",
+      "name": "Lady",
+      "species": "dog"
+    },
+    {
+      "eats": [
+        "fish",
+        "lasagna",
+        "meatballs",
+        "spaghetti",
+        "treats"
+      ],
+      "id_": "4",
+      "name": "Garfield",
+      "species": "cat"
+    },
+    {
+      "eats": [
+        "fish",
+        "shrimp"
+      ],
+      "id_": "5",
+      "name": "Flipper",
+      "species": "dolphin"
+    }
+  ]
+}
Original file line number Diff line number Diff line change
@@ -2,12 +2,13 @@
   "items": [
     {
       "category": "hotel",
-      "field1": "Miami",
-      "has_object": {
+      "city": "Miami",
+      "id_": "1HotelAlpha-id",
+      "object": {
         "id_": "HotelAlphaObject",
         "type": "piano"
       },
-      "has_rooms": [
+      "rooms": [
         {
           "id_": "HotelAlphaLarge",
           "type": "large room"
@@ -17,31 +18,30 @@
           "type": "poolside room"
         }
       ],
-      "id_": "1HotelAlpha-id",
       "type": "resort"
     },
     {
       "category": "hotel",
-      "field1": "Denver",
-      "has_object": {
+      "city": "Denver",
+      "id_": "2HotelBeta-id",
+      "object": {
         "id_": "HotelBetaObject",
         "type": "kitchen"
       },
-      "id_": "2HotelBeta-id",
       "type": "luxury"
     },
     {
-      "category": "zoo",
-      "field1": "Amsterdam",
-      "has_animal": {
+      "animal": {
         "id_": "ZooAnimal",
         "type": "giraffe"
       },
-      "has_object": {
+      "category": "zoo",
+      "city": "Amsterdam",
+      "id_": "3zoo-id",
+      "object": {
         "id_": "zoo-object",
         "type": "concessions stand"
-      },
-      "id_": "3zoo-id"
+      }
     }
   ]
 }