add optional docscraper in specs (#10)
* added scraper (draft) #3

* finalized scraper #3

* changes in README.md #3

* updated reconciliation loop diagram

* added cronjob rbac #3

* changes in README.md #3

* changes in README.md #3

* changes in README.md #3

* changes in README.md #3

* Update README.md
akyriako authored Dec 10, 2024
1 parent 979b1cf commit 508877e
Showing 14 changed files with 351 additions and 10 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -52,7 +52,7 @@ OPERATOR_SDK_VERSION ?= v1.38.0
# Image URL to use all building/pushing image targets
DOCKER_HUB_NAME ?= $(shell docker info | sed '/Username:/!d;s/.* //')
IMG_NAME ?= typesense-operator
-IMG_TAG ?= 0.2.1
+IMG_TAG ?= 0.2.2
IMG ?= $(DOCKER_HUB_NAME)/$(IMG_NAME):$(IMG_TAG)

# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
33 changes: 28 additions & 5 deletions README.md
@@ -15,6 +15,8 @@ Key features of Typesense Kubernetes Operator include:
- provision Typesense services (headless & discovery `Services`),
- actively discover and update Typesense's nodes list (quorum configuration mounted as `ConfigMap`),
- place claims for Typesense `PersistentVolumes`,
- _optionally_ expose the Typesense API endpoint via an `Ingress`,
- _optionally_ provision one or multiple instances (one per target URL) of DocSearch as `CronJobs`
- **Raft Quorum Configuration & Recovery Automation**:
- Continuous active (re)discovery of the quorum configuration reacting to changes in `ReplicaSet` **without the need of an additional sidecar container**,
- Automatic recovery of a cluster that has lost quorum **without the need of manual intervention**.
@@ -78,15 +80,20 @@ The Typesense Kubernetes Operator manages the entire lifecycle of Typesense Clus
4. A `StatefulSet` is then created. The quorum configuration stored in the `ConfigMap` is mounted as a volume in each `Pod`
under `/usr/share/typesense/nodelist`. No `Pod` restart is necessary when the `ConfigMap` changes, as raft automatically
detects and applies the updates.
5. Optionally, an **nginx:alpine** workload is provisioned as a `Deployment` and published via an `Ingress`, in order to safely expose
the Typesense REST/API endpoint outside the Kubernetes cluster **only** to selected referrers. The configuration of the
nginx workload is stored in a `ConfigMap`.
6. Optionally, one or more instances of **DocSearch** are deployed as distinct `CronJobs` (one per scraping target URL),
which, based on user-defined schedules, periodically scrape the target sites and store the results in Typesense.

-![image](https://github.com/user-attachments/assets/30b6989c-c872-46ef-8ece-86c5d4911667)
+![image](https://github.com/user-attachments/assets/2afb802c-11f7-4be4-b44f-5dab9d489971)

> [!NOTE]
> The interval between reconciliation loops depends on the number of nodes. This approach ensures raft has sufficient
> "breathing room" to carry out its operations—such as leader election, log replication, and bootstrapping—before the
> next quorum health reconciliation begins.
-5. The controller assesses the quorum's health by probing each node at `http://{nodeUrl}:{api-port}/health`. Based on the
+7. The controller assesses the quorum's health by probing each node at `http://{nodeUrl}:{api-port}/health`. Based on the
results, it formulates an action plan for the next reconciliation loop. This process is detailed in the following section:

### Problem 2: Recovering a cluster that has lost quorum
@@ -146,7 +153,7 @@ of manual intervention in order to recover a cluster that has lost quorum.
Typesense Kubernetes Operator controls the lifecycle of multiple Typesense instances in the same Kubernetes cluster by
introducing `TypesenseCluster`, a new Custom Resource Definition:

-![image](https://github.com/user-attachments/assets/23e40781-ca21-4297-93bf-2b5dbebc7e0e)
+![image](https://github.com/user-attachments/assets/3dd20498-fb4b-46b7-9f60-ff413fadc942)

**Spec**

@@ -160,6 +167,7 @@ introducing `TypesenseCluster`, a new Custom Resource Definition:
| corsDomains | domains that would be allowed for CORS calls | X | |
| storage | check StorageSpec below | | |
| ingress | check IngressSpec below | X | |
| scrapers | array of DocSearchScraperSpec; check below | X | |

**StorageSpec** (optional)

@@ -178,12 +186,27 @@ introducing `TypesenseCluster`, a new Custom Resource Definition:
| ingressClassName | Ingress to use | | |
| annotations | User-Defined annotations | X | |

> [!IMPORTANT]
> This feature requires the existence of [cert-manager](https://cert-manager.io/) in the cluster, but **does not** actively enforce it with an error.
> If you are targeting Open Telekom Cloud, you might also be interested in provisioning the designated DNS solver webhook
> for Open Telekom Cloud. You can find it [here](https://github.com/akyriako/cert-manager-webhook-opentelekomcloud).

**DocSearchScraperSpec** (optional)

| Name | Description | Optional | Default |
|-------------|------------------------------------------|----------|---------|
| name | name of the scraper | | |
| image | container image to use | | |
| config | config to use | | |
| schedule | cron expression; no timezone; no seconds | | |

> [!CAUTION]
> Although the Typesense documentation, under _Production Best Practices_ -> _Configuration_, states:
> "_Typesense comes built-in with a high performance HTTP server that is used by likes of Fastly in
> their edge servers at scale. So Typesense can be directly exposed to incoming public-facing internet traffic,
-> without the need to place it behind another web server like Nginx / Apache or your backend API._" it is highly recommended
-> , from this operator's perspective, to always expose Typesense behind a reverse proxy (using the `referer` option).
+> without the need to place it behind another web server like Nginx / Apache or your backend API._"
+>
+> It is highly recommended, from this operator's perspective, to always expose Typesense behind a reverse proxy (using the `referer` option).

**Status**
12 changes: 12 additions & 0 deletions api/v1alpha1/typesensecluster_types.go
@@ -57,6 +57,8 @@ type TypesenseClusterSpec struct {
Storage *StorageSpec `json:"storage"`

Ingress *IngressSpec `json:"ingress,omitempty"`

Scrapers []DocSearchScraperSpec `json:"scrapers,omitempty"`
}

type StorageSpec struct {
@@ -84,6 +86,16 @@ type IngressSpec struct {
Annotations map[string]string `json:"annotations,omitempty"`
}

type DocSearchScraperSpec struct {
	Name   string `json:"name"`
	Image  string `json:"image"`
	Config string `json:"config"`

	// +kubebuilder:validation:Pattern:=`(^((\*\/)?([0-5]?[0-9])((\,|\-|\/)([0-5]?[0-9]))*|\*)\s+((\*\/)?((2[0-3]|1[0-9]|[0-9]|00))((\,|\-|\/)(2[0-3]|1[0-9]|[0-9]|00))*|\*)\s+((\*\/)?([1-9]|[12][0-9]|3[01])((\,|\-|\/)([1-9]|[12][0-9]|3[01]))*|\*)\s+((\*\/)?([1-9]|1[0-2])((\,|\-|\/)([1-9]|1[0-2]))*|\*|(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|des))\s+((\*\/)?[0-6]((\,|\-|\/)[0-6])*|\*|00|(sun|mon|tue|wed|thu|fri|sat))\s*$)|@(annually|yearly|monthly|weekly|daily|hourly|reboot)`
	// +kubebuilder:validation:Type=string
	Schedule string `json:"schedule"`
}
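
The `Schedule` validation pattern above can be exercised outside the cluster. The following is a minimal sketch, assuming Go's `regexp` package accepts the pattern unchanged and behaves like the API server's matcher for these inputs; `schedulePattern`, `scheduleRe`, and `validSchedule` are hypothetical names and are not part of the operator.

```go
package main

import (
	"fmt"
	"regexp"
)

// schedulePattern is copied verbatim from the kubebuilder marker on Schedule.
const schedulePattern = `(^((\*\/)?([0-5]?[0-9])((\,|\-|\/)([0-5]?[0-9]))*|\*)\s+((\*\/)?((2[0-3]|1[0-9]|[0-9]|00))((\,|\-|\/)(2[0-3]|1[0-9]|[0-9]|00))*|\*)\s+((\*\/)?([1-9]|[12][0-9]|3[01])((\,|\-|\/)([1-9]|[12][0-9]|3[01]))*|\*)\s+((\*\/)?([1-9]|1[0-2])((\,|\-|\/)([1-9]|1[0-2]))*|\*|(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|des))\s+((\*\/)?[0-6]((\,|\-|\/)[0-6])*|\*|00|(sun|mon|tue|wed|thu|fri|sat))\s*$)|@(annually|yearly|monthly|weekly|daily|hourly|reboot)`

// scheduleRe is compiled once; MustCompile panics if the pattern is malformed.
var scheduleRe = regexp.MustCompile(schedulePattern)

// validSchedule reports whether s matches the CRD pattern (hypothetical helper).
func validSchedule(s string) bool {
	return scheduleRe.MatchString(s)
}

func main() {
	candidates := []string{
		"*/2 * * * *",   // five fields, as used in the sample manifest
		"@daily",        // macro form
		"0 3 * des *",   // the pattern as written spells December "des"
		"0 3 * dec *",   // so "dec" does not match
		"0 */2 * * * *", // six fields (with seconds) do not match
	}
	for _, s := range candidates {
		fmt.Printf("%q valid=%v\n", s, validSchedule(s))
	}
}
```

Running schedules through a check like this before applying a manifest can surface rejections early, for example a month written as `dec`.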

// TypesenseClusterStatus defines the observed state of TypesenseCluster
type TypesenseClusterStatus struct {

20 changes: 20 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions charts/typesense-operator/Chart.yaml
@@ -13,9 +13,9 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.2.1
+version: 0.2.2
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
-appVersion: "0.2.1"
+appVersion: "0.2.2"
12 changes: 12 additions & 0 deletions charts/typesense-operator/templates/manager-rbac.yaml
@@ -73,6 +73,18 @@ rules:
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
19 changes: 19 additions & 0 deletions charts/typesense-operator/templates/typesensecluster-crd.yaml
@@ -95,6 +95,25 @@ spec:
resetPeersOnError:
default: true
type: boolean
scrapers:
items:
properties:
config:
type: string
image:
type: string
name:
type: string
schedule:
pattern: (^((\*\/)?([0-5]?[0-9])((\,|\-|\/)([0-5]?[0-9]))*|\*)\s+((\*\/)?((2[0-3]|1[0-9]|[0-9]|00))((\,|\-|\/)(2[0-3]|1[0-9]|[0-9]|00))*|\*)\s+((\*\/)?([1-9]|[12][0-9]|3[01])((\,|\-|\/)([1-9]|[12][0-9]|3[01]))*|\*)\s+((\*\/)?([1-9]|1[0-2])((\,|\-|\/)([1-9]|1[0-2]))*|\*|(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|des))\s+((\*\/)?[0-6]((\,|\-|\/)[0-6])*|\*|00|(sun|mon|tue|wed|thu|fri|sat))\s*$)|@(annually|yearly|monthly|weekly|daily|hourly|reboot)
type: string
required:
- config
- image
- name
- schedule
type: object
type: array
storage:
properties:
size:
2 changes: 1 addition & 1 deletion charts/typesense-operator/values.yaml
@@ -12,7 +12,7 @@ controllerManager:
- ALL
image:
repository: akyriako78/typesense-operator
-tag: 0.2.1
+tag: 0.2.2
resources:
limits:
cpu: 500m
19 changes: 19 additions & 0 deletions config/crd/bases/ts.opentelekomcloud.com_typesenseclusters.yaml
@@ -94,6 +94,25 @@ spec:
resetPeersOnError:
default: true
type: boolean
scrapers:
items:
properties:
config:
type: string
image:
type: string
name:
type: string
schedule:
pattern: (^((\*\/)?([0-5]?[0-9])((\,|\-|\/)([0-5]?[0-9]))*|\*)\s+((\*\/)?((2[0-3]|1[0-9]|[0-9]|00))((\,|\-|\/)(2[0-3]|1[0-9]|[0-9]|00))*|\*)\s+((\*\/)?([1-9]|[12][0-9]|3[01])((\,|\-|\/)([1-9]|[12][0-9]|3[01]))*|\*)\s+((\*\/)?([1-9]|1[0-2])((\,|\-|\/)([1-9]|1[0-2]))*|\*|(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|des))\s+((\*\/)?[0-6]((\,|\-|\/)[0-6])*|\*|00|(sun|mon|tue|wed|thu|fri|sat))\s*$)|@(annually|yearly|monthly|weekly|daily|hourly|reboot)
type: string
required:
- config
- image
- name
- schedule
type: object
type: array
storage:
properties:
size:
12 changes: 12 additions & 0 deletions config/rbac/role.yaml
@@ -72,6 +72,18 @@ rules:
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
11 changes: 10 additions & 1 deletion config/samples/ts_v1alpha1_typesensecluster_kind.yaml
@@ -41,4 +41,13 @@ spec:
ingress:
host: host.example.com
ingressClassName: nginx
-clusterIssuer: lets-encrypt-prod
+clusterIssuer: lets-encrypt-prod
scrapers:
- name: empty-target
image: typesense/docsearch-scraper:0.11.0
config: ''
schedule: '*/2 * * * *'
- name: docusaurus-example-com
image: typesense/docsearch-scraper:0.11.0
config: "{\"index_name\":\"docuraurus-example\",\"start_urls\":[\"https://docusaurus.example.com/\"],\"sitemap_urls\":[\"https://docusaurus.example.com/sitemap.xml\"],\"sitemap_alternate_links\":true,\"stop_urls\":[\"/tests\"],\"selectors\":{\"lvl0\":{\"selector\":\"(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]\",\"type\":\"xpath\",\"global\":true,\"default_value\":\"Documentation\"},\"lvl1\":\"header h1\",\"lvl2\":\"article h2\",\"lvl3\":\"article h3\",\"lvl4\":\"article h4\",\"lvl5\":\"article h5, article td:first-child\",\"lvl6\":\"article h6\",\"text\":\"article p, article li, article td:last-child\"},\"strip_chars\":\" .,;:#\",\"custom_settings\":{\"separatorsToIndex\":\"_\",\"attributesForFaceting\":[\"language\",\"version\",\"type\",\"docusaurus_tag\"],\"attributesToRetrieve\":[\"hierarchy\",\"content\",\"anchor\",\"url\",\"url_without_anchor\",\"type\"]},\"conversation_id\":[\"833762294\"],\"nb_hits\":46250}"
schedule: '*/2 * * * *'
1 change: 1 addition & 0 deletions internal/controller/typesensecluster_condition_types.go
@@ -18,6 +18,7 @@ const (
ConditionReasonConfigMapNotReady = "ConfigMapNotReady"
ConditionReasonServicesNotReady = "ServicesNotReady"
ConditionReasonIngressNotReady = "IngressNotReady"
ConditionReasonScrapersNotReady = "ScrapersNotReady"
ConditionReasonQuorumReady ConditionQuorum = "QuorumReady"
ConditionReasonQuorumNotReady ConditionQuorum = "QuorumNotReady"
ConditionReasonQuorumDowngraded ConditionQuorum = "QuorumDowngraded"
10 changes: 10 additions & 0 deletions internal/controller/typesensecluster_controller.go
@@ -77,6 +77,7 @@ var (
// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
// +kubebuilder:rbac:groups=networking.k8s.io,resources=ingresses,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
@@ -137,6 +138,15 @@ func (r *TypesenseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Req
return ctrl.Result{}, err
}

err = r.ReconcileScraper(ctx, ts)
if err != nil {
cerr := r.setConditionNotReady(ctx, &ts, ConditionReasonScrapersNotReady, err)
if cerr != nil {
err = errors.Wrap(err, cerr.Error())
}
return ctrl.Result{}, err
}

sts, err := r.ReconcileStatefulSet(ctx, ts)
if err != nil {
cerr := r.setConditionNotReady(ctx, &ts, ConditionReasonStatefulSetNotReady, err)
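
The body of `ReconcileScraper` is not rendered in this view, so the following is only a sketch of the general pattern such a step could follow, given the README's description of one `CronJob` per scraper. It compares the scraper names declared in the spec with the names of `CronJobs` that already exist and derives what to create, update, or delete. `plan` and `planScrapers` are hypothetical names. A real implementation would work with the Kubernetes client and `batch/v1` `CronJob` objects rather than plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// plan lists the CronJob names to create, update, or delete (hypothetical type).
type plan struct {
	Create []string
	Update []string
	Delete []string
}

// planScrapers diffs the desired scraper names against the existing CronJob names.
func planScrapers(desired, existing []string) plan {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	have := make(map[string]bool, len(existing))
	for _, name := range existing {
		have[name] = true
	}

	var p plan
	for name := range want {
		if have[name] {
			p.Update = append(p.Update, name) // present in both: bring in line with the spec
		} else {
			p.Create = append(p.Create, name) // declared but missing
		}
	}
	for name := range have {
		if !want[name] {
			p.Delete = append(p.Delete, name) // no longer declared in the spec
		}
	}

	// Map iteration order is random, so sort for deterministic output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	p := planScrapers(
		[]string{"empty-target", "docusaurus-example-com"}, // names from the sample manifest
		[]string{"empty-target", "old-docs"},               // assumed existing CronJobs
	)
	fmt.Printf("create=%v update=%v delete=%v\n", p.Create, p.Update, p.Delete)
}
```

Keeping the planning step free of API calls makes it easy to unit test. The resulting plan can then be applied with whichever client the controller already uses.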