From 8622f9794b63cc785aab2ebc72ab6db3e9b9e52e Mon Sep 17 00:00:00 2001 From: Miles-Garnsey Date: Fri, 29 Sep 2023 12:19:31 +1000 Subject: [PATCH 1/5] I get very sad when docs don't link prominently to repos, or repos don't link prominently to docs. --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 4d59e9dd6..d1774c4a3 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,7 @@ # K8ssandra Operator + +**[Documentation site](https://docs.k8ssandra.io/)** + This is the Kubernetes operator for K8ssandra. K8ssandra is a Kubernetes-based distribution of Apache Cassandra that includes several tools and components that automate and simplify configuring, managing, and operating a Cassandra cluster. From 934e5f0779f256685cfef4f7a0338fd653e918cc Mon Sep 17 00:00:00 2001 From: Miles-Garnsey Date: Fri, 29 Sep 2023 13:53:45 +1000 Subject: [PATCH 2/5] New front page - why are we still focusing on k8ssandra 1.x on the front page? --- docs/content/en/_index.md | 62 +++++++++++++++++-- .../en/reference/old-k8ssandra/_index.md | 8 +++ 2 files changed, 64 insertions(+), 6 deletions(-) create mode 100644 docs/content/en/reference/old-k8ssandra/_index.md diff --git a/docs/content/en/_index.md b/docs/content/en/_index.md index 07ad62203..279cf9aa7 100755 --- a/docs/content/en/_index.md +++ b/docs/content/en/_index.md @@ -12,18 +12,68 @@ description: "K8ssandra documentation: architecture, configuration, guided tasks type: docs --- +k8ssandra-operator is a turnkey solution to manage [Apache Cassandra](https://cassandra.apache.org/_/index.html) on Kubernetes. Apache Cassandra is the premiere wide column NoSQL data store, offering low latency, geo-replication, and the capacity to store petabytes of data. Apache Cassandra is in use in 90% of the Fortune 500 in some capacity. + +k8ssandra-operator allows for the deployment of multiple Apache Cassandra datacenters, spanned over multiple Kubernetes clusters. 
The intention of this architecture is to provide geo-replication to enhance latency (by moving data closer to the end user) and availability (by providing multiple datacenters to serve requests in the event of a datacenter failure or network partition). + +Apache Cassandra offers rack and failure zone aware data replication which is both replicated and sharded for performance and protection. + +It incorporates the following functionality; + +### Deployment + +Apache Cassandra can be deployed into multiple datacenters in separate regions or availability/failure zones. k8ssandra-operator makes this possible by enabling communication between multiple Kubernetes clusters and deploying Cassandra datacenters into them. + +This distinguishes k8ssandra-operator from [cass-operator](https://github.com/k8ssandra/cass-operator), which k8ssandra-operator uses internally but which does not automate multi-region deployments. + +A single k8ssandra-operator instance in a control plane cluster can manage many data plane DCs across multiple Kubernetes clusters, and split across multiple Cassandra clusters. Clusters of up to 1000 nodes have been [tested](https://dok.community/blog/1000-node-cassandra-cluster-on-amazons-eks/) and confirmed to perform well. + +Advanced Cassandra features such as Change Data Capture (CDC) are supported and can be configured using Kubernetes manifests. + +### Monitoring + +Monitoring is a critical service in any distributed system, and k8ssandra-operator provides a rich suite of Apache Cassandra metrics via an [agent](https://github.com/k8ssandra/management-api-for-apache-cassandra) added to the Cassandra JVM. + +By integrating with [Vector](https://vector.dev/), k8ssandra-operator allows metrics to flow to a location of the user's choice, including an existing [Prometheus](https://prometheus.io/) or [Mimir](https://grafana.com/oss/mimir/) instance. 
A variety of other protocols and systems such as AMQP, Elasticsearch, Kafka, or Redis (see [here](https://vector.dev/docs/reference/configuration/sinks/) for a full list of integrations) are also supported. + +Metrics pipelines can be configured using Kubernetes custom resources, allowing for the creation of multiple pipelines to support different use cases across many clusters. + +Cassandra auditing and monitoring features such as full query logging are supported and can be configured directly from a K8ssandraCluster manifest. + +### Repairs and data maintenance + +Apache Cassandra requires regular maintenance to ensure data is replicated consistently across the cluster. k8ssandra-operator automates this process by running repairs on a regular schedule using [Reaper](https://cassandra-reaper.io/), a widely adopted solution for anti-entropy repairs in Cassandra maintained by the K8ssandra team. + +With k8ssandra-operator, you can use Kubernetes manifests to configure repair schedules and monitor their success across many Cassandra datacenters and clusters. + +### Backup and restore + +k8ssandra-operator uses [Medusa](https://github.com/thelastpickle/cassandra-medusa) to enable backup of Cassandra's SSTables to cloud storage locations such as S3 buckets, GCS and Azure storage. + +Backup and restore schedules can be configured using Kubernetes manifests, allowing for declarative, auditable management of backup and restore processes. + +### Flexible APIs + +[Stargate](https://stargate.io/) for Apache Cassandra offers advanced APIs including integration with the [Mongoose](https://mongoosejs.com/) object modelling framework for Node.js, GraphQL, and REST. It can also enhance Cassandra's native CQL performance in some cluster topologies. + +Using k8ssandra-operator, Stargate can be deployed and configured via simple Kubernetes manifests. + +### Where to from here? 
+ This documentation covers everything from install details, deployed components, configuration references, and guided outcome-based tasks. +To install k8ssandra-operator, start [here]({{< relref "install/" >}}). + Be sure to leave us a star on Github! -## Features for single- and multi-cluster Kubernetes environments -| K8ssandra Operator: enhanced capabilities | Initial K8ssandra project| -| ----------- | ----------- | -| K8ssandra Operator is our most recent offering. In a **unified operator**, K8ssandra Operator provides an entirely new, solidified set of features for Kubernetes + Cassandra deployments. The features include robust management (cass-operator), API integration (Stargate), anti-entropy repairs (Reaper), and backup/restore (Medusa). Important enhancements include **multi-cluster** and **multi-region** support, which enables greater scalability and availability for enterprise apps and data. Single cluster/region deployments are also supported with K8ssandra Operator.| K8ssandra v1.4.x is our project's initial implementation. It continues to provide a set of separate Helm charts that you can use to configure and deploy Apache Cassandra® into a single-cluster, single-region Kubernetes environment. | -| For enhanced capabilities, we recommend that you explore K8ssandra Operator [local install]({{< relref "install/local" >}}) topic, which focuses on single- or multi-cluster deployments on local dev **kind** Kubernetes clusters, using the provided `make` commands, `helm`, or `kustomize`. | Start in the K8ssandra v1.4.x [install](https://docs-v1.k8ssandra.io/install/local/) topics, which include the steps for single-cluster installs on local or cloud-provider Kubernetes platforms. | +## K8ssandra 1.x + +We previously released a product named "k8ssandra" (as distinct from k8ssandra-operator). This product comprised a set of Helm charts and is still available for existing users. 
+ +We strongly advise new users to adopt k8ssandra-operator since that is where future development is continuing. -If you're using K8ssandra v1.4.x, you may continue to do so. Or consider stepping up to the project's latest implementation with K8ssandra Operator. +A comparison between the two can be found [here]({{< relref "reference/old-k8ssandra" >}}). ## Compatibility matrix diff --git a/docs/content/en/reference/old-k8ssandra/_index.md b/docs/content/en/reference/old-k8ssandra/_index.md new file mode 100644 index 000000000..827c5bd23 --- /dev/null +++ b/docs/content/en/reference/old-k8ssandra/_index.md @@ -0,0 +1,8 @@ +## Features for single- and multi-cluster Kubernetes environments + +| K8ssandra Operator: enhanced capabilities | Initial K8ssandra project| +| ----------- | ----------- | +| K8ssandra Operator is our most recent offering. In a **unified operator**, K8ssandra Operator provides an entirely new, solidified set of features for Kubernetes + Cassandra deployments. The features include robust management (cass-operator), API integration (Stargate), anti-entropy repairs (Reaper), and backup/restore (Medusa). Important enhancements include **multi-cluster** and **multi-region** support, which enables greater scalability and availability for enterprise apps and data. Single cluster/region deployments are also supported with K8ssandra Operator.| K8ssandra v1.4.x is our project's initial implementation. It continues to provide a set of separate Helm charts that you can use to configure and deploy Apache Cassandra® into a single-cluster, single-region Kubernetes environment. | +| For enhanced capabilities, we recommend that you explore K8ssandra Operator [local install]({{< relref "install/local" >}}) topic, which focuses on single- or multi-cluster deployments on local dev **kind** Kubernetes clusters, using the provided `make` commands, `helm`, or `kustomize`. 
| Start in the K8ssandra v1.4.x [install](https://docs-v1.k8ssandra.io/install/local/) topics, which include the steps for single-cluster installs on local or cloud-provider Kubernetes platforms. | + +If you're using K8ssandra v1.4.x, you may continue to do so. Or consider stepping up to the project's latest implementation with K8ssandra Operator. From 0d39f015198f56c9409618f4941fa35cb48f6720 Mon Sep 17 00:00:00 2001 From: Miles-Garnsey Date: Tue, 3 Oct 2023 14:04:21 +1100 Subject: [PATCH 3/5] Add some stuff about vector search and DSE. --- docs/content/en/_index.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/content/en/_index.md b/docs/content/en/_index.md index 279cf9aa7..b32f6a3e8 100755 --- a/docs/content/en/_index.md +++ b/docs/content/en/_index.md @@ -12,11 +12,13 @@ description: "K8ssandra documentation: architecture, configuration, guided tasks type: docs --- -k8ssandra-operator is a turnkey solution to manage [Apache Cassandra](https://cassandra.apache.org/_/index.html) on Kubernetes. Apache Cassandra is the premiere wide column NoSQL data store, offering low latency, geo-replication, and the capacity to store petabytes of data. Apache Cassandra is in use in 90% of the Fortune 500 in some capacity. +k8ssandra-operator is a turnkey solution to manage [Apache Cassandra](https://cassandra.apache.org/_/index.html) and [DSE](https://www.datastax.com/products/datastax-enterprise) on Kubernetes. Apache Cassandra is the premiere wide column NoSQL data store, offering low latency, geo-replication, and the capacity to store petabytes of data. Apache Cassandra is in use in 90% of the Fortune 500 in some capacity. + +DSE is the DataStax distribution of Apache Cassandra, offering additional features such as advanced security, analytics, and search, as well as features not yet available in Cassandra like vector search for generative AI applications. 
k8ssandra-operator allows for the deployment of multiple Apache Cassandra datacenters, spanned over multiple Kubernetes clusters. The intention of this architecture is to provide geo-replication to enhance latency (by moving data closer to the end user) and availability (by providing multiple datacenters to serve requests in the event of a datacenter failure or network partition). -Apache Cassandra offers rack and failure zone aware data replication which is both replicated and sharded for performance and protection. +Apache Cassandra offers rack and failure zone aware data replication which is both replicated and sharded for performance and protection. It incorporates the following functionality; From e1c73f57a7ec4aa53264c5338b418fd8cbb5fcc7 Mon Sep 17 00:00:00 2001 From: Miles-Garnsey Date: Tue, 3 Oct 2023 14:09:25 +1100 Subject: [PATCH 4/5] Sync README to the new docs front page. --- README.md | 61 ++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 49 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index d1774c4a3..a0b2dabd4 100644 --- a/README.md +++ b/README.md @@ -2,24 +2,61 @@ **[Documentation site](https://docs.k8ssandra.io/)** -This is the Kubernetes operator for K8ssandra. +k8ssandra-operator is a turnkey solution to manage [Apache Cassandra](https://cassandra.apache.org/_/index.html) and [DSE](https://www.datastax.com/products/datastax-enterprise) on Kubernetes. Apache Cassandra is the premiere wide column NoSQL data store, offering low latency, geo-replication, and the capacity to store petabytes of data. Apache Cassandra is in use in 90% of the Fortune 500 in some capacity. -K8ssandra is a Kubernetes-based distribution of Apache Cassandra that includes several tools and components that automate and simplify configuring, managing, and operating a Cassandra cluster. 
+DSE is the DataStax distribution of Apache Cassandra, offering additional features such as advanced security, analytics, and search, as well as features not yet available in Cassandra like vector search for generative AI applications. -K8ssandra includes the following components: +k8ssandra-operator allows for the deployment of multiple Apache Cassandra datacenters, spanned over multiple Kubernetes clusters. The intention of this architecture is to provide geo-replication to enhance latency (by moving data closer to the end user) and availability (by providing multiple datacenters to serve requests in the event of a datacenter failure or network partition). -* [Cassandra](https://cassandra.apache.org/) -* [Stargate](https://stargate.io/) -* [Medusa](https://github.com/thelastpickle/cassandra-medusa) -* [Reaper](http://cassandra-reaper.io/) -* [Grafana](https://grafana.com/) -* [Prometheus](https://prometheus.io/) +Apache Cassandra offers rack and failure zone aware data replication which is both replicated and sharded for performance and protection. -K8ssandra 1.x is configured, packaged, and deployed via Helm charts. Those Helm charts can be found in the [k8ssandra](https://github.com/k8ssandra/k8ssandra) repo. +It incorporates the following functionality: -K8ssandra 2.x will be based on this operator. +### Deployment -One of the primary features of this operator is multi-cluster support which will facilitate multi-region Cassandra clusters. +Apache Cassandra can be deployed into multiple datacenters in separate regions or availability/failure zones. k8ssandra-operator makes this possible by enabling communication between multiple Kubernetes clusters and deploying Cassandra datacenters into them. + +This distinguishes k8ssandra-operator from [cass-operator](https://github.com/k8ssandra/cass-operator), which k8ssandra-operator uses internally but which does not automate multi-region deployments. 
+ +A single k8ssandra-operator instance in a control plane cluster can manage many data plane DCs across multiple Kubernetes clusters, and split across multiple Cassandra clusters. Clusters of up to 1000 nodes have been [tested](https://dok.community/blog/1000-node-cassandra-cluster-on-amazons-eks/) and confirmed to perform well. + +Advanced Cassandra features such as Change Data Capture (CDC) are supported and can be configured using Kubernetes manifests. + +### Monitoring + +Monitoring is a critical service in any distributed system, and k8ssandra-operator provides a rich suite of Apache Cassandra metrics via an [agent](https://github.com/k8ssandra/management-api-for-apache-cassandra) added to the Cassandra JVM. + +By integrating with [Vector](https://vector.dev/), k8ssandra-operator allows metrics to flow to a location of the user's choice, including an existing [Prometheus](https://prometheus.io/) or [Mimir](https://grafana.com/oss/mimir/) instance. A variety of other protocols and systems such as AMQP, Elasticsearch, Kafka, or Redis (see [here](https://vector.dev/docs/reference/configuration/sinks/) for a full list of integrations) are also supported. + +Metrics pipelines can be configured using Kubernetes custom resources, allowing for the creation of multiple pipelines to support different use cases across many clusters. + +Cassandra auditing and monitoring features such as full query logging are supported and can be configured directly from a K8ssandraCluster manifest. + +### Repairs and data maintenance + +Apache Cassandra requires regular maintenance to ensure data is replicated consistently across the cluster. k8ssandra-operator automates this process by running repairs on a regular schedule using [Reaper](https://cassandra-reaper.io/), a widely adopted solution for anti-entropy repairs in Cassandra maintained by the K8ssandra team. 
+ +With k8ssandra-operator, you can use Kubernetes manifests to configure repair schedules and monitor their success across many Cassandra datacenters and clusters. + +### Backup and restore + +k8ssandra-operator uses [Medusa](https://github.com/thelastpickle/cassandra-medusa) to enable backup of Cassandra's SSTables to cloud storage locations such as S3 buckets, GCS and Azure storage. + +Backup and restore schedules can be configured using Kubernetes manifests, allowing for declarative, auditable management of backup and restore processes. + +### Flexible APIs + +[Stargate](https://stargate.io/) for Apache Cassandra offers advanced APIs including integration with the [Mongoose](https://mongoosejs.com/) object modelling framework for Node.js, GraphQL, and REST. It can also enhance Cassandra's native CQL performance in some cluster topologies. + +Using k8ssandra-operator, Stargate can be deployed and configured via simple Kubernetes manifests. + +### Where to from here? + +This documentation covers everything from install details, deployed components, configuration references, and guided outcome-based tasks. + +To install k8ssandra-operator, start [here]({{< relref "install/" >}}). + +Be sure to leave us a star on GitHub! ## Architecture The K8ssandra operator is being developed with multi-cluster support first and foremost in mind. It can be used seamlessly in single-cluster deployments as well. 
From 6ae401e7e1f8535738d08ce25ff862262d7a66d7 Mon Sep 17 00:00:00 2001 From: Miles Garnsey <11435896+Miles-Garnsey@users.noreply.github.com> Date: Thu, 9 Nov 2023 12:47:30 -0800 Subject: [PATCH 5/5] Update README.md Co-authored-by: Christopher Bradford --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a0b2dabd4..6f6216565 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ k8ssandra-operator is a turnkey solution to manage [Apache Cassandra](https://cassandra.apache.org/_/index.html) and [DSE](https://www.datastax.com/products/datastax-enterprise) on Kubernetes. Apache Cassandra is the premiere wide column NoSQL data store, offering low latency, geo-replication, and the capacity to store petabytes of data. Apache Cassandra is in use in 90% of the Fortune 500 in some capacity. -DSE is the DataStax distribution of Apache Cassandra, offering additional features such as advanced security, analytics, and search, as well as features not yet available in Cassandra like vector search for generative AI applications. +DataStax Enterprise, DSE, is the DataStax distribution of Apache Cassandra, offering additional features such as advanced security, search, and graph, as well as features not yet available in Cassandra like vector search for generative AI applications. k8ssandra-operator allows for the deployment of multiple Apache Cassandra datacenters, spanned over multiple Kubernetes clusters. The intention of this architecture is to provide geo-replication to enhance latency (by moving data closer to the end user) and availability (by providing multiple datacenters to serve requests in the event of a datacenter failure or network partition).