This document provides detailed information for deploying the Solace PubSub+ Software Event Broker on Kubernetes, using the Solace PubSub+ Event Broker Operator. A basic understanding of Kubernetes concepts is assumed.
The following additional documentation is also available:
- For a hands-on quick start, refer to the Quick Start guide.
- For the PubSubPlusEventBroker custom resource (deployment configuration, or "broker spec") parameter options, refer to the PubSub+ Event Broker Operator Parameters Reference.
- For version-specific information, refer to the Operator Release Notes.
This guide focuses on deploying the event broker using the Operator, which is the preferred deployment method. Note that a Helm-based deployment is also supported but is out of scope for this document.
Contents:
- Solace PubSub+ Event Broker Operator User Guide
- The Solace PubSub+ Software Event Broker
- Overview
- Supported Kubernetes Environments
- Deployment Architecture
- Deployment Planning
- Exposing Metrics to Prometheus
- Broker Deployment Guide
- Operator Deployment Guide
- Migration from Helm-based deployments
PubSub+ Platform is a complete event streaming and management platform for the real-time enterprise. The PubSub+ Software Event Broker efficiently streams event-driven information between applications, IoT devices, and user interfaces running in the cloud, on-premises, and in hybrid environments using open APIs and protocols like AMQP, JMS, MQTT, REST and WebSocket. It can be installed into a variety of public and private clouds, PaaS, and on-premises environments. Event brokers in multiple locations can be linked together in an Event Mesh to dynamically share events across the distributed enterprise.
The PubSub+ Event Broker Operator supports:
- Installing a PubSub+ Software Event Broker in non-HA or HA mode.
- Adjusting the deployment to updated parameters (with limitations).
- Upgrading to a new broker version.
- Repairing the deployment.
- Enabling Prometheus monitoring.
- Providing status of the deployment.
After you have installed the Operator, you can deploy an event broker by simply creating a PubSubPlusEventBroker manifest that declares the broker properties in Kubernetes. This is no different from creating any Kubernetes-native resource, for example a Pod.
Kubernetes passes the manifest to the Operator and the Operator supervises the deployment from beginning to completion. The Operator also takes corrective action or provides notification if the deployment deviates from the desired state.
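For illustration, here is a minimal broker spec sketch. The apiVersion shown is an assumption and should be verified against the installed CRD; see the Parameters Reference for the authoritative schema.
# Minimal, illustrative PubSubPlusEventBroker manifest (non-HA, minimum footprint)
# apiVersion is an assumption - verify it against your installed CRD
apiVersion: pubsubplus.solace.com/v1beta1
kind: PubSubPlusEventBroker
metadata:
  name: dev-example
spec:
  developer: true   # minimum-footprint deployment, see Deployment Planning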
The Operator supports Kubernetes version 1.23 or later and is generally expected to work in conformant Kubernetes environments. This includes OpenShift, because the Operator has provisions to detect an OpenShift environment and seamlessly adjust its defaults. Details are provided with the appropriate parameters.
The PubSub+ Operator follows the Kubernetes Operator Pattern. The diagram gives an overview of this mechanism:
- PubSubPlusEventBroker is registered with Kubernetes as a Custom Resource and becomes a recognised Kubernetes object type.
- The PubSub+ Event Broker Operator, packaged in a Pod within a Deployment, must be in running state. It is configured with a set of Kubernetes namespaces to watch, which can be a list of specified namespaces or all namespaces.
- Creating a PubSubPlusEventBroker Custom Resource (CR) in a watched namespace triggers the creation of a new PubSub+ Event Broker deployment that meets the properties specified in the CR manifest (also referred to as the "broker spec").
- Deviation of the deployment from the desired state, or a change in the CR spec, also triggers the Operator to reconcile, that is, to adjust the deployment towards the desired state.
- The Operator runs reconcile in loops, making one adjustment at a time, until the desired state has been reached.
- Note that RBAC settings are required to permit the Operator to create Kubernetes objects, especially in other namespaces. Refer to the Security section for further details.
The activity of the Operator can be followed from its Pod logs as described in the troubleshooting section.
The diagram illustrates a Highly Available (HA) PubSub+ Event Broker deployment in Kubernetes. HA deployment requires three brokers in designated roles of Primary, Backup and Monitor in an HA group.
- At the core, there are the Pods running the broker containers and the associated Persistent Volume Claim (PVC) storage elements, directly managed by dedicated StatefulSets.
- Secrets are mounted on the containers feeding into the security configuration.
- There is also a set of shell scripts in a ConfigMap mounted on each broker container. They take care of configuring the broker at startup and conveying internal broker state to Kubernetes by reporting readiness and signalling which Pod is active and ready for service traffic. Active status is signalled by setting an active=true Pod label.
- A Service exposes the active broker Pod's services at service ports to clients.
- An additional Discovery Service enables internal access between brokers.
- Signalling the active broker state requires permission for a Pod to update its own label, so this needs to be configured using RBAC settings for the deployment.
The Operator ensures that all of the above objects are in place, with the exception of the Pods and storage, which are managed by the StatefulSets. This means that even if the Operator is temporarily out of service, the broker stays functional and resilient (noting that introducing changes is not possible during that time) because the StatefulSets control the Pods directly.
A non-HA deployment differs from HA in that: (1) there is only one StatefulSet managing one Pod that hosts the single broker; (2) there is no Discovery Service for internal communication; and (3) there is no pre-shared AuthenticationKey to secure internal communication.
Note: Each event broker deployment must conform to the guidelines for naming objects in Kubernetes. For example, you cannot have multiple event brokers with the same name in the same namespace, and port names must be at least one character and no more than 15 characters long. For more information on naming guidelines, see https://kubernetes.io/docs/concepts/overview/working-with-objects/names/.
Support can be enabled for exposing broker metrics to Prometheus Monitoring. Prometheus requires a running exporter that pulls the requested metrics from the monitored application - the broker in this case.
- When monitoring is enabled, the Operator adds an Exporter Pod to the broker deployment. The Exporter Pod acts as a bridge between Prometheus and the broker deployment to deliver metrics.
- On one side, the Exporter Pod obtains metrics from the broker using SEMP requests. To access the broker, it uses the username and password from the MonitoringCredentials secret, and uses TLS access to the broker if Broker TLS has been configured.
- The metrics are exposed to Prometheus through the Prometheus Metrics Service via the metrics port. The Metrics port is accessible using TLS if Metrics TLS has been enabled.
- As a Kubernetes recommended practice, it is assumed that the Prometheus stack has been deployed using the Prometheus Operator in a dedicated Prometheus Monitoring namespace. In this setup, a ServiceMonitor custom resource, placed in the Event Broker namespace, defines how Prometheus can access the broker metrics—which service to select and which endpoint to use.
- Prometheus comes installed with strict security by default. Its ClusterRole RBAC settings must be edited to enable watching ServiceMonitor objects in the Event Broker namespace.
This section describes options that should be considered when planning a PubSub+ Event Broker deployment, especially for Production.
The Operator supports deploying a single non-HA broker as well as an HA deployment for fault tolerance. HA can be enabled by setting spec.redundancy to true in the broker deployment manifest.
No single point of failure is important for HA deployments. Kubernetes by default tries to spread broker pods of an HA redundancy group across Availability Zones. For more deterministic deployments, specific control is enabled using the spec.nodeAssignment
section of the broker spec for the Primary, Backup and Monitor brokers where Kubernetes standard Affinity and NodeSelector definitions can be provided.
In an HA deployment with Primary, Backup, and Monitor nodes, a minimum of two nodes must be available to reach quorum. Specifying a Pod Disruption Budget is recommended to limit situations where quorum might be lost.
This can be enabled by setting the spec.podDisruptionBudgetForHA parameter to true. This creates a PodDisruptionBudget resource adapted to the broker deployment's needs, that is, with the minimum number of available pods set to two. Note that the parameter is ignored for a non-HA deployment. A minimal spec fragment combining HA and the disruption budget is shown below.
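A minimal sketch of the relevant broker spec fragment, using only the two parameters described above:
spec:
  redundancy: true                # deploy an HA group of Primary, Backup and Monitor brokers
  podDisruptionBudgetForHA: true  # create a PodDisruptionBudget with a minimum of two available pods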
Two containers are used in the deployment, with the following image-related parameters:
 | Event Broker | Prometheus Exporter |
---|---|---|
Image Repository | spec.image.repository | spec.monitoring.image.repository |
Image Tag | spec.image.tag | spec.monitoring.image.tag |
Pull Policy | spec.image.pullPolicy | spec.monitoring.image.pullPolicy |
Pull Secrets | spec.image.pullSecrets | spec.monitoring.image.pullSecrets |
The Image Repository and Tag parameters combined specify the image to be used for the deployment. They can either point to an image in a public or a private container registry. Pull Policy and Pull Secrets can be specified the standard Kubernetes way.
Example:
image:
repository: solace/solace-pubsub-standard
tag: latest
pullPolicy: IfNotPresent
pullSecrets:
- pull-secret
For the broker image, the default values are solace/solace-pubsub-standard and latest, which is the free PubSub+ Software Event Broker Standard Edition from the public Solace Docker Hub repo. It is generally recommended to set the image tag to a specific build for traceability purposes.
Similarly, the default exporter image values are solace/solace-pubsub-prometheus-exporter and latest.
Follow the general steps below to load an image into a private container registry (e.g.: GCR, ECR or Harbor). For specifics, consult the documentation of the registry you are using.
- Log in to the private registry:
podman login <private-registry> ...
- First, load the broker image into the local image store:
Important: There are broker container image variants for Amd64 and Arm64 architectures. Ensure you use the correct image.
# Options a) or b) depending on your image source:
## Option a): If you have a local tar.gz image file
podman load -i <solace-pubsub-XYZ>.tar.gz
## Option b): You can use the public Solace container image, such as from Docker Hub
podman pull solace/solace-pubsub-standard:latest # or specific <TagName>
#
# Verify the image has been loaded and note the associated "IMAGE ID"
podman images
- Tag the image with a name and tag specific to the private registry:
podman tag <image-id> <private-registry>/<path>/<image-name>:<tag>
- Push the image to the private registry
podman push <private-registry>/<path>/<image-name>:<tag>
Note that additional steps might be required if using signed images.
An ImagePullSecret might be required if pulling images from a private registry, e.g.: Harbor.
Here is an example of creating an ImagePullSecret. Refer to your registry's documentation for the specific details of use.
kubectl create secret docker-registry <pull-secret-name> --docker-server=<private-registry-server> \
--docker-username=<registry-user-name> --docker-password=<registry-user-password> \
--docker-email=<registry-user-email>
Then add <pull-secret-name> to the list under the spec.image.pullSecrets parameter.
The PubSub+ Event Mesh can be scaled vertically and horizontally.
You can horizontally scale your mesh by connecting multiple broker deployments. This is out of scope for this document.
For vertical scaling, you set the maximum capacity of a given broker deployment using system scaling parameters.
The following scaling parameters can be specified:
- Maximum Number of Client Connections, in the spec.systemScaling.maxConnections parameter
- Maximum Number of Queue Messages, in the spec.systemScaling.maxQueueMessages parameter
- Maximum Spool Usage, in the spec.systemScaling.maxSpoolUsage parameter
In addition, for a given set of scaling parameters, the event broker container CPU and memory requirements must be calculated and provided in the spec.systemScaling.messagingNodeCpu and spec.systemScaling.messagingNodeMemory parameters. Use the System Resource Calculator to determine the CPU and memory requirements for the selected scaling parameters.
Example:
spec:
systemScaling:
maxConnections: 100
maxQueueMessages: 100
maxSpoolUsage: 1000
messagingNodeCpu: "2"
messagingNodeMemory: "4025Mi"
or, alternatively, using the broker configuration-key style parameter names:
spec:
systemScaling:
system_scaling_maxconnectioncount: 100
system_scaling_maxqueuemessagecount: 100
messagespool_maxspoolusage: 1000
messagingNodeCpu: "2"
messagingNodeMemory: "4025Mi"
Note: Beyond CPU and memory requirements, broker storage size (see Storage section) must also support the provided scaling. The calculator can be used to determine that as well.
Also note that specifying maxConnections, maxQueueMessages, and maxSpoolUsage at initial deployment overwrites the broker's default values. Doing the same through an upgrade of an existing deployment does not overwrite these values in the broker's configuration; however, it can be used as the first step to prepare for a manual scale-up using the CLI, where these parameter changes actually become effective (second step). The Operator uses default configurations in situations where the scaling parameters are not provided or are not valid.
Note: The scaling parameters intentionally use a mix of camelCase and snake_case to maintain backward and forward compatibility with Solace PubSub+ Software Event Broker configurations. Make sure values are not duplicated for consistency. When using the resource calculator, ensure that the scaling parameters are in the correct format to match what the Solace PubSub+ Software Event Broker expects. If invalid scaling parameters are provided, the Operator will revert to default values. For the list of default values, please refer to this link.
A minimum-footprint deployment option is available for development purposes, but with no guaranteed performance. The minimum resource requirements are 1 CPU, 3.4 GiB of memory, and 7 Gi of disk storage, in addition to the Kubernetes environment requirements.
To activate, set spec.developer to true.
Important: If set to true, spec.developer takes precedence over any spec.systemScaling vertical scaling settings.
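A minimal sketch of the corresponding broker spec fragment:
spec:
  developer: true   # minimum footprint; overrides any spec.systemScaling settings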
The PubSub+ deployment uses disk storage for logging, configuration, guaranteed messaging, and storing diagnostic and other information, allocated from Kubernetes volumes.
For a given set of scaling parameters, use the Solace online System Resource Calculator to determine the required storage size.
The broker pods can use following storage options:
- Dynamically allocated storage from a Kubernetes Storage Class (default)
- Static storage using a Persistent Volume Claim linked to a Persistent Volume
- Ephemeral storage
Note: Ephemeral storage is generally not recommended. It might be acceptable for temporary deployments understanding that all configuration and messages are lost with the loss of the broker pod.
The recommended default allocation is using Kubernetes Dynamic Volume Provisioning utilizing Storage Classes.
The StatefulSet controlling a broker pod creates a Persistent Volume Claim (PVC) specifying the requested size and Storage Class of the volume; a Persistent Volume (PV) that meets these requirements is allocated from the storage class pool. Both the PVC and PV names are linked to the broker pod's name. If you delete the event broker pod(s) or even the entire deployment, the PVC and the allocated PV are not deleted, so potentially complex configuration is preserved. The PVC and PV are re-mounted and reused with the existing configuration when a new pod starts (controlled by the StatefulSet, automatically matched to the old pod even in an HA deployment) or when a deployment with the same name as the old one is started. Explicitly delete a PVC if you no longer need it—this deletes the corresponding PV. For more information, see Deleting a Deployment.
Example:
spec:
storage:
messagingNodeStorageSize: 30Gi
monitorNodeStorageSize: 3Gi
# dynamic allocation
useStorageClass: standard
For message processing brokers (this includes the single broker in a non-HA deployment), the requested storage size is set using the spec.storage.messagingNodeStorageSize parameter. If not specified, the default value of 30Gi is used. If the storage size is set to 0, then useStorageClass is disregarded and pod-local ephemeral storage is used.
When deploying PubSub+ in an HA redundancy group, monitoring broker nodes have minimal storage requirements compared to working nodes. It is recommended to leave the spec.storage.monitorNodeStorageSize parameter unspecified or at its default. Although monitoring nodes will work with zero persistent (ephemeral) storage, it is recommended to allocate the minimum so diagnostic information remains available after the loss of the monitoring pod.
Set the spec.storage.useStorageClass parameter to use a particular storage class, or leave this parameter undefined (the default) to allocate from your platform's "default" storage class - ensure it exists.
# Check existing storage classes
kubectl get storageclass
Create a new storage class if no existing storage class meets your needs, and then specify that storage class in the broker spec. Refer to your Kubernetes environment's documentation if a StorageClass needs to be created, or to understand the differences if there are multiple options. Example:
# AWS fast storage class example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
type: io1
fsType: xfs
If using NFS, or generally if allocating from a defined Kubernetes Persistent Volume, specify a storageClassName in the PV manifest as in this NFS example, then set the spec.storage.useStorageClass parameter to the same value:
# Persistent Volume example
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0003
spec:
storageClassName: my-nfs
capacity:
storage: 15Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
mountOptions:
- hard
- nfsvers=4.1
nfs:
path: /tmp
server: 172.17.0.2
Note: NFS is currently supported for development and demo purposes. If using NFS, also set the spec.storage.slow parameter to true:
spec:
storage:
messagingNodeStorageSize: 5Gi
useStorageClass: my-nfs
slow: true
You can use an existing PVC with its associated PV for storage, but take into account that the deployment will try to use any existing, potentially incompatible, configuration data on that volume. The PV size must also meet the broker scaling requirements.
In an HA deployment, PVCs must be assigned individually to the brokers. In a non-HA deployment, assign a PVC to the Primary.
spec:
storage:
customVolumeMount:
- name: Primary
persistentVolumeClaim:
claimName: my-primary-pvc-name
- name: Backup
persistentVolumeClaim:
claimName: my-backup-pvc-name
- name: Monitor
persistentVolumeClaim:
claimName: my-monitor-pvc-name
Note: Whenever an existing PVC is reused, the deployment should keep the same name to keep DNS configurations in sync. An out-of-sync DNS configuration will produce unintended consequences.
The PubSub+ Software Event Broker has been tested to work with Portworx, Ceph, Cinder (Openstack) and vSphere storage for Kubernetes as documented here.
Regarding providers, note that for EKS and GKE, xfs produced the best results during tests. AKS users can opt for Locally Redundant Storage (LRS) redundancy, which produced the best results compared with the other types available on Azure.
Broker services (messaging, management) are available through the service ports of the Broker Service object created as part of the deployment.
Clients can access the service ports directly through a configured standard Kubernetes service type. Alternatively, services can be mapped to Kubernetes Ingress. These options are discussed in detail in the upcoming Using Service Type and Using Ingress sections.
Note: An OpenShift-specific alternative of exposing services through Routes is described in the PubSub+ Openshift Deployment Guide.
Enabling TLS for services is recommended. For details, see Configuring TLS for Services.
Regardless of how services are accessed, the Service object is always used, and it determines when and which broker Pod provides the actual service, as explained in the next section.
The first criterion for a broker Pod to be selected for service is its readiness—if readiness is failing, Kubernetes stops sending traffic to the pod until it passes again.
The second, additional criterion is the pod label being set to active=true.
Both pod readiness and the label are updated periodically (every 5 seconds), triggered by the pod readiness probe. This probe invokes the readiness_check.sh script, which is mounted on the broker container.
The requirements for a broker pod to satisfy both criteria are:
- The broker must be in the Guaranteed Active service state, that is, providing the Guaranteed Messaging Quality-of-Service (QoS) level of event message persistence. If the service level is degraded, even to Direct Messages QoS, this is no longer sufficient.
- Management service must be up at the broker container level at port 8080.
- In an HA deployment, networking must enable the broker pods to communicate with each other at the internal ports using the Service-Discovery service.
- The Kubernetes service account associated with the deployment must have sufficient rights to patch the pod's label when the active event broker is service-ready.
- The broker pods must be able to communicate with the Kubernetes API at kubernetes.default.svc.cluster.local at port $KUBERNETES_SERVICE_PORT. You can find out the address and port by opening an SSH session into the pod.
In summary, a deployment is ready for service requests when there is a broker pod that is running, 1/1
ready, and the pod's label is active=true
. An exposed service port forwards traffic to that active event broker node. Pod readiness and labels can be checked with the command:
kubectl get pods --show-labels
Note: The Operator uses SEMP to determine the active PubSub+ Event Broker. Restarting SEMP will affect the active broker selection.
PubSub+ services can be exposed using one of the following Kubernetes service types by specifying the spec.service.type
parameter:
- LoadBalancer (default) - a load balancer, typically externally accessible depending on the K8s provider.
- NodePort - maps PubSub+ services to a port on a Kubernetes node; external access depends on access to the Kubernetes node.
- ClusterIP - internal access only from within K8s.
To support Internal load balancers, a provider-specific service annotation can be added by defining the spec.service.annotations
parameter.
The spec.service.ports
parameter defines the broker ports/services exposed. It specifies the event broker containerPort
that provides the service and the mapping to the servicePort
where the service can be accessed when using LoadBalancer or ClusterIP - however, there is no control over the port number mapping when using NodePort. By default, most broker service ports are exposed; refer to the "pubsubpluseventbrokers" Custom Resource Definition.
Example:
spec:
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-internal: "true"
ports:
- servicePort: 55555
containerPort: 55555
protocol: TCP
name: tcp-smf
- ...
The default broker deployment does not have TLS over TCP enabled for access to broker services. Although the exposed spec.service.ports include ports for secured TCP, only the insecure ports can be used by default.
To enable access to services over TLS, a server key and certificate must be configured.
It is assumed that you will use a third-party provider to create a server key and certificate for the event broker. The key and certificate must meet the requirements described in the Solace Documentation. If the server key is password protected it must be transformed to an unencrypted key, e.g.: openssl rsa -in encrypted-private.key -out unencrypted.key.
The server key and certificate must be packaged in a Kubernetes secret, for example by creating a TLS secret. Example:
kubectl create secret tls <my-tls-secret> --key="<my-server-key-file>" --cert="<my-certificate-file>"
This secret name and related parameters must be specified in the broker spec:
spec:
tls:
enabled: true
serverTlsConfigSecret: test-tls
certFilename: # optional, default if not provided: tls.crt
certKeyFilename: # optional, default if not provided: tls.key
Note: Ensure that all filenames match those reported when you run
kubectl describe secret <my-tls-secret>
.
Important: It is not possible to enable TLS on an existing deployment that was created without TLS by using the update deployment procedure. In this case, certificates must first be manually loaded and set up on each broker node. After that, it is possible to use an update with a secret specified.
If the server key or certificate must be rotated, the TLS Config Secret must be updated or recreated with the new contents; see the example below. Alternatively, a new secret can be created and the broker spec updated with that secret's name.
If you are reusing an existing TLS secret, the new contents are automatically mounted on the broker containers. The Operator watches the configured secret for changes and automatically initiates a rolling pod restart so the change takes effect. Deleting the existing TLS secret does not result in immediate action, but broker pods will not start if the specified TLS secret does not exist.
Note: A pod restart results in provisioning the server certificate from the secret again, so it reverts back from any other server certificate that might have been provisioned on the broker through another mechanism.
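As an illustration, one way to update the existing TLS secret in place, assuming the secret is named my-tls-secret and the rotated key and certificate are in local files:
# Regenerate the secret manifest from the new files and apply it over the existing secret
kubectl create secret tls my-tls-secret --key="new-server.key" --cert="new-server.crt" \
  --dry-run=client -o yaml | kubectl apply -f -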
The LoadBalancer
or NodePort
service types can be used to expose all services from one PubSub+ broker (one-to-one relationship). Ingress can be used to enable efficient external access from a single external IP address to multiple PubSub+ services, potentially provided by multiple brokers.
The following table gives an overview of how external access can be configured for PubSub+ services via Ingress.
PubSub+ service / protocol, configuration and requirements | HTTP, no TLS | HTTPS with TLS terminate at ingress | HTTPS with TLS re-encrypt at ingress | General TCP over TLS with passthrough to broker |
---|---|---|---|---|
Notes: | -- | Requires TLS config on Ingress-controller | Requires TLS config on broker AND TLS config on Ingress-controller | Requires TLS config on broker. Client must use SNI to provide target host |
WebSockets, MQTT over WebSockets | Supported | Supported | Supported | Supported (routing via SNI) |
REST | Supported with restrictions: if publishing to a Queue, only the root path is supported in the Ingress rule, or a rewrite-target annotation must be used. For Topics, the initial path becomes part of the topic name. | Supported, see prev. note | Supported, see prev. note | Supported (routing via SNI) |
SEMP | Not recommended to expose management services without TLS | Supported with restrictions: (1) Only root path is supported in Ingress rule or must use rewrite target annotation; (2) Non-TLS access to SEMP must be enabled on broker | Supported with restrictions: only root path is supported in Ingress rule or must use rewrite target annotation | Supported (routing via SNI) |
SMF, SMF compressed, AMQP, MQTT | - | - | - | Supported (routing via SNI) |
SSH* | - | - | - | - |
*SSH has been listed here for completeness only, external exposure not recommended.
All examples assume NGINX is used as the ingress controller (documented here), selected because NGINX is supported by most K8s providers. For other ingress controllers, refer to their respective documentation.
To deploy the NGINX Ingress Controller, refer to the Quick start in the NGINX documentation. After a successful deployment, get the ingress External-IP or FQDN with the following command:
kubectl get service ingress-nginx-controller --namespace=ingress-nginx
This is the IP address (or the IP address the FQDN resolves to) of the ingress, where external clients must target their requests. Any additional DNS-resolvable hostnames used for name-based virtual host routing must also be configured to resolve to this IP address. If using TLS, the host certificate Common Name (CN) and/or Subject Alternative Name (SAN) must be configured to match the respective FQDN.
For options to expose multiple services from potentially multiple brokers, review the Types of Ingress from the Kubernetes documentation.
The next examples provide Ingress manifests that can be applied using kubectl apply -f <manifest-yaml>. Then check that an external IP address (the ingress controller external IP) has been assigned to the rule/service and that the host/external IP is ready for use, because it can take some time for the address to be populated.
kubectl get ingress
NAME              CLASS   HOSTS           ADDRESS         PORTS   AGE
example.address   nginx   frontend.host   20.120.69.200   80      43m
The following example configures ingress to access PubSub+ REST service. Replace <my-pubsubplus-service>
with the name of the service of your deployment (hint: the service name is similar to your pod names). The port name must match the spec.service.ports name in the broker spec file.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: http-plaintext-example
spec:
ingressClassName: nginx
rules:
- http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: <my-pubsubplus-service>
port:
name: tcp-rest
External requests must be targeted to the ingress External-IP at the HTTP port (80) and the specified path.
In addition to the above, this requires specifying a target virtual DNS-resolvable host (here https-example.foo.com), which resolves to the ingress External-IP, and a tls section. The tls section provides the possible hosts and the corresponding TLS secret that includes a private key and a certificate. The certificate must include the virtual host FQDN in its CN and/or SAN, as described above. Hint: TLS secrets can be easily created from existing files; see the example below.
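For example, the secret referenced by the manifest below could be created from existing key and certificate files (file names are illustrative):
kubectl create secret tls testsecret-tls --key="https-example-foo-com.key" --cert="https-example-foo-com.crt"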
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: https-ingress-terminated-tls-example
spec:
ingressClassName: nginx
tls:
- hosts:
- https-example.foo.com
secretName: testsecret-tls
rules:
- host: https-example.foo.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: <my-pubsubplus-service>
port:
name: tcp-rest
External requests must be targeted to the ingress External-IP through the defined hostname (here https-example.foo.com
) at the TLS port (443) and the specified path.
This only differs from the above in that the request is forwarded to a TLS-encrypted PubSub+ service port. The broker must have TLS configured, but there are no specific requirements for the broker certificate as the ingress does not enforce it.
The difference in the Ingress manifest is an NGINX-specific annotation marking that the backend uses TLS, and the service target port in the last line, which now refers to a TLS backend port:
metadata:
:
annotations:
nginx.ingress.kubernetes.io/backend-protocol: HTTPS
:
spec:
:
rules:
:
port:
name: tls-rest
In this case the ingress does not terminate TLS; it only provides routing to the broker Pod based on the hostname provided in the SNI extension of the Client Hello at TLS connection setup. Because it passes TLS traffic directly through to the broker as opaque data, any TCP-based protocol using TLS as transport is enabled for ingress.
The TLS passthrough capability must be explicitly enabled on the NGINX ingress controller, because it is off by default. This can be done by editing the ingress-nginx-controller
"Deployment" in the ingress-nginx
namespace.
- Open the controller for editing:
kubectl edit deployment ingress-nginx-controller --namespace ingress-nginx
- Search for where the nginx-ingress-controller arguments are provided, insert --enable-ssl-passthrough into the list, and save (see the sketch after this list). For more information, refer to the NGINX User Guide. Also note the potential performance impact of using SSL Passthrough mentioned here.
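A sketch of the relevant fragment of the edited Deployment; the container name and the existing arguments shown are illustrative and will differ between controller versions:
spec:
  template:
    spec:
      containers:
      - name: controller              # container name may differ in your deployment
        args:
        - /nginx-ingress-controller   # existing arguments left unchanged
        - --ingress-class=nginx
        - --enable-ssl-passthrough    # added to enable TLS passthrough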
The Ingress manifest specifies "passthrough" by adding the nginx.ingress.kubernetes.io/ssl-passthrough: "true"
annotation.
The deployed PubSub+ broker(s) must have TLS configured with a certificate that includes DNS names in the CN and/or SAN that match the host used. In the example, the broker server certificate can specify the host *.broker1.bar.com, so multiple services can be exposed from broker1, distinguished by the host FQDN.
The protocol client must support SNI. Whether the client uses the server certificate CN or SAN for host name validation depends on the client. Most recent clients use the SAN; for example, the PubSub+ Java API requires host DNS names in the SAN when using SNI.
With the above in place, an example Ingress manifest looks like the following:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-passthrough-tls-example
annotations:
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
ingressClassName: nginx
rules:
- host: smf.broker1.bar.com
http:
paths:
- backend:
service:
name: <my-pubsubplus-service>
port:
name: tls-smf
path: /
pathType: ImplementationSpecific
External requests must be targeted to the ingress External-IP through the defined hostname (here smf.broker1.bar.com
) at the TLS port (443) with no path required.
The broker spec enables additional configurations that affect each broker pod or the broker containers of the deployment.
Example:
spec:
podLabels:
"my-label": "my-label-value"
podAnnotations:
"my-annotation": "my-annotation-value"
extraEnvVars:
- name: TestType
value: "GithubAction"
extraEnvVarsCM: "my-env-variables-configmap"
extraEnvVarsSecret: "my-env-variables-secret"
timezone: UTC
Additional Pod labels and annotations can be specified using spec.podLabels
and spec.podAnnotations
.
Additional environment variables can be passed to the broker container in the form of:
- a name-value list of variables, using spec.extraEnvVars
- the name of a ConfigMap that contains environment variable names and values, using spec.extraEnvVarsCM
- the name of a Secret that contains environment variable names and values, using spec.extraEnvVarsSecret (see the sketch after this list)
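As an illustration, the ConfigMap and Secret referenced in the example above could be created as follows (variable names and values are placeholders):
kubectl create configmap my-env-variables-configmap --from-literal=MY_ENV_VAR=my-value
kubectl create secret generic my-env-variables-secret --from-literal=MY_SECRET_ENV_VAR=my-secret-value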
One of the primary uses of environment variables is to define configuration keys that are consumed and applied at the initial broker deployment. Note that configuration keys are ignored thereafter, so they won't take effect even if updated later.
Finally, the timezone can be passed to the event broker container.
The default installation of the Operator is optimized for easy deployment and getting started in a wide range of Kubernetes environments, even by developers. Production use requires tighter security. This section provides considerations for that.
The Operator can be configured with a list of namespaces to watch, so it picks up all broker specs created in the watched namespaces and creates deployments there. All other namespaces are ignored.
Watched namespaces can be configured by providing the comma-separated list of namespaces in the WATCH_NAMESPACE
environment variable defined in the container spec of the Deployment which controls the Operator pod. Assigning an empty string (default) means watching all namespaces.
It is recommended to restrict the watched namespaces for Production use. It is generally also recommended not to include the Operator's own namespace in the list, because this makes it easier to separate the RBAC settings of the Operator from those of the broker deployments - see the next section.
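As an illustration, the relevant fragment of the Operator Deployment with watching restricted to two hypothetical namespaces:
spec:
  template:
    spec:
      containers:
      - name: manager
        env:
        - name: WATCH_NAMESPACE
          value: "pubsubplus-ns1,pubsubplus-ns2"  # comma-separated list; an empty string means all namespaces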
The Operator requires CRUD permissions to manage all broker deployment resource types (e.g.: secrets) and the broker spec itself. This is defined in a ClusterRole which is bound to the Operator's service account using a ClusterRoleBinding if using the default Operator deployment. This enables the Operator to manage any of those resource types in all namespaces even if they don't belong to a broker deployment.
In a Production environment, this should be restricted by creating a service account for the Operator in each watched namespace and using a RoleBinding to bind the defined ClusterRole in each.
A broker deployment only needs permission to update pod labels. This is defined in a Role, and a RoleBinding is created to the ServiceAccount used for the deployment. Note that without this permission the deployment will not work.
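For illustration, a minimal sketch of such a Role; the name is hypothetical, and an equivalent Role and RoleBinding are part of the deployment as described above:
# Illustrative sketch of the pod-label-update permission described above
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pubsubplus-pod-label-updater
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "patch"]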
The default deployment of the Operator pulls the Operator image from a public registry. If a Production deployment needs to pull the Operator image from a private registry then the Deployment which controls the Operator pod requires imagePullSecrets
added for that repo:
kind: Deployment
metadata:
name: operator
...
spec:
template:
spec:
imagePullSecrets:
- name: regcred
...
containers:
- name: manager
image: private/controller:latest
...
A management admin user and a monitor user are created at the initial broker deployment, with global access levels "admin" and "read-only", respectively. The passwords are auto-generated if not provided and are stored in Operator-generated secrets.
It is also possible to provide pre-existing secrets containing the respective passwords as in the following example:
spec:
adminCredentialsSecret: my-admin-credentials-secret
monitoringCredentialsSecret: my-monitoring-credentials-secret
The secrets must contain the following data entries: username_admin_password and username_monitor_password, with the respective password contents; see the example below.
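As an illustration, such secrets could be created from literal values as follows (secret names match the example above; passwords are placeholders):
kubectl create secret generic my-admin-credentials-secret \
  --from-literal=username_admin_password='<admin-password>'
kubectl create secret generic my-monitoring-credentials-secret \
  --from-literal=username_monitor_password='<monitor-password>'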
Important: These secrets are used at the initial broker deployment to set up passwords. Changing the secret contents later will not update the passwords in the broker. However, changing the secret contents at a later point is service affecting, because scripts in the broker container itself use the stored passwords to access the broker's own management service. To fix a password discrepancy, log into each broker pod and use the CLI password change to set the password for the user account to the same value as in the secret.
Using secrets for storing passwords, TLS server keys and certificates in the broker deployment namespace follows Kubernetes recommendations.
In a Production environment, additional steps are required to ensure only authorized access to these secrets. It is recommended to follow industry Kubernetes security best practices, including setting tight RBAC permissions for the namespace and hardening the security of the API server's underlying data store (etcd).
The following container-level security context configuration is automatically set by the Operator for the PubSub+ broker container:
capabilities:
drop:
- ALL
privileged: false
runAsNonRoot: true
allowPrivilegeEscalation: false
The following additional settings are configurable using broker spec parameters:
spec:
securityContext:
runAsUser: 1000001
fsGroup: 1000002
The above are generally the defaults if not provided. Note that the Operator detects whether the current Kubernetes environment is OpenShift. In that case, if not provided, runAsUser and fsGroup are left unspecified, because otherwise they would conflict with the OpenShift "restricted" Security Context Constraint settings for a project.
In a controlled environment, it might be necessary to configure a NetworkPolicy to enable the required communication between the broker nodes, as well as between the broker container and the API server so the Pod label can be set.
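For illustration only, the following NetworkPolicy sketch allows broker pods in an HA group to reach each other on the internal ports used by the Discovery Service. The labels and ports are assumptions taken from this guide's examples and must be verified against your deployment; communication with the Kubernetes API server is not covered here.
# Illustrative sketch only - verify labels and ports for your deployment
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pubsubplus-internal
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: pubsubpluseventbroker
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: pubsubpluseventbroker
    ports:
    - port: 8741   # assumed internal/discovery ports, see the Discovery Service example
    - port: 8300
    - port: 8301
    - port: 8302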
Refer to the Prometheus Monitoring Support section for an overview of how metrics are exposed.
This section describes how to enable and configure the metrics exporter, lists the metrics available from the broker deployment, shows how to configure Prometheus to use them, and finally provides an example Grafana setup to visualize broker metrics.
To enable monitoring with all defaults, simply add spec.monitoring.enabled: true
to the broker spec. This sets up a metrics service endpoint through a Prometheus Metrics Service that offers a REST API that responds with broker metrics to GET requests.
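A minimal sketch of the corresponding broker spec fragment:
spec:
  monitoring:
    enabled: true   # all other monitoring parameters remain at their defaults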
The next, more advanced example shows additional configuration: the exporter image is pulled from a private repo using a pull secret, the service type is set to the Kubernetes-internal ClusterIP, and TLS is enabled for the service with the key and certificate contained in the Secret monitoring-tls. The way to create the Secret is the same as for the broker TLS configuration.
spec:
monitoring:
enabled: true
image:
repository: solace/pubsubplus-prometheus-exporter
tag: latest
pullSecrets:
- name: regcred
metricsEndpoint:
listenTLS: true
serviceType: ClusterIP # This is the default, exposes service within Kubernetes only
endpointTlsConfigSecret: monitoring-tls
The broker metrics are exposed through the Prometheus Metrics Service REST API (GET only) at port 9628:
kubectl describe svc <eventbroker-deployment-name>-pubsubplus-prometheus-metrics
There are two sets of metrics exposed through two paths:
- Standard: http://<service-ip>:9628/solace-std
- Additional Details: http://<service-ip>:9628/solace-det
Note: The <service-ip> address is the Kubernetes-internal ClusterIP address of the Prometheus Metrics Service. Use kubectl port-forward svc/<eventbroker-deployment-name>-pubsubplus-prometheus-metrics 9628 to expose it through your localhost for testing.
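For example, with the port-forward above running, the standard metrics can be fetched locally:
curl http://localhost:9628/solace-std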
The following table lists the metrics exposed by the paths:
Path | Definition | Name | Type |
---|---|---|---|
solace-std |
|||
Max number of Local Bridges | solace_bridges_max_num_local_bridges | gauge | |
Max number of Remote Bridges | solace_bridges_max_num_remote_bridges | gauge | |
Max number of Bridges | solace_bridges_max_num_total_bridges | gauge | |
Max total number of Remote Bridge Subscription | solace_bridges_max_num_total_remote_bridge_subscriptions | gauge | |
Number of Local Bridges | solace_bridges_num_local_bridges | gauge | |
Number of Remote Bridges | solace_bridges_num_remote_bridges | gauge | |
Number of Bridges | solace_bridges_num_total_bridges | gauge | |
Total number of Remote Bridge Subscription | solace_bridges_num_total_remote_bridge_subscriptions | gauge | |
Config Sync Ownership (0-Master, 1-Slave, 2-Unknown) | solace_configsync_table_ownership | gauge | |
Config Sync State (0-Down, 1-Up, 2-Unknown, 3-In-Sync, 4-Reconciling, 5-Blocked, 6-Out-Of-Sync) | solace_configsync_table_syncstate | gauge | |
Config Sync Time in State | solace_configsync_table_timeinstateseconds | counter | |
Config Sync Resource (0-Router, 1-Vpn, 2-Unknown, 3-None, 4-All) | solace_configsync_table_type | gauge | |
Average compute latency | solace_system_compute_latency_avg_seconds | gauge | |
Current compute latency | solace_system_compute_latency_cur_seconds | gauge | |
Maximum compute latency | solace_system_compute_latency_max_seconds | gauge | |
Minimum compute latency | solace_system_compute_latency_min_seconds | gauge | |
Average disk latency | solace_system_disk_latency_avg_seconds | gauge | |
Current disk latency | solace_system_disk_latency_cur_seconds | gauge | |
Maximum disk latency | solace_system_disk_latency_max_seconds | gauge | |
Minimum disk latency | solace_system_disk_latency_min_seconds | gauge | |
Average mate link latency | solace_system_mate_link_latency_avg_seconds | gauge | |
Current mate link latency | solace_system_mate_link_latency_cur_seconds | gauge | |
Maximum mate link latency | solace_system_mate_link_latency_max_seconds | gauge | |
Minimum mate link latency | solace_system_mate_link_latency_min_seconds | gauge | |
Redundancy configuration (0-Disabled, 1-Enabled, 2-Shutdown) | solace_system_redundancy_config | gauge | |
Is local node the active messaging node? (0-not active, 1-active). | solace_system_redundancy_local_active | gauge | |
Redundancy role (0=Backup, 1=Primary, 2=Monitor, 3-Undefined). | solace_system_redundancy_role | gauge | |
Is redundancy up? (0=Down, 1=Up). | solace_system_redundancy_up | gauge | |
Total disk usage in percent | solace_system_spool_disk_partition_usage_active_percent | gauge | |
Total disk usage of mate instance in percent | solace_system_spool_disk_partition_usage_mate_percent | gauge | |
Utilization of spool files in percent | solace_system_spool_files_utilization_percent | gauge | |
Spool configured max disk usage | solace_system_spool_quota_bytes | gauge | |
Spool configured max number of messages | solace_system_spool_quota_msgs | gauge | |
Spool total persisted usage | solace_system_spool_usage_bytes | gauge | |
Spool total number of persisted messages | solace_system_spool_usage_msgs | gauge | |
Solace Version as WWWXXXYYYZZZ | solace_system_version_currentload | gauge | |
Broker uptime in seconds | solace_system_version_uptime_totalsecs | gauge | |
Was the last scrape of Solace broker successful? | solace_up | gauge | |
Number of connections | solace_vpn_connections | gauge | |
Total number of AMQP connections | solace_vpn_connections_service_amqp | gauge | |
Total number of SMF connections | solace_vpn_connections_service_smf | gauge | |
VPN is enabled | solace_vpn_enabled | gauge | |
VPN is a management VPN | solace_vpn_is_management_vpn | gauge | |
Local status (0=Down, 1=Up) | solace_vpn_local_status | gauge | |
VPN is locally configured | solace_vpn_locally_configured | gauge | |
VPN is operational | solace_vpn_operational | gauge | |
Maximum number of connections | solace_vpn_quota_connections | gauge | |
Replication Admin Status (0-shutdown, 1-enabled, 2-n/a) | solace_vpn_replication_admin_state | gauge | |
Replication Config Status (0-standby, 1-active, 2-n/a) | solace_vpn_replication_config_state | gauge | |
Replication Tx Replication Mode (0-async, 1-sync) | solace_vpn_replication_transaction_replication_mode | gauge | |
Spool configured max disk usage | solace_vpn_spool_quota_bytes | gauge | |
Spool total persisted usage | solace_vpn_spool_usage_bytes | gauge | |
Spool total number of persisted messages | solace_vpn_spool_usage_msgs | gauge | |
Total unique local subscriptions count | solace_vpn_total_local_unique_subscriptions | gauge | |
Total unique remote subscriptions count | solace_vpn_total_remote_unique_subscriptions | gauge | |
Total unique subscriptions count | solace_vpn_total_unique_subscriptions | gauge | |
Total subscriptions count | solace_vpn_unique_subscriptions | gauge | |
solace-det |
|||
Is client a slow subscriber? (0=not slow, 1=slow) | solace_client_slow_subscriber | gauge | |
Number of clients bound to queue | solace_queue_binds | gauge | |
Number of discarded received messages | solace_client_rx_discarded_msgs_total | counter | |
Number of discarded received messages | solace_vpn_rx_discarded_msgs_total | counter | |
Number of discarded transmitted messages | solace_client_tx_discarded_msgs_total | counter | |
Number of discarded transmitted messages | solace_vpn_tx_discarded_msgs_total | counter | |
Number of received bytes | solace_client_rx_bytes_total | counter | |
Number of received bytes | solace_vpn_rx_bytes_total | counter | |
Number of received messages | solace_client_rx_msgs_total | counter | |
Number of received messages | solace_vpn_rx_msgs_total | counter | |
Number of transmitted bytes | solace_client_tx_bytes_total | counter | |
Number of transmitted bytes | solace_vpn_tx_bytes_total | counter | |
Number of transmitted messages | solace_client_tx_msgs_total | counter | |
Number of transmitted messages | solace_vpn_tx_msgs_total | counter | |
Queue spool configured max disk usage in bytes | solace_queue_spool_quota_bytes | gauge | |
Queue spool total of all spooled messages in bytes | solace_queue_byte_spooled | gauge | |
Queue spool total of all spooled messages | solace_queue_msg_spooled | gauge | |
Queue spool usage in bytes | solace_queue_spool_usage_bytes | gauge | |
Queue spooled number of messages | solace_queue_spool_usage_msgs | gauge | |
Queue total msg redeliveries | solace_queue_msg_redelivered | gauge | |
Queue total msg retransmitted on transport | solace_queue_msg_retransmited | gauge | |
Queue total number of messages delivered to dmq due to exceeded max redelivery | solace_queue_msg_max_redelivered_dmq | gauge | |
Queue total number of messages delivered to dmq due to ttl expiry | solace_queue_msg_ttl_dmq | gauge | |
Queue total number of messages discarded due to exceeded max redelivery | solace_queue_msg_max_redelivered_discarded | gauge | |
Queue total number of messages discarded due to spool shutdown | solace_queue_msg_shutdown_discarded | gauge | |
Queue total number of messages discarded due to ttl expiry | solace_queue_msg_ttl_discarded | gauge | |
Queue total number of messages exceeded the max message size | solace_queue_msg_max_msg_size_exceeded | gauge | |
Queue total number of messages exceeded the spool usage | solace_queue_msg_spool_usage_exceeded | gauge | |
Queue total number of messages failed delivery to dmq due to exceeded max redelivery | solace_queue_msg_max_redelivered_dmq_failed | gauge | |
Queue total number of messages that failed delivery to dmq due to ttl expiry | solace_queue_msg_ttl_dmq_failed | gauge | |
Queue total number that was deleted | solace_queue_msg_total_deleted | gauge | |
Was the last scrape of Solace broker successful | solace_up | gauge |
With the metrics endpoint of the broker deployment enabled and up, it is a matter of configuring Prometheus to add this endpoint to its list of scraped targets. The way to configure Prometheus depends heavily on how it has been deployed, including whether it is inside or outside the Kubernetes cluster.
For reference, this guide shows how to set up a Prometheus deployment, created and managed by the Prometheus Operator. Consult your documentation and adjust the procedure if your Prometheus environment differs.
This section describes the setup of a reference Prometheus stack that includes Prometheus and Grafana (and also other Prometheus components not used here). We use the kube-prometheus project which not only includes the Prometheus Operator, but also Grafana. There are some adjustments/workarounds needed as described below.
Steps:
- Git clone the kube-prometheus project. These steps were tested with the tagged version shown below; later versions might well work too.
git clone --branch v0.12.0 https://github.com/prometheus-operator/kube-prometheus.git
- Follow the kube-prometheus Quickstart steps: https://github.com/prometheus-operator/kube-prometheus#quickstart . These steps deploy the required operators and create a Prometheus stack in the monitoring namespace.
- Patch the prometheus-k8s ClusterRole to enable access to the event broker metrics service. Run kubectl edit ClusterRole prometheus-k8s and append the following to the rules section, then save:
- apiGroups:
- ""
resources:
- services
- pods
- endpoints
verbs:
- get
- list
- watch
- The datasource for Grafana needs to be adjusted to use the prometheus-operated service from the monitoring namespace. This is configured in the grafana-datasources secret in the same namespace. Run kubectl edit secret grafana-datasources -n monitoring, then replace the data.datasources.yaml section as follows, then save:
data:
datasources.yaml: ewogICAgImFwaVZlcnNpb24iOiAxLAogICAgImRhdGFzb3VyY2VzIjogWwogICAgICAgIHsKICAgICAgICAgICAgImFjY2VzcyI6ICJwcm94eSIsCiAgICAgICAgICAgICJlZGl0YWJsZSI6IGZhbHNlLAogICAgICAgICAgICAibmFtZSI6ICJwcm9tZXRoZXVzIiwKICAgICAgICAgICAgIm9yZ0lkIjogMSwKICAgICAgICAgICAgInR5cGUiOiAicHJvbWV0aGV1cyIsCiAgICAgICAgICAgICJ1cmwiOiAiaHR0cDovL3Byb21ldGhldXMtb3BlcmF0ZWQubW9uaXRvcmluZy5zdmM6OTA5MCIsCiAgICAgICAgICAgICJ2ZXJzaW9uIjogMQogICAgICAgIH0KICAgIF0KfQ==
Note: Because this is data stored in a Kubernetes secret, you must provide Base64-encoded data. Use a Base64 decode tool (for example, base64 -d) to reveal the unencoded content of the data above.
- Restart the pods in the monitoring namespace to pick up the changes:
kubectl delete pods --all -n monitoring
# wait for all pods come back up all ready
kubectl get pods --watch -n monitoring
Now both Prometheus and Grafana are running. Their Web Management UIs are exposed through the services prometheus-k8s at port 9090 and grafana at port 3000 in the monitoring namespace. Because these services are of type ClusterIP, one of the options is to use kubectl port-forwarding to access them:
kubectl port-forward svc/prometheus-k8s 9090 -n monitoring &
kubectl port-forward svc/grafana 3000 -n monitoring &
Point your browser to localhost:9090 for Prometheus and to localhost:3000 for Grafana. An initial login might be required using the credentials admin/admin
.
With the adjustments discussed above, the Prometheus Operator is now watching all namespaces for ServiceMonitor
custom resource objects. A ServiceMonitor
defines which metrics services must be added to the Prometheus targets. It is namespace scoped so it must be added to the namespace where the event broker has been deployed.
Example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: test-monitor
spec:
endpoints:
- interval: 10s
path: /solace-std
port: tcp-metrics
- interval: 10s
path: /solace-det
port: tcp-metrics
jobLabel: pubsubplus-metrics
selector:
matchLabels:
app.kubernetes.io/name: pubsubpluseventbroker
app.kubernetes.io/component: metricsexporter
app.kubernetes.io/instance: <eventbroker-deployment-name>
This adds the deployment's metrics service (by matching labels) to the Prometheus targets. Refresh the Prometheus Status "Targets" in the Prometheus Web Management UI to see the newly added target.
The ServiceMonitor's selector can be adjusted to match all broker deployments in the namespace by removing instance
from the matched labels. Also, multiple endpoints can be listed to obtain the combination of metrics from those Exporter paths.
The ServiceMonitor example above specifies that two target endpoints are scraped to get the combination of all metrics available from the broker deployment. The metrics endpoints can be accessed at the port named tcp-metrics, and the PubSub+ Exporter path, e.g.: /solace-std, is added to the scrape request REST API calls.
In the Grafana Web Management UI, select "Dashboards"->"Import dashboard"->"Upload JSON File". Upload deploy/grafana_example.json
. This opens a sample Grafana dashboard. The following image shows this sample rendered after running some messaging traffic through the broker deployment.
To create or customize your own dashboard refer to the Grafana documentation.
Refer to the Quick Start guide in the root of this repo. It also provides information about deployment prerequisites and tools.
Example:
# Initial deployment
kubectl apply -f <initial-broker-spec>.yaml
# Wait for the deployment to come up ...
You can validate your deployment on the command line. In this example, an HA configuration named ha-example was created using the Quick Start.
prompt:~$ kubectl get statefulsets,services,pods,pvc,pv
NAME READY AGE
statefulset.apps/ha-example-pubsubplus-b 1/1 1h
statefulset.apps/ha-example-pubsubplus-m 1/1 1h
statefulset.apps/ha-example-pubsubplus-p 1/1 1h
NAME                                               TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                                        AGE
service/ha-example-pubsubplus                      LoadBalancer   10.124.2.72     35.238.219.112   2222:31209/TCP,8080:31536/TCP,1943:30396/TCP,51234:31106/TCP,55003:31764/TCP,55443:32625/TCP,55556:30149/TCP,8008:30054/TCP,1443:32480/TCP,9000:31032/TCP,9443:30728/TCP,5672:31944/TCP,5671:30878/TCP,1883:31123/TCP,8883:31873/TCP,8000:31970/TCP,8443:32172/TCP   25h
service/ha-example-pubsubplus-discovery            ClusterIP      None            <none>           8080/TCP,8741/TCP,8300/TCP,8301/TCP,8302/TCP   1h
service/ha-example-pubsubplus-prometheus-metrics   ClusterIP      10.124.15.107   <none>           9628/TCP   1h
service/kubernetes                                 ClusterIP      10.124.0.1      <none>           443/TCP    1h
NAME READY STATUS RESTARTS AGE
pod/ha-example-pubsubplus-b-0 1/1 Running 0 1h
pod/ha-example-pubsubplus-m-0 1/1 Running 0 1h
pod/ha-example-pubsubplus-p-0 1/1 Running 0 1h
pod/ha-example-pubsubplus-prometheus-exporter-5cdfcd64b4-dbl2j 1/1 Running 0 1h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-ha-example-pubsubplus-b-0 Bound pvc-6de2275b-9731-417b-9e54-341dec2ffa40 30Gi RWO standard-rwo 1h
persistentvolumeclaim/data-ha-example-pubsubplus-m-0 Bound pvc-3c1f3799-fa82-45a2-883a-d5ed52637783 3Gi RWO standard-rwo 1h
persistentvolumeclaim/data-ha-example-pubsubplus-p-0 Bound pvc-4d05e27e-007d-4a4f-a08a-93a2c41005c1 30Gi RWO standard-rwo 1h
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-3c1f3799-fa82-45a2-883a-d5ed52637783 3Gi RWO Delete Bound default/data-ha-example-pubsubplus-m-0 standard-rwo 1h
persistentvolume/pvc-4d05e27e-007d-4a4f-a08a-93a2c41005c1 30Gi RWO Delete Bound default/data-ha-example-pubsubplus-p-0 standard-rwo 1h
persistentvolume/pvc-6de2275b-9731-417b-9e54-341dec2ffa40 30Gi RWO Delete Bound default/data-ha-example-pubsubplus-b-0 standard-rwo 1h
prompt:~$ kubectl describe service ha-example-pubsubplus
Name: ha-example-pubsubplus
Namespace: default
Labels: app.kubernetes.io/instance=ha-example
app.kubernetes.io/managed-by=solace-pubsubplus-operator
app.kubernetes.io/name=pubsubpluseventbroker
Annotations: cloud.google.com/neg: {"ingress":true}
lastAppliedConfig/brokerService: 3a87fe83d04ddd7f
Selector: active=true,app.kubernetes.io/instance=ha-example,app.kubernetes.io/name=pubsubpluseventbroker
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.124.2.72
IPs: 10.124.2.72
LoadBalancer Ingress: 35.238.219.112
Port: tcp-ssh 2222/TCP
TargetPort: 2222/TCP
NodePort: tcp-ssh 31209/TCP
Endpoints: 10.120.1.6:2222
Port: tcp-semp 8080/TCP
TargetPort: 8080/TCP
NodePort: tcp-semp 31536/TCP
Endpoints: 10.120.1.6:8080
Port: tls-semp 1943/TCP
TargetPort: 1943/TCP
NodePort: tls-semp 30396/TCP
Endpoints: 10.120.1.6:1943
:
:
There are three StatefulSets controlling each broker node in an HA redundancy group, with naming conventions <deployment-name>-pubsubplus-p
for Primary, ...-b
for Backup and ...-m
for Monitor brokers. Similarly, the broker pods are named <deployment-name>-pubsubplus-p-0
, ...-b-0
and ...-m-0
. In the case of a non-HA deployment, there is a single StatefulSet following the ...-p
naming convention.
Generally, all services including management and messaging are accessible through a Load Balancer. In the above example 35.238.219.112
is the Load Balancer's external Public IP to use.
Note: When using MiniKube or another minimal Kubernetes provider, there may be no integrated Load Balancer available (which is the default service type). As a workaround, either refer to the MiniKube documentation for LoadBalancer access, or use local port forwarding to the service port:
kubectl port-forward service/$BROKER_SERVICE_NAME <target-port-on-localhost>:<service-port-on-load-balancer> &
. Then access the service at localhost:<target-port-on-localhost>
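For example, assuming the ha-example deployment shown above, the broker's management port could be reached locally as follows (the local port choice is arbitrary):

```shell
# Forward local port 8080 to the broker service's management port 8080
kubectl port-forward service/ha-example-pubsubplus 8080:8080 &
# PubSub+ Broker Manager is then available at http://localhost:8080
```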
The PubSub+ Broker Manager is the recommended simplest way to administer the event broker for common tasks.
The default admin username is admin
. A password can be provided encoded in a Kubernetes secret in the broker spec parameter spec.adminCredentialsSecret
. If not provided then a random password is generated at initial deployment and stored in a secret named <eventbroker-deployment-name>-pubsubplus-admin-creds
.
For example, you can create a secret my-admin-secret
containing the password MyP@ssword
before deployment, then pass its name in the broker spec (note that echo -n avoids a trailing newline becoming part of the stored password):
echo -n 'MyP@ssword' | kubectl create secret generic my-admin-secret --from-file=username_admin_password=/dev/stdin
To obtain the admin password from a secret use:
kubectl get secret my-admin-secret -o jsonpath='{.data.username_admin_password}' | base64 -d
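The secret is then referenced by name in the broker spec, along these lines (a minimal sketch; the deployment name is illustrative):

```yaml
apiVersion: pubsubplus.solace.com/v1beta1
kind: PubSubPlusEventBroker
metadata:
  name: ha-example
spec:
  adminCredentialsSecret: my-admin-secret   # name of the pre-created secret
```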
Use the Load Balancer's external Public IP at port 8080 to access management services, including PubSub+ Broker Manager, SolAdmin, and SEMP.
One option to access the event broker's CLI console is to SSH into the broker as the admin
user using the Load Balancer's external Public IP, at port 2222:
prompt:~$ ssh -p 2222 [email protected]
The authenticity of host '[35.238.219.112]:2222 ([35.238.219.112]:2222)' can't be established.
ECDSA key fingerprint is SHA256:iBVfUuHRh7r8stH4fv3CCzv7966UEK/ZfHTh2Yt79No.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[35.238.219.112]:2222' (ECDSA) to the list of known hosts.
Solace PubSub+ Standard
Password:
Solace PubSub+ Standard Version 10.2.1.32
This Solace product is proprietary software of
Solace Corporation. By accessing this Solace product
you are agreeing to the license terms and conditions
located at http://www.solace.com/license-software
Copyright 2004-2022 Solace Corporation. All rights reserved.
To purchase product support, please contact Solace at:
https://solace.com/contact-us/
Operating Mode: Message Routing Node
ha-example-pubsubplus-p-0>
This enables access only to the active Pod's CLI.
In an HA deployment, CLI access might be needed to any of the brokers, not only the active one.
- The simplest option from the Kubernetes command line console is
kubectl exec -it <broker-pod-name> -- cli
- Loopback to SSH directly on the pod
kubectl exec -it <broker-pod-name> -- bash -c "ssh -p 2222 admin@localhost"
- Loopback to SSH on your host with a port-forward map
kubectl port-forward <broker-pod-name> 62222:2222 &
ssh -p 62222 admin@localhost
Port-forwarding can similarly be used to reach the management port (8080) of each individual event broker in the HA deployment:
kubectl port-forward <primary-broker-pod-name> 8081:8080 &
kubectl port-forward <backup-broker-pod-name> 8082:8080 &
kubectl port-forward <monitor-broker-pod-name> 8083:8080 &
For direct shell access to the broker container, use:
kubectl exec -it <broker-pod-name> -- bash
The newly created event broker instance comes with a basic configuration of a default
client username with no authentication on the default
message VPN.
An easy first test is using the PubSub+ Broker Manager's built-in Try-Me tool. Try-Me is based on JavaScript making use of the WebSockets API for messaging at port 8008.
To test data traffic using other supported APIs, visit the Solace Developer Portal APIs & Protocols. Under each option there is a Publish/Subscribe tutorial that will help you get started and provide the specific default port to use.
Use the external Public IP to access the deployment at the port required for the protocol.
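As a quick command-line smoke test, any MQTT client can be used. The following sketch assumes the mosquitto command-line clients are installed locally and that plain-text MQTT on port 1883 is exposed, as in the example service above:

```shell
# Subscribe using the default client username (no password is configured by default)
mosquitto_sub -h 35.238.219.112 -p 1883 -t "try/me" &
# Publish a test message to the same topic
mosquitto_pub -h 35.238.219.112 -p 1883 -t "try/me" -m "Hello PubSub+"
```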
For general Kubernetes troubleshooting guidance, refer to https://kubernetes.io/docs/tasks/debug/
Run kubectl get statefulsets,services,pods,pvc,pv
to get an understanding of the state, then drill down to get more information on a failed resource to reveal possible Kubernetes resourcing issues, e.g.:
kubectl describe pvc <pvc-name>
The Operator, Broker, and Prometheus Exporter pods all provide logs that might be useful to understand issues.
Detailed logs from the currently running container in a pod:
kubectl logs <pod-name> -f # use -f to follow live
It is also possible to get the logs from a previously terminated or failed container:
kubectl logs <pod-name> -p
Filtering on bringup logs (helps with initial troubleshooting):
kubectl logs <pod-name> | grep [.]sh
The Operator uses a zap-based logger. This means that when deploying the Operator to a cluster, you can set additional flags using an args array in the Operator's container spec. One such flag allows the log level to be set.
One can extract the current log level with:
kubectl get deployment <deployment-name> --namespace <namespace> -o=json | jq '.spec.template.spec.containers[0].args'
The log levels available are: --zap-log-level=debug
, --zap-log-level=info
and --zap-log-level=error
.
Other configuration flags that can be added to the args array in the Operator's container spec are:
- --zap-encoder: sets the log encoding. Options are json or console.
- --zap-stacktrace-level: sets the level at and above which stack traces are captured. Options are info or error.
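As an illustration, the relevant fragment of the Operator's container spec could look like the following sketch (the container name is an assumption; check your actual deployment manifest):

```yaml
containers:
- name: manager                      # assumed container name
  args:
  - --zap-log-level=debug
  - --zap-encoder=console
  - --zap-stacktrace-level=error
```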
Note, however, that for OLM deployments a manual update to the Deployment will be reverted, because OLM automatically manages the Operator Deployment. Such changes have to be made with the approved means of patching OLM-managed deployments.
Kubernetes collects all events for a cluster in one pool. This includes events related to the PubSub+ deployment.
It is recommended to watch events when creating or upgrading a Solace deployment. Events clear after about an hour. You can query all available events:
kubectl get events -w # use -w to watch live
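To narrow the event stream down to a specific deployment, the output can be sorted and filtered, for example:

```shell
kubectl get events --sort-by=.lastTimestamp | grep <broker-deployment-name>
```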
If pods stay in pending state and kubectl describe pods
reveals there are not enough memory or CPU resources, check the resource requirements of the targeted scaling tier of your deployment and ensure adequate node resources are available.
Pods might also stay in Pending state because storage requirements cannot be met. Check kubectl get pv,pvc
. PVCs and PVs should be in Bound state; if not, use kubectl describe pvc
to investigate.
Unless otherwise specified, a default storage class must be available for the default PubSub+ deployment configuration.
kubectl get storageclasses
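The default class is marked (default) in the output. If none is marked, one can be designated using the standard Kubernetes annotation (a sketch; replace the class name with one available in your cluster):

```shell
kubectl patch storageclass <storage-class-name> \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```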
Pods stuck in CrashLoopBackOff or Failed state, or Running but never becoming Ready (active), usually indicate an issue with available Kubernetes node resources, with the container OS, or with the event broker process start.
- Try to understand the reason by following the earlier hints in this section.
- Try to recreate the issue by deleting and then reinstalling the deployment (ensure that related PVCs are also removed if applicable, because they would mount volumes containing an existing, possibly outdated or incompatible, database) and watch the logs and events from the beginning. Look for ERROR messages preceded by information that might reveal the issue.
If no pods are listed related to your deployment check the StatefulSets for any clues:
kubectl describe statefulset | grep <broker-deployment-name>
Your Kubernetes environment's security constraints might also impact successful deployment. Review the Security considerations section.
When the Operator is running, it constantly stewards the broker deployment artifacts and intervenes in case of any deviation.
Maintenance mode allows users, in special cases, to "turn off" the Operator's control of a broker deployment. This is done by adding a solace.com/pauseReconcile=true
label to the EventBroker resource:
# Activate maintenance mode by adding the pauseReconcile label
kubectl label eb <broker-deployment-name> solace.com/pauseReconcile=true
# Operator will now ignore this deployment
# ...
# Remove the label to activate Operator's control again
kubectl label eb <broker-deployment-name> solace.com/pauseReconcile-
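To verify whether maintenance mode is currently active for a deployment, check its labels:

```shell
kubectl get eb <broker-deployment-name> --show-labels
```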
Modification of the broker deployment (or update) can be initiated by applying an updated broker spec with modified parameter values. Upgrade is a special modification where the broker's spec.image.repository
and/or spec.image.tag
has been modified.
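For example, an upgrade might change only the image section of the broker spec while leaving all other parameters untouched (the tag shown is illustrative):

```yaml
apiVersion: pubsubplus.solace.com/v1beta1
kind: PubSubPlusEventBroker
metadata:
  name: <broker-deployment-name>
spec:
  image:
    repository: solace/solace-pubsub-standard
    tag: 10.2.1.32        # illustrative target version
  # other existing parameters remain unchanged
```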
Note: There are limitations; some parameters cannot be modified or the updated values will be ignored. For details on these exceptions, see the Update Limitations section.
Applying a modified manifest example:
# Initial deployment
kubectl apply -f <initial-broker-spec>.yaml
# Wait for the deployment to come up ...
#
# Update
kubectl apply -f <modified-broker-spec>.yaml
It is also possible to directly edit the current deployment spec (manifest).
Edit manifest example:
# Initial deployment
kubectl apply -f <initial-broker-spec>.yaml
# Wait for the deployment to come up ...
#
# Update
kubectl edit eventbroker <broker-deployment-name>
# Make changes to parameters then save
By default an update triggers the restart of the broker pods:
- In a non-HA deployment the single broker pod is restarted.
- In an HA deployment the three broker pods of the HA redundancy group are restarted in a rolling update: first the Monitor Broker pod, then the pod hosting the redundancy Standby Broker, and finally the pod hosting the currently Active Broker. When the currently active broker is terminated for update in the final step, an automatic redundancy activity switch happens where the already updated standby takes activity.
Users wishing to manually control the pod restarts can activate manual updates by specifying spec.updateStrategy: manualPodRestart
in the broker spec. In this case the user is responsible for initiating the termination of the individual pods at their discretion. Removing or setting the value back to automatedRolling
reverts to the rolling update mode.
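A minimal sketch of a broker spec fragment that activates manual updates:

```yaml
apiVersion: pubsubplus.solace.com/v1beta1
kind: PubSubPlusEventBroker
metadata:
  name: <broker-deployment-name>
spec:
  updateStrategy: manualPodRestart   # remove or set to automatedRolling to restore rolling updates
```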
The following table lists parameters for which update using Modify Deployment is not supported in the current Operator release.
Parameter | Notes |
---|---|
spec.adminCredentialsSecret | Changing the secret name or contained password does not update the password on the broker, but does result in broker pods getting out of readiness. An additional manual action is required to update the admin password using CLI on each broker. |
spec.monitoringCredentialsSecret | Similarly to adminCredentialsSecret, an additional manual action is required to update the password of the monitor user on each broker. |
spec.preSharedAuthKeySecret | Any updates are ignored; a manual change of the key is required using CLI. |
spec.systemScaling.maxConnections | Scaling up a broker deployment requires two steps: first, update the deployment with the desired target systemScaling; next, manually update scaling using the CLI on each broker. |
spec.systemScaling.maxQueueMessages | As for maxConnections. |
spec.systemScaling.maxSpoolUsage | As for maxConnections, but here storage size might also need to be increased. Follow the specific instructions of your Kubernetes or storage provider to manually expand the volume claims for the PVCs used. |
spec.redundancy | Changing a broker deployment from non-HA to HA, or HA to non-HA, is not supported by simply updating this parameter. |
Important: If you are using ephemeral storage for Monitor nodes, the Monitor node picks up any of the above updates; however, the message routing broker nodes do not. You must manually bring all broker nodes back in sync after a Monitor restart.
You can delete an event broker deployment by deleting the EventBroker manifest:
kubectl delete eventbroker <broker-deployment-name>
# Check what has remained from the deployment
kubectl get statefulsets,services,pods,pvc,pv
# It might take some time for broker resources to delete
This initiates the deletion of the event broker deployment.
Important: PVCs and related PVs associated with the broker's persistent storage are preserved even after deleting the EventBroker manifest because they might contain important broker configuration. They must be deleted manually if required.
As described in the previous section, broker persistent storage is not automatically deleted.
In this case, the deployment can be reinstalled and will continue from the point before the delete eventbroker
command was executed, by running kubectl apply
again with the same deployment name and parameters as the previous run. This includes explicitly providing the same admin password as before.
There are two recommended options to acquire the PubSub+ Event Broker Operator:
- Operator Lifecycle Manager (OLM)
- Command Line direct install
The Operator Lifecycle Manager (OLM) tool can be used to install, update, and manage the lifecycle of community operators available from OperatorHub.
Follow the steps from OperatorHub to set up OLM and to install the PubSub+ Event Broker Operator. Click the Install button to see the detailed instructions.
The default namespace is operators
for operators installed from OperatorHub.
Use the deploy.yaml
from the PubSub+ Event Broker Operator GitHub project. It includes a collection of manifests for all the Kubernetes resources that must be created.
The following example creates a default deployment. Edit the deploy.yaml
before applying to customize options:
# Download manifest for possible edit
wget https://raw.githubusercontent.com/SolaceProducts/pubsubplus-kubernetes-quickstart/main/deploy/deploy.yaml
# Edit manifest as required
# Manifest creates a namespace and all K8s resources for the Operator deployment
kubectl apply -f deploy.yaml
# Wait for deployment to complete
kubectl get pods -n pubsubplus-operator-system --watch
Customization options:
- Operator namespace: replace the default pubsubplus-operator-system.
- Operator image: replace the default solace/solace-pubsub-eventbroker-operator:latest.
- Allowed namespaces for broker deployment: replace the default "" value of the WATCH_NAMESPACE env variable (see the sketch after this list). The default "" means all namespaces; otherwise provide the name of a single namespace or a comma-separated list of namespaces.
- ImagePullSecret used when pulling the operator image from a private repo: use or replace the name regcred. In this case the Operator namespace must exist, with the ImagePullSecret created there, before applying deploy.yaml.
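For illustration, the WATCH_NAMESPACE setting in deploy.yaml is an env entry of the Operator container, along the lines of the following sketch (the exact manifest layout may differ between Operator versions):

```yaml
env:
- name: WATCH_NAMESPACE
  value: ""                # "" (default) watches all namespaces; or e.g. "ns-dev,ns-test"
```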
First, check if the PubSubPlusEventBroker
Custom Resource Definition (CRD) is in place. This is a global Kubernetes resource so no namespace is required.
prompt:~$ kubectl get crd pubsubpluseventbrokers.pubsubplus.solace.com
NAME CREATED AT
pubsubpluseventbrokers.pubsubplus.solace.com 2023-02-14T15:24:22Z
Next, check the Operator deployment. The following example assumes that the Operator has been deployed in the pubsubplus-operator-system
namespace; adjust the command for a different namespace.
prompt:~$ kubectl get deployments -n pubsubplus-operator-system
NAME READY UP-TO-DATE AVAILABLE AGE
pubsubplus-eventbroker-operator 1/1 1 1 3h
If the deployment is not ready then check whether the Operator pod is running at all:
kubectl get pods -n pubsubplus-operator-system
Get the description of the deployment and the operator pod and look for any issues:
kubectl describe deployment pubsubplus-eventbroker-operator -n pubsubplus-operator-system
kubectl describe pod pubsubplus-eventbroker-operator-XXX-YYY -n pubsubplus-operator-system
In the Operator Pod description check the WATCH_NAMESPACE
environment variable. The default ""
value means that all namespaces are watched, otherwise the namespaces are listed where the Operator is allowed to create a broker deployment.
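One quick way to check this, assuming the default Operator deployment name and namespace:

```shell
kubectl get deployment pubsubplus-eventbroker-operator -n pubsubplus-operator-system \
  -o jsonpath='{.spec.template.spec.containers[0].env}'
```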
You should also verify that there are adequate RBAC permissions for the Operator; for details, see the Security section.
For additional hints refer to the Broker Troubleshooting section.
A given version of the Operator has a dependency on the PubSubPlusEventBroker Custom Resource Definition (CRD) version it can interpret. The CRD can be viewed as a schema. A new version of a CRD might not be compatible with an older Operator version. Therefore it is generally recommended to use the latest possible version of the Operator. Installing a newer CRD can be expected to be backwards compatible for existing EventBroker resources, but requires an Operator upgrade to at least the same version or later.
You can use OLM to manage installing new versions of the Operator as they become available. The default install of the PubSub+ Event Broker Operator is set to perform automatic updates. This can be changed to Manual
by editing the Operator's Subscription in the operators
namespace.
OLM automatically manages the CRD and Operator updates.
A direct installation requires taking deploy.yaml
from the correctly tagged version of the PubSub+ Event Broker Operator GitHub project, because it includes the corresponding version of the CRD.
Note: Although the goal is to keep the CRD API versions backwards compatible, it might become necessary to introduce a new API version. In that case, detailed upgrade instructions will be provided in the Release Notes.
Existing deployments that were created using the pubsubplus
Helm chart can be ported to Operator control. In-service migration is not supported; broker shutdown is required.
Consider the following:
- The key elements holding the broker configuration and messaging data are the PVs and associated PVCs. They must be assigned to the new deployment. The broker spec allows specifying individual PVCs for each broker in a deployment.
- The Helm deployment used a single StatefulSet model with HA broker pods named
<deployment>-pubsubplus-0
,...-1
and...-2
for Primary, Backup and Monitor broker nodes, respectively. The new Operator deployment model creates a dedicated StatefulSet to each, with pod names<deployment>-pubsubplus-p-0
,...-b-0
and...-m-0
. - The already configured admin password must be provided to the new deployment in a secret, refer to the Users and Passwords section.
- Naming the Operator-managed broker deployment the same as the Helm-based deployment helps to keep the service name (and hence the DNS name) the same, although the external IP address is expected to change if you're using LoadBalancer.
- The TLS secret for broker TLS configuration can be reused.
- Using the existing Helm-based deployment:
- Take note of the
admin
user password. - Follow the documentation to create a global read-only
monitor
user and configure a password. - Take note of the PVCs used. Their naming scheme is
data-<deployment-name>-pubsubplus-0
,...-1
and...-2
for Primary, Backup and Monitor.
- Create secrets for the
admin
andmonitoring
users, respectively. - Shut down the existing deployment using
helm delete <deployment-name>
. This deletes the broker deployment but not the PV/PVCs. - Create a new broker spec. This example shows one for an HA deployment:
apiVersion: pubsubplus.solace.com/v1beta1
kind: PubSubPlusEventBroker
metadata:
  name: <deployment-name> # ensure this matches the original to keep DNS configuration in sync
spec:
  redundancy: true # "false" for non-HA
  image:
    repository: solace/solace-pubsub-standard # ensure this matches the original
    tag: latest # ensure this matches the original
  systemScaling:
    messagingNodeCpu: "2" # ensure this matches the original
    messagingNodeMemory: "3410Mi" # ensure this matches the original
  adminCredentialsSecret: created-admin-credentials-secret
  monitoringCredentialsSecret: created-monitoring-credentials-secret
  tls:
    enabled: true
    serverTlsConfigSecret: existing-tls-secret
  storage:
    customVolumeMount:
    - name: Primary
      persistentVolumeClaim:
        claimName: helm-primary-pvc-name
    - name: Backup
      persistentVolumeClaim:
        claimName: helm-backup-pvc-name
    - name: Monitor
      persistentVolumeClaim:
        claimName: helm-monitor-pvc-name
  # Add any other parameter from the original deployment as required
- Apply the broker spec. This creates a new deployment using the specified resources:
kubectl apply -f new-broker-spec.yaml
No further steps are required for non-HA deployments; simply wait for the deployment to come up as ready.
For HA deployments, wait for the pods to come up as Running. However, they will not become Ready until the redundancy group addresses are updated to reflect the new pod names:
Old broker pod name | New broker pod name |
---|---|
<deployment-name>-pubsubplus-0 | <deployment-name>-pubsubplus-p-0 |
<deployment-name>-pubsubplus-1 | <deployment-name>-pubsubplus-b-0 |
<deployment-name>-pubsubplus-2 | <deployment-name>-pubsubplus-m-0 |
Log into each broker pod and follow the Solace documentation to configure the HA redundancy group connect-via
settings.
Example for the monitor node:
my-pubsubplus-m-0> en
my-pubsubplus-m-0# conf
my-pubsubplus-m-0(configure)# redundancy
my-pubsubplus-m-0(configure/redundancy)# shutdown
my-pubsubplus-m-0(configure/redundancy)# show redundancy group <== Existing config
Node Router-Name Node Type Address Status
----------------- -------------- ---------------- ---------
mypubsubplusha0 Message-Router my-pubsubplus Offline
-0.my-pubsubpl
us-discover
y.default.svc
mypubsubplusha1 Message-Router my-pubsubplus Offline
-1.my-pubsubpl
us-discover
y.default.svc
mypubsubplusha2* Monitor my-pubsubplus Offline
-2.my-pubsubpl
us-discover
y.default.svc
* - indicates the current node
my-pubsubplus-m-0(configure/redundancy)# group
my-pubsubplus-m-0(configure/redundancy/group)# node mypubsubplusha0
my-pubsubplus-m-0(configure/redundancy/group/node)# connect-via my-pubsubplus-p-0.my-pubsubplus-discovery.default.svc
my-pubsubplus-m-0(configure/redundancy/group/node)# exit
my-pubsubplus-m-0(configure/redundancy/group)# node mypubsubplusha1
my-pubsubplus-m-0(configure/redundancy/group/node)# connect-via my-pubsubplus-b-0.my-pubsubplus-discovery.default.svc
my-pubsubplus-m-0(configure/redundancy/group/node)# exit
my-pubsubplus-m-0(configure/redundancy/group)# node mypubsubplusha2
my-pubsubplus-m-0(configure/redundancy/group/node)# connect-via my-pubsubplus-m-0.my-pubsubplus-discovery.default.svc
my-pubsubplus-m-0(configure/redundancy/group/node)# exit
my-pubsubplus-m-0(configure/redundancy/group)# exit
my-pubsubplus-m-0(configure/redundancy)# no shutdown
my-pubsubplus-m-0(configure/redundancy)# show redundancy
Configuration Status : Enabled
Redundancy Status : Down
…
my-pubsubplus-m-0(configure/redundancy)# show redundancy
Configuration Status : Enabled
Redundancy Status : Up