
Deleted 3.0.0-beta.0 version and installed 4.5.1 on top. sumologic-sumologic-otelcol-logs-collector is stuck in CrashLoopBackOff. #3587

Closed
saymolet opened this issue Mar 4, 2024 · 1 comment
Labels
question Further information is requested

Comments


saymolet commented Mar 4, 2024

Previously, version 3.0.0-beta.0 of the sumologic Helm chart was installed on the cluster. That release was deleted with Helm and a new 4.5.1 release was installed on the same cluster. After the installation, the sumologic-sumologic-otelcol-logs-collector DaemonSet is not able to bring up its pods; they are stuck in the CrashLoopBackOff state. This problem did not occur on any other cluster, only on the one where an older version of sumologic had previously been installed.

~ ❯ kubectl get po -n sumologic
NAME                                                           READY   STATUS             RESTARTS      AGE
sumologic-kube-state-metrics-ddc4bd668-zh77s                   1/1     Running            0             22m
sumologic-opentelemetry-operator-7c75546d6b-6rmcs              2/2     Running            0             22m
sumologic-prometheus-node-exporter-2gxvs                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-84qmk                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-8j6xp                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-fj6wg                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-grctk                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-h62rh                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-p5t9s                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-slmtm                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-vs7rt                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-z4kcj                       1/1     Running            0             22m
sumologic-prometheus-node-exporter-zqpsb                       1/1     Running            0             22m
sumologic-sumologic-metrics-collector-0                        1/1     Running            1 (22m ago)   22m
sumologic-sumologic-metrics-targetallocator-665c9864f8-nbr8f   1/1     Running            0             22m
sumologic-sumologic-otelcol-events-0                           1/1     Running            0             22m
sumologic-sumologic-otelcol-instrumentation-0                  1/1     Running            0             22m
sumologic-sumologic-otelcol-instrumentation-1                  1/1     Running            0             22m
sumologic-sumologic-otelcol-instrumentation-2                  1/1     Running            0             22m
sumologic-sumologic-otelcol-logs-0                             1/1     Running            0             22m
sumologic-sumologic-otelcol-logs-1                             1/1     Running            0             22m
sumologic-sumologic-otelcol-logs-2                             1/1     Running            0             22m
sumologic-sumologic-otelcol-logs-collector-4hr6v               0/1     CrashLoopBackOff   9 (71s ago)   22m
sumologic-sumologic-otelcol-logs-collector-4p56f               0/1     CrashLoopBackOff   9 (54s ago)   22m
sumologic-sumologic-otelcol-logs-collector-7c92p               0/1     CrashLoopBackOff   9 (70s ago)   22m
sumologic-sumologic-otelcol-logs-collector-87l4j               0/1     CrashLoopBackOff   9 (51s ago)   22m
sumologic-sumologic-otelcol-logs-collector-gr2z6               0/1     CrashLoopBackOff   9 (64s ago)   22m
sumologic-sumologic-otelcol-logs-collector-jvx4n               1/1     Running            0             22m
sumologic-sumologic-otelcol-logs-collector-p7jqm               0/1     CrashLoopBackOff   9 (58s ago)   22m
sumologic-sumologic-otelcol-logs-collector-q22ld               0/1     CrashLoopBackOff   9 (82s ago)   22m
sumologic-sumologic-otelcol-logs-collector-qh7wk               0/1     CrashLoopBackOff   9 (62s ago)   22m
sumologic-sumologic-otelcol-logs-collector-sbzzx               0/1     CrashLoopBackOff   9 (65s ago)   22m
sumologic-sumologic-otelcol-logs-collector-t249v               0/1     CrashLoopBackOff   9 (66s ago)   22m
sumologic-sumologic-otelcol-metrics-0                          1/1     Running            0             22m
sumologic-sumologic-otelcol-metrics-1                          1/1     Running            0             22m
sumologic-sumologic-otelcol-metrics-2                          1/1     Running            0             22m
sumologic-sumologic-traces-gateway-5ccdd68b9-k9489             1/1     Running            0             22m
sumologic-sumologic-traces-sampler-788bd6b7bc-728g5            1/1     Running            0             22m

Every pod reports some variation of the same error:

~ ❯ kubectl logs -n sumologic sumologic-sumologic-otelcol-logs-collector-t249v
Defaulted container "otelcol" out of: otelcol, changeowner (init)
2024-03-04T15:57:08.520Z        info    [email protected]/telemetry.go:86 Setting up own telemetry...
2024-03-04T15:57:08.520Z        info    [email protected]/telemetry.go:159        Serving metrics {"address": ":8888", "level": "Basic"}
2024-03-04T15:57:08.521Z        info    [email protected]/processor.go:289      Development component. May change in the future.        {"kind": "processor", "name": "logstransform/systemd", "pipeline": "logs/systemd"}
2024-03-04T15:57:08.522Z        info    [email protected]/service.go:151  Starting otelcol-sumo...        {"Version": "v0.92.0-sumo-0", "NumCPU": 4}
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:34     Starting extensions...
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "file_storage"}
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "file_storage"}
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-03-04T15:57:08.522Z        info    [email protected]/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-03-04T15:57:08.522Z        warn    [email protected]/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks        {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "health_check"}
2024-03-04T15:57:08.522Z        info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "pprof"}
2024-03-04T15:57:08.522Z        info    [email protected]/pprofextension.go:60     Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2024-03-04T15:57:08.523Z        info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "pprof"}
2024-03-04T15:57:08.523Z        info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "journald", "data_type": "logs"}
2024-03-04T15:57:09.524Z        info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "filelog/containers", "data_type": "logs"}
2024-03-04T15:57:09.533Z        info    fileconsumer/file.go:64 Resuming from previously known offset(s). 'start_at' setting is not applicable. {"kind": "receiver", "name": "filelog/containers", "data_type": "logs", "component": "fileconsumer"}
2024-03-04T15:57:09.533Z        info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-03-04T15:57:09.533Z        info    [email protected]/service.go:177  Everything is ready. Begin running and processing data.
panic: assignment to entry in nil map

goroutine 128 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Factory).NewReaderFromMetadata(0x40028962f0, 0x4002d06210, 0x4002e152f0)
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/internal/reader/factory.go:117 +0x720
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).newReader(0x40028962d0, 0x4002d06210, 0x4002d12a68)
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:263 +0x394
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).makeReaders(0x40028962d0, {0x4002dca600, 0x10, 0x0?})
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:234 +0x11c
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).consume(0x40028962d0, {0x7d1f288?, 0x4002c83040}, {0x4002dca600, 0x10, 0x10})
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:168 +0x134
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).poll(0x40028962d0, {0x7d1f288, 0x4002c83040})
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:150 +0x2f0
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).startPoller.func1()
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:118 +0xb0
created by github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).startPoller in goroutine 1
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/fileconsumer/file.go:106 +0xa8
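
The panic fires immediately after "Resuming from previously known offset(s)", which suggests the collector may be tripping over file_storage state that the old release left behind on the nodes (the file-storage volume is a hostPath at /var/lib/otc, per the DaemonSet spec below). Purely as a hypothetical check, not something from the original report, that leftover state could be inspected on an affected node with something like:

~ ❯ kubectl debug node/<node-name> -it --image=public.ecr.aws/docker/library/busybox:1.36.0 -- ls -la /host/var/lib/otc
# kubectl debug mounts the node's root filesystem at /host inside the ephemeral debug pod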

Here are the events of the DaemonSet:

~ ❯ kubectl describe daemonset sumologic-sumologic-otelcol-logs-collector -n sumologic
Name:           sumologic-sumologic-otelcol-logs-collector
Selector:       app.kubernetes.io/name=sumologic-sumologic-otelcol-logs-collector
Node-Selector:  <none>
Labels:         app=sumologic-sumologic-otelcol-logs-collector
                app.kubernetes.io/managed-by=Helm
                chart=sumologic-4.5.1
                heritage=Helm
                release=sumologic
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: sumologic
                meta.helm.sh/release-namespace: sumologic
Desired Number of Nodes Scheduled: 11
Current Number of Nodes Scheduled: 11
Number of Nodes Scheduled with Up-to-date Pods: 11
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status:  11 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/app-name=sumologic-sumologic-otelcol-logs-collector
                    app.kubernetes.io/name=sumologic-sumologic-otelcol-logs-collector
                    chart=sumologic-4.5.1
                    heritage=Helm
                    release=sumologic
  Annotations:      checksum/config: 89d2f067c94e7733a930f1e9b5758d5e093dadc70568c30214141b761f99a63e
  Service Account:  sumologic-sumologic-otelcol-logs-collector
  Init Containers:
   changeowner:
    Image:      public.ecr.aws/docker/library/busybox:1.36.0
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      chown -R \
        0:0 \
        /var/lib/storage/otc
      
    Environment:  <none>
    Mounts:
      /var/lib/storage/otc from file-storage (rw)
  Containers:
   otelcol:
    Image:       public.ecr.aws/sumologic/sumologic-otel-collector:0.92.0-sumo-0
    Ports:       1777/TCP, 8888/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --config=/etc/otelcol/config.yaml
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   32Mi
    Liveness:   http-get http://:13133/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:13133/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      LOGS_METADATA_SVC:  <set to the key 'metadataLogs' of config map 'sumologic-configmap'>  Optional: false
      NAMESPACE:           (v1:metadata.namespace)
    Mounts:
      /etc/otelcol from otelcol-config (rw)
      /var/lib/docker/containers from varlibdockercontainers (ro)
      /var/lib/storage/otc from file-storage (rw)
      /var/log/journal from varlogjournal (ro)
      /var/log/pods from varlogpods (ro)
  Volumes:
   otelcol-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sumologic-sumologic-otelcol-logs-collector
    Optional:  false
   varlogpods:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/pods
    HostPathType:  
   varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:  
   file-storage:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/otc
    HostPathType:  DirectoryOrCreate
   varlogjournal:
    Type:               HostPath (bare host directory volume)
    Path:               /var/log/journal/
    HostPathType:       
  Priority Class Name:  sumologic-sumologic-priorityclass
Events:
  Type     Reason            Age                From                  Message
  ----     ------            ----               ----                  -------
  Warning  FailedCreate      15m (x2 over 15m)  daemonset-controller  Error creating: pods "sumologic-sumologic-otelcol-logs-collector-" is forbidden: no PriorityClass with name sumologic-sumologic-priorityclass was found
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-4p56f
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-t249v
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-sbzzx
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-87l4j
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-qh7wk
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-jvx4n
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-gr2z6
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-p7jqm
  Normal   SuccessfulCreate  15m                daemonset-controller  Created pod: sumologic-sumologic-otelcol-logs-collector-4hr6v
  Normal   SuccessfulCreate  15m (x2 over 15m)  daemonset-controller  (combined from similar events): Created pod: sumologic-sumologic-otelcol-logs-collector-7c92p

A PriorityClass with the name sumologic-sumologic-priorityclass does in fact exist:

~ ❯ kubectl get priorityclass -n sumologic
NAME                                VALUE        GLOBAL-DEFAULT   AGE
sumologic-sumologic-priorityclass   1000000      false            34m
system-cluster-critical             2000000000   false            443d
system-node-critical                2000001000   false            443d

~ ❯ kubectl describe priorityclass sumologic-sumologic-priorityclass -n sumologic
Name:              sumologic-sumologic-priorityclass
Value:             1000000
GlobalDefault:     false
PreemptionPolicy:  PreemptLowerPriority
Description:       This PriorityClass will be used for OTel Distro agents running as Daemonsets
Annotations:       meta.helm.sh/release-name=sumologic,meta.helm.sh/release-namespace=sumologic
Events:            <none>
~ ❯ kubectl get crd
NAME                                         CREATED AT
alertmanagerconfigs.monitoring.coreos.com    2022-12-21T22:16:02Z
alertmanagers.monitoring.coreos.com          2022-12-21T22:16:04Z
certificaterequests.cert-manager.io          2023-11-15T08:15:46Z
certificates.cert-manager.io                 2023-11-15T08:15:46Z
challenges.acme.cert-manager.io              2023-11-15T08:15:46Z
clusterissuers.cert-manager.io               2023-11-15T08:15:46Z
cninodes.vpcresources.k8s.aws                2023-08-15T09:50:43Z
collectorsets.logicmonitor.com               2024-02-22T16:02:56Z
eniconfigs.crd.k8s.amazonaws.com             2022-12-17T11:21:12Z
instrumentations.opentelemetry.io            2024-03-01T14:20:04Z
issuers.cert-manager.io                      2023-11-15T08:15:46Z
opampbridges.opentelemetry.io                2024-03-01T14:20:04Z
opentelemetrycollectors.opentelemetry.io     2024-03-01T14:20:04Z
orders.acme.cert-manager.io                  2023-11-15T08:15:46Z
podmonitors.monitoring.coreos.com            2022-12-21T22:16:05Z
policyendpoints.networking.k8s.aws           2023-09-12T13:36:10Z
probes.monitoring.coreos.com                 2022-12-21T22:16:06Z
prometheuses.monitoring.coreos.com           2022-12-21T22:16:08Z
prometheusrules.monitoring.coreos.com        2022-12-21T22:16:09Z
securitygrouppolicies.vpcresources.k8s.aws   2022-12-17T11:21:14Z
servicemonitors.monitoring.coreos.com        2022-12-21T22:16:09Z
thanosrulers.monitoring.coreos.com           2022-12-21T22:16:11Z

I additionally referred to issue #3397, but unfortunately it did not help, as all of those resources had already been deleted by Helm.
Multiple reinstallations, deletion of the persistent volume claims, and deletion of the namespace did not help either.
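
For completeness, the cleanup between attempts amounts to roughly the following (a sketch only, not the exact commands from the report; it assumes everything lives in the sumologic namespace):

~ ❯ kubectl delete pvc --all -n sumologic
~ ❯ kubectl delete namespace sumologic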

The only values specified in values.yaml are:

sumologic:
  clusterName: "value"
  collectorName: "value"
  logs:
    container:
      sourceCategoryPrefix: "value"
      sourceCategoryReplaceDash: "-"
    systemd:
      sourceCategoryPrefix: "value"
    kubelet:
      sourceCategoryPrefix: "value"
    defaultFluentd:
      sourceCategoryPrefix: "value"
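
For reference, the delete-and-reinstall itself would have looked roughly like this with the values above (the exact commands are not in the report; the release name, namespace, and chart reference are assumptions):

~ ❯ helm delete sumologic -n sumologic
~ ❯ helm install sumologic sumologic/sumologic --version 4.5.1 -n sumologic -f values.yaml
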
saymolet added the question label on Mar 4, 2024
saymolet changed the title from "Deleted 3.0.0 version and installed 4.5.1 on top. sumologic-sumologic-otelcol-logs-collector is stuck in CrashLoopBackOff." to "Deleted 3.0.0-beta.0 version and installed 4.5.1 on top. sumologic-sumologic-otelcol-logs-collector is stuck in CrashLoopBackOff." on Mar 4, 2024

saymolet commented Mar 6, 2024

https://help.sumologic.com/docs/send-data/kubernetes/v3/how-to-upgrade/

This page helped a lot. I think it was mentioned before that Helm does not upgrade CRDs, so you need to do that manually. After upgrading the CRDs and running the Helm upgrade, the release deployed without any problems. I did, however, need to completely uninstall v4.0.0 before installing v4.5.1, as I assume some resources from v4.5.1 were left dangling, so not all pods were healthy. Thank you.
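
In practice the fix boils down to applying the new CRD manifests with kubectl before running the Helm upgrade, since Helm installs CRDs only on first install and never upgrades them. A minimal sketch of that flow (the exact CRD manifests to apply are listed in the linked guide; the placeholder path, release name, and namespace are assumptions):

~ ❯ kubectl apply --server-side -f <CRD manifests from the upgrade guide>
~ ❯ helm upgrade sumologic sumologic/sumologic --version 4.5.1 -n sumologic -f values.yaml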

saymolet closed this as completed on Mar 6, 2024