What's new

Version 5.0

The following are the highlights and main changes of Logging operator 5.0. For a complete list of changes and bugfixes, see the Logging operator 5.0 releases page.

Breaking changes

The Sumo Logic filter, the Sumo Logic output, and the enhance_k8s filter are no longer supported, as they are not available in the new Fluentd image (v1.17-5.0). If you want to continue using them, keep using the 4.x version of Logging operator.
The name of the crd subchart changed to logging-operator-crds. This new subchart is also available as an OCI artifact.

Clean up stuck finalizers

When uninstalling Logging operator using Helm, some finalizers may be stuck because Helm uninstalls the resources in a non-deterministic order. You can use the new finalizer-cleanup flag in conjunction with .Values.rbac.retainOnDelete to avoid this problem. When both options are set during uninstall, then:

Helm will retain the operator’s service account, cluster role, and cluster role binding. This is necessary because Helm uninstalls resources in an unpredictable order.
The operator will attempt to free up its managed logging resources by removing finalizers, which would otherwise remain stuck.

For example:

helm install logging-operator ./charts/logging-operator/ --set rbac.retainOnDelete=true --set extraArgs='{"-enable-leader-election=true","-finalizer-cleanup=true"}'

Experimental Telemetry Controller support

You can now use the Telemetry Controller as a log collector agent instead of Fluent Bit. To test this feature, you have to:

Install Logging operator with the enable-telemetry-controller-route flag and the telemetry-controller.install options enabled:

helm install logging-operator ./charts/logging-operator/ --set extraArgs='{"-enable-leader-election=true","-enable-telemetry-controller-route"}' --set telemetry-controller.install=true

Configure the routeConfig option in the Logging resource to specify how the Telemetry Controller resources are created:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tc-config
spec:
  routeConfig:
    enableTelemetryControllerRoute: true # Determines whether to create TC resources for log-collection and routing purposes.
    disableLoggingRoute: false # Determines whether to use Logging-routes for routing purposes and Fluentbit-agents for log-collection.
    tenantLabels: # Will be placed on TC's tenant resources, must be matched with the same field on the Collector resource. (Deployed by the cluster administrator.)
      tenant: logging

NOTE: There is a hands-on example available to try: https://github.com/kube-logging/logging-operator/tree/master/config/samples/telemetry-controller-routing

Improved IPv6 support

Fluent Bit couldn’t listen on an IPv6 HTTP socket because its HTTP_Listen address was hardcoded. The address is now set from the POD_IP, so it works in IPv6 environments as well.
Until now, IPv6-only clusters couldn’t scrape metrics from the Fluentd aggregator. This has been fixed.

rdkafka option support

You can now set rdkafka_options for the rdkafka2 client in the Kafka Fluentd output.

Other Fluent Bit changes

You can now specify whether to pause or drop data when the buffer is full. This helps to make sure we apply backpressure on the input. For details, see storage.pause_on_chunks_overlimit (string, optional).
HotReload pauses all inputs and waits until they finish. However, this can block the reload indefinitely, for example, if an output is down for a longer time. You can now force the reload to happen after a grace period using the forceHotReloadAfterGrace option.

Version 4.11

The following are the highlights and main changes of Logging operator 4.11. For a complete list of changes and bugfixes, see the Logging operator 4.11 releases page.

You can now set the protected flag for SyslogNGClusterOutput kinds.
Charts and images are now signed. To verify the signature, see Image and chart verification.

You can now add annotations and labels to Persistent Volume Claims of the Fluentd StatefulSet. For example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: all-to-file
spec:
  controlNamespace: default
  fluentd:
    bufferStorageVolume:
      pvc:
        labels:
          app: logging
        annotations:
          app: logging
        source:
          claimName: manual
          readOnly: false

You can now set liveness probes to the buffer-metrics sidecar container using the bufferVolumeLivenessProbe option.
IPv6 improvements:
- You can now scrape the metrics of Fluentd on clusters that only have IPv6 addresses.
- Fluent Bit can now listen on IPv6 addresses.
The OpenSearch Fluentd output now supports the remove_keys option.
You can now set the strategy and topologySpreadConstraints in the Logging operator chart.

Version 4.10

The following are the highlights and main changes of Logging operator 4.10. For a complete list of changes and bugfixes, see the Logging operator 4.10 releases page.

You can now control the memory usage of Fluent Bit in persistent buffering mode using the storage.max_chunks_up option.
When using systemdFilters in HostTailers, you can now specify the journal field to use.
The documentation of the Gelf Fluentd output has been improved, and now includes the max_bytes option that can limit the size of the messages.
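
As a rough sketch of the storage.max_chunks_up option mentioned above: the snippet assumes that the option belongs to the bufferStorage section of a FluentbitAgent resource. The resource name and the value of 128 are illustrative placeholders, so check the FluentbitAgent reference before relying on it.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: limited-memory   # hypothetical name
spec:
  bufferStorage:
    # Assumed placement of the option; the value is only an example.
    storage.max_chunks_up: 128
```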

You can now configure image repository overrides in the syslog-ng spec (both in the Logging resource and in the SyslogNGConfig resource):

syslogNGImage:
  repository: ...
  tag: ...
configReloadImage:
  repository: ...
  tag: ...
metricsExporterImage:
  repository: ...
  tag: ...
bufferVolumeMetricsImage:
  repository: ...
  tag: ...

When using the Kafka Fluentd output, you can now set the maximal size of the messages using the max_send_limit_bytes option.
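
A minimal sketch of an Output that uses this option might look like the following. The resource name, the broker address, the topic, and the limit itself are hypothetical placeholders, not recommended values.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: kafka-output   # hypothetical name
spec:
  kafka:
    brokers: kafka-headless.kafka.svc.cluster.local:29092   # hypothetical broker address
    default_topic: topic
    # Example limit only; pick a value that fits the maximum message size of your brokers.
    max_send_limit_bytes: 1000000
    format:
      type: json
```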

Version 4.9

The following are the highlights and main changes of Logging operator 4.9. For a complete list of changes and bugfixes, see the Logging operator 4.9 releases page.

OpenTelemetry output

When using the syslog-ng aggregator, you can now send data directly to an OpenTelemetry endpoint. All metadata and the original log record are available in the body of the log record. Resource attributes will be available in a future release, when we switch to an OpenTelemetry input and receive standard OTLP logs. For details, see OpenTelemetry output.

2024-07-05T09:00:23.407Z	info	LogsExporter	{"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-07-05T09:00:23.407Z	info	ResourceLog #0
Resource SchemaURL:
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-07-05 09:00:23.405798 +0000 UTC
Timestamp: 2024-07-05 09:00:23.406049 +0000 UTC
SeverityText:
SeverityNumber: Info2(10)
Body: Str({"ts":"2024-07-05T09:00:22.424670Z","log":"107.147.239.123 - - [05/Jul/2024:09:00:22 +0000] \"POST /index.html HTTP/1.1\" 200 14184 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3191.0 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2024-07-05T09:00:22.424670292Z","kubernetes":{"pod_name":"log-generator-55867b6d4c-66fdv","namespace_name":"log-generator","pod_id":"682c9ed9-9421-406f-9c7b-cf2b2e62f406","labels":{"app.kubernetes.io/instance":"log-generator","app.kubernetes.io/name":"log-generator","pod-template-hash":"55867b6d4c"},"host":"logging","container_name":"log-generator","docker_id":"dba14a358990c4b6ab82acdf75069952f3b180b3f16dd9527adc7eb11f6d2167","container_hash":"ghcr.io/kube-logging/log-generator@sha256:e26102ef2d28201240fa6825e39efdf90dec0da9fa6b5aea6cf9113c0d3e93aa","container_image":"ghcr.io/kube-logging/log-generator:0.7.0"}})
Trace ID:
Span ID:
Flags: 0
	{"kind": "exporter", "data_type": "logs", "name": "debug"}

Elasticsearch data streams

You can now send messages and metrics to Elasticsearch data streams to store your log and metrics data as time series data. You have to use the syslog-ng aggregator to use this output. For details, see Elasticsearch datastream.

Improved Kafka performance

The Kafka Fluentd output now supports using the rdkafka2 client, which offers higher performance than ruby-kafka. Set use_rdkafka to true to use the rdkafka2 client. (If you’re using a custom Fluentd image, note that rdkafka2 requires v1.16-4.9-full or higher.)
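
For illustration, the relevant part of a Kafka output with the rdkafka2 client enabled could look like this fragment. The broker address and topic are hypothetical placeholders.

```yaml
spec:
  kafka:
    brokers: kafka-headless.kafka.svc.cluster.local:29092   # hypothetical broker address
    default_topic: topic
    # Switch from the default ruby-kafka client to rdkafka2.
    use_rdkafka: true
```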

Containerd compatibility

As many users prefer Containerd instead of Docker as their container runtime, the slight differences between these CRIs are causing some problems. Now you can enable a compatibility layer with the enableDockerParserCompatibilityForCRI option of the logging CRD, for example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: containerd
spec:
  enableDockerParserCompatibilityForCRI: true

This option enables a log parser that is compatible with the docker parser. This has the following benefits:

automatically parses JSON logs using the Merge_Log feature
downstream parsers can use the log field instead of the message field, just like with the docker runtime
the concat and parser filters are automatically set back to use the log field.

Here is a sample log message with the option enabled:

{
  "log": "2024-08-12T14:19:29.672991171Z stderr F [2024/08/12 14:19:29] [ info] [input:tail:tail.0] inotify_fs_add(): inode=2939617 watch_fd=17 name=/var/log/containers/containerd-fluentd-0_default_fluentd-e46f1fcb1b63e7458fc43c079b7455f8e1305e551939ca128361a9574a194ed7.log",
  "kubernetes": {
    "pod_name": "containerd-fluentbit-mchwz",
    "namespace_name": "default",
    "pod_id": "22f86078-26f1-4202-bbc7-1dd5ddce20ec",
    "labels": {
      "app.kubernetes.io/instance": "containerd",
      "app.kubernetes.io/managed-by": "containerd",
      "app.kubernetes.io/name": "fluentbit",
      "controller-revision-hash": "6bb6f7f5f4",
      "pod-template-generation": "2"
    },
    "annotations": {
      "checksum/cri-log-parser.conf": "d902a0ee964e9398e637b581be851cdf50ab2846e82003d2f5e2feef82bef95d",
      "checksum/fluent-bit.conf": "dc9727d915c447b414dc05df2d9a6f23246cdca345309eb3107cc16ae8369b53"
    },
    "host": "logging",
    "container_name": "fluent-bit",
    "docker_id": "5cf032406344fdf41d76ce4489ee6f3ca092e9207ec49ab6209cc2bcf950e593",
    "container_image": "fluent/fluent-bit:3.0.4"
  }
}

Set the severity of PrometheusRules

You can now configure PrometheusRulesOverride in your logging CRDs. The content of PrometheusRulesOverride is identical to the v1.Rule Prometheus rule type. The controller will match overrides by their names with the original rules. All of the override attributes are optional and whenever an attribute is set, it will replace the original attribute.

For example, you can change the severity of a rule like this:

fluentd:
  metrics:
    prometheusRules: true
    prometheusRulesOverride:
    - alert: FluentdPredictedBufferGrowth
      labels:
        rulegroup: fluentd
        service: fluentd
        severity: none

Other changes

Fluent Bit hot reload now reloads imagePullSecrets as well.
From now on, the entire spec.Security.SecurityContext is passed to Fluent Bit.
Kubernetes namespace labels are added to the metadata by default. (The default of the namespace_labels option in the FluentBitAgent CRD is on.)

Version 4.8

The following are the highlights and main changes of Logging operator 4.8. For a complete list of changes and bugfixes, see the Logging operator 4.8 releases page and the Logging operator 4.8 release blog post.

Routing based on namespace labels

In your Fluentd ClusterFlows you can now route your messages based on namespace labels.

Note: This feature requires a new Fluentd image: ghcr.io/kube-logging/fluentd:v1.16-4.8-full. If you’re using a custom Fluentd image, make sure to update it!

If you have enabled namespace labeling in Fluent Bit, you can use namespace labels in your selectors, for example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: dev-logs
spec:
  match:
    - select:
        namespace_labels:
          tenant: dev
  globalOutputRefs:
    - example

Breaking change

If you’re using hostTailer or eventTailer and configured it through the helm chart’s logging.hostTailer or logging.eventTailer option, note that now both components have an enabled flag. Set this flag to true if you used any of these components from the chart. For details, see the pull request.
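
As a sketch, a values file for the chart that keeps both components enabled might contain the fragment below. Only the enabled flags come from the description above; the logging.enabled key is an assumption about the chart, and any other tailer settings you already use would stay alongside them.

```yaml
# values.yaml fragment (sketch)
logging:
  enabled: true       # assumed chart option that creates the Logging resource
  hostTailer:
    enabled: true     # keep the host tailer you configured earlier
  eventTailer:
    enabled: true     # keep the event tailer you configured earlier
```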

Go templates in metrics-probe label values

You can now use Go templates that resolve to destination information (name, namespace, scope:local/global and the logging name) in metrics-probe label values. For example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: SyslogNGClusterFlow
metadata:
  name: all
spec:
  match: {}
  outputMetrics:
    - key: custom_output
      labels:
        flow: all
        # define your own label for output name
        my-key-for-the-output: "{{ .Destination.Name }}"
        # do not add the output_name label to the metric
        output_name: ""
  globalOutputRefs:
    - http

Other changes

You can set the maximal number of TCP connections Fluent Bit can open towards the aggregator to avoid overloading it.

spec:
  controlNamespace: default
  fluentbit:
    # The network configuration below allows Fluent Bit to retry indefinitely on a limited number of connections to avoid overloading the aggregator (syslog-ng in this case)
    network:
      maxWorkerConnections: 2
    syslogng_output:
      Workers: 2
      Retry_Limit: "no_limits"

In the Logging operator helm chart you can include extra manifests to deploy together with the chart using the extraManifests field, similarly to other popular charts.
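
A hedged sketch of how this could look in a values file: it assumes that extraManifests takes a list of complete Kubernetes objects, as in many other charts. The ConfigMap and its contents are purely hypothetical.

```yaml
# values.yaml fragment (sketch; the list-of-objects format is an assumption)
extraManifests:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: extra-config   # hypothetical name
    data:
      example-key: example-value
```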

Version 4.7

The following are the highlights and main changes of Logging operator 4.7. For a complete list of changes and bugfixes, see the Logging operator 4.7 releases page and the Logging operator 4.7 release blog post.

Breaking change for Fluentd

When using the Fluentd aggregator, Logging operator used to override the default chunk_limit_size for the Fluentd disk buffers. Since Fluentd updated the default to a much saner value, Logging operator no longer overrides it, to avoid creating too many small buffer chunks. (Having too many small chunks can lead to too many open files errors.)

This isn’t an intrusive breaking change; it only affects your deployments if you intentionally or accidentally depended on the old value.

JSON output format for Fluentd

In addition to the default text format, Fluentd can now format the output as JSON:

spec:
  fluentd:
    logFormat: json

Disk buffer support for more outputs

Disk buffers couldn’t be enabled for some of the outputs. This has been fixed for: Gelf, Elasticsearch, OpenObserve, S3, Splunk HEC.

Compression support for Elasticsearch

The Elasticsearch output of the Fluentd aggregator now supports compressing the output data using gzip. You can use the compression_level option to set default_compression, best_compression, or best_speed. By default, compression is disabled.
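
For example, an Output with compression enabled could be sketched as follows. The resource name and host are hypothetical placeholders; compression_level accepts one of the three values listed above.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: es-output   # hypothetical name
spec:
  elasticsearch:
    host: elasticsearch.example.svc.cluster.local   # hypothetical host
    port: 9200
    scheme: https
    # One of: default_compression, best_compression, best_speed
    compression_level: best_compression
```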

Protected ClusterOutputs for Fluentd

By default, ClusterOutputs can be referenced in any Flow. In certain scenarios, this means that users can send logs from Flows to the ClusterOutput, possibly spamming the output with user logs. From now on, you can set the protected flag for ClusterOutputs and prevent Flows from sending logs to the protected ClusterOutput.
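
A minimal sketch of a protected ClusterOutput, assuming the flag sits at the top level of the spec. The resource name, the namespace, and the file path are hypothetical; consult the ClusterOutput reference for the exact schema.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: protected-output   # hypothetical name
  namespace: logging       # assumed control namespace
spec:
  # Assumed placement: prevents namespaced Flows from referencing this output.
  protected: true
  file:
    path: /tmp/logs/protected.log   # placeholder path
```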

ConfigCheck settings for aggregators

You can now specify configCheck settings globally in the Logging CRD, and override them if needed on the aggregator level in the Fluentd or SyslogNG CRD.

Limit connections for Fluent Bit

You can now limit the number of TCP connections that each Fluent Bit worker can open toward the aggregator endpoints. The max_worker_connections is set to unlimited by default, and should be used together with the Workers option (which defaults to 2 according to the Fluent Bit documentation). The following example uses a single worker with a single connection:

kind: FluentbitAgent
spec:
  network:
    maxWorkerConnections: 1
  syslogng_output:
    Workers: 1

Version 4.6

The following are the highlights and main changes of Logging operator 4.6. For a complete list of changes and bugfixes, see the Logging operator 4.6 releases page and the Logging operator 4.6 release blog post.

Fluent Bit hot reload

As a Fluent Bit restart can take a long time when there are many files to index, Logging operator now supports hot reload for Fluent Bit to reload its configuration on the fly.

You can enable hot reload using the spec.fluentbit.configHotReload option of the Logging resource (legacy method), or the new spec.configHotReload option of the FluentbitAgent resource:

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: reload-example
spec:
  configHotReload: {}

You can configure the resources and image options:

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: reload-example
spec:
  configHotReload:
    resources: ...
    image:
      repository: ghcr.io/kube-logging/config-reloader
      tag: v0.0.5

Many thanks to @aslafy-z for contributing this feature!

VMware Aria Operations output for Fluentd

When using the Fluentd aggregator with the Logging operator, you can now send your logs to VMware Aria Operations for Logs. This output uses the vmwareLogInsight plugin.

Here is a sample output snippet:

spec:
  vmwareLogInsight:
    scheme: https
    ssl_verify: true
    host: MY_LOGINSIGHT_HOST
    port: 9543
    agent_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    log_text_keys:
      - log
      - msg
      - message
    http_conn_debug: false

Many thanks to @logikone for contributing this feature!

VMware Log Intelligence output for Fluentd

When using the Fluentd aggregator with the Logging operator, you can now send your logs to VMware Log Intelligence. This output uses the vmware_log_intelligence plugin.

Here is a sample output snippet:

spec:
  vmwarelogintelligence:
    endpoint_url: https://data.upgrade.symphony-dev.com/le-mans/v1/streams/ingestion-pipeline-stream
    verify_ssl: true
    http_compress: false
    headers:
      content_type: "application/json"
      authorization:
        valueFrom:
          secretKeyRef:
            name: vmware-log-intelligence-token
            key: authorization
      structure: simple
    buffer:
      chunk_limit_records: 300
      flush_interval: 3s
      retry_max_times: 3

Many thanks to @zrobisho for contributing this feature!

Kubernetes namespace labels and annotations

Logging operator 4.6 supports the new Fluent Bit Kubernetes filter options that will be released in Fluent Bit 3.0. That way you’ll be able to enrich your logs with Kubernetes namespace labels and annotations right at the source of the log messages.

Fluent Bit 3.0 hasn’t been released yet (at the time of this writing), but you can use a developer image to test the feature, using a FluentbitAgent resource like this:

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: namespace-label-test
spec:
  filterKubernetes:
    namespace_annotations: "On"
    namespace_labels: "On"
  image:
    repository: ghcr.io/fluent/fluent-bit
    tag: 3.0.0

Other changes

Enabling ServiceMonitor checks if Prometheus is already available.
You can now use a custom PVC without a template for the statefulset.
You can now configure PodDisruptionBudget for Fluentd.
Event tailer metrics are now automatically exposed.
You can configure timeout-based configuration checks using the logging.configCheck object of the logging-operator chart.
You can now specify the event tailer image to use in the logging-operator chart.
Fluent Bit can now automatically delete irrecoverable chunks.
The Fluentd statefulset and its components created by the Logging operator now include the whole securityContext object.
The Elasticsearch output of the syslog-ng aggregator now supports the template option.
To avoid problems that might occur when a tenant has a faulty output and backpressure kicks in, Logging operator now creates a dedicated tail input for each tenant.

Removed feature

We have removed support for Pod Security Policies (PSPs), which were deprecated in Kubernetes v1.21, and removed from Kubernetes in v1.25.

Note that the API was left intact, but it no longer does anything.

Version 4.5

The following are the highlights and main changes of Logging operator 4.5. For a complete list of changes and bugfixes, see the Logging operator 4.5 releases page.

Standalone FluentdConfig and SyslogNGConfig CRDs

Starting with Logging operator version 4.5, you can either configure Fluentd in the Logging CR, or you can use a standalone FluentdConfig CR. Similarly, you can use a standalone SyslogNGConfig CRD to configure syslog-ng.

These standalone CRDs are namespaced resources that allow you to configure the Fluentd/syslog-ng aggregator in the control namespace, separately from the Logging resource. That way you can use a multi-tenant model, where tenant owners are responsible for operating their own aggregator, while the Logging resource is in control of the central operations team.

For details, see Configure Fluentd and Configure syslog-ng.
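
As a sketch, the smallest standalone Fluentd configuration is an empty FluentdConfig resource created in the control namespace. The name and namespace below are hypothetical, and an empty spec is assumed to mean default settings.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentdConfig
metadata:
  name: fluentd-config   # hypothetical name
  namespace: logging     # must match the control namespace of the Logging resource
spec: {}                 # assumed to fall back to the default Fluentd settings
```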

New syslog-ng features

When using syslog-ng as the log aggregator, you can now:

Send data to OpenObserve
Use a custom date-parser
Create custom log metrics for sources and outputs
Set the permitted SSL versions in HTTP based outputs
Configure the maxConnections parameter of the sources

New Fluentd features

When using Fluentd as the log aggregator, you can now:

Use the useragent Fluent filter
Configure sidecar container in Fluentd pods
Configure the security-context of every container
Set which Azure Cloud to use (for example, AzurePublicCloud), when using the Azure Storage output
Customize the image to use in event and host tailers

Other changes

LoggingStatus now includes the number of problems (problemsCount) and the related watchNamespaces to help troubleshooting.

Image and dependency updates

For the list of images used in Logging operator, see Images used by Logging operator.

Version 4.4

The following are the highlights and main changes of Logging operator 4.4. For a complete list of changes and bugfixes, see the Logging operator 4.4 releases page.

New syslog-ng features

When using syslog-ng as the log aggregator, you can now use the following new outputs:

Elasticsearch
Grafana Loki
MongoDB
Redis
Amazon S3
Splunk HEC
The HTTP output now supports the log-fifo-size, response-action, and timeout fields.

You can now use the metrics-probe() parser of syslog-ng in SyslogNGFlow and SyslogNGClusterFlow. For details, see MetricsProbe.

Multitenancy with namespace-based routing

Logging operator now supports namespace-based routing for efficient aggregator-level multi-tenancy.

In the project repository you can:

find an overview of multitenancy,
find more detailed information about the new LoggingRoute resource that enables this new behaviour,
find a simple example that demonstrates the new behaviour.

On a side note, nodegroup-level isolation for hard multitenancy is also supported. See the Nodegroup-based multitenancy example.

Forwarder logs

Fluent Bit no longer processes the logs of the Fluentd and syslog-ng forwarders by default, to avoid infinitely growing message loops. With this change, you can access Fluentd and syslog-ng logs simply by running: kubectl logs <name-of-forwarder-pod>

In a future Logging operator version the logs of the aggregators will also be available for routing to external outputs.

Timeout-based configuration checks

Timeout-based configuration checks are different from the normal method: they start a Fluentd or syslog-ng instance without the dry-run or syntax-check flags, so output plugins or destination drivers actually try to establish connections and will fail if there are any issues, for example, with the credentials.

Add the following to your Logging resource spec:

spec:
  configCheck:
    strategy: StartWithTimeout
    timeoutSeconds: 5

Istio support

For jobs/individual pods that run to completion, Istio sidecar injection needs to be disabled, otherwise the affected pods would live forever with the running sidecar container. Configuration checkers and Fluentd drainer pods can be configured with the label sidecar.istio.io/inject set to false. You can configure Fluentd drainer labels in the Logging spec.
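
A hedged sketch of what this could look like in the Logging resource. The field paths (configCheck.labels and fluentd.scaling.drain.labels) are assumptions based on the description above, so verify them against the Logging CRD reference.

```yaml
spec:
  configCheck:
    labels:
      # Assumed field: labels added to the configuration check pods.
      sidecar.istio.io/inject: "false"
  fluentd:
    scaling:
      drain:
        enabled: true
        labels:
          # Assumed field: labels added to the Fluentd drainer pods.
          sidecar.istio.io/inject: "false"
```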

Improved buffer metrics

The buffer metrics are now available for both the Fluentd and the SyslogNG based aggregators.

The sidecar configuration has been rewritten to add a new metric and improve performance by avoiding unnecessary cardinality.

The name of the metric has been changed as well, but the original metric was kept in place to avoid breaking existing clients.

Metrics currently supported by the sidecar

Old

# HELP node_buffer_size_bytes Disk space used [deprecated]
# TYPE node_buffer_size_bytes gauge
node_buffer_size_bytes{entity="/buffers"} 32253

New

# HELP logging_buffer_files File count
# TYPE logging_buffer_files gauge
logging_buffer_files{entity="/buffers",host="all-to-file-fluentd-0"} 2
# HELP logging_buffer_size_bytes Disk space used
# TYPE logging_buffer_size_bytes gauge
logging_buffer_size_bytes{entity="/buffers",host="all-to-file-fluentd-0"} 32253

Other improvements

You can now configure the resources of the buffer metrics sidecar.
You can now rerun failed configuration checks if there is no configcheck pod.
The Fluentd Elasticsearch output now supports the composable index template format. To use it, set the use_legacy_template option to false.
The metrics for the syslog-ng forwarder are now exported using axosyslog-metrics-exporter.

Image and dependency updates

For the list of images used in Logging operator, see Images used by Logging operator.

Fluentd images with versions v1.14 and v1.15 are now EOL because they are based on Ruby 2.7, which is also EOL.

The currently supported image is v1.15-ruby3, and a build configuration for v1.15-staging is available for staging experimental changes.

Last modified February 7, 2025: Merge pull request #277 from vfaergestad/docs-dup-line (d5fb6a4)