Configure Fluentd
This page shows some examples of configuring Fluentd.
Ways to configure Fluentd
There are two ways to configure the Fluentd StatefulSet:
- Using the spec.fluentd section of the Logging custom resource.
- Using the standalone FluentdConfig CRD. This method is only available in Logging operator version 4.5 and newer, and the specification of the CRD is compatible with the spec.fluentd configuration method. A standalone FluentdConfig also lets you use a multi-tenant model, where tenant owners are responsible for operating their own aggregator, while the Logging resource is in control of the central operations team.
For the detailed list of available parameters, see FluentdSpec.
Migrating from spec.fluentd to FluentdConfig
The standalone FluentdConfig CRD is only available in Logging operator version 4.5 and newer. Its specification and logic are identical to those of the spec.fluentd configuration method. Using the FluentdConfig CRD allows you to remove the spec.fluentd section from the Logging resource, which has the following benefits:
- RBAC control over the FluentdConfig CRD, so you can have separate roles that can manage the Logging resource and the FluentdConfig resource (that is, the Fluentd deployment).
- It reduces the size of the Logging resource, which can grow big enough to reach the annotation size limit in certain scenarios (for example, when using kubectl apply).
- You can use a multi-tenant model, where tenant owners are responsible for operating their own aggregator, while the Logging resource is in control of the central operations team.
To migrate your spec.fluentd configuration from the Logging resource to a separate FluentdConfig CRD, complete the following steps.
- Open your Logging resource and find the spec.fluentd section. For example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging-resource
spec:
  controlNamespace: logging
  fluentd:
    scaling:
      replicas: 2
- Create a new FluentdConfig resource. For the value of metadata.name, use the name of the Logging resource, and for metadata.namespace, use its control namespace, for example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentdConfig
metadata:
  # Use the name of the logging resource
  name: example-logging-resource
  # Use the control namespace of the logging resource
  namespace: logging
- Copy the spec.fluentd section from the Logging resource into the spec section of the FluentdConfig resource, then fix the indentation. For example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentdConfig
metadata:
  # Use the name of the logging resource
  name: example-logging-resource
  # Use the control namespace of the logging resource
  namespace: logging
spec:
  scaling:
    replicas: 2
- Delete the spec.fluentd section from the Logging resource, then apply the Logging and the FluentdConfig resources.
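If you manage your manifests with scripts, the migration steps above can be automated. The following Python sketch works on manifests that are already parsed into dictionaries (loading and dumping YAML is left out). The function name split_fluentd_config is hypothetical and not part of Logging operator.

```python
import copy


def split_fluentd_config(logging_manifest: dict) -> tuple[dict, dict]:
    """Move spec.fluentd from a Logging manifest into a new FluentdConfig manifest.

    The input is not modified; an updated Logging manifest and the new
    FluentdConfig manifest are returned.
    """
    updated = copy.deepcopy(logging_manifest)
    # Delete the spec.fluentd section from the Logging resource.
    fluentd_spec = updated["spec"].pop("fluentd", {})
    fluentd_config = {
        "apiVersion": "logging.banzaicloud.io/v1beta1",
        "kind": "FluentdConfig",
        "metadata": {
            # Use the name of the Logging resource.
            "name": updated["metadata"]["name"],
            # Use the control namespace of the Logging resource.
            "namespace": updated["spec"]["controlNamespace"],
        },
        # The former spec.fluentd section becomes the FluentdConfig spec.
        "spec": fluentd_spec,
    }
    return updated, fluentd_config


logging_manifest = {
    "apiVersion": "logging.banzaicloud.io/v1beta1",
    "kind": "Logging",
    "metadata": {"name": "example-logging-resource"},
    "spec": {"controlNamespace": "logging", "fluentd": {"scaling": {"replicas": 2}}},
}
new_logging, fluentd_config = split_fluentd_config(logging_manifest)
print(fluentd_config["spec"])  # {'scaling': {'replicas': 2}}
```

Apply both returned manifests together, so that the Fluentd configuration is never missing from the cluster.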
Using the standalone FluentdConfig resource
The standalone FluentdConfig is a namespaced resource that allows the configuration of the Fluentd aggregator in the control namespace, separately from the Logging resource. This allows you to use a multi-tenant model, where tenant owners are responsible for operating their own aggregator, while the Logging resource is in control of the central operations team. For more information about the multi-tenancy model where the collector is capable of routing logs based on namespaces to individual aggregators and where aggregators are fully isolated, see this blog post about Multi-tenancy using Logging operator.
A Logging resource can have only one FluentdConfig at a time. The controller registers the active FluentdConfig resource into the Logging resource’s status under fluentdConfigName, and also registers the Logging resource name under logging in the FluentdConfig resource’s status, for example:
kubectl get logging example -o jsonpath='{.status}' | jq .
{
"configCheckResults": {
"ac2d4553": true
},
"fluentdConfigName": "example"
}
kubectl get fluentdconfig example -o jsonpath='{.status}' | jq .
{
"active": true,
"logging": "example"
}
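To check the association from a script, you can compare the two status objects returned by the kubectl commands above. This is a minimal Python sketch; the helper is_associated is hypothetical and only inspects the fluentdConfigName, logging, and active fields shown in the example output.

```python
import json

# Status output copied from the kubectl commands above.
LOGGING_STATUS = '{"configCheckResults": {"ac2d4553": true}, "fluentdConfigName": "example"}'
FLUENTD_CONFIG_STATUS = '{"active": true, "logging": "example"}'


def is_associated(logging_status: str, config_status: str,
                  logging_name: str, config_name: str) -> bool:
    """Return True if the Logging and FluentdConfig statuses reference each other."""
    logging_obj = json.loads(logging_status)
    config_obj = json.loads(config_status)
    return (
        logging_obj.get("fluentdConfigName") == config_name
        and config_obj.get("logging") == logging_name
        and config_obj.get("active") is True
    )


print(is_associated(LOGGING_STATUS, FLUENTD_CONFIG_STATUS, "example", "example"))  # True
```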
If there is a conflict, the controller adds a problem to both resources so that both the operations team and the tenant users can notice the problem. For example, if a FluentdConfig is already registered to a Logging resource and you create another FluentdConfig resource in the same namespace, then the first FluentdConfig is left intact, while the second one should have the following status:
kubectl get fluentdconfig example2 -o jsonpath='{.status}' | jq .
{
"active": false,
"problems": [
"logging already has a detached fluentd configuration, remove excess configuration objects"
],
"problemsCount": 1
}
The Logging resource will also show the issue:
kubectl get logging example -o jsonpath='{.status}' | jq .
{
"configCheckResults": {
"ac2d4553": true
},
"fluentdConfigName": "example",
"problems": [
"multiple fluentd configurations found, couldn't associate it with logging"
],
"problemsCount": 1
}
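The following Python sketch restates the documented conflict behavior as a function, which can be handy when reasoning about test fixtures. It is a simplified, hypothetical model: it does not reproduce the controller's code and omits fields such as configCheckResults. Only the problem messages are copied from the example output above.

```python
LOGGING_PROBLEM = "multiple fluentd configurations found, couldn't associate it with logging"
CONFIG_PROBLEM = "logging already has a detached fluentd configuration, remove excess configuration objects"


def conflict_statuses(logging_name: str, registered: str, configs: list[str]):
    """Model the statuses when `registered` is already attached and `configs` exist in the namespace."""
    # The registered FluentdConfig is left intact.
    config_statuses = {registered: {"active": True, "logging": logging_name}}
    extras = [name for name in configs if name != registered]
    for name in extras:
        # Excess FluentdConfig objects stay inactive and report a problem.
        config_statuses[name] = {"active": False, "problems": [CONFIG_PROBLEM], "problemsCount": 1}
    logging_status = {"fluentdConfigName": registered}
    if extras:
        # The Logging resource reports the conflict too, but keeps the registered config.
        logging_status["problems"] = [LOGGING_PROBLEM]
        logging_status["problemsCount"] = 1
    return logging_status, config_statuses


logging_status, config_statuses = conflict_statuses("example", "example", ["example", "example2"])
print(config_statuses["example2"]["active"])  # False
```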
Custom PVC volume for Fluentd buffers
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    bufferStorageVolume:
      pvc:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 40Gi
          storageClassName: fast
          volumeMode: Filesystem
  fluentbit: {}
  controlNamespace: logging
Custom Fluentd hostPath volume for buffers
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    disablePvc: true
    bufferStorageVolume:
      hostPath:
        path: "" # leave it empty to automatically generate: /opt/logging-operator/default-logging-simple/default-logging-simple-fluentd-buffer
  fluentbit: {}
  controlNamespace: logging
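When path is left empty, the host path is generated from the name of the Logging resource and the name of the buffer volume. The Python sketch below reproduces the path shown in the comment of the example. The <logging name>-fluentd-buffer volume naming is inferred from that single example, so treat it as an assumption; the function names are hypothetical.

```python
import posixpath

HOST_PATH_ROOT = "/opt/logging-operator"


def default_buffer_volume_name(logging_name: str) -> str:
    # Assumption inferred from the example: "<logging name>-fluentd-buffer".
    return f"{logging_name}-fluentd-buffer"


def default_host_path(logging_name: str, volume_name: str) -> str:
    """Build /opt/logging-operator/<name of the logging CR>/<name of the volume>."""
    return posixpath.join(HOST_PATH_ROOT, logging_name, volume_name)


name = "default-logging-simple"
print(default_host_path(name, default_buffer_volume_name(name)))
# /opt/logging-operator/default-logging-simple/default-logging-simple-fluentd-buffer
```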
FluentOutLogrotate
The following snippet redirects Fluentd’s stdout to a file and configures rotation settings.
This mechanism was used before version 4.4 to prevent Fluent Bit from rereading Fluentd’s own logs, which caused an exponentially growing amount of redundant logs.
Example configuration used by the operator in version 4.3 and earlier (keep 10 files, 10M each):
spec:
  fluentd:
    fluentOutLogrotate:
      enabled: true
      path: /fluentd/log/out
      age: 10
      size: 10485760
Fluentd logs are now excluded using the fluentbit.io/exclude: "true" annotation.
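As a quick check of the numbers in the logrotate example: size is given in bytes, so 10485760 is exactly 10 MiB, and keeping 10 files bounds the rotated output at roughly 100 MiB. The sketch below only does this arithmetic; rotation_budget is a hypothetical helper, and it ignores the file currently being written.

```python
MIB = 1024 * 1024


def rotation_budget(size_bytes: int, kept_files: int) -> int:
    """Approximate upper bound for the rotated files: kept_files files of size_bytes each."""
    return size_bytes * kept_files


size = 10485760  # fluentOutLogrotate.size (bytes)
age = 10         # fluentOutLogrotate.age (number of files kept)
print(size // MIB, rotation_budget(size, age) // MIB)  # 10 100
```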
Scaling
You can scale the Fluentd deployment manually by changing the number of replicas in the fluentd section of the Logging custom resource. For example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    scaling:
      replicas: 3
  fluentbit: {}
  controlNamespace: logging
For automatic scaling, see Autoscaling with HPA.
Graceful draining
While you can scale down the Fluentd deployment by decreasing the number of replicas in the fluentd section of the Logging custom resource, the scale-down is not graceful by default: the controller stops the extra replica pods without waiting for the remaining buffers to be flushed. You can enable graceful draining in the scaling subsection:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    scaling:
      drain:
        enabled: true
  fluentbit: {}
  controlNamespace: logging
When graceful draining is enabled, the operator starts drainer jobs for any undrained volumes. The drainer job flushes any remaining buffers before terminating, and the operator marks the associated volume (more precisely, the PVC) as drained until it gets used again. The drainer job has a template very similar to that of the Fluentd deployment, with the addition of a sidecar container that oversees the buffers and signals Fluentd to terminate when all buffers are gone. Pods created by the job are labeled so that they don’t receive any further logs, so the buffers eventually clear out.
If you want, you can specify a custom drainer job sidecar image in the drain subsection:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    scaling:
      drain:
        enabled: true
        image:
          repository: ghcr.io/banzaicloud/fluentd-drain-watch
          tag: latest
  fluentbit: {}
  controlNamespace: logging
In addition to the drainer job, the operator also creates a placeholder pod with the same name as the terminated pod of the Fluentd deployment. This keeps the deployment from recreating that pod, which would result in concurrent access to the volume. The placeholder pod just runs a pause container, and goes away as soon as the job has finished successfully, or when the deployment is scaled back up and explicitly flushing the buffers is no longer necessary because the newly created replica takes care of processing them.
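You can mark volumes that should be ignored by the drain logic by adding the label logging.banzaicloud.io/drain: "no" to the PVC. Quote the value in YAML manifests: an unquoted no is parsed as a boolean, while label values must be strings.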
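The drain decision described above can be summarized as a small predicate. The following Python sketch is a simplified, hypothetical model: the BufferPVC type and its in_use and drained fields stand in for state that the operator tracks internally, and the PVC names are made up. Only the logging.banzaicloud.io/drain label name comes from this page.

```python
from dataclasses import dataclass, field

DRAIN_LABEL = "logging.banzaicloud.io/drain"


@dataclass
class BufferPVC:
    name: str
    in_use: bool             # hypothetical: mounted by a running Fluentd replica
    drained: bool = False    # hypothetical: already marked as drained by the operator
    labels: dict[str, str] = field(default_factory=dict)


def needs_drainer_job(pvc: BufferPVC) -> bool:
    """Simplified model: drain buffer volumes that are unused and not yet drained."""
    if pvc.labels.get(DRAIN_LABEL) == "no":
        return False  # the PVC opted out of the drain logic
    return not pvc.in_use and not pvc.drained


pvcs = [
    BufferPVC("buffers-0", in_use=True),
    BufferPVC("buffers-1", in_use=False),
    BufferPVC("buffers-2", in_use=False, labels={DRAIN_LABEL: "no"}),
]
print([p.name for p in pvcs if needs_drainer_job(p)])  # ['buffers-1']
```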
Autoscaling with HPA
To configure autoscaling of the Fluentd deployment using Horizontal Pod Autoscaler (HPA), complete the following steps.
- Configure the aggregation layer. Many providers already have this configured, including kind.
- Install Prometheus and the Prometheus Adapter if you don’t already have them installed on the cluster. Adjust the default Prometheus address values as needed for your environment (set prometheus.url, prometheus.port, and prometheus.path to the appropriate values).
- (Optional) Install metrics-server to access basic metrics. If the readiness of the metrics-server pod fails with HTTP 500, try adding the --kubelet-insecure-tls flag to the container.
- If you want to use a custom metric for autoscaling Fluentd and the necessary metric is not available in Prometheus, define a Prometheus recording rule:
groups:
  - name: my-logging-hpa.rules
    rules:
      - expr: (node_filesystem_size_bytes{container="buffer-metrics-sidecar",mountpoint="/buffers"}-node_filesystem_free_bytes{container="buffer-metrics-sidecar",mountpoint="/buffers"})/node_filesystem_size_bytes{container="buffer-metrics-sidecar",mountpoint="/buffers"}
        record: buffer_space_usage_ratio
Alternatively, you can define the derived metric as a configuration rule in the Prometheus Adapter’s config map.
- If it’s not already installed, install the logging-operator and configure a logging resource with at least one flow. Make sure that the logging resource has buffer volume metrics monitoring enabled under spec.fluentd:
spec:
  fluentd:
    bufferVolumeMetrics:
      serviceMonitor: true
- Verify that the custom metric is available by running:
kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/buffer_space_usage_ratio'
- The logging-operator enforces the replica count of the StatefulSet based on the logging resource’s replica count, even if it’s not set explicitly. To let the HPA control the replica count of the StatefulSet, this coupling has to be severed. Currently, the only way to do that is by deleting the logging-operator deployment.
- Create an HPA resource. The following example tries to keep the average buffer volume usage of Fluentd instances at 80%.
apiVersion: autoscaling/v2 # use autoscaling/v2beta2 on Kubernetes versions older than 1.23
kind: HorizontalPodAutoscaler
metadata:
  name: logging-fluentd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: logging-fluentd
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: buffer_space_usage_ratio
        target:
          type: AverageValue
          averageValue: 800m
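To see how the HPA example reacts to the recording rule, the following Python sketch computes buffer_space_usage_ratio with the same formula as the recording rule, then applies a simplified version of the Kubernetes HPA replica formula: desired = ceil(current * averageMetric / target), skipped when the ratio is within the default 10% tolerance, and clamped to minReplicas and maxReplicas. It ignores stabilization windows, scaling policies, and pods with missing metrics, so treat it as an approximation rather than the controller's exact behavior. The function names are hypothetical.

```python
import math


def parse_milli_quantity(quantity: str) -> float:
    """Parse a Kubernetes quantity such as "800m" (milli-units) or "2"."""
    # Only plain numbers and the "m" suffix are handled in this sketch.
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def buffer_space_usage_ratio(size_bytes: int, free_bytes: int) -> float:
    """(size - free) / size, as in the buffer_space_usage_ratio recording rule."""
    return (size_bytes - free_bytes) / size_bytes


def desired_replicas(current: int, pod_values: list[float], target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a Pods metric with an AverageValue target."""
    average = sum(pod_values) / len(pod_values)
    ratio = average / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: keep the current replica count
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


target = parse_milli_quantity("800m")  # averageValue from the HPA example
# Two Fluentd replicas whose buffer volumes are 90% full: scale up to 3.
print(desired_replicas(2, [0.9, 0.9], target))
```

Because the HPA scales on the average across pods, a single replica with a nearly full buffer may not trigger a scale-up on its own.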
Probe
A probe is a diagnostic performed periodically by the kubelet on a container. To perform a diagnostic, the kubelet calls a handler implemented by the container. You can configure a liveness probe for Fluentd in the livenessProbe section of the Logging custom resource. For example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    livenessProbe:
      periodSeconds: 60
      initialDelaySeconds: 600
      exec:
        command:
          - "/bin/sh"
          - "-c"
          - >
            LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
            if [ ! -e /buffers ];
            then
              exit 1;
            fi;
            touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
            if [ -z "$(find /buffers -type d -newer /tmp/marker-liveness -print -quit)" ];
            then
              exit 1;
            fi;
  fluentbit: {}
  controlNamespace: logging
You can use the following parameters:
Name | Type | Default | Description |
---|---|---|---|
initialDelaySeconds | int | 600 | Number of seconds after the container has started before liveness probes are initiated. |
timeoutSeconds | int | 0 | Number of seconds after which the probe times out. |
periodSeconds | int | 60 | How often (in seconds) to perform the probe. |
successThreshold | int | 0 | Minimum consecutive successes for the probe to be considered successful after having failed. |
failureThreshold | int | 0 | Minimum consecutive failures for the probe to be considered failed after having succeeded. |
exec | ExecAction | {} | Exec specifies the action to take. |
httpGet | HTTPGetAction | {} | HTTPGet specifies the HTTP request to perform. |
tcpSocket | TCPSocketAction | {} | TCPSocket specifies an action involving a TCP port. |
Note: To configure readiness probes, see Readiness probe.
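The shell script in the liveness probe example reports Fluentd as unhealthy if /buffers is missing, or if no directory under /buffers (including /buffers itself) has been modified within LIVENESS_THRESHOLD_SECONDS (300 seconds by default). The following Python sketch expresses the same logic for readability or local testing; the function name buffers_are_live is hypothetical, and, like find -newer, it compares directory modification times.

```python
import os
import time


def buffers_are_live(buffers: str = "/buffers", threshold_seconds: int = 300) -> bool:
    """Python port of the shell liveness check in the example above."""
    if not os.path.exists(buffers):
        return False  # same as: if [ ! -e /buffers ]; then exit 1; fi
    # Same role as: touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness
    marker = time.time() - threshold_seconds
    # Same role as: find /buffers -type d -newer /tmp/marker-liveness -print -quit
    for directory, _subdirs, _files in os.walk(buffers):
        if os.stat(directory).st_mtime > marker:
            return True
    return False
```

A directory's modification time changes when entries are added to or removed from it, which is why recent buffer chunk activity keeps the probe passing.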
Custom Fluentd image
You can deploy custom images by overriding the default images using the following parameters in the fluentd or fluentbit sections of the logging resource.
Name | Type | Default | Description |
---|---|---|---|
repository | string | "" | Image repository |
tag | string | "" | Image tag |
pullPolicy | string | "" | Image pull policy: Always, IfNotPresent, or Never |
The following example deploys a custom fluentd image:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple
spec:
  fluentd:
    image:
      repository: banzaicloud/fluentd
      tag: v1.10.4-alpine-1
      pullPolicy: IfNotPresent
    configReloaderImage:
      repository: jimmidyson/configmap-reload
      tag: v0.4.0
      pullPolicy: IfNotPresent
    scaling:
      drain:
        image:
          repository: ghcr.io/banzaicloud/fluentd-drain-watch
          tag: v0.0.1
          pullPolicy: IfNotPresent
    bufferVolumeImage:
      repository: quay.io/prometheus/node-exporter
      tag: v1.1.2
      pullPolicy: IfNotPresent
  fluentbit: {}
  controlNamespace: logging
KubernetesStorage
Define Kubernetes storage.
Name | Type | Default | Description |
---|---|---|---|
hostPath | HostPathVolumeSource | - | Represents a host path mapped into a pod. If path is empty, it will automatically be set to /opt/logging-operator/<name of the logging CR>/<name of the volume> |
emptyDir | EmptyDirVolumeSource | - | Represents an empty directory for a pod. |
pvc | PersistentVolumeClaim | - | A PersistentVolumeClaim (PVC) is a request for storage by a user. |
Persistent Volume Claim
Name | Type | Default | Description |
---|---|---|---|
spec | PersistentVolumeClaimSpec | - | Spec defines the desired characteristics of a volume requested by a pod author. |
source | PersistentVolumeClaimVolumeSource | - | PersistentVolumeClaimVolumeSource references the user’s PVC in the same namespace. |
The Persistent Volume Claim should be created with the given spec and with the name defined in the source’s claimName.
CPU and memory requirements
To adjust the CPU and memory limits and requests of the pods managed by Logging operator, see CPU and memory requirements.