I'm running a Fluent Bit v3.0.2 instance using the opentelemetry input. It filters and modifies logs written by another service and outputs them to an Elasticsearch instance.
I noticed v3.0.2 is quite old, so I first updated to v3.0.7 and from there tried 3.1, 3.2 and 4.0. On each of those newer versions Fluent Bit is no longer able to flush my logs, or at least not all of them. Looking at the Elasticsearch logs made the reason obvious:
{"@timestamp":"2025-07-01T15:28:52.278Z", "log.level": "INFO", "message":"Error while parsing document for index [searches-2025-07-01]: [1:15] failed to parse field [@timestamp] of type [date_nanos] in document with id 'm_CaxpcBaxoCCPtY8Oc1'. Preview of field's value: '2025-07-01T15:28:51.18446744073562Z'", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es-cluster-0][write][T#1]","log.logger":"org.elasticsearch.index.mapper.DocumentMapper","elasticsearch.cluster.uuid":"2xBcZWfsS-ua9FgSD4n4Pg","elasticsearch.node.id":"WJyLGxIhQsuhDGug_yWOEg","elasticsearch.node.name":"es-cluster-0","elasticsearch.cluster.name":"tvthek-logs","tags":["[searches-2025-07-01]"],"error.type":"org.elasticsearch.index.mapper.DocumentParsingException","error.message":"[1:15] failed to parse field [@timestamp] of type [date_nanos] in document with id 'm_CaxpcBaxoCCPtY8Oc1'. Preview of field's value: '2025-07-01T15:28:51.18446744073562Z'","error.stack_trace":"...
So it looks like Elasticsearch can't handle the timestamp Fluent Bit passes: date_nanos supports at most nine fractional digits, but the value above ('2025-07-01T15:28:51.18446744073562Z') carries fourteen. The digits also resemble the leading digits of 2^64 (18446744073709551616), which hints at an unsigned integer wraparound somewhere in the nanosecond handling rather than mere over-precision.
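For reference, the nine-digit limit of date_nanos is easy to confirm in the Dev Tools console (the index name nanos-test is just for illustration):

PUT nanos-test
{"mappings": {"properties": {"@timestamp": {"type": "date_nanos"}}}}

# nine fractional digits: indexed fine
POST nanos-test/_doc
{"@timestamp": "2025-07-01T15:28:51.184467440Z"}

# fourteen fractional digits (the value from the error above): rejected with DocumentParsingException
POST nanos-test/_doc
{"@timestamp": "2025-07-01T15:28:51.18446744073562Z"}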
I then deleted the index, updated the index template to make sure @timestamp is mapped as date_nanos, and restarted, but that only partially helped: some logs still get flushed to Elasticsearch, and those usually carry only 3-4 fractional digits (milliseconds), like the messages did before the upgrade. Because only some messages get flushed, Fluent Bit's healthcheck keeps failing and OpenShift restarts the pod every time, forcing me to stay on 3.0.7 (which is pretty outdated).
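For completeness, the updated template roughly looks like this (abbreviated; the template name is illustrative):

PUT _index_template/searches
{
  "index_patterns": ["searches-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    }
  }
}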
Interestingly, this starts with 3.1, while 3.0.x works fine. Nanosecond precision has been available in Elasticsearch for years, so I'm not sure where to look for the issue. Here is my fluent-bit.conf:
[INPUT]
    name opentelemetry
    tag otel
    tag_from_uri false
    port 4318

[FILTER]
    Name parser
    Parser json
    Match otel
    Key_Name message
    Reserve_Data Off

[FILTER]
    Name modify
    Match otel
    Rename url.path url_path
    Rename url.query url_query
    Rename http.response.header.X-App-Searchterm term

# filter /suggest|/search calls (API & Frontend)
[FILTER]
    Name grep
    Match otel
    Exclude url_path \/suggest$
    Exclude url_path \/search$

# validate
[FILTER]
    Name expect
    Match otel
    key_exists term
    action warn

# keep only specific keys, remove others (GDPR & disk usage)
[FILTER]
    Name record_modifier
    Match otel
    Allowlist_key url_path
    Allowlist_key url_query
    Allowlist_key first_line
    Allowlist_key term

[OUTPUT]
    Name es
    Match otel
    Host <domain>.svc.cluster.local
    Port 9200
    Index daily
    # Trace_Error On
    Replace_Dots On
    Suppress_Type_Name On
    compress gzip
    Logstash_Format On
    Logstash_Prefix searches
    Logstash_DateFormat %F
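In case it is relevant for an answer: I don't override the es output's timestamp options, so Time_Key, Time_Key_Format and Time_Key_Nanos are at their defaults. Spelled out explicitly (the values are the documented defaults, to my understanding), the output section would look like this:

[OUTPUT]
    Name es
    Match otel
    # ... same settings as above ...
    Time_Key @timestamp
    Time_Key_Format %Y-%m-%dT%H:%M:%S
    Time_Key_Nanos Off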