
I'm running a Fluent Bit v3.0.2 instance that uses opentelemetry as its input. It filters and modifies logs written by another service and outputs them to an Elasticsearch instance.

I noticed v3.0.2 is quite old, so I updated to v3.0.7 first and then moved on to 3.1, 3.2 and 4.0. Since then I'm not able to flush my logs - at least not all of them. After analyzing my logs in Elasticsearch the cause was obvious:

{"@timestamp":"2025-07-01T15:28:52.278Z", "log.level": "INFO", "message":"Error while parsing document for index [searches-2025-07-01]: [1:15] failed to parse field [@timestamp] of type [date_nanos] in document with id 'm_CaxpcBaxoCCPtY8Oc1'. Preview of field's value: '2025-07-01T15:28:51.18446744073562Z'", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es-cluster-0][write][T#1]","log.logger":"org.elasticsearch.index.mapper.DocumentMapper","elasticsearch.cluster.uuid":"2xBcZWfsS-ua9FgSD4n4Pg","elasticsearch.node.id":"WJyLGxIhQsuhDGug_yWOEg","elasticsearch.node.name":"es-cluster-0","elasticsearch.cluster.name":"tvthek-logs","tags":["[searches-2025-07-01]"],"error.type":"org.elasticsearch.index.mapper.DocumentParsingException","error.message":"[1:15] failed to parse field [@timestamp] of type [date_nanos] in document with id 'm_CaxpcBaxoCCPtY8Oc1'. Preview of field's value: '2025-07-01T15:28:51.18446744073562Z'","error.stack_trace":"...

So it looks like Elasticsearch can't handle the nanosecond precision passed by Fluent Bit.
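For comparison, date_nanos only resolves to full nanoseconds, i.e. at most nine fractional digits, while the rejected value carries fourteen - which is presumably why parsing fails. The second value below is only an illustration of something date_nanos would accept, not one of my actual documents:

"@timestamp": "2025-07-01T15:28:51.18446744073562Z"    <- rejected (14 fractional digits)
"@timestamp": "2025-07-01T15:28:51.184467440Z"         <- would parse (9 fractional digits)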

I then removed the index, updated the index template to make sure it uses date_nanos on @timestamp and restarted again - with only partial success. Some logs still get flushed to Elasticsearch; those usually have only 3-4 digits after the seconds (millisecond precision), just like the messages before the upgrade. Because of that behavior (some messages get flushed, some don't) my Fluent Bit's health check is constantly failing and OpenShift restarts the pod every time - forcing me to stay on 3.0.7 (which is pretty outdated).
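For reference, the relevant part of the index template now looks roughly like this (a minimal sketch; the template name and everything besides the @timestamp mapping are omitted):

{
  "index_patterns": ["searches-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    }
  }
}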

Interestingly, this starts with 3.1, while 3.0.x works fine. Nanosecond precision has been available for years, so I'm not sure where to look for the issue. Here is my fluent-bit.conf:

[INPUT]
    name             opentelemetry
    tag              otel
    tag_from_uri     false
    port             4318

[FILTER]
    Name             parser
    Parser           json
    Match            otel
    Key_Name         message
    Reserve_Data     Off

[FILTER]
    Name             modify
    Match            otel
    Rename           url.path url_path
    Rename           url.query url_query
    Rename           http.response.header.X-App-Searchterm term

# filter /suggest|/search calls (API & Frontend)
[FILTER]
    Name             grep
    Match            otel
    Exclude          url_path \/suggest$
    Exclude          url_path \/search$

# validate
[FILTER]
    Name             expect
    Match            otel
    key_exists       term
    action           warn

# keep only specific keys, remove others (GDPR & disk usage)
[FILTER]
    Name             record_modifier
    Match            otel
    Allowlist_key    url_path
    Allowlist_key    url_query
    Allowlist_key    first_line
    Allowlist_key    term

[OUTPUT]
    Name                 es
    Match                otel
    Host                 <domain>.svc.cluster.local
    Port                 9200
    Index                daily
    # Trace_Error        On
    Replace_Dots         On
    Suppress_Type_Name   On
    compress             gzip
    Logstash_Format      On
    Logstash_Prefix      searches
    Logstash_DateFormat  %F
