Elasticsearch indexing_pressure.memory.limit

When I try to index large log files through bulk_index, I notice some logs go missing. I suspect it could be due to a memory issue on the Elasticsearch nodes, so I tried changing indexing_pressure.memory.limit to 40%, but the issue still persists. FYI, the server has 10GB RAM and runs two Elasticsearch nodes (Docker containers), both with mem_limit set to 6GB. Any advice on how to fix the missing-logs issue?

Welcome to the forum!

Why? Why not just one Elasticsearch node, without any memory overcommit?


Welcome!

How large are the bulk requests? Maybe consider reducing the bulk size?
And yes, as @RainTown noticed, it's pointless to run 2 instances on the same machine unless you are testing something specific...

I’m a bit old school. Docker, containers, virtualization, k8s, all that modern wizardry - lovely stuff. But at the end of the day, a pint pot is still a pint pot. You can wrap it in all the YAML you want, it’s not magically turning into a bucket.

Is this “normal” now? “The server has 10GB RAM and has two Elasticsearch nodes … each with mem_limit set to 6GB.”

My hunch is the “server” may actually be a VM, because clearly what this setup needs is another layer or two. Also, it only dawned on me later that these two instances might be in different Elasticsearch clusters? :grinning_face: Or that there might be a 2GB swap partition on the host.


Actually, I was considering having multiple ES nodes for load balancing, and to ensure high availability through failover and backup strategies.

The bulk size is quite large (e.g. around 2,500,000 documents to be uploaded to Elasticsearch). Instead of bulk indexing the entire ndjson file directly, I split it into chunks (e.g. each ndjson file consists of 500 lines) to prevent OOM.

Sample lines (one document) from the ndjson file:

{"index":{}}
{"message":"10.1.2.3 - - [26/Nov/2025:23:59:59 +0000] \"GET /apps/a.php HTTP/1.1\" 200 10000 \"``https://www.test.com/apps/b.php\``" \"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.7151.103 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\""}

Ya, the server is actually a VM for testing, and for now both nodes are deployed in the same cluster. I'm not sure whether it is “normal” or correct to give each of them a 6GB mem_limit; the goal here is to test how much memory is adequate so that, with “indexing_pressure.memory.limit” set to a certain percentage, the node won't start rejecting new indexing work, and so prevent missing documents.

I was a little flippant yesterday, please forgive me if it was irritating.

A 2-node cluster doesn’t have any more redundancy than a single-node cluster. For the cluster to even boot and form, both nodes need to be present and running. You would probably be better off adding a 3rd, small, voting-only node.

I am slightly confused now about what you are trying to do. Maybe my eye was caught by the 2 x 6GB containers on a 10GB server. Am I right that you are deliberately doing so, simply as a test, in a test environment? You don’t intend such memory overcommit, nor to host cluster nodes on the same server, when you reach production?

A “memory issue” does not explain documents just quietly going missing, so fiddling with settings like `indexing_pressure.memory.limit` is probably a bad idea. If Elasticsearch fails to index a document then it will include details of the error in its response, and if you get a successful response then Elasticsearch guarantees that the document has been durably written to the index.

Rather than just guessing at the problem and a possible solution, you need to look at the responses you’re getting back from Elasticsearch.
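For example, something along these lines will surface per-item failures rather than silently dropping them (a sketch using the Python requests library; the URL, index and payload variable are illustrative):

```python
# Sketch: never assume a bulk request succeeded; read the response body.
# The top-level "errors" flag is true if *any* item failed, and each failed
# item carries its own "status" plus an "error" object describing the reason.
import json
import requests

def send_and_check(body: str, url: str = "http://localhost:9200/weblogs/_bulk"):
    resp = requests.post(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    result = resp.json()
    if result.get("errors"):
        for item in result["items"]:
            op = item.get("index") or item.get("create") or {}
            if op.get("status", 200) >= 300:
                # e.g. 429 = rejected (back off and retry), 400 = bad document/mapping
                print(op.get("status"), json.dumps(op.get("error")))
    return result
```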


I half presumed @ee99 was sometimes getting 429s from the bulk requests? Would be nice to have that confirmed, or corrected of course.

But doing some sort of load or failover or whatever testing on a 2-node cluster, both nodes on the same server, while overcommitting RAM, just feels like a bad idea to me. Like there might be a more basic misunderstanding. If it’s production, even more so.

Speculating, but tweaking some settings here looks a bit like someone asked ChatGPT, or read a blog or another thread here, and is following the suggestion a bit blindly.

It’s unclear from the OP but I don’t normally see folks describe this as mysteriously as “logs missing” nor do they have to do any “suspecting” about the cause of the problem. The solution for 429s is to back off and retry, not to fiddle with settings.
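Roughly what that looks like in client code (a minimal sketch with illustrative numbers; note that rejected items can also show up as per-item 429s inside an otherwise successful response, which is why the response body matters):

```python
# Sketch of "back off and retry" for bulk rejections (429). The delays and
# retry count are illustrative, not a recommendation.
import time
import requests

def bulk_with_retry(url, body, max_retries=5, initial_backoff=1.0):
    delay = initial_backoff
    for _ in range(max_retries + 1):
        resp = requests.post(
            url,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
        )
        if resp.status_code != 429:
            return resp       # success, or a non-retryable error worth inspecting
        time.sleep(delay)     # the node is pushing back: wait before resending
        delay *= 2            # exponential backoff
    raise RuntimeError(f"bulk request still rejected after {max_retries} retries")
```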

Agreed @DavidTurner

Neither I nor @dadoonet has suggested @ee99 fiddle with any settings, nor endorsed the fiddling. He/she is doing some sort of testing, which both I and @dadoonet have enquired about. All we have for now is:

I remain doubtful about the value of this very specific testing, and have already speculated that it may be based on some misunderstanding. Hopefully @ee99 can give more details.

Aside: I, and also others, occasionally semi-ignore the specific question asked if I see something else I think is not ideal. In that scenario, I often ask questions, try to understand, maybe I’ve missed something, poke a little. This is such a case.

An analogy: a guy comes into my garage wanting his slightly flat front tyre checked, but I notice his front brakes are almost completely shot, so I’m going to mention that. If the guy says “yeah, I know, will fix that later” then … well, at least he knows. But plenty of times someone will say “Oh, really, didn’t realize that” or “Yeah, thanks, that could have been really nasty”.

Hi all, thanks for the advice. I admit and realize that it may be an improper way to build/design/deploy it this way. Sorry if I don’t seem to be an expert in configuration and deployment; I’m working with an already-deployed environment, so what I’m trying to do is test on it and adjust the settings. I will try to redeploy a single-node cluster for testing, or explore the voting-only node as suggested.

I really appreciate all the advice here. I also referred to the available documentation to see what else could be done, just to tune and test. I just wanted to see whether it had any effect; I do notice that adjusting indexing_pressure.memory.limit is not recommended, as per Indexing pressure settings | Reference. The reason I tried fiddling with the setting is that I thought indexing might somehow be restricted because the limit had been reached.
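(For reference, I understand the rejection counters can be checked via the node stats API rather than guessed at; a sketch, assuming the indexing_pressure metric of that API, with field names that may differ slightly between versions:)

```python
# Sketch: check whether indexing pressure is actually rejecting work.
# Assumes the indexing_pressure section of the node stats API; the exact
# counter names may vary slightly between versions.
import requests

stats = requests.get("http://localhost:9200/_nodes/stats/indexing_pressure").json()
for node_id, node in stats["nodes"].items():
    mem = node.get("indexing_pressure", {}).get("memory", {})
    print(
        node.get("name"),
        "limit_in_bytes:", mem.get("limit_in_bytes"),
        "primary_rejections:", mem.get("total", {}).get("primary_rejections"),
    )
```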

Hi, I’m not experienced enough, so I just used trial and error with the settings. Sorry about that. I will revert to the original settings so that they won’t affect other components.

To clarify what I observed when bulk indexing large logs: I did notice an abnormal decrease in the index’s “Storage size” shown in Index Management, while the “Documents count” remained the same. Is it possible that the indexed documents could somehow have been overwritten by the newly indexed documents?

Only if you’re specifying the document ID during indexing, and you’re re-using those IDs. The difference is visible in the response from ES so again my advice is to check the response body carefully:

When creating a new doc, note `status: 201`, `result: created` and `_version: 1`:

{ "index": { "_id": "test1", "_index": "testindex", "_primary_term": 1, "_seq_no": 0, "_shards": { "failed": 0, "successful": 2, "total": 2 }, "_version": 1, "result": "created", "status": 201 } } 

When overwriting an existing doc, note `status: 200`, `result: updated` and `_version: 2`:

{ "index": { "_id": "test1", "_index": "testindex", "_primary_term": 1, "_seq_no": 1, "_shards": { "failed": 0, "successful": 2, "total": 2 }, "_version": 2, "result": "updated", "status": 200 } } 

Thanks for the updates. The problem report has taken a sharp turn, but if I now understand correctly by

you actually meant

As @DavidTurner says, the important missing info is what the indexing client saw: did it get any error, a 200, a 201, …? As well as whether you are re-using the same _id field.

Elasticsearch never overwrites documents in place; it creates a new doc with the same _id but a higher _version, and marks the previous one as to-be-deleted (at some later point). This “merging”, a sort of internal tidy-up, can mean storage goes down even if the document count does not change.

To see this, put this into Kibana Dev Tools:

```
DELETE /silly1

PUT /silly1

POST /silly1/_doc/1?refresh=true
{ "field1": "original" }
# That will return a 201

GET /silly1/_doc/1
GET /silly1/_count
#
# there will be one document in the index, with _id=1 and version=1
#

POST /silly1/_doc/1?refresh=true
{ "field1": "changed" }
# That should return a 200

GET /silly1/_doc/1
GET /silly1/_count
#
# there will still be only one document in the index, with _id=1 and version=2
# But likely the version=1 document is still on your disk, marked for later deletion,
# but still occupying disk space
#
```

On the more general point, you wrote:

IMO the most sensible approach at the start of your Elasticsearch journey is to keep the architecture as simple as possible:

  • Start with a single-node cluster

  • Give it dedicated resources (no overcommit)

  • Use the latest version and default settings

  • Learn from doing, while taking note of the documentation

  • Ask here whenever something doesn’t behave as expected and you can’t figure it out from the docs.

Once you’re comfortable with Elasticsearch and confident in your knowledge, then expand to more advanced topics.