
Measure per-request/per-worker memory allocation statistics

Problem

Currently, we scrape the amount of memory allocated across all work being executed. This does not provide the granularity needed to understand per-context memory allocations (per-request or per-worker execution).

This is because globally sampled stats are affected by all threads executing concurrently.
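
To illustrate, the naive measurement we can build today diffs a process-global counter around a request; a minimal sketch using the existing `GC.stat` API (the middleware name and response header are illustrative):

```ruby
# Naive per-request measurement on top of a process-global counter.
# GC.stat(:total_allocated_objects) counts allocations from ALL threads,
# so in a multi-threaded server (e.g. Puma) the delta below also includes
# whatever the other threads allocated concurrently.
class NaiveAllocationTracking
  def initialize(app)
    @app = app
  end

  def call(env)
    before = GC.stat(:total_allocated_objects)
    status, headers, body = @app.call(env)
    delta = GC.stat(:total_allocated_objects) - before # polluted by other threads
    headers['X-Allocated-Objects'] = delta.to_s        # illustrative reporting only
    [status, headers, body]
  end
end
```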

Idea

We should have a predictable way to measure how much memory is allocated by each request/worker during its execution, to understand:

  • GC pressure from heap slot allocations
  • the number of malloc calls (e.g. for Strings)
  • the size of malloc calls (e.g. when processing large blobs of data)
  • allocations per second of execution
  • allocation size per second of execution (the per-second rates are sketched below)
  • (maybe) a histogram of memory allocations
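
For instance, the per-second metrics are straightforward derivations from the raw counters measured over one execution; a worked micro-example with hypothetical numbers and field names:

```ruby
# Illustrative derivation of per-second rates (all values hypothetical).
mem_objects = 241_500    # objects allocated during the request
mem_bytes   = 18_734_592 # bytes passed through malloc during the request
duration_s  = 0.183      # wall-clock duration of the request

allocations_per_second = mem_objects / duration_s # ~1.32M objects/s
bytes_per_second       = mem_bytes / duration_s   # ~102 MB/s
```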

Requirements

  • be thread-safe and measure only within the context of a given execution
  • log all counters so they can easily be scraped using ELK (an illustrative log line follows this list)
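
For example, each request's structured log line could carry the counters as top-level fields so ELK can index and aggregate them without extra parsing (field names and values below are illustrative, not a final schema):

```json
{"method":"GET","path":"/api/v4/projects","duration_s":0.183,"mem_objects":241500,"mem_mallocs":8421,"mem_bytes":18734592}
```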

Solution

  • Extend the Ruby VM (and try to upstream the patch) to provide the ability to measure allocations done in a given thread.
  • Expose these counters as part of our logs so they can be scraped.
  • Analyze the data.
  • Provide dashboards taking into account per-request and per-feature_category information about allocations per unit of execution or per unit of time (a consumption sketch follows this list).
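
A minimal sketch of how the Rails side could consume such a counter, assuming the VM patch exposes a hypothetical `Thread#memory_allocations` method returning raw per-thread counters (the method name and payload shape are assumptions, not a confirmed API):

```ruby
# Hypothetical wrapper around the patched-VM API. Thread#memory_allocations
# (its name and returned keys) is an assumption about what the patch exposes.
module MemoryInstrumentation
  def self.with_measurement(logger)
    return yield unless Thread.current.respond_to?(:memory_allocations)

    start  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    before = Thread.current.memory_allocations # e.g. { mem_objects: ..., mem_bytes: ... }
    result = yield
    after  = Thread.current.memory_allocations
    duration_s = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

    # Per-execution deltas, emitted as structured fields for ELK.
    counters = after.to_h { |key, value| [key, value - before[key]] }
    logger.info(counters.merge(duration_s: duration_s))
    result
  end
end
```

Since the counters live on `Thread.current`, the deltas are unaffected by other threads, which satisfies the thread-safety requirement above.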

What needs to be done?

1. Harder and longer

  • Get the Ruby VM patch merged upstream and wait until we update to Ruby 3.1 or 3.2
  • Update GitLab Rails to use the patch and log the data

2. Likely easier, but requires updating our components (chosen path)

Summary

I decided to take the road of manually patching our stack for the time being, while hopefully getting this merged upstream. Assuming the patch gets merged and we update to Ruby 3.1, we would have this supported out of the box and could simply drop the patch.
