
We're running MariaDB server 10.6.16 from the RPM repository https://mirror.mariadb.org/yum/10.6/rhel9-amd64/ on an AlmaLinux 9.3 system with 48 GiB of memory (no swap space) and 32 CPU cores. We use InnoDB exclusively for the database tables. The system's filesystems run on NVMe SSDs via Ceph.

Memory usage increases steadily, triggering the OOM killer every now and then.

The formula from https://serverfault.com/a/1020847 returns an upper memory consumption limit of 7500 MiB. I'm aware that this formula is not exact, but the MariaDB service is currently using about 11 GiB of memory, which is much more than the predicted maximum of 7500 MiB. We observed a memory usage of over 15 GiB in the past, just before the OOM killer hit.
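For reference, the rule-of-thumb arithmetic with this server's settings can be sketched like this (a minimal sketch; the exact set of buffers varies between formula variants, and join_buffer_size is assumed to be at its 256 KiB default since it is not set in the configuration):

```python
# Rough upper-bound estimate of MariaDB memory usage, following the
# rule-of-thumb formula: global buffers + max_connections * per-connection
# buffers. Values are taken from the [mysqld] configuration below.
K, M, G = 1024, 1024**2, 1024**3

global_buffers = (
    4 * G       # innodb_buffer_pool_size
    + 16 * M    # innodb_log_buffer_size
    + 16 * M    # key_buffer_size
    + 4 * M     # query_cache_size
)

per_connection = (
    2 * M       # sort_buffer_size
    + 256 * K   # read_buffer_size
    + 512 * K   # read_rnd_buffer_size
    + 256 * K   # join_buffer_size (assumed default, not set in my.cnf)
    + 256 * K   # thread_stack
)

total = global_buffers + 1024 * per_connection  # max_connections = 1024
print(f"{total / M:.0f} MiB")  # 7460 MiB
```

With these numbers the estimate lands at roughly 7460 MiB, consistent with the ~7500 MiB figure above; the actual process can still exceed it because of per-statement allocations, the table caches, and allocator fragmentation.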

What am I missing here?

/proc/[mariadb-pid]/limits:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             191442               191442               processes
Max open files            32768                32768                files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       191442               191442               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

[mysqld] section of the MariaDB configuration:

basedir = /usr
bind_address = 0.0.0.0
binlog_format = mixed
character-set-server = utf8
collation-server = utf8_unicode_ci
datadir = /var/lib/mysql
expire_logs_days = 7
innodb_buffer_pool_size = 4G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_file_per_table = on
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 16M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_stats_on_metadata = off
key_buffer_size = 16M
lc_messages = en_US
lc_messages_dir = /usr/share/mysql
log-bin = /var/log/mariadb/mysql-bin
log-error = /var/log/mysqld.log
long_query_time = 5
max_allowed_packet = 16M
max_connections = 1024
myisam_sort_buffer_size = 8M
net_buffer_length = 8K
pid-file = /run/mariadb/mariadb.pid
port = 3306
query_cache_size = 4194304
query_cache_type = 1
read_buffer_size = 256K
read_rnd_buffer_size = 512K
require_secure_transport = ON
server-id = 1
skip-external-locking
skip-name-resolve
slow_query_log = ON
slow_query_log_file = /var/log/mariadb/slow-queries.log
socket = /var/lib/mysql/mysql.sock
sort_buffer_size = 2M
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
ssl_ca = /etc/ssl/mariadb/chain.pem
ssl_cert = /etc/ssl/mariadb/cert.pem
ssl_key = /etc/ssl/mariadb/privkey.pem
table_definition_cache = 8000
table_open_cache = 16000
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 8M
tmpdir = /tmp
user = mysql

SHOW VARIABLES: https://pastebin.com/raw/NcT5jxiR

SHOW GLOBAL STATUS: https://pastebin.com/raw/B8MFr2WX

SHOW ENGINE INNODB STATUS:

=====================================
2024-02-07 09:51:38 0x7fcf4d171640 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 1 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 508984 srv_idle
srv_master_thread log flush and writes: 508978
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 191793820
Purge done for trx's n:o < 191793816 undo n:o < 0 state: running
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION (0x7fd05f21e880), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21dd80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21bc80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21b180), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21a680), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21d280), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21c780), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f219b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
31442319 OS file reads, 3626734 OS file writes, 3262529 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 5.99 writes/s, 5.99 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations: insert 0, delete mark 0, delete 0
discarded operations: insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 65657865607
Log flushed up to   65657865557
Pages flushed up to 65594938073
Last checkpoint at  65594938073
0 pending log flushes, 0 pending chkp writes
3185884 log i/o's done, 6.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 4328521728
Dictionary memory allocated 61678248
Buffer pool size   259584
Free buffers       1
Database pages     259583
Old database pages 95802
Modified db pages  5013
Percent of dirty pages(LRU & free pages): 1.931
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 7999092, not young 2607506757
0.00 youngs/s, 0.00 non-youngs/s
Pages read 31427077, created 59965, written 435199
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 259583, unzip_LRU len: 0
I/O sum[290]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 564360, updated 1722533, deleted 374105, read 92464595804
0.00 inserts/s, 3.00 updates/s, 0.00 deletes/s, 211216.78 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

SELECT COUNT(*) FROM information_schema.tables:

+----------+
| COUNT(*) |
+----------+
|    14520 |
+----------+
1 row in set (0.125 sec)

SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables LIMIT 100000:

+----------+------------------+-------------------+----------------+
| COUNT(*) | sum(data_length) | sum(index_length) | sum(data_free) |
+----------+------------------+-------------------+----------------+
|      228 |           106496 |            106496 |              0 |
+----------+------------------+-------------------+----------------+
1 row in set (0.618 sec)
  • Well, the formula is not that exact. Monitor buffer usage and connection usage, then tune down the limits appropriately. Commented Feb 7, 2024 at 13:41
  • You have Max_used_connections=205 with Uptime=512543. Pretty sure you can tune down max_connections from 1024. Commented Feb 7, 2024 at 13:56
  • Could you tell us the number of cores on your 48 GB server, whether the storage media is HDD, SSD, or NVMe, and the GB of OS swap space? Commented Feb 8, 2024 at 15:32
  • 1
    @WilsonHauck I have updated my post accordingly. Commented Feb 8, 2024 at 16:17
  • 1
    @WilsonHauck I have updated my post at the very bottom with the requested information. Commented Feb 12, 2024 at 9:01

3 Answers


This is likely contributing to your problem.

https://stackoverflow.com/questions/78071133/mariadb-not-releasing-ram-after-jobs-finish

The global variable checked by SELECT @@version_malloc_library; needs to be jemalloc.
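One way to switch allocators on a systemd-based distribution is an LD_PRELOAD drop-in (a sketch; the jemalloc package must be installed, and the library path below is an assumption — verify it on your system, e.g. with rpm -ql jemalloc):

```ini
# /etc/systemd/system/mariadb.service.d/jemalloc.conf
# The libjemalloc path is an assumption; verify it on your system.
[Service]
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
```

After systemctl daemon-reload and a service restart, SELECT @@version_malloc_library; should report jemalloc.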

The best to you, always.

Analysis of GLOBAL STATUS and VARIABLES: 

Observations:

  • Version: 10.6.16-MariaDB-log
  • 16 GB of RAM
  • Uptime = 5d 22:22:23
  • 904 Queries/sec : 761 Questions/sec

The More Important Issues:

Why is SHOW CREATE TABLE being performed so often?

Lowering innodb_ft_result_cache_limit might help.

There seems to be a lot of thrashing in the Query cache. Suggest turning it off:

query_cache_type = OFF
query_cache_size = 0

Try lowering thread_pool_max_threads (currently 64K)

Why is UNLOCK TABLES being used?

Why so many tables? Are lots of them "identical"? Let's discuss the drawbacks.

A guess: There are a lot of tmp tables (inside SELECTs); these may first try to use 8 MB of RAM before spilling to disk:

  • Lower tmp_table_size to 4M to see if it helps; and
  • Use the slowlog to find "slow" queries; then let's improve them. Use long_query_time = 1; see SlowLog

Details and other observations:

( table_open_cache ) = 15,857 -- Number of table descriptors to cache -- Several hundred is usually good.

( Table_open_cache_misses ) = 1010093 / 512543 = 2 /sec -- May need to increase table_open_cache (now 15857)

( Com_show_create_table ) = 2531490 / 512543 = 4.9 /sec -- How often SHOW CREATE TABLE is executed. Possibly comes from a 3rd party package that is sloppy. -- Check table_open_cache (now 15857) and open_files_limit (now 32768). Or avoid the 3rd party package.

( innodb_buffer_pool_size ) = 4,096 / 16384M = 25.0% -- % of RAM used for InnoDB buffer_pool -- Set to about 70% of available RAM. (Too low is less efficient; too high risks swapping.)

( innodb_lru_scan_depth ) = 1,536 -- innodb_lru_scan_depth is a very poorly named variable. A better name would be innodb_free_page_target_per_buffer_pool. It is a number of pages InnoDB tries to keep free in each buffer pool instance to speed up read and page creation operations. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be sluggish or spiky. Use 2000 if using an SSD drive.

( Innodb_buffer_pool_pages_old / Innodb_buffer_pool_pages_total ) = 95,803 / 259584 = 36.9% -- Pct of buffer_pool that is "old".

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 512,543 / 60 * 512M / 5039606272 = 910 -- Minutes between InnoDB log rotations. Beginning with 5.6.8, innodb_log_file_size can be changed dynamically; I don't know about MariaDB. Be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 536870912). (Cannot change in AWS.)
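The rotation-interval figure above can be reproduced directly from the quoted numbers (a sketch; the values are taken from the status output above):

```python
# Minutes between InnoDB log rotations:
# uptime in minutes * (log capacity / bytes written to the log over uptime)
uptime_s = 512543                # Uptime
log_file_size = 512 * 1024**2    # innodb_log_file_size = 512M
os_log_written = 5039606272      # Innodb_os_log_written

minutes_per_rotation = uptime_s / 60 * log_file_size / os_log_written
print(round(minutes_per_rotation))  # 910
```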

( Innodb_row_lock_waits/Innodb_rows_inserted ) = 19,947/565194 = 3.5% -- Frequency of having to wait for a row.

( innodb_flush_neighbors ) = innodb_flush_neighbors = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk. 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor. Limits write IO requests per second (IOPS). -- For starters: HDD: 200; SSD: 2000.

( innodb_flush_log_at_trx_commit ) = 1 -- 1 = secure; 2 = faster -- (You decide) Use 1, along with sync_binlog (now 0)=1 for the greatest level of fault tolerance. 0 is best for speed. 2 is a compromise between 0 and 1.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster. 0 is OK for Galera.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( innodb_ft_result_cache_limit ) = 2,000,000,000 / 16384M = 11.6% -- Byte limit on FULLTEXT resultset. (It grows as needed.) -- Lower the setting.

( character_set_client ) = character_set_client = utf8mb3 -- -- If you will be using text from places other than Western Europe, consider switching to utf8mb4. (Beyond the scope of this discussion.)

( character_set_connection ) = character_set_connection = utf8mb3 --

( character_set_results ) = character_set_results = utf8mb3 --

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( query_cache_size ) = 4M -- Size of QC -- Too small = not of much use. Too large = too much overhead. Recommend either 0 or no more than 50M.

( Qcache_lowmem_prunes ) = 223055278 / 512543 = 435 /sec -- Running out of room in QC -- increase query_cache_size (now 4194304)

( Qcache_lowmem_prunes/Qcache_inserts ) = 223,055,278/224646888 = 99.3% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_hits / Qcache_inserts ) = 119,595,591 / 224646888 = 0.532 -- Hit to insert ratio -- high is good -- Consider turning off the query cache.

( Qcache_inserts - Qcache_queries_in_cache ) = (224646888 - 700) / 512543 = 438 /sec -- Invalidations/sec.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (4M - 582944) / 700 / 16384 = 0.315 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 16384)

( Created_tmp_tables ) = 14256825 / 512543 = 28 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 3584311 / 512543 = 7 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 8388608) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.
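Combining the two temp-table rates above gives the share of implicit temp tables that spill to disk (a sketch using the quoted counters):

```python
# Share of implicit temp tables that spilled to disk
created_tmp_tables = 14256825       # Created_tmp_tables
created_tmp_disk_tables = 3584311   # Created_tmp_disk_tables

disk_share = created_tmp_disk_tables / created_tmp_tables
print(f"{disk_share:.1%}")  # about 25% of temp tables go to disk
```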

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Max_used_connections / max_connections ) = 205 / 1024 = 20.0% -- Peak % of connections -- Since several memory factors can expand based on max_connections (now 1024), it is good not to have that setting too high.

( Com_change_db ) = 4206525 / 512543 = 8.2 /sec -- Probably comes from USE statements. -- Consider connecting with DB, using db.tbl syntax, eliminating spurious USE statements, etc.

( Connections ) = 7860380 / 512543 = 15 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( thread_pool_max_threads ) = 65,536 -- One of many settings for MariaDB's thread pooling -- Lower the value.

Abnormally small:

Handler_read_next / Handler_read_key = 0.58
min(max_heap_table_size, tmp_table_size) = 8MB
net_buffer_length = 8,192
tmp_memory_table_size = 8MB

Abnormally large:

(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 90
Acl_database_grants = 171
Acl_users = 175
Binlog_snapshot_position = 1.06e+9
Com_release_savepoint = 14 /HR
Com_rollback_to_savepoint = 2.3 /sec
Com_savepoint = 14 /HR
Com_show_binlog_status = 0.014 /HR
Com_show_events = 1.1 /HR
Com_show_fields = 2.5 /sec
Com_show_grants = 0.099 /sec
Com_stmt_close = 72 /sec
Com_stmt_execute = 72 /sec
Com_stmt_prepare = 72 /sec
Com_unlock_tables = 18 /HR
Feature_check_constraint = 22 /HR
Feature_fulltext = 0.054 /sec
Feature_json = 4.6 /sec
Feature_subquery = 167 /sec
Handler_discover = 0.77 /HR
Handler_read_key = 90462 /sec
Handler_savepoint = 28 /HR
Handler_savepoint_rollback = 4.6 /sec
Handler_tmp_update = 6181 /sec
Innodb_buffer_pool_pages_lru_flushed = 317,184
Innodb_buffer_pool_pages_lru_freed = 1.72e+6
Innodb_data_pending_reads = 1
Innodb_dblwr_pages_written / Innodb_dblwr_writes = 72.5
Innodb_instant_alter_column = 0.47 /HR
Innodb_log_writes / Innodb_log_write_requests = 675.0%
Innodb_num_open_files = 15,257
Memory_used = 29.6%
Memory_used_initial = 4,280.9MB
Open_streams = 4
Open_table_definitions = 14,162
Open_tables = 15,857
Opened_views = 0.15 /sec
Qcache_hits = 233 /sec
Qcache_inserts = 438 /sec
Ssl_accepts = 536
Ssl_finished_accepts = 536
Subquery_cache_hit = 494 /sec
Subquery_cache_miss = 114 /sec
innodb_lru_scan_depth / innodb_io_capacity = 7.68
innodb_open_files = 15,857
innodb_purge_batch_size = 1,000
performance_schema_max_statement_classes = 222
table_definition_cache = 8,000

Abnormal strings:

Slave_heartbeat_period = 0
Slave_received_heartbeats = 0
aria_recover_options = BACKUP,QUICK
character_set_system = utf8mb3
disconnect_on_expired_password = OFF
innodb_fast_shutdown = 1
old_alter_table = DEFAULT
old_mode = UTF8_IS_UTF8MB3
optimizer_trace = enabled=off
require_secure_transport = ON
sql_slave_skip_counter = 0
  • The OP is getting OOMs already and you are suggesting that they increase innodb_buffer_pool_size. Commented Feb 9, 2024 at 7:06
  • @AlexD - Yes, that is a puzzle. Addressing the "more important issues" may solve the problem and then let the buffer_pool grow. Commented Feb 9, 2024 at 17:26
  • And how much memory do you expect to free up with the "more important issues"? Commented Feb 9, 2024 at 17:57
  • @AlexD - No clue. For example, increasing query_cache_size would allow bigger resultsets to be saved, thereby avoiding recomputing some queries (and saving space). More likely, turning off the QC would save some space without slowing down the system. The QC has some overhead; it is impossible to say (without testing your app) whether it helps or hurts. Commented Feb 9, 2024 at 18:33
  • The OP's main issue is that MariaDB consumes more memory than they expected and they are hitting OOMs. Commented Feb 9, 2024 at 22:45

A suggestion to consider for Out Of Memory (OOM) relief.

Create 10G SWAP Space. 

It would be better to run slow for a few seconds than deal with OOM.

Rate Per Second = RPS

Suggestions to consider for your my.cnf [mysqld] section to reduce RAM used

max_connections = 512        # from 1024 - Max_used_connections was 205 in 5 days
read_rnd_buffer_size = 16K   # from 512K to reduce Handler_read_rnd_next RPS of 108,335
innodb_io_capacity = 900     # from 200 to use more of the NVMe IOPS capacity
thread_cache_size = 128      # from 8 to reduce Threads_created RPS of 2+
innodb_old_blocks_pct = 1    # from 37 to support more rows in the InnoDB buffer pool

There are many more opportunities to conserve RAM in your configuration and improve performance every day/night.

  • 10 GB swap? Linux with overcommit enabled is bad at handling low memory conditions and adding large swap space can lead to long 'swap thrashing' incidents. It is better to have an OOM than to have a system locked entirely. Commented Feb 9, 2024 at 10:19
  • 1
    Yes, 10 GB swap. Will allow you to survive by running slower than usual to avoid totally alienating someone who is just trying to do their job every day, successfully. All swap events should have follow-up to attempt improvements in response time for every slow function. Usually if someone cares enough to keep the customer online, they will care enough to find and fix the underlying problem. Commented Feb 9, 2024 at 11:47
  • 1
    You need to learn what 'swap thrashing' is. OOM-killer was invented to solve this problem. Commented Feb 9, 2024 at 11:51
  • 1
    It is not your system you are giving advice about. It is someone else system and your advice can lead to bad consequences. Commented Feb 9, 2024 at 12:03
  • 1
    OOM-Killer was not designed to prevent swap. Adding swap space is not the right way to solve OOM. Adjusting OOM score is not the right way to address OOM. Reduce your overcommit limit. Commented Feb 12, 2024 at 9:35
