DEV Community

Z. QIU


RAM consumption in Python

Yesterday I was busy helping a young engineer at our startup solve a technical problem: strangely high RAM consumption. He had recently implemented an API in our back-end project for uploading data from json/jsonl files to MongoDB. I spent some time digging into how Python consumes RAM, and I would like to share what I've learnt on the subject.

Check RAM status of Linux

First of all, I want to note down how I check RAM usage on our server, which runs Ubuntu 18.04.

free -h 

Output in my case:
[Image: output of free -h]

Some explanation of the output of the free command, taken from this page:

Free memory is the amount of memory which is currently not used for anything. This number should be small, because memory which is not used is simply wasted.

Available memory is the amount of memory which is available for allocation to a new process or to existing processes.
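The same two figures can also be read from a script. The sketch below assumes a Linux system where the kernel exposes /proc/meminfo with MemFree and MemAvailable fields. The helper name parse_meminfo and the sample text are made up for illustration; the sample values are not from our server.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

# Illustrative sample only (values invented for the example).
SAMPLE = """MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    9000000 kB
"""

sample = parse_meminfo(SAMPLE)
print("free: {} kB, available: {} kB".format(
    sample["MemFree"], sample["MemAvailable"]))

# On a real Linux machine, read the live values instead.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        live = parse_meminfo(f.read())
    print("live MemFree:", live.get("MemFree"), "kB")
    print("live MemAvailable:", live.get("MemAvailable"), "kB")
```

In the sample, "available" is much larger than "free", which is the normal situation described in the quote above.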

Memory allocation in C++

Let me first recall how C++ allocates memory for its variables. In C++ (prior to C++11), all variables are declared with an explicit type, so the compiler can easily determine the size of a variable and where to store it (heap, stack or static area). See this example I wrote yesterday (I have a 64-bit CPU and the compiler is g++ for x86_64), saved as testMem.cpp:

#include <iostream>

/* Alignment is set to 1 byte here. Remove this line to see the
   RAM consumption with the default alignment setting. */
#pragma pack(1)

using namespace std;

int g1 = 4;  // static storage

class c1 { };

class c2 {
    int x = 1;
    int y = 2;
    char z = 12;
    char* name;
};

int main() {
    cout << " ================= " << endl;
    cout << " Sizeof g1: " << sizeof(g1) << endl;
    cout << " Address of g1: " << &(g1) << endl;
    cout << " ================= " << endl;

    int a = 100;             // stack
    double b = 20.0;         // stack
    c1* myC1 = new c1();     // pointer on the stack, object on the heap
    c2* myC2 = new c2();     // pointer on the stack, object on the heap
    char c = 55;
    short d = 122;

    cout << " Sizeof a: " << sizeof(a) << endl;
    cout << " Address of a: " << &(a) << endl;
    cout << " Sizeof b: " << sizeof(b) << endl;
    cout << " Address of b: " << &(b) << endl;
    cout << " Sizeof c: " << sizeof(c) << endl;
    cout << " Address of c: " << static_cast<void *>(&c) << endl;
    cout << " Sizeof d: " << sizeof(d) << endl;
    cout << " Address of d: " << static_cast<void *>(&d) << endl;
    cout << " ================= " << endl;

    cout << " Sizeof c1: " << sizeof(c1) << endl;
    cout << " Sizeof c2: " << sizeof(c2) << endl;
    cout << " Sizeof myC1: " << sizeof(myC1) << endl;
    cout << " Sizeof myC2: " << sizeof(myC2) << endl;
    cout << " ================= " << endl;

    // Addresses of the pointer variables themselves (stack)
    cout << " Address of ptr myC1: " << static_cast<void *>(&myC1) << endl;
    cout << " Address of ptr myC2: " << static_cast<void *>(&myC2) << endl;
    // Addresses the pointers hold (heap)
    cout << " Address value of myC1: " << static_cast<void *>(myC1) << endl;
    cout << " Address value of myC2: " << static_cast<void *>(myC2) << endl;
    cout << " ================= " << endl;

    int arr[10] = {1};
    cout << " Sizeof arr: " << sizeof(arr) << endl;  // array of 10 integers
    cout << " Address of arr: " << arr << endl;

    delete myC1;
    delete myC2;
}

Compile this file and execute it:

> g++ testMem.cpp -o testMem
> ./testMem

Below is the output:
[Image: output of testMem]

In C++, we can usually predict in which memory area (stack/heap/static) a variable is stored just by reading the code. The size of a simple variable is exactly the number of bytes used to store its data, and the size of a compound type is also straightforward to calculate. As this example shows, with the alignment set to 1 byte the size of a class/struct is the sum of the sizes of its non-static data members (an empty class such as c1 still occupies 1 byte). With the default alignment the compiler may add padding bytes, so the size can be larger than that sum (one can search Google for a further explanation).
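To see the effect of alignment without recompiling the C++ file, here is a rough sketch using Python's standard struct module, whose native mode ("@") aligns fields according to the rules of the C compiler. The format string "iicP" is my stand-in for the members of c2 (two ints, a char, a pointer); the exact numbers depend on the platform.

```python
import struct

# Stand-in for the members of class c2: int x, int y, char z, char* name.
MEMBERS = "iicP"

# Sum of the individual member sizes, i.e. the layout with 1-byte packing.
packed_size = sum(struct.calcsize(code) for code in MEMBERS)

# Native mode ("@") inserts pad bytes between members the way the C compiler does.
aligned_size = struct.calcsize("@" + MEMBERS)

print("packed (no padding):", packed_size)
print("native alignment:   ", aligned_size)
```

On a typical 64-bit Linux build this should print 17 and 24: the pointer must start on an 8-byte boundary, so 7 pad bytes are inserted after the char.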

Memory allocation in Python

Now let's do some tests in Python. We can use sys.getsizeof() and id() to get the size and the address of an object (in CPython, id() returns the object's memory address). However, working out the real RAM consumption in Python is a little more complicated than expected.

import sys

def testMem():
    a1, a2 = 1, 1.0
    print("++ a1 has size: {}, address: {}".format(sys.getsizeof(a1), id(a1)))
    print("-- a2 has size: {}, address: {}".format(sys.getsizeof(a2), id(a2)))

    b1, b2 = 256, 257
    print("++ b1 has size: {}, address: {}".format(sys.getsizeof(b1), id(b1)))
    print("-- b2 has size: {}, address: {}".format(sys.getsizeof(b2), id(b2)))

    c1, c2 = -5, -6
    print("++ c1 has size: {}, address: {}".format(sys.getsizeof(c1), id(c1)))
    print("-- c2 has size: {}, address: {}".format(sys.getsizeof(c2), id(c2)))

    d1 = {"x": 12}
    d2 = {"x1": 100000, "x2": "abcdefg", "x3": -100000000000, "x4": 0.00000005, "x5": 'v'}
    print("++ d1 has size: {}, address: {}".format(sys.getsizeof(d1), id(d1)))
    print("-- d2 has size: {}, address: {}".format(sys.getsizeof(d2), id(d2)))

    e1 = (1, 2, 3)
    e2 = [1, 2, 3]
    print("++ e1 has size: {}, address: {}".format(sys.getsizeof(e1), id(e1)))
    print("-- e2 has size: {}, address: {}".format(sys.getsizeof(e2), id(e2)))

if __name__ == "__main__":
    testMem()

Execution output:
[Image: output of the Python test script]

As we can see in the picture above, variables in Python are larger than their C++ counterparts. The reason is that everything in Python is an object (i.e. an instance of a class). Some interesting facts seen in this example:

  • the addresses of integers in [-5, 256] are far away from those of other integers (-6 and 257 in this example)
  • a short dict reports the same size as a longer dict
  • the size of a tuple/list is not the sum of the sizes of its items
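The first observation comes from an implementation detail of CPython: the integers from -5 to 256 are created once and reused, so they live in a different place than integers allocated on demand. Here is a small sketch to probe this. The helper name is_shared is mine, and the results for values outside the cached range may vary between interpreters and versions.

```python
def is_shared(n):
    """Return True if two ints built independently with value n are the same object."""
    a = int(str(n))  # built at run time, so the compiler cannot merge the constants
    b = int(str(n))
    return a is b

for n in (-6, -5, 0, 256, 257):
    print("{:>4}: shared object = {}".format(n, is_shared(n)))
```

Building the integers from strings matters here: two equal literals written in the same function may be merged by the compiler, which would hide the effect of the cache.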

To understand the details of memory management in Python, one can refer to this article. Here I want to emphasize two important things:

  • Memory management in Python is quite different from C++: Python objects carry a large fixed overhead compared with C++ variables.
  • For Python containers, sys.getsizeof() does not return the sum of the sizes of the contained objects; it returns only the memory consumption of the container itself and the pointers to its items.
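Both points can be checked in a few lines. This is a minimal sketch assuming CPython, with struct.calcsize("i") used as a stand-in for the size of a C int on the same machine:

```python
import struct
import sys

# Fixed per-object overhead: a Python int versus a C int.
print("Python int 1:", sys.getsizeof(1), "bytes")
print("C int:       ", struct.calcsize("i"), "bytes")

# Shallow size: each list stores a single pointer, whatever the item size.
small = ["a"]
big = ["a" * 1000000]
print("list with short string:", sys.getsizeof(small))
print("list with long string: ", sys.getsizeof(big))
print("the long string itself:", sys.getsizeof(big[0]))
```

The two lists report the same size even though one of them keeps about a megabyte of string data alive.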

Below is an example of calculating the "real size" of a container. The function total_size() walks over all items in the container, recursively, and sums up their sizes to give a total size. See the code:

from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass
import sys

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses: tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                    }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:             # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)

def testMemory():
    a = {"x": 12}
    b = {"x1": 1, "x2": "hello", "x3": 1.2, "x4": -3, "x5": 2000000}
    print("memory of a: {}".format(total_size(a)))
    print("memory of b: {}".format(total_size(b)))
    print('Done!')

if __name__ == '__main__':
    testMemory()

Execution output:
[Image: output of the total_size() example]

Our issue

To analyze our technical problem, I used a very handy tool, memory_profiler, to display memory consumption line by line. Here is my test code (the function total_size() is needed but not shown here):

from pymongo import MongoClient
from memory_profiler import profile
import sys

@profile
def testMemory():
    client = MongoClient("mongodb://xxxx:tttttt@ourserver:8917")
    db = client["suppliers"]
    col = db.get_collection("companies_pool")
    re_query_filter = {"domain": {'$regex': "科技"}}  # "科技" means "technology"
    docs = col.find(re_query_filter)
    print(type(docs))
    docs = docs[10:]   # skip the first 10 documents
    l = list(docs)     # loads every remaining document into memory
    print("memory of l: {}".format(total_size(l)))

    # a large file in my test; on the server it would only load a json/jsonl file
    with open("D:\\concour_s2\\Train\\dd.zip", "rb") as f:
        s = f.read()   # reads the whole file into memory
    print("memory of s: ", sys.getsizeof(s))

    del l
    del s
    print('Done!')

if __name__ == '__main__':
    testMemory()

Below is the execution output:
[Image: memory_profiler output]

The code in this part is not what our engineer wrote in his project, but his API method performs similar operations. With this simple example, he understood why his method sometimes devoured a surprising amount of RAM, and the problem was then quickly solved.
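A general pattern worth keeping in mind for this kind of upload, sketched here under my own assumptions rather than taken from the project: materializing everything at once (like list(docs) or f.read() in the test above) keeps all the data alive at the same time, whereas handling it in batches bounds the peak. The sketch uses the standard tracemalloc module and an in-memory JSONL text instead of a real file or database; the function names, row count and batch size are all illustrative.

```python
import io
import json
import tracemalloc

def make_jsonl(n_rows):
    """Build a JSONL text with n_rows small documents (synthetic data)."""
    return "".join(
        json.dumps({"id": i, "name": "company-%d" % i}) + "\n"
        for i in range(n_rows)
    )

def load_all(fp):
    """Parse every line up front: all documents are alive at the same time."""
    docs = [json.loads(line) for line in fp]
    return len(docs)

def load_in_batches(fp, batch_size=100):
    """Parse line by line and keep at most batch_size documents alive."""
    count = 0
    batch = []
    for line in fp:
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            # a real uploader would send the batch here,
            # e.g. with pymongo's collection.insert_many(batch)
            count += len(batch)
            batch = []
    count += len(batch)  # leftover documents of the last, partial batch
    return count

def peak_bytes(func, text):
    """Return the peak traced allocation (in bytes) while func reads text."""
    fp = io.StringIO(text)  # created before tracing starts
    tracemalloc.start()
    func(fp)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

if __name__ == "__main__":
    text = make_jsonl(20000)
    print("peak, load everything:", peak_bytes(load_all, text), "bytes")
    print("peak, batches of 100: ", peak_bytes(load_in_batches, text), "bytes")
```

Both functions process the same number of documents; only the number of documents alive at any moment differs, and that is what drives the peak.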
