This is an improved version of smmalloc, a fast and efficient memory allocator designed to handle many small allocations and deallocations in heavily multi-threaded scenarios. The allocator was created for applications where performance is critical, such as video games.
Using the smmalloc allocator in a .NET environment helps to minimize GC pressure when allocating buffers and avoids the need for lock-based pools in multi-threaded systems. Modern .NET features such as Span<T> work well in tandem with smmalloc and make it convenient to manage data in native memory blocks.
Building the native library requires the appropriate software:
For desktop platforms: CMake with GNU Make or Visual Studio.
A managed assembly can be built using any available compiling platform that supports C# 3.0 or higher.
Create a new smmalloc instance:

    // 8 buckets, 16 MB each, 128 bytes maximum allocation size
    SmmallocInstance smmalloc = new SmmallocInstance(8, 16 * 1024 * 1024);

Destroy the smmalloc instance:

    smmalloc.Dispose();

Create a thread cache for the current thread:

    // 4 KB of thread cache for each bucket, hot warmup
    smmalloc.CreateThreadCache(4 * 1024, CacheWarmupOptions.Hot);

Destroy the thread cache for the current thread:

    smmalloc.DestroyThreadCache();

Allocate a memory block:

    // 64 bytes of a memory block
    IntPtr memory = smmalloc.Malloc(64);

Release a memory block:

    smmalloc.Free(memory);

Work with batches of memory blocks:

    IntPtr[] batch = new IntPtr[32];

    // Allocate a batch of memory
    for (int i = 0; i < batch.Length; i++) {
        batch[i] = smmalloc.Malloc(64);
    }

    // Release the whole batch
    smmalloc.Free(batch);

Write data to a memory block using Marshal:

    byte data = 0;

    for (int i = 0; i < smmalloc.Size(memory); i++) {
        Marshal.WriteByte(memory, i, data++);
    }

Write data to a memory block using Span:

    Span<byte> buffer;

    unsafe {
        buffer = new Span<byte>((byte*)memory, smmalloc.Size(memory));
    }

    byte data = 0;

    for (int i = 0; i < buffer.Length; i++) {
        buffer[i] = data++;
    }

Read data from a memory block using Marshal:

    int sum = 0;

    for (int i = 0; i < smmalloc.Size(memory); i++) {
        sum += Marshal.ReadByte(memory, i);
    }

Read data from a memory block using Span:

    int sum = 0;

    foreach (var value in buffer) {
        sum += value;
    }

Hardware-accelerated operations:

    // Xor using Vector and Span
    // xor: a Span<byte> with the same length as buffer (defined elsewhere)
    if (Vector.IsHardwareAccelerated) {
        Span<Vector<byte>> bufferVector = MemoryMarshal.Cast<byte, Vector<byte>>(buffer);
        Span<Vector<byte>> xorVector = MemoryMarshal.Cast<byte, Vector<byte>>(xor);

        for (int i = 0; i < bufferVector.Length; i++) {
            bufferVector[i] ^= xorVector[i];
        }
    }

Copy data to and from a memory block:

    // Using Marshal
    byte[] data = new byte[64];

    // Copy from native memory
    Marshal.Copy(memory, data, 0, 64);

    // Copy to native memory
    Marshal.Copy(data, 0, memory, 64);

    // Using Buffer
    unsafe {
        // Copy from native memory
        fixed (byte* destination = &data[0]) {
            Buffer.MemoryCopy((byte*)memory, destination, 64, 64);
        }

        // Copy to native memory
        fixed (byte* source = &data[0]) {
            Buffer.MemoryCopy(source, (byte*)memory, 64, 64);
        }
    }

Custom data structures:

    // Define a custom structure
    struct Entity {
        public uint id;
        public byte health;
        public byte state;
    }

    int entitySize = Marshal.SizeOf(typeof(Entity));
    int entityCount = 10;

    // Allocate memory block
    IntPtr memory = smmalloc.Malloc(entitySize * entityCount);

    // Create Span using native memory block
    Span<Entity> entities;

    unsafe {
        entities = new Span<Entity>((void*)memory, entityCount);
    }

    // Do some stuff (a single Random is reused so that values differ between entities)
    Random random = new Random();
    uint id = 1;

    for (int i = 0; i < entities.Length; i++) {
        entities[i].id = id++;
        entities[i].health = (byte)random.Next(1, 100);
        entities[i].state = (byte)random.Next(1, 255);
    }

    // Release memory block
    smmalloc.Free(memory);

Warmup options for the CreateThreadCache() function:
CacheWarmupOptions.Cold: warmup is not performed for any cache elements.
CacheWarmupOptions.Warm: warmup is performed for half of the cache elements.
CacheWarmupOptions.Hot: warmup is performed for all cache elements.
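As a rough illustration of the three levels, the sketch below computes how many cache elements would be pre-filled for a cache that holds a given number of elements. The `WarmupLevel` enum and the `WarmedElements` helper are hypothetical stand-ins written for this example and are not part of the wrapper; the rounding for odd element counts is an assumption.

```csharp
using System;

// Hypothetical mirror of CacheWarmupOptions, used only for this illustration
enum WarmupLevel { Cold, Warm, Hot }

static class WarmupSketch {
    // Number of cache elements pre-filled when the cache is created
    public static int WarmedElements(int elementCount, WarmupLevel level) {
        if (elementCount < 0)
            throw new ArgumentOutOfRangeException(nameof(elementCount));

        switch (level) {
            case WarmupLevel.Cold:
                return 0;                 // nothing is pre-filled
            case WarmupLevel.Warm:
                return elementCount / 2;  // half (rounded down, an assumption)
            case WarmupLevel.Hot:
                return elementCount;      // every element is pre-filled
            default:
                throw new ArgumentOutOfRangeException(nameof(level));
        }
    }
}
```

A hotter warmup shifts work to cache creation time, which may suit threads that start allocating immediately.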
A single low-level disposable class, SmmallocInstance, is used to work with smmalloc.
It contains a managed pointer to the smmalloc instance.
SmmallocInstance(uint bucketsCount, int bucketSize) creates an allocator instance with a memory pool. The size of the memory blocks grows from bucket to bucket, so more buckets allow larger allocations. The bucketSize parameter sets the initial size of the pooled memory for each bucket, in bytes.
SmmallocInstance.Dispose() destroys the smmalloc instance and frees allocated memory.
SmmallocInstance.CreateThreadCache(int cacheSize, CacheWarmupOptions warmupOption) creates a thread cache for fast memory allocations within a thread. The warmup option sets the degree to which cache elements are pre-allocated.
SmmallocInstance.DestroyThreadCache() destroys the thread cache. It should be called before the end of the thread's life cycle.
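The create/destroy pairing described above can be sketched with worker threads and a try/finally block, so that the cache is destroyed even if the work throws. This is only an outline under a few assumptions: the `Smmalloc` namespace name is a guess, `SmmallocInstance` is assumed to implement `IDisposable` (it is described as disposable), and the worker count and loop length are arbitrary. Running it requires the native library.

```csharp
using System;
using System.Threading;
using Smmalloc; // assumed namespace of the managed wrapper

static class Program {
    static void Main() {
        // 8 buckets, 16 MB each
        using (SmmallocInstance smmalloc = new SmmallocInstance(8, 16 * 1024 * 1024)) {
            Thread[] workers = new Thread[4];

            for (int i = 0; i < workers.Length; i++) {
                workers[i] = new Thread(() => {
                    // Each thread creates its own cache: 4 KB per bucket, hot warmup
                    smmalloc.CreateThreadCache(4 * 1024, CacheWarmupOptions.Hot);

                    try {
                        for (int n = 0; n < 1000; n++) {
                            IntPtr memory = smmalloc.Malloc(64);
                            smmalloc.Free(memory);
                        }
                    } finally {
                        // Destroy the cache before the thread ends, even on failure
                        smmalloc.DestroyThreadCache();
                    }
                });

                workers[i].Start();
            }

            // Wait for every worker before the instance is disposed
            foreach (Thread worker in workers) {
                worker.Join();
            }
        }
    }
}
```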
SmmallocInstance.Malloc(int bytesCount, int alignment) allocates an aligned memory block. The maximum allocation size is the bucket count multiplied by 16, and the minimum allocation size is 16 bytes. With two buckets in a smmalloc instance the maximum allocation size is 32 bytes, with three buckets 48 bytes, with four buckets 64 bytes, and so on. The alignment parameter is optional. Returns a pointer to the allocated memory block.
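The size rule can be written out as a few lines of arithmetic. The helper below is a hypothetical sketch and not part of the wrapper: `MaxAllocationSize` follows the rule stated above, while `BucketIndexFor` assumes that bucket i holds blocks of (i + 1) * 16 bytes, which is inferred from the description and not confirmed by it.

```csharp
using System;

static class AllocationLimits {
    // Block sizes grow in steps of 16 bytes, one step per bucket
    public const int Granularity = 16;

    // Largest request that fits an instance with the given bucket count
    public static int MaxAllocationSize(int bucketsCount) {
        if (bucketsCount < 1)
            throw new ArgumentOutOfRangeException(nameof(bucketsCount));

        return bucketsCount * Granularity;
    }

    // ASSUMPTION: bucket i holds blocks of (i + 1) * 16 bytes
    public static int BucketIndexFor(int bytesCount) {
        if (bytesCount < 1)
            throw new ArgumentOutOfRangeException(nameof(bytesCount));

        return (bytesCount - 1) / Granularity;
    }

    // True when a request does not exceed the largest bucket
    public static bool FitsInBuckets(int bytesCount, int bucketsCount) =>
        bytesCount >= 1 && bytesCount <= MaxAllocationSize(bucketsCount);
}
```

For instance, `AllocationLimits.MaxAllocationSize(8)` gives 128, matching the 8-bucket instance used in the usage section.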
SmmallocInstance.Free(IntPtr memory) frees a memory block. To free a batch of memory, pass a managed array, or a pointer to pointers together with a length, instead of a pointer to a single memory block.
SmmallocInstance.Realloc(IntPtr memory, int bytesCount, int alignment) reallocates a memory block. The alignment parameter is optional. Returns a pointer to the reallocated memory block.
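A minimal sketch of growing a block with Realloc() might look as follows. It assumes the `Smmalloc` namespace name, that `SmmallocInstance` implements `IDisposable`, and that existing contents are preserved in the usual realloc manner; the last point is not stated in this document, so the second printed value should be treated as something to verify. Running it requires the native library.

```csharp
using System;
using System.Runtime.InteropServices;
using Smmalloc; // assumed namespace of the managed wrapper

static class Program {
    static void Main() {
        // 8 buckets, 16 MB each, so requests up to 128 bytes fit
        using (SmmallocInstance smmalloc = new SmmallocInstance(8, 16 * 1024 * 1024)) {
            IntPtr memory = smmalloc.Malloc(32);

            // Store a marker value at the start of the block
            Marshal.WriteInt32(memory, 0, 12345);

            // Continue with the returned pointer; the block may have moved
            memory = smmalloc.Realloc(memory, 64);

            Console.WriteLine("Usable size: " + smmalloc.Size(memory));

            // Shows the marker again if contents are preserved (assumed)
            Console.WriteLine("First value: " + Marshal.ReadInt32(memory, 0));

            smmalloc.Free(memory);
        }
    }
}
```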
SmmallocInstance.Size(IntPtr memory) gets the usable size of a memory block. Returns the size in bytes.
SmmallocInstance.Bucket(IntPtr memory) gets the bucket index of a memory block. Returns the placement index.
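To see how requests map to usable sizes and buckets, the two query functions can be called on blocks of different sizes. The sketch below assumes the `Smmalloc` namespace name and that `SmmallocInstance` implements `IDisposable`; the request sizes are arbitrary, and no particular output is claimed. Running it requires the native library.

```csharp
using System;
using Smmalloc; // assumed namespace of the managed wrapper

static class Program {
    static void Main() {
        // 8 buckets, 16 MB each, so requests up to 128 bytes fit
        using (SmmallocInstance smmalloc = new SmmallocInstance(8, 16 * 1024 * 1024)) {
            int[] requests = { 16, 32, 64, 128 };

            foreach (int request in requests) {
                IntPtr memory = smmalloc.Malloc(request);

                // Report what the allocator actually handed out for this request
                Console.WriteLine(
                    "Requested " + request + " bytes: usable size " +
                    smmalloc.Size(memory) + ", bucket " + smmalloc.Bucket(memory));

                smmalloc.Free(memory);
            }
        }
    }
}
```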
