Writing a Custom Memory Allocator
We all agree that malloc() is kinda trash, right?
So I was up late the other night with nothing particularly useful to do, and I started thinking about how malloc() and free() actually work under the hood. I did a dive on Wikipedia and YouTube and realized I could probably build one myself - it's actually pretty simple.
I've always been curious about memory management - it's one of those fundamental computer science concepts that most developers take for granted. We just call malloc() and free() (actually, most people probably have no clue what these even are - they're just var-ing all over the place) without thinking about the complex algorithms working behind the scenes. So I decided to implement a simple pool allocator.
What's a Pool Allocator?
For those who haven't gone down this particular rabbit hole, a pool allocator pre-allocates large chunks of memory and divides them into fixed-size blocks. When you need memory, you get a block from the pool. When you're done, you return it to the pool. It's much simpler than a general-purpose allocator, but that's the point - simplicity leads to speed (I think?).
Pool allocators are particularly good for:
- Many small objects of similar sizes
- High allocation/deallocation frequency
- Predictable allocation patterns
They actually get used pretty often in place of the general-purpose allocator from the C standard library in hyper-optimized software like games.
The Implementation
Here's the basic structure I came up with:
typedef struct MemoryBlock {
struct MemoryBlock* next;
// The rest of the block is available for user data
} MemoryBlock;
typedef struct MemoryPool {
void* pool_start; // Start of the memory region
size_t block_size; // Size of each block
size_t num_blocks; // Total number of blocks
size_t free_blocks; // Number of available blocks
MemoryBlock* free_list; // Linked list of free blocks
} MemoryPool;

The key idea here is that we maintain a linked list of free blocks. When a block is free, we can use its memory to store a pointer to the next free block. When a block is allocated, the user gets the entire block (including the space used by the next pointer) to use however they want.
I won't bore you with all the implementation details (though I've posted the full code here if you're curious), but the essence is pretty simple:
- The pool_create function allocates a chunk of memory and links all the blocks together into a free list
- The pool_alloc function grabs the first block from the free list and returns it
- The pool_free function adds the block back to the free list
- The pool_destroy function cleans up the entire pool
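To make the shape of those four functions concrete, here's a minimal self-contained sketch. The struct layout matches the one above; the function bodies are my own simplification of the idea, not necessarily identical to the full code in the linked repo:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MemoryBlock {
    struct MemoryBlock* next;   // only meaningful while the block is free
} MemoryBlock;

typedef struct MemoryPool {
    void* pool_start;
    size_t block_size;
    size_t num_blocks;
    size_t free_blocks;
    MemoryBlock* free_list;
} MemoryPool;

// Allocate the backing region and thread every block onto the free list.
static int pool_create(MemoryPool* pool, size_t block_size, size_t num_blocks) {
    // Each block must at least hold the free-list pointer.
    if (block_size < sizeof(MemoryBlock)) block_size = sizeof(MemoryBlock);
    block_size = (block_size + 7) & ~(size_t)7;  // 8-byte alignment

    pool->pool_start = malloc(block_size * num_blocks);
    if (!pool->pool_start) return -1;

    pool->block_size = block_size;
    pool->num_blocks = num_blocks;
    pool->free_blocks = num_blocks;
    pool->free_list = NULL;

    // Link blocks in reverse so the list starts at the lowest address.
    char* base = (char*)pool->pool_start;
    for (size_t i = num_blocks; i > 0; i--) {
        MemoryBlock* b = (MemoryBlock*)(base + (i - 1) * block_size);
        b->next = pool->free_list;
        pool->free_list = b;
    }
    return 0;
}

// Pop the first free block; O(1).
static void* pool_alloc(MemoryPool* pool) {
    if (!pool->free_list) return NULL;  // pool exhausted
    MemoryBlock* b = pool->free_list;
    pool->free_list = b->next;
    pool->free_blocks--;
    return b;
}

// Push the block back onto the free list; O(1).
static void pool_free(MemoryPool* pool, void* ptr) {
    MemoryBlock* b = (MemoryBlock*)ptr;
    b->next = pool->free_list;
    pool->free_list = b;
    pool->free_blocks++;
}

static void pool_destroy(MemoryPool* pool) {
    free(pool->pool_start);
    pool->pool_start = NULL;
    pool->free_list = NULL;
}
```

Note the nice property that both alloc and free are a couple of pointer swaps - no searching, no metadata headers per live allocation.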
Things I Learned Along the Way
Security Considerations
One interesting thing I learned is that this simple implementation is vulnerable to use-after-free and double-free bugs. If a client frees a block and then continues to use it, they might be accessing memory that has been reallocated to someone else. Similarly, if a block is freed twice, it will be added to the free list twice, potentially causing corruption.
Out of curiosity, I added some debugging features to detect these issues:
#ifdef DEBUG_ALLOCATOR
typedef struct DebugMemoryBlock {
struct DebugMemoryBlock* next;
uint32_t magic; // Magic number to detect use-after-free
bool is_allocated; // Track allocation state
// The rest of the block is available for user data
} DebugMemoryBlock;
#define MAGIC_ALLOCATED 0xABBABABA
#define MAGIC_FREED 0xDEADBEEF
// Modified pool_alloc and pool_free to set/check these values
#endif

With this in place, I can detect use-after-free bugs (the magic number will be wrong) and double-free bugs (the is_allocated flag will already be false), but this of course comes with its own performance penalties as well. I don't actually need this for anything, but it was cool to implement.
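For illustration, here's roughly what those set/check paths could look like. The function names (debug_pool_free, debug_pool_alloc) and return conventions are my own for this sketch, not from the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_ALLOCATED 0xABBABABAu
#define MAGIC_FREED     0xDEADBEEFu

typedef struct DebugMemoryBlock {
    struct DebugMemoryBlock* next;
    uint32_t magic;      // magic number to detect use-after-free
    bool is_allocated;   // track allocation state
} DebugMemoryBlock;

// Returns 0 on success, -1 on a detected double free.
static int debug_pool_free(DebugMemoryBlock** free_list, DebugMemoryBlock* b) {
    if (!b->is_allocated || b->magic == MAGIC_FREED) {
        return -1;  // double free: block is already on the free list
    }
    b->magic = MAGIC_FREED;
    b->is_allocated = false;
    b->next = *free_list;
    *free_list = b;
    return 0;
}

static DebugMemoryBlock* debug_pool_alloc(DebugMemoryBlock** free_list) {
    DebugMemoryBlock* b = *free_list;
    if (!b) return NULL;
    // A block sitting on the free list must still carry the freed magic;
    // anything else means a client wrote to it after freeing.
    assert(b->magic == MAGIC_FREED);
    *free_list = b->next;
    b->magic = MAGIC_ALLOCATED;
    b->is_allocated = true;
    return b;
}
```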
Alignment Considerations
I had no idea about alignment, or that platforms require certain data types to be aligned to specific boundaries (e.g., 4 or 8 bytes). In hindsight this feels a bit obvious, but if the memory isn't properly aligned, you'll get a crash or a serious performance penalty.
I solved this by ensuring that each block is aligned to an 8-byte boundary, which is sufficient for most data types on 64-bit platforms:
// Align block_size to 8-byte boundary
block_size = (block_size + 7) & ~7;

Performance Testing
Just for fun, I ran some benchmarks to compare my pool allocator with malloc/free:
#define NUM_ALLOCATIONS 1000000
#define BLOCK_SIZE 32

I ran two types of tests:
- Simple sequential allocation and immediate freeing
- A more realistic random pattern with intermixed allocations and frees
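To give a feel for the two patterns, here's a sketch of the malloc/free side of the comparison (the pool side is symmetric). The harness details - the fixed seed, the 1024-slot window of live pointers, the function names - are my own choices for illustration, not necessarily what my actual benchmark does:

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define NUM_ALLOCATIONS 1000000
#define BLOCK_SIZE 32

// Test 1: allocate and immediately free, NUM_ALLOCATIONS times.
static double bench_malloc_sequential(void) {
    clock_t start = clock();
    for (int i = 0; i < NUM_ALLOCATIONS; i++) {
        void* p = malloc(BLOCK_SIZE);
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

// Test 2: random pattern with intermixed allocations and frees,
// maintained over a window of live pointers.
static double bench_malloc_random(void) {
    enum { LIVE = 1024 };
    void* live[LIVE] = {0};
    srand(42);  // fixed seed so runs are comparable
    clock_t start = clock();
    for (int i = 0; i < NUM_ALLOCATIONS; i++) {
        int slot = rand() % LIVE;
        free(live[slot]);            // free(NULL) is a harmless no-op
        live[slot] = malloc(BLOCK_SIZE);
    }
    for (int i = 0; i < LIVE; i++) free(live[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

One caveat worth noting: clock() measures CPU time at fairly coarse granularity, so numbers this small bounce around between runs.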
The results were... well, not earth-shattering, but still pretty cool:
Running benchmark with 1000000 allocations of 32 bytes each...
malloc/free time: 0.021569 seconds
pool allocator time: 0.011922 seconds
Speedup: 1.81x
Running random allocation/free pattern benchmark...
malloc/free random pattern time: 0.021406 seconds
pool allocator random pattern time: 0.015716 seconds
Random pattern speedup: 1.36x
A 1.81x speedup in the simple case and a 1.36x speedup in the random pattern! Take that Dennis Ritchie! (Just kidding I love you Dennis, RIP) Not bad for a few hours of work, even if it's not the mind-blowing 10x improvement I was secretly hoping for.
What's fascinating is that the system malloc implementation is actually pretty damn efficient already. I was expecting a bigger difference, but modern malloc implementations have been optimized to hell and back. The fact that my simple implementation still outperforms it at all is actually pretty satisfying.
Trade-offs
The pool allocator is faster, but it has limitations:
- Fixed block sizes
- Can't resize allocations
- Wasted space if you allocate blocks much larger than needed
- Memory fragmentation if many pools are used
General-purpose allocators like malloc() solve these problems, but at the cost of some performance. It's all about trade-offs.
Was It Worth It?
Yeah, I think so. It at least wasn't a waste of time. It gave me a peek into the deeper workings of things that most developers don't even consider, let alone touch directly. In 2023, a significant majority of development is being done in JavaScript, where most developers never come close to thinking about how their objects or variables are stored in memory.
The most satisfying part wasn't even the performance - it was seeing my own code actually work and behave as expected. There's something deeply satisfying about building foundational components like this, even if they're not strictly necessary.
I'm inspired to try some of the other memory algorithms like the buddy or slab allocators in the future.