Writing a Custom Memory Allocator
We all agree that malloc() is kinda trash, right?
So I was up late the other night with nothing particularly useful to do, and I started thinking about how malloc() and free() actually work under the hood. I did a dive on Wikipedia and YouTube and realized I could probably build one myself - it's actually pretty simple.
I've always been curious about memory management - it's one of those fundamental computer science concepts that most developers take for granted. We just call malloc() and free() (actually, most people probably have no clue what these even are - they're just var-ing all over the place) without thinking about the complex algorithms working behind the scenes. So I decided to implement a simple pool allocator.
What's a Pool Allocator?
For those who haven't gone down this particular rabbit hole, a pool allocator pre-allocates large chunks of memory and divides them into fixed-size blocks. When you need memory, you get a block from the pool. When you're done, you return it to the pool. It's much simpler than a general-purpose allocator, but that's the point - simplicity leads to speed (I think?).
Pool allocators are particularly good for:
- Many small objects of similar sizes
- High allocation/deallocation frequency
- Predictable allocation patterns
They actually get used pretty often in place of the general-purpose allocator from the C standard library in hyper-optimized software like games.
The Implementation
Here's the basic structure I came up with:
typedef struct MemoryBlock {
struct MemoryBlock* next;
// The rest of the block is available for user data
} MemoryBlock;
typedef struct MemoryPool {
void* pool_start; // Start of the memory region
size_t block_size; // Size of each block
size_t num_blocks; // Total number of blocks
size_t free_blocks; // Number of available blocks
MemoryBlock* free_list; // Linked list of free blocks
} MemoryPool;

The key idea here is that we maintain a linked list of free blocks. When a block is free, we can use its memory to store a pointer to the next free block. When a block is allocated, the user gets the entire block (including the space used by the next pointer) to use however they want.
I won't bore you with all the implementation details (though I've posted the full code here if you're curious), but the essence is pretty simple:
- The pool_create function allocates a chunk of memory and links all the blocks together into a free list
- The pool_alloc function grabs the first block from the free list and returns it
- The pool_free function adds the block back to the free list
- The pool_destroy function cleans up the entire pool
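To make the shape of those four functions concrete, here's a minimal self-contained sketch. The struct layout matches the one above; the function bodies are my own simplification of the idea, not necessarily identical to the full code in the linked repo:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MemoryBlock {
    struct MemoryBlock* next;   // only meaningful while the block is free
} MemoryBlock;

typedef struct MemoryPool {
    void* pool_start;
    size_t block_size;
    size_t num_blocks;
    size_t free_blocks;
    MemoryBlock* free_list;
} MemoryPool;

// Allocate the backing region and thread every block onto the free list.
static int pool_create(MemoryPool* pool, size_t block_size, size_t num_blocks) {
    // Each block must at least hold the free-list pointer.
    if (block_size < sizeof(MemoryBlock)) block_size = sizeof(MemoryBlock);
    block_size = (block_size + 7) & ~(size_t)7;  // 8-byte alignment

    pool->pool_start = malloc(block_size * num_blocks);
    if (!pool->pool_start) return -1;

    pool->block_size = block_size;
    pool->num_blocks = num_blocks;
    pool->free_blocks = num_blocks;
    pool->free_list = NULL;

    // Link blocks in reverse so the list starts at the lowest address.
    char* base = (char*)pool->pool_start;
    for (size_t i = num_blocks; i > 0; i--) {
        MemoryBlock* b = (MemoryBlock*)(base + (i - 1) * block_size);
        b->next = pool->free_list;
        pool->free_list = b;
    }
    return 0;
}

// Pop the first free block; O(1).
static void* pool_alloc(MemoryPool* pool) {
    if (!pool->free_list) return NULL;  // pool exhausted
    MemoryBlock* b = pool->free_list;
    pool->free_list = b->next;
    pool->free_blocks--;
    return b;
}

// Push the block back onto the free list; O(1).
static void pool_free(MemoryPool* pool, void* ptr) {
    MemoryBlock* b = (MemoryBlock*)ptr;
    b->next = pool->free_list;
    pool->free_list = b;
    pool->free_blocks++;
}

static void pool_destroy(MemoryPool* pool) {
    free(pool->pool_start);
    pool->pool_start = NULL;
    pool->free_list = NULL;
}
```

Note the nice property that both alloc and free are a couple of pointer swaps - no searching, no metadata headers per live allocation.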
Things I Learned Along the Way
Security Considerations
One interesting thing I learned is that this simple implementation is vulnerable to use-after-free and double-free bugs. If a client frees a block and then continues to use it, they might be accessing memory that has been reallocated to someone else. Similarly, if a block is freed twice, it will be added to the free list twice, potentially causing corruption.
Out of curiosity, I added some debugging features to detect these issues:
#ifdef DEBUG_ALLOCATOR
typedef struct DebugMemoryBlock {
struct DebugMemoryBlock* next;
uint32_t magic; // Magic number to detect use-after-free
bool is_allocated; // Track allocation state
// The rest of the block is available for user data
} DebugMemoryBlock;
#define MAGIC_ALLOCATED 0xABBABABA
#define MAGIC_FREED 0xDEADBEEF
// Modified pool_alloc and pool_free to set/check these values
#endif

With this in place, I can detect use-after-free bugs (the magic number will be wrong) and double-free bugs (the is_allocated flag will already be false), but this of course comes with its own performance penalties as well. I don't actually need this for anything, but it was cool to implement.
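For illustration, here's roughly what those set/check paths could look like. The function names (debug_pool_free, debug_pool_alloc) and return conventions are my own for this sketch, not from the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_ALLOCATED 0xABBABABAu
#define MAGIC_FREED     0xDEADBEEFu

typedef struct DebugMemoryBlock {
    struct DebugMemoryBlock* next;
    uint32_t magic;      // magic number to detect use-after-free
    bool is_allocated;   // track allocation state
} DebugMemoryBlock;

// Returns 0 on success, -1 on a detected double free.
static int debug_pool_free(DebugMemoryBlock** free_list, DebugMemoryBlock* b) {
    if (!b->is_allocated || b->magic == MAGIC_FREED) {
        return -1;  // double free: block is already on the free list
    }
    b->magic = MAGIC_FREED;
    b->is_allocated = false;
    b->next = *free_list;
    *free_list = b;
    return 0;
}

static DebugMemoryBlock* debug_pool_alloc(DebugMemoryBlock** free_list) {
    DebugMemoryBlock* b = *free_list;
    if (!b) return NULL;
    // A block sitting on the free list must still carry the freed magic;
    // anything else means a client wrote to it after freeing.
    assert(b->magic == MAGIC_FREED);
    *free_list = b->next;
    b->magic = MAGIC_ALLOCATED;
    b->is_allocated = true;
    return b;
}
```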
Alignment Considerations
I had no idea about alignment, or that platforms require certain data types to be aligned to specific boundaries (e.g., 4 or 8 bytes). In hindsight this feels a bit obvious, but if the memory isn't properly aligned, you'll get a crash or a serious performance penalty.
I solved this by ensuring that each block is aligned to an 8-byte boundary, which is sufficient for most data types on 64-bit platforms:
// Align block_size to 8-byte boundary
block_size = (block_size + 7) & ~7;

Performance Testing
Just for fun, I ran some benchmarks to compare my pool allocator with malloc/free:
#define NUM_ALLOCATIONS 1000000
#define BLOCK_SIZE 32

I ran two types of tests:
- Simple sequential allocation and immediate freeing
- A more realistic random pattern with intermixed allocations and frees
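To give a feel for the two patterns, here's a sketch of the malloc/free side of the comparison (the pool side is symmetric). The harness details - the fixed seed, the 1024-slot window of live pointers, the function names - are my own choices for illustration, not necessarily what my actual benchmark does:

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define NUM_ALLOCATIONS 1000000
#define BLOCK_SIZE 32

// Test 1: allocate and immediately free, NUM_ALLOCATIONS times.
static double bench_malloc_sequential(void) {
    clock_t start = clock();
    for (int i = 0; i < NUM_ALLOCATIONS; i++) {
        void* p = malloc(BLOCK_SIZE);
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

// Test 2: random pattern with intermixed allocations and frees,
// maintained over a window of live pointers.
static double bench_malloc_random(void) {
    enum { LIVE = 1024 };
    void* live[LIVE] = {0};
    srand(42);  // fixed seed so runs are comparable
    clock_t start = clock();
    for (int i = 0; i < NUM_ALLOCATIONS; i++) {
        int slot = rand() % LIVE;
        free(live[slot]);            // free(NULL) is a harmless no-op
        live[slot] = malloc(BLOCK_SIZE);
    }
    for (int i = 0; i < LIVE; i++) free(live[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

One caveat worth noting: clock() measures CPU time at fairly coarse granularity, so numbers this small bounce around between runs.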
The results were... well, not earth-shattering, but still pretty cool:
Running benchmark with 1000000 allocations of 32 bytes each...
malloc/free time: 0.021569 seconds
pool allocator time: 0.011922 seconds
Speedup: 1.81x
Running random allocation/free pattern benchmark...
malloc/free random pattern time: 0.021406 seconds
pool allocator random pattern time: 0.015716 seconds
Random pattern speedup: 1.36x
A 1.81x speedup in the simple case and a 1.36x speedup in the random pattern! Take that Dennis Ritchie! (Just kidding I love you Dennis, RIP) Not bad for a few hours of work, even if it's not the mind-blowing 10x improvement I was secretly hoping for.
What's fascinating is that the system malloc implementation is actually pretty damn efficient already. I was expecting a bigger difference, but modern malloc implementations have been optimized to hell and back. The fact that my simple implementation still outperforms it at all is actually pretty satisfying.
Trade-offs
The pool allocator is faster, but it has limitations:
- Fixed block sizes
- Can't resize allocations
- Wasted space if you allocate blocks much larger than needed
- Memory fragmentation if many pools are used
General-purpose allocators like malloc() solve these problems, but at the cost of some performance. It's all about trade-offs.
Was It Worth It?
Yeah, I think so. It at least wasn't a waste of time. It gave me a peek into the deeper workings of things that most developers don't even consider, let alone touch directly. In 2023, a significant majority of development is being done in JavaScript, where most developers never come close to thinking about how their objects or variables are stored in memory.
The most satisfying part wasn't even the performance - it was seeing my own code actually work and behave as expected. There's something deeply satisfying about building foundational components like this, even if they're not strictly necessary.
I'm inspired to try some of the other memory algorithms like the buddy or slab allocators in the future.