The Allocator
This section explains how Capy manages coroutine frame allocation and how to optimize allocation for high-throughput scenarios.
The Timing Problem
Coroutine frames are allocated before the coroutine body runs:
task<void> work(allocator& alloc)
{
    // Problem: by the time we get here, the frame is already allocated!
    // We can't use 'alloc' for our frame allocation
    co_return;
}
The compiler calls promise_type::operator new() before any of the coroutine body runs. This creates a timing constraint: how do we provide an allocator to a coroutine when the body never gets a chance to allocate its own frame?
Thread-Local Propagation
Capy solves this with thread-local state. Before creating a coroutine, a launcher sets up the frame allocator in thread-local storage:
// Conceptually:
thread_local frame_allocator* current_allocator = nullptr;
void* promise_type::operator new(std::size_t size)
{
    if (current_allocator)
        return current_allocator->allocate(size);
    return ::operator new(size);
}
The allocation window is the period when thread-local state is active:
[launcher sets TLS] → [coroutine created] → [frame allocated] → [TLS cleared]
↑ ↑
window opens window closes
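Conceptually, the window can be managed with a small RAII guard around the thread-local pointer shown above. The sketch below is illustrative only; the type allocation_window is made up for this example and is not Capy’s actual implementation.

struct allocation_window
{
    explicit allocation_window(frame_allocator* a)
        : previous_(current_allocator)
    {
        current_allocator = a;          // window opens
    }

    ~allocation_window()
    {
        current_allocator = previous_;  // window closes
    }

    frame_allocator* previous_;
};

A launcher would construct the guard, create the coroutine while it is alive, and let the destructor close the window.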
The FrameAllocator Concept
A frame allocator provides allocation and deallocation:
template<typename A>
concept FrameAllocator = requires(A& a, std::size_t size, void* ptr)
{
    { a.allocate(size) } -> std::same_as<void*>;
    a.deallocate(ptr, size);
};
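As an illustration, the following type satisfies the concept by simply forwarding to the global heap; the name passthrough_allocator is invented for this example and is not part of Capy.

#include <cstddef>
#include <new>

struct passthrough_allocator
{
    void* allocate(std::size_t size)
    {
        return ::operator new(size);
    }

    void deallocate(void* ptr, std::size_t size)
    {
        ::operator delete(ptr, size); // sized delete (C++14 and later)
    }
};

static_assert(FrameAllocator<passthrough_allocator>);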
Frame Recycling
The default frame allocator recycles frames through a thread-local free list:
Thread-Local Free List:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ 128B │ -> │ 128B │ -> │ 128B │ -> null
└─────────┘ └─────────┘ └─────────┘
┌─────────┐ ┌─────────┐
│ 256B │ -> │ 256B │ -> null
└─────────┘ └─────────┘
When a coroutine completes:
- Frame is returned to the free list (binned by size)
- Next coroutine of similar size reuses the frame
- No heap allocation needed for steady-state operation
This dramatically reduces allocation overhead for programs that repeatedly create and destroy coroutines.
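A simplified sketch of the idea is shown below. It bins frames by exact size and never trims its cache, whereas a production recycler would typically round sizes and cap how much memory it retains; the name recycling_allocator is illustrative.

#include <cstddef>
#include <new>
#include <unordered_map>
#include <vector>

class recycling_allocator
{
public:
    void* allocate(std::size_t size)
    {
        auto& bin = bins_[size];
        if (!bin.empty())
        {
            void* p = bin.back();       // reuse a previously released frame
            bin.pop_back();
            return p;
        }
        return ::operator new(size);    // cold path: fall back to the heap
    }

    void deallocate(void* ptr, std::size_t size)
    {
        bins_[size].push_back(ptr);     // keep the frame for the next coroutine
    }

    ~recycling_allocator()
    {
        for (auto& [size, bin] : bins_)
            for (void* p : bin)
                ::operator delete(p, size);
    }

private:
    std::unordered_map<std::size_t, std::vector<void*>> bins_; // one free list per size
};

// One recycler per thread, matching the thread-local free list above
thread_local recycling_allocator frame_recycler;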
Using run_async with Allocators
The run_async function manages the allocation window:
#include <boost/capy/ex/run_async.hpp>
thread_pool pool(4);
// The () syntax ensures allocator is active when task is created
run_async(pool.get_executor())(my_task());
// └─── sets up TLS ───┘└── task created while TLS active ──┘
The two-call syntax (run_async(ex)(task)) is deliberate:
- First call sets up the thread-local allocator
- Second call creates the task (frame allocated using TLS)
- TLS is cleared after task creation
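One way to picture the ordering: the first call publishes the allocator in TLS before it returns, the task argument of the second call is then evaluated while TLS is still set, and the launcher’s operator() clears TLS once the task (and its frame) already exist. The sketch below is conceptual rather than Capy’s actual launcher; default_frame_allocator() and start() are hypothetical helpers.

#include <utility>

struct launcher
{
    executor ex_;

    template<class T>
    void operator()(task<T> t)
    {
        current_allocator = nullptr;  // window closes: the frame was already
                                      // allocated while evaluating the argument
        start(ex_, std::move(t));     // hypothetical: hand the task to the executor
    }
};

launcher run_async(executor ex)
{
    current_allocator = &default_frame_allocator(); // window opens (hypothetical helper)
    return launcher{ex};
}

This also shows why splitting the calls is risky: TLS remains set between the two calls.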
Don’t split the calls:
// WRONG: TLS state may be lost between calls
auto launcher = run_async(ex); // Sets TLS
// ... other code might interfere ...
launcher(my_task()); // TLS may no longer be valid
Propagation Through Coroutine Chains
Child coroutines inherit the parent’s allocator:
task<void> child()
{
    // Uses same allocator as parent
    co_return;
}

task<void> parent()
{
    // Allocator propagates to children
    co_await child(); // child uses our allocator
}
run_async(ex)(parent());
The mechanism:
- parent’s initial_suspend captures the TLS allocator
- When parent awaits child, it sets TLS before child is created
- child’s frame uses the same allocator
- TLS is restored after child completes
Custom Allocators
For special requirements, provide a custom allocator:
struct my_allocator
{
    void* allocate(std::size_t size)
    {
        return my_pool_.allocate(size);
    }

    void deallocate(void* ptr, std::size_t size)
    {
        my_pool_.deallocate(ptr, size);
    }

private:
    memory_pool my_pool_;
};
// Use with run_async
my_allocator alloc;
run_async(ex, alloc)(my_task());
HALO: Heap Allocation eLision Optimization
Compilers can sometimes eliminate frame allocation entirely:
task<int> leaf()
{
    co_return 42;
}

task<int> parent()
{
    // If the compiler can prove leaf's lifetime is bounded by parent,
    // it may allocate leaf's frame inside parent's frame
    int x = co_await leaf(); // Potential HALO
    co_return x;
}
HALO requirements (in general terms):

- The coroutine’s definition is visible at the call site, so it can be inlined
- The compiler can prove the frame’s lifetime is strictly nested within the caller’s
- The coroutine handle does not escape the awaiting scope

You can’t force HALO, but you can enable it:

- Keep leaf coroutines small and define them where the optimizer can see them
- co_await the task immediately rather than storing it for later
- Build with optimizations enabled
When to Optimize Allocation
Use default allocator when:

- Typical request rates (< 100K/sec)
- Mixed workloads
- Simplicity is preferred

Use custom allocator when:

- Very high throughput (> 100K coroutines/sec)
- Allocation shows up in profiles
- Memory fragmentation is a concern

Rely on HALO when:

- Deep coroutine nesting
- Simple, scoped coroutine lifetimes
- Compiler supports await elision
Summary
| Component | Purpose |
|---|---|
| Allocation window | Period when the thread-local allocator is active |
| FrameAllocator | Interface for custom allocators |
| Frame recycling | Thread-local free list for frame reuse |
| run_async(ex)(task) | Two-call syntax ensures proper TLS setup |
| HALO | Compiler optimization eliminating heap allocation |
Next Steps
- Launching Coroutines — run_async and run_on in detail
- Frame Allocators — Advanced allocation patterns