Frame Allocators
This section explains how to optimize coroutine frame allocation.
Code examples assume `using namespace boost::capy;` is in effect.
Default Allocation
By default, coroutine frames are heap-allocated:
task<int> compute() // Frame allocated with new
{
int x = 42;
co_return x;
}
For most applications, this is fine. Optimize only when profiling shows allocation overhead.
Frame Recycling
Capy’s frame allocator recycles frames through a thread-local free list:
Thread-Local Free List:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ 128B │ -> │ 128B │ -> │ 128B │ -> null
└─────────┘ └─────────┘ └─────────┘
┌─────────┐
│ 256B │ -> null
└─────────┘
When a coroutine completes:
- Frame returned to the free list (binned by size)
- Next coroutine of similar size reuses the frame
- No heap allocation for steady-state operation
This is automatic when using run_async.
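The recycling behavior described above can be sketched as a standalone, size-binned free list. This is a simplified model, not Capy's actual implementation; `free_node`, `recycling_resource`, and `frame_pool` are illustrative names:

```cpp
#include <cstddef>
#include <new>

// Intrusive node stored inside a freed frame's own memory.
struct free_node { free_node* next; };

// Illustrative sketch of a size-binned, per-thread frame recycler.
class recycling_resource
{
    static constexpr std::size_t bin_sizes_[4] = {128, 256, 512, 1024};
    free_node* bins_[4] = {};

    static int bin_for(std::size_t size)
    {
        for (int i = 0; i < 4; ++i)
            if (size <= bin_sizes_[i])
                return i;
        return -1; // larger than any bin: go straight to the heap
    }

public:
    void* allocate(std::size_t size)
    {
        int b = bin_for(size);
        if (b >= 0 && bins_[b]) // steady state: reuse a recycled frame
        {
            free_node* n = bins_[b];
            bins_[b] = n->next;
            return n;
        }
        // Cold path: allocate a full bin so the frame is recyclable later.
        return ::operator new(b >= 0 ? bin_sizes_[b] : size);
    }

    void deallocate(void* ptr, std::size_t size)
    {
        int b = bin_for(size);
        if (b >= 0) // push back onto the matching bin, no heap free
        {
            bins_[b] = ::new (ptr) free_node{bins_[b]};
            return;
        }
        ::operator delete(ptr);
    }
};

// One free list per thread, as in the diagram above.
thread_local recycling_resource frame_pool;
```

In the steady state every `allocate` is satisfied from a bin that a completed coroutine refilled, so the heap is only touched while the free lists warm up.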
How It Works
run_async manages the allocation window:
run_async(pool.get_executor())   // first call: sets TLS allocator
    (my_task());                 // second call: task created with allocator
The two-call syntax ensures:
- First call sets the thread-local allocator
- Second call creates the task (frame uses TLS)
- TLS is cleared after task creation
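The window itself can be modeled as a small function object: the first call publishes the allocator in thread-local storage, and the second call clears it once the task argument has been evaluated. This is a sketch under assumed names (`tls_alloc`, `with_allocator`, `allocation_window`), not Capy's API:

```cpp
// TLS slot a frame's operator new would consult (illustrative name).
thread_local void* tls_alloc = nullptr;

struct allocation_window
{
    // Second call: receives the already-created task and closes the window.
    template<class Task>
    Task&& operator()(Task&& t)
    {
        tls_alloc = nullptr; // window closes after task creation
        return static_cast<Task&&>(t);
    }
};

// First call: opens the window before the task expression is evaluated.
template<class Allocator>
allocation_window with_allocator(Allocator& a)
{
    tls_alloc = &a;
    return {};
}
```

Since C++17 the postfix expression `with_allocator(alloc)` is sequenced before evaluation of its argument, so the TLS slot is guaranteed to be set while the task expression allocates its frame.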
Propagation to Children
Child coroutines inherit the allocator:
task<void> child()
{
// Uses parent's allocator
co_return;
}
task<void> parent()
{
co_await child(); // child uses our allocator
}
run_async(ex)(parent()); // Sets up allocator for entire tree
The mechanism:
- `parent`'s `initial_suspend` captures the TLS allocator
- When awaiting `child`, TLS is restored before creation
- `child` uses the same allocator
- TLS is restored for `parent` after `child` completes
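A rough model of this save-and-restore dance, with `frame` standing in for a coroutine frame and all names assumed rather than taken from Capy:

```cpp
// TLS slot consulted at frame creation (illustrative name).
thread_local void* tls_alloc = nullptr;

struct frame
{
    void* alloc;

    frame() : alloc(tls_alloc) {} // creation: capture the TLS allocator

    // Awaiting a child: republish our allocator, let the child's
    // constructor capture it, then close the window again.
    template<class MakeChild>
    frame await_child(MakeChild make)
    {
        tls_alloc = alloc;     // restore before child creation
        frame child = make();  // child's constructor sees our allocator
        tls_alloc = nullptr;   // window closed until the next child
        return child;
    }
};
```

Because every frame re-publishes the allocator it captured, the whole coroutine tree ends up sharing the allocator installed by the root call.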
Custom Allocators
For special requirements, provide a custom allocator:
struct my_allocator
{
void* allocate(std::size_t size)
{
return my_pool_.allocate(size);
}
void deallocate(void* ptr, std::size_t size)
{
my_pool_.deallocate(ptr, size);
}
private:
memory_pool my_pool_;
};
// Use with run_async
my_allocator alloc;
run_async(ex, alloc)(my_task());
Your allocator must satisfy FrameAllocator:
template<typename A>
concept FrameAllocator = requires(A& a, std::size_t size, void* ptr)
{
{ a.allocate(size) } -> std::same_as<void*>;
a.deallocate(ptr, size);
};
HALO Optimization
HALO (Heap Allocation eLision Optimization) lets the compiler place a coroutine's frame in the caller's frame when it can prove the coroutine does not outlive the call, eliminating the heap allocation entirely. It typically applies when the task is created, awaited, and destroyed within a single expression and the coroutine body is visible to the optimizer; when it applies it is free, but it is never guaranteed.
Profiling Allocation
Before optimizing, measure:
// Count allocations
std::atomic<int> alloc_count{0};
struct counting_allocator
{
void* allocate(std::size_t size)
{
++alloc_count;
return ::operator new(size);
}
void deallocate(void* ptr, std::size_t size)
{
::operator delete(ptr, size);
}
};
// Run benchmark
counting_allocator alloc;
for (int i = 0; i < 10000; ++i)
run_async(ex, alloc)(benchmark_task());
std::cout << "Allocations: " << alloc_count << "\n";
If allocation is the bottleneck:
- Check HALO eligibility first (a free optimization)
- Use frame recycling (the default with run_async)
- Consider custom allocators for specific patterns
When to Optimize
The default allocator is fine when:
- Request rates are typical (< 100K/sec)
- Workloads are mixed
- Simplicity is preferred
Optimize allocation when:
- Throughput is very high (> 100K coroutines/sec)
- Allocation shows up in profiles
- Memory fragmentation is a concern
Summary
| Technique | Benefit |
|---|---|
| Frame recycling | Reuse frames via thread-local free list |
| HALO | Compiler eliminates allocation entirely |
| Custom allocator | Application-specific allocation strategy |
| Two-call syntax | Ensures allocator is active during creation |
Next Steps
- Executors and Strands — Execution contexts
- The Allocator — Allocation window details