Frame Allocators
This section explains how to optimize coroutine frame allocation.
Code examples assume `using namespace boost::capy;` is in effect.
Default Allocation
By default, coroutine frames are heap-allocated:
task<int> compute() // Frame allocated with new
{
int x = 42;
co_return x;
}
For most applications, this is fine. Optimize only when profiling shows allocation overhead.
Frame Recycling
Capy’s frame allocator recycles frames through a thread-local free list:
Thread-Local Free List:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ 128B │ -> │ 128B │ -> │ 128B │ -> null
└─────────┘ └─────────┘ └─────────┘
┌─────────┐
│ 256B │ -> null
└─────────┘
When a coroutine completes:
- Frame returned to the free list (binned by size)
- Next coroutine of similar size reuses the frame
- No heap allocation for steady-state operation
This is automatic when using run_async.
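The recycling behavior described above can be sketched as a standalone, size-binned free list. This is a simplified model, not Capy's actual implementation; `free_node`, `recycling_resource`, and `frame_pool` are illustrative names:

```cpp
#include <cstddef>
#include <new>

// Intrusive node stored inside a freed frame's own memory.
struct free_node { free_node* next; };

// Illustrative sketch of a size-binned, per-thread frame recycler.
class recycling_resource
{
    static constexpr std::size_t bin_sizes_[4] = {128, 256, 512, 1024};
    free_node* bins_[4] = {};

    static int bin_for(std::size_t size)
    {
        for (int i = 0; i < 4; ++i)
            if (size <= bin_sizes_[i])
                return i;
        return -1; // larger than any bin: go straight to the heap
    }

public:
    void* allocate(std::size_t size)
    {
        int b = bin_for(size);
        if (b >= 0 && bins_[b]) // steady state: reuse a recycled frame
        {
            free_node* n = bins_[b];
            bins_[b] = n->next;
            return n;
        }
        // Cold path: allocate a full bin so the frame is recyclable later.
        return ::operator new(b >= 0 ? bin_sizes_[b] : size);
    }

    void deallocate(void* ptr, std::size_t size)
    {
        int b = bin_for(size);
        if (b >= 0) // push back onto the matching bin, no heap free
        {
            bins_[b] = ::new (ptr) free_node{bins_[b]};
            return;
        }
        ::operator delete(ptr);
    }
};

// One free list per thread, as in the diagram above.
thread_local recycling_resource frame_pool;
```

In the steady state every `allocate` is satisfied from a bin that a completed coroutine refilled, so the heap is only touched while the free lists warm up.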
How It Works
run_async manages the allocation window:
run_async(pool.get_executor())   // first call: sets TLS allocator
    (my_task());                 // second call: task created with allocator
The two-call syntax ensures:
- First call sets the thread-local allocator
- Second call creates the task (frame uses TLS)
- TLS is cleared after task creation
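The window itself can be modeled as a small function object: the first call publishes the allocator in thread-local storage, and the second call clears it once the task argument has been evaluated. This is a sketch under assumed names (`tls_alloc`, `with_allocator`, `allocation_window`), not Capy's API:

```cpp
// TLS slot a frame's operator new would consult (illustrative name).
thread_local void* tls_alloc = nullptr;

struct allocation_window
{
    // Second call: receives the already-created task and closes the window.
    template<class Task>
    Task&& operator()(Task&& t)
    {
        tls_alloc = nullptr; // window closes after task creation
        return static_cast<Task&&>(t);
    }
};

// First call: opens the window before the task expression is evaluated.
template<class Allocator>
allocation_window with_allocator(Allocator& a)
{
    tls_alloc = &a;
    return {};
}
```

Since C++17 the postfix expression `with_allocator(alloc)` is sequenced before evaluation of its argument, so the TLS slot is guaranteed to be set while the task expression allocates its frame.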
Propagation to Children
Child coroutines inherit the allocator:
task<void> child()
{
// Uses parent's allocator
co_return;
}
task<void> parent()
{
co_await child(); // child uses our allocator
}
run_async(ex)(parent()); // Sets up allocator for entire tree
The mechanism:
- `parent`'s `initial_suspend` captures the TLS allocator
- When awaiting `child`, TLS is restored before creation
- `child` uses the same allocator
- TLS is restored for `parent` after `child` completes
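A rough model of this save-and-restore dance, with `frame` standing in for a coroutine frame and all names assumed rather than taken from Capy:

```cpp
// TLS slot consulted at frame creation (illustrative name).
thread_local void* tls_alloc = nullptr;

struct frame
{
    void* alloc;

    frame() : alloc(tls_alloc) {} // creation: capture the TLS allocator

    // Awaiting a child: republish our allocator, let the child's
    // constructor capture it, then close the window again.
    template<class MakeChild>
    frame await_child(MakeChild make)
    {
        tls_alloc = alloc;     // restore before child creation
        frame child = make();  // child's constructor sees our allocator
        tls_alloc = nullptr;   // window closed until the next child
        return child;
    }
};
```

Because every frame re-publishes the allocator it captured, the whole coroutine tree ends up sharing the allocator installed by the root call.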
Custom Allocators
For special requirements, provide a custom allocator:
struct my_allocator
{
void* allocate(std::size_t size)
{
return my_pool_.allocate(size);
}
void deallocate(void* ptr, std::size_t size)
{
my_pool_.deallocate(ptr, size);
}
private:
memory_pool my_pool_;
};
// Use with run_async
my_allocator alloc;
run_async(ex, alloc)(my_task());
Your allocator must satisfy FrameAllocator:
template<typename A>
concept FrameAllocator = requires(A& a, std::size_t size, void* ptr)
{
{ a.allocate(size) } -> std::same_as<void*>;
a.deallocate(ptr, size);
};
HALO Optimization
HALO (Heap Allocation eLision Optimization) lets the compiler place a coroutine's frame in the caller's frame when it can prove the coroutine does not outlive the call, eliminating the heap allocation entirely. It typically applies when the task is created, awaited, and destroyed within a single expression and the coroutine body is visible to the optimizer; when it applies it is free, but it is never guaranteed.
Profiling Allocation
Before optimizing, measure:
// Count allocations
std::atomic<int> alloc_count{0};
struct counting_allocator
{
void* allocate(std::size_t size)
{
++alloc_count;
return ::operator new(size);
}
void deallocate(void* ptr, std::size_t size)
{
::operator delete(ptr, size);
}
};
// Run benchmark
counting_allocator alloc;
for (int i = 0; i < 10000; ++i)
run_async(ex, alloc)(benchmark_task());
std::cout << "Allocations: " << alloc_count << "\n";
If allocation is the bottleneck:
- Check HALO eligibility first (a free optimization)
- Use frame recycling (the default with run_async)
- Consider custom allocators for specific patterns
When to Optimize
The default allocator is fine when:
- Request rates are typical (< 100K/sec)
- Workloads are mixed
- Simplicity is preferred
Optimize allocation when:
- Throughput is very high (> 100K coroutines/sec)
- Allocation shows up in profiles
- Memory fragmentation is a concern
Summary
| Technique | Benefit |
|---|---|
| Frame recycling | Reuse frames via thread-local free list |
| HALO | Compiler eliminates allocation entirely |
| Custom allocator | Application-specific allocation strategy |
| Two-call syntax | Ensures allocator is active during creation |
Next Steps
- Executors and Strands — Execution contexts
- The Allocator — Allocation window details