Skip to content

Poor performance scaling with allocation size with vsg::IntrusiveAllocator #1679

@AnyOldName3

Description

@AnyOldName3

Describe the bug
While vsg::IntrusiveAllocator is comparable to std::malloc/operator new for really small allocations, for larger ones, even if they're still pretty small, it's much slower.

Running a profiler shows nearly all the CPU time is spent in IntrusiveAllocator::MemoryBlock::allocate checking the status of the slot at memory[freePosition] here https://github.com/vsg-dev/VulkanSceneGraph/blob/master/src/vsg/core/IntrusiveAllocator.cpp#L135 (which is probably really the fact that it's got to load memory[freePosition].status into a register).

The stair stepping in the graph happens every eight bytes. I'm running this on a CPU documented as having 64-byte L1 cache lines, so that makes the allocator's alignment the likely source of that number.

To Reproduce

Here's a nice simple test app:

#include <vsg/all.h>

#include <chrono>
#include <iostream>

int main(int argc, char** argv)
{
    // set up defaults and read command line arguments to override them
    vsg::CommandLine arguments(&argc, argv);

    // Allocaotor related command line settings
    size_t size = 1;
    arguments.read("--size", size);
    size_t count = 100000000;
    arguments.read("--count", count);
    bool malloc = arguments.read("--malloc");
    bool operatorNew = arguments.read("--new");

    if (malloc && operatorNew)
    {
        std::cerr << "Cannot test malloc and operator new at the same time.";
        return 1;
    }

    auto start = vsg::clock::now();
    if (malloc)
    {
        for (size_t i = 0; i < count; ++i)
            std::malloc(size);
    }
    else if (operatorNew)
    {
        for (size_t i = 0; i < count; ++i)
            operator new(size);
    }
    else
    {
        for (size_t i = 0; i < count; ++i)
            vsg::Allocator::instance()->allocate(size);
    }
    auto end = vsg::clock::now();

    double duration = std::chrono::duration<double, std::chrono::milliseconds::period>(end - start).count();
    std::string allocator = malloc ? "malloc" : (operatorNew ? "operator new" : "vsg::Allocator");
    std::cout << "It took " << duration << " ms to allocate " << count << " " << size << "-byte objects with " << allocator << "." << std::endl;
}

Expected behavior
It not taking much longer to do small allocations than tiny allocations.

Screenshots
Here's a graph of allocation size vs time in milliseconds for a hundred million allocations of that size:

Image

Desktop (please complete the following information):

  • OS: Windows, but I'd be very surprised if that matters
  • Browser: I think this probably shouldn't be a field in the issue template.
  • Version: All the relevant files are identical to the tip of master.

Additional context
This might not have mattered most of the time - if the cost was just that it was loading memory into L1 when allocating it, then as most of the time, memory is written to as soon as it's allocated, there'd be activity for the same memory range anyway. To confirm this one way or the other, I altered the test to call std::memset on the memory returned by each allocation. This made it a little slower, so the memset call wasn't optimised out, but mallocing and memsetting loads of memory was still far cheaper than doing the same with vsg::IntrusiveAllocator, so that's a strong indication that this is a real problem.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions