-
Notifications
You must be signed in to change notification settings - Fork 267
Description
Describe the bug
While vsg::IntrusiveAllocator is comparable to std::malloc/operator new for really small allocations, for larger ones, even if they're still pretty small, it's much slower.
Running a profiler shows nearly all the CPU time is spent in IntrusiveAllocator::MemoryBlock::allocate checking the status of the slot at memory[freePosition] here https://github.com/vsg-dev/VulkanSceneGraph/blob/master/src/vsg/core/IntrusiveAllocator.cpp#L135 (which is probably really the fact that it's got to load memory[freePosition].status into a register).
The stair stepping in the graph happens every eight bytes. I'm running this on a CPU documented as having 64-byte L1 cache lines, so that makes the allocator's alignment the likely source of that number.
To Reproduce
Here's a nice simple test app:
#include <vsg/all.h>
#include <chrono>
#include <iostream>
int main(int argc, char** argv)
{
// set up defaults and read command line arguments to override them
vsg::CommandLine arguments(&argc, argv);
// Allocaotor related command line settings
size_t size = 1;
arguments.read("--size", size);
size_t count = 100000000;
arguments.read("--count", count);
bool malloc = arguments.read("--malloc");
bool operatorNew = arguments.read("--new");
if (malloc && operatorNew)
{
std::cerr << "Cannot test malloc and operator new at the same time.";
return 1;
}
auto start = vsg::clock::now();
if (malloc)
{
for (size_t i = 0; i < count; ++i)
std::malloc(size);
}
else if (operatorNew)
{
for (size_t i = 0; i < count; ++i)
operator new(size);
}
else
{
for (size_t i = 0; i < count; ++i)
vsg::Allocator::instance()->allocate(size);
}
auto end = vsg::clock::now();
double duration = std::chrono::duration<double, std::chrono::milliseconds::period>(end - start).count();
std::string allocator = malloc ? "malloc" : (operatorNew ? "operator new" : "vsg::Allocator");
std::cout << "It took " << duration << " ms to allocate " << count << " " << size << "-byte objects with " << allocator << "." << std::endl;
}Expected behavior
It not taking much longer to do small allocations than tiny allocations.
Screenshots
Here's a graph of allocation size vs time in milliseconds for a hundred million allocations of that size:
Desktop (please complete the following information):
- OS: Windows, but I'd be very surprised if that matters
- Browser: I think this probably shouldn't be a field in the issue template.
- Version: All the relevant files are identical to the tip of master.
Additional context
This might not have mattered most of the time - if the cost was just that it was loading memory into L1 when allocating it, then as most of the time, memory is written to as soon as it's allocated, there'd be activity for the same memory range anyway. To confirm this one way or the other, I altered the test to call std::memset on the memory returned by each allocation. This made it a little slower, so the memset call wasn't optimised out, but mallocing and memsetting loads of memory was still far cheaper than doing the same with vsg::IntrusiveAllocator, so that's a strong indication that this is a real problem.