How to use bindless resources in Vulkan

Jul 8, 2021 · 5 minute read

Managing DescriptorSets is probably the most difficult part of using Vulkan. In this post I will describe 6 steps to easily implement bindless resources in Vulkan.

What is bindless?

Normally, when you need to read an image in a graphics API, you have to bind it before using it. In Vulkan this means that you need to allocate a DescriptorSet, write the image into the DescriptorSet, and then bind the set to the CommandBuffer using a compatible PipelineLayout.

This makes managing and tracking the lifetime of each DescriptorSet a really daunting task.

Bindless, in the context of Vulkan, just means that a resource, be it an image or a buffer, is addressed by a single integer.

Using DescriptorSets without bindless probably best describes what the hardware does under the hood, and it is also faster. However, it makes using resources really bothersome, because you have to bind descriptor sets and also manage them.

1. Enable extension

The first thing you need to do is enable the extension VK_EXT_descriptor_indexing. If you are using Vulkan 1.2 you don’t need to enable it, as it became part of the core; the features described in the next step still have to be enabled.

2. Enable features

The extension provides features that must be enabled manually. This is done at device creation time.

auto feature_descriptorIndexing = vk::PhysicalDeviceDescriptorIndexingFeatures()
    // Enable unsized (runtime) descriptor arrays
    .setRuntimeDescriptorArray(true)
    // Allow descriptor slots to be left unbound
    .setDescriptorBindingPartiallyBound(true)
    // Enable non-uniform array indexing
    // (#extension GL_EXT_nonuniform_qualifier : require)
    .setShaderStorageBufferArrayNonUniformIndexing(true)
    .setShaderSampledImageArrayNonUniformIndexing(true)
    .setShaderStorageImageArrayNonUniformIndexing(true)
    // Allow descriptors to be updated after the descriptor set
    // has been bound in a command buffer
    .setDescriptorBindingStorageBufferUpdateAfterBind(true)
    .setDescriptorBindingSampledImageUpdateAfterBind(true)
    .setDescriptorBindingStorageImageUpdateAfterBind(true);

// When creating the device
auto deviceCreateInfo = vk::DeviceCreateInfo()
    ...
    // pNext takes a pointer to the feature struct
    .setPNext(&feature_descriptorIndexing);

3. Create the descriptor set

First we need to decide which binding each descriptor type will use. We also need the max count for each descriptor type; devices usually report an extremely high max value, so we should not have to worry about running out of slots.

// Select a binding for each descriptor type
// (these names are the ones used by the layout and update code below)
constexpr int BINDING_STORAGE = 0;
constexpr int BINDING_SAMPLER = 1;
constexpr int BINDING_IMAGE = 2;
// Max count of each descriptor type
// You can query the max values for these with
// physicalDevice.getProperties().limits.maxDescriptorSet*
// (for update-after-bind sets, the maxDescriptorSetUpdateAfterBind* limits of
// vk::PhysicalDeviceDescriptorIndexingProperties apply)
constexpr int STORAGE_COUNT = 65536;
constexpr int SAMPLER_COUNT = 65536;
constexpr int IMAGE_COUNT = 65536;
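
The fixed value of 65536 may be larger than what some devices report, so it can be worth clamping the counts before creating the pool and the layout. The following is only a minimal sketch of that idea: `DescriptorLimits`, `DescriptorCounts` and `clampCounts` are hypothetical names, and `DescriptorLimits` is a plain stand-in that you would fill from the limits queried from the physical device.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the limits read from the physical device.
// This is not a Vulkan type; fill it from the queried properties.
struct DescriptorLimits {
    std::uint32_t maxStorageBuffers;
    std::uint32_t maxSampledImages;
    std::uint32_t maxStorageImages;
};

// Counts that would be used for the pool sizes and the layout bindings.
struct DescriptorCounts {
    std::uint32_t storage;
    std::uint32_t sampler;
    std::uint32_t image;
};

// Clamp the wanted count of each descriptor type to what the device reports.
inline DescriptorCounts clampCounts(std::uint32_t wanted, const DescriptorLimits& limits) {
    DescriptorCounts counts;
    counts.storage = std::min(wanted, limits.maxStorageBuffers);
    counts.sampler = std::min(wanted, limits.maxSampledImages);
    counts.image = std::min(wanted, limits.maxStorageImages);
    return counts;
}
```

With this approach the counts are no longer compile-time constants, so they would have to be stored next to the descriptor set and used wherever STORAGE_COUNT, SAMPLER_COUNT and IMAGE_COUNT appear.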

Then we can create the descriptor pool with the max count of each descriptor type.

The elements of poolSizes are not required to follow the order of the bindings, but keeping them in the same order makes the code easier to follow.

// Pool Sizes
std::vector<vk::DescriptorPoolSize> poolSizes = {
    {vk::DescriptorType::eStorageBuffer, STORAGE_COUNT},
    {vk::DescriptorType::eCombinedImageSampler, SAMPLER_COUNT},
    {vk::DescriptorType::eStorageImage, IMAGE_COUNT},
};
// Create descriptor pool
auto descriptorPool = device.createDescriptorPool(vk::DescriptorPoolCreateInfo()
	.setPoolSizes(poolSizes)
	.setMaxSets(1)
	.setFlags(vk::DescriptorPoolCreateFlagBits::eUpdateAfterBind)
);

Now we need a single, global DescriptorSetLayout.

We will have only one DescriptorSet, so we need to set the stage flags to vk::ShaderStageFlagBits::eAll to be able to access the bindings from both compute and graphics shaders.

// Vector of all the binding slots and their descriptor counts
std::vector<vk::DescriptorSetLayoutBinding> bindings = {
    vk::DescriptorSetLayoutBinding()
        .setBinding(BINDING_STORAGE)
        .setDescriptorType(vk::DescriptorType::eStorageBuffer)
        .setDescriptorCount(STORAGE_COUNT)
        .setStageFlags(vk::ShaderStageFlagBits::eAll),
    vk::DescriptorSetLayoutBinding()
        .setBinding(BINDING_SAMPLER)
        .setDescriptorType(vk::DescriptorType::eCombinedImageSampler)
        .setDescriptorCount(SAMPLER_COUNT)
        .setStageFlags(vk::ShaderStageFlagBits::eAll),
    vk::DescriptorSetLayoutBinding()
        .setBinding(BINDING_IMAGE)
        .setDescriptorType(vk::DescriptorType::eStorageImage)
        .setDescriptorCount(IMAGE_COUNT)
        .setStageFlags(vk::ShaderStageFlagBits::eAll),
};

// Flag every binding of the set as partiallyBound and updateAfterBind
vk::DescriptorSetLayoutBindingFlagsCreateInfo setLayoutBindingsFlags = {};
std::vector<vk::DescriptorBindingFlags> bindingFlags = {
    vk::DescriptorBindingFlagBits::ePartiallyBound | vk::DescriptorBindingFlagBits::eUpdateAfterBind,
    vk::DescriptorBindingFlagBits::ePartiallyBound | vk::DescriptorBindingFlagBits::eUpdateAfterBind,
    vk::DescriptorBindingFlagBits::ePartiallyBound | vk::DescriptorBindingFlagBits::eUpdateAfterBind
};
setLayoutBindingsFlags.setBindingFlags(bindingFlags);

// Create a single descriptor set layout that will be on the set = 0 slot
auto descriptorSetLayout = device.createDescriptorSetLayout(
    vk::DescriptorSetLayoutCreateInfo()
    .setBindings(bindings)
    .setFlags(vk::DescriptorSetLayoutCreateFlagBits::eUpdateAfterBindPool) // updateAfterBind feature
    .setPNext(&setLayoutBindingsFlags)
);

// Allocate single global descriptor set
// will be bound at the start of a CommandBuffer
auto sets = { descriptorSetLayout }; // only a single set = 0
descriptorSet = device.allocateDescriptorSets(vk::DescriptorSetAllocateInfo()
	.setDescriptorPool(descriptorPool)
	.setDescriptorSetCount(1)
	.setSetLayouts(sets)
)[0];

4. Create the pipeline layout

After creating the DescriptorSet, we also need a compatible PipelineLayout to do anything with it. This layout is used to create the graphics and compute pipelines, to bind the DescriptorSet, and to push constants.

auto sets = { descriptorSetLayout }; // only a single set = 0
auto pushConstant = {vk::PushConstantRange()
    .setOffset(0)
    .setSize(128) // 128 bytes (guaranteed minimum size)
    .setStageFlags(vk::ShaderStageFlagBits::eAll) // all stages
};
// Create pipeline layout (will be used in all the Graphics/Compute pipelines)
auto pipelineLayout = device.createPipelineLayout(vk::PipelineLayoutCreateInfo()
	.setSetLayouts(sets)
	.setPushConstantRanges(pushConstant)
);

5. Bind the resources

Every time you create a resource (image, buffer or sampler), you need to write it into the descriptor set.

You may want to defer the descriptor updates so that you can apply all of them in a single batch. To do this, pick the IDs as usual and append each vk::WriteDescriptorSet to a list. Then, before submitting the command buffer, you simply update the descriptor set with a single call and clear the list.
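
The deferred variant could be organized roughly as follows. This is only a sketch under stated assumptions: `PendingWrite` is a hypothetical stand-in for vk::WriteDescriptorSet so that the example compiles without the Vulkan headers, and `DeferredDescriptorWrites` is an illustrative name, not part of any API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vk::WriteDescriptorSet, reduced to the
// two fields that identify the slot being written.
struct PendingWrite {
    std::uint32_t binding;       // which binding of the global set
    std::uint32_t arrayElement;  // the resource ID inside that binding
};

// Collects writes as resources are created and hands them over in one batch.
class DeferredDescriptorWrites {
public:
    // Called at resource creation instead of updating immediately.
    void queue(PendingWrite write) { pending_.push_back(write); }

    // Called once before submitting the command buffer. `update` receives
    // the whole batch; in a renderer it would forward the list to a single
    // descriptor update call. Returns how many writes were flushed.
    template <typename UpdateFn>
    std::size_t flush(UpdateFn&& update) {
        const std::size_t count = pending_.size();
        if (count != 0) {
            update(pending_);
            pending_.clear();
        }
        return count;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::vector<PendingWrite> pending_;
};
```

One thing to keep in mind when adapting this to real vk::WriteDescriptorSet entries: they hold pointers to their buffer or image info structs, so those structs have to stay alive until the batch is flushed.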

vk::Buffer buffer = createBuffer(...);
auto bufferSize = ...;
// Keep a separate resource ID counter for each resource type
// You can reuse resource IDs by keeping a list of deleted resource IDs
// You don't need to unbind the resource ID before deleting the resource
int newResourceID = oldResourceID++; // take the current ID, then advance the counter
// The write stores a pointer to the buffer info, so keep it in a
// named variable that stays alive until the update call
auto bufferInfo = vk::DescriptorBufferInfo()
	.setBuffer(buffer)
	.setOffset(0)
	.setRange(bufferSize);
device.updateDescriptorSets({ vk::WriteDescriptorSet()
	.setBufferInfo(bufferInfo)
	.setDescriptorCount(1)
	.setDescriptorType(vk::DescriptorType::eStorageBuffer) // or eCombinedImageSampler, eStorageImage
	.setDstSet(descriptorSet)
	.setDstBinding(BINDING_STORAGE) // or BINDING_SAMPLER, BINDING_IMAGE
	.setDstArrayElement(newResourceID)
}, {});
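
The comments above mention reusing IDs through a list of deleted resource IDs. A minimal sketch of such an allocator is shown below, assuming one instance per descriptor type; `ResourceIdAllocator` is a hypothetical name, and the sketch does not check the ID against the max count of the binding.

```cpp
#include <cstdint>
#include <vector>

// Hands out array indices for one descriptor type and recycles freed ones.
// Use one instance per binding (storage buffers, samplers, storage images).
class ResourceIdAllocator {
public:
    // Returns a recycled ID if one is available, otherwise the next unused one.
    std::uint32_t allocate() {
        if (!freeIds_.empty()) {
            const std::uint32_t id = freeIds_.back();
            freeIds_.pop_back();
            return id;
        }
        return nextId_++;
    }

    // Marks an ID as reusable by a later allocate() call.
    void release(std::uint32_t id) { freeIds_.push_back(id); }

private:
    std::uint32_t nextId_ = 0;            // first ID that was never handed out
    std::vector<std::uint32_t> freeIds_;  // IDs of deleted resources
};
```

In practice you would probably release an ID only once the GPU has finished all work that may still read the old resource through it, for example by delaying the release by a few frames.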

6. Use the bindless resources

Now we just need to bind the descriptor set after our command buffer begins.

cmd.begin(vk::CommandBufferBeginInfo().setFlags(vk::CommandBufferUsageFlagBits::eOneTimeSubmit));
// if you don't need compute you can omit this line
cmd.bindDescriptorSets(vk::PipelineBindPoint::eCompute, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
cmd.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
struct PushConstant {
    int textureRID;
};
PushConstant pc;
pc.textureRID = newResourceID;
cmd.pushConstants(pipelineLayout, vk::ShaderStageFlagBits::eAll, 0, sizeof(PushConstant), &pc);
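
Because the pipeline layout declares a 128 byte push constant range, it can help to verify at compile time that the struct you push actually fits. The sketch below mirrors the PushConstant struct used above; the fixed-width member type and the name `kPushConstantRangeSize` are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Mirrors the PushConstant struct above. The fixed-width type is an
// assumption of this sketch; it matches the 32-bit `int` on the GLSL side.
struct PushConstant {
    std::int32_t textureRID;
};

// Size of the push constant range declared in the pipeline layout.
constexpr std::size_t kPushConstantRangeSize = 128;

// The data must fit inside the declared range.
static_assert(sizeof(PushConstant) <= kPushConstantRangeSize,
              "push constants exceed the range declared in the pipeline layout");
// Vulkan requires the pushed size to be a multiple of 4 bytes.
static_assert(sizeof(PushConstant) % 4 == 0,
              "push constant size must be a multiple of 4");
// The struct is copied byte by byte into the command buffer.
static_assert(std::is_trivially_copyable<PushConstant>::value,
              "push constants must be trivially copyable");
```

If more fields are added to the struct later, these checks fail at compile time instead of producing a validation error at runtime.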

Now, to access the resources in the shaders, we just need to index the descriptor array with an integer.

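// Required for unsized descriptor arrays
#extension GL_EXT_nonuniform_qualifier : require

layout(set = 0, binding = 1) uniform sampler2D Sampler2D[];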

layout(push_constant) uniform _PushConstant {
    int textureRID;
};

layout(location=0) in struct {
    vec2 uv;
} In;
layout(location=0) out vec4 out_Color;

void main(){
    out_Color = texture(Sampler2D[textureRID], In.uv);
}

Conclusion

Now you can access any image or buffer at any time in any shader. Not everything is perfect, though: remember that bindless comes with a small overhead, as there is now one more indirection when addressing a resource.

On mobile we probably don’t want to use bindless resources, not only because the extension just launched (February 23, 2021), but also because most performance optimizations on mobile come from prefetching resources. But obviously there are situations where you need to use it, so don’t take this as the last word.
