44. Execution Graphs
Execution graphs provide a way for applications to dispatch multiple operations dynamically from a single initial command on the host. To achieve this, a new execution graph pipeline is provided that links together multiple shaders or pipelines, each describing one or more operations that can be dispatched within the execution graph. Each linked pipeline or shader describes an execution node within the graph, which can be dispatched dynamically from another shader within the same graph. This allows applications to describe much richer execution topologies at a finer granularity than would typically be possible with API commands alone.
44.1. Pipeline Creation
To create execution graph pipelines, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkCreateExecutionGraphPipelinesAMDX(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkExecutionGraphPipelineCreateInfoAMDX* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines);
- device is the logical device that creates the execution graph pipelines.
- pipelineCache is either VK_NULL_HANDLE, indicating that pipeline caching is disabled; or the handle of a valid pipeline cache object, in which case use of that cache is enabled for the duration of the command.
- createInfoCount is the length of the pCreateInfos and pPipelines arrays.
- pCreateInfos is a pointer to an array of VkExecutionGraphPipelineCreateInfoAMDX structures.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pPipelines is a pointer to an array of VkPipeline handles in which the resulting execution graph pipeline objects are returned.
Pipelines are created and returned as described for Multiple Pipeline Creation.
The VkExecutionGraphPipelineCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    uint32_t stageCount;
    const VkPipelineShaderStageCreateInfo* pStages;
    const VkPipelineLibraryCreateInfoKHR* pLibraryInfo;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
} VkExecutionGraphPipelineCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkPipelineCreateFlagBits specifying how the pipeline will be generated.
- stageCount is the number of entries in the pStages array.
- pStages is a pointer to an array of stageCount VkPipelineShaderStageCreateInfo structures describing the set of shader stages to be included in the execution graph pipeline.
- pLibraryInfo is a pointer to a VkPipelineLibraryCreateInfoKHR structure defining pipeline libraries to include.
- layout is the description of binding locations used by both the pipeline and descriptor sets used with the pipeline.
- basePipelineHandle is a pipeline to derive from.
- basePipelineIndex is an index into the pCreateInfos parameter to use as a pipeline to derive from.
The parameters basePipelineHandle and basePipelineIndex are
described in more detail in Pipeline
Derivatives.
Each shader stage provided when creating an execution graph pipeline
(including those in libraries) is associated with a name and an index,
determined by the inclusion or omission of a
VkPipelineShaderStageNodeCreateInfoAMDX structure in its pNext
chain.
In addition to the shader name and index, an internal "node index" is also generated for each node, which can be queried with vkGetExecutionGraphPipelineNodeIndexAMDX, and is used exclusively for initial dispatch of an execution graph.
VK_SHADER_INDEX_UNUSED_AMDX is a special shader index used to indicate
that the created node does not override the index.
In this case, the shader index is determined through other means.
It is defined as:
#define VK_SHADER_INDEX_UNUSED_AMDX (~0U)
The VkPipelineShaderStageNodeCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkPipelineShaderStageNodeCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    const char* pName;
    uint32_t index;
} VkPipelineShaderStageNodeCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pName is the shader name to use when creating a node in an execution graph. If pName is NULL, the name of the entry point specified in SPIR-V is used as the shader name.
- index is the shader index to use when creating a node in an execution graph. If index is VK_SHADER_INDEX_UNUSED_AMDX then the original index is used, either as specified by the ShaderIndexAMDX execution mode, or 0 if that too is not specified.
When included in the pNext chain of a
VkPipelineShaderStageCreateInfo structure, this structure specifies
the shader name and shader index of a node when creating an execution graph
pipeline.
If this structure is omitted, the shader name is set to the name of the
entry point in SPIR-V and the shader index is set to 0.
When dispatching a node from another shader, the name is fixed at pipeline creation, but the index can be set dynamically. By associating multiple shaders with the same name but different indexes, applications can dynamically select different nodes to execute. Applications must ensure each node has a unique combination of name and index.
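The naming rules above can be sketched as a small resolution routine. This is only an illustration: NodeStageDesc and the helper functions are hypothetical stand-ins rather than part of the extension, and the sketch follows the text literally (a stage with no node create info gets index 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as VK_SHADER_INDEX_UNUSED_AMDX, redefined so the sketch is standalone. */
#define SHADER_INDEX_UNUSED (~0U)

/* Hypothetical summary of one shader stage; not a Vulkan structure. */
typedef struct NodeStageDesc {
    const char *entry_point;       /* entry point name from SPIR-V */
    int         has_shader_index;  /* nonzero if ShaderIndexAMDX is declared */
    uint32_t    shader_index_mode; /* operand of ShaderIndexAMDX, if declared */
    int         has_node_info;     /* nonzero if the node create info is chained */
    const char *node_name;         /* pName from the node create info, may be NULL */
    uint32_t    node_index;        /* index from the node create info */
} NodeStageDesc;

/* pName overrides the entry point name only when it is present and non-NULL. */
static const char *resolve_node_name(const NodeStageDesc *s)
{
    assert(s != NULL && s->entry_point != NULL);
    if (s->has_node_info && s->node_name != NULL)
        return s->node_name;
    return s->entry_point;
}

/* Structure omitted: 0. UNUSED sentinel: ShaderIndexAMDX operand, or 0. */
static uint32_t resolve_node_index(const NodeStageDesc *s)
{
    assert(s != NULL);
    if (!s->has_node_info)
        return 0u;
    if (s->node_index != SHADER_INDEX_UNUSED)
        return s->node_index;
    return s->has_shader_index ? s->shader_index_mode : 0u;
}

/* Returns 1 if no two stages resolve to the same (name, index) pair. */
static int nodes_are_unique(const NodeStageDesc *stages, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        for (size_t j = i + 1; j < count; ++j)
            if (resolve_node_index(&stages[i]) == resolve_node_index(&stages[j]) &&
                strcmp(resolve_node_name(&stages[i]),
                       resolve_node_name(&stages[j])) == 0)
                return 0;
    return 1;
}
```

An application could run a check like nodes_are_unique over its stage list before pipeline creation to catch duplicate name and index pairs early.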
To query the internal node index for a particular node in an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineNodeIndexAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    const VkPipelineShaderStageNodeCreateInfoAMDX* pNodeInfo,
    uint32_t* pNodeIndex);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the internal node index for.
- pNodeInfo is a pointer to a VkPipelineShaderStageNodeCreateInfoAMDX structure identifying the name and index of the node to query.
- pNodeIndex is the returned internal node index of the identified node.
Once this function returns, the contents of pNodeIndex contain the
internal node index of the identified node.
44.2. Initializing Scratch Memory
Implementations may need scratch memory to manage dispatch queues or similar when executing an execution graph pipeline; this memory is explicitly managed by the application.
To query the scratch space required to dispatch an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineScratchSizeAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    VkExecutionGraphPipelineScratchSizeAMDX* pSizeInfo);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the scratch space for.
- pSizeInfo is a pointer to a VkExecutionGraphPipelineScratchSizeAMDX structure that will contain the required scratch size.
After this function returns, information about the scratch space required
will be returned in pSizeInfo.
The VkExecutionGraphPipelineScratchSizeAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineScratchSizeAMDX {
    VkStructureType sType;
    void* pNext;
    VkDeviceSize size;
} VkExecutionGraphPipelineScratchSizeAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- size indicates the scratch space required for dispatching the queried execution graph.
To initialize scratch memory for a particular execution graph, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdInitializeGraphScratchMemoryAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be initialized.
This command must be called before using scratch to dispatch the
currently bound execution graph pipeline.
Execution of this command may modify any memory locations in the range
[scratch, scratch + size), where size is the value
returned in VkExecutionGraphPipelineScratchSizeAMDX::size by
vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
Accesses to this memory range are performed in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
If any portion of scratch is modified by any command other than
vkCmdDispatchGraphAMDX, vkCmdDispatchGraphIndirectAMDX,
vkCmdDispatchGraphIndirectCountAMDX, or
vkCmdInitializeGraphScratchMemoryAMDX with the same execution graph,
it must be reinitialized for the execution graph again before dispatching
against it.
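An application that tracks its own writes could decide whether scratch memory needs to be reinitialized with an interval test against the half-open range [scratch, scratch + size). The helper below is a hypothetical sketch, not part of the API; addresses are plain 64-bit integers standing in for VkDeviceAddress values.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a write of write_size bytes starting at write_addr touches
   the half-open scratch range [scratch, scratch + scratch_size). */
static int write_touches_scratch(uint64_t scratch, uint64_t scratch_size,
                                 uint64_t write_addr, uint64_t write_size)
{
    /* Both ranges are assumed not to wrap around the address space. */
    assert(scratch + scratch_size >= scratch);
    assert(write_addr + write_size >= write_addr);

    if (scratch_size == 0 || write_size == 0)
        return 0; /* an empty range overlaps nothing */

    return write_addr < scratch + scratch_size &&
           scratch < write_addr + write_size;
}
```

If this returns 1 for a write made by anything other than the graph dispatch or initialization commands, the rule above implies the scratch memory must be initialized again before the next dispatch.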
44.3. Dispatching a Graph
Initial dispatch of an execution graph is done from the host in the same way as any other command, and can be used in a similar way to compute dispatch commands, with indirect variants available.
To record an execution graph dispatch, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are
executed.
Nodes executed as part of this command are not implicitly synchronized in
any way against each other once they are dispatched.
For this command, all device/host pointers in substructures are treated as host pointers and read only during host execution of this command. Once this command returns, no reference to the original pointers is retained.
Execution of this command may modify any memory locations in the range
[scratch, scratch + size), where size is the value
returned in VkExecutionGraphPipelineScratchSizeAMDX::size by
vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
Accesses to this memory range are performed in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
To record an execution graph dispatch with node and payload parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are
executed.
Nodes executed as part of this command are not implicitly synchronized in
any way against each other once they are dispatched.
For this command, all device/host pointers in substructures are treated as
device pointers and read during device execution of this command.
The allocation and contents of these pointers only need to be valid during
device execution.
All of these addresses will be read in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range
[scratch, scratch + size), where size is the value
returned in VkExecutionGraphPipelineScratchSizeAMDX::size by
vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
Accesses to this memory range are performed in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
To record an execution graph dispatch with all parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectCountAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    VkDeviceAddress countInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- countInfo is a device address of a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in countInfo are
executed.
Nodes executed as part of this command are not implicitly synchronized in
any way against each other once they are dispatched.
For this command, all pointers in substructures are treated as device
pointers and read during device execution of this command.
The allocation and contents of these pointers only need to be valid during
device execution.
All of these addresses will be read in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range
[scratch, scratch + size), where size is the value
returned in VkExecutionGraphPipelineScratchSizeAMDX::size by
vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound
execution graph pipeline.
Accesses to this memory range are performed in the
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the
VK_ACCESS_2_SHADER_STORAGE_READ_BIT and
VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
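The three dispatch commands differ mainly in where their parameters are read. That difference can be summarized as a small lookup; the enum and function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical tags for the three dispatch commands. */
typedef enum GraphDispatchKind {
    GRAPH_DISPATCH,               /* vkCmdDispatchGraphAMDX */
    GRAPH_DISPATCH_INDIRECT,      /* vkCmdDispatchGraphIndirectAMDX */
    GRAPH_DISPATCH_INDIRECT_COUNT /* vkCmdDispatchGraphIndirectCountAMDX */
} GraphDispatchKind;

typedef enum AddressDomain { ADDRESS_HOST, ADDRESS_DEVICE } AddressDomain;

/* Where the VkDispatchGraphCountInfoAMDX structure itself is read from. */
static AddressDomain count_info_domain(GraphDispatchKind kind)
{
    switch (kind) {
    case GRAPH_DISPATCH:
    case GRAPH_DISPATCH_INDIRECT:
        return ADDRESS_HOST;
    case GRAPH_DISPATCH_INDIRECT_COUNT:
        return ADDRESS_DEVICE;
    }
    assert(!"unknown dispatch kind");
    return ADDRESS_HOST;
}

/* How the infos and payloads addresses in substructures are interpreted. */
static AddressDomain substructure_domain(GraphDispatchKind kind)
{
    switch (kind) {
    case GRAPH_DISPATCH:
        return ADDRESS_HOST;
    case GRAPH_DISPATCH_INDIRECT:
    case GRAPH_DISPATCH_INDIRECT_COUNT:
        return ADDRESS_DEVICE;
    }
    assert(!"unknown dispatch kind");
    return ADDRESS_HOST;
}
```

Only the fully host-side variant reads everything during recording; for the other two, device-side data only has to be valid when the command executes on the device.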
The VkDeviceOrHostAddressConstAMDX union is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef union VkDeviceOrHostAddressConstAMDX {
    VkDeviceAddress deviceAddress;
    const void* hostAddress;
} VkDeviceOrHostAddressConstAMDX;
- deviceAddress is a buffer device address as returned by the vkGetBufferDeviceAddressKHR command.
- hostAddress is a const host memory address.
The VkDispatchGraphCountInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphCountInfoAMDX {
    uint32_t count;
    VkDeviceOrHostAddressConstAMDX infos;
    uint64_t stride;
} VkDispatchGraphCountInfoAMDX;
- count is the number of dispatches to perform.
- infos is the device or host address of a flat array of VkDispatchGraphInfoAMDX structures.
- stride is the byte stride between successive VkDispatchGraphInfoAMDX structures in infos.
Whether infos is consumed as a device or host pointer is defined by
the command this structure is used in.
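Because infos is addressed with an explicit byte stride, the dispatch records may be embedded in larger application structures. The sketch below walks a host-side array this way. The types are local mirrors of the structures above, declared here only so the example is standalone, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the extension's structures (illustration only). */
typedef union AddressConst {
    uint64_t    deviceAddress;
    const void *hostAddress;
} AddressConst;

typedef struct GraphInfo {
    uint32_t     nodeIndex;
    uint32_t     payloadCount;
    AddressConst payloads;
    uint64_t     payloadStride;
} GraphInfo;

typedef struct GraphCountInfo {
    uint32_t     count;
    AddressConst infos;
    uint64_t     stride;
} GraphCountInfo;

/* Copies record i out of a host-side infos array, stepping by stride bytes. */
static GraphInfo graph_info_at(const GraphCountInfo *ci, uint32_t i)
{
    GraphInfo out;
    const unsigned char *base = (const unsigned char *)ci->infos.hostAddress;

    assert(base != NULL && i < ci->count);
    memcpy(&out, base + (size_t)i * (size_t)ci->stride, sizeof out);
    return out;
}

/* Sums payloadCount over every record in the array. */
static uint64_t total_payload_count(const GraphCountInfo *ci)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < ci->count; ++i)
        total += graph_info_at(ci, i).payloadCount;
    return total;
}
```

The same arithmetic applies when infos is a device address; only the place where the read happens changes.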
The VkDispatchGraphInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphInfoAMDX {
    uint32_t nodeIndex;
    uint32_t payloadCount;
    VkDeviceOrHostAddressConstAMDX payloads;
    uint64_t payloadStride;
} VkDispatchGraphInfoAMDX;
- nodeIndex is the index of a node in an execution graph to be dispatched.
- payloadCount is the number of payloads to dispatch for the specified node.
- payloads is a device or host address pointer to a flat array of payloads with size equal to the product of payloadCount and payloadStride.
- payloadStride is the byte stride between successive payloads in payloads.
Whether payloads is consumed as a device or host pointer is defined by
the command this structure is used in.
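The payload array layout reduces to two small formulas: the array occupies payloadCount * payloadStride bytes, and payload j starts at payloads + j * payloadStride. A minimal sketch with hypothetical helper names, using 64-bit integers in place of device addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Total size in bytes of the flat payload array. */
static uint64_t payload_array_size(uint32_t payloadCount, uint64_t payloadStride)
{
    return (uint64_t)payloadCount * payloadStride;
}

/* Address of payload j, given the base address of the payload array. */
static uint64_t payload_address(uint64_t payloads, uint32_t payloadCount,
                                uint64_t payloadStride, uint32_t j)
{
    assert(j < payloadCount);
    return payloads + (uint64_t)j * payloadStride;
}
```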
44.4. Shader Enqueue
Compute shaders in an execution graph can use the
OpInitializeNodePayloadsAMDX instruction to initialize nodes for dispatch.
Any node payload initialized in this way will be enqueued for dispatch once
the shader is done writing to the payload.
As compilers may be conservative when making this determination, shaders
can further call OpFinalizeNodePayloadsAMDX to guarantee that the
payload is no longer being written.
The Node Name operand of the PayloadNodeNameAMDX decoration
on a payload identifies the shader name of the node to be enqueued, and the
Shader Index operand of OpInitializeNodePayloadsAMDX
identifies the shader index.
A node identified in this way is dispatched as described in the following
sections.
44.4.1. Compute Nodes
Compute shaders added as nodes to an execution graph are executed
differently based on the presence or absence of the
StaticNumWorkgroupsAMDX or CoalescingAMDX execution modes.
Dispatching a compute shader node that does not declare either the
StaticNumWorkgroupsAMDX or CoalescingAMDX execution mode will
execute a number of workgroups in each dimension specified by the first 12
bytes of the payload, interpreted as a VkDispatchIndirectCommand.
The same payload will be broadcast to each workgroup in the same dispatch.
Additional values in the payload have no effect on execution.
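To make the payload layout for this case concrete, the sketch below reads the leading 12 bytes of a payload as three 32-bit workgroup counts. DispatchSize is a local stand-in with the same x, y, z layout as VkDispatchIndirectCommand, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same three-field layout as VkDispatchIndirectCommand. */
typedef struct DispatchSize {
    uint32_t x;
    uint32_t y;
    uint32_t z;
} DispatchSize;

/* Interprets the first 12 bytes of a payload as workgroup counts.
   Any bytes after those 12 are ignored for dispatch sizing. */
static DispatchSize dispatch_size_from_payload(const void *payload,
                                               size_t payload_bytes)
{
    DispatchSize d;
    assert(payload != NULL && payload_bytes >= sizeof d);
    memcpy(&d, payload, sizeof d);
    return d;
}

/* Total number of workgroups launched for one payload. */
static uint64_t workgroup_total(DispatchSize d)
{
    return (uint64_t)d.x * d.y * d.z;
}
```

A payload type for such a node would therefore place its dispatch size first and any application data after it.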
Dispatching a compute shader node with the StaticNumWorkgroupsAMDX
execution mode will execute workgroups in each dimension according to the
x, y, and z size operands to the
StaticNumWorkgroupsAMDX execution mode.
The same payload will be broadcast to each workgroup in the same dispatch.
Any values in the payload have no effect on execution.
Dispatching a compute shader node with the CoalescingAMDX execution
mode will enqueue a single invocation for execution.
Implementations may combine multiple such dispatches into the same
workgroup, up to the size of the workgroup.
The number of invocations coalesced into a given workgroup in this way can
be queried via the CoalescedInputCountAMDX built-in.
Any values in the payload have no effect on execution.
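Since coalescing is optional and bounded by the workgroup size, the number of workgroups launched for a batch of coalescing dispatches can only be bounded, not predicted. The sketch below computes those bounds under the assumption that each dispatch contributes exactly one invocation, as described above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fewest workgroups possible: every workgroup is filled completely. */
static uint32_t min_coalesced_workgroups(uint32_t dispatches,
                                         uint32_t workgroup_size)
{
    assert(workgroup_size > 0);
    return dispatches / workgroup_size + (dispatches % workgroup_size != 0);
}

/* Most workgroups possible: the implementation coalesces nothing. */
static uint32_t max_coalesced_workgroups(uint32_t dispatches)
{
    return dispatches;
}
```

Shaders should therefore rely on CoalescedInputCountAMDX rather than assume that a workgroup is full.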