Skip to content

Conversation

@inner-daemons
Copy link
Collaborator

@inner-daemons inner-daemons commented Oct 31, 2025

Connections
Works towards #7197
Builds on #8370

Description

Add a SPIR-V writer for mesh shaders

All major changes here are in the SPIR-V backend for naga and naga snapshots. Other "changes" are inherited from #8370

Testing

Mesh shader test WGSL is now written to SPIR-V as a snapshot. Mesh shader tests & example use naga-written spirv on vulkan backend.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

inner-daemons and others added 30 commits August 14, 2025 12:53
inner-daemons and others added 5 commits November 26, 2025 13:19
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
@inner-daemons
Copy link
Collaborator Author

@cwfitzgerald All comments should be addressed, ready for round 2! (=2)

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments - also tests need to pass.

@inner-daemons
Copy link
Collaborator Author

I'll work on fixing the tests... at some point

@inner-daemons
Copy link
Collaborator Author

For future me to enjoy and ponder:

	<call no='157' class='pipe_context' method='create_fs_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_FRAGMENT
source_blake3: {0x36b56488, 0x1d19aa4b, 0xcbbf732d, 0x58f338fc, 0x4ff49005, 0x051203b8, 0x5a5ae0b8, 0x945f3dc4}
inputs_read: 32-33
outputs_written: 4
per_primitive_inputs: 33
subgroup_size: 0
bit_sizes_float: 0x20
bit_sizes_int: 0x20
origin_upper_left: true
inputs: 2
outputs: 1
decl_var shader_in INTERP_MODE_NONE none vec4 color (VARYING_SLOT_VAR0.xyzw, 0, 0)
decl_var per_primitive shader_in INTERP_MODE_NONE none vec4 colorMask (VARYING_SLOT_VAR1.xyzw, 1, 0)
decl_var shader_out INTERP_MODE_NONE none vec4 #0 (FRAG_RESULT_DATA0.xyzw, 0, 0)
decl_function fs_main () (entrypoint)

impl fs_main {
    block b0:   // preds:
    32     %0 = deref_var &color (shader_in vec4)
    32x4   %1 = @load_deref (%0) (access=none)
    32     %2 = deref_var &colorMask (shader_in vec4)
    32x4   %3 = @load_deref (%2) (access=none)
    32     %4 = fmul %1.x, %3.x
    32     %5 = fmul %1.y, %3.y
    32     %6 = fmul %1.z, %3.z
    32     %7 = fmul %1.w, %3.w
    32     %8 = deref_var &#0 (shader_out vec4)
    32x4   %9 = vec4 %4, %5, %6, %7
                @store_deref (%8, %9) (wrmask=xyzw, access=none)
                // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1da32f0</ptr></ret>
		<time><int>72</int></time>
	</call>
	<call no='158' class='pipe_context' method='create_ts_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_TASK
source_blake3: {0xa93be064, 0x9f407d12, 0x5b23394f, 0xc5a9b7f6, 0x9146aa9a, 0xd2190e79, 0x0df20856, 0xdec646c8}
workgroup_size: 1, 1, 1
shared_size: 4
task_payload_size: 20
subgroup_size: 0
Unhandled stage 6
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_function ts_main () (entrypoint)

impl ts_main {
    block b0:  // preds:
    32    %0 = load_const (0x3f800000 = 1.000000 = 1065353216)
    32    %1 = load_const (0x00000000)
               @store_shared (%0 (0x3f800000), %1 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32x4  %2 = load_const (0x3f800000, 0x3f800000, 0x00000000, 0x3f800000) = (1.000000, 1.000000, 0.000000, 1.000000) = (1065353216, 1065353216, 0, 1065353216)
               @store_task_payload (%2 (0x3f800000, 0x3f800000, 0x0, 0x3f800000), %1 (0x0)) (base=0, wrmask=xyzw, align_mul=256, align_offset=0)
    32    %3 = load_const (0x00000010 = 16)
    32    %4 = load_const (0x00000001 = 0.000000)
               @store_task_payload (%4 (0x1), %3 (0x10)) (base=0, wrmask=x, align_mul=256, align_offset=16)
    32x3  %5 = load_const (0x00000003, 0x00000001, 0x00000001) = (0.000000, 0.000000, 0.000000)
               @launch_mesh_workgroups (%5 (0x3, 0x1, 0x1)) (base=0, range=20)
               // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1d87610</ptr></ret>
		<time><int>25</int></time>
	</call>
	<call no='159' class='pipe_context' method='create_ms_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_MESH
source_blake3: {0xa41a4a8c, 0xdb892a32, 0x86c55f33, 0x112e676c, 0xd27ecff8, 0x46a99a32, 0x1337b67d, 0xcc785c3c}
workgroup_size: 1, 1, 1
outputs_written: 0,27-28,32-33
system_values_read: 0x00000000'00000000'00000800'00000000
per_primitive_outputs: 27-28,33
shared_size: 140
task_payload_size: 20
subgroup_size: 0
bit_sizes_int: 0x21
ms_cross_invocation_output_access: 0,27-28,32-33
max_vertices_out: 3
max_primitives_out: 1
primitive_type: TRIANGLES
outputs: 5
decl_var shader_out INTERP_MODE_NONE none vec4[3] naga_vertex_builtin_outputs[*].field0 (VARYING_SLOT_POS.xyzw, 0, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none uvec3[1] naga_primitive_indices_outputs (VARYING_SLOT_PRIMITIVE_INDICES.xyz, 2, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none bool[1] naga_primitive_builtin_outputs[*].field0 (VARYING_SLOT_CULL_PRIMITIVE.x, 3, 0)
decl_var shader_out INTERP_MODE_NONE none vec4[3] #0 (VARYING_SLOT_VAR0.xyzw, 1, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none vec4[1] #1 (VARYING_SLOT_VAR1.xyzw, 4, 0)
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_var shared INTERP_MODE_NONE none MeshOutput mesh_output = null
decl_function ms_main () (entrypoint)

impl ms_main {
    block b0:   // preds:
    32     %0 = undefined
    32     %1 = @load_local_invocation_index
    32     %2 = load_const (0x00000004)
    32     %3 = load_const (0x00000084 = 132)
    32     %4 = load_const (0x00000003)
                @store_shared (%4 (0x3), %3 (0x84)) (base=0, wrmask=x, align_mul=256, align_offset=132)
    32     %5 = load_const (0x00000088 = 136)
    32     %6 = load_const (0x00000001)
                @store_shared (%6 (0x1), %5 (0x88)) (base=0, wrmask=x, align_mul=256, align_offset=136)
    32     %7 = load_const (0x40000000 = 2.000000 = 1073741824)
    32     %8 = load_const (0x00000000)
                @store_shared (%7 (0x40000000), %8 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32     %9 = load_const (0x00000005)
    32x4  %10 = load_const (0x00000000, 0x3f800000, 0x00000000, 0x3f800000) = (0.000000, 1.000000, 0.000000, 1.000000) = (0, 1065353216, 0, 1065353216)
                @store_shared (%10 (0x0, 0x3f800000, 0x0, 0x3f800000), %2 (0x4)) (base=0, wrmask=xyzw, align_mul=256, align_offset=4)
    32x4  %11 = @load_task_payload (%8 (0x0)) (base=0, align_mul=256, align_offset=0)
    32x4  %12 = vec4 %8 (0x0), %11.y, %8 (0x0), %11.w
    32    %13 = load_const (0x00000010 = 16)
    32    %14 = load_const (0x00000014 = 20)
                @store_shared (%12, %14 (0x14)) (base=0, wrmask=xyzw, align_mul=256, align_offset=20)
    32    %15 = load_const (0x00000024 = 36)
    32x4  %16 = load_const (0xbf800000, 0xbf800000, 0x00000000, 0x3f800000) = (-1.000000, -1.000000, 0.000000, 1.000000) = (-1082130432, -1082130432, +0, +1065353216) = (3212836864, 3212836864, 0, 1065353216)
                @store_shared (%16 (0xbf800000, 0xbf800000, 0x0, 0x3f800000), %15 (0x24)) (base=0, wrmask=xyzw, align_mul=256, align_offset=36)
    32x4  %17 = vec4 %8 (0x0), %8 (0x0), %11.z, %11.w
    32    %18 = load_const (0x00000034 = 52)
                @store_shared (%17, %18 (0x34)) (base=0, wrmask=xyzw, align_mul=256, align_offset=52)
    32    %19 = load_const (0x00000044 = 68)
    32x4  %20 = load_const (0x3f800000, 0xbf800000, 0x00000000, 0x3f800000) = (1.000000, -1.000000, 0.000000, 1.000000) = (+1065353216, -1082130432, +0, +1065353216) = (1065353216, 3212836864, 0, 1065353216)
                @store_shared (%20 (0x3f800000, 0xbf800000, 0x0, 0x3f800000), %19 (0x44)) (base=0, wrmask=xyzw, align_mul=256, align_offset=68)
    32x4  %21 = vec4 %11.x, %8 (0x0), %8 (0x0), %11.w
    32    %22 = load_const (0x00000054 = 84)
                @store_shared (%21, %22 (0x54)) (base=0, wrmask=xyzw, align_mul=256, align_offset=84)
    32    %23 = load_const (0x00000064 = 100)
    32x3  %24 = load_const (0x00000000, 0x00000001, 0x00000002) = (0.000000, 0.000000, 0.000000)
                @store_shared (%24 (0x0, 0x1, 0x2), %23 (0x64)) (base=0, wrmask=xyz, align_mul=256, align_offset=100)
    32    %25 = @load_task_payload (%13 (0x10)) (base=0, align_mul=256, align_offset=16)
    1     %26 = ieq %25, %8 (0x0)
    32    %27 = load_const (0x00000070 = 112)
    32    %28 = b2b32 %26
                @store_shared (%28, %27 (0x70)) (base=0, wrmask=x, align_mul=256, align_offset=112)
    32    %29 = load_const (0x00000074 = 116)
    32x4  %30 = load_const (0x3f800000, 0x00000000, 0x3f800000, 0x3f800000) = (1.000000, 0.000000, 1.000000, 1.000000) = (1065353216, 0, 1065353216, 1065353216)
                @store_shared (%30 (0x3f800000, 0x0, 0x3f800000, 0x3f800000), %29 (0x74)) (base=0, wrmask=xyzw, align_mul=256, align_offset=116)
                @barrier (execution_scope=WORKGROUP, memory_scope=WORKGROUP, mem_semantics=ACQ|REL, mem_modes=shader_out|shared)
    32    %31 = @load_shared (%3 (0x84)) (base=0, align_mul=256, align_offset=132)
    32    %32 = umin %31, %4 (0x3)
    32    %33 = @load_shared (%5 (0x88)) (base=0, align_mul=256, align_offset=136)
    32    %34 = umin %33, %6 (0x1)
                @set_vertex_and_primitive_count (%32, %34, %0) (stream_id=0)
                // succs: b1
    loop {
        block b1:   // preds: b0 b4
        32    %35 = phi b0: %1, b4: %46
        1     %36 = uge %35, %32
                    // succs: b2 b3
        if %36 {
            block b2:// preds: b1
            break
            // succs: b5
        } else {
            block b3:  // preds: b1, succs: b4
        }
        block b4:   // preds: b3
        32    %37 = ishl %35, %9 (0x5)
        32    %38 = iadd %2 (0x4), %37
        32x4  %39 = @load_shared (%38) (base=0, align_mul=32, align_offset=4)
        32    %40 = iadd %14 (0x14), %37
        32x4  %41 = @load_shared (%40) (base=0, align_mul=32, align_offset=20)
        32    %42 = deref_var &naga_vertex_builtin_outputs[*].field0 (shader_out vec4[3])
        32    %43 = deref_array &(*%42)[%35] (shader_out vec4)  // &naga_vertex_builtin_outputs[*].field0[%35]
                    @store_deref (%43, %39) (wrmask=xyzw, access=none)
        32    %44 = deref_var &#0 (shader_out vec4[3])
        32    %45 = deref_array &(*%44)[%35] (shader_out vec4)  // &#0[%35]
                    @store_deref (%45, %41) (wrmask=xyzw, access=none)
        32    %46 = iadd %35, %6 (0x1)
                    // succs: b1
    }
    block b5:  // preds: b2, succs: b6
    loop {
        block b6:   // preds: b5 b9
        32    %47 = phi b5: %1, b9: %63
        1     %48 = uge %47, %34
                    // succs: b7 b8
        if %48 {
            block b7:// preds: b6
            break
            // succs: b10
        } else {
            block b8:  // preds: b6, succs: b9
        }
        block b9:   // preds: b8
        32    %49 = ishl %47, %9 (0x5)
        32    %50 = iadd %23 (0x64), %49
        32x3  %51 = @load_shared (%50) (base=0, align_mul=32, align_offset=4)
        32    %52 = iadd %27 (0x70), %49
        32    %53 = @load_shared (%52) (base=0, align_mul=32, align_offset=16)
        1     %54 = b2b1 %53
        32    %55 = iadd %29 (0x74), %49
        32x4  %56 = @load_shared (%55) (base=0, align_mul=32, align_offset=20)
        32    %57 = deref_var &naga_primitive_indices_outputs (shader_out uvec3[1])
        32    %58 = deref_array &(*%57)[%47] (shader_out uvec3)  // &naga_primitive_indices_outputs[%47]
                    @store_deref (%58, %51) (wrmask=xyz, access=none)
        32    %59 = deref_var &naga_primitive_builtin_outputs[*].field0 (shader_out bool[1])
        32    %60 = deref_array &(*%59)[%47] (shader_out bool)  // &naga_primitive_builtin_outputs[*].field0[%47]
                    @store_deref (%60, %54) (wrmask=x, access=none)
        32    %61 = deref_var &#1 (shader_out vec4[1])
        32    %62 = deref_array &(*%61)[%47] (shader_out vec4)  // &#1[%47]
                    @store_deref (%62, %56) (wrmask=xyzw, access=none)
        32    %63 = iadd %47, %6 (0x1)
                    // succs: b6
    }
    block b10:  // preds: b7, succs: b11
    block b11:
}

]]>

@inner-daemons
Copy link
Collaborator Author

I also have some AMD GPU code from windows that I'll upload at some point but its absolutely unreadable to my eyes.

@inner-daemons
Copy link
Collaborator Author

inner-daemons commented Nov 28, 2025

For future reference: I have taken the generated code with debug symbols and put it through SPIRV-Opt, I have put it through SPIRV-Cross and then back through glslc, and both times it still caused a bug in LLVMPipe. Not sure about AMD however.

I have also narrowed it down to just being the mesh shader part.

@inner-daemons
Copy link
Collaborator Author

inner-daemons commented Nov 28, 2025

Update: got this to work by modifying the body of main() in the generated GLSL. Therefore, the interface is fine, and its just an issue with something goofy.

@inner-daemons
Copy link
Collaborator Author

inner-daemons commented Nov 28, 2025

These comments are spammy if anyone is listening so you don't need to

I have narrowed it down to this little bit of code that breaks stuff even if the values are immediately rewritten:

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

Notably, it has to be referenced by gl_LocalInvocationIndex to break, even though that should always be 0. It still results in the following error:

(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never usedLLVM ERROR: Cannot emit physreg copy instruction

I'm beginning to believe this really is a mesa bug

@inner-daemons
Copy link
Collaborator Author

The following when replacing the body of the generated GLSL makes it terminate with sigsegv:

void main() {
  mesh_output.primitives[0u].cull = !taskPayload.visible;

  SetMeshOutputsEXT(3, 1);

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = false;
}

It doesn't terminate when gl_LocalInvocationIndex isn't used to index mesh_output.primitives, even if it is used elsewhere

@inner-daemons
Copy link
Collaborator Author

#version 450
#extension GL_EXT_mesh_shader : require
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(max_vertices = 3, max_primitives = 1, triangles) out;

struct TaskPayload {
  vec4 colorMask;
  bool visible;
};

out _40 { layout(location = 0) vec4 _m0; }
_43[3];

perprimitiveEXT out _47 { layout(location = 1) vec4 _m0; }
_50[1];

taskPayloadSharedEXT TaskPayload taskPayload;

shared bool array[3];

void main() {
  array[0] = !taskPayload.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
}

This example is broken but only because of the first 3 lines in main: if you chain the indexing to use 0 it works fine, and if you remove it it works fine.

@inner-daemons
Copy link
Collaborator Author

If I change naga's logic to use


        // Current index to copy
        let val_i = if is_primitive {
            self.get_constant_scalar(crate::Literal::U32(0))
        } else {
            let val_i = self.id_gen.next();
            body.push(Instruction::load(u32_type_id, val_i, index_var, None));
            val_i
        };

when deciding which index to copy over, it actually passes all of the tests except for 2: wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh and wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh_frag both consistently segfault while nothing else does.

So in conclusion I have no fucking clue what's going on.

@inner-daemons
Copy link
Collaborator Author

inner-daemons commented Nov 28, 2025

This shader

#version 450
#extension GL_EXT_mesh_shader : require

const vec4[3] positions = {vec4(0., 1.0, 0., 1.0), vec4(-1.0, -1.0, 0., 1.0),
                           vec4(1.0, -1.0, 0., 1.0)};
const vec4[3] colors = {vec4(0., 1., 0., 1.), vec4(0., 0., 1., 1.),
                        vec4(1., 0., 0., 1.)};

// This is an inefficient workgroup size.Ideally the total thread count would be
// a multiple of 64
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
struct PayloadData {
  vec4 colorMask;
  bool visible;
};
taskPayloadSharedEXT PayloadData payloadData;

out VertexOutput { layout(location = 0) vec4 color; }
vertexOutput[];
layout(location = 1) perprimitiveEXT out PrimitiveOutput { vec4 colorMask; }
primitiveOutput[];

layout(triangles, max_vertices = 3, max_primitives = 1) out;

shared bool array[3];

void main() {
  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = positions[0];
  gl_MeshVerticesEXT[1].gl_Position = positions[1];
  gl_MeshVerticesEXT[2].gl_Position = positions[2];

  vertexOutput[0].color = colors[0] * payloadData.colorMask;
  vertexOutput[1].color = colors[1] * payloadData.colorMask;
  vertexOutput[2].color = colors[2] * payloadData.colorMask;

  gl_PrimitiveTriangleIndicesEXT[gl_LocalInvocationIndex] = uvec3(0, 1, 2);
  primitiveOutput[0].colorMask = vec4(1.0, 0.0, 1.0, 1.0);

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;
}

Which is a copy-paste of the fully functional SPIR-V that works on NVIDIA, AMD, and LLVMPIPE, except for the shared bool array[3]; and

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

fails to run on LLVMPIPE. LLVMPIPE segfaults here.

@inner-daemons
Copy link
Collaborator Author

So that I remember: the tests started failing between 4b7ba3a and 65be73b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants