Skip to content

Conversation

@christiangnrd
Copy link
Member

@christiangnrd christiangnrd commented Sep 30, 2025

The goal is to allow for kernels to be written without relying on KernelAbstractions macros

See #562 for initial discussion

@vchuravy @maleadt

@github-actions
Copy link
Contributor

github-actions bot commented Sep 30, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Click here to view the suggested changes.
diff --git a/src/pocl/backend.jl b/src/pocl/backend.jl
index 055e1c7d..abf0f803 100644
--- a/src/pocl/backend.jl
+++ b/src/pocl/backend.jl
@@ -149,9 +149,9 @@ end
 function (obj::KI.Kernel{POCLBackend})(args...; numworkgroups = 1, workgroupsize = 1)
     KI.check_launch_args(numworkgroups, workgroupsize)
 
-    local_size = (workgroupsize..., ntuple(_->1, 3-length(workgroupsize))...,)
+    local_size = (workgroupsize..., ntuple(_ -> 1, 3 - length(workgroupsize))...)
 
-    numworkgroups = (numworkgroups..., ntuple(_->1, 3-length(numworkgroups))...,)
+    numworkgroups = (numworkgroups..., ntuple(_ -> 1, 3 - length(numworkgroups))...)
     global_size = local_size .* numworkgroups
 
     event = obj.kern(args...; local_size, global_size)

@christiangnrd christiangnrd force-pushed the intrinsics branch 3 times, most recently from c16a665 to b166baa Compare October 2, 2025 19:58
@vchuravy
Copy link
Member

vchuravy commented Oct 6, 2025

Can you rebase?

@christiangnrd

This comment was marked as outdated.

@vchuravy vchuravy changed the title More KernelIntrinsics KernelIntrinsics API Oct 6, 2025
@vchuravy vchuravy added this to the 0.10.0 milestone Oct 6, 2025
@vchuravy vchuravy mentioned this pull request Oct 6, 2025
@vchuravy vchuravy linked an issue Oct 6, 2025 that may be closed by this pull request
@vchuravy
Copy link
Member

vchuravy commented Oct 6, 2025

@christiangnrd do you think we need a lower-level kernel launch interface? Otherwise the three-dimensional indices would be superflous.

@christiangnrd
Copy link
Member Author

christiangnrd commented Oct 6, 2025

do you think we need a lower-level kernel launch interface?

I'd been thinking about that and I think so. Would something that, assuming you wrote the whole kernel with KernelIntrinsics, could be used interchangeably with the backend's kernel launch macro (@cuda, @metal, etc.) make sense/be feasible?

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Maybe we can also add some simple shuffle operations?


return quote
$SharedMemory($(esc(T)), Val($(esc(dims))), Val($(QuoteNode(id))))
$SharedMemory($(esc(T)), Val($(esc(dims))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically an ABI break, which I had avoided so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ca11482 sufficient or do we still need to require backends to take in a third unused argument on the KI side?

@vchuravy vchuravy requested a review from maleadt November 6, 2025 19:43
@christiangnrd
Copy link
Member Author

I'm thinking we should either directly rename KernelIntrinsics to KI or provide the alias.

@vchuravy
Copy link
Member

vchuravy commented Nov 7, 2025

I am okay with providing the alias!

@vchuravy
Copy link
Member

vchuravy commented Nov 7, 2025

One additional thing to consider is what intrinsics do we need to implement something like #559

Can we add some primitive shuffles? Or add a test kernel that implements a reduction correctly?

@christiangnrd
Copy link
Member Author

One additional thing to consider is what intrinsics do we need to implement something like #559

Can we add some primitive shuffles? Or add a test kernel that implements a reduction correctly?

I think this should be a follow-up PR since it'll probably generate quite a bit of discussion, but I feel like shfl_down and a check for support taking into account backend and type may be all that's needed?

Otherwise, I believe the only things left to resolve for this PR are the module docstring, #635 (comment), and I'm not sure if we should at least start considering dynamic local memory API to make sure we don't need to release another breaking version or if you think the current api can definitely fit the addition of local memory.

@maleadt
Copy link
Member

maleadt commented Nov 13, 2025

I am okay with providing the alias!

Do we really need to, given the import KernelIntrinsics as KI feature?

@christiangnrd
Copy link
Member Author

Do we really need to, given the import KernelIntrinsics as KI feature?

I felt like we might as well include it since it'll be written so much, and I found fully qualifying KernelIntrinsics for every KI function mention in the docstrings was distracting, but also didn't want to write KI since users new to the package would be confused wondering where it came from.

I don't feel that strongly about this though so I'll defer to you for the final decision

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lower-level kernel form?

3 participants