This PR defines methods for making cuDNN work with `BFloat16s.BFloat16`. In the following example, I show how the new methods fix the BFloat16 backward pass of `Flux.logitcrossentropy`:

**Before**
Note: `Core.BFloat16 === BFloat16s.BFloat16`, but I didn't explicitly import it in this REPL session.
**After defining `cudnnDataType(::Type{BFloat16})`**

**After defining `scalingParameter(::Type{BFloat16}, val)`**
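For context, the new definitions follow cuDNN.jl's existing dispatch pattern for `Float16`. The sketch below is illustrative only, not the literal diff: the enum name `CUDNN_DATA_BFLOAT16` comes from cuDNN's C API, and the use of a `Float32` ref for the scaling parameter is an assumption based on how cuDNN handles other 16-bit float types.

```julia
using BFloat16s: BFloat16

# Map the Julia element type to cuDNN's data-type enum, as is
# already done for Float16/Float32/Float64 (sketch; exact form
# in the PR may differ).
cudnnDataType(::Type{BFloat16}) = CUDNN_DATA_BFLOAT16

# cuDNN performs scaling computations in Float32 for 16-bit float
# types, so alpha/beta are passed as Float32 refs (assumed to match
# the existing Float16 method).
scalingParameter(::Type{BFloat16}, val) = Ref{Float32}(val)
```

With these two methods in place, dispatch on the array element type lets the existing cuDNN wrappers (softmax, activations, pooling) accept `CuArray{BFloat16}` inputs without further changes.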
I also define a `cptr` method for consistency, though it appears the function isn't used anywhere.

Tests are added for softmax, activations, and pooling. I initially also tested convolutions, normalization, RNNs, and MHA, but they don't appear to support BFloat16.
Adding BFloat16s.jl as a dependency does not affect compilation, since it is already a dependency of CUDA.jl.
Along with my proposed fix in FluxML/Optimisers.jl#215, this has allowed me to train LLMs in BFloat16 with Flux.jl on Julia v1.12. The Optimisers.jl side is still a work in progress, but together these changes would be a significant unlock for my lab.