-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Rewrite the matrix multiplication function. rearrange the loops and us… #20365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite the matrix multiplication function. rearrange the loops and us… #20365
Conversation
…e blocking to enhance the speed (a factor of 6 for very large matrices)
|
Thanks! Could we have some tests in addition to the algorithm to ensure it works? |
|
I generated two 20×20 matrices using TRandom3 r(1) and verified the correctness of the resulting matrix C = A * B by printing the output. The attached slide includes the verification results, while the attached macro shows how the matrices were created and multiplied. I have also included a performance chart that highlights the observed improvements achieved through this implementation. |
Test Results 22 files 22 suites 3d 21h 3m 44s ⏱️ For more details on these failures, see this check. Results for commit ebf8d63. ♻️ This comment has been updated with latest results. |
|
Hi @muhammadalhroob, thank you for your contribution! No need to write more tests, because the TMatrix class is already well tested in the CI, and some tests are failing by the way 🙂 TEST FAILURES:
416:gtest-roofit-roofitcore-testSumW2Error
454:test-stresslinear
455:test-stresslinear-interpreted
2792:roottest-root-math-smatrix-Kalman
2793:roottest-root-math-smatrix-OperationsCould you please make sure that tests are passing before we can proceed? Thanks! |
|
Hi @guitargeek, |
|
I don't think it's related to numerics. If you look at the output of 2025-11-25T15:06:55.4764218Z -- BEGIN ERRDIFF OUTPUT --
2025-11-25T15:06:55.4791591Z --- /github/home/ROOT-CI/src/roottest/root/math/smatrix/testOperations.ref Tue Nov 25 13:12:10 2025
2025-11-25T15:06:55.4862367Z +++ /github/home/ROOT-CI/build/roottest/root/math/smatrix/Operations.err Tue Nov 25 16:06:55 2025
2025-11-25T15:06:55.4862711Z @@ -1,4 +1,4 @@
2025-11-25T15:06:55.4862878Z SMatrix: r1 = 299.766 r2 = -17486.4
2025-11-25T15:06:55.4863072Z -TMatrix: r1 = 299.766 r2 = -17486.4
2025-11-25T15:06:55.4863260Z +TMatrix: r1 = 299.766 r2 = 25523.4
2025-11-25T15:06:55.4863443Z SMatrixSym: r1 = 6972.11
2025-11-25T15:06:55.4863613Z TMatrixSym: r1 = 6972.11
2025-11-25T15:06:55.4863713Z
2025-11-25T15:06:55.4863787Z -- END ERRDIFF OUTPUT --I would appreciate if you could look into this more, as we don't want to miss out on this performance improvement 🙂 |
|
Hi @guitargeek, |
|
Hi @guitargeek, I see that 19 CI checks have passed and 7 have failed. I am not sure whether the failing checks are related to the new matrix multiplication algorithm. Cheers, |
|
Some of the failures are related, in particular on 2025-11-28T14:43:10.7387749Z /github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx: In instantiation of ‘void TMatrixTAutoloadOps::AMultB(const Element*, Int_t, Int_t, const Element*, Int_t, Int_t, Element*) [with Element = float; Int_t = int]’:
2025-11-28T14:43:10.7522482Z /github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx:3308:112: required from here
2025-11-28T14:43:10.7668391Z ##[error]/github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx:3079:114: error: unused parameter ‘nb’ [-Werror=unused-parameter]
2025-11-28T14:43:10.7676986Z 3079 | void TMatrixTAutoloadOps::AMultB(const Element *const ap, Int_t na, Int_t ncolsa, const Element *const bp, Int_t nb,
2025-11-28T14:43:10.7761705Z | ~~~~~~^~
2025-11-28T14:43:10.7764475Z /github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx: In instantiation of ‘void TMatrixTAutoloadOps::AMultB(const Element*, Int_t, Int_t, const Element*, Int_t, Int_t, Element*) [with Element = double; Int_t = int]’:
2025-11-28T14:43:10.7766043Z /github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx:3365:115: required from here
2025-11-28T14:43:10.7768859Z ##[error]/github/home/ROOT-CI/src/math/matrix/src/TMatrixT.cxx:3079:114: error: unused parameter ‘nb’ [-Werror=unused-parameter]
2025-11-28T14:43:10.7770822Z cc1plus: all warnings being treated as errors
2025-11-28T14:43:10.7771659Z gmake[2]: *** [math/matrix/CMakeFiles/Matrix.dir/build.make:202: math/matrix/CMakeFiles/Matrix.dir/src/TMatrixT.cxx.o] Error 1
2025-11-28T14:43:10.7772626Z gmake[1]: *** [CMakeFiles/Makefile2:53360: math/matrix/CMakeFiles/Matrix.dir/all] Error 2
2025-11-28T14:43:10.7773263Z gmake[1]: *** Waiting for unfinished jobs....It void TMatrixTAutoloadOps::AMultB(const Element *const ap, Int_t na, Int_t ncolsa, const Element *const bp, Int_t /*nb*/,
Int_t ncolsb, Element *cp) |
|
Thanks a lot @guitargeek, |
guitargeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for this nice optimization!
|
Thanks! @muhammadalhroob could you clarify if AI did the parallel openmp implementation or if it was just used for the benchmark tests? fyi @silverweed |
|
Hi @ferdymercury, |
|
Yeah the OpenMP might be indeed problematic, because it implies multithreading that can be surprising to the user. But since it's done for a minimum entry size of 10000, I expect this not to trigger often and the risk of problems is low. So I thought it was worth to keep it. |
|
Some comments:
My 2 cents: the OpenMP implementation should likely be removed since it's not compiled by default and does not work well with other parallelization options in ROOT. |
|
Hi @guitargeek, |
|
Hi! Yes, given the discussion here, it would be good if you open a new PR, removing the OpenMP pragmas. |
|
Hi @guitargeek, |

This PR enhances the matrix multiplication by rearranging loops and blocking to enhance the speed (a factor of 6 for very large matrices)
This Pull request:
Changes or fixes:
Checklist:
This PR fixes #
Speeds up the code