v0.18.0
This release reaches an important milestone of making offloading fully asynchronous.
Calls to dpctl.tensor functions submit tasks to the DPC++ runtime for execution and return without waiting for these tasks to finish.
The sequential semantics a user expects from executing a Python script are nevertheless preserved.
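As a rough analogy only, the submit-and-return behaviour with preserved ordering can be sketched with the Python standard library. This is not how dpctl is implemented: the single-worker executor merely stands in for an in-order queue, and `slow_square` and `add_one` are hypothetical placeholders for offloaded tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(values):
    # Hypothetical stand-in for work executed on a device.
    time.sleep(0.05)
    return [v * v for v in values]


def add_one(pending):
    # Consumes the result of the previously submitted task. With a single
    # worker, that task has already finished by the time this one starts.
    return [v + 1 for v in pending.result()]


# One worker models an in-order queue: tasks run in submission order.
with ThreadPoolExecutor(max_workers=1) as in_order_queue:
    squared = in_order_queue.submit(slow_square, [1, 2, 3])  # returns at once
    shifted = in_order_queue.submit(add_one, squared)        # returns at once
    # Only reading the data blocks until the queued work has completed.
    result = shifted.result()

print(result)  # [2, 5, 10]
```

The caller writes the two steps in program order and gets the answer a fully synchronous run would produce, even though neither `submit` call waits for its task.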
The full list of changes that went into this release is:
Added
- Implement `tensor.take_along_axis` per Python Array API specification gh-1778
- Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` gh-1798
- Support for 'device=tensor.kDLCPU' in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method gh-1781
- Support DLPack on Windows gh-1746
- Implement `tensor.nextafter` function per Python Array API specification gh-1730
- Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification gh-1732, gh-1780
- Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from 'C' to 'K' gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
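To show what the "along axis" functions compute, here is a minimal pure-Python sketch of the gather and scatter semantics for a 2-D case along the last axis. It is an illustration under simplifying assumptions, not dpctl code: `take_along_last_axis` and `put_along_last_axis` are hypothetical helpers that operate on nested lists rather than `usm_ndarray` objects.

```python
def take_along_last_axis(rows, indices):
    # Gather: for each row, pick the elements at the given positions.
    return [[row[i] for i in idx] for row, idx in zip(rows, indices)]


def put_along_last_axis(rows, indices, values):
    # Scatter: for each row, write values at the given positions, in place.
    for row, idx, vals in zip(rows, indices, values):
        for i, v in zip(idx, vals):
            row[i] = v


data = [[30, 10, 20], [5, 15, 0]]

# Per-row argsort: positions that would sort each row.
order = [sorted(range(len(row)), key=row.__getitem__) for row in data]

# Gathering with the argsort indices yields each row in sorted order.
sorted_rows = take_along_last_axis(data, order)

# Scattering the sorted values back through the same indices undoes the sort.
restored = [[0] * len(row) for row in data]
put_along_last_axis(restored, order, sorted_rows)

print(order)        # [[1, 2, 0], [2, 0, 1]]
print(sorted_rows)  # [[10, 20, 30], [0, 5, 15]]
print(restored)     # [[30, 10, 20], [5, 15, 0]]
```

The round trip shows why the two operations complement each other: scattering through the same index array reverses the gather.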
Changed
- Change ownership of USM allocation by `dpctl.memory` objects, make executions of `dpctl.tensor` operations asynchronous gh-1705
- Add support for Python scalars by `tensor.where` function gh-1719
- Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` gh-1820
- Use transcendental functions from `sycl` namespace instead of `std` namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function `tensor.zeros` to use asynchronous `memset` operation gh-1806
- The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by `usm_host_allocator` used with `std::vector` gh-1791
- Use `dpctl::tensor::offset_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations gh-1797
- Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` gh-1827, gh-1830
Fixed
- Fix setting of release variable in Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` gh-1690
- Add license header to conda script files gh-1695
- Fix `tensor.round` behavior on CUDA devices gh-1700
- Add missing `#include <sstream>` gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of `tensor.extract` function gh-1727
- Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function gh-1756
- `tensor.clip` to handle Python scalars which are out of bound for the data type of integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise `tensor.divide` and comparison operations allow greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable `pytest --pyargs dpctl.tests` gh-1833
Maintenance
- Improve performance of `test_sort_complex_fp_nan` gh-1704
- Improve exception wording raised by `tensor.broadcast_arrays()` gh-1720
- Remove `template` keyword in method call of `sycl::kernel_bundle` gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable `SYCL_DEVICE_FILTER` gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add `#include <utility>` for definition of `std::move` used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document `tensor._flags.Flags` class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of `Strided1DIndexer` class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in `dpctl` gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly: gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, gh-1721, gh-1743, gh-1739, gh-1747, gh-1748, gh-1750, gh-1752, gh-1767, gh-1768, gh-1775, gh-1783, gh-1790, gh-1795, gh-1796, gh-1800, gh-1760, gh-1803, gh-1777, gh-1813, gh-1817, gh-1818