Commit 98e66c8

Merge branch 'main' into add-zip-strict-core-arrays
2 parents 682295f + 5cc3240 commit 98e66c8

42 files changed (+1008, -564 lines changed)

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"
 
       - name: Build wheels
-        uses: pypa/cibuildwheel@v3.1.4
+        uses: pypa/cibuildwheel@v3.2.0
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

doc/source/reference/missing_value.rst

Lines changed: 0 additions & 2 deletions
@@ -11,14 +11,12 @@ NA is the way to represent missing values for nullable dtypes (see below):
 
 .. autosummary::
    :toctree: api/
-   :template: autosummary/class_without_autosummary.rst
 
    NA
 
 NaT is the missing value for timedelta and datetime data (see below):
 
 .. autosummary::
    :toctree: api/
-   :template: autosummary/class_without_autosummary.rst
 
    NaT

doc/source/whatsnew/v2.3.2.rst

Lines changed: 1 addition & 3 deletions
@@ -22,8 +22,6 @@ become the default string dtype in pandas 3.0. See
 
 Bug fixes
 ^^^^^^^^^
-- Fix :meth:`~Series.str.isdigit` to correctly recognize unicode superscript
-  characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
@@ -39,4 +37,4 @@ Bug fixes
 Contributors
 ~~~~~~~~~~~~
 
-.. contributors:: v2.3.1..v2.3.2|HEAD
+.. contributors:: v2.3.1..v2.3.2

doc/source/whatsnew/v2.3.3.rst

Lines changed: 19 additions & 14 deletions
@@ -1,14 +1,14 @@
 .. _whatsnew_233:
 
-What's new in 2.3.3 (September XX, 2025)
+What's new in 2.3.3 (September 29, 2025)
 ----------------------------------------
 
 These are the changes in pandas 2.3.3. See :ref:`release` for a full changelog
 including other versions of pandas.
 
 {{ header }}
 
-.. _whatsnew_220.py14_compat:
+.. _whatsnew_233.py14_compat:
 
 Pandas 2.3.3 is now compatible with Python 3.14
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -37,25 +37,22 @@ Improvements
   specifying ``include=["object"]`` for backwards compatibility. In a future
   release, this will be deprecated and code for pandas 3+ should be updated to
   do ``include=["str"]`` (:issue:`61916`)
-
+- Support the ``/`` operation between a ``pathlib.Path`` object and a :class:`StringDtype`
+  Series, similarly as it works for object-dtype Series (:issue:`61940`)
 
 .. _whatsnew_233.string_fixes.bugs:
 
 Bug fixes
 ^^^^^^^^^
 - Fix bug in :meth:`Series.str.replace` using named capture groups (e.g., ``\g<name>``) with the Arrow-backed dtype would raise an error (:issue:`57636`)
-- Fix regression in ``~Series.str.contains``, ``~Series.str.match`` and ``~Series.str.fullmatch``
+- Fix regression in :meth:`Series.str.contains`, :meth:`~Series.str.match` and :meth:`~Series.str.fullmatch`
   with a compiled regex and custom flags (:issue:`62240`)
-- Fix :meth:`Series.str.match` and :meth:`Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
-
-
-Improvements and fixes for Copy-on-Write
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Bug fixes
-^^^^^^^^^
-
-- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
+- Fix :meth:`Series.str.match` and :meth:`~Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
+- Fix bug in :meth:`~DataFrame.groupby` with ``sum()`` and unobserved categories resulting in ``0`` instead of the empty string ``""`` (:issue:`61909`)
+- Fix :meth:`Series.str.isdigit` to correctly recognize unicode superscript
+  characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
+- Fix comparing a :class:`StringDtype` Series with mixed objects raising an error (:issue:`60228`)
+- Fix error being raised when using a numpy ufunc with a Python-backed string array (:issue:`40800`)
 
 Other changes
 ~~~~~~~~~~~~~
@@ -65,9 +62,17 @@ Other changes
   Resampling with a :class:`PeriodIndex` is supported again, but a subset of
   methods that return incorrect results will raise an error in pandas 3.0 (:issue:`57033`)
 
+Other bug fixes
+~~~~~~~~~~~~~~~~
+
+- Fix memory leak in :meth:`DataFrame.to_json` with datetime columns (:issue:`62204`)
+- Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
+- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_233.contributors:
 
 Contributors
 ~~~~~~~~~~~~
+
+.. contributors:: v2.3.2..v2.3.3|HEAD

doc/source/whatsnew/v3.0.0.rst

Lines changed: 4 additions & 1 deletion
@@ -1054,6 +1054,8 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
+  ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :meth:`.DataFrame.to_json` when ``"index"`` was a value in the :attr:`DataFrame.column` and :attr:`Index.name` was ``None``. Now, this will fail with a ``ValueError`` (:issue:`58925`)
 - Bug in :meth:`.io.common.is_fsspec_url` not recognizing chained fsspec URLs (:issue:`48978`)
 - Bug in :meth:`DataFrame._repr_html_` which ignored the ``"display.float_format"`` option (:issue:`59876`)
@@ -1217,10 +1219,11 @@ Other
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
 - Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
+- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`NA` values in a :class:`Float64Dtype` object with ``np.nan``; this now works with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`55127`)
+- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`np.nan` values in a :class:`Int64Dtype` object with :class:`NA`; this is now a no-op with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`51237`)
 - Fixed bug in the :meth:`Series.rank` with object dtype and extremely small float values (:issue:`62036`)
 - Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)
 - Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
--
 
 .. ***DO NOT USE THIS SECTION***

pandas/_libs/missing.pyx

Lines changed: 1 addition & 0 deletions
@@ -393,6 +393,7 @@ class NAType(C_NAType):
     >>> True | pd.NA
     True
     """
+    __module__ = "pandas"
 
     _instance = None

pandas/_libs/tslibs/nattype.pyx

Lines changed: 2 additions & 0 deletions
@@ -372,6 +372,8 @@ class NaTType(_NaT):
     1 NaT
     """
 
+    __module__ = "pandas"
+
     def __new__(cls):
         cdef _NaT base
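
Both Cython hunks (`NAType` and `NaTType`) add a `__module__ = "pandas"` class attribute. As a rough illustration of what that attribute controls in plain Python, the sketch below uses two hypothetical classes and a made-up package name `mypkg`; none of these names come from the commit. The dotted path that tools display for a class is built from `__module__`, so overriding it lets a class defined in a private module present itself under a public location.

```python
class InternalSentinel:
    """Hypothetical class with no override; reports the defining module."""


class PublicSentinel:
    """Hypothetical class that advertises a public import location."""

    # Assumption for illustration only: "mypkg" is a made-up package name.
    __module__ = "mypkg"


def qualified_name(cls: type) -> str:
    # Build the dotted path that documentation tools usually show for a class.
    return f"{cls.__module__}.{cls.__name__}"


print(qualified_name(PublicSentinel))  # mypkg.PublicSentinel
```

This is presumably also why the same commit can drop the custom autosummary template for `NA` and `NaT` in `missing_value.rst`, although the diff itself does not state the motivation.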

pandas/core/arrays/arrow/array.py

Lines changed: 18 additions & 13 deletions
@@ -883,22 +883,27 @@ def _cmp_method(self, other, op) -> ArrowExtensionArray:
         ltype = self._pa_array.type
 
         if isinstance(other, (ExtensionArray, np.ndarray, list)):
-            boxed = self._box_pa(other)
-            rtype = boxed.type
-            if (pa.types.is_timestamp(ltype) and pa.types.is_date(rtype)) or (
-                pa.types.is_timestamp(rtype) and pa.types.is_date(ltype)
-            ):
-                # GH#62157 match non-pyarrow behavior
-                result = ops.invalid_comparison(self, other, op)
-                result = pa.array(result, type=pa.bool_())
+            try:
+                boxed = self._box_pa(other)
+            except pa.lib.ArrowInvalid:
+                # e.g. GH#60228 [1, "b"] we have to operate pointwise
+                res_values = [op(x, y) for x, y in zip(self, other)]
+                result = pa.array(res_values, type=pa.bool_(), from_pandas=True)
             else:
-                try:
-                    result = pc_func(self._pa_array, boxed)
-                except pa.ArrowNotImplementedError:
-                    # TODO: could this be wrong if other is object dtype?
-                    # in which case we need to operate pointwise?
+                rtype = boxed.type
+                if (pa.types.is_timestamp(ltype) and pa.types.is_date(rtype)) or (
+                    pa.types.is_timestamp(rtype) and pa.types.is_date(ltype)
+                ):
+                    # GH#62157 match non-pyarrow behavior
                     result = ops.invalid_comparison(self, other, op)
                     result = pa.array(result, type=pa.bool_())
+                else:
+                    try:
+                        result = pc_func(self._pa_array, boxed)
+                    except pa.ArrowNotImplementedError:
+                        result = ops.invalid_comparison(self, other, op)
+                        result = pa.array(result, type=pa.bool_())
+
         elif is_scalar(other):
             if (isinstance(other, datetime) and pa.types.is_date(ltype)) or (
                 type(other) is date and pa.types.is_timestamp(ltype)
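
The control flow of the hunk above (try a typed conversion, fall back to element-by-element comparison when the other operand holds mixed objects) can be sketched without pyarrow. This is only a simplified model under stated assumptions: `MixedTypeError`, `box_strings`, and `compare` are hypothetical names standing in for `pa.lib.ArrowInvalid`, `self._box_pa`, and `_cmp_method`, and `None` stands in for a missing value.

```python
import operator
from typing import Any, Callable, List, Optional, Sequence


class MixedTypeError(TypeError):
    """Hypothetical stand-in for the error raised when conversion fails."""


def box_strings(values: Sequence[Any]) -> List[Optional[str]]:
    # Hypothetical typed conversion: accept only str or None.
    for v in values:
        if v is not None and not isinstance(v, str):
            raise MixedTypeError(f"cannot convert {v!r} to a string array")
    return list(values)


def _cmp_one(op: Callable[[Any, Any], bool], x: Any, y: Any) -> Optional[bool]:
    # Assumption: a missing operand yields a missing result.
    return None if x is None or y is None else op(x, y)


def compare(left, right, op=operator.eq) -> List[Optional[bool]]:
    try:
        boxed = box_strings(right)
    except MixedTypeError:
        # Mixed objects such as [1, "b"]: operate element by element.
        return [_cmp_one(op, x, y) for x, y in zip(left, right)]
    else:
        # Homogeneous input: a real implementation would call a
        # vectorized kernel here instead of a Python loop.
        return [_cmp_one(op, x, y) for x, y in zip(left, boxed)]


print(compare(["a", "b", None], [1, "b", "c"]))  # [False, True, None]
```

Placing the typed path in the `else` clause keeps the `except` handler scoped to the conversion step alone, which matches the structure of the new code in the diff.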

pandas/core/arrays/base.py

Lines changed: 41 additions & 4 deletions
@@ -30,8 +30,6 @@
 from pandas.compat.numpy import function as nv
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import (
-    Appender,
-    Substitution,
     cache_readonly,
 )
 from pandas.util._validators import (
@@ -1669,9 +1667,48 @@ def factorize(
         Categories (3, str): ['a', 'b', 'c']
         """
 
-    @Substitution(klass="ExtensionArray")
-    @Appender(_extension_array_shared_docs["repeat"])
     def repeat(self, repeats: int | Sequence[int], axis: AxisInt | None = None) -> Self:
+        """
+        Repeat elements of an ExtensionArray.
+
+        Returns a new ExtensionArray where each element of the current ExtensionArray
+        is repeated consecutively a given number of times.
+
+        Parameters
+        ----------
+        repeats : int or array of ints
+            The number of repetitions for each element. This should be a
+            non-negative integer. Repeating 0 times will return an empty
+            ExtensionArray.
+        axis : None
+            Must be ``None``. Has no effect but is accepted for compatibility
+            with numpy.
+
+        Returns
+        -------
+        ExtensionArray
+            Newly created ExtensionArray with repeated elements.
+
+        See Also
+        --------
+        Series.repeat : Equivalent function for Series.
+        Index.repeat : Equivalent function for Index.
+        numpy.repeat : Similar method for :class:`numpy.ndarray`.
+        ExtensionArray.take : Take arbitrary positions.
+
+        Examples
+        --------
+        >>> cat = pd.Categorical(["a", "b", "c"])
+        >>> cat
+        ['a', 'b', 'c']
+        Categories (3, str): ['a', 'b', 'c']
+        >>> cat.repeat(2)
+        ['a', 'a', 'b', 'b', 'c', 'c']
+        Categories (3, str): ['a', 'b', 'c']
+        >>> cat.repeat([1, 2, 3])
+        ['a', 'b', 'b', 'c', 'c', 'c']
+        Categories (3, str): ['a', 'b', 'c']
+        """
         nv.validate_repeat((), {"axis": axis})
         ind = np.arange(len(self)).repeat(repeats)
         return self.take(ind)
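
The method body shown in the hunk above builds a repeated index array and then calls `take`. A minimal pure-Python sketch of that "repeat positions, then take" idea follows; `repeat_indices` and `take` are hypothetical helpers written for illustration, not pandas functions, and this version is stricter than NumPy in that a sequence of counts must match the array length exactly.

```python
from typing import List, Sequence, Union


def repeat_indices(length: int, repeats: Union[int, Sequence[int]]) -> List[int]:
    # Expand positions 0..length-1 so that position i appears repeats[i] times.
    if isinstance(repeats, int):
        counts = [repeats] * length
    else:
        counts = list(repeats)
        if len(counts) != length:
            raise ValueError("repeats must be an int or match the array length")
    if any(c < 0 for c in counts):
        raise ValueError("repeats must be non-negative")
    return [i for i, c in enumerate(counts) for _ in range(c)]


def take(values: Sequence, indices: Sequence[int]) -> list:
    # Gather elements at the given positions, in order.
    return [values[i] for i in indices]


values = ["a", "b", "c"]
print(take(values, repeat_indices(len(values), 2)))          # ['a', 'a', 'b', 'b', 'c', 'c']
print(take(values, repeat_indices(len(values), [1, 2, 3])))  # ['a', 'b', 'b', 'c', 'c', 'c']
```

Expressing `repeat` through `take` means a subclass only has to implement `take` correctly to get `repeat` with the same dtype handling.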

pandas/core/arrays/masked.py

Lines changed: 3 additions & 1 deletion
@@ -312,7 +312,9 @@ def __setitem__(self, key, value) -> None:
         key = check_array_indexer(self, key)
 
         if is_scalar(value):
-            if is_valid_na_for_dtype(value, self.dtype):
+            if is_valid_na_for_dtype(value, self.dtype) and not (
+                lib.is_float(value) and not is_nan_na()
+            ):
                 self._mask[key] = True
             else:
                 value = self._validate_setitem_value(value)
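
The new condition in `__setitem__` reads as: mask the position when the value is a valid NA, unless the value is a float and NaN is not being treated as NA. The sketch below models only that boolean logic. `should_mask` and its `nan_is_na` parameter are hypothetical, and the NA check is a deliberate simplification (only `None` and float NaN count as NA candidates), whereas the real `is_valid_na_for_dtype` check is dtype-aware.

```python
import math
from typing import Any


def should_mask(value: Any, nan_is_na: bool) -> bool:
    # Simplified assumption: only None and float NaN are NA candidates here.
    is_float = isinstance(value, float)
    is_na_candidate = value is None or (is_float and math.isnan(value))
    # Same shape as the condition in the diff: an NA candidate is masked
    # unless it is a float while NaN is not treated as NA.
    return is_na_candidate and not (is_float and not nan_is_na)


for value in (None, float("nan"), 1.5):
    for flag in (True, False):
        print(repr(value), flag, should_mask(value, flag))
```

With the flag off, a float NaN falls through to the `else` branch in the diff and is validated as an ordinary value rather than being recorded in the mask.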
