Commit 6db4415

Restore changes section
1 parent 8e8b1ce commit 6db4415

1 file changed

+71
-9
lines changed

doc/source/user_guide/text.rst

Lines changed: 71 additions & 9 deletions
@@ -89,15 +89,6 @@ or convert from existing pandas data:
8989
However there are four distinct :class:`StringDtype` variants that may be utilized.
9090
See :ref:`text.four_string_variants` section below for details.
9191

92-
.. _text.differences:
93-
94-
Behavior differences
95-
====================
96-
97-
There are various behavior differences between using NumPy ``object`` dtype,
98-
``dtype="str"``, and ``dtype="string"``. See the
99-
:ref:`String migration guide <string_migration_guide-differences>` section for further details.
100-
10192
.. _text.string_methods:
10293

10394
String methods
@@ -686,6 +677,77 @@ String ``Index`` also supports ``get_dummies`` which returns a ``MultiIndex``.
686677
687678
See also :func:`~pandas.get_dummies`.
688679

680+
.. _text.differences:
681+
682+
Behavior differences
683+
====================
684+
685+
Differences in behavior will be primarily due to the kind of NA value.
686+
687+
``StringDtype`` with ``np.nan`` NA values
688+
-----------------------------------------
689+
690+
1. Like ``dtype="object"``, :ref:`string accessor methods<api.series.str>`
691+
that return **integer** output will return a NumPy array that is
692+
either dtype int or float depending on the presence of NA values.
693+
Methods returning **boolean** output will return a NumPy array that is
694+
dtype bool, with the value ``False`` when an NA value is encountered.
695+
696+
.. ipython:: python
697+
698+
s = pd.Series(["a", None, "b"], dtype="str")
699+
s
700+
s.str.count("a")
701+
s.dropna().str.count("a")
702+
703+
When NA values are present, the output dtype is float64. However,
704+
**boolean** output results in ``False`` for the NA values.
705+
706+
.. ipython:: python
707+
708+
s.str.isdigit()
709+
s.str.match("a")
710+
711+
2. Some string methods, like :meth:`Series.str.decode`, are not
712+
available because the underlying array can only contain
713+
strings, not bytes.
714+
3. Comparison operations will return a NumPy array with dtype bool. Missing
715+
values will always compare as unequal, just as :attr:`numpy.nan` does.
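
As a rough sketch of the ``object``-dtype behavior that item 1 compares against, and of the NaN comparison rule in item 3, the following uses only ``object`` dtype and plain NumPy, so it does not assume that ``dtype="str"`` is available in the installed pandas version. The variable names are illustrative and not part of the original guide.

```python
import numpy as np
import pandas as pd

# object-dtype Series with one missing value (illustrative data)
obj = pd.Series(["a", None, "b"], dtype=object)

# Integer-returning accessor: float result when an NA is present,
# integer result once the NA has been dropped.
with_na = obj.str.count("a")
without_na = obj.dropna().str.count("a")
print(with_na.dtype, without_na.dtype)

# NaN never compares equal, not even to itself.
nan_equals_itself = bool(np.nan == np.nan)
print(nan_equals_itself)
```

The float result appears because a NumPy integer array has no way to represent a missing value, so the output is upcast to hold ``NaN``.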
716+
717+
``StringDtype`` with ``pd.NA`` NA values
718+
----------------------------------------
719+
720+
1. :ref:`String accessor methods<api.series.str>`
721+
that return **integer** output will always return a nullable integer dtype,
722+
rather than either int or float dtype (depending on the presence of NA values).
723+
Methods returning **boolean** output will return a nullable boolean dtype.
724+
725+
.. ipython:: python
726+
727+
s = pd.Series(["a", None, "b"], dtype="string")
728+
s
729+
s.str.count("a")
730+
s.dropna().str.count("a")
731+
732+
Both outputs are ``Int64`` dtype. Methods returning boolean values behave similarly.
733+
734+
.. ipython:: python
735+
736+
s.str.isdigit()
737+
s.str.match("a")
738+
739+
2. Some string methods, like :meth:`Series.str.decode`, are not available because the underlying
740+
array can only contain strings, not bytes.
741+
3. Comparison operations will return an object with :class:`BooleanDtype`,
742+
rather than a ``bool`` dtype object. Missing values will propagate
743+
in comparison operations, rather than always comparing
744+
unequal like :attr:`numpy.nan`.
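
The three points above can be checked together with a minimal sketch, assuming ``dtype="string"`` selects the ``pd.NA`` variant as described in this section. The names ``counts``, ``digits`` and ``equal_a`` are illustrative only.

```python
import pandas as pd

s = pd.Series(["a", None, "b"], dtype="string")

# Integer- and boolean-returning accessors yield nullable dtypes.
counts = s.str.count("a")
digits = s.str.isdigit()

# Comparison yields a nullable boolean result; the missing value propagates.
equal_a = s == "a"

print(counts.dtype, digits.dtype, equal_a.dtype)
print(equal_a.isna().tolist())
```

Because the missing entry stays missing in ``equal_a``, code that needs a plain ``True``/``False`` mask would have to decide explicitly how to treat it, for example with ``equal_a.fillna(False)``.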
745+
746+
747+
.. important::
748+
Everything else that follows in the rest of this document applies equally to
749+
``'str'``, ``'string'``, and ``object`` dtype.
750+
689751
.. _text.four_string_variants:
690752

691753
The four :class:`StringDtype` variants
