BUG: df.sum() of string columns depends on whether or not they can be coerced to numbers #22642

@rosnfeld

Note that all the columns in this example are string columns:

In [32]: df1 = pd.DataFrame(data={'a':['1', '2'], 'b':['3', '4']})

In [33]: df1.sum(axis=1)
Out[33]: 
0    13.0
1    24.0
dtype: float64

In [34]: df2 = pd.DataFrame(data={'a':['i', 'j'], 'b':['m', 'n']})

In [35]: df2.sum(axis=1)
Out[35]: 
0    im
1    jn
dtype: object

Problem description

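It seems like a bug, or at least very surprising behavior, that the result of sum() depends on the contents of the strings: with axis=1, strings that can be coerced to numbers produce a float64 result, while other strings are concatenated into an object result.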

I'm not sure whether this has been reported before. I wasn't sure what to search for, and I couldn't find anything.

Expected Output

I would expect the columns to be concatenated as strings and the result left as strings, the same as for df1.a + df1.b.

In [25]: df1.a + df1.b
Out[25]: 
0    13
1    24
dtype: object
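Until the behavior is settled, one way to avoid depending on what the strings contain is to state the intent explicitly. The following is a minimal sketch, not a proposed fix for sum() itself; the helper names concat_rows and numeric_row_sum are hypothetical and only used for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(data={'a': ['1', '2'], 'b': ['3', '4']})
df2 = pd.DataFrame(data={'a': ['i', 'j'], 'b': ['m', 'n']})


def concat_rows(df):
    """Hypothetical helper: join each row's values as strings,
    regardless of whether they look like numbers."""
    return df.astype(str).apply(''.join, axis=1)


def numeric_row_sum(df):
    """Hypothetical helper: convert every column to numbers explicitly,
    then sum across columns. Raises if a value cannot be parsed."""
    return df.apply(pd.to_numeric).sum(axis=1)


concat1 = concat_rows(df1)       # string concatenation, like df1.a + df1.b
concat2 = concat_rows(df2)
numeric1 = numeric_row_sum(df1)  # arithmetic sum of the parsed numbers

print(concat1.tolist())
print(concat2.tolist())
print(numeric1.tolist())
```

With these helpers the caller chooses between concatenation and arithmetic, so the result type no longer depends on whether the strings happen to be coercible.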

Output of pd.show_versions()

On current master as of this filing.

In [26]: pd.show_versions()
No module named 'dask'

INSTALLED VERSIONS

commit: 996f361
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.0.dev0+562.g996f361
pytest: 3.7.4
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.8
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: 0.4.0
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.2
