DEPR: DataFrame.combine propagates nan values #10734

@aktiur

The documentation for DataFrame.combine claims that the method "do[es] not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value". However, this does not match the actual behaviour of DataFrame.combine: in the sample below, every cell that is missing from either frame comes out as NaN.

Sample code:

>>> import pandas as pd
>>> from operator import add
>>> a = pd.DataFrame({
...        'a': pd.Series([1, 3], index=[0, 1]),
...        'b': pd.Series([2, 3], index=[1, 2]),
...    })
>>> b = pd.DataFrame({
...         'a': pd.Series([3, 5], index=[1, 2]),
...         'c': pd.Series([1, 2, 3], index=[2, 3, 4])
...     })
>>> a.combine(b, add)
    a   b   c
0 NaN NaN NaN
1   6 NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

In many cases (as in this one), this can be worked around with the fill_value parameter of DataFrame.combine. However, that does not help when the combining function has no acceptable neutral element.
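
For a combining function with no neutral element, one possible workaround is to wrap the function so that a cell missing on only one side falls back to the other frame's value. The sketch below is only an illustration and is not part of the pandas API: `fallback_on_missing` is a hypothetical helper name, it assumes `func` works element-wise on two aligned Series (as `operator.add` does), and it assumes a pandas version recent enough to provide `Series.notna`.

```python
import pandas as pd
from operator import add


def fallback_on_missing(func):
    """Hypothetical helper: wrap an element-wise binary func for DataFrame.combine.

    Where both Series have a value, func's result is used; where only one
    has a value, that value is kept; where both are missing, NaN remains.
    """
    def wrapper(s1, s2):
        both_present = s1.notna() & s2.notna()
        # s1.fillna(s2) takes s1's value, or s2's value where s1 is missing.
        return func(s1, s2).where(both_present, s1.fillna(s2))
    return wrapper


# Same frames as in the sample code above.
a = pd.DataFrame({
    'a': pd.Series([1, 3], index=[0, 1]),
    'b': pd.Series([2, 3], index=[1, 2]),
})
b = pd.DataFrame({
    'a': pd.Series([3, 5], index=[1, 2]),
    'c': pd.Series([1, 2, 3], index=[2, 3, 4]),
})

# Workaround from the report: 0 is a neutral element for addition.
filled = a.combine(b, add, fill_value=0)

# Alternative when no neutral element exists: fall back to the present value.
result = a.combine(b, fallback_on_missing(add))
```

With these frames, column `a` of `result` holds 1, 6 and 5 for rows 0 to 2, and cells that are missing from both frames stay NaN. The wrapper cannot distinguish a NaN that `func` would legitimately return for two present values from a missing input, which is acceptable for `add` but may not be for other functions.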

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22
numpy: 1.9.2
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.6
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
