Skip to content

Conversation

@jadosh123
Copy link
Contributor

@jadosh123 jadosh123 commented Nov 16, 2025

GH-40673: Adds a test case to prevent regression of a bug where the internal object reused by apply() would corrupt externally stored DataFrames created with .copy(). This test verifies that store[0] and store[1] correctly contain independent copies of their respective groups.

@aa-lunar

This comment was marked as spam.

@9h7xrywp7z-hue

This comment was marked as spam.

@9h7xrywp7z-hue

This comment was marked as spam.

@b0rderc0ntr0l

This comment was marked as spam.

@jadosh123
Copy link
Contributor Author

All CI checks passed, the pull request is ready for review but i don't see a button to request a reviewer.
@pandas-dev/maintainers

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR - looking good!

# The expected output in store dict
expected_out = {0: df[out_mask[0]], 1: df[out_mask[1]]}

tm.assert_frame_equal(store[0], expected_out[0].drop("B", axis=1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected here can just be df.iloc[[0, 2], 1] I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apolgoies, I should have written df.iloc[[0, 2], [1]]. What I wrote will give you a Series, but with [1] instead will give you a DataFrame. This is minor, but in general we want to check the entire result in tests without modifications (in this case, a DataFrame).

expected_out = {0: df[out_mask[0]], 1: df[out_mask[1]]}

tm.assert_frame_equal(store[0], expected_out[0].drop("B", axis=1))
tm.assert_frame_equal(store[1], expected_out[1].drop("B", axis=1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here.

}
)

# Empty dict to hold the chunks
Copy link
Member

@rhshadrach rhshadrach Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment and the ones below don't add anything that isn't already in the code. Can you remove them.

@jadosh123 jadosh123 force-pushed the GH/40673-test-apply-mutation branch from 6f7a751 to 7e53e6f Compare November 18, 2025 09:21
@jadosh123
Copy link
Contributor Author

@rhshadrach I've removed the comments, I tried the suggestion you made but couldn't get it to pass the test so I kept it as is if that's fine.

@jadosh123
Copy link
Contributor Author

I've updated the test test_groupby_apply_store_copy to use iloc slicing as you suggested.
I used df.iloc[[0, 2], 0] to generate the expected Series for column 'A', and changed the assertion to tm.assert_series_equal.

@jadosh123 jadosh123 requested a review from rhshadrach November 18, 2025 19:21
@jadosh123 jadosh123 force-pushed the GH/40673-test-apply-mutation branch from d9aea26 to 39eef05 Compare November 18, 2025 21:45
@jadosh123
Copy link
Contributor Author

No problem, I've changed it to compare the DataFrames now.

pandas-devGH-40673: Adds a test case to prevent regression of a bug where the internal
object reused by apply() would corrupt externally stored DataFrames created
with .copy(). This test verifies that store[0] and store[1] correctly contain
independent copies of their respective groups.
@jadosh123 jadosh123 force-pushed the GH/40673-test-apply-mutation branch from 9bb5bc6 to 11eefc4 Compare November 19, 2025 01:30
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach added this to the 3.0 milestone Nov 19, 2025
@rhshadrach rhshadrach added Needs Tests Unit test(s) needed to prevent regressions Groupby Apply Apply, Aggregate, Transform, Map Bug and removed Bug labels Nov 19, 2025
@rhshadrach rhshadrach merged commit 49f4a94 into pandas-dev:main Nov 19, 2025
61 checks passed
@rhshadrach
Copy link
Member

Thanks @jadosh123!

@jadosh123 jadosh123 deleted the GH/40673-test-apply-mutation branch November 19, 2025 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apply Apply, Aggregate, Transform, Map Groupby Needs Tests Unit test(s) needed to prevent regressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Copy not working as expected within DataFrameGroupBy.apply function

5 participants