Skip to content

Conversation

@Alvaro-Kothe
Copy link
Member


Implementation

Check if value is finite using the C macro isfinite and use special verification for complex types.

Tests

In the tests, np.finfo(np.float64).max is equal to np.float64(1.7976931348623157e+308) and np.finfo(np.float64).min is equal to np.float64(-1.7976931348623157e+308). It requires 3 values to triggerNaN.

.noseids
.ipynb_checkpoints
.tags
tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags is an index file generated by ctags and I use it to navigate in Cython code. I thnk this file should be added to the .gitignore since it's auto generated and shouldn't go into version control.

If you don't want this file going into this PR, I can remove it and create a separate PR for it.

@jbrockmendel
Copy link
Member

perf impact?

@Alvaro-Kothe
Copy link
Member Author

perf impact?

$ asv continuous -E virtualenv:3.13 upstream/main HEAD -b "groupby.*sum" -a repeat=20

| Change   | Before [2d73d629] <main>   | After [dc6a26cf] <fix/group_sumNaN>   |   Ratio | Benchmark (Parameter)                                                               |
|----------|----------------------------|---------------------------------------|---------|-------------------------------------------------------------------------------------|
| +        | 40.6±3ms                   | 47.7±1ms                              |    1.17 | groupby.GroupByCythonAgg.time_frame_agg('float64', 'sum')                           |
| +        | 87.2±10μs                  | 101±0.4μs                             |    1.15 | groupby.GroupByMethods.time_dtype_as_group('int16', 'sum', 'direct', 1, 'cython')   |
| -        | 76.5±0.8ms                 | 69.3±0.5ms                            |    0.91 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'sum')                   |
| -        | 98.3±2μs                   | 89.0±10μs                             |    0.91 | groupby.GroupByMethods.time_dtype_as_group('uint', 'sum', 'direct', 1, 'cython')    |
| -        | 99.7±2ms                   | 90.1±2ms                              |    0.9  | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int64', 'sum')                     |
| -        | 68.0±2μs                   | 60.7±8μs                              |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'cumsum', 'direct', 1, 'cython') |
| -        | 9.26±0.06ms                | 8.09±0.1ms                            |    0.87 | groupby.SumTimeDelta.time_groupby_sum_timedelta                                     |
| -        | 15.3±2ms                   | 12.8±0.1ms                            |    0.84 | groupby.Cumulative.time_frame_transform('float64', 'cumsum', True)                  |
| -        | 735±20μs                   | 614±50μs                              |    0.84 | groupby.GroupByMethods.time_dtype_as_group('float', 'sum', 'direct', 1, 'numba')    |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: inconsistent treatment of overflows between groupby.sum() and groupby.apply(lambda: _grp: _grp.sum()) and DataFrame.resample

2 participants