Commit 0540813
Unpin dask/distributed for development (#1319)
* Unpin dask/distributed for development
* First pass at unblocking dask-expr issues - replace _Frame usage
* First pass at unblocking pytest errors
* Disable predicate pushdown & its tests if dask-expr enabled
* Make sure partition_borders is computed in limit map_partitions func
* Skip intake tests for now
* Simplify cross join logic to avoid internal graph manipulation
* Round trip timeseries fixture to pandas to avoid dask-expr bug
* Fix skipif_dask_expr_enabled marker
* Ignore warnings around mismatched dtypes in joins
* Add handling for dask-expr to test_broadcast_join
* Skip parquet stats tests for now
* Skip DPP tests on dask-expr for now
* Pass ddf object as meta for test_describe_model
* Add dask-expr handling to test_sort_topk
* Avoid using Dask graph internals for random functions
* Skip over window count tests for now
* Skip test_over_calls and test_over_with_windows
* Update timeseries fixture comment to acknowledge fix
* More detailed messages for window test skips
* Skip test_join_alias_w_projection for now
* Un-xfail test_xgboost_training_prediction on win32
* Windows failures are still intermittent
* Bump rust to 1.73 to circumvent conda sha256 errors
* Disable query planning in GPU CI for now
* Revert "Bump rust to 1.73 to circumvent conda sha256 errors" (this reverts commit 35aa225)
* Use older conda-build version to try and resolve build issues
* Pin to an older version of conda-build and boa
* Skip deadlocking xgboost test on GPU
* Add subset of testing with query planning disabled
* Add query-planning to job names
* Fix syntax errors
* Add dask-expr to CI environments, bump to pandas 2
* Bump dask/dask-expr to 2024.2.1/0.5 to get around aggregation bug
* Bump dask/dask-expr to 2024.3.1/1.0.5 to resolve drop bug
* Bump dask/dask-expr to 2024.4.1/1.0.11 to resolve head bug
* Remove dask-expr workaround from timeseries fixture
* Unpin sqlalchemy in python 3.9 CI environment
1 parent 7600f60 commit 0540813

39 files changed: +281 −176 lines changed

.github/workflows/conda.yml

Lines changed: 1 addition & 1 deletion

@@ -76,7 +76,7 @@ jobs:
           channel-priority: strict
       - name: Install dependencies
         run: |
-          mamba install -c conda-forge boa conda-verify
+          mamba install -c conda-forge "boa<0.17" "conda-build<24.1" conda-verify

           which python
           pip list

.github/workflows/test-upstream.yml

Lines changed: 22 additions & 2 deletions

@@ -11,25 +11,38 @@ defaults:

 jobs:
   test-dev:
-    name: "Test upstream dev (${{ matrix.os }}, python: ${{ matrix.python }}, distributed: ${{ matrix.distributed }})"
+    name: "Test upstream dev (${{ matrix.os }}, python: ${{ matrix.python }}, distributed: ${{ matrix.distributed }}, query-planning: ${{ matrix.query-planning }})"
     runs-on: ${{ matrix.os }}
     env:
       CONDA_FILE: continuous_integration/environment-${{ matrix.python }}.yaml
       DASK_SQL_DISTRIBUTED_TESTS: ${{ matrix.distributed }}
+      DASK_DATAFRAME__QUERY_PLANNING: ${{ matrix.query-planning }}
     strategy:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
         python: ["3.9", "3.10", "3.11", "3.12"]
         distributed: [false]
+        query-planning: [true]
         include:
           # run tests on a distributed client
           - os: "ubuntu-latest"
             python: "3.9"
             distributed: true
+            query-planning: true
           - os: "ubuntu-latest"
             python: "3.11"
             distributed: true
+            query-planning: true
+          # run tests with query planning disabled
+          - os: "ubuntu-latest"
+            python: "3.9"
+            distributed: false
+            query-planning: false
+          - os: "ubuntu-latest"
+            python: "3.11"
+            distributed: false
+            query-planning: false
     steps:
       - uses: actions/checkout@v4
         with:

@@ -72,8 +85,12 @@ jobs:
           path: test-${{ matrix.os }}-py${{ matrix.python }}-results.jsonl

   import-dev:
-    name: "Test importing with bare requirements and upstream dev"
+    name: "Test importing with bare requirements and upstream dev (query-planning: ${{ matrix.query-planning }})"
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        query-planning: [true, false]
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python

@@ -93,8 +110,11 @@ jobs:
       - name: Install upstream dev Dask
         run: |
           python -m pip install git+https://github.com/dask/dask
+          python -m pip install git+https://github.com/dask/dask-expr
           python -m pip install git+https://github.com/dask/distributed
       - name: Try to import dask-sql
+        env:
+          DASK_DATAFRAME_QUERY_PLANNING: ${{ matrix.query-planning }}
         run: |
           python -c "import dask_sql; print('ok')"
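The matrix value above reaches dask through the `DASK_DATAFRAME__QUERY_PLANNING` environment variable. As a rough illustration of the double-underscore convention dask's config loader follows (a simplified sketch for this one case, not dask's actual parser), a `DASK_*` variable maps onto a nested config key like so:

```python
def env_var_to_config_key(name: str) -> str:
    """Map a DASK_* environment variable to a dotted config key.

    "__" marks nesting and remaining "_" become "-", so
    DASK_DATAFRAME__QUERY_PLANNING selects the "dataframe.query-planning"
    setting. Simplified sketch of the convention, not dask's real loader.
    """
    assert name.startswith("DASK_"), "only DASK_-prefixed variables apply"
    body = name[len("DASK_"):].lower()
    return body.replace("__", ".").replace("_", "-")

print(env_var_to_config_key("DASK_DATAFRAME__QUERY_PLANNING"))
# dataframe.query-planning
```

Note that the setting must be in place before `dask.dataframe` is first imported, which is why the CI exports it at the job level rather than inside a test.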

.github/workflows/test.yml

Lines changed: 22 additions & 2 deletions

@@ -33,26 +33,39 @@ jobs:
       keyword: "[test-upstream]"

   test:
-    name: "Build & Test (${{ matrix.os }}, python: ${{ matrix.python }}, distributed: ${{ matrix.distributed }})"
+    name: "Build & Test (${{ matrix.os }}, python: ${{ matrix.python }}, distributed: ${{ matrix.distributed }}, query-planning: ${{ matrix.query-planning }})"
     needs: [detect-ci-trigger]
     runs-on: ${{ matrix.os }}
     env:
       CONDA_FILE: continuous_integration/environment-${{ matrix.python }}.yaml
       DASK_SQL_DISTRIBUTED_TESTS: ${{ matrix.distributed }}
+      DASK_DATAFRAME__QUERY_PLANNING: ${{ matrix.query-planning }}
     strategy:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
         python: ["3.9", "3.10", "3.11", "3.12"]
         distributed: [false]
+        query-planning: [true]
         include:
           # run tests on a distributed client
           - os: "ubuntu-latest"
             python: "3.9"
             distributed: true
+            query-planning: true
           - os: "ubuntu-latest"
             python: "3.11"
             distributed: true
+            query-planning: true
+          # run tests with query planning disabled
+          - os: "ubuntu-latest"
+            python: "3.9"
+            distributed: false
+            query-planning: false
+          - os: "ubuntu-latest"
+            python: "3.11"
+            distributed: false
+            query-planning: false
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python

@@ -96,9 +109,13 @@ jobs:
         uses: codecov/codecov-action@v3

   import:
-    name: "Test importing with bare requirements"
+    name: "Test importing with bare requirements (query-planning: ${{ matrix.query-planning }})"
     needs: [detect-ci-trigger]
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        query-planning: [true, false]
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python

@@ -119,7 +136,10 @@ jobs:
         if: needs.detect-ci-trigger.outputs.triggered == 'true'
         run: |
           python -m pip install git+https://github.com/dask/dask
+          python -m pip install git+https://github.com/dask/dask-expr
           python -m pip install git+https://github.com/dask/distributed
       - name: Try to import dask-sql
+        env:
+          DASK_DATAFRAME_QUERY_PLANNING: ${{ matrix.query-planning }}
         run: |
           python -c "import dask_sql; print('ok')"
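The new matrix keeps the build small: the base cross-product runs every OS/Python combination once with query planning enabled, and the four `include` entries add the distributed and planning-disabled variants on ubuntu only. A quick sketch of the implied job count (mirroring GitHub Actions include semantics for entries that match no base combination):

```python
from itertools import product

# Base matrix: every os x python pair, distributed off, query planning on.
oses = ["ubuntu-latest", "windows-latest", "macos-latest"]
pythons = ["3.9", "3.10", "3.11", "3.12"]
base_jobs = [
    {"os": o, "python": p, "distributed": False, "query-planning": True}
    for o, p in product(oses, pythons)
]

# The four explicit include entries from the diff; none matches a base
# combination, so each adds a new job.
includes = [
    {"os": "ubuntu-latest", "python": "3.9", "distributed": True, "query-planning": True},
    {"os": "ubuntu-latest", "python": "3.11", "distributed": True, "query-planning": True},
    {"os": "ubuntu-latest", "python": "3.9", "distributed": False, "query-planning": False},
    {"os": "ubuntu-latest", "python": "3.11", "distributed": False, "query-planning": False},
]

print(len(base_jobs) + len(includes))  # 16
```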

continuous_integration/docker/conda.txt

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 python>=3.9
-dask==2024.1.1
+dask>=2024.4.1
 pandas>=1.4.0
 jpype1>=1.0.2
 openjdk>=8

continuous_integration/docker/main.dockerfile

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ RUN mamba install -y \
     # build requirements
     "maturin>=1.3,<1.4" \
     # core dependencies
-    "dask==2024.1.1" \
+    "dask>=2024.4.1" \
     "pandas>=1.4.0" \
     "fastapi>=0.92.0" \
     "httpx>=0.24.1" \

continuous_integration/environment-3.10.yaml

Lines changed: 3 additions & 2 deletions

@@ -3,7 +3,8 @@ channels:
   - conda-forge
 dependencies:
   - c-compiler
-  - dask==2024.1.1
+  - dask>=2024.4.1
+  - dask-expr>=1.0.11
   - fastapi>=0.92.0
   - fugue>=0.7.3
   - httpx>=0.24.1
@@ -14,7 +15,7 @@ dependencies:
   - mlflow>=2.9
   - mock
   - numpy>=1.22.4
-  - pandas>=1.4.0
+  - pandas>=2
   - pre-commit
   - prompt_toolkit>=3.0.8
   - psycopg2

continuous_integration/environment-3.11.yaml

Lines changed: 3 additions & 2 deletions

@@ -3,7 +3,8 @@ channels:
   - conda-forge
 dependencies:
   - c-compiler
-  - dask==2024.1.1
+  - dask>=2024.4.1
+  - dask-expr>=1.0.11
   - fastapi>=0.92.0
   - fugue>=0.7.3
   - httpx>=0.24.1
@@ -14,7 +15,7 @@ dependencies:
   - mlflow>=2.9
   - mock
   - numpy>=1.22.4
-  - pandas>=1.4.0
+  - pandas>=2
   - pre-commit
   - prompt_toolkit>=3.0.8
   - psycopg2

continuous_integration/environment-3.12.yaml

Lines changed: 3 additions & 2 deletions

@@ -3,7 +3,8 @@ channels:
   - conda-forge
 dependencies:
   - c-compiler
-  - dask==2024.1.1
+  - dask>=2024.4.1
+  - dask-expr>=1.0.11
   - fastapi>=0.92.0
   - fugue>=0.7.3
   - httpx>=0.24.1
@@ -15,7 +16,7 @@ dependencies:
   # - mlflow>=2.9
   - mock
   - numpy>=1.22.4
-  - pandas>=1.4.0
+  - pandas>=2
   - pre-commit
   - prompt_toolkit>=3.0.8
   - psycopg2

continuous_integration/environment-3.9.yaml

Lines changed: 4 additions & 4 deletions

@@ -3,7 +3,8 @@ channels:
   - conda-forge
 dependencies:
   - c-compiler
-  - dask=2024.1.1
+  - dask=2024.4.1
+  - dask-expr=1.0.11
   - fastapi=0.92.0
   - fugue=0.7.3
   - httpx=0.24.1
@@ -14,7 +15,7 @@ dependencies:
   - mlflow=2.9
   - mock
   - numpy=1.22.4
-  - pandas=1.4.0
+  - pandas=2
   - pre-commit
   - prompt_toolkit=3.0.8
   - psycopg2
@@ -29,8 +30,7 @@ dependencies:
   - py-xgboost=2.0.3
   - scikit-learn=1.0.0
   - sphinx
-  # TODO: remove this constraint when we require pandas>2
-  - sqlalchemy<2
+  - sqlalchemy
   - tpot>=0.12.0
   # FIXME: https://github.com/fugue-project/fugue/issues/526
   - triad<0.9.2
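Across these environment files the dask floor moves from 2024.1.1 to 2024.4.1 (exact `=` pins in the 3.9 minimum-version environment, `>=` ranges elsewhere). A tiny sketch of why the bump matters to a resolver, using naive dotted-version tuples (real solvers use full specifier grammars, this is only an illustration):

```python
def version_tuple(v: str) -> tuple:
    """Parse a simple dotted version string into a comparable tuple.

    Naive illustration: assumes purely numeric components, unlike a real
    conda/pip version parser.
    """
    return tuple(int(part) for part in v.split("."))

old_pin = version_tuple("2024.1.1")   # the previously pinned dask release
new_floor = version_tuple("2024.4.1") # the new minimum from this commit

print(old_pin >= new_floor)  # False
```

In other words, any environment still holding the old pin fails the new floor, which is what forces the dask-expr-compatible release to be installed.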

continuous_integration/gpuci/build.sh

Lines changed: 4 additions & 1 deletion

@@ -23,6 +23,9 @@ cd "$WORKSPACE"
 # Determine CUDA release version
 export CUDA_REL=${CUDA_VERSION%.*}

+# TODO: remove once RAPIDS 24.06 has support for query planning
+export DASK_DATAFRAME__QUERY_PLANNING=false
+
 ################################################################################
 # SETUP - Check environment
 ################################################################################
@@ -61,4 +64,4 @@ conda config --show-sources
 conda list --show-channel-urls

 rapids-logger "Python py.test for dask-sql"
-py.test $WORKSPACE -n 4 -v -m gpu --runqueries --rungpu --junitxml="$WORKSPACE/junit-dask-sql.xml" --cov-config="$WORKSPACE/.coveragerc" --cov=dask_sql --cov-report=xml:"$WORKSPACE/dask-sql-coverage.xml" --cov-report term
+py.test $WORKSPACE -n $PARALLEL_LEVEL -v -m gpu --runqueries --rungpu --junitxml="$WORKSPACE/junit-dask-sql.xml" --cov-config="$WORKSPACE/.coveragerc" --cov=dask_sql --cov-report=xml:"$WORKSPACE/dask-sql-coverage.xml" --cov-report term
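The last hunk replaces the hard-coded `-n 4` with `$PARALLEL_LEVEL`, so the pytest-xdist worker count is taken from the gpuCI environment. A minimal sketch of that lookup, with a CPU-count fallback that is an assumption here rather than part of the script:

```python
import os

def worker_count(env=None) -> int:
    """Pick a pytest-xdist worker count from PARALLEL_LEVEL.

    Falls back to the local CPU count when the variable is unset; the
    fallback is illustrative, the gpuCI script simply expects the
    variable to be provided.
    """
    if env is None:
        env = os.environ
    value = env.get("PARALLEL_LEVEL")
    if value is not None:
        return int(value)
    return os.cpu_count() or 1

print(worker_count({"PARALLEL_LEVEL": "4"}))  # 4
```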
