@@ -156,21 +156,21 @@ pd.set_option("plotting.backend", "plotly")
 
 ## Domain specific pandas extensions
 
-### [Geopandas](https://github.com/geopandas/geopandas)
+#### [Geopandas](https://github.com/geopandas/geopandas)
 
 Geopandas extends pandas data objects to include geographic information
 which support geometric operations. If your work entails maps and
 geographical coordinates, and you love pandas, you should take a close
 look at Geopandas.
 
-### [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas)
+#### [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas)
 
 gurobipy-pandas provides a convenient accessor API to connect pandas with
 gurobipy. It enables users to more easily and efficiently build mathematical
 optimization models from data stored in DataFrames and Series, and to read
 solutions back directly as pandas objects.
 
-### [Hail Query](https://hail.is/)
+#### [Hail Query](https://hail.is/)
 
 An out-of-core, preemptible-safe, distributed, dataframe library serving
 the genetics community. Hail Query ships with on-disk data formats,
@@ -185,14 +185,14 @@ native import to and export from pandas DataFrames:
 - [`Table.from_pandas`](https://hail.is/docs/latest/hail.Table.html#hail.Table.from_pandas)
 - [`Table.to_pandas`](https://hail.is/docs/latest/hail.Table.html#hail.Table.to_pandas)
 
-### [staircase](https://github.com/staircase-dev/staircase)
+#### [staircase](https://github.com/staircase-dev/staircase)
 
 staircase is a data analysis package, built upon pandas and numpy, for modelling and
 manipulation of mathematical step functions. It provides a rich variety of arithmetic
 operations, relational operations, logical operations, statistical operations and
 aggregations for step functions defined over real numbers, datetime and timedelta domains.
 
-### [xarray](https://github.com/pydata/xarray)
+#### [xarray](https://github.com/pydata/xarray)
 
 xarray brings the labeled data power of pandas to the physical sciences
 by providing N-dimensional variants of the core pandas data structures.
@@ -203,7 +203,7 @@ which pandas excels.
 
 ## Data IO for pandas
 
-### [ArcticDB](https://github.com/man-group/ArcticDB)
+#### [ArcticDB](https://github.com/man-group/ArcticDB)
 
 ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem.
 ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale.
@@ -213,21 +213,21 @@ to object storage and can be installed in seconds.
 
 Please find full documentation [here](https://docs.arcticdb.io/latest/).
 
-### [BCPandas](https://github.com/yehoshuadimarsky/bcpandas)
+#### [BCPandas](https://github.com/yehoshuadimarsky/bcpandas)
 
 BCPandas provides high performance writes from pandas to Microsoft SQL Server,
 far exceeding the performance of the native ``df.to_sql`` method. Internally, it uses
 Microsoft's BCP utility, but the complexity is fully abstracted away from the end user.
 Rigorously tested, it is a complete replacement for ``df.to_sql``.
 
-### [Deltalake](https://pypi.org/project/deltalake)
+#### [Deltalake](https://pypi.org/project/deltalake)
 
 Deltalake python package lets you access tables stored in
 [Delta Lake](https://delta.io/) natively in Python without the need to use Spark or
 JVM. It provides the ``delta_table.to_pyarrow_table().to_pandas()`` method to convert
 any Delta table into Pandas dataframe.
 
-### [fredapi](https://github.com/mortada/fredapi)
+#### [fredapi](https://github.com/mortada/fredapi)
 
 fredapi is a Python interface to the [Federal Reserve Economic Data
 (FRED)](https://fred.stlouisfed.org/) provided by the Federal Reserve
@@ -239,7 +239,7 @@ point-in-time data from ALFRED. fredapi makes use of pandas and returns
 data in a Series or DataFrame. This module requires a FRED API key that
 you can obtain for free on the FRED website.
 
-### [Hugging Face](https://huggingface.co/datasets)
+#### [Hugging Face](https://huggingface.co/datasets)
 
 The Hugging Face Dataset Hub provides a large collection of ready-to-use
 datasets for machine learning shared by the community. The platform offers
@@ -274,7 +274,7 @@ df.to_parquet("hf://datasets/username/dataset_name/train.parquet")
 
 You can find more information about the Hugging Face Dataset Hub in the [documentation](https://huggingface.co/docs/hub/en/datasets).
 
-### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)
+#### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)
 
 NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.
 
@@ -297,7 +297,7 @@ df = npd.read_json(jsn) # load a JSON-value as a `DataFrame`
 df.equals(npd.read_json(df.npd.to_json(df))) # `True` in any case, whether `table=True` or not
 ```
 
-### [pandas-datareader](https://github.com/pydata/pandas-datareader)
+#### [pandas-datareader](https://github.com/pydata/pandas-datareader)
 
 `pandas-datareader` is a remote data access library for pandas
 (PyPI:`pandas-datareader`). It is based on functionality that was
@@ -324,14 +324,14 @@ The following data feeds are available:
 - Stooq Index Data
 - MOEX Data
 
-### [pandas-gbq](https://github.com/googleapis/python-bigquery-pandas)
+#### [pandas-gbq](https://github.com/googleapis/python-bigquery-pandas)
 
 pandas-gbq provides high performance reads and writes to and from
 [Google BigQuery](https://cloud.google.com/bigquery/). Previously (before version 2.2.0),
 these methods were exposed as `pandas.read_gbq` and `DataFrame.to_gbq`.
 Use `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq`, instead.
 
-### [pandaSDMX](https://pandasdmx.readthedocs.io)
+#### [pandaSDMX](https://pandasdmx.readthedocs.io)
 
 pandaSDMX is a library to retrieve and acquire statistical data and
 metadata disseminated in [SDMX](https://sdmx.org) 2.1, an
@@ -344,7 +344,7 @@ MultiIndexed DataFrames.
 
 ## Scaling pandas
 
-### [Bodo](https://github.com/bodo-ai/Bodo)
+#### [Bodo](https://github.com/bodo-ai/Bodo)
 
 Bodo is a high-performance compute engine for Python data processing.
 Using an auto-parallelizing just-in-time (JIT) compiler, Bodo simplifies scaling Pandas
@@ -366,26 +366,26 @@ def process_data():
 process_data()
 ```
 
-### [Dask](https://docs.dask.org)
+#### [Dask](https://docs.dask.org)
 
 Dask is a flexible parallel computing library for analytics. Dask
 provides a familiar `DataFrame` interface for out-of-core, parallel and
 distributed computing.
 
-### [Ibis](https://ibis-project.org/docs/)
+#### [Ibis](https://ibis-project.org/docs/)
 
 Ibis offers a standard way to write analytics code, that can be run in
 multiple engines. It helps in bridging the gap between local Python environments
 (like pandas) and remote storage and execution systems like Hadoop components
 (like HDFS, Impala, Hive, Spark) and SQL databases (Postgres, etc.).
 
-### [Koalas](https://koalas.readthedocs.io/en/latest/)
+#### [Koalas](https://koalas.readthedocs.io/en/latest/)
 
 Koalas provides a familiar pandas DataFrame interface on top of Apache
 Spark. It enables users to leverage multi-cores on one machine or a
 cluster of machines to speed up or scale their DataFrame code.
 
-### [Modin](https://github.com/modin-project/modin)
+#### [Modin](https://github.com/modin-project/modin)
 
 The ``modin.pandas`` DataFrame is a parallel and distributed drop-in replacement
 for pandas. This means that you can use Modin with existing pandas code or write
@@ -404,21 +404,21 @@ df = pd.read_csv("big.csv") # use all your cores!
 
 ## Data cleaning and validation for pandas
 
-### [Pandera](https://pandera.readthedocs.io/en/stable/)
+#### [Pandera](https://pandera.readthedocs.io/en/stable/)
 
 Pandera provides a flexible and expressive API for performing data validation on dataframes
 to make data processing pipelines more readable and robust.
 Dataframes contain information that pandera explicitly validates at runtime. This is useful in
 production-critical data pipelines or reproducible research settings.
 
-### [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor)
+#### [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor)
 
 Pyjanitor provides a clean API for cleaning data, using method chaining.
 
 
 ## Development tools for pandas
 
-### [Hamilton](https://github.com/dagworks-inc/hamilton)
+#### [Hamilton](https://github.com/dagworks-inc/hamilton)
 
 Hamilton is a declarative dataflow framework that came out of Stitch Fix. It was
 designed to help one manage a Pandas code base, specifically with respect to
@@ -436,13 +436,13 @@ This helps one to scale your pandas code base, at the same time, keeping mainten
 
 For more information, see [documentation](https://hamilton.readthedocs.io/).
 
-### [IPython](https://ipython.org/documentation.html)
+#### [IPython](https://ipython.org/documentation.html)
 
 IPython is an interactive command shell and distributed computing
 environment. IPython tab completion works with Pandas methods and also
 attributes like DataFrame columns.
 
-### [Jupyter Notebook / Jupyter Lab](https://jupyter.org)
+#### [Jupyter Notebook / Jupyter Lab](https://jupyter.org)
 
 Jupyter Notebook is a web application for creating Jupyter notebooks. A
 Jupyter notebook is a JSON document containing an ordered list of
@@ -460,7 +460,7 @@ or may not be compatible with non-HTML Jupyter output formats.)
 See [Options and Settings](https://pandas.pydata.org/docs/user_guide/options.html)
 for pandas `display.` settings.
 
-### [marimo](https://marimo.io)
+#### [marimo](https://marimo.io)
 
 marimo is a reactive notebook for Python and SQL that enhances productivity
 when working with dataframes. It provides several features to make data
@@ -479,7 +479,7 @@ manipulation and visualization more interactive and fun:
 6. SQL integration: marimo allows users to write SQL queries against any
    pandas dataframes existing in memory.
 
-### [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)
+#### [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)
 
 While pandas repository is partially typed, the package itself doesn't expose this information for external use.
 Install pandas-stubs to enable basic type coverage of pandas API.
@@ -489,7 +489,7 @@ Learn more by reading through these issues [14468](https://github.com/pandas-dev
 
 See installation and usage instructions on the [GitHub page](https://github.com/VirtusLab/pandas-stubs).
 
-### [Spyder](https://www.spyder-ide.org/)
+#### [Spyder](https://www.spyder-ide.org/)
 
 Spyder is a cross-platform PyQt-based IDE combining the editing,
 analysis, debugging and profiling functionality of a software
@@ -518,14 +518,14 @@ both automatically and on-demand.
 
 ## Other related libraries
 
-### [Compose](https://github.com/alteryx/compose)
+#### [Compose](https://github.com/alteryx/compose)
 
 Compose is a machine learning tool for labeling data and prediction engineering.
 It allows you to structure the labeling process by parameterizing
 prediction problems and transforming time-driven relational data into
 target values with cutoff times that can be used for supervised learning.
 
-### [D-Tale](https://github.com/man-group/dtale)
+#### [D-Tale](https://github.com/man-group/dtale)
 
 D-Tale is a lightweight web client for visualizing pandas data structures. It
 provides a rich spreadsheet-style grid which acts as a wrapper for a lot of
@@ -544,20 +544,20 @@ D-Tale integrates seamlessly with Jupyter notebooks, Python terminals, Kaggle
 & Google Colab. Here are some demos of the
 [grid](http://alphatechadmin.pythonanywhere.com/dtale/main/1).
 
-### [Featuretools](https://github.com/alteryx/featuretools/)
+#### [Featuretools](https://github.com/alteryx/featuretools/)
 
 Featuretools is a Python library for automated feature engineering built
 on top of pandas. It excels at transforming temporal and relational
 datasets into feature matrices for machine learning using reusable
 feature engineering "primitives". Users can contribute their own
 primitives in Python and share them with the rest of the community.
 
-### [IPython Vega](https://github.com/vega/ipyvega)
+#### [IPython Vega](https://github.com/vega/ipyvega)
 
 [IPython Vega](https://github.com/vega/ipyvega) leverages
 [Vega](https://github.com/vega/vega) to create plots within Jupyter Notebook.
 
-### [plotnine](https://github.com/has2k1/plotnine/)
+#### [plotnine](https://github.com/has2k1/plotnine/)
 
 Hadley Wickham's [ggplot2](https://ggplot2.tidyverse.org/) is a
 foundational exploratory visualization package for the R language. Based
@@ -568,7 +568,7 @@ generate bespoke plots of any kind of data.
 Various implementations to other languages are available.
 A good implementation for Python users is [has2k1/plotnine](https://github.com/has2k1/plotnine/).
 
-### [pygwalker](https://github.com/Kanaries/pygwalker)
+#### [pygwalker](https://github.com/Kanaries/pygwalker)
 
 PyGWalker is an interactive data visualization and
 exploratory data analysis tool built upon Graphic Walker
@@ -582,7 +582,7 @@ import pygwalker as pyg
 pyg.walk(df)
 ```
 
-### [seaborn](https://seaborn.pydata.org)
+#### [seaborn](https://seaborn.pydata.org)
 
 Seaborn is a Python visualization library based on
 [matplotlib](https://matplotlib.org). It provides a high-level,
@@ -599,13 +599,13 @@ import seaborn as sns
 sns.set_theme()
 ```
 
-### [skrub](https://skrub-data.org)
+#### [skrub](https://skrub-data.org)
 
 Skrub facilitates machine learning on dataframes. It bridges pandas
 to scikit-learn and related. In particular it facilitates building
 features from dataframes.
 
-### [Statsmodels](https://www.statsmodels.org/)
+#### [Statsmodels](https://www.statsmodels.org/)
 
 Statsmodels is the prominent Python "statistics and econometrics
 library" and it has a long-standing special relationship with pandas.
@@ -614,7 +614,7 @@ modeling functionality that is out of pandas' scope. Statsmodels
 leverages pandas objects as the underlying data container for
 computation.
 
-### [STUMPY](https://github.com/TDAmeritrade/stumpy)
+#### [STUMPY](https://github.com/TDAmeritrade/stumpy)
 
 STUMPY is a powerful and scalable Python library for modern time series analysis.
 At its core, STUMPY efficiently computes something called a
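The change in this commit is mechanical: every project heading nested under a `##` category moves from level 3 (`###`) to level 4 (`####`), while the category headings and the descriptive text stay the same. A rewrite like this could be scripted. The Python sketch below is one minimal way to do it, assuming that each project entry is a single-line heading of the form `### [Name](url)`, as in the hunks above. The function name `demote_project_headings` is hypothetical and not part of the pandas repository; it only recognizes backtick code fences and skips the lines inside them, so example code is never altered.

```python
import re

# Assumed pattern for a project entry: a level-3 ATX heading whose text
# starts with a markdown link, e.g. "### [Dask](https://docs.dask.org)".
PROJECT_HEADING = re.compile(r"^### \[[^\]]+\]\(")

# Opening/closing marker of a backtick code fence (tilde fences are not handled).
FENCE = "`" * 3


def demote_project_headings(markdown: str) -> str:
    """Rewrite '### [Name](url)' headings as '#### [Name](url)'.

    Category headings ('## ...') are left alone, and lines inside fenced
    code blocks are never changed. Running the function twice gives the
    same result as running it once, because '#### [' does not match the
    level-3 pattern.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            # Toggle fence state on opening and closing fence lines.
            in_fence = not in_fence
        elif not in_fence and PROJECT_HEADING.match(line):
            # Prepending one '#' turns level 3 into level 4 and keeps the
            # rest of the line, including its line ending, unchanged.
            line = "#" + line
        out.append(line)
    return "".join(out)
```

For example, passing the text `"## Scaling pandas\n\n### [Dask](https://docs.dask.org)\n"` returns the same text with only the Dask heading changed to `#### [Dask](https://docs.dask.org)`. Prepending a single `#` rather than rebuilding the line keeps link text, URLs, and line endings byte-for-byte identical, which matches the shape of the diff above.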