@@ -738,18 +738,17 @@ unnecessarily, and avoid objects with large numbers of attributes.
738738
739739.. _dataframeformat:
740740
741- Fetching using the DataFrame Interchange Protocol
742- -------------------------------------------------
743-
744- Python-oracledb can fetch directly to the `Python DataFrame Interchange
745- Protocol <https://data-apis.org/dataframe-protocol/latest/index.html>`__
746- format. This can reduce application memory requirements and allow zero-copy
747- data interchanges between Python data frame libraries. It is an efficient way
748- to work with data using Python libraries such as `Apache Arrow
749- <https://arrow.apache.org/>`__, `Pandas <https://pandas.pydata.org>`__, `Polars
750- <https://pola.rs/>`__, `NumPy <https://numpy.org/>`__, `PyTorch
751- <https://pytorch.org/>`__, or to write files in `Apache Parquet
752- <https://parquet.apache.org/>`__ format.
741+ Fetching Data Frames
742+ --------------------
743+
744+ Python-oracledb can fetch directly to data frames that expose an Apache Arrow
745+ PyCapsule Interface. This can reduce application memory requirements and allow
746+ zero-copy data interchanges between Python data frame libraries. It is an
747+ efficient way to work with data using Python libraries such as `Apache PyArrow
748+ <https://arrow.apache.org/docs/python/index.html>`__, `Pandas
749+ <https://pandas.pydata.org>`__, `Polars <https://pola.rs/>`__, `NumPy
750+ <https://numpy.org/>`__, `PyTorch <https://pytorch.org/>`__, or to write files
751+ in `Apache Parquet <https://parquet.apache.org/>`__ format.
753752
754753.. note::
755754
@@ -759,9 +758,7 @@ to work with data using Python libraries such as `Apache Arrow
759758The method :meth:`Connection.fetch_df_all()` fetches all rows from a query.
760759The method :meth:`Connection.fetch_df_batches()` implements an iterator for
761760fetching batches of rows. The methods return :ref:`OracleDataFrame
762- <oracledataframeobj>` objects, whose :ref:`methods <oracledataframemeth>`
763- implement the Python DataFrame Interchange Protocol `DataFrame API Interface
764- <https://data-apis.org/dataframe-protocol/latest/API.html>`__.
761+ <oracledataframeobj>` objects.
765762
766763For example, to fetch all rows from a query and print some information about
767764the results:
@@ -782,13 +779,36 @@ With Oracle Database's standard DEPARTMENTS table, this would display::
782779 4 columns
783780 27 rows
784781
785- To do more extensive operations on an :ref:`OracleDataFrame
786- <oracledataframeobj>`, it can be converted to an appropriate library class, and
787- then methods of that library can be used. For example it could be converted to
788- a `Pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.
789- DataFrame.html#pandas.DataFrame>`__, or to a `PyArrow table
790- <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`__ as shown
791- later.
782+ **Summary of Converting OracleDataFrame to Other Data Frames**
783+
784+ To do more extensive operations, :ref:`OracleDataFrames <oracledataframeobj>`
785+ can be converted to your chosen library data frame, and then methods of that
786+ library can be used. This section has an overview of how best to do
787+ conversions. Some examples are shown in subsequent sections.
788+
789+ To convert :ref:`OracleDataFrame <oracledataframeobj>` to a `PyArrow Table
790+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`__, use
791+ `pyarrow.Table.from_arrays()
792+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_arrays>`__
793+ which leverages the Arrow PyCapsule interface.
794+
795+ To convert :ref:`OracleDataFrame <oracledataframeobj>` to a `Pandas DataFrame
796+ <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__,
797+ use `pyarrow.Table.to_pandas()
798+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas>`__.
799+
800+ If you want to use a data frame library other than Pandas or PyArrow, use the
801+ library's ``from_arrow()`` method to convert a PyArrow Table to the applicable
802+ data frame, if your library supports this. For example, with `Polars
803+ <https://pola.rs/>`__ use `polars.from_arrow()
804+ <https://docs.pola.rs/api/python/dev/reference/api/polars.from_arrow.html>`__.
805+
806+ Lastly, if your data frame library does not support ``from_arrow()``, then use
807+ ``from_dataframe()`` if the library supports it. This can be slower, depending
808+ on the implementation.
809+
810+ The general recommendation is to use Apache Arrow as much as possible, but if
811+ no Arrow-based conversion is available, then use ``from_dataframe()``.
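
The fallback order described in this summary can be sketched as a small helper. This is an illustration only and not part of python-oracledb: ``choose_converter`` and the two stand-in library objects are hypothetical names, and the sketch assumes only that a data frame library exposes a module-level ``from_arrow()`` or ``from_dataframe()`` function.

```python
from types import SimpleNamespace


def choose_converter(library):
    """Return the name of the preferred conversion function of a library.

    Hypothetical helper: it prefers an Arrow-based from_arrow() and falls
    back to from_dataframe(), mirroring the order recommended above.
    """
    for name in ("from_arrow", "from_dataframe"):
        if callable(getattr(library, name, None)):
            return name
    raise TypeError("library offers neither from_arrow() nor from_dataframe()")


# Stand-in objects used only for illustration; real code would pass an
# imported module such as polars.
arrow_capable = SimpleNamespace(
    from_arrow=lambda table: table,
    from_dataframe=lambda frame: frame,
)
interchange_only = SimpleNamespace(from_dataframe=lambda frame: frame)

print(choose_converter(arrow_capable))
print(choose_converter(interchange_only))
```

With a real library, the returned name could be looked up with ``getattr()`` and called on a PyArrow Table (for ``from_arrow()``) or on the fetched data frame (for ``from_dataframe()``), depending on what that library accepts.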
792812
793813**Data Frame Type Mapping**
794814
@@ -797,8 +817,8 @@ support makes use of `Apache nanoarrow <https://arrow.apache.org/nanoarrow/>`__
797817libraries to build data frames.
798818
799819The following data type mapping occurs from Oracle Database types to the Arrow
800- types used in OracleDataFrame objects. Querying any other types from Oracle
801- Database will result in an exception.
820+ types used in OracleDataFrame objects. Querying any other data types from
821+ Oracle Database will result in an exception.
802822
803823.. list-table-with-summary::
804824 :header-rows: 1
@@ -830,7 +850,6 @@ Database will result in an exception.
830850 * - DB_TYPE_TIMESTAMP_TZ
831851 - TIMESTAMP
832852
833-
834853When converting Oracle Database NUMBERs, if :attr:`defaults.fetch_decimals` is
835854*True*, the Arrow data type is DECIMAL128. Note Arrow's DECIMAL128 format only
836855supports precision of up to 38 decimal digits. Else, if the Oracle number data
@@ -895,6 +914,11 @@ An example that creates and uses a `PyArrow Table
895914 This makes use of :meth:`OracleDataFrame.column_arrays()` which returns a list
896915of :ref:`OracleArrowArray Objects <oraclearrowarrayobj>`.
897916
917+ Internally `pyarrow.Table.from_arrays() <https://arrow.apache.org/docs/python/
918+ generated/pyarrow.Table.html#pyarrow.Table.from_arrays>`__ leverages the Apache
919+ Arrow PyCapsule interface that :ref:`OracleDataFrame <oracledataframeobj>`
920+ exposes.
921+
898922See `samples/dataframe_pyarrow.py <https://github.com/oracle/python-oracledb/
899923blob/main/samples/dataframe_pyarrow.py>`__ for a runnable example.
900924
@@ -924,17 +948,19 @@ org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__ is:
924948 print(df.T) # transpose
925949 print(df.tail(3)) # last three rows
926950
927- Using python-oracledb to fetch the interchange format will be more efficient
928- than using the Pandas ``read_sql()`` method.
951+ The `to_pandas() <https://arrow.apache.org/docs/python/generated/pyarrow.Table.
952+ html#pyarrow.Table.to_pandas>`__ method supports arguments like
953+ ``types_mapper=pandas.ArrowDtype`` and ``deduplicate_objects=False``, which may
954+ be useful for some data sets.
929955
930956See `samples/dataframe_pandas.py <https://github.com/oracle/python-oracledb/
931957blob/main/samples/dataframe_pandas.py>`__ for a runnable example.
932958
933- Creating Polars Series
934- ++++++++++++++++++++++
959+ Creating Polars DataFrames
960+ ++++++++++++++++++++++++++
935961
936- An example that creates and uses a `Polars Series
937- <https://docs.pola.rs/api/python/stable/reference/series/index.html>`__ is:
962+ An example that creates and uses a `Polars DataFrame
963+ <https://docs.pola.rs/api/python/stable/reference/dataframe/index.html>`__ is:
938964
939965.. code-block:: python
940966
@@ -946,13 +972,16 @@ An example that creates and uses a `Polars Series
946972 sql = "select id from SampleQueryTab order by id"
947973 odf = connection.fetch_df_all(statement=sql, arraysize=100)
948974
949- # Convert to a Polars Series
950- pyarrow_array = pyarrow.array(odf.get_column_by_name("ID"))
951- p = polars.from_arrow(pyarrow_array)
975+ # Convert to a Polars DataFrame
976+ pyarrow_table = pyarrow.Table.from_arrays(
977+     odf.column_arrays(), names=odf.column_names()
978+ )
979+ df = polars.from_arrow(pyarrow_table)
952980
953- # Perform various Polars operations on the Series
981+ # Perform various Polars operations on the DataFrame
982+ r, c = df.shape
983+ print(f"{r} rows, {c} columns")
954- print(p.sum())
984+ print(df.sum())
955- print(p.log10())
956985
957986 See `samples/dataframe_polars.py <https://github.com/oracle/python-oracledb/
958987blob/main/samples/dataframe_polars.py>`__ for a runnable example.