Commit b6e4e02

ENH: add dtype_from_format option to preserve Excel text formatting
1 parent 4f4b108 commit b6e4e02

9 files changed, +428 -24 lines changed

doc/source/user_guide/io.rst

Lines changed: 33 additions & 0 deletions
@@ -3378,6 +3378,39 @@ Reading Excel files
 In the most basic use-case, ``read_excel`` takes a path to an Excel
 file, and the ``sheet_name`` indicating which sheet to parse.
 
+Text-formatted cells
+++++++++++++++++++++
+
+Excel workbooks often contain values that are stored as numbers but formatted as
+``Text`` to preserve literal strings such as postal codes or account numbers
+with leading zeros. By default, :func:`~pandas.read_excel` still converts those
+cells to numeric types, which can alter the original representation. Pass
+``dtype_from_format=True`` to maintain the Excel text formatting when parsing
+each sheet. When enabled, pandas forces any columns or index levels that are
+formatted as text in the source workbook to use string dtypes in the resulting
+``Series``/``Index``.
+
+This behavior currently applies to the ``openpyxl`` and ``xlrd`` engines. Other
+engines simply ignore the flag until text format detection is implemented for
+them.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"zip_code": ["00601", "02108", "10118"]})
+   with pd.ExcelWriter("zips.xlsx", engine="openpyxl") as writer:
+       df.to_excel(writer, index=False)
+       for cell in writer.sheets["Sheet1"]["A"]:
+           cell.number_format = "@"  # Excel's Text format
+
+   parsed = pd.read_excel("zips.xlsx", dtype_from_format=True)
+   parsed.dtypes
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove("zips.xlsx")
+
 When using the ``engine_kwargs`` parameter, pandas will pass these arguments to the
 engine. For this, it is important to know which function pandas is
 using internally.
