Commit 5e7496c

ENH: add dtype_from_format option to preserve Excel text formatting
1 parent 942c56b commit 5e7496c

9 files changed: +428 -26 lines changed

doc/source/user_guide/io.rst

Lines changed: 33 additions & 0 deletions
@@ -3327,6 +3327,39 @@ Reading Excel files
 In the most basic use-case, ``read_excel`` takes a path to an Excel
 file, and the ``sheet_name`` indicating which sheet to parse.

+Text-formatted cells
+++++++++++++++++++++
+
+Excel workbooks often contain values that are stored as numbers but formatted as
+``Text`` to preserve literal strings such as postal codes or account numbers
+with leading zeros. By default, :func:`~pandas.read_excel` still converts those
+cells to numeric types, which can alter the original representation. Pass
+``dtype_from_format=True`` to maintain the Excel text formatting when parsing
+each sheet. When enabled, pandas forces any columns or index levels that are
+formatted as text in the source workbook to use string dtypes in the resulting
+``Series``/``Index``.
+
+This behavior currently applies to the ``openpyxl`` and ``xlrd`` engines. Other
+engines simply ignore the flag until text format detection is implemented for
+them.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"zip_code": ["00601", "02108", "10118"]})
+   with pd.ExcelWriter("zips.xlsx", engine="openpyxl") as writer:
+       df.to_excel(writer, index=False)
+       for cell in writer.sheets["Sheet1"]["A"]:
+           cell.number_format = "@"  # Excel's Text format
+
+   parsed = pd.read_excel("zips.xlsx", dtype_from_format=True)
+   parsed.dtypes
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove("zips.xlsx")
+
 When using the ``engine_kwargs`` parameter, pandas will pass these arguments to the
 engine. For this, it is important to know which function pandas is
 using internally.
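
The added section says that numeric conversion "can alter the original representation" of text-formatted cells. The following is a minimal, standard-library-only sketch of that failure mode, not the implementation in this commit: it does not call pandas or the new ``dtype_from_format`` option, and the variable names are illustrative assumptions.

```python
# Postal codes as they would appear in a text-formatted Excel column.
zip_codes = ["00601", "02108", "10118"]

# Numeric inference treats the values as integers and drops leading zeros.
as_numbers = [int(code) for code in zip_codes]

# Converting back to strings does not recover the original text.
round_tripped = [str(number) for number in as_numbers]

# The zeros can only be restored if the intended width is known in advance,
# which is extra information the numeric column no longer carries.
restored = [str(number).zfill(5) for number in as_numbers]

print(round_tripped)
print(restored)
```

Keeping such columns as strings from the start avoids having to guess the original width afterwards, which is the motivation the section gives for the option.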
