@@ -3327,6 +3327,39 @@ Reading Excel files
 In the most basic use-case, ``read_excel`` takes a path to an Excel
 file, and the ``sheet_name`` indicating which sheet to parse.
 
+Text-formatted cells
+++++++++++++++++++++
+
+Excel workbooks often contain values that are stored as numbers but formatted as
+``Text`` to preserve literal strings such as postal codes or account numbers
+with leading zeros. By default, :func:`~pandas.read_excel` still converts those
+cells to numeric types, which can alter the original representation. Pass
+``dtype_from_format=True`` to preserve Excel's text formatting when parsing
+each sheet. When enabled, pandas reads any column or index level that is
+formatted as text in the source workbook with a string dtype in the resulting
+``Series``/``Index``.
+
+This behavior currently applies to the ``openpyxl`` and ``xlrd`` engines. Other
+engines ignore the flag because text-format detection is not yet implemented
+for them.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"zip_code": ["00601", "02108", "10118"]})
+   with pd.ExcelWriter("zips.xlsx", engine="openpyxl") as writer:
+       df.to_excel(writer, index=False)
+       for cell in writer.sheets["Sheet1"]["A"]:
+           cell.number_format = "@"  # Excel's Text format
+
+   parsed = pd.read_excel("zips.xlsx", dtype_from_format=True)
+   parsed.dtypes
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove("zips.xlsx")
+
 When using the ``engine_kwargs`` parameter, pandas will pass these arguments to the
 engine. For this, it is important to know which function pandas is
 using internally.