@@ -3378,6 +3378,39 @@ Reading Excel files
 In the most basic use-case, ``read_excel`` takes a path to an Excel
 file, and the ``sheet_name`` indicating which sheet to parse.

+Text-formatted cells
+++++++++++++++++++++
+
+Excel workbooks often contain values that are stored as numbers but formatted as
+``Text`` to preserve literal strings such as postal codes or account numbers
+with leading zeros. By default, :func:`~pandas.read_excel` still converts those
+cells to numeric types, which can alter the original representation. Pass
+``dtype_from_format=True`` to maintain the Excel text formatting when parsing
+each sheet. When enabled, pandas forces any columns or index levels that are
+formatted as text in the source workbook to use string dtypes in the resulting
+``Series``/``Index``.
+
+This behavior currently applies to the ``openpyxl`` and ``xlrd`` engines. Other
+engines simply ignore the flag until text format detection is implemented for
+them.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"zip_code": ["00601", "02108", "10118"]})
+   with pd.ExcelWriter("zips.xlsx", engine="openpyxl") as writer:
+       df.to_excel(writer, index=False)
+       for cell in writer.sheets["Sheet1"]["A"]:
+           cell.number_format = "@"  # Excel's Text format
+
+   parsed = pd.read_excel("zips.xlsx", dtype_from_format=True)
+   parsed.dtypes
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove("zips.xlsx")
+
 When using the ``engine_kwargs`` parameter, pandas will pass these arguments to the
 engine. For this, it is important to know which function pandas is
 using internally.