
Conversation

Contributor

@AnMakc AnMakc commented Nov 7, 2025

This PR disables the unconditional rounding of the loss value to 4 digits in logged values, effectively reverting part of #9491.

Closes #38032 (stale)

Motivation:

  • The loss value may have a very low absolute magnitude, depending on the model/data/objective function.
  • No other logged values (e.g. eval/loss or grad_norm) are rounded, which may cause confusion.
  • Rounding is a display/formatting detail in this case and should not be done silently in non-UI code.

Examples of undesired behavior that would be fixed by this PR:

Notebook outputs:
[Screenshot 2025-11-07 at 16:06:21]

Wandb loss chart:
[Screenshot 2025-11-07 at 15:42:45]

@AnMakc force-pushed the disable-loss-rounding branch from 686b9df to 45fa1e7 on November 7, 2025 19:36
@Rocketknight1
Member

cc @SunMarc, but maybe formatting using something like %.4f so that it always displays 4 significant figures with an optional exponent would preserve the clean rendering but still work in cases when the loss is very small?
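For context, a quick illustrative sketch (my addition, not from the PR) of the difference between fixed-point and significant-figure formatting for a very small loss; note that `%.4f` keeps 4 decimal places, while `%.4g` keeps 4 significant figures with an optional exponent:

```python
# Comparing printf-style formats for a tiny loss value:
# "%.4f" keeps 4 decimal places and loses the information entirely,
# while "%.4g" keeps 4 significant figures, switching to an exponent
# when the value is very small.
loss = 2.5444e-05

print("%.4f" % loss)  # -> 0.0000 (precision lost)
print("%.4g" % loss)  # -> 2.544e-05 (4 significant figures)
```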

Contributor Author

AnMakc commented Nov 10, 2025

@Rocketknight1 Any rounding at this point introduces an inconsistency between the validation and training loss. We cannot apply formatting here, as we log a numeric value, not a string.

The main point is that any formatting is a presentation detail that should be handled by display-related code; we should not make assumptions about how it works, since in most cases it is an external handler (wandb, trackio, an IPython notebook widget, etc.).

Contributor Author

AnMakc commented Nov 24, 2025

@Rocketknight1 @SunMarc

Please take a look.

Member

@SunMarc SunMarc left a comment


I don't mind merging this PR; I have also seen a couple of issues complaining about this before. cc @stas00, as this is something you added way back.

@SunMarc SunMarc requested a review from qgallouedec November 25, 2025 15:23
Contributor

stas00 commented Nov 25, 2025

This use case didn't exist when this was originally coded. The goal was to avoid pointlessly long output like loss=1.3982828288737373.

I wonder if scientific notation could address this use case without making the output ugly for all other use cases: loss=2.5444e-4 for the example above. Same 4 decimals, except the exponent avoids the precision loss of the original.

Contributor Author

AnMakc commented Nov 25, 2025

I wonder if scientific notation could address this use case

At this point these are still numeric values, so we can't apply formatting; see #42104 (comment).

Contributor

stas00 commented Nov 25, 2025

Apply at the point of print then, while saving the full precision in the log file?

Contributor Author

AnMakc commented Nov 25, 2025

Apply at the point of print then, while saving the full precision in the log file?

There are multiple ways the values can be displayed: Jupyter notebooks, reporting to wandb, trackio, etc. Each handles display rounding separately; see the screenshots attached to the first message here.

This PR makes the behaviour explicit and consistent across different types of values/metrics. If rounding a metric in specific logs turns out to be beneficial, it can be added in a separate PR.

Contributor

stas00 commented Nov 25, 2025

Whatever works. To be clear, I wasn't objecting to your PR; I was just suggesting possible further work to keep things neat across multiple use cases, since I was tagged as the originator of this "feature".

Contributor Author

AnMakc commented Nov 25, 2025

Sure, just clarified the intention. Thanks!

@SunMarc Could you merge please?

@AnMakc force-pushed the disable-loss-rounding branch from 45fa1e7 to 05a8b5e on November 25, 2025 20:16
Member

SunMarc commented Nov 26, 2025

To avoid a regression for other users, maybe we can implement @Rocketknight1's suggestion in the PrinterCallback class that we have in transformers, @AnMakc? It should be quite straightforward.

maybe formatting using something like %.4f so that it always displays 4 significant figures with an optional exponent would preserve the clean rendering but still work in cases when the loss is very small?

Thanks a lot @stas00 for the suggestion
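As a rough sketch of that suggestion (an illustration with my own hypothetical helper name, not the PR's actual diff): a formatting helper that a console callback's `on_log` could apply just before printing, leaving the logged numeric values themselves untouched:

```python
# Hypothetical helper (not from the PR): format each float in a logs
# dict with 4 significant figures for display only; non-float values
# pass through unchanged. A printer-style callback could call this on
# the `logs` dict it receives before printing.
def format_logs(logs, sig_figs=4):
    return {
        k: f"{v:.{sig_figs}g}" if isinstance(v, float) else v
        for k, v in logs.items()
    }

print(format_logs({"loss": 2.5444e-05, "grad_norm": 1.25, "step": 10}))
# -> {'loss': '2.544e-05', 'grad_norm': '1.25', 'step': 10}
```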


@AnMakc force-pushed the disable-loss-rounding branch from 05a8b5e to e745b00 on November 26, 2025 16:58
Member

@SunMarc SunMarc left a comment


Much better, thank you!

Contributor Author

AnMakc commented Nov 26, 2025

@SunMarc

Updated the PR to format metrics in the console with 4 significant digits.
Python's default is to use scientific notation for values < 1e-4, which seems like a sane threshold.

Example output:
{'loss': '0.0002014', 'grad_norm': '0.0001294', 'learning_rate': '2.179e-07', 'epoch': '0.0003269'}

Note: epoch is formatted the same way for consistency.
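The threshold behaviour mentioned above can be checked directly: Python's general (`g`) format keeps fixed-point notation down to 1e-4 and switches to scientific notation once the exponent drops below -4. A quick illustrative check (my addition, not from the PR), using the values from the example output:

```python
# "g" formatting: fixed-point down to 1e-4, scientific notation below.
for value in (0.0002014, 2.179e-07, 0.0003269):
    print(f"{value:.4g}")
# -> 0.0002014
#    2.179e-07
#    0.0003269
```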

Contributor

stas00 commented Nov 26, 2025

That looks great, except for 'epoch': '0.0003269', which is not a sensible format for a human reader, IMHO.

Should it report the percentage of epoch progress? 0.03% maybe?

Contributor Author

AnMakc commented Nov 26, 2025

A percentage would not work for the second and later epochs.

Contributor

stas00 commented Nov 26, 2025

Why not? 135.03%, unless we don't know which epoch is running?

Contributor

stas00 commented Nov 26, 2025

But please feel free to ignore my suggestions if the current maintainers are ok with your last proposal. They decide.

Member

SunMarc commented Nov 27, 2025

I think we can keep this for now. If people complain about it, we can still do a follow-up PR for specific metrics.

@SunMarc SunMarc merged commit 7094f1e into huggingface:main Nov 27, 2025
23 checks passed


Development

Successfully merging this pull request may close these issues.

Removing the modification of loss value due to rounding off to 4 digits

6 participants