
Conversation

Contributor

@AnMakc AnMakc commented Nov 7, 2025

This PR disables the unconditional rounding of the loss value to 4 digits in logged values, effectively reverting part of #9491.

Closes #38032 (stale)

Motivation:

  • The loss value may have a very low absolute magnitude, depending on the model/data/objective function.
  • No other logged values (e.g. eval/loss or grad_norm) are rounded, which may cause confusion.
  • Rounding is a display/formatting detail in this case and should not be done silently in non-UI code.

Examples of undesired behavior that would be fixed by this PR:

Notebook outputs:
[Screenshot 2025-11-07 at 16:06:21]

Wandb loss chart:
[Screenshot 2025-11-07 at 15:42:45]

@AnMakc force-pushed the disable-loss-rounding branch from 686b9df to 45fa1e7 on November 7, 2025 19:36
@Rocketknight1
Member

cc @SunMarc, but maybe formatting using something like %.4f so that it always displays 4 significant figures with an optional exponent would preserve the clean rendering but still work in cases when the loss is very small?
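For context, a quick illustrative sketch (my addition, not from the PR) of the difference between fixed-point and significant-figure formatting for a very small loss; note that `%.4f` keeps 4 decimal places, while `%.4g` keeps 4 significant figures with an optional exponent:

```python
# Comparing printf-style formats for a tiny loss value:
# "%.4f" keeps 4 decimal places and loses the information entirely,
# while "%.4g" keeps 4 significant figures, switching to an exponent
# when the value is very small.
loss = 2.5444e-05

print("%.4f" % loss)  # -> 0.0000 (precision lost)
print("%.4g" % loss)  # -> 2.544e-05 (4 significant figures)
```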

Contributor Author

AnMakc commented Nov 10, 2025

@Rocketknight1 Any rounding at this point introduces an inconsistency between the validation and training loss. We cannot apply formatting here, as we log a numeric value, not a string.

The main point is that any formatting is a presentation detail that should be handled by display-related code; we should not make assumptions about how it works, since in most cases it is an external handler (wandb, trackio, an IPython notebook widget, etc.).

Contributor Author

AnMakc commented Nov 24, 2025

@Rocketknight1 @SunMarc

Please take a look.

Member

@SunMarc SunMarc left a comment


I don't mind merging this PR; I have also seen a couple of issues complaining about this before. cc @stas00, as this is something you added way back.

@SunMarc SunMarc requested a review from qgallouedec November 25, 2025 15:23
Contributor

stas00 commented Nov 25, 2025

This use case didn't exist when this was originally coded. The goal was to avoid pointlessly long output like loss=1.3982828288737373.

I wonder if scientific notation could address this use case without making the output ugly for all other use cases: loss=2.5444e-4 for the example above. Same 4 decimals, except the exponent avoids the precision loss of the original.

Contributor Author

AnMakc commented Nov 25, 2025

I wonder if scientific notation could address this use case

At this point these are still numeric values, so we can't apply formatting; see #42104 (comment).

Contributor

stas00 commented Nov 25, 2025

Apply at the point of print then, while saving the full precision in the log file?

Contributor Author

AnMakc commented Nov 25, 2025

Apply at the point of print then, while saving the full precision in the log file?

There are multiple ways the values can be displayed: Jupyter notebooks, reporting to wandb, trackio, etc. Each handles display rounding separately; see the screenshots attached to the first message here.

This PR makes the behaviour explicit and consistent across different types of values/metrics. If rounding a metric in specific logs turns out to be beneficial, it can be added in a separate PR.

Contributor

stas00 commented Nov 25, 2025

Whatever works. To be clear, I wasn't objecting to your PR; I was just suggesting possible further work to keep things neat across multiple use cases, since I was tagged as the originator of this "feature".

Contributor Author

AnMakc commented Nov 25, 2025

Sure, just clarified the intention. Thanks!

@SunMarc Could you merge please?

@AnMakc force-pushed the disable-loss-rounding branch from 45fa1e7 to 05a8b5e on November 25, 2025 20:16
Member

SunMarc commented Nov 26, 2025

To avoid a regression for other users, maybe we can implement @Rocketknight1's suggestion in the PrinterCallback class that we have in transformers, @AnMakc? It should be quite straightforward.

maybe formatting using something like %.4f so that it always displays 4 significant figures with an optional exponent would preserve the clean rendering but still work in cases when the loss is very small?

Thanks a lot @stas00 for the suggestion
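As a rough sketch of that suggestion (an illustration with my own hypothetical helper name, not the PR's actual diff): a formatting helper that a console callback's `on_log` could apply just before printing, leaving the logged numeric values themselves untouched:

```python
# Hypothetical helper (not from the PR): format each float in a logs
# dict with 4 significant figures for display only; non-float values
# pass through unchanged. A printer-style callback could call this on
# the `logs` dict it receives before printing.
def format_logs(logs, sig_figs=4):
    return {
        k: f"{v:.{sig_figs}g}" if isinstance(v, float) else v
        for k, v in logs.items()
    }

print(format_logs({"loss": 2.5444e-05, "grad_norm": 1.25, "step": 10}))
# -> {'loss': '2.544e-05', 'grad_norm': '1.25', 'step': 10}
```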


@AnMakc force-pushed the disable-loss-rounding branch from 05a8b5e to e745b00 on November 26, 2025 16:58
Member

@SunMarc SunMarc left a comment


Much better, thank you!

Contributor Author

AnMakc commented Nov 26, 2025

@SunMarc

Updated the PR to format metrics in the console with 4 significant digits.
Python's default is to use scientific notation for values < 1e-4, which seems like a sane threshold.

Example output:
{'loss': '0.0002014', 'grad_norm': '0.0001294', 'learning_rate': '2.179e-07', 'epoch': '0.0003269'}

Note: epoch is formatted the same way for consistency.
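The threshold behaviour mentioned above can be checked directly: Python's general (`g`) format keeps fixed-point notation down to 1e-4 and switches to scientific notation once the exponent drops below -4. A quick illustrative check (my addition, not from the PR), using the values from the example output:

```python
# "g" formatting: fixed-point down to 1e-4, scientific notation below.
for value in (0.0002014, 2.179e-07, 0.0003269):
    print(f"{value:.4g}")
# -> 0.0002014
#    2.179e-07
#    0.0003269
```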

Contributor

stas00 commented Nov 26, 2025

That looks great, except for 'epoch': '0.0003269', which is not a sensible format for a human reader, IMHO.

Should it report the percentage of epoch progress? 0.03% maybe?

Contributor Author

AnMakc commented Nov 26, 2025

A percentage would not work for the second and later epochs.

Contributor

stas00 commented Nov 26, 2025

Why not? 135.03%, unless we don't know which epoch is running?

Contributor

stas00 commented Nov 26, 2025

But please feel free to ignore my suggestions if the current maintainers are ok with your last proposal. They decide.

Member

SunMarc commented Nov 27, 2025

I think we can keep this for now. If people complain about it, we can still do a follow-up PR for specific metrics.

@SunMarc SunMarc merged commit 7094f1e into huggingface:main Nov 27, 2025
23 checks passed


Development

Successfully merging this pull request may close these issues.

Removing the modification of loss value due to rounding off to 4 digits

6 participants