fix!: store raw commit/tag actor headers and parse lazily (see #2177) #2253

Pingasmaster · 2025-11-09T10:11:02Z

This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? idk). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios. Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None. This is only me taking a stab at it for fun. It is not prod ready but just an idea.

Tries to help with #2177.

Byron

Thanks a lot for making this happen.

I took a first look and don't like it ~~at all~~, but also wouldn't know how to achieve round-tripping differently.

If you wouldn't mind, breaking changes should be in a separate commit, probably prefixed with fix!:, and all adjustments to other crates should go into a separate commit (not marked as breaking, assuming they aren't).

Byron · 2025-11-10T04:16:34Z

> Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None.

First of all, the raw field is only in the *Ref versions of the type, and there already was a precedent for altering these to support round-tripping. From that point of view, I think it's acceptable, but… this never had to go so far and implement a by-pass.

I was wondering… what if this raw information were stored on the CommitRef instead? Then I'd even go as far as to turn the existing fields with parsed *Ref types into &BStr, effectively turning each of them into the raw field.

Then one can use commit.author to access the raw field, and commit.author() to get a fallible parsed version of it just like before.

So this PR is definitely good at showing that alternative solutions like the one mentioned here are worth exploring.
Is this something you'd be interested in?
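A rough sketch of the layout suggested above, to make the field/method split concrete. It is not the gix-object code: `Commit`, `ParsedSignature`, `DecodeError` and `parse_signature` are hypothetical stand-ins, `&[u8]` is used in place of `&BStr` to stay std-only, and the parser is deliberately simplified.

```rust
/// Hypothetical stand-in for a parsed signature (not `gix_actor::SignatureRef`).
#[derive(Debug, PartialEq)]
pub struct ParsedSignature<'a> {
    pub name: &'a [u8],
    pub email: &'a [u8],
    pub time: &'a [u8],
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub &'static str);

/// Hypothetical stand-in for `CommitRef`: the header value is kept verbatim.
pub struct Commit<'a> {
    /// Raw bytes of the author header value, untouched.
    pub author: &'a [u8],
}

impl<'a> Commit<'a> {
    /// Parse the raw header on demand. Returns an error instead of panicking.
    pub fn author(&self) -> Result<ParsedSignature<'a>, DecodeError> {
        parse_signature(self.author)
    }
}

/// Strip ASCII whitespace from both ends of a byte slice.
fn trim(mut b: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = b {
        if first.is_ascii_whitespace() { b = rest } else { break }
    }
    while let [rest @ .., last] = b {
        if last.is_ascii_whitespace() { b = rest } else { break }
    }
    b
}

/// Simplified `name <email> time` parser, for illustration only.
pub fn parse_signature(raw: &[u8]) -> Result<ParsedSignature<'_>, DecodeError> {
    let open = raw
        .iter()
        .position(|&b| b == b'<')
        .ok_or(DecodeError("missing '<'"))?;
    let close = raw[open..]
        .iter()
        .position(|&b| b == b'>')
        .map(|i| open + i)
        .ok_or(DecodeError("missing '>'"))?;
    Ok(ParsedSignature {
        name: trim(&raw[..open]),
        email: &raw[open + 1..close],
        time: trim(&raw[close + 1..]),
    })
}
```

With this shape, `commit.author` always yields the stored bytes, even when `commit.author()` returns an error for a header the parser rejects.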

Pingasmaster · 2025-11-11T00:44:34Z

I'm not sure I can do exactly what you have in mind but I gave it a try. It's much cleaner code-wise at least. Divided it into 2 commits like you asked too, but I'm still not sure this is 100% ready. It's 2AM right now for me so I'll sleep and see tomorrow if I have anything else which might make this cleaner. Thanks for the feedback!

Byron

Thanks a lot for the second round!

Yes, using the fields directly and parsing on the fly is the way to go. In general, this parsing is now fallible, and .expect/.unwrap can't be used.

Besides that, it's definitely getting there, thanks again!

Byron · 2025-11-11T04:11:08Z

gix-object/src/commit/mod.rs

    /// Return the author, with whitespace trimmed.
    ///
    /// This is different from the `author` field which may contain whitespace.
    pub fn author(&self) -> gix_actor::SignatureRef<'a> {


These must be fallible, panics aren't allowed. It's OK to use the error type returned by SignatureRef::from_bytes().

Byron · 2025-11-11T04:13:18Z

gix-object/src/commit/write.rs

+}
+
+fn write_signature(mut out: &mut dyn io::Write, field: &[u8], raw: &bstr::BStr) -> io::Result<()> {
+    if signature_requires_raw(raw) {


I wonder why this differentiation is still required. In theory, the committer and author are now verbatim, which should always be what's written back.

Byron · 2025-11-11T04:15:56Z

gix-object/src/object/convert.rs

        } = other;
+        let tagger = tagger.map(|raw| {
+            gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref())
+                .expect("signatures were validated during parsing")


Every time the signature is parsed it must be fallible.

Byron · 2025-11-11T04:18:00Z

gix-object/src/tag/write.rs

+    gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()).expect("signatures were validated during parsing")
+}
+
+fn signature_requires_raw(raw: &BStr) -> bool {


Again, I think differentiating between these shouldn't be necessary.

Pingasmaster · 2025-11-11T13:57:58Z

I've adjusted the commit and tag APIs so they return fallible results. CommitRef::author/committer/time report Result<SignatureRef, decode::Error> and after successful decoding trim the parsed value. TagRef::tagger also follows the same pattern. I made the owned conversions (CommitRef::into_owned/to_owned and TagRef::into_owned) use TryFrom so everything propagates cleanly, but don't hesitate to tell me if you have a better idea.
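For illustration, a minimal sketch of what a fallible owned conversion via TryFrom can look like. All names here (`TagRef`, `Tag`, `DecodeError`, `parse_name`) are hypothetical std-only stand-ins rather than the actual gix-object types, and the parser is heavily simplified.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct DecodeError;

/// Hypothetical borrowed tag: `tagger` holds the raw header bytes, if present.
pub struct TagRef<'a> {
    pub tagger: Option<&'a [u8]>,
}

/// Hypothetical owned tag that stores only the parsed, trimmed tagger name.
#[derive(Debug, PartialEq)]
pub struct Tag {
    pub tagger_name: Option<Vec<u8>>,
}

/// Strip ASCII whitespace from both ends of a byte slice.
fn trim(mut b: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = b {
        if first.is_ascii_whitespace() { b = rest } else { break }
    }
    while let [rest @ .., last] = b {
        if last.is_ascii_whitespace() { b = rest } else { break }
    }
    b
}

/// Simplified parser: the name is everything before the first '<',
/// and a '>' must follow somewhere after it.
fn parse_name(raw: &[u8]) -> Result<&[u8], DecodeError> {
    let open = raw.iter().position(|&b| b == b'<').ok_or(DecodeError)?;
    if !raw[open..].contains(&b'>') {
        return Err(DecodeError);
    }
    Ok(trim(&raw[..open]))
}

impl<'a> TryFrom<TagRef<'a>> for Tag {
    type Error = DecodeError;

    fn try_from(other: TagRef<'a>) -> Result<Self, Self::Error> {
        // Option<Result<..>> becomes Result<Option<..>>, so `?` propagates
        // a parse failure instead of reaching for `.expect()`.
        let tagger_name = other
            .tagger
            .map(parse_name)
            .transpose()?
            .map(|name| name.to_vec());
        Ok(Tag { tagger_name })
    }
}
```

A missing tagger stays `None`, a malformed one surfaces as `Err`, and callers decide what to do with it.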

I’ve also removed the raw/canonical split. You were right, just streaming the stored header bytes and sizing them via raw.len() in the commit and tag writers is much better.
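Roughly, the idea looks like the sketch below. It assumes a hypothetical `Commit` with only two raw header fields and hypothetical helpers `header_size` and `write_header`; the real writers handle more fields and use the crate's own types.

```rust
use std::io::{self, Write};

/// Hypothetical commit that keeps actor headers as raw bytes.
pub struct Commit<'a> {
    pub author: &'a [u8],
    pub committer: &'a [u8],
}

impl Commit<'_> {
    /// Number of bytes `write_to` emits, computed from the raw lengths alone.
    pub fn size(&self) -> usize {
        header_size(b"author", self.author) + header_size(b"committer", self.committer)
    }

    /// Write both headers back exactly as stored, without re-parsing them.
    pub fn write_to(&self, out: &mut dyn Write) -> io::Result<()> {
        write_header(out, b"author", self.author)?;
        write_header(out, b"committer", self.committer)
    }
}

/// Field name, one space, the raw value, one newline.
fn header_size(name: &[u8], raw: &[u8]) -> usize {
    name.len() + 1 + raw.len() + 1
}

fn write_header(out: &mut dyn Write, name: &[u8], raw: &[u8]) -> io::Result<()> {
    out.write_all(name)?;
    out.write_all(b" ")?;
    out.write_all(raw)?;
    out.write_all(b"\n")
}
```

Because nothing is parsed on the write path, a header with embedded angle brackets or odd whitespace comes out byte-for-byte as it went in, and the size computation cannot drift from what is written.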

Every place that re-parses actors during conversion or utility code should be fallible now. Tell me if you see anything else.

Byron reviewed Nov 9, 2025

Pingasmaster force-pushed the raw-email-attempt-fix branch from 2a27780 to 250d531 Compare November 11, 2025 00:38

Pingasmaster force-pushed the raw-email-attempt-fix branch 3 times, most recently from ddf9121 to 678bba4 Compare November 11, 2025 02:06

Pingasmaster changed the title ~~Enable SignatureRef/IdentityRef to preserve raw actor bytes for round-tripping malformed commits (see #2177)~~ fix!: store raw commit/tag actor headers and parse lazily (see #2177) Nov 11, 2025

fix!: expose raw commit/tag actor headers for round-tripping

c9cbfe1

Pingasmaster force-pushed the raw-email-attempt-fix branch from 40d5a95 to 114f986 Compare November 11, 2025 03:14

test: ensure malformed actors still round-trip

ddf21a1

Pingasmaster force-pushed the raw-email-attempt-fix branch from 114f986 to ddf21a1 Compare November 11, 2025 03:28

Byron requested changes Nov 11, 2025

fix!: make signature access fallible and preserve raw actor headers

f547dcc

Pingasmaster force-pushed the raw-email-attempt-fix branch from c423fbe to f547dcc Compare November 11, 2025 14:46
