Skip to content

Conversation

@Pingasmaster
Copy link

This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? idk). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios. Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None. This is only me taking a stab at it for fun. It is not prod ready but just an idea.

Tries to help with #2177.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for making this happen.

I took a first look and don't like it at all, but also wouldn't know how to achieve round-tripping differently.

If you wouldn't mind, breaking changes should be in a separate commit and prefixed with fix!: probably, and all adjustments to other crates go into a separate commit (not marked as breaking (assuming they aren't breaking).

@Byron
Copy link
Member

Byron commented Nov 10, 2025

Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None.

First of all, the raw field is only in the *Ref versions of the type, and there already was a precedent for altering these to support round-tripping. From that point of view, I think it's acceptable, but… this never had to go so far and implement a by-pass.

I was wondering… what if this raw information would be stored on the CommitRef instead? Then I'd even go as far as to turn the existing fields with parsed *Ref types into &BStr, turning it into the raw field effectively.

Then one can use commit.author to access the raw field, and commit.author() to get a fallible parsed version of it just like before.

So this PR is definitely good at showing that alternative solutions like the one mentioned here a worth exploring.
Is this something you'd be interested in?

@Pingasmaster
Copy link
Author

Pingasmaster commented Nov 11, 2025

I'm not sure I can do exactly what you have in mind but I gave it a try. It's much cleaner code-wise at least. Divided it into 2 commits like you asked too, but I'm still not sure this is 100% ready. It's 2AM right now for me so I'll sleep and see tomorrow if I have anything else which might make this cleaner. Thanks for the feedback!

@Pingasmaster Pingasmaster force-pushed the raw-email-attempt-fix branch 3 times, most recently from ddf9121 to 678bba4 Compare November 11, 2025 02:06
@Pingasmaster Pingasmaster changed the title Enable SignatureRef/IdentityRef to preserve raw actor bytes for round-tripping malformed commits (see #2177) fix!: store raw commit/tag actor headers and parse lazily (see #2177) Nov 11, 2025
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the second round!

Yes, using the fields directly and parsing on the fly is the way to go. In general, this parsing is now fallible, and .expect/.unwrap can't be used.

Besides that, it's definitely getting there, thanks again!

/// Return the author, with whitespace trimmed.
///
/// This is different from the `author` field which may contain whitespace.
pub fn author(&self) -> gix_actor::SignatureRef<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must be fallible, panics aren't allowed. It's OK to use the error type returned by SignatureRef::from_bytes().

}

fn write_signature(mut out: &mut dyn io::Write, field: &[u8], raw: &bstr::BStr) -> io::Result<()> {
if signature_requires_raw(raw) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this differentiation is still required. In theory, the committer and author are now verbatim, which should always be what's written back.

} = other;
let tagger = tagger.map(|raw| {
gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref())
.expect("signatures were validated during parsing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time the signature is parsed it must be fallible.

gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()).expect("signatures were validated during parsing")
}

fn signature_requires_raw(raw: &BStr) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think differentiating between these shoudln't be necessary.

@Pingasmaster
Copy link
Author

I've adjusted the commit and tag APIs so they return fallible results. CommitRef::author/committer/time report Result<SignatureRef, decode::Error> and after successful decoding trim the parsed value. TagRef::tagger also follows the same pattern. I made the owned conversions (CommitRef::into_owned/to_owned and TagRef::into_owned) use TryFrom so everything propagates cleanly, but don't hesitate to tell me if you have a better idea.

I’ve also removed the raw/canonical split. You were right, just streamring the stored header bytes and sizing them via raw.len() for commits and tag writers is much better.

Every place that re-parses actors during conversion or utility code should be fallible now. Tell me if you see anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants