
Conversation

@RoDmitry
Copy link
Collaborator

@RoDmitry RoDmitry commented Nov 9, 2025

Fewer String clones -> more performance 😊
Also includes derive optimizations.

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 9, 2025

@haydenhoang check it out. Does it look good?

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 9, 2025

There is a problem in the OpenApi generation which I have skipped:

pub struct UpdateNlSearchModelParams<'p> {
    /// The ID of the NL search model to update
    pub model_id: Cow<'p, str>,
    /// The NL search model fields to update
    pub body: models::NlSearchModelCreateSchema, // missing <'p>
    pub _phantom: PhantomData<&'p ()>,
}

Somehow the generator thinks that models::NlSearchModelCreateSchema is `isPrimitiveType`; maybe there is a mistake in openapi.yml.
And actually I don't even care about it. It's fixable by hand 😏
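
For reference, a hand-fixed sketch of how the struct could look once the lifetime propagates into the body type, with no PhantomData needed (the inner field of NlSearchModelCreateSchema is assumed here purely for illustration):

```rust
use std::borrow::Cow;

// Hypothetical module standing in for the generated `models`; the real
// schema has more fields, this one is assumed for illustration.
mod models {
    use std::borrow::Cow;

    #[derive(Debug, Clone)]
    pub struct NlSearchModelCreateSchema<'p> {
        pub model_name: Cow<'p, str>,
    }
}

pub struct UpdateNlSearchModelParams<'p> {
    /// The ID of the NL search model to update
    pub model_id: Cow<'p, str>,
    /// The NL search model fields to update
    pub body: models::NlSearchModelCreateSchema<'p>, // lifetime fixed by hand
}

fn main() {
    let params = UpdateNlSearchModelParams {
        model_id: Cow::Borrowed("nl-1"),
        body: models::NlSearchModelCreateSchema {
            model_name: Cow::Borrowed("openai/gpt-4"),
        },
    };
    println!("{}", params.model_id); // prints "nl-1"
}
```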

@haydenhoang
Copy link
Contributor

Hey @RoDmitry, do you have any benchmarks showing the performance gain from replacing String with Cow<'_, str>?

Doing that makes the code more complex and harder to maintain (lifetimes, phantom data), and I think users might prefer the simplicity of String if there is no significant performance gain.

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 10, 2025

Benchmark of the clones? I don't think that's necessary; it's an obvious optimization. There will be a significant performance gain if you benchmark it. I don't want to, because I understand how slow memory allocations are.
The code is more complex, but not to the point that it's harder to maintain.

Users will not be bothered by the lifetimes much: it's either a local lifetime or 'static. The lifetimes might also enable other future optimizations, like borrowing slices.
A String can be wrapped in a Cow using Cow::from, or just .into(), which replaces the .to_owned() in the current code, so users will barely notice it.
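
The conversions described above, as a small sketch (takes_id is a made-up function standing in for any generated param that would now accept Cow<'_, str>):

```rust
use std::borrow::Cow;

// `takes_id` is hypothetical, standing in for any API param that now
// accepts Cow<'_, str> instead of String.
fn takes_id(id: Cow<'_, str>) -> usize {
    id.len()
}

fn main() {
    // An owned String converts with `.into()` (the replacement for `.to_owned()`):
    let owned = String::from("model-123");
    let a = takes_id(owned.into());

    // A &str is borrowed as-is, with no allocation:
    let b = takes_id(Cow::from("model-123"));

    // And `.into_owned()` unwraps a Cow back into a regular String:
    let s: String = Cow::<str>::Borrowed("abc").into_owned();

    println!("{a} {b} {s}"); // prints "9 9 abc"
}
```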

Although I agree that Cow<'_, str> does not look as solid as String, especially to a user who is new to Rust.
In my codebase I have clones of big models::SearchParameters in loops, so it's very useful in my case.

@haydenhoang
Copy link
Contributor

Wait, isn't the deserialized string owned anyway, since the content does not outlive the returned model?

let content = resp.text().await?;
match content_type {
    ContentType::Json => serde_json::from_str(&content).map_err(Error::from),
    ContentType::Text => {
        return Err(Error::from(serde_json::Error::custom(
            "Received `text/plain` content type response that cannot be converted to `models::ApiKey`",
        )));
    }
    ContentType::Unsupported(unknown_type) => {
        return Err(Error::from(serde_json::Error::custom(format!(
            "Received `{unknown_type}` content type response that cannot be converted to `models::ApiKey`"
        ))));
    }
}

So Cow<'_, str> is only useful on the request/input side?

I don't think the performance benefit is worth it: we'd save nanoseconds by avoiding clones while adding complexity to the code (especially the public API models), and a typical API request already has milliseconds of latency.

In my codebase I have clones of big models::SearchParameters in loops, so it's very useful in my case.

If the loop runs only a small number of times, which is typical because each iteration makes an API call, the cost of cloning SearchParameters is negligible compared to the network I/O. I think the optimization only matters when you clone the struct thousands of times in a tight loop, which really isn't a common pattern for an HTTP client.

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 11, 2025

So Cow<'_, str> is only useful on the request/input side?

Yes.
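
(Sketching the point being agreed to here, with a plain String standing in for the resp.text().await? buffer: the body is local to the request function, so a returned model cannot borrow from it and must own its strings.)

```rust
// `content` stands in for the `resp.text().await?` buffer in the generated
// client. A model borrowing from `content` could not be returned, because
// `content` is dropped when the function ends, so response models end up
// owning their strings regardless of any lifetime on the type.
fn get_api_key() -> String {
    let content = String::from(r#"{"value":"secret"}"#);
    let borrowed: &str = &content; // borrowing is fine locally...
    borrowed.to_owned() // ...but the data must become owned before it can escape
}

fn main() {
    println!("{}", get_api_key()); // prints {"value":"secret"}
}
```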

a typical API request already has milliseconds of latency.

Yes, but the API request is async, while the clones are synchronous and consume CPU time.

added complexity to the code (especially the public API models)

What complexity in the public API? Try it yourself; it doesn't change much. It's just an enum wrapper around a borrowed/owned string. How is it hard to use? Cow has a good API for working with strings. Call .into_owned() and it unwraps into a regular String.

the cost of cloning SearchParameters is negligible compared to network I/O.

Basically anything is negligible compared to network I/O. Does it mean that any optimizations are useless?

the optimization only matters when you clone the struct thousands of times in a tight loop

I clone it hundreds of times in a tight loop. Does that mean I must struggle? Or just buy a better CPU? 😅
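
For illustration, a minimal stand-in for models::SearchParameters showing why the borrowed form is cheap to clone in a loop (the field names are assumed, not the real schema):

```rust
use std::borrow::Cow;

// Hypothetical stand-in for models::SearchParameters; the real struct is
// much bigger, which is exactly why the clones add up.
#[derive(Clone)]
struct SearchParameters<'p> {
    q: Cow<'p, str>,
    query_by: Cow<'p, str>,
}

fn main() {
    let base = SearchParameters {
        q: Cow::Borrowed("wireless headphones"),
        query_by: Cow::Borrowed("title,description"),
    };

    let mut total = 0usize;
    // Cloning a Cow::Borrowed copies only the pointer and length, not the
    // string data, so hundreds of iterations allocate nothing; with String
    // fields every iteration would heap-allocate and memcpy each field.
    for _ in 0..300 {
        let params = base.clone();
        total += params.q.len();
    }
    println!("{total}"); // prints 5700
}
```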

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 11, 2025

Or maybe you want to separate the request models from the response models? The response models contain only owned Strings. I haven't tried that, and I have no idea how to do it in mustache. Is it possible? If you want to improve it, I'm open to it and can push your commits (because the repo is not mine).

Or another idea is to generate a second model with all of the fields as owned types (Strings), and set that owned model as the response type in the API. But that would leave half of the models unused.

@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 12, 2025

I think the following logic might be possible: preprocess openapi.yml to mark models used as input; then, using that mark like {{#vendorExtensions.x-rust-is-used-as-input}}, it would be possible to generate the models conditionally.

@haydenhoang
Copy link
Contributor

haydenhoang commented Nov 12, 2025

Optimizations are good as long as the gains outweigh the trade-offs. I found a post that explains this pretty well.

The trade-offs here are adding more code and making it less clear for a performance gain that we haven't even measured yet. I don't want the public API models to carry unnecessary PhantomData and lifetimes; they are cleaner without them.

I think the following logic might be possible: preprocess openapi.yml to mark models used as input, then using that mark like {{#vendorExtensions.x-rust-is-used-as-input}} it would be possible to conditionally generate the models.

Well, this attempt at optimization really does make our codebase more complex.

Or maybe you want to separate the request models from the response models?

I think we should keep the type consistent across models.

It would be nice if you could open a smaller PR for the internal derive optimization first. This big PR is slowing us down; right now I'm working on writing the documentation for the derive.

@RoDmitry RoDmitry marked this pull request as draft November 12, 2025 21:43
@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 12, 2025

I know that PhantomData does not look good in the API input params. I will think about that a little more.

I found a post that explain this pretty well.

anything that isn't a trivially clear optimization should be avoided until it can be measured.

But it is a trivially clear optimization!

And: "Premature optimization is the practice of trying to improve a program's performance before it's finished."
Do you think the APIs we have now are not finished?

@RoDmitry RoDmitry marked this pull request as ready for review November 13, 2025 11:42
@RoDmitry
Copy link
Collaborator Author

RoDmitry commented Nov 13, 2025

I have removed all of the PhantomData and all unused lifetimes. @haydenhoang does it look ok now?

Code generation leaves a few extra lifetimes, which I have removed manually. After the merge, I will open an issue for it; I currently don't have extra time to fix it.

@RoDmitry RoDmitry force-pushed the optimize-cow branch 6 times, most recently from a22f31a to c272f23 Compare November 13, 2025 12:08
@RoDmitry RoDmitry force-pushed the optimize-cow branch 2 times, most recently from 1b24c60 to 5bf1a13 Compare November 13, 2025 12:22
@haydenhoang
Copy link
Contributor

Will look into this later 👍
