
@thammegowda thammegowda commented Nov 21, 2025

Adding in bindings for two more languages!

  • bindings/cpp
  • bindings/c

C is an intermediate step to bind C++ and Rust: i.e., C++ <--> C <--> Rust.
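
To make the C++ <--> C <--> Rust layering concrete, here is a minimal sketch of what the Rust side of such a C layer could look like. Everything in it is hypothetical: `ToyTokenizer` and the `toy_tokenizer_*` functions are stand-ins invented for illustration, not the functions added in this PR, and the "encoding" is just word lengths. The point is the pattern: Rust hands out an opaque pointer, the C ABI functions take that pointer plus plain C types, and the caller frees it explicitly. A C++ wrapper class would typically own the pointer and call the free function in its destructor.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical stand-in for a real tokenizer: it maps each
/// whitespace-separated word to its byte length.
pub struct ToyTokenizer;

impl ToyTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace().map(|w| w.len() as u32).collect()
    }
}

/// Create a tokenizer and hand ownership to the caller as an opaque pointer.
#[no_mangle]
pub extern "C" fn toy_tokenizer_new() -> *mut ToyTokenizer {
    Box::into_raw(Box::new(ToyTokenizer))
}

/// Encode a NUL-terminated UTF-8 string.
/// Writes at most `out_cap` ids into `out` (if `out` is non-null) and returns
/// the number of ids the text produces, which may exceed `out_cap`.
/// Returns -1 on a null handle, null text, or invalid UTF-8.
#[no_mangle]
pub unsafe extern "C" fn toy_tokenizer_encode(
    tok: *const ToyTokenizer,
    text: *const c_char,
    out: *mut u32,
    out_cap: usize,
) -> isize {
    if tok.is_null() || text.is_null() {
        return -1;
    }
    // Borrow the C string without copying; reject non-UTF-8 input.
    let text = match unsafe { CStr::from_ptr(text) }.to_str() {
        Ok(s) => s,
        Err(_) => return -1,
    };
    let ids = unsafe { &*tok }.encode(text);
    if !out.is_null() {
        let n = ids.len().min(out_cap);
        unsafe { std::ptr::copy_nonoverlapping(ids.as_ptr(), out, n) };
    }
    ids.len() as isize
}

/// Release a tokenizer created by `toy_tokenizer_new`. Null is a no-op.
#[no_mangle]
pub unsafe extern "C" fn toy_tokenizer_free(tok: *mut ToyTokenizer) {
    if !tok.is_null() {
        drop(unsafe { Box::from_raw(tok) });
    }
}
```

Returning the required length rather than failing on a short buffer lets a caller probe with a small buffer and retry; that is one design choice among several, not necessarily the one taken here.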

--

  • Added tests for the C++ bindings.
  • Added benchmarks as a sanity check; the results are as expected.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

Very open to this! Let's make sure the functions we bind are broadly compatible with what users expect.

}

#[no_mangle]
pub extern "C" fn tokenizers_encode(
Collaborator

I think we need decode, batch encode, batch encode fast, etc., and async? I don't know how best to define all of these, but basically the same surface as Python!
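
As one illustration of what a batch variant could look like across a C ABI, here is a rough sketch. The name `toy_encode_batch`, its signature, and the flattened-ids-plus-lengths layout are all assumptions made up for this example; the PR may well settle on a different shape, and the toy encoder (one id per word, equal to its byte length) only exists so the code has something to return.

```rust
use std::ffi::{c_char, CStr};
use std::slice;

// Hypothetical stand-in for a real encoder: one id per word (its byte length).
fn toy_encode(text: &str) -> Vec<u32> {
    text.split_whitespace().map(|w| w.len() as u32).collect()
}

/// Encode `n_texts` NUL-terminated UTF-8 strings in a single call.
/// Ids are written back-to-back into `ids_out` (capacity `ids_cap`), and
/// `lens_out[i]` receives the id count of text `i`; `lens_out` must have
/// room for `n_texts` entries.
/// Returns the total id count, or -1 on a null pointer, invalid UTF-8,
/// or an `ids_out` buffer that is too small.
#[no_mangle]
pub unsafe extern "C" fn toy_encode_batch(
    texts: *const *const c_char,
    n_texts: usize,
    ids_out: *mut u32,
    ids_cap: usize,
    lens_out: *mut usize,
) -> isize {
    if texts.is_null() || ids_out.is_null() || lens_out.is_null() {
        return -1;
    }
    // View the caller's buffers as slices for bounds-checked access.
    let texts = unsafe { slice::from_raw_parts(texts, n_texts) };
    let ids = unsafe { slice::from_raw_parts_mut(ids_out, ids_cap) };
    let lens = unsafe { slice::from_raw_parts_mut(lens_out, n_texts) };

    let mut written = 0usize;
    for (i, &ptr) in texts.iter().enumerate() {
        if ptr.is_null() {
            return -1;
        }
        let text = match unsafe { CStr::from_ptr(ptr) }.to_str() {
            Ok(s) => s,
            Err(_) => return -1,
        };
        let encoded = toy_encode(text);
        if written + encoded.len() > ids_cap {
            return -1;
        }
        ids[written..written + encoded.len()].copy_from_slice(&encoded);
        lens[i] = encoded.len();
        written += encoded.len();
    }
    written as isize
}
```

Flattening the ids and reporting per-text lengths keeps the call to one allocation-free round trip; a C++ wrapper could split the flat buffer back into one vector per input using the lengths.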

Author

Thanks for the quick review! Happy to know you're interested/open to this effort.
I will get back after surfacing more functionality to C++.

@thammegowda thammegowda marked this pull request as draft November 22, 2025 18:51
thammegowda and others added 8 commits November 22, 2025 10:56
CPP bindings coverage improvement
- Introduced a new submodule for Jinja2Cpp to handle chat template rendering.
- Enhanced the C++ bindings to load and apply chat templates from a configuration file.
- Added methods to retrieve special tokens and their IDs from the tokenizer configuration.
- Updated the CMake configuration to include Jinja2Cpp and link it with the tokenizers_cpp library.
- Refactored tests to validate the new chat template functionality and special token handling.
Support chat template in c++ api
@thammegowda
Author

Hi @ArthurZucker, I've made some progress on the C++ side by adding more APIs.
But... templating with Jinja2 is giving me a bit of trouble. I'm wondering how your team handles templates, e.g. chat_template -- is there native support in the Rust code for chat_template in Jinja2 format?

I tried integrating https://github.com/jinja2cpp/Jinja2Cpp/ at the C++ bindings layer, but some features are limited and lead to crashes (IIRC, negative indices like messages[-1]).

Any tips/recommendations for handling templates natively would be great.
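
For readers unfamiliar with the `messages[-1]` case: in Jinja2 (as in Python), a negative index counts from the end of the list, so `-1` means the last element. A minimal sketch of that rule follows. The helper names `resolve_index` and `last_message` are hypothetical and only illustrate the semantics a template engine needs; this is not how Jinja2Cpp or any Rust template engine is implemented.

```rust
/// Map a Python-style index onto a list of length `len`.
/// Non-negative indices are used as-is; negative ones count from the end.
/// Returns `None` when the index falls outside the list.
pub fn resolve_index(len: usize, idx: isize) -> Option<usize> {
    if idx >= 0 {
        let i = idx as usize;
        if i < len { Some(i) } else { None }
    } else {
        // -1 -> len - 1, -2 -> len - 2, ...; underflow means out of range.
        len.checked_sub(idx.unsigned_abs())
    }
}

/// Equivalent of `messages[-1]`, returning `None` for an empty list
/// instead of crashing.
pub fn last_message<'a>(messages: &[&'a str]) -> Option<&'a str> {
    resolve_index(messages.len(), -1).map(|i| messages[i])
}
```

Returning `Option` instead of indexing directly makes the out-of-range case (for example, an empty message list) an explicit value rather than a crash.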
