Support empty response for Completions and ChatCompletions API #3309

tboerstad · 2025-09-22T22:02:03Z

This PR fixes a bug where evaluation will fail if any of the text/content responses is None from the API under test.
This happens because the parse_generations functions say that they return List[str], but will actually return a list[Optional[str]] when the response from the API is None.

This error scenario is more likely with reasoning models.
Some frameworks put the reasoning tokens in a different field, and without a high enough max_gen_toks the entire token budget will be spent on reasoning tokens. This leaves the text/content field as None.

I've hit this error scenario when testing gpt-oss-20b on vLLM with the gsm8k_cot_llama dataset.

CLAassistant · 2025-09-22T22:02:11Z

All committers have signed the CLA.

lm_eval/models/openai_completions.py

baberabb · 2025-10-02T20:48:14Z

Thanks for the PR! Left a comment to add warning in case an empty response is unexpected. Also if you could run the pre-commit for the formatting:

pip install pre-commit
pre-commit run --all-files
``

@baberabb

- Add warning logs when API returns None/empty responses in parse_generations - Helps users identify when reasoning models consume entire token budget - Applied pre-commit formatting Addresses review feedback from @baberabb

tboerstad · 2025-10-03T09:42:37Z

Thanks for the feedback. I've added a warning, tested that it's being emitted, and also ran the pre-commit run --all-files .

Support empty response for Completions and ChatCompletions API

e3d54d3

tboerstad requested review from StellaAthena and baberabb as code owners September 22, 2025 22:02

baberabb requested changes Oct 2, 2025

View reviewed changes

lm_eval/models/openai_completions.py Outdated Show resolved Hide resolved

Add warnings for empty/None API responses

05ee491

- Add warning logs when API returns None/empty responses in parse_generations - Helps users identify when reasoning models consume entire token budget - Applied pre-commit formatting Addresses review feedback from @baberabb

Improve code readability and check for empty string instead of just None

c01f509

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support empty response for Completions and ChatCompletions API #3309

Support empty response for Completions and ChatCompletions API #3309

Uh oh!

tboerstad commented Sep 22, 2025

Uh oh!

CLAassistant commented Sep 22, 2025 •

edited

Loading

Uh oh!

Uh oh!

baberabb commented Oct 2, 2025

Uh oh!

tboerstad commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Support empty response for Completions and ChatCompletions API #3309

Are you sure you want to change the base?

Support empty response for Completions and ChatCompletions API #3309

Uh oh!

Conversation

tboerstad commented Sep 22, 2025

Uh oh!

CLAassistant commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

baberabb commented Oct 2, 2025

Uh oh!

tboerstad commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented Sep 22, 2025 •

edited

Loading