[GuideLLM Refactor] Replace librosa, pydub, and soundfile with torchcodec (#411)
## TODO
- [ ] ~~More flexible version locking in multimodal extras group~~
- The goal was to add locking for different torchcodec/torch
versions, but honestly it's not worth the hassle
- [x] Check for multi-modal libs being installed
- [ ] More testing on `encode_audio`
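A check for the multi-modal libraries being installed could look roughly like the sketch below. It is not the code from this PR: the module list and the helper names `missing_modules` and `require_multimodal` are hypothetical, and only the standard-library `importlib.util.find_spec` is assumed.

```python
import importlib.util

# Hypothetical list of optional modules; the PR does not state which ones are checked.
MULTIMODAL_MODULES = ("torch", "torchcodec")


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_multimodal(names=MULTIMODAL_MODULES):
    """Raise ImportError with a readable message if any optional module is absent."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(
            "Missing optional multimodal dependencies: "
            + ", ".join(missing)
            + ". Install the multimodal extras group to enable audio support."
        )
```

Calling `require_multimodal()` before any audio work fails early with one clear message instead of an import error deep inside the encoding path.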
## Summary
<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Replaces the audio processing libraries (`librosa`, `pydub`, and `soundfile`) with
`torchcodec`, which eliminates 19 dependencies and brings us in line with what
HuggingFace `datasets` is doing.
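For illustration, decoding with `torchcodec` might look like the minimal sketch below. It is not the implementation in this PR: the helper name `decode_audio` and the 16 kHz default are assumptions, and it relies only on `torchcodec.decoders.AudioDecoder` and its `get_all_samples()` method.

```python
def decode_audio(source, sample_rate=16_000):
    """Decode an audio file path or raw bytes, resampled to `sample_rate`.

    Hypothetical helper for illustration; not GuideLLM's actual function.
    """
    # Imported lazily so this module still loads when torchcodec is not installed.
    from torchcodec.decoders import AudioDecoder

    decoder = AudioDecoder(source, sample_rate=sample_rate)
    samples = decoder.get_all_samples()
    # samples.data is a float tensor of shape (num_channels, num_samples).
    return samples.data, samples.sample_rate
```

The one decoder call handles both loading and resampling, which is presumably where part of the dependency reduction comes from.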
## Details
<!--
Provide a detailed list of all changes introduced in this pull request.
-->
-
## Test Plan
<!--
List the steps needed to test this PR.
-->
- Run against an audio server with:
```bash
guidellm benchmark run \
--target http://localhost:8000 \
--profile "synchronous" \
--max-requests 20 \
--request-type "audio_transcriptions" \
--data "openslr/librispeech_asr" \
--data-args '{"name": "clean", "split": "test"}'
```
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)