Commit 9046dab

feat(blog): add MoM family blog post and improve blog UI
**What type of PR is this?**

feat(blog): add comprehensive MoM family introduction and UI improvements

**What this PR does / why we need it**:

This PR introduces a new blog post about the MoM (Mixture of Models) family and improves the blog UI for better readability. Key changes:

1. New blog post, "MoM: Specialized Models for Intelligent Routing":
   - Explains the evolution from the ModernBERT encoder-only architecture
   - Describes advantages and limitations of the initial approach
   - Introduces the MoM family: encoders, decoders, and domain agents
   - Outlines future directions: agentic routing, RL, advanced post-training, tool integration
2. Blog UI improvements:
   - Increased blog content width from 1200px to 1400px
   - Expanded container width from 1400px to 1600px
   - Enhanced padding for a better reading experience
   - Maintained responsive design for mobile devices
3. Added the MoM family banner image

The blog post provides clear context on why vLLM-SR is evolving from encoder-only models to a mixture-of-models architecture, addressing user needs for explainability, complex reasoning, and integration with frontier techniques.

Signed-off-by: bitliu <bitliu@tencent.com>
1 parent 75cd042 commit 9046dab

File tree

3 files changed: +292 −0 lines changed
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
---
slug: mom-family
title: "MoM: Specialized Models for Intelligent Routing"
authors: [Xunzhuo]
tags: [mom, models, routing, announcement]
---

![MoM Family](/img/mom-family.png)

**One fabric. Many minds.** We're introducing **MoM** (Mixture of Models), a family of specialized routing models that powers vLLM-SR's intelligent decision-making.

<!-- truncate -->

## Why MoM?

vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."

## The Evolution: From Encoder-Only to Mixture-of-Models

### Where We Started: ModernBERT Foundation

vLLM-SR initially built its routing intelligence entirely on **ModernBERT** (encoder-only models):

**Advantages**:

- ⚡ **Blazing fast**: Sub-10ms inference latency
- 📊 **High throughput**: 10,000+ QPS on commodity hardware
- 💰 **Cost-effective**: Minimal compute requirements
- 🎯 **Proven accuracy**: Strong performance on classification tasks

**Limitations**:

- **Black-box decisions**: No explanation for routing choices
- **Limited reasoning**: Cannot handle complex, multi-step logic
- **Fixed capabilities**: Hard to extend with new behaviors
- **No tool integration**: Cannot leverage external tools or APIs

### Why We're Evolving: Decoder-Only Models

As vLLM-SR adoption grew, we encountered more diverse scenarios and requirements:

- **Explainability**: Users need to understand *why* a query was routed to a specific model
- **Complex reasoning**: Some routing decisions require multi-step analysis
- **Agentic workflows**: Integration with tool calling, function execution, and external APIs
- **Advanced techniques**: Reinforcement learning (RL) and sophisticated post-training methods
- **Domain expertise**: Specialized routing for legal, medical, and scientific domains

**The solution**: Expand to decoder-only models while keeping encoder speed where it matters.

### The MoM Architecture: Best of Both Worlds

Our **Mixture-of-Models** approach combines encoder and decoder strengths:

- ⚡ **Encoders** — Fast classification (sub-10ms latency) for high-throughput scenarios
- 🧠 **Decoders** — Explainable decisions with reasoning for transparency
- 🎯 **Domain Agents** — Expert routing with specialized knowledge

This hybrid architecture lets you choose the right tool for each job: speed when you need it, reasoning when it matters.
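The encoder/decoder split can be sketched as a tiny selection helper. This is an illustrative stub, not a vLLM-SR API; only the model names and latency figures come from this post.

```python
# Hypothetical helper: pick an encoder for speed or a decoder for
# explainability, based on latency budget and the need for reasoning.
def select_router(latency_budget_ms: float, need_reasoning: bool) -> str:
    if need_reasoning or latency_budget_ms >= 100:
        # Decoder path: ~50-100ms, produces a decision plus a rationale.
        return "mom-dec-class-intent-v1"
    # Encoder path: sub-10ms classification for high-throughput scenarios.
    return "mom-enc-class-intent-v1"

print(select_router(5, need_reasoning=False))   # fast encoder path
print(select_router(200, need_reasoning=True))  # explainable decoder path
```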
## The MoM Model Family

### 🔒 Encoders — Speed & Safety

Fast, high-throughput models for classification and security checks:

| Model | Purpose |
|-------|---------|
| **mom-enc-class-intent-v1** | Intent/topic classification (sub-10ms latency) |
| **mom-enc-guard-pii-v1** | PII detection (privacy protection) |
| **mom-enc-guard-jailbreak-v1** | Jailbreak/attack detection (security) |
### 🧠 Decoders — Explainability

When you need to understand *why* a routing decision was made:

| Model | Purpose |
|-------|---------|
| **mom-dec-class-intent-v1** | Intent classification with reasoning |
| **mom-dec-class-intent-r1** | Higher-capacity variant for complex cases |

### 🎯 Domain Agents — Specialized Expertise

Expert models for domain-specific routing:

| Model | Domain |
|-------|--------|
| **mom-dec-agent-sci-v1** | Science (physics, chemistry, biology) |
| **mom-dec-agent-math-v1** | Mathematics (algebra, calculus, statistics) |
| **mom-dec-agent-hum-v1** | Humanities (literature, philosophy, history) |
| **mom-dec-agent-soc-v1** | Social sciences (psychology, economics) |
| **mom-dec-agent-law-v1** | Legal (contracts, compliance) |
| **mom-dec-agent-gen-v1** | Generalist fallback |

## Design Principles

**Safety-first**: Guardrail models (PII and jailbreak detection) run before routing—security at the edge.

**Speed ↔ Explainability**: Choose encoders for sub-10ms latency or decoders for transparent reasoning. Different endpoints, different SLAs.

**Domain expertise**: Specialized agents achieve 15-25% better accuracy on domain-specific tasks than generalist routing. Math queries go to math experts, legal queries to legal experts.

## How vLLM-SR Uses MoM

vLLM-SR's routing pipeline leverages MoM models at multiple stages:

1. **Security check** → `mom-enc-guard-*` models filter malicious or sensitive requests
2. **Intent classification** → `mom-enc-class-intent-v1` or `mom-dec-class-intent-v1` determines the query type
3. **Domain routing** → `mom-dec-agent-*` models route specialized queries to optimal downstream models
4. **Cost optimization** → simple queries go to lightweight models; complex queries go to premium models

This achieves a **2x+ cost reduction** while maintaining quality, similar to [RouteLLM](https://arxiv.org/abs/2406.18665).
## Performance

Early benchmarks:

- **Encoders**: sub-10ms p99 latency, 10,000+ QPS
- **Decoders**: ~50-100ms latency with explainable outputs
- **Domain Agents**: 15-25% accuracy improvement over generalist routing

## What's Next: Exploring Frontier Techniques

The move to decoder-only models opens exciting possibilities for vLLM-SR:

### 🤖 Agentic Routing

Decoder models can act as intelligent agents that:

- Dynamically select and orchestrate multiple models
- Make multi-step routing decisions with tool calling
- Adapt routing strategies based on feedback

### 🎯 Reinforcement Learning (RL)

Apply RL techniques to optimize routing decisions:

- Learn from user feedback and model performance
- Discover optimal routing policies through trial and error
- Continuously improve cost-quality trade-offs

### 🔧 Advanced Post-Training

Leverage cutting-edge post-training methods:

- **Distillation**: Transfer knowledge from large models to efficient routers
- **Preference learning**: Train on human feedback (RLHF, DPO)
- **Domain adaptation**: Fine-tune for specific industries or use cases

### 🛠️ Tool Integration

Enable routers to:

- Call external APIs for context-aware routing
- Query databases for historical routing patterns
- Integrate with monitoring systems for real-time optimization

**The vision**: vLLM-SR routers that not only classify but *reason*, *learn*, and *adapt*.

## Model Naming

```text
mom-{type}-{function}-{domain}-{version}
```

- **type**: `enc` (encoder) / `dec` (decoder)
- **function**: `class` (classification) / `guard` (safety) / `agent` (domain expert)
- **domain**: `intent`, `pii`, `jailbreak`, `sci`, `math`, etc.
- **version**: `v1` (baseline) / `r1` (higher-capacity)
169+
170+
## Get Started
171+
172+
All MoM models are available on [Hugging Face](https://huggingface.co/LLM-Semantic-Router).
173+
174+
**Resources**:
175+
176+
- [GitHub](https://github.com/vllm-project/semantic-router)
177+
- [Documentation](https://vllm-semantic-router.com)
178+
- [Quick Start Guide](https://vllm-semantic-router.com/docs/installation)
179+
180+
---
181+
182+
**vLLM-SR · Route with intent. Think with reason.**

website/src/css/custom.css

Lines changed: 110 additions & 0 deletions
@@ -779,3 +779,113 @@ td, th {
    width: 100% !important;
  }
}

/* ============================================
   Blog Page Optimizations - Wider Content
   ============================================ */

/* Hide blog sidebar (left side posts list) */
aside[class*='blogSidebar'],
aside.col--3,
.theme-blog-sidebar,
div[class*='sidebar'] {
  display: none !important;
}

/* Hide table of contents (right side) on blog pages */
.theme-doc-toc-desktop,
.table-of-contents,
div[class*='tableOfContents'],
div[class*='tocCollapsible'] {
  display: none !important;
}

/* Target blog page main container */
main[class*='docMainContainer'],
.main-wrapper main {
  max-width: 100% !important;
}

/* Expand blog content row to full width */
div[class*='blogContainer'] .row,
.blog-wrapper .row {
  justify-content: center !important;
  margin: 0 auto !important;
}

/* Expand blog content column to use full width */
.col--7,
div[class*='blogPostContent'],
main[class*='docMainContainer'] > .container > .row > div {
  max-width: 100% !important;
  flex: 0 0 100% !important;
  margin: 0 auto !important;
}

/* Center blog content container - wider layout */
.container,
div[class*='blogContainer'] {
  max-width: 1600px !important;
  margin: 0 auto !important;
  padding: 0 3rem !important;
}

/* Blog post content - centered and wide */
article,
article[class*='blogPostItem'],
div[class*='blogPostContent'] article {
  min-width: 60vw !important;
  max-width: 1400px !important;
  margin: 2rem auto !important;
  padding: 3rem 4rem !important;
  display: block !important;
}

/* Blog post header */
header[class*='blogPostHeader'] {
  max-width: 1400px !important;
  margin: 0 auto !important;
}

/* Blog list page optimization */
.margin-vert--lg,
div[class*='blogListPage'] {
  max-width: 1400px !important;
  margin: 2rem auto !important;
  width: 100% !important;
}

/* Ensure blog post items are centered */
.blogPostItem,
div[class*='blogPostItem'] {
  max-width: 1400px !important;
  margin: 0 auto 2rem auto !important;
}

/* Center blog post content wrapper */
div[class*='blogPostPageContent'] {
  display: flex !important;
  justify-content: center !important;
  width: 100% !important;
}

/* Responsive adjustments for blog */
@media (max-width: 996px) {
  article,
  article[class*='blogPostItem'] {
    min-width: auto !important;
    padding: 2rem 1.5rem !important;
  }

  .container,
  div[class*='blogContainer'] {
    padding: 0 1rem !important;
  }
}

@media (max-width: 768px) {
  article,
  article[class*='blogPostItem'] {
    padding: 1.5rem 1rem !important;
  }
}
website/static/img/mom-family.png

2.04 MB
