huggingface · yashwantbezawada · Nov 8, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,251 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code when working with code in the transformers repository.
+
+## ⚠️ CRITICAL: Auto-Generated Files
+
+**BEFORE editing ANY file in this repository, ALWAYS check if it's auto-generated.**
+
+### Pre-Edit Checklist (MANDATORY):
+
+1. **Read the file header** (first 50 lines) for warnings like:
+   - "This file was automatically generated"
+   - "Do NOT edit this file manually"
+   - "Generated from [source_file]"
+
+2. **Check for modular source files**:
+   ```bash
+   # If editing a modeling file, check if a modular version exists
+   ls src/transformers/models/[model_name]/modular_*.py
+   ```
+
+3. **If file is auto-generated**:
+   - ❌ DO NOT edit the generated file
+   - ✅ INSTEAD, edit the source file mentioned in the header
+   - ✅ Run the generation script if needed (usually mentioned in header or Makefile)
+
+### Common Auto-Generated Files:
+
+- **Modeling files**: Many `modeling_*.py` files are generated from `modular_*.py` files
+  - Example: `modeling_qwen3_moe.py` is generated from `modular_qwen3_moe.py`
+  - Always edit the `modular_*.py` file, not the generated one
+  - CI will fail if modular and generated files don't match
+
+- **Generation script**: `utils/check_modular_conversion.py --fix_and_overwrite`
+  - Regenerates all modeling files from their modular sources
+  - CI enforces that generated files match their modular sources
+
+### Example Warning Signs:
+
+```python
+# 🚨 THIS MEANS DO NOT EDIT THIS FILE 🚨
+# This file was automatically generated from
+# src/transformers/models/qwen3_moe/modular_qwen3_moe.py.
+# Do NOT edit this file manually as any edits will be
+# overwritten by the generation of the file from the
+# modular.
+# If any change should be done, please apply the change
+# to the modular_qwen3_moe.py file directly.
+# One of our CI enforces this.
+```
+
+## Modular Architecture
+
+### What is it?
+
+The transformers library uses a "modular" system where:
+- Core model logic is written in `modular_*.py` files
+- These files are processed to generate the final `modeling_*.py` files
+- This allows for code reuse and consistency across similar models
+
+### How to work with modular files:
+
+1. **Find the modular source**:
+   ```bash
+   # If you need to edit modeling_foo.py, look for:
+   ls src/transformers/models/foo/modular_foo.py
+   ```
+
+2. **Edit the modular file**:
+   - Make your changes to `modular_foo.py`
+   - NOT to `modeling_foo.py`
+
+3. **Regenerate the modeling file**:
+   ```bash
+   # CI only checks that the two files match; it does not regenerate them,
+   # so run the conversion locally:
+   make check-modular-conversion
+   # or
+   python utils/check_modular_conversion.py --fix_and_overwrite
+   ```
+
+4. **Verify both files are updated**:
+   ```bash
+   git status  # Should show both modular and modeling files changed
+   ```
+
+## Contributing Workflow
+
+### Before Making Changes:
+
+1. Read `CONTRIBUTING.md` thoroughly
+2. Check if files are auto-generated (see above)
+3. Look for existing patterns in similar models
+4. Check CI requirements in `.github/workflows/`
+
+### Before Committing:
+
+1. ✅ Verify you edited the correct files (source, not generated)
+2. ✅ Run relevant tests
+3. ✅ Check that modular files and generated files both changed
+4. ✅ Read your diff carefully
+
+### Common Mistakes to Avoid:
+
+- ❌ Editing auto-generated files instead of source files
+- ❌ Not reading file headers before editing
+- ❌ Assuming file structure without checking documentation
+- ❌ Rushing to implement without understanding the architecture
+- ❌ Letting PR branches fall behind the base branch
+
+## ⚠️ CRITICAL: Keep PR Branches Up-to-Date
+
+**ALWAYS keep your PR branch current with the base branch to avoid conflicts and ensure CI passes.**
+
+### Why This Matters:
+
+1. **CI may fail** on outdated branches even if your code is correct
+2. **Merge conflicts** become harder to resolve the longer you wait
+3. **Review delays** - maintainers may wait for you to update before reviewing
+4. **Your changes may conflict** with recent updates to the codebase
+
+### When to Update Your Branch:
+
+✅ **ALWAYS update in these situations:**
+- When you see "This branch is out-of-date with the base branch" on GitHub
+- Before pushing new commits to an existing PR
+- After receiving review feedback that requires code changes
+- Daily for long-running PRs (if base branch is active)
+- Before requesting re-review from maintainers
+
+### How to Update Your Branch:
+
+```bash
+# Method 1: Merge upstream changes (RECOMMENDED for most cases)
+git fetch upstream main
+git merge upstream/main --no-edit
+git push origin <your-branch-name>
+
+# Method 2: Rebase (use if you need clean history, but be careful)
+git fetch upstream main
+git rebase upstream/main
+git push origin <your-branch-name> --force-with-lease
+
+# Quick check if your branch is behind:
+git fetch upstream main
+git log HEAD..upstream/main --oneline  # Shows commits you're missing
+```
+
+### Automated Branch Update Workflow:
+
+**Before EVERY push to a PR branch, run this checklist:**
+
+```bash
+# 1. Check if upstream has new commits
+git fetch upstream main
+
+# 2. See how far behind you are
+git log HEAD..upstream/main --oneline
+
+# 3. If there are new commits, merge them
+if [[ $(git log HEAD..upstream/main --oneline | wc -l) -gt 0 ]]; then
+    echo "⚠️  Branch is behind, updating..."
+    git merge upstream/main --no-edit
+fi
+
+# 4. Now push your changes
+git push origin <your-branch-name>
+```
+
+### Handling "Branch is out-of-date" Warnings:
+
+When GitHub shows: **"This branch is out-of-date with the base branch. It's N commits behind"**
+
+**Immediate action required:**
+1. Don't ignore it - update immediately
+2. Fetch and merge latest changes from base branch
+3. Resolve any conflicts if they appear
+4. Push the merge commit
+5. Verify CI passes on the updated branch
+
+### Merge Conflicts During Update:
+
+If you get conflicts when updating:
+
+```bash
+# 1. Conflicts will be marked in affected files
+git status  # Shows files with conflicts
+
+# 2. Open each conflicted file and resolve manually
+#    Look for <<<<<<< HEAD and >>>>>>> sections
+
+# 3. After resolving conflicts in all files:
+git add <conflicted-files>
+git commit  # Complete the merge
+
+# 4. Push the resolved merge
+git push origin <your-branch-name>
+```
+
+### Prevention Strategy:
+
+**Set up these habits to avoid falling behind:**
+
+1. **Daily check** for active PRs:
+   ```bash
+   gh pr status  # Shows status of your PRs
+   ```
+
+2. **Before starting work** on an existing PR:
+   ```bash
+   git fetch upstream main && git merge upstream/main --no-edit
+   ```
+
+3. **Before requesting review**:
+   ```bash
+   # Ensure branch is current
+   git fetch upstream main
+   git merge upstream/main --no-edit
+   git push origin <branch-name>
+   ```
+
+4. **Set up notifications**:
+   - Enable GitHub notifications for your PRs
+   - Watch for CI failures that might indicate branch is stale
+
+### Red Flags That Indicate Branch Needs Update:
+
+🚨 **Update immediately if you see:**
+- "This branch is out-of-date with the base branch"
+- CI failures on checks that passed before
+- Merge conflict warnings from GitHub
+- "Changes requested" review with note about conflicts
+- Your PR shows as "N commits behind" (any number > 0)
+
+### Best Practice Timing:
+
+| Situation | When to Update |
+|-----------|---------------|
+| Fresh PR just created | ✅ Already current, no action needed |
+| PR open for 1-2 days | ✅ Check daily, update if behind |
+| PR open for 1+ week | ⚠️ Update immediately, likely very behind |
+| After receiving review | ✅ Update before addressing comments |
+| Before pushing new commits | ✅ Update first, then push changes |
+| CI failing on old code | ⚠️ Update immediately, may fix issues |
+
+## Additional Resources
+
+- Main Contributing Guide: `CONTRIBUTING.md`
+- Modular Transformers Docs: `docs/source/en/modular_transformers.md`
+- Model Converter: `utils/modular_model_converter.py`
+- Conversion Checker: `utils/check_modular_conversion.py`
diff --git a/src/transformers/models/qwen3_moe/modeling_qwen3_moe.py b/src/transformers/models/qwen3_moe/modeling_qwen3_moe.py
@@ -675,7 +675,7 @@ def forward(
                 self.num_experts_per_tok,
                 attention_mask,
             )
-            if labels is not None:
+            if labels is not None and self.training:
                 loss += self.router_aux_loss_coef * aux_loss.to(loss.device)  # make sure to reside in the same device
 
         return MoeCausalLMOutputWithPast(

diff --git a/src/transformers/models/qwen3_moe/modular_qwen3_moe.py b/src/transformers/models/qwen3_moe/modular_qwen3_moe.py
@@ -187,7 +187,7 @@ def forward(
                 self.num_experts_per_tok,
                 attention_mask,
             )
-            if labels is not None:
+            if labels is not None and self.training:
                 loss += self.router_aux_loss_coef * aux_loss.to(loss.device)  # make sure to reside in the same device
 
         return MoeCausalLMOutputWithPast(
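
Note on the two hunks above: both change the same condition, so the router auxiliary loss is only added to `loss` when the module is in training mode. A minimal, framework-free sketch of that behavior follows. The helper `add_router_aux_loss` and its `has_labels` / `training` arguments are hypothetical stand-ins for `labels is not None` and `self.training`, and plain floats stand in for tensors; this is an illustration under those assumptions, not the library's code.

```python
def add_router_aux_loss(loss, aux_loss, router_aux_loss_coef, has_labels, training):
    """Hypothetical stand-in for the patched branch in forward().

    Plain floats replace tensors; `has_labels` and `training` replace
    `labels is not None` and `self.training`.
    """
    if has_labels and training:
        # Training: the weighted auxiliary term is added to the loss.
        loss += router_aux_loss_coef * aux_loss
    # Evaluation (or no labels): the loss is returned unchanged.
    return loss


train_loss = add_router_aux_loss(2.0, 0.25, 0.5, has_labels=True, training=True)
eval_loss = add_router_aux_loss(2.0, 0.25, 0.5, has_labels=True, training=False)
print(train_loss, eval_loss)
```

With the same inputs, the training call includes the weighted auxiliary term and the evaluation call does not, so the loss reported in evaluation mode is the language-modeling loss alone.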