251 changes: 251 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,251 @@
# CLAUDE.md

This file provides guidance to Claude Code when working with code in the transformers repository.

## ⚠️ CRITICAL: Auto-Generated Files

**BEFORE editing ANY file in this repository, ALWAYS check if it's auto-generated.**

### Pre-Edit Checklist (MANDATORY):

1. **Read the file header** (first 50 lines) for warnings like:
   - "This file was automatically generated"
   - "Do NOT edit this file manually"
   - "Generated from [source_file]"

2. **Check for modular source files**:
   ```bash
   # If editing a modeling file, check if a modular version exists
   ls src/transformers/models/[model_name]/modular_*.py
   ```

3. **If file is auto-generated**:
   - ❌ DO NOT edit the generated file
   - ✅ INSTEAD, edit the source file mentioned in the header
   - ✅ Run the generation script if needed (usually mentioned in header or Makefile)
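
The header check in step 1 can be scripted. The following is a minimal sketch and not part of the repository: the function name `looks_auto_generated` and the marker list are assumptions taken from the warning phrases listed above, and real headers may be worded differently. A phrase like "generated from" can also match unrelated comments, so treat a hit as a prompt to read the header rather than as proof.

```python
from pathlib import Path

# Marker phrases copied from the checklist above (hypothetical list, lowercased).
MARKERS = (
    "automatically generated",
    "do not edit this file manually",
    "generated from",
)


def looks_auto_generated(path, max_lines=50):
    """Return True if any of the first `max_lines` lines contains a marker."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            lowered = line.lower()
            if any(marker in lowered for marker in MARKERS):
                return True
    return False
```

Only the first `max_lines` lines are scanned, matching the "first 50 lines" rule, so a warning buried deeper in the file is deliberately ignored.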

### Common Auto-Generated Files:

- **Modeling files**: Many `modeling_*.py` files are generated from `modular_*.py` files
  - Example: `modeling_qwen3_moe.py` is generated from `modular_qwen3_moe.py`
  - Always edit the `modular_*.py` file, not the generated one
  - CI will fail if modular and generated files don't match

- **Generation script**: `utils/check_modular_conversion.py --fix_and_overwrite`
  - Regenerates all modeling files from their modular sources
  - CI enforces that generated files match their modular sources

### Example Warning Signs:

```python
# 🚨 THIS MEANS DO NOT EDIT THIS FILE 🚨
# This file was automatically generated from src/transformers/models/qwen3_moe/modular_qwen3_moe.py.
# Do NOT edit this file manually as any edits will be overwritten by the generation of
# the file from the modular. If any change should be done, please apply the change to the
# modular_qwen3_moe.py file directly. One of our CI enforces this.
```

## Modular Architecture

### What is it?

The transformers library uses a "modular" system where:
- Core model logic is written in `modular_*.py` files
- These files are processed to generate the final `modeling_*.py` files
- This allows for code reuse and consistency across similar models

### How to work with modular files:

1. **Find the modular source**:
   ```bash
   # If you need to edit modeling_foo.py, look for:
   ls src/transformers/models/foo/modular_foo.py
   ```

2. **Edit the modular file**:
   - Make your changes to `modular_foo.py`
   - NOT to `modeling_foo.py`

3. **Regenerate (if needed)**:
   ```bash
   # CI checks that generated files match their modular sources,
   # so regenerate locally before pushing:
   make check-modular-conversion
   # or
   python utils/check_modular_conversion.py --fix_and_overwrite
   ```

4. **Verify both files are updated**:
   ```bash
   git status # Should show both modular and modeling files changed
   ```
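
Step 1 above can be expressed as a small lookup. This is a sketch under the naming convention described in this file (`modeling_<name>.py` next to `modular_<name>.py`); `modular_source_for` is a hypothetical helper, not a function in the repository, and it does not read file headers.

```python
from pathlib import Path


def modular_source_for(modeling_path):
    """Return the sibling modular_<name>.py for a modeling_<name>.py, or None.

    Assumes the naming convention described above (an assumption, not an API).
    """
    path = Path(modeling_path)
    prefix = "modeling_"
    if not path.name.startswith(prefix) or path.suffix != ".py":
        return None
    # Swap the "modeling_" prefix for "modular_" in the same directory.
    candidate = path.with_name("modular_" + path.name[len(prefix):])
    return candidate if candidate.is_file() else None
```

A `None` result means no modular sibling was found on disk, which suggests the modeling file is hand-written; the file header remains the authoritative signal.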

## Contributing Workflow

### Before Making Changes:

1. Read `CONTRIBUTING.md` thoroughly
2. Check if files are auto-generated (see above)
3. Look for existing patterns in similar models
4. Check CI requirements in `.github/workflows/`

### Before Committing:

1. ✅ Verify you edited the correct files (source, not generated)
2. ✅ Run relevant tests
3. ✅ Check that modular files and generated files both changed
4. ✅ Read your diff carefully
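
Item 3 of this checklist can be approximated with a check over the list of changed paths, for example the output of `git diff --name-only`. The helper below is a hypothetical sketch, not repository tooling. It flags modeling files that changed without their modular sibling; because hand-written modeling files have no modular source, a hit is a reason to look again, not an error.

```python
from pathlib import PurePosixPath


def modeling_changed_without_modular(changed_paths):
    """Return modeling files in `changed_paths` whose modular sibling is absent.

    Hypothetical helper: it only compares names within the changed set and
    never touches the filesystem.
    """
    changed = {PurePosixPath(p) for p in changed_paths}
    suspicious = []
    for path in sorted(changed):
        if path.name.startswith("modeling_") and path.suffix == ".py":
            sibling = path.with_name("modular_" + path.name[len("modeling_"):])
            if sibling not in changed:
                suspicious.append(str(path))
    return suspicious
```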

### Common Mistakes to Avoid:

- ❌ Editing auto-generated files instead of source files
- ❌ Not reading file headers before editing
- ❌ Assuming file structure without checking documentation
- ❌ Rushing to implement without understanding the architecture
- ❌ Letting PR branches fall behind the base branch

## ⚠️ CRITICAL: Keep PR Branches Up-to-Date

**ALWAYS keep your PR branch current with the base branch to avoid conflicts and ensure CI passes.**

### Why This Matters:

1. **CI may fail** on outdated branches even if your code is correct
2. **Merge conflicts** become harder to resolve the longer you wait
3. **Review delays** - maintainers may wait for you to update before reviewing
4. **Your changes may conflict** with recent updates to the codebase

### When to Update Your Branch:

**ALWAYS update in these situations:**
- When you see "This branch is out-of-date with the base branch" on GitHub
- Before pushing new commits to an existing PR
- After receiving review feedback that requires code changes
- Daily for long-running PRs (if base branch is active)
- Before requesting re-review from maintainers

### How to Update Your Branch:

```bash
# Method 1: Merge upstream changes (RECOMMENDED for most cases)
git fetch upstream main
git merge upstream/main --no-edit
git push origin <your-branch-name>

# Method 2: Rebase (use if you need clean history, but be careful)
git fetch upstream main
git rebase upstream/main
git push origin <your-branch-name> --force-with-lease

# Quick check if your branch is behind:
git fetch upstream main
git log HEAD..upstream/main --oneline # Shows commits you're missing
```

### Automated Branch Update Workflow:

**Before EVERY push to a PR branch, run this checklist:**

```bash
# 1. Check if upstream has new commits
git fetch upstream main

# 2. See how far behind you are
git log HEAD..upstream/main --oneline

# 3. If there are new commits, merge them
if [[ $(git log HEAD..upstream/main --oneline | wc -l) -gt 0 ]]; then
    echo "⚠️ Branch is behind, updating..."
    git merge upstream/main --no-edit
fi

# 4. Now push your changes
git push origin <your-branch-name>
```
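
The "how far behind" check in step 2 can also be done from Python. This sketch assumes a remote named `upstream` that has already been fetched, as in the commands above; `commits_behind` and its `run` parameter are hypothetical names. It uses `git rev-list --count` in place of counting `git log --oneline` lines.

```python
import subprocess


def commits_behind(base="upstream/main", run=subprocess.run):
    """Count commits reachable from `base` but not from HEAD.

    `run` is injectable so the function can be exercised without a real
    repository; by default it shells out to git in the current directory.
    """
    result = run(
        ["git", "rev-list", "--count", f"HEAD..{base}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

A return value greater than zero corresponds to the "branch is behind" case handled by the `if` block above.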

### Handling "Branch is out-of-date" Warnings:

When GitHub shows: **"This branch is out-of-date with the base branch. It's N commits behind"**

**Immediate action required:**
1. Don't ignore it - update immediately
2. Fetch and merge latest changes from base branch
3. Resolve any conflicts if they appear
4. Push the merge commit
5. Verify CI passes on the updated branch

### Merge Conflicts During Update:

If you get conflicts when updating:

```bash
# 1. Conflicts will be marked in affected files
git status # Shows files with conflicts

# 2. Open each conflicted file and resolve manually
# Look for <<<<<<< HEAD and >>>>>>> sections

# 3. After resolving conflicts in all files:
git add <conflicted-files>
git commit # Complete the merge

# 4. Push the resolved merge
git push origin <your-branch-name>
```
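
Before running `git add` on a resolved file, it can help to confirm that no conflict markers were left behind. The following is a minimal sketch; `conflict_marker_lines` is a hypothetical helper, and it only recognises the default marker style (`<<<<<<<`, `=======`, `>>>>>>>`) at the start of a line.

```python
from pathlib import Path


def conflict_marker_lines(path):
    """Return 1-based line numbers that look like Git conflict markers."""
    hits = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        # The separator must be exactly seven "=" so longer underlines do not match.
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "=======":
            hits.append(number)
    return hits
```

An empty list means no default-style markers were found in that file.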

### Prevention Strategy:

**Set up these habits to avoid falling behind:**

1. **Daily check** for active PRs:
   ```bash
   gh pr status # Shows status of your PRs
   ```

2. **Before starting work** on an existing PR:
   ```bash
   git fetch upstream main && git merge upstream/main --no-edit
   ```

3. **Before requesting review**:
   ```bash
   # Ensure branch is current
   git fetch upstream main
   git merge upstream/main --no-edit
   git push origin <branch-name>
   ```

4. **Set up notifications**:
   - Enable GitHub notifications for your PRs
   - Watch for CI failures that might indicate branch is stale

### Red Flags That Indicate Branch Needs Update:

🚨 **Update immediately if you see:**
- "This branch is out-of-date with the base branch"
- CI failures on checks that passed before
- Merge conflict warnings from GitHub
- "Changes requested" review with note about conflicts
- Your PR shows as "N commits behind" (any number > 0)

### Best Practice Timing:

| Situation | When to Update |
|-----------|---------------|
| Fresh PR just created | ✅ Already current, no action needed |
| PR open for 1-2 days | ✅ Check daily, update if behind |
| PR open for 1+ week | ⚠️ Update immediately, likely very behind |
| After receiving review | ✅ Update before addressing comments |
| Before pushing new commits | ✅ Update first, then push changes |
| CI failing on old code | ⚠️ Update immediately, may fix issues |

## Additional Resources

- Main Contributing Guide: `CONTRIBUTING.md`
- Modular Transformers Docs: `docs/source/en/modular_transformers.md`
- Model Converter: `utils/modular_model_converter.py`
- Conversion Checker: `utils/check_modular_conversion.py`
2 changes: 1 addition & 1 deletion src/transformers/models/qwen3_moe/modeling_qwen3_moe.py
@@ -675,7 +675,7 @@ def forward(
                 self.num_experts_per_tok,
                 attention_mask,
             )
-            if labels is not None:
+            if labels is not None and self.training:
                 loss += self.router_aux_loss_coef * aux_loss.to(loss.device)  # make sure to reside in the same device

         return MoeCausalLMOutputWithPast(
2 changes: 1 addition & 1 deletion src/transformers/models/qwen3_moe/modular_qwen3_moe.py
@@ -187,7 +187,7 @@ def forward(
                 self.num_experts_per_tok,
                 attention_mask,
             )
-            if labels is not None:
+            if labels is not None and self.training:
                 loss += self.router_aux_loss_coef * aux_loss.to(loss.device)  # make sure to reside in the same device

         return MoeCausalLMOutputWithPast(
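
The same one-line change appears in both files: the auxiliary router loss is now added only when labels are given and the module is in training mode. The sketch below illustrates that condition in isolation. Plain floats stand in for tensors, and `total_loss` with its parameters is a hypothetical helper, not a function in transformers.

```python
def total_loss(lm_loss, aux_loss, router_aux_loss_coef, has_labels, training):
    """Mirror the patched condition: add the auxiliary loss only while training.

    Hypothetical stand-in for the in-place `loss += ...` in `forward`.
    """
    if has_labels and training:
        return lm_loss + router_aux_loss_coef * aux_loss
    return lm_loss
```

With this gating, evaluating with labels returns the language-modeling loss unchanged, while training still includes the weighted auxiliary term.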