
Commit 3f038f2

latest papers 10-11 (#214)

1 parent c9f4d6d

File tree

1 file changed: +33 / -8 lines


README.md (33 additions & 8 deletions)
```diff
@@ -15,21 +15,21 @@ This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Per

 ## News

-🔥🔥🔥 [2025/10/03] Featured papers:
+🔥🔥🔥 [2025/10/11] Featured papers:

-- 🔥🔥 [BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software](https://arxiv.org/abs/2509.25248) from Arizona State University.
+- 🔥🔥 [EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models](https://arxiv.org/abs/2510.03760) from City University of Hong Kong.

-- 🔥🔥 [Devstral: Fine-tuning Language Models for Coding Agent Applications](https://arxiv.org/abs/2509.25193) from Mistral AI.
+- 🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.

-- 🔥🔥 [LLaDA-MoE: A Sparse MoE Diffusion Language Model](https://arxiv.org/abs/2509.24389) from Ant Group.
+- 🔥 [BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software](https://arxiv.org/abs/2509.25248) from Arizona State University.

-- 🔥🔥 [Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents](https://arxiv.org/abs/2509.23045) from Moonshot AI.
+- 🔥 [Devstral: Fine-tuning Language Models for Coding Agent Applications](https://arxiv.org/abs/2509.25193) from Mistral AI.

-- 🔥🔥 [ML2B: Multi-Lingual ML Benchmark For AutoML](https://arxiv.org/abs/2509.22768) from HSE University.
+- 🔥 [LLaDA-MoE: A Sparse MoE Diffusion Language Model](https://arxiv.org/abs/2509.24389) from Ant Group.

-- 🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.
+- 🔥 [Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents](https://arxiv.org/abs/2509.23045) from Moonshot AI.

-- 🔥 [SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?](https://arxiv.org/abs/2509.16941) from Scale AI.
+- 🔥 [ML2B: Multi-Lingual ML Benchmark For AutoML](https://arxiv.org/abs/2509.22768) from HSE University.

 🔥🔥     [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!
```

```diff
@@ -517,6 +517,10 @@ These models are Transformer encoders, decoders, and encoder-decoders pretrained

 25. **Seed-Coder**: "Seed-Coder: Let the Code Model Curate Data for Itself" [2025-06] [[paper](https://arxiv.org/abs/2506.03524)]

+26. **CWM**: "CWM: An Open-Weights LLM for Research on Code Generation with World Models" [2025-09] [[paper](https://arxiv.org/abs/2510.02387)]
+
+27. **Mellum**: "Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding" [2025-10] [[paper](https://arxiv.org/abs/2510.05788)]
+
 #### Encoder-Decoder

 1. **PyMT5** (Span Corruption): "PyMT5: multi-mode translation of natural language and Python code with transformers" [2020-10] [EMNLP 2020] [[paper](https://arxiv.org/abs/2010.03150)]
```

```diff
@@ -557,6 +561,8 @@ These models are Transformer encoders, decoders, and encoder-decoders pretrained

 3. "Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation" [2025-09] [[paper](https://arxiv.org/abs/2509.11252)]

+4. **CoDA**: "CoDA: Coding LM via Diffusion Adaptation" [2025-10] [[paper](https://arxiv.org/abs/2510.03270)]
+
 ### 2.4 (Instruction) Fine-Tuning on Code

 These models apply Instruction Fine-Tuning techniques to enhance the capacities of Code LLMs.
```

```diff
@@ -933,6 +939,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

 - "L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution" [2025-03] [[paper](https://arxiv.org/abs/2503.22832)]

+- "PLSemanticsBench: Large Language Models As Programming Language Interpreters" [2025-10] [[paper](https://arxiv.org/abs/2510.03415)]
+
+- "Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models" [2025-10] [[paper](https://arxiv.org/abs/2510.07892)]
+
 ### 3.3 Code Agents

 1. **Self-collaboration**: "Self-collaboration Code Generation via ChatGPT" [2023-04] [[paper](https://arxiv.org/abs/2304.07590)]
```

```diff
@@ -1095,6 +1105,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

 80. **Kimi-Dev**: "Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents" [2025-09] [[paper](https://arxiv.org/abs/2509.23045)]

+81. **VeriGuard**: "VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation" [2025-10] [[paper](https://arxiv.org/abs/2510.05156)]
+
 ### 3.4 Interactive Coding

 - "Interactive Program Synthesis" [2017-03] [[paper](https://arxiv.org/abs/1703.03539)]
```

```diff
@@ -1509,6 +1521,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

 - "CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling" [2025-10] [[paper](https://arxiv.org/abs/2510.00501)]

+- [**CUDA**] "EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models" [2025-10] [[paper](https://arxiv.org/abs/2510.03760)]
+
 ## 5. Methods/Models for Downstream Tasks

 For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer based methods (e.g. BERT, GPT, T5).
```

```diff
@@ -1711,6 +1725,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "Impact-driven Context Filtering For Cross-file Code Completion" [2025-08] [[paper](https://arxiv.org/abs/2508.05970)]

+- "Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches" [2025-10] [[paper](https://arxiv.org/abs/2510.04905)]
+
 ### Code Ranking

 - "Fault-Aware Neural Code Rankers" [2022-06] [NeurIPS 2022] [[paper](https://arxiv.org/abs/2206.03865)]
```

```diff
@@ -2417,6 +2433,10 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "ML2B: Multi-Lingual ML Benchmark For AutoML" [2025-09] [[paper](https://arxiv.org/abs/2509.22768)]

+- "RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback" [2025-10] [[paper](https://arxiv.org/abs/2510.06186)]
+
+- "AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents" [2025-10] [[paper](https://arxiv.org/abs/2510.08511)]
+
 ### Text-To-SQL

 - "PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models" [2021-09] [EMNLP 2021] [[paper](https://arxiv.org/abs/2109.05093)]
```

```diff
@@ -3233,6 +3253,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "Improving Code Localization with Repository Memory" [2025-10] [[paper](https://arxiv.org/abs/2510.01003)]

+- "Vul-R2: A Reasoning LLM for Automated Vulnerability Repair" [2025-10] [[paper](https://arxiv.org/abs/2510.05480)]
+
 ### Malicious Code Detection

 - "I-MAD: Interpretable Malware Detector Using Galaxy Transformer", 2019-09, Comput. Secur. 2021, [[paper](https://arxiv.org/abs/1909.06865)]
```

```diff
@@ -3579,6 +3601,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "Regression Language Models for Code" [2025-09] [[paper](https://arxiv.org/abs/2509.26476)]

+- "When Names Disappear: Revealing What LLMs Actually Understand About Code" [2025-10] [[paper](https://arxiv.org/abs/2510.03178)]
+
 ### Software Modeling

 - "Towards using Few-Shot Prompt Learning for Automating Model Completion" [2022-12] [[paper](https://arxiv.org/abs/2212.03404)]
```

```diff
@@ -4547,6 +4571,7 @@ $^\diamond$ Machine/human prompts

 | 2025-05 | arXiv | BiomedSQL | 68,000 | | "BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases" [[paper](https://arxiv.org/abs/2505.20321)] [[data](https://github.com/NIH-CARD/biomedsql)] |
 | 2025-09 | arXiv | PARROT | 598 | | "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" [[paper](https://arxiv.org/abs/2509.23338)] [[data](https://github.com/weAIDB/PARROT)] |
 | 2025-09 | arXiv | MultiSpider 2.0 | 5056 | | "Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents" [[paper](https://arxiv.org/abs/2509.24405)] [[data](https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL)] |
+| 2025-10 | arXiv | BIRD-INTERACT | 600 | | "BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions" [[paper](https://arxiv.org/abs/2510.05318)] [[data](https://bird-interact.github.io/)] |

 #### Code Translation
```
