This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code](https://arxiv.org/abs/2311.07989).
## News

🔥🔥🔥 [2025/10/11] Featured papers:

- 🔥🔥 [EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models](https://arxiv.org/abs/2510.03760) from City University of Hong Kong.

- 🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.

- 🔥 [BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software](https://arxiv.org/abs/2509.25248) from Arizona State University.

- 🔥 [Devstral: Fine-tuning Language Models for Coding Agent Applications](https://arxiv.org/abs/2509.25193) from Mistral AI.

- 🔥 [LLaDA-MoE: A Sparse MoE Diffusion Language Model](https://arxiv.org/abs/2509.24389) from Ant Group.

- 🔥 [Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents](https://arxiv.org/abs/2509.23045) from Moonshot AI.

- 🔥 [ML2B: Multi-Lingual ML Benchmark For AutoML](https://arxiv.org/abs/2509.22768) from HSE University.

🔥🔥 [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!
25. **Seed-Coder**: "Seed-Coder: Let the Code Model Curate Data for Itself" [2025-06][[paper](https://arxiv.org/abs/2506.03524)]
26. **CWM**: "CWM: An Open-Weights LLM for Research on Code Generation with World Models" [2025-09][[paper](https://arxiv.org/abs/2510.02387)]

1. **PyMT5** (Span Corruption): "PyMT5: multi-mode translation of natural language and Python code with transformers" [2020-10][EMNLP 2020][[paper](https://arxiv.org/abs/2010.03150)]
3. "Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation" [2025-09][[paper](https://arxiv.org/abs/2509.11252)]
4. **CoDA**: "CoDA: Coding LM via Diffusion Adaptation" [2025-10][[paper](https://arxiv.org/abs/2510.03270)]

### 2.4 (Instruction) Fine-Tuning on Code
These models apply Instruction Fine-Tuning techniques to enhance the capacities of Code LLMs.
- "L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution" [2025-03][[paper](https://arxiv.org/abs/2503.22832)]
- "PLSemanticsBench: Large Language Models As Programming Language Interpreters" [2025-10][[paper](https://arxiv.org/abs/2510.03415)]
- "Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models" [2025-10][[paper](https://arxiv.org/abs/2510.07892)]
### 3.3 Code Agents
1. **Self-collaboration**: "Self-collaboration Code Generation via ChatGPT" [2023-04][[paper](https://arxiv.org/abs/2304.07590)]
80. **Kimi-Dev**: "Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents" [2025-09][[paper](https://arxiv.org/abs/2509.23045)]
- "Interactive Program Synthesis" [2017-03][[paper](https://arxiv.org/abs/1703.03539)]
- "CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling" [2025-10][[paper](https://arxiv.org/abs/2510.00501)]
- [**CUDA**] "EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models" [2025-10][[paper](https://arxiv.org/abs/2510.03760)]

## 5. Methods/Models for Downstream Tasks
For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer based methods (e.g. BERT, GPT, T5).
- "Impact-driven Context Filtering For Cross-file Code Completion" [2025-08][[paper](https://arxiv.org/abs/2508.05970)]
- "Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches" [2025-10][[paper](https://arxiv.org/abs/2510.04905)]
- "ML2B: Multi-Lingual ML Benchmark For AutoML" [2025-09][[paper](https://arxiv.org/abs/2509.22768)]
- "RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback" [2025-10][[paper](https://arxiv.org/abs/2510.06186)]
- "AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents" [2025-10][[paper](https://arxiv.org/abs/2510.08511)]
### Text-To-SQL
- "PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models" [2021-09][EMNLP 2021][[paper](https://arxiv.org/abs/2109.05093)]
- "Improving Code Localization with Repository Memory" [2025-10][[paper](https://arxiv.org/abs/2510.01003)]
- "Vul-R2: A Reasoning LLM for Automated Vulnerability Repair" [2025-10][[paper](https://arxiv.org/abs/2510.05480)]
| 2025-05 | arXiv | BiomedSQL | 68,000 || "BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases" [[paper](https://arxiv.org/abs/2505.20321)][[data](https://github.com/NIH-CARD/biomedsql)]|
| 2025-09 | arXiv | PARROT | 598 || "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" [[paper](https://arxiv.org/abs/2509.23338)][[data](https://github.com/weAIDB/PARROT)]|
| 2025-09 | arXiv | MultiSpider 2.0 | 5056 || "Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents" [[paper](https://arxiv.org/abs/2509.24405)][[data](https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL)]|
| 2025-10 | arXiv | BIRD-INTERACT | 600 || "BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions" [[paper](https://arxiv.org/abs/2510.05318)][[data](https://bird-interact.github.io/)]|