README.md: 51 additions & 6 deletions
@@ -19,17 +19,17 @@ This is the repo for our TMLR [code LLM survey](https://arxiv.org/abs/2311.07989
 ## News
 
-🔥🔥🔥 [2025/10/23] Featured papers:
+🔥🔥🔥 [2025/10/30] Featured papers:
 
-- 🔥🔥 [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://arxiv.org/abs/2510.18855) from Ant Group.
+- 🔥🔥 [VisCoder2: Building Multi-Language Visualization Coding Agents](https://arxiv.org/abs/2510.23642) from University of Waterloo.
 
-- 🔥🔥 [TritonRL: Training LLMs to Think and Code Triton Without Cheating](https://arxiv.org/abs/2510.17891) from Carnegie Mellon University.
+- 🔥🔥 [JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence](https://arxiv.org/abs/2510.23538) from The University of Hong Kong.
 
-- 🔥 [LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?](https://arxiv.org/abs/2510.09595) from University of Michigan.
+- 🔥🔥 [From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph](https://arxiv.org/abs/2510.19873) from Chinese Academy of Sciences.
 
-- 🔥 [Scaling Laws for Code: A More Data-Hungry Regime](https://arxiv.org/abs/2510.08702) from Harbin Institute of Technology.
+- 🔥 [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://arxiv.org/abs/2510.18855) from Ant Group.
 
-- 🔥 [BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution](https://arxiv.org/abs/2510.08697) from Monash University.
+- 🔥 [TritonRL: Training LLMs to Think and Code Triton Without Cheating](https://arxiv.org/abs/2510.17891) from Carnegie Mellon University.
 
 🔥🔥 [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!
@@ -711,6 +711,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
+69. **JanusCoder**: "JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence" [2025-10][[paper](https://arxiv.org/abs/2510.23538)]
+
+70. **VisCoder2**: "VisCoder2: Building Multi-Language Visualization Coding Agents" [2025-10][[paper](https://arxiv.org/abs/2510.23642)]
+
 ### 2.5 Reinforcement Learning on Code
 
 1. **CompCoder**: "Compilable Neural Code Generation with Compiler Feedback" [2022-03][ACL 2022][[paper](https://arxiv.org/abs/2203.05132)]
@@ -785,6 +789,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
 36. **CodeRL+**: "CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment" [2025-10][[paper](https://arxiv.org/abs/2510.18471)]
 
+37. "GAPO: Group Adaptive Policy Optimization for Real-World Code Edit" [2025-10][[paper](https://arxiv.org/abs/2510.21830)]
+
+38. **AesCoder**: "Code Aesthetics with Agentic Reward Feedback" [2025-10][[paper](https://arxiv.org/abs/2510.23272)]
+
 ## 3. When Coding Meets Reasoning
 
 ### 3.1 Coding for Reasoning
@@ -919,6 +927,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
 65. "On Code-Induced Reasoning in LLMs" [2025-09][[paper](https://arxiv.org/abs/2509.21499)]
 
+66. **PIPS**: "Once Upon an Input: Reasoning via Per-Instance Program Synthesis" [2025-10][[paper](https://arxiv.org/abs/2510.22849)]
+
 ### 3.2 Code Simulation
 
 - "Code Simulation Challenges for Large Language Models" [2024-01][[paper](https://arxiv.org/abs/2401.09074)]
@@ -1119,6 +1129,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
+83. **TOM-SWE**: "TOM-SWE: User Mental Modeling For Software Engineering Agents" [2025-10][[paper](https://arxiv.org/abs/2510.21903)]
+
+84. **SwiftSolve**: "SwiftSolve: A Self-Iterative, Complexity-Aware Multi-Agent Framework for Competitive Programming" [2025-10][[paper](https://arxiv.org/abs/2510.22626)]
+
 ### 3.4 Interactive Coding
 
 - "Interactive Program Synthesis" [2017-03][[paper](https://arxiv.org/abs/1703.03539)]
@@ -1543,6 +1557,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
 - [**Triton**] "TritonRL: Training LLMs to Think and Code Triton Without Cheating" [2025-10][[paper](https://arxiv.org/abs/2510.17891)]
 
+- [**CUDA**] "From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph" [2025-10][[paper](https://arxiv.org/abs/2510.19873)]
+
 ## 5. Methods/Models for Downstream Tasks
 
 For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer based methods (e.g. BERT, GPT, T5).
@@ -1749,6 +1765,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches" [2025-10][[paper](https://arxiv.org/abs/2510.04905)]
 
+- "Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets" [2025-10][[paper](https://arxiv.org/abs/2510.20609)]
@@ -1951,6 +1969,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "HGAdapter: Hypergraph-based Adapters in Language Models for Code Summarization and Clone Detection" [2025-10][[paper](https://arxiv.org/abs/2510.17591)]
 
+- "CodeWiki: Automated Repository-Level Documentation at Scale" [2025-10][[paper](https://arxiv.org/abs/2510.24428)]
+
 ### Program Repair
 
 - "CURE: Code-Aware Neural Machine Translation for Automatic Program Repair" [2021-02][ICSE 2021][[paper](https://arxiv.org/abs/2103.00073)]
@@ -2161,6 +2181,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Efficient Code Embeddings from Code Generation Models" [2025-08][[paper](https://arxiv.org/abs/2508.21290)]
 - "An Empirical Study on the Code Refactoring Capability of Large Language Models" [2024-11][[paper](https://arxiv.org/abs/2411.02320)]
@@ -2343,6 +2365,12 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "An Empirical Study on Failures in Automated Issue Solving" [2025-09][[paper](https://arxiv.org/abs/2509.13941)]
 
+- "BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills" [2025-10][[paper](https://arxiv.org/abs/2510.19898)]
+
- "BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills" [2025-10][[paper](https://arxiv.org/abs/2510.19898)]
2371
+
2372
+
- "Scalable Supervising Software Agents with Patch Reasoner" [2025-10][[paper](https://arxiv.org/abs/2510.22775)]
2373
+
2346
2374
### Frontend Development
2347
2375
2348
2376
- "Seeking the user interface", 2014-09, ASE 2014, [[paper](https://dl.acm.org/doi/10.1145/2642937.2642976)]
@@ -2759,6 +2787,12 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL" [2025-10][[paper](https://arxiv.org/abs/2510.14296)]
 
+- "Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks" [2025-10][[paper](https://arxiv.org/abs/2510.24102)]
+
+- "DCMM-SQL: Automated Data-Centric Pipeline and Multi-Model Collaboration Training for Text-to-SQL Model" [2025-10][[paper](https://arxiv.org/abs/2510.23284)]
+
+- "MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL" [2025-10][[paper](https://arxiv.org/abs/2510.25510)]
+
 ### Program Proof
 
 - "Baldur: Whole-Proof Generation and Repair with Large Language Models" [2023-03][FSE 2023][[paper](https://arxiv.org/abs/2303.04910)]
@@ -2969,6 +3003,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models" [2025-09][[paper](https://arxiv.org/abs/2509.23812)]
 
+- "LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation" [2025-10][[paper](https://arxiv.org/abs/2510.22210)]
+
 ### Oracle Generation
 
 - "Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers" [2020-09][[paper](https://arxiv.org/abs/2009.05634)]
@@ -3585,6 +3621,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations" [2025-01][[paper](https://arxiv.org/abs/2501.16495)]
 
+- "CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs" [2025-10][[paper](https://arxiv.org/abs/2510.22986)]
+
 ### Software Configuration
 
 - "Configuration Validation with Large Language Models" [2023-10][[paper](https://arxiv.org/abs/2310.09690)]
@@ -3617,6 +3655,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
- "Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents" [2025-10][[paper](https://arxiv.org/abs/2510.25694)]
3659
+
3620
3660
### Code QA & Reasoning
3621
3661
3622
3662
- "DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production" [2024-12][[paper](https://arxiv.org/abs/2412.08069)]
@@ -4341,6 +4381,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "Intuition to Evidence: Measuring AI's True Impact on Developer Productivity" [2025-09][[paper](https://arxiv.org/abs/2509.19708)]
 
+- "Does In-IDE Calibration of Large Language Models work at Scale?" [2025-10][[paper](https://arxiv.org/abs/2510.22614)]
+
 ## 8. Datasets
 
 ### 8.1 Pretraining
@@ -4445,6 +4487,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
 - "How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective" [2025-10][[paper](https://arxiv.org/abs/2510.08720)]
 
+- "MATCH: Task-Driven Code Evaluation through Contrastive Learning" [2025-10][[paper](https://arxiv.org/abs/2510.23169)]
+
 #### Program Synthesis
 
 | Date | Venue | Benchmark | Size | Language | Source |
 | 2025-09 | arXiv | PARROT | 598 || "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" [[paper](https://arxiv.org/abs/2509.23338)][[data](https://github.com/weAIDB/PARROT)] |
 | 2025-09 | arXiv | MultiSpider 2.0 | 5056 || "Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents" [[paper](https://arxiv.org/abs/2509.24405)][[data](https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL)] |
 | 2025-10 | arXiv | BIRD-INTERACT | 600 || "BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions" [[paper](https://arxiv.org/abs/2510.05318)][[data](https://bird-interact.github.io/)] |
+| 2025-10 | arXiv | Falcon | 600 || "Falcon: A Comprehensive Chinese Text-to-SQL Benchmark for Enterprise-Grade Evaluation" [[paper](https://arxiv.org/abs/2510.24762)][[data](https://github.com/eosphoros-ai/Falcon)] |