Commit f7f9fc9

latest papers 10-30 (#219)
1 parent 8fc2008 commit f7f9fc9

1 file changed

+51
-6
lines changed

README.md

Lines changed: 51 additions & 6 deletions
@@ -19,17 +19,17 @@ This is the repo for our TMLR [code LLM survey](https://arxiv.org/abs/2311.07989
1919

2020
## News
2121

22-
🔥🔥🔥 [2025/10/23] Featured papers:
22+
🔥🔥🔥 [2025/10/30] Featured papers:
2323

24-
- 🔥🔥 [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://arxiv.org/abs/2510.18855) from Ant Group.
24+
- 🔥🔥 [VisCoder2: Building Multi-Language Visualization Coding Agents](https://arxiv.org/abs/2510.23642) from University of Waterloo.
2525

26-
- 🔥🔥 [TritonRL: Training LLMs to Think and Code Triton Without Cheating](https://arxiv.org/abs/2510.17891) from Carnegie Mellon University.
26+
- 🔥🔥 [JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence](https://arxiv.org/abs/2510.23538) from The University of Hong Kong.
2727

28-
- 🔥 [LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?](https://arxiv.org/abs/2510.09595) from University of Michigan.
28+
- 🔥🔥 [From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph](https://arxiv.org/abs/2510.19873) from Chinese Academy of Sciences.
2929

30-
- 🔥 [Scaling Laws for Code: A More Data-Hungry Regime](https://arxiv.org/abs/2510.08702) from Harbin Institute of Technology.
30+
- 🔥 [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://arxiv.org/abs/2510.18855) from Ant Group.
3131

32-
- 🔥 [BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution](https://arxiv.org/abs/2510.08697) from Monash University.
32+
- 🔥 [TritonRL: Training LLMs to Think and Code Triton Without Cheating](https://arxiv.org/abs/2510.17891) from Carnegie Mellon University.
3333

3434
🔥🔥     [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!
3535

@@ -711,6 +711,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
711711

712712
68. "Verification Limits Code LLM Training" [2025-09] [[paper](https://arxiv.org/abs/2509.20837)]
713713

714+
69. **JanusCoder**: "JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence" [2025-10] [[paper](https://arxiv.org/abs/2510.23538)]
715+
716+
70. **VisCoder2**: "VisCoder2: Building Multi-Language Visualization Coding Agents" [2025-10] [[paper](https://arxiv.org/abs/2510.23642)]
717+
714718
### 2.5 Reinforcement Learning on Code
715719

716720
1. **CompCoder**: "Compilable Neural Code Generation with Compiler Feedback" [2022-03] [ACL 2022] [[paper](https://arxiv.org/abs/2203.05132)]
@@ -785,6 +789,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
785789

786790
36. **CodeRL+**: "CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment" [2025-10] [[paper](https://arxiv.org/abs/2510.18471)]
787791

792+
37. "GAPO: Group Adaptive Policy Optimization for Real-World Code Edit" [2025-10] [[paper](https://arxiv.org/abs/2510.21830)]
793+
794+
38. **AesCoder**: "Code Aesthetics with Agentic Reward Feedback" [2025-10] [[paper](https://arxiv.org/abs/2510.23272)]
795+
788796
## 3. When Coding Meets Reasoning
789797

790798
### 3.1 Coding for Reasoning
@@ -919,6 +927,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
919927

920928
65. "On Code-Induced Reasoning in LLMs" [2025-09] [[paper](https://arxiv.org/abs/2509.21499)]
921929

930+
66. **PIPS**: "Once Upon an Input: Reasoning via Per-Instance Program Synthesis" [2025-10] [[paper](https://arxiv.org/abs/2510.22849)]
931+
922932
### 3.2 Code Simulation
923933

924934
- "Code Simulation Challenges for Large Language Models" [2024-01] [[paper](https://arxiv.org/abs/2401.09074)]
@@ -1119,6 +1129,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
11191129

11201130
82. **KAT-Coder**: "KAT-Coder Technical Report" [2025-10] [[paper](https://arxiv.org/abs/2510.18779)]
11211131

1132+
83. **TOM-SWE**: "TOM-SWE: User Mental Modeling For Software Engineering Agents" [2025-10] [[paper](https://arxiv.org/abs/2510.21903)]
1133+
1134+
84. **SwiftSolve**: "SwiftSolve: A Self-Iterative, Complexity-Aware Multi-Agent Framework for Competitive Programming" [2025-10] [[paper](https://arxiv.org/abs/2510.22626)]
1135+
11221136
### 3.4 Interactive Coding
11231137

11241138
- "Interactive Program Synthesis" [2017-03] [[paper](https://arxiv.org/abs/1703.03539)]
@@ -1543,6 +1557,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
15431557

15441558
- [**Triton**] "TritonRL: Training LLMs to Think and Code Triton Without Cheating" [2025-10] [[paper](https://arxiv.org/abs/2510.17891)]
15451559

1560+
- [**CUDA**] "From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph" [2025-10] [[paper](https://arxiv.org/abs/2510.19873)]
1561+
15461562
## 5. Methods/Models for Downstream Tasks
15471563

15481564
For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer-based methods (e.g. BERT, GPT, T5).
@@ -1749,6 +1765,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
17491765

17501766
- "Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches" [2025-10] [[paper](https://arxiv.org/abs/2510.04905)]
17511767

1768+
- "Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets" [2025-10] [[paper](https://arxiv.org/abs/2510.20609)]
1769+
17521770
### Code Ranking
17531771

17541772
- "Fault-Aware Neural Code Rankers" [2022-06] [NeurIPS 2022] [[paper](https://arxiv.org/abs/2206.03865)]
@@ -1951,6 +1969,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
19511969

19521970
- "HGAdapter: Hypergraph-based Adapters in Language Models for Code Summarization and Clone Detection" [2025-10] [[paper](https://arxiv.org/abs/2510.17591)]
19531971

1972+
- "CodeWiki: Automated Repository-Level Documentation at Scale" [2025-10] [[paper](https://arxiv.org/abs/2510.24428)]
1973+
19541974
### Program Repair
19551975

19561976
- "CURE: Code-Aware Neural Machine Translation for Automatic Program Repair" [2021-02] [ICSE 2021] [[paper](https://arxiv.org/abs/2103.00073)]
@@ -2161,6 +2181,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
21612181

21622182
- "Efficient Code Embeddings from Code Generation Models" [2025-08] [[paper](https://arxiv.org/abs/2508.21290)]
21632183

2184+
- "Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification" [2025-10] [[paper](https://arxiv.org/abs/2510.24749)]
2185+
21642186
### Code Refactoring and Migration
21652187

21662188
- "An Empirical Study on the Code Refactoring Capability of Large Language Models" [2024-11] [[paper](https://arxiv.org/abs/2411.02320)]
@@ -2343,6 +2365,12 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
23432365

23442366
- "An Empirical Study on Failures in Automated Issue Solving" [2025-09] [[paper](https://arxiv.org/abs/2509.13941)]
23452367

2368+
- "BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills" [2025-10] [[paper](https://arxiv.org/abs/2510.19898)]
2369+
2372+
- "Scalable Supervising Software Agents with Patch Reasoner" [2025-10] [[paper](https://arxiv.org/abs/2510.22775)]
2373+
23462374
### Frontend Development
23472375

23482376
- "Seeking the user interface", 2014-09, ASE 2014, [[paper](https://dl.acm.org/doi/10.1145/2642937.2642976)]
@@ -2759,6 +2787,12 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
27592787

27602788
- "Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL" [2025-10] [[paper](https://arxiv.org/abs/2510.14296)]
27612789

2790+
- "Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks" [2025-10] [[paper](https://arxiv.org/abs/2510.24102)]
2791+
2792+
- "DCMM-SQL: Automated Data-Centric Pipeline and Multi-Model Collaboration Training for Text-to-SQL Model" [2025-10] [[paper](https://arxiv.org/abs/2510.23284)]
2793+
2794+
- "MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL" [2025-10] [[paper](https://arxiv.org/abs/2510.25510)]
2795+
27622796
### Program Proof
27632797

27642798
- "Baldur: Whole-Proof Generation and Repair with Large Language Models" [2023-03] [FSE 2023] [[paper](https://arxiv.org/abs/2303.04910)]
@@ -2969,6 +3003,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
29693003

29703004
- "Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models" [2025-09] [[paper](https://arxiv.org/abs/2509.23812)]
29713005

3006+
- "LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation" [2025-10] [[paper](https://arxiv.org/abs/2510.22210)]
3007+
29723008
### Oracle Generation
29733009

29743010
- "Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers" [2020-09] [[paper](https://arxiv.org/abs/2009.05634)]
@@ -3585,6 +3621,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
35853621

35863622
- "Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations" [2025-01] [[paper](https://arxiv.org/abs/2501.16495)]
35873623

3624+
- "CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs" [2025-10] [[paper](https://arxiv.org/abs/2510.22986)]
3625+
35883626
### Software Configuration
35893627

35903628
- "Configuration Validation with Large Language Models" [2023-10] [[paper](https://arxiv.org/abs/2310.09690)]
@@ -3617,6 +3655,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
36173655

36183656
- "BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software" [2025-09] [[paper](https://arxiv.org/abs/2509.25248)]
36193657

3658+
- "Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents" [2025-10] [[paper](https://arxiv.org/abs/2510.25694)]
3659+
36203660
### Code QA & Reasoning
36213661

36223662
- "DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production" [2024-12] [[paper](https://arxiv.org/abs/2412.08069)]
@@ -4341,6 +4381,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
43414381

43424382
- "Intuition to Evidence: Measuring AI's True Impact on Developer Productivity" [2025-09] [[paper](https://arxiv.org/abs/2509.19708)]
43434383

4384+
- "Does In-IDE Calibration of Large Language Models work at Scale?" [2025-10] [[paper](https://arxiv.org/abs/2510.22614)]
4385+
43444386
## 8. Datasets
43454387

43464388
### 8.1 Pretraining
@@ -4445,6 +4487,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
44454487

44464488
- "How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective" [2025-10] [[paper](https://arxiv.org/abs/2510.08720)]
44474489

4490+
- "MATCH: Task-Driven Code Evaluation through Contrastive Learning" [2025-10] [[paper](https://arxiv.org/abs/2510.23169)]
4491+
44484492
#### Program Synthesis
44494493

44504494
| Date | Venue | Benchmark | Size | Language | Source |
@@ -4629,6 +4673,7 @@ $^\diamond$ Machine/human prompts
46294673
| 2025-09 | arXiv | PARROT | 598 | | "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" [[paper](https://arxiv.org/abs/2509.23338)] [[data](https://github.com/weAIDB/PARROT)] |
46304674
| 2025-09 | arXiv | MultiSpider 2.0 | 5056 | | "Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents" [[paper](https://arxiv.org/abs/2509.24405)] [[data](https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL)] |
46314675
| 2025-10 | arXiv | BIRD-INTERACT | 600 | | "BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions" [[paper](https://arxiv.org/abs/2510.05318)] [[data](https://bird-interact.github.io/)] |
4676+
| 2025-10 | arXiv | Falcon | 600 | | "Falcon: A Comprehensive Chinese Text-to-SQL Benchmark for Enterprise-Grade Evaluation" [[paper](https://arxiv.org/abs/2510.24762)] [[data](https://github.com/eosphoros-ai/Falcon)] |
46324677

46334678
#### Code Translation
46344679
