Commit 379640d: latest papers 09-26 (#212)
1 parent 72af38a

1 file changed: README.md (+31 -12 lines)
@@ -15,17 +15,13 @@ This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Per

## News

-🔥🔥🔥 [2025/09/22] Featured papers:
+🔥🔥🔥 [2025/09/26] Featured papers:

- 🔥🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.

-- 🔥🔥 [SWE-QA: Can Language Models Answer Repository-level Code Questions?](https://arxiv.org/abs/2509.14635) from Shanghai Jiao Tong University.
+- 🔥🔥 [SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?](https://arxiv.org/abs/2509.16941) from Scale AI.

-- 🔥 [LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering](https://arxiv.org/abs/2509.09614) from Salesforce AI Research.
-
-- 🔥 [Astra: A Multi-Agent System for GPU Kernel Performance Optimization](https://arxiv.org/abs/2509.07506) from Stanford University.
-
-- 🔥 [GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion](https://arxiv.org/abs/2509.05980) from Zhejiang University.
+- 🔥 [SWE-QA: Can Language Models Answer Repository-level Code Questions?](https://arxiv.org/abs/2509.14635) from Shanghai Jiao Tong University.

🔥🔥🔥 [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!

@@ -35,14 +31,10 @@ This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Per

🔥🔥🔥 [2025/09/22] News from Codefuse

-- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is accepted to NeurIPS 2025. CGM currently ranks 1st among open-source models on [SWE-Bench leaderboard](https://www.swebench.com/). [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]
+- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is accepted to NeurIPS 2025. CGM currently ranks 1st among open-weight models on [SWE-Bench-Lite leaderboard](https://www.swebench.com/). [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]

- [GALLa: Graph Aligned Large Language Models](https://arxiv.org/abs/2409.04183) is accepted by ACL 2025 main conference. [[repo](https://github.com/codefuse-ai/GALLa)]

-<p align='center'>
-<img src='imgs/swe-leaderboard.png' style='width: 90%; '>
-</p>
-
#### How to Contribute

If you find a paper to be missing from this repository, misplaced in a category, or lacking a reference to its journal/conference information, please do not hesitate to create an issue.
@@ -693,6 +685,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

67. "SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems" [2025-09] [[paper](https://arxiv.org/abs/2509.14281)]

+68. "Verification Limits Code LLM Training" [2025-09] [[paper](https://arxiv.org/abs/2509.20837)]
+
### 2.5 Reinforcement Learning on Code

1. **CompCoder**: "Compilable Neural Code Generation with Compiler Feedback" [2022-03] [ACL 2022] [[paper](https://arxiv.org/abs/2203.05132)]
@@ -761,6 +755,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

33. "Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization" [2025-09] [[paper](https://arxiv.org/abs/2509.12434)]

+34. "DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?" [2025-09] [[paper](https://arxiv.org/abs/2509.21016)]
+
## 3. When Coding Meets Reasoning

### 3.1 Coding for Reasoning
@@ -1077,6 +1073,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

76. "GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging" [2025-08] [[paper](https://arxiv.org/abs/2508.18993)]

+77. **MapCoder-Lite**: "MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM" [2025-09] [[paper](https://arxiv.org/abs/2509.17489)]
+
### 3.4 Interactive Coding

- "Interactive Program Synthesis" [2017-03] [[paper](https://arxiv.org/abs/1703.03539)]
@@ -1185,6 +1183,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

- "CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance" [2025-07] [[paper](https://arxiv.org/abs/2507.10646)]

+- "SR-Eval: Evaluating LLMs on Code Generation under Stepwise Requirement Refinement" [2025-09] [[paper](https://arxiv.org/abs/2509.18808)]
+
### 3.5 Frontend Navigation

- "MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding" [2021-10] [ACL 2022] [[paper](https://arxiv.org/abs/2110.08518)]
@@ -1295,6 +1295,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

- "UI-Venus Technical Report: Building High-performance UI Agents with RFT" [2025-08] [[paper](https://arxiv.org/abs/2508.10833)]

+- "Mano Report" [2025-09] [[paper](https://arxiv.org/abs/2509.17336)]
+
## 4. Code LLM for Low-Resource, Low-Level, and Domain-Specific Languages

- [**Ruby**] "On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages" [2022-04] [ICPC 2022] [[paper](https://arxiv.org/abs/2204.09653)]
@@ -1483,6 +1485,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

- [**CUDA**] "Astra: A Multi-Agent System for GPU Kernel Performance Optimization" [2025-09] [[paper](https://arxiv.org/abs/2509.07506)]

+- [**LaTeX**] "Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models" [2025-09] [[paper](https://arxiv.org/abs/2509.17589)]
+
## 5. Methods/Models for Downstream Tasks

For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer based methods (e.g. BERT, GPT, T5).
@@ -2225,6 +2229,10 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion" [2025-09] [[paper](https://arxiv.org/abs/2509.05980)]

+- "CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion" [2025-09] [[paper](https://arxiv.org/abs/2509.16112)]
+
+- "RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation" [2025-09] [[paper](https://arxiv.org/abs/2509.16198)]
+
### Issue Resolution

- "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" [2023-10] [ICLR 2024] [[paper](https://arxiv.org/abs/2310.06770)]
@@ -3183,6 +3191,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "Code-SPA: Style Preference Alignment to Large Language Models for Effective and Robust Code Debugging" [2025-07] [ACL 2025 Findings] [[paper](https://aclanthology.org/2025.findings-acl.912/)]

+- "LLaVul: A Multimodal LLM for Interpretable Vulnerability Reasoning about Source Code" [2025-09] [[paper](https://arxiv.org/abs/2509.17337)]
+
### Malicious Code Detection

- "I-MAD: Interpretable Malware Detector Using Galaxy Transformer", 2019-09, Comput. Secur. 2021, [[paper](https://arxiv.org/abs/1909.06865)]
@@ -3337,6 +3347,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "Evaluating Generated Commit Messages with Large Language Models" [2025-07] [[paper](https://arxiv.org/abs/2507.10906)]

+- "CoRaCMG: Contextual Retrieval-Augmented Framework for Commit Message Generation" [2025-09] [[paper](https://arxiv.org/abs/2509.18337)]
+
### Code Review

- "Using Pre-Trained Models to Boost Code Review Automation" [2022-01] [ICSE 2022] [[paper](https://arxiv.org/abs/2201.06850)]
@@ -3417,6 +3429,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects" [2025-09] [[paper](https://arxiv.org/abs/2509.14856)]

+- "Fine-Tuning LLMs to Analyze Multiple Dimensions of Code Review: A Maximum Entropy Regulated Long Chain-of-Thought Approach" [2025-09] [[paper](https://arxiv.org/abs/2509.21170)]
+
### Log Analysis

- "LogStamp: Automatic Online Log Parsing Based on Sequence Labelling" [2022-08] [[paper](https://arxiv.org/abs/2208.10282)]
@@ -3707,6 +3721,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code" [2025-08] [[paper](https://arxiv.org/abs/2508.18106)]

+- "Localizing Malicious Outputs from CodeLLM" [2025-09] [[paper](https://arxiv.org/abs/2509.17070)]
+
### Correctness

- "An Empirical Evaluation of GitHub Copilot's Code Suggestions" [2022-05] [MSR 2022] [[paper](https://ieeexplore.ieee.org/document/9796235)]
@@ -4201,6 +4217,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

- "ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming" [2025-05] [ACL 2025] [[paper](https://arxiv.org/abs/2505.16667)]

+- "Intuition to Evidence: Measuring AI's True Impact on Developer Productivity" [2025-09] [[paper](https://arxiv.org/abs/2509.19708)]
+
## 8. Datasets

### 8.1 Pretraining
@@ -4692,6 +4710,7 @@ $^\diamond$ Machine/human prompts

| 2025-07 | arXiv | LiveRepoReflection | 1888 | C++, Go, Java, JS, Python, Rust | "Turning the Tide: Repository-based Code Reflection" [[paper](https://arxiv.org/abs/2507.09866)] |
| 2025-07 | arXiv | SWE-Perf | 140 | Python | "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?" [2025-07] [[paper](https://arxiv.org/abs/2507.12415)] [[data](https://github.com/swe-perf/swe-perf)] |
| 2025-09 | arXiv | RepoDebug | 30696 | 8 | "RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models" [[paper](https://arxiv.org/abs/2509.04078)] |
+| 2025-09 | arXiv | SWE-Bench Pro | 1865 | Python, Go, JS, TS | "SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?" [[paper](https://arxiv.org/abs/2509.16941)] [[data](https://github.com/scaleapi/SWE-bench_Pro-os)] |

\*Line Completion/API Invocation Completion/Function Completion