Skip to content

Commit 72af38a

Browse files
latest papers 09-22 (#211)
1 parent 25388a3 commit 72af38a

File tree

1 file changed

+27
-10
lines changed

1 file changed

+27
-10
lines changed

README.md

Lines changed: 27 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -15,29 +15,29 @@ This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Per
1515

1616
## News
1717

18-
🔥🔥🔥 [2025/09/12] Featured papers:
18+
🔥🔥🔥 [2025/09/22] Featured papers:
1919

20-
- 🔥🔥 [LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering](https://arxiv.org/abs/2509.09614) from Salesforce AI Research.
20+
- 🔥🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.
2121

22-
- 🔥🔥 [Astra: A Multi-Agent System for GPU Kernel Performance Optimization](https://arxiv.org/abs/2509.07506) from Stanford University.
22+
- 🔥🔥 [SWE-QA: Can Language Models Answer Repository-level Code Questions?](https://arxiv.org/abs/2509.14635) from Shanghai Jiao Tong University.
2323

24-
- 🔥🔥 [GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion](https://arxiv.org/abs/2509.05980) from Zhejiang University.
24+
- 🔥 [LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering](https://arxiv.org/abs/2509.09614) from Salesforce AI Research.
2525

26-
- 🔥 [LongCat-Flash Technical Report](https://arxiv.org/abs/2509.01322) from Meituan.
26+
- 🔥 [Astra: A Multi-Agent System for GPU Kernel Performance Optimization](https://arxiv.org/abs/2509.07506) from Stanford University.
2727

28-
- 🔥 [Towards Better Correctness and Efficiency in Code Generation](https://arxiv.org/abs/2508.20124) from Qwen Team.
28+
- 🔥 [GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion](https://arxiv.org/abs/2509.05980) from Zhejiang University.
2929

3030
🔥🔥🔥 [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!
3131

3232
🔥🔥     [2025/08/15] 80 papers from ACL 2025 have been added. Search for the keyword "ACL 2025"!
3333

3434
🔥         [2024/09/06] **Our survey has been accepted for publication by [Transactions on Machine Learning Research (TMLR)](https://jmlr.org/tmlr/).**
3535

36-
🔥🔥🔥 [2025/06/25] News from Codefuse
36+
🔥🔥🔥 [2025/09/22] News from Codefuse
3737

38-
- [GALLa: Graph Aligned Large Language Models](https://arxiv.org/abs/2409.04183) is accepted by ACL 2025 main conference. [[repo](https://github.com/codefuse-ai/GALLa)]
38+
- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is accepted to NeurIPS 2025. CGM currently ranks 1st among open-source models on [SWE-Bench leaderboard](https://www.swebench.com/). [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]
3939

40-
- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is released, **currently ranking 1st among open-source models on [SWE-Bench leaderboard](https://www.swebench.com/)**. [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]
40+
- [GALLa: Graph Aligned Large Language Models](https://arxiv.org/abs/2409.04183) is accepted by ACL 2025 main conference. [[repo](https://github.com/codefuse-ai/GALLa)]
4141

4242
<p align='center'>
4343
<img src='imgs/swe-leaderboard.png' style='width: 90%; '>
@@ -553,6 +553,8 @@ These models are Transformer encoders, decoders, and encoder-decoders pretrained
553553

554554
2. **Dream-Coder**: "Dream-Coder 7B: An Open Diffusion Language Model for Code" [2025-09] [[paper](https://arxiv.org/abs/2509.01142)]
555555

556+
3. "Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation" [2025-09] [[paper](https://arxiv.org/abs/2509.11252)]
557+
556558
### 2.4 (Instruction) Fine-Tuning on Code
557559

558560
These models apply Instruction Fine-Tuning techniques to enhance the capacities of Code LLMs.
@@ -687,6 +689,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
687689

688690
65. **SCoder**: "SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs" [2025-09] [[paper](https://arxiv.org/abs/2509.07858)]
689691

692+
66. "Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models" [2025-09] [[paper](https://arxiv.org/abs/2509.11686)]
693+
694+
67. "SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems" [2025-09] [[paper](https://arxiv.org/abs/2509.14281)]
695+
690696
### 2.5 Reinforcement Learning on Code
691697

692698
1. **CompCoder**: "Compilable Neural Code Generation with Compiler Feedback" [2022-03] [ACL 2022] [[paper](https://arxiv.org/abs/2203.05132)]
@@ -753,6 +759,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities
753759

754760
32. "Towards Better Correctness and Efficiency in Code Generation" [2025-08] [[paper](https://arxiv.org/abs/2508.20124)]
755761

762+
33. "Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization" [2025-09] [[paper](https://arxiv.org/abs/2509.12434)]
763+
756764
## 3. When Coding Meets Reasoning
757765

758766
### 3.1 Coding for Reasoning
@@ -1993,7 +2001,7 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
19932001

19942002
- "NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging" [2025-05] [[paper](https://arxiv.org/abs/2505.15356)]
19952003

1996-
- "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models" [2025-05] [EASE, June 2025] [[paper](https://arxiv.org/abs/2505.02931)]
2004+
- "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models" [2025-05] [EASE, June 2025] [[paper](https://arxiv.org/abs/2505.02931)]
19972005

19982006
- "Adversarial Reasoning for Repair Based on Inferred Program Intent" [2025-05] [ISSTA 2025] [[paper](https://arxiv.org/abs/2505.13008)]
19992007

@@ -2251,6 +2259,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
22512259

22522260
- "Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning" [2025-08] [[paper](https://arxiv.org/abs/2508.03501)]
22532261

2262+
- "An Empirical Study on Failures in Automated Issue Solving" [2025-09] [[paper](https://arxiv.org/abs/2509.13941)]
2263+
22542264
### Frontend Development
22552265

22562266
- "Seeking the user interface", 2014-09, ASE 2014, [[paper](https://dl.acm.org/doi/10.1145/2642937.2642976)]
@@ -2647,6 +2657,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
26472657

26482658
- "Evaluating NL2SQL via SQL2NL" [2025-09] [[paper](https://arxiv.org/abs/2509.04657)]
26492659

2660+
- "DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction" [2025-09] [[paper](https://arxiv.org/abs/2509.14507)]
2661+
26502662
### Program Proof
26512663

26522664
- "Baldur: Whole-Proof Generation and Repair with Large Language Models" [2023-03] [FSE 2023] [[paper](https://arxiv.org/abs/2303.04910)]
@@ -3403,6 +3415,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
34033415

34043416
- "Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects" [2025-07] [[paper](https://arxiv.org/abs/2507.19271)]
34053417

3418+
- "CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects" [2025-09] [[paper](https://arxiv.org/abs/2509.14856)]
3419+
34063420
### Log Analysis
34073421

34083422
- "LogStamp: Automatic Online Log Parsing Based on Sequence Labelling" [2022-08] [[paper](https://arxiv.org/abs/2208.10282)]
@@ -3851,6 +3865,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF
38513865

38523866
- "When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions" [2025-07] [[paper](https://arxiv.org/abs/2507.20439)]
38533867

3868+
- "Prompt Stability in Code LLMs: Measuring Sensitivity across Emotion- and Personality-Driven Variations" [2025-09] [[paper](https://arxiv.org/abs/2509.13680)]
3869+
38543870
### Interpretability
38553871

38563872
- "A Critical Study of What Code-LLMs (Do Not) Learn" [2024-06] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2406.11930)]
@@ -4424,6 +4440,7 @@ $^\diamond$ Machine/human prompts
44244440
| 2025-03 | ACL 2025 | LONGCODEU | 3983 | Python | "LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding" [[paper](https://arxiv.org/abs/2503.04359)] |
44254441
| 2025-05 | arXiv | CodeSense | 4495 | Python, C, Java | "CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning" [[paper](https://arxiv.org/abs/2506.00750)] [[data](https://codesense-bench.github.io/)] |
44264442
| 2025-07 | arXiv | CORE | 12,533 | C/C++, Java, Python | "CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks" [[paper](https://arxiv.org/abs/2507.05269)] [[data](https://corebench.github.io/)] |
4443+
| 2025-09 | arXiv | SWE-QA | 576 | Python | "SWE-QA: Can Language Models Answer Repository-level Code Questions?" [[paper](https://arxiv.org/abs/2509.14635)] [[data](https://github.com/peng-weihan/SWE-QA-Bench)] |
44274444

44284445
#### Text-to-SQL
44294446

0 commit comments

Comments
 (0)