README.md: 27 additions & 10 deletions
@@ -15,29 +15,29 @@ This is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Per

 ## News

-🔥🔥🔥 [2025/09/12] Featured papers:
+🔥🔥🔥 [2025/09/22] Featured papers:

-- 🔥🔥 [LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering](https://arxiv.org/abs/2509.09614) from Salesforce AI Research.
+- 🔥🔥 [CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects](https://arxiv.org/abs/2509.14856) from Ant Group.

-- 🔥🔥 [Astra: A Multi-Agent System for GPU Kernel Performance Optimization](https://arxiv.org/abs/2509.07506) from Stanford University.
+- 🔥🔥 [SWE-QA: Can Language Models Answer Repository-level Code Questions?](https://arxiv.org/abs/2509.14635) from Shanghai Jiao Tong University.

-- 🔥🔥 [GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion](https://arxiv.org/abs/2509.05980) from Zhejiang University.
+- 🔥 [LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering](https://arxiv.org/abs/2509.09614) from Salesforce AI Research.

-- 🔥 [LongCat-Flash Technical Report](https://arxiv.org/abs/2509.01322) from Meituan.
+- 🔥 [Astra: A Multi-Agent System for GPU Kernel Performance Optimization](https://arxiv.org/abs/2509.07506) from Stanford University.

-- 🔥 [Towards Better Correctness and Efficiency in Code Generation](https://arxiv.org/abs/2508.20124) from Qwen Team.
+- 🔥 [GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion](https://arxiv.org/abs/2509.05980) from Zhejiang University.

 🔥🔥🔥 [2025/08/24] 29 papers from ICML 2025 have been added. Search for the keyword "ICML 2025"!

 🔥🔥 [2025/08/15] 80 papers from ACL 2025 have been added. Search for the keyword "ACL 2025"!

 🔥 [2024/09/06] **Our survey has been accepted for publication by [Transactions on Machine Learning Research (TMLR)](https://jmlr.org/tmlr/).**

-🔥🔥🔥 [2025/06/25] News from Codefuse
+🔥🔥🔥 [2025/09/22] News from Codefuse

-- [GALLa: Graph Aligned Large Language Models](https://arxiv.org/abs/2409.04183) is accepted by the ACL 2025 main conference. [[repo](https://github.com/codefuse-ai/GALLa)]
+- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is accepted to NeurIPS 2025. CGM currently ranks 1st among open-source models on the [SWE-Bench leaderboard](https://www.swebench.com/). [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]

-- [CGM (Code Graph Model)](https://arxiv.org/abs/2505.16901) is released, **currently ranking 1st among open-source models on the [SWE-Bench leaderboard](https://www.swebench.com/)**. [[repo](https://github.com/codefuse-ai/CodeFuse-CGM)]
+- [GALLa: Graph Aligned Large Language Models](https://arxiv.org/abs/2409.04183) is accepted by the ACL 2025 main conference. [[repo](https://github.com/codefuse-ai/GALLa)]
@@ -553,6 +553,8 @@ These models are Transformer encoders, decoders, and encoder-decoders pretrained

 2. **Dream-Coder**: "Dream-Coder 7B: An Open Diffusion Language Model for Code" [2025-09][[paper](https://arxiv.org/abs/2509.01142)]

+3. "Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation" [2025-09][[paper](https://arxiv.org/abs/2509.11252)]
+
 ### 2.4 (Instruction) Fine-Tuning on Code

 These models apply Instruction Fine-Tuning techniques to enhance the capacities of Code LLMs.
@@ -687,6 +689,10 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

 65. **SCoder**: "SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs" [2025-09][[paper](https://arxiv.org/abs/2509.07858)]

+66. "Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models" [2025-09][[paper](https://arxiv.org/abs/2509.11686)]
+
+67. "SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems" [2025-09][[paper](https://arxiv.org/abs/2509.14281)]
+
 ### 2.5 Reinforcement Learning on Code

 1. **CompCoder**: "Compilable Neural Code Generation with Compiler Feedback" [2022-03][ACL 2022][[paper](https://arxiv.org/abs/2203.05132)]
@@ -753,6 +759,8 @@ These models apply Instruction Fine-Tuning techniques to enhance the capacities

 32. "Towards Better Correctness and Efficiency in Code Generation" [2025-08][[paper](https://arxiv.org/abs/2508.20124)]

+33. "Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization" [2025-09][[paper](https://arxiv.org/abs/2509.12434)]
+
 ## 3. When Coding Meets Reasoning

 ### 3.1 Coding for Reasoning
@@ -1993,7 +2001,7 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging" [2025-05][[paper](https://arxiv.org/abs/2505.15356)]

-- "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models" [2025-05][EASE, June 2025][[paper](https://arxiv.org/abs/2505.02931)]
+- "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models" [2025-05][EASE, June 2025][[paper](https://arxiv.org/abs/2505.02931)]

 - "Adversarial Reasoning for Repair Based on Inferred Program Intent" [2025-05][ISSTA 2025][[paper](https://arxiv.org/abs/2505.13008)]
@@ -2251,6 +2259,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

+- "An Empirical Study on Failures in Automated Issue Solving" [2025-09][[paper](https://arxiv.org/abs/2509.13941)]
+
 ### Frontend Development

 - "Seeking the user interface", 2014-09, ASE 2014, [[paper](https://dl.acm.org/doi/10.1145/2642937.2642976)]
@@ -2647,6 +2657,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "Evaluating NL2SQL via SQL2NL" [2025-09][[paper](https://arxiv.org/abs/2509.04657)]

+- "DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction" [2025-09][[paper](https://arxiv.org/abs/2509.14507)]
+
 ### Program Proof

 - "Baldur: Whole-Proof Generation and Repair with Large Language Models" [2023-03][FSE 2023][[paper](https://arxiv.org/abs/2303.04910)]
@@ -3403,6 +3415,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects" [2025-07][[paper](https://arxiv.org/abs/2507.19271)]

+- "CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects" [2025-09][[paper](https://arxiv.org/abs/2509.14856)]
+
 ### Log Analysis

 - "LogStamp: Automatic Online Log Parsing Based on Sequence Labelling" [2022-08][[paper](https://arxiv.org/abs/2208.10282)]
@@ -3851,6 +3865,8 @@ For each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF

 - "When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions" [2025-07][[paper](https://arxiv.org/abs/2507.20439)]

+- "Prompt Stability in Code LLMs: Measuring Sensitivity across Emotion- and Personality-Driven Variations" [2025-09][[paper](https://arxiv.org/abs/2509.13680)]
+
 ### Interpretability

 - "A Critical Study of What Code-LLMs (Do Not) Learn" [2024-06][ACL 2024 Findings][[paper](https://arxiv.org/abs/2406.11930)]