
Commit a9b5046

committed
Adding part 3
1 parent 7fb94fa commit a9b5046


1 file changed

+109
-21
lines changed

readme.md

Lines changed: 109 additions & 21 deletions
@@ -33,10 +33,10 @@ You will implement several key functions for each part:
3333

3434
### 📌 Grading criteria:
3535

36-
- **Parts 1 & 2**: Autograded. Full credit if you pass all tests within 30 minutes of runtime.
36+
- **Parts 1 & 2**: Autograded. Full credit if you pass all tests within 10 minutes of runtime.
3737
(Hint: the reference solution runs most tasks in <1s, hardest ones in <10s.)
38-
- **Part 3**: Graded manually. Your LLM must solve at least 60% of test cases.
39-
Upload your llm_synthesis_report.json with all prompts/responses—it’s your proof of work.
38+
- **Part 3**: Autograded. Your LLM must solve at least 80% of test cases.
39+
Upload your llm_synthesis_report.json with all prompts/responses.
4040

4141
For Gradescope submission, zip the following 6 (or 7) files:
4242
- `strings.py`
@@ -45,7 +45,7 @@ For Gradescope submission, zip the following 6 (or 7) files:
4545
- `string_synthesizer.py`
4646
- `llm_string_synthesizer.py`
4747
- `llm_synthesis_report.json`
48-
- (Optional) `readme.md` — for notes, acknowledgements, and AI/collaboration credits.
48+
- (Optional) `readme.md` — for notes, acknowledgements, and AI/collaboration credits, specified below.
4949

5050
### 🤝 Collaboration Policy
5151

@@ -77,7 +77,7 @@ Excessive usage will be monitored, and we may revoke keys if abused.
7777

7878
### 📚 Reference
7979

80-
The design of the synthesizer and the Shape DSL is adapted from [PSET1](https://people.csail.mit.edu/asolar/SynthesisCourse/Assignment1.htm) in MIT’s [Introduction to Program Synthesis](https://people.csail.mit.edu/asolar/SynthesisCourse/index.htm), taught by [Prof. Armando Solar-Lezama](https://people.csail.mit.edu/asolar/).
80+
The design of the synthesizer and the Shape DSL is adapted from [pset1](https://people.csail.mit.edu/asolar/SynthesisCourse/Assignment1.htm) in MIT’s [Introduction to Program Synthesis](https://people.csail.mit.edu/asolar/SynthesisCourse/index.htm), taught by [Prof. Armando Solar-Lezama](https://people.csail.mit.edu/asolar/).
8181

8282
# 🚀 Part 0: Setting Up
8383

@@ -205,7 +205,7 @@ def grow(
205205
> - Progress tracking:
206206
> When you start generating large numbers of programs, visualization helps. Use `tqdm` to show a progress bar and keep your sanity.
207207
208-
### 🔨 Part 1(b). Eliminating (Observationally) Equivalent Shapes
208+
### 🔨 Part 1(b). Eliminating Observationally Equivalent Shapes
209209

210210
Now that you can **grow** shapes, the next challenge is to keep your search space from exploding.
211211
For this, we’ll turn to the more general `BottomUpSynthesizer` (in `enumerative_synthesis.py`) and implement a pruning step: **eliminating observationally equivalent programs**.
@@ -238,10 +238,7 @@ def eliminate_equivalents(
238238
> 💡 Hints & Tips
239239
> - Use the provided `compute_signature()` method (already implemented) to evaluate programs and produce signatures. These signatures will be your deduplication keys.
240240
> - Keep track of which signatures you’ve already seen using `Set` or `Dict`.
241-
> Be careful: different programs may map to the *same* signature—yield only the first and discard the rest.
242241
> - **Important**: use `yield` instead of returning a list. This way, the synthesizer can stop early if it finds a successful program before exhausting the search space.
243-
> - The `cache` is your friend: store previously computed outputs there to save time when the same program shows up again.
244-
245242
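The hints above boil down to a generator that remembers which signatures it has already seen. The sketch below is a minimal illustration, not the reference solution: `dedupe_by_signature` and `signature_of` are hypothetical names, and plain Python functions stand in for DSL programs. In the real synthesizer, the provided `compute_signature()` would play the role of `signature_of`.

```python
from typing import Callable, Hashable, Iterable, Iterator, Set, TypeVar

P = TypeVar("P")


def dedupe_by_signature(
    programs: Iterable[P],
    signature_of: Callable[[P], Hashable],
) -> Iterator[P]:
    """Yield only the first program seen for each distinct signature."""
    seen: Set[Hashable] = set()
    for program in programs:
        sig = signature_of(program)
        if sig in seen:
            continue  # observationally equivalent to an earlier program
        seen.add(sig)
        yield program  # yielded lazily, so the caller can stop early


# Toy stand-ins for programs: functions from int to int.
test_inputs = [0, 1, 2]
candidates = [
    lambda x: x + x,
    lambda x: 2 * x,  # same outputs as x + x on every test input
    lambda x: x * x,  # differs from x + x at x == 1
]

unique = list(
    dedupe_by_signature(
        candidates,
        lambda f: tuple(f(x) for x in test_inputs),
    )
)
print(len(unique))
```

Because the signature is a tuple of outputs, it is hashable and can be used directly as a set member or dictionary key.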
246243
### 🔨 Part 1(c). Bottom-up Synthesizing Shapes
247244

@@ -280,15 +277,9 @@ If no solution is found after `max_iterations`, you may `raise ValueError` (whic
280277
> * Manage your **cache** carefully. Without caching signatures, performance will tank as you repeatedly re-evaluate the same programs.
281278
> * When calling `eliminate_equivalents`, remember that the **test inputs** (`test_inputs`) are just the $(x, y)$ coordinates from the examples.
282279
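One way to read the two hints above: reduce the examples to their inputs once, and memoize each program's signature so it is evaluated at most once. The sketch assumes each example is an `(x, y, label)` tuple and that programs are hashable. `cached_signature`, `stats`, and `unit_disk` are hypothetical stand-ins, not the provided API.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Program = Callable[[float, float], bool]  # stand-in for a DSL shape

# Assumed example format: (x, y, expected_inside).
examples = [(0.0, 0.0, True), (2.0, 0.0, False), (0.5, 0.5, True)]
test_inputs: List[Point] = [(x, y) for x, y, _ in examples]

cache: Dict[Program, Tuple[bool, ...]] = {}
stats = {"evaluations": 0}  # counts real evaluations, for illustration only


def cached_signature(program: Program) -> Tuple[bool, ...]:
    """Evaluate `program` on every test input, at most once per program."""
    if program not in cache:
        stats["evaluations"] += 1
        cache[program] = tuple(program(x, y) for x, y in test_inputs)
    return cache[program]


def unit_disk(x: float, y: float) -> bool:
    return x * x + y * y <= 1.0


first = cached_signature(unit_disk)
second = cached_signature(unit_disk)  # served from the cache
expected = tuple(label for _, _, label in examples)
print(first == expected, stats["evaluations"])
```

A program whose signature equals the tuple of expected labels satisfies every example, so the same cached value can serve both deduplication and the success check.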
283-
### 🎁 Wrapping Up Part 1
284-
285-
Once you’ve implemented all three core functions:
286-
287-
* `grow()` in `shape_synthesizer.py`
288-
* `eliminate_equivalents()` in `enumerative_synthesis.py`
289-
* `synthesize()` in `enumerative_synthesis.py`
280+
### ⚡ Expected Outcome
290281

291-
it’s time to test your synthesizer:
282+
Once you’ve implemented all three core functions, it’s time to test your synthesizer:
292283

293284
```bash
294285
python test_part1.py
@@ -297,7 +288,7 @@ python test_part1.py
297288
If all goes well, you’ll see your synthesizer solving the test cases one by one.
298289
The next goal is to make it faster.
299290

300-
Make sure your synthesizer can pass all test cases within 30 minutes (paralleled) on GradeScope.
291+
Make sure your synthesizer can pass all test cases within **10 minutes** (parallelized) on Gradescope.
301292
For reference, our solution takes about **30 seconds** to pass all test cases (sequentially) on a MacBook Pro with an Apple M1 Pro chip and Python 3.13.0:
302293

303294
```bash
@@ -415,6 +406,15 @@ Design new operations and add them as classes under `strings.py`.
415406
Now look at the `grow()` function in `StringSynthesizer` (`string_synthesizer.py`).
416407
This should work just like Part 1, except with string operations.
417408

409+
``` python
410+
# TODO
411+
def grow(
412+
self,
413+
program_list: List[StringExpression],
414+
examples: List[Tuple[str, str]]
415+
) -> List[StringExpression]:
416+
```
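
As a rough illustration of what growing string programs can look like, the sketch below applies every operation of a toy DSL to the existing programs. The classes `Input`, `Upper`, `Prefix`, and `Concat` are hypothetical, the signature is simplified (no `self`, no `examples`), `common_indices` is shortened to three values, and returning the old programs together with the new ones is an assumption rather than a requirement stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Input:
    def run(self, s: str) -> str:
        return s


@dataclass(frozen=True)
class Upper:
    arg: object

    def run(self, s: str) -> str:
        return self.arg.run(s).upper()


@dataclass(frozen=True)
class Prefix:
    arg: object
    end: int  # extra integer argument, drawn from common_indices

    def run(self, s: str) -> str:
        return self.arg.run(s)[: self.end]


@dataclass(frozen=True)
class Concat:
    left: object
    right: object

    def run(self, s: str) -> str:
        return self.left.run(s) + self.right.run(s)


common_indices = [0, 1, 2]


def grow(program_list: List[object]) -> List[object]:
    """Return the old programs plus one more layer of operations."""
    new_programs: List[object] = []
    for p in program_list:
        new_programs.append(Upper(p))
        for i in common_indices:
            new_programs.append(Prefix(p, i))
    for left in program_list:
        for right in program_list:
            new_programs.append(Concat(left, right))
    return program_list + new_programs


grown = grow([Input()])
print(len(grown))
```

Binary operations such as `Concat` grow quadratically in the size of `program_list`, which is why pruning equivalent programs between rounds matters.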
417+
418418
> **Hints:**
419419
>
420420
> * For operations that require extra integer arguments (e.g., substring indices), you can use the provided `common_indices = [0, 1, 2, ...]` as candidate constants.
@@ -445,9 +445,7 @@ Executed in 102.80 secs fish external
445445
sys time 1.25 secs 757.00 micros 1.25 secs
446446
```
447447

448-
✅ You will receive full credit if your synthesizer finishes all test cases within **30 minutes (parallelized)**.
449-
450-
---
448+
✅ You will receive full credit if your synthesizer finishes all test cases within **10 minutes (parallelized)**.
451449

452450
### 🚨 Note on Hard Test Cases
453451

@@ -460,3 +458,93 @@ python test_part2.py --hard
460458
```
461459

462460
Feel free to try, but beware of the combinatorial explosion.
461+
462+
# 🤖 Part 3: LLM Synthesis for Strings
463+
464+
We’ve now reached the final stage. In the age of **foundation models**, why not invite an LLM to help synthesize string expressions in your very own DSL?
465+
An LLM can sidestep two limitations of bottom-up synthesis: the combinatorial explosion of the search space and the need for predefined terminals.
466+
467+
In this part, you will implement two functions inside `llm_string_synthesizer.py`:
468+
469+
* **`generate_prompt(examples: List[Tuple[str, str]])`**
470+
Generate a prompt string that instructs the LLM to synthesize a program in your DSL.
471+
472+
* **`extract_program(response: str)`**
473+
Parse the LLM’s output back into a program, represented as a Python `StringExpression` object.
474+
475+
Once implemented, test your solution by running:
476+
477+
```bash
478+
python test_part3.py
479+
```
480+
481+
Expect every test case to fail until you have implemented both functions.
482+
483+
### Part 3(a): Prompting the LLM
484+
485+
Your `generate_prompt(examples)` function should carefully craft a **prompt** for the LLM.
486+
At a minimum, the prompt should include:
487+
488+
* Clear **instructions** for generating programs, including the required response format.
489+
* A **description of your DSL**, including every operation, its syntax, and its semantics (what it does).
490+
* A nicely formatted list of **input-output examples** for the LLM to learn from.
491+
492+
The return value should be a single string, which will be passed directly to the LLM.
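
The three ingredients listed above can be assembled with ordinary string formatting. The following is a minimal sketch, written as a standalone function rather than a method. The DSL summary in `DSL_DESCRIPTION` is a hypothetical placeholder, and the wording of the instructions is only one possible choice.

```python
from typing import List, Tuple

# Hypothetical DSL summary; replace it with the operations you actually defined.
DSL_DESCRIPTION = """\
Input()          -> the input string
Upper(e)         -> e converted to upper case
Concat(e1, e2)   -> e1 followed by e2
"""


def generate_prompt(examples: List[Tuple[str, str]]) -> str:
    """Build one prompt string: instructions, DSL description, examples."""
    example_lines = "\n".join(
        f"{i}. {inp!r} -> {out!r}"
        for i, (inp, out) in enumerate(examples, start=1)
    )
    return (
        "Write one program in the DSL below that maps every input "
        "to its output.\n"
        "Respond with a single line containing only the program.\n\n"
        f"DSL operations:\n{DSL_DESCRIPTION}\n"
        f"Examples:\n{example_lines}\n"
    )


prompt = generate_prompt([("hello", "HELLO"), ("world", "WORLD")])
print(prompt)
```

Using `repr` (`!r`) for the example strings keeps leading or trailing whitespace and empty strings visible to the model.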
493+
494+
### Part 3(b): Parsing LLM Outputs
495+
496+
The LLM’s response needs to be turned back into a valid DSL program.
497+
498+
* Implement this logic in `extract_program(response: str)`.
499+
* Depending on your prompt design, parsing may involve:
500+
* Direct evaluation of the response (e.g., with Python’s `eval`)
501+
* Or custom parsing if the output is more free-form
502+
* If the response cannot be parsed, raise an error. Otherwise, return a concrete `StringExpression` object.
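
If the prompt asks for a constructor-style expression, one alternative to calling `eval` on model output is to parse it with Python’s `ast` module and build objects only from an allow-list of classes. The sketch below takes that approach. `Input`, `Upper`, and `Concat` are hypothetical DSL classes, and the error type and cleanup rules are assumptions.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Input:
    pass


@dataclass(frozen=True)
class Upper:
    arg: object


@dataclass(frozen=True)
class Concat:
    left: object
    right: object


# Only these constructors may appear in a response.
ALLOWED = {"Input": Input, "Upper": Upper, "Concat": Concat}


def _build(node: ast.AST) -> object:
    """Recursively turn an AST node into a DSL object or a literal."""
    if isinstance(node, ast.Constant) and type(node.value) in (str, int):
        return node.value
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in ALLOWED
        and not node.keywords
    ):
        args = [_build(arg) for arg in node.args]
        try:
            return ALLOWED[node.func.id](*args)
        except TypeError as exc:  # wrong number of arguments
            raise ValueError(f"bad arguments for {node.func.id}") from exc
    raise ValueError("response contains an unsupported construct")


def extract_program(response: str) -> object:
    """Parse a one-line response such as 'Upper(Input())'."""
    text = response.strip().strip("`").strip()  # drop stray backticks
    try:
        tree = ast.parse(text, mode="eval")
    except SyntaxError as exc:
        raise ValueError("response is not a valid expression") from exc
    return _build(tree.body)


program = extract_program("`Concat(Input(), Upper(Input()))`")
print(program)
```

Anything outside the allow-list, such as a call to `__import__`, is rejected with a `ValueError` instead of being executed.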
503+
504+
### ⚡ Expected Outcome
505+
506+
When you run:
507+
508+
```bash
509+
python test_part3.py
510+
```
511+
512+
You should see output like the following:
513+
514+
```
515+
================================================================================
516+
PART 3: LLM-BASED STRING SYNTHESIS TESTING
517+
================================================================================
518+
519+
============================================================
520+
Testing: test_formal_greeting
521+
============================================================
522+
Examples:
523+
1. 'hello' -> 'HELLO'
524+
2. 'world' -> 'WORLD'
525+
3. 'python' -> 'PYTHON'
526+
4. 'synthesis' -> 'SYNTHESIS'
527+
5. 'programming' -> 'PROGRAMMING'
528+
529+
Running LLM synthesis...
530+
Synthesized program: <HIDDEN>
531+
532+
Verification:
533+
1. ✓ 'hello' -> 'HELLO' (expected: 'HELLO')
534+
2. ✓ 'world' -> 'WORLD' (expected: 'WORLD')
535+
3. ✓ 'python' -> 'PYTHON' (expected: 'PYTHON')
536+
4. ✓ 'synthesis' -> 'SYNTHESIS' (expected: 'SYNTHESIS')
537+
5. ✓ 'programming' -> 'PROGRAMMING' (expected: 'PROGRAMMING')
538+
🎉 SUCCESS: Program works correctly on all examples!
539+
```
540+
541+
As a by-product, a **`.json` report** will be generated in your working directory.
542+
Do not modify this file manually — just submit it along with your code.
543+
544+
### 🎯 Grading Criteria
545+
546+
* You will receive **full credit** if Gemini 2.5 Pro, using your prompt, can pass at least **60% of the test cases** (there are a total of 53).
547+
* As a reference, our sample solution achieves about **95% success rate**.
548+
549+
✨ That’s it! You’ve completed the full cycle: from **bottom-up synthesis** (Parts 1 & 2) to **LLM-assisted synthesis** (Part 3). Congratulations!
550+
Please zip the relevant files and submit your assignment on Gradescope!
