@@ -191,6 +191,7 @@ Your task is to implement `grow()`, which:
 Once implemented, your `grow()` function will be the engine that drives bottom-up search over the DSL, which step by step builds increasingly complex shapes.

 ```python
+#TODO
 def grow(
     self,
     program_list: List[Shape],
@@ -210,13 +211,15 @@ Now that you can **grow** shapes, the next challenge is to keep your search spac
 For this, we’ll turn to the more general `BottomUpSynthesizer` (in `enumerative_synthesis.py`) and implement a pruning step: **eliminating observationally equivalent programs**.

 Two programs are **observationally equivalent** if they produce the **same outputs** on the **same inputs**. For example, look at these two programs:
-These two programs are *different syntactically* but *indistinguishable observationally* (their outputs match on all test points).
+* `Union(Circle(0,0,1), Circle(0,0,1))`
+* `Circle(0,0,1)`
+
+These two programs are *different syntactically* but *indistinguishable observationally* (their outputs match on all given test points).
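One way to picture this pruning step is to fingerprint each program by the tuple of outputs it produces on the test inputs, and to keep only the first program seen for each fingerprint. The sketch below is illustrative only: `dedupe_by_signature`, the `(name, predicate)` program representation, the `evaluate` callback, and the extra `RightHalfPlane` shape are assumptions made for this example, not the assignment's actual `eliminate_equivalents` signature.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")
Point = Tuple[float, float]


def dedupe_by_signature(
    programs: Iterable[T],
    test_inputs: Sequence[Point],
    evaluate: Callable[[T, Point], Any],
) -> Iterator[T]:
    """Yield one representative per distinct output signature (hypothetical helper)."""
    seen: Set[Tuple[Any, ...]] = set()
    for program in programs:
        # The signature is the program's observable behavior on the test inputs.
        signature = tuple(evaluate(program, point) for point in test_inputs)
        if signature not in seen:
            seen.add(signature)
            yield program


# Toy stand-in for a shape: a name plus a point-membership predicate.
def in_unit_circle(p: Point) -> bool:
    return p[0] ** 2 + p[1] ** 2 <= 1


programs = [
    ("Circle(0,0,1)", in_unit_circle),
    ("Union(Circle(0,0,1), Circle(0,0,1))",
     lambda p: in_unit_circle(p) or in_unit_circle(p)),
    ("RightHalfPlane", lambda p: p[0] >= 0),  # hypothetical extra shape
]
points = [(0.0, 0.0), (2.0, 0.0), (-0.5, 0.0)]

kept = list(dedupe_by_signature(programs, points, lambda prog, pt: prog[1](pt)))
kept_names = [name for name, _ in kept]
```

Because the helper is a generator, a caller can inspect each surviving program as soon as it is produced instead of waiting for the whole list.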

 Your job is to filter out duplicates like these so the synthesizer only keeps *unique behaviors*. Please implement the `eliminate_equivalents` function:

 ```python
+#TODO
 def eliminate_equivalents(
     self,
     program_list: List[T],
@@ -247,7 +250,12 @@ Now it’s time to put everything together! You’ve implemented **growing** sha
 Head to the function `synthesize()` in `BottomUpSynthesizer` under the file `enumerative_synthesis.py`:
@@ -260,17 +268,17 @@ The **bottom-up synthesis loop** works like this:

 * **Grow**: expand the program set one level deeper using `grow()`.
 * **Eliminate equivalents**: prune duplicates with `eliminate_equivalents()`.
-* **Check for success**: after pruning, see if any program satisfies all examples using `is_correct()`.
+* **Check for success**: while pruning (remember that we are using `yield`), see if any program satisfies all examples using `is_correct()`.
   * If yes → 🎉 return that program immediately.
   * Otherwise → continue expanding.

 If no solution is found after `max_iterations`, you may `raise ValueError` (which is already provided).
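The loop described above can be sketched on a toy integer DSL. Everything here is a stand-in chosen for illustration: the `(name, function)` program representation, the toy `grow` and `unique_by_outputs` helpers, and `synthesize_sketch` itself are assumptions, not the interfaces of `BottomUpSynthesizer`.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Tuple

Program = Tuple[str, Callable[[int], int]]  # (printable name, evaluator)
Example = Tuple[int, int]                   # (input, expected output)


def grow(programs: List[Program]) -> List[Program]:
    """Toy grow step: keep known programs and combine every pair with + and *."""
    grown = list(programs)
    for (sa, fa), (sb, fb) in product(programs, repeat=2):
        # Default arguments bind fa/fb now, avoiding late-binding closure bugs.
        grown.append((f"({sa} + {sb})", lambda x, fa=fa, fb=fb: fa(x) + fb(x)))
        grown.append((f"({sa} * {sb})", lambda x, fa=fa, fb=fb: fa(x) * fb(x)))
    return grown


def unique_by_outputs(programs: Iterable[Program], inputs: List[int]) -> Iterator[Program]:
    """Lazily yield the first program seen for each output signature."""
    seen = set()
    for name, fn in programs:
        signature = tuple(fn(x) for x in inputs)
        if signature not in seen:
            seen.add(signature)
            yield name, fn


def synthesize_sketch(examples: List[Example], max_iterations: int = 3) -> Program:
    inputs = [x for x, _ in examples]
    programs: List[Program] = [("x", lambda x: x), ("1", lambda x: 1)]  # terminals
    for _ in range(max_iterations):
        kept: List[Program] = []
        # Check each survivor as it is yielded, so success returns early.
        for program in unique_by_outputs(grow(programs), inputs):
            if all(program[1](x) == y for x, y in examples):
                return program
            kept.append(program)
        programs = kept
    raise ValueError("no program found within max_iterations")


name, fn = synthesize_sketch([(1, 3), (2, 5), (3, 7)])  # target behaves like 2x + 1
```

Checking correctness inside the loop over the generator is what lets the search stop in the middle of a pruning pass rather than after it.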

 > 💡 Hints & Tips
 >
-> * Use the helper functions! Don’t reinvent wheels—`generate_terminals`, `grow`, `eliminate_equivalents`, and `is_correct` are there to help.
-> * Manage your **cache** carefully—without caching signatures, performance will tank as you repeatedly re-evaluate the same programs.
-> * When calling `eliminate_equivalents`, remember that the **test inputs** are just the `(x, y)` coordinates from the examples.
+> * Use the helper functions! Don’t reinvent wheels. `generate_terminals`, `grow`, `eliminate_equivalents`, and `is_correct` are there to help.
+> * Manage your **cache** carefully. Without caching signatures, performance will tank as you repeatedly re-evaluate the same programs.
+> * When calling `eliminate_equivalents`, remember that the **test inputs** (`test_inputs`) are just the $(x, y)$ coordinates from the examples.

 ### 🎁 Wrapping Up Part 1
@@ -287,8 +295,10 @@ python test_part1.py
 ```

 If all goes well, you’ll see your synthesizer solving the test cases one by one.
+The next goal is to make it faster.

-For reference, our solution takes about **30 seconds** to pass all test cases on a MacBook Pro with an Apple M1 Pro chip and Python 3.13.0:
+Make sure your synthesizer can pass all test cases within 30 minutes (running in parallel) on Gradescope.
+For reference, our solution takes about **30 seconds** to pass all test cases (sequentially) on a MacBook Pro with an Apple M1 Pro chip and Python 3.13.0:
+* `__init__`: defines the *syntax* of the operation (what arguments it takes).
+* `interpret`: defines the *semantics* (how to evaluate the operation).
+* Implementing `__eq__`, `__hash__`, and `__str__` is highly recommended for debugging and deduplication.
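As a rough illustration of these methods, here is a minimal sketch of a few string operations. The class names (`Input`, `Literal`, `Concat`), the absence of a base class, and the `interpret(input_string)` signature are all assumptions for this example; the real classes in `strings.py` may be structured differently.

```python
class Input:
    """Leaf node: evaluates to the input string itself (hypothetical class)."""

    def interpret(self, input_string: str) -> str:
        return input_string

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Input)

    def __hash__(self) -> int:
        return hash("Input")

    def __str__(self) -> str:
        return "Input"


class Literal:
    """Leaf node: evaluates to a fixed string (hypothetical class)."""

    def __init__(self, value: str) -> None:
        self.value = value

    def interpret(self, input_string: str) -> str:
        return self.value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Literal) and self.value == other.value

    def __hash__(self) -> int:
        return hash(("Literal", self.value))

    def __str__(self) -> str:
        return f"Literal({self.value!r})"


class Concat:
    """Binary operation: concatenates the results of two sub-expressions."""

    def __init__(self, left, right) -> None:
        # Syntax: the operation takes two child expressions.
        self.left = left
        self.right = right

    def interpret(self, input_string: str) -> str:
        # Semantics: evaluate both children on the same input, then join.
        return self.left.interpret(input_string) + self.right.interpret(input_string)

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, Concat)
            and self.left == other.left
            and self.right == other.right
        )

    def __hash__(self) -> int:
        return hash(("Concat", self.left, self.right))

    def __str__(self) -> str:
        return f"Concat({self.left}, {self.right})"


program = Concat(Input(), Literal("!"))
result = program.interpret("hi")
```

Structural `__eq__` and `__hash__` let syntactically identical programs collapse in sets and dictionary keys, and `__str__` makes synthesized programs readable when debugging.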
+
+> 👉 Pro tip: don’t handwrite too much boilerplate; LLMs are great at generating these helper methods.
+
+### ✅ Testing Your Solution
+
+Before you begin, run the test script:
+
+```bash
+python test_part2.py
+```
+
+You’ll see all test cases fail initially:
+
+```
+test_get_parent_directory ✗ FAIL
+test_extract_directory_path ✗ FAIL
+test_normalize_path_separators ✗ FAIL
+
+Summary:
+Total test cases: 25
+Synthesis succeeded: 0
+Fully correct: 0
+Success rate: 0.0%
+Accuracy rate: 0.0%
+```
+
+Your goal: design your DSL and synthesizer so these tests pass!
+
+### 🎯 Your Task
+
+1. **Extend the DSL** with new operations (Part 2a).
+2. **Implement the grow function** so the synthesizer can explore programs using your DSL (Part 2b).
+
+### Part 2(a): Creating Your Own DSL
+
+Design new operations and add them as classes under `strings.py`.
+
+> **Hints:**
+>
+> * Look at common string functions in Python/Java/C++ for inspiration (e.g., `substring`, `replace`, `find`).
+> * Check the test cases in `test_part2.py` and ask yourself: *What minimal set of operations can solve all of them?*
+> * Start small! Tackle easy test cases first, then add operators as needed.
+
+### Part 2(b): Growing String Expressions
+
+Now look at the `grow()` function in `StringSynthesizer` (`string_synthesizer.py`).
+This should work just like Part 1, except with string operations.
+
+> **Hints:**
+>
+> * For operations that require extra integer arguments (e.g., substring indices), you can use the provided `common_indices = [0, 1, 2, ...]` as candidate constants.
+> * **Pruning is essential**. Without pruning, your search will explode. Examples:
+>   * For `substring(str, start, end)`, skip invalid cases like `start > end`.
+>   * When using `StringLiteral`, only allow literals that actually appear in one of the example outputs.
+> * The more carefully you prune, the faster your synthesizer will run.
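The two pruning hints above can be sketched as small filters applied before any new expression is built. The helper names `valid_index_pairs` and `useful_literals` are hypothetical, and the sketch assumes that `common_indices` holds non-negative integers and that candidate literals are supplied by the caller.

```python
from typing import List, Sequence, Tuple


def valid_index_pairs(common_indices: Sequence[int]) -> List[Tuple[int, int]]:
    """Keep only (start, end) pairs with start <= end (hypothetical helper)."""
    return [
        (start, end)
        for start in common_indices
        for end in common_indices
        if start <= end  # skip invalid cases such as start > end
    ]


def useful_literals(candidates: Sequence[str], example_outputs: Sequence[str]) -> List[str]:
    """Keep only literals that occur in at least one expected output."""
    return [
        literal
        for literal in candidates
        if any(literal in output for output in example_outputs)
    ]


pairs = valid_index_pairs([0, 1, 2])
literals = useful_literals(["/", "-", ".txt"], ["usr/local", "home/user"])
```

With three candidate indices, the filter keeps 6 of the 9 possible pairs, and the saving compounds at every level of the search because each discarded pair would otherwise seed further expressions.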
+
+### ⚡ Expected Outcome
+
+When everything is working, your synthesizer should solve **all 25 provided test cases** in `test_part2.py`.
+
+For reference, our solution takes about **100 seconds** sequentially on a MacBook Pro (M1 Pro, Python 3.13.0).
+✅ You will receive full credit if your synthesizer finishes all test cases within **30 minutes (parallelized)**.
+
+---
+
+### 🚨 Note on Hard Test Cases
+
+There are additional **hard test cases** (see `get_hard_test_cases`) that we do not expect your DSL + synthesizer to solve (at least not without exponential blowup).
+
+You can run them with:
+
+```bash
+python test_part2.py --hard
+```

-# Part 3: LLM Synthesis for Strings
+Feel free to try, but beware of the combinatorial explosion.