Commit b56ee97

Merge pull request #10 from VectorlyApp/alex
Resolve some issues
2 parents 7002ab6 + 5420f9c commit b56ee97

21 files changed (+591, -296 lines changed)

.github/workflows/tests.yml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# .github/workflows/tests.yml
+
+name: Linter and Tests
+
+on:
+  push:
+    branches: [main]
+  pull_request: # catches PRs from feature branches → main
+    types: [opened, synchronize, reopened]
+
+permissions:
+  contents: read # fetch code
+  id-token: write # enable OIDC if we ever need cloud creds
+  # pull-requests: write # only if you want to post PR comments
+
+jobs:
+  lintAndTest:
+    runs-on: "ubuntu-latest"
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+        with:
+          version: "latest"
+
+      - name: Cache uv dependencies
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/uv
+          key: ${{ runner.os }}-uv-${{ hashFiles('pyproject.toml') }}
+
+      - name: Install dependencies
+        run: uv sync
+
+      - name: Lint
+        run: uv run pylint $(git ls-files '*.py')
+
+      #- name: Run tests
+      #  run: uv run pytest tests/ -v

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -211,4 +211,6 @@ __marimo__/
 
 # output directories
 cdp_captures/
+cdp_captures*/
 routine_discovery_output/
+routine_discovery_output*/
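
The two added patterns end in `*/`, so they also cover suffixed output directories (for example, one directory per run), and they already match the unsuffixed names listed just above them. A rough illustration using Python's `fnmatch`, which only approximates gitignore matching for simple single-component patterns like these; the directory names and the `is_ignored` helper are made up for the example:

```python
from fnmatch import fnmatch

# Patterns added in this hunk, without the trailing "/" that restricts
# gitignore matches to directories.
PATTERNS = ["cdp_captures*", "routine_discovery_output*"]


def is_ignored(dirname: str) -> bool:
    """Approximate check: does a top-level directory name match a pattern?"""
    return any(fnmatch(dirname, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    # Hypothetical directory names, chosen only to exercise the patterns.
    for name in ["cdp_captures", "cdp_captures_run2", "routine_discovery_output_old", "src"]:
        print(f"{name}: {is_ignored(name)}")
```

Because `*` can match an empty suffix, the older `cdp_captures/` and `routine_discovery_output/` entries look redundant next to the new ones, though keeping them is harmless.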

.pylintrc

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
+[MAIN]
+
+# minimum score (out of 10) required to pass
+fail-under=0.00
+
+# files or directories to be skipped
+ignore=*.ipynb
+
+disable=
+    too-many-locals,
+    missing-module-docstring
+
+[FORMAT]
+
+max-line-length=125
+
+[./src/]
+
+disable=
+    too-few-public-methods,
+
+[./tests]
+
+disable=
+    missing-function-docstring,
+    missing-class-docstring,
+    missing-module-docstring,
+    too-few-public-methods,

.python-version

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
+3.12.3

README.md

Lines changed: 45 additions & 17 deletions
@@ -120,7 +120,7 @@ Placeholders inside operation fields are resolved at runtime:
 
 Interpolation occurs before an operation executes. For example, a fetch endpoint might be:
 
-```
+```json
 {
 "type": "fetch",
 "endpoint": {
@@ -137,24 +137,26 @@ Interpolation occurs before an operation executes. For example, a fetch endpoint
 
 This substitutes parameter values and injects `auth_token` from cookies. The JSON response is stored under `sessionStorage['result_key']` and can be returned by a final `return` operation using the matching `session_storage_key`.
 
-## Prerequisits
+## Prerequisites
 
-- Python 3.11+
+- Python 3.12+
 - Google Chrome (stable)
-- uv (Python package manager)
+- [uv (Python package manager)](https://github.com/astral-sh/uv)
 - macOS/Linux: `curl -LsSf https://astral.sh/uv/install.sh | sh`
 - Windows (PowerShell): `iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex`
 - OpenAI API key
 
 ## Set up Your Environment 🔧
 
+### Linux
+
 ```bash
 # 1) Clone and enter the repo
 git clone https://github.com/VectorlyApp/web-hacker.git
 cd web-hacker
 
 # 2) Create & activate virtual environment (uv)
-uv venv .venv
+uv venv --prompt web-hacker
 source .venv/bin/activate # Windows: .venv\\Scripts\\activate
 
 # 3) Install in editable mode via uv (pip-compatible interface)
@@ -166,6 +168,29 @@ cp .env.example .env # then edit values
 export OPENAI_API_KEY="sk-..."
 ```
 
+### Windows
+
+```powershell
+# 1) Clone and enter the repo
+git clone https://github.com/VectorlyApp/web-hacker.git
+cd web-hacker
+
+# 2) Install uv (if not already installed)
+iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex
+
+# 3) Create & activate virtual environment (uv)
+uv venv --prompt web-hacker
+.venv\Scripts\activate
+
+# 4) Install in editable mode via uv (pip-compatible interface)
+uv pip install -e .
+
+# 5) Configure environment
+copy .env.example .env # then edit values
+# or set directly
+$env:OPENAI_API_KEY="sk-..."
+```
+
 ## Launch Chrome in Debug Mode 🐞
 
 ### Instructions for MacOS
@@ -242,15 +267,10 @@ Use the CDP browser monitor to block trackers and capture network, storage, and
 **Run this command to start monitoring:**
 
 ```bash
-python scripts/browser_monitor.py \
-  --host 127.0.0.1 \
-  --port 9222 \
-  --output-dir ./cdp_captures \
-  --url about:blank \
-  --incognito
+python scripts/browser_monitor.py --host 127.0.0.1 --port 9222 --output-dir ./cdp_captures --url about:blank --incognito
 ```
 
-The script will open a new tab (starting at `about:blank`). Navigate to your target website, then manually perform the actions you want to automate (e.g., search, login, export report). Keep Chrome focused during this process. Press `Ctrl+C` when done; the script will consolidate transactions and produce a HAR automatically.
+The script will open a new tab (starting at `about:blank`). Navigate to your target website, then manually perform the actions you want to automate (e.g., search, login, export report). Keep Chrome focused during this process. Press `Ctrl+C` and the script will consolidate transactions and produce a HAR automatically.
 
 **Output structure** (under `--output-dir`, default `./cdp_captures`):
 
@@ -265,8 +285,8 @@ cdp_captures/
 │ ├── request.json
 │ ├── response.json
 │ └── response_body.[ext]
-── storage/
-└── events.jsonl
+── storage/
+└── events.jsonl
 ```
 
 Tip: Keep Chrome focused while monitoring and perform the target flow (search, checkout, etc.). Press Ctrl+C to stop; the script will consolidate transactions and produce a HTTP Archive (HAR) automatically.
@@ -281,14 +301,21 @@ Use the **routine-discovery pipeline** to analyze captured data and synthesize a
 
 > ⚠️ **Important:** You must specify your own `--task` parameter. The example below is just for demonstration—replace it with a description of what you want to automate.
 
-```
+**Linux/macOS (bash):**
+```bash
 python scripts/discover_routines.py \
   --task "recover the api endpoints for searching for trains and their prices" \
   --cdp-captures-dir ./cdp_captures \
   --output-dir ./routine_discovery_output \
   --llm-model gpt-5
 ```
 
+**Windows (PowerShell):**
+```powershell
+# Simple task (no quotes inside):
+python scripts/discover_routines.py --task "Recover the API endpoints for searching for trains and their prices" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5
+```
+
 **Example tasks:**
 - `"recover the api endpoints for searching for trains and their prices"` (shown above)
 - `"discover how to search for flights and get pricing"`
@@ -322,6 +349,7 @@ routine_discovery_output/
 ```json
 "field": "{{paramName}}"
 ```
+
 And `paramName` is a string parameter, manually change it to:
 ```json
 "field": "\"{{paramName}}\""
@@ -331,7 +359,7 @@ This ensures the parameter value is properly quoted as a JSON string when substi
 
 Run the example routine:
 
-```
+```bash
 # Using a parameters file:
 
 python scripts/execute_routine.py \
@@ -347,7 +375,7 @@ python scripts/execute_routine.py \
 
 Run a discovered routine:
 
-```
+```bash
 python scripts/execute_routine.py \
   --routine-path routine_discovery_output/routine.json \
   --parameters-path routine_discovery_output/test_parameters.json

pyproject.toml

Lines changed: 10 additions & 3 deletions
@@ -1,18 +1,25 @@
+# pyproject.toml for web-hacker
+
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
 [project]
 name = "web-hacker"
 version = "0.1.0"
-description = "Add your description here"
+description = " Reverse engineer any web app!"
 readme = "README.md"
-requires-python = ">=3.11"
+requires-python = ">=3.12.3,<3.13" # pinning to 3.12.x
 dependencies = [
+    "ipykernel>=6.29.5",
     "openai>=2.6.1",
+    "pydantic>=2.11.4",
+    "pylint>=3.0.0",
+    "pytest>=8.3.5",
     "python-dotenv>=1.2.1",
-    "websocket-client>=1.6.0",
     "requests>=2.31.0",
+    "websockets>=15.0.1",
+    "websocket-client>=1.6.0",
 ]
 
 [tool.hatch.build.targets.wheel]
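
The new `requires-python = ">=3.12.3,<3.13"` is narrower than the "Python 3.12+" prerequisite in the README hunk above: it excludes 3.12.0 through 3.12.2 as well as 3.13 and later. A minimal stdlib sketch of that range check follows; `satisfies_pin` is a hypothetical helper written for illustration, and real installers evaluate the specifier with their own PEP 440 logic rather than tuple comparison:

```python
import sys

# Bounds copied from the requires-python specifier ">=3.12.3,<3.13".
LOWER = (3, 12, 3)  # inclusive lower bound
UPPER = (3, 13)     # exclusive upper bound on (major, minor)


def satisfies_pin(version: tuple[int, int, int]) -> bool:
    """Return True if a (major, minor, micro) version falls inside the pin."""
    return LOWER <= version and version[:2] < UPPER


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    label = ".".join(map(str, current))
    print(f"Python {label} satisfies pin: {satisfies_pin(current)}")
```

Under this reading, a contributor on 3.12.2 or 3.13.0 would meet the README's stated prerequisite but fall outside the range declared in `pyproject.toml`.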

scripts/discover_routines.py

Lines changed: 20 additions & 13 deletions
@@ -1,12 +1,20 @@
 """
 Script for discovering routines from the network transactions.
 """
+
 from argparse import ArgumentParser
+import logging
+import os
+
+from dotenv import load_dotenv
 from openai import OpenAI
+
 from src.routine_discovery.agent import RoutineDiscoveryAgent
 from src.routine_discovery.context_manager import ContextManager
-from dotenv import load_dotenv
-import os
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
 
 def main() -> None:
 
@@ -25,10 +33,9 @@ def main() -> None:
     if os.getenv("OPENAI_API_KEY") is None:
         raise ValueError("OPENAI_API_KEY is not set")
 
-
-    print(f"\n{'-' * 100}")
-    print(f"Starting routine discovery for task:\n{args.task}")
-    print(f"{'-' * 100}\n")
+    logger.info(f"\n{'-' * 100}")
+    logger.info(f"Starting routine discovery for task:\n{args.task}")
+    logger.info(f"{'-' * 100}\n")
 
     # initialize OpenAI client
     openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@ -45,11 +52,11 @@ def main() -> None:
         storage_jsonl_path=os.path.join(args.cdp_captures_dir, "storage/events.jsonl")
     )
 
-    print(f"Context manager initialized.")
+    logger.info(f"Context manager initialized.")
 
     # make the vectorstore
     context_manager.make_vectorstore()
-    print(f"Vectorstore created: {context_manager.vectorstore_id}")
+    logger.info(f"Vectorstore created: {context_manager.vectorstore_id}")
 
     # initialize routine discovery agent
     routine_discovery_agent = RoutineDiscoveryAgent(
@@ -59,15 +66,15 @@ def main() -> None:
         llm_model=args.llm_model,
         output_dir=args.output_dir,
     )
-    print(f"Routine discovery agent initialized.")
+    logger.info(f"Routine discovery agent initialized.")
 
-    print(f"\n{'-' * 100}")
-    print(f"Running routine discovery agent.")
-    print(f"{'-' * 100}\n")
+    logger.info(f"\n{'-' * 100}")
+    logger.info(f"Running routine discovery agent.")
+    logger.info(f"{'-' * 100}\n")
 
     # run the routine discovery agent
     routine_discovery_agent.run()
-    print(f"Routine discovery agent run complete")
+    logger.info(f"Routine discovery agent run complete")
 
 
 if __name__ == "__main__":
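
The `logger.info(f"...")` calls introduced in this file format each message eagerly with an f-string. Since the same commit adds a pylint step, it may be worth noting that pylint's `logging-fstring-interpolation` check (W1203) typically flags this pattern; with `fail-under=0.00` that would lower the score rather than fail the job. A minimal sketch of the lazy `%s` style, assuming a hypothetical `announce` helper and an example task string rather than anything defined in the repository:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

SEPARATOR = "-" * 100


def announce(task: str) -> None:
    """Log a banner for the task using lazy %-style arguments.

    `announce` is a hypothetical helper, not a function from the repository.
    """
    # Arguments are passed separately, so the message string is only
    # built if the record is actually emitted at the current log level.
    logger.info("\n%s", SEPARATOR)
    logger.info("Starting routine discovery for task:\n%s", task)
    logger.info("%s\n", SEPARATOR)


if __name__ == "__main__":
    announce("recover the api endpoints for searching for trains and their prices")
```

Messages with no placeholders, such as `"Context manager initialized."`, need no formatting at all and could be plain string literals.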

src/cdp/__init__.py

Whitespace-only changes.
