Commit 7149e46

Merge branch 'main' into feature/toonify-integration
2 parents 8f03587 + b1b59f1 commit 7149e46

103 files changed, +18282 -28 lines changed

.agent/system/project_architecture.md

Lines changed: 16 additions & 4 deletions
@@ -85,7 +85,14 @@ scrapegraph-sdk/
 - **aiohttp** 3.10+ - Async HTTP client
 - **pydantic** 2.10.2+ - Data validation and modeling
 - **python-dotenv** 1.0.1+ - Environment variable management
-- **beautifulsoup4** 4.12.3+ - HTML parsing (for pagination)
+
+**Optional Dependencies:**
+- **beautifulsoup4** 4.12.3+ - HTML parsing (for HTML validation when using `website_html`)
+  - Install with: `pip install scrapegraph-py[html]`
+- **langchain** 0.3.0+ - Langchain integration for AI workflows
+- **langchain-community** 0.2.11+ - Community integrations for Langchain
+- **langchain-scrapegraph** 0.1.0+ - ScrapeGraph integration for Langchain
+  - Install with: `pip install scrapegraph-py[langchain]`
 
 **Development Tools:**
 - **pytest** 7.4.0+ - Testing framework
@@ -879,12 +886,17 @@ npm publish
 
 ### Python SDK Dependencies
 
-**Runtime:**
+**Core Runtime:**
 - **requests**: Sync HTTP client
 - **aiohttp**: Async HTTP client
 - **pydantic**: Data validation
 - **python-dotenv**: Environment variables
-- **beautifulsoup4**: HTML parsing
+
+**Optional Runtime (install with extras):**
+- **beautifulsoup4**: HTML parsing (required when using `website_html`)
+  - Install with: `pip install scrapegraph-py[html]`
+- **langchain, langchain-community, langchain-scrapegraph**: Langchain integration
+  - Install with: `pip install scrapegraph-py[langchain]`
 
 **Development:**
 - **pytest & plugins**: Testing framework
@@ -918,7 +930,7 @@ Both SDKs depend on the ScrapeGraph AI API:
 | **Architecture** | Class-based (Client, AsyncClient) | Function-based |
 | **Async Support** | ✅ Separate AsyncClient | ✅ All functions async |
 | **Type Safety** | ✅ Pydantic models, mypy | ⚠️ JSDoc comments |
-| **Dependencies** | 5 runtime deps | 0 runtime deps |
+| **Dependencies** | 4 core + 2 optional extras | 0 runtime deps |
 | **Testing** | pytest with mocking | Manual tests |
 | **Documentation** | MkDocs auto-generated | README examples |
 | **Package Size** | ~50KB | ~20KB |
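
Reviewer note: the change above moves beautifulsoup4 out of the core runtime dependencies and into the `html` extra, so any code path that parses `website_html` presumably needs to cope with the package being absent. The sketch below shows one common way to guard an optional import and point the user at the right extra. It is not taken from the SDK: `has_module`, `require_extra`, and `validate_website_html` are hypothetical names used only for illustration, and the actual validation logic in scrapegraph-py may differ.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module is importable in this environment."""
    return importlib.util.find_spec(name) is not None


def require_extra(module: str, extra: str, package: str = "scrapegraph-py") -> None:
    """Raise ImportError with an install hint when an optional module is missing.

    Hypothetical helper: the extras names mirror the ones documented in the diff.
    """
    if not has_module(module):
        raise ImportError(
            f"'{module}' is required for this feature. "
            f"Install it with: pip install {package}[{extra}]"
        )


def validate_website_html(html: str) -> bool:
    """Hypothetical validator: True if the input contains at least one HTML tag."""
    require_extra("bs4", "html")
    # Imported lazily so the core package still imports without the extra.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return soup.find(True) is not None
```

With this pattern, a caller who installed only the core package gets an actionable error the first time they pass `website_html`, rather than an import failure when the package loads.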

.github/workflows/python-publish.yml

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ jobs:
 python -m pip install --upgrade pip
 pip install pytest pytest-asyncio responses
 cd scrapegraph-py
-pip install -e .
+pip install -e ".[html]"
 
 - name: Run mocked tests with coverage
 run: |

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
 python -m pip install --upgrade pip
 pip install pytest pytest-asyncio responses
 cd scrapegraph-py
-pip install -e .
+pip install -e ".[html]"
 
 - name: Run mocked tests with coverage
 run: |

.gitignore

Lines changed: 0 additions & 2 deletions
@@ -4,5 +4,3 @@
 **/.DS_Store
 *.csv
 venv/
-
-__pycache__/

CLAUDE.md

Lines changed: 4 additions & 1 deletion
@@ -44,7 +44,10 @@ scrapegraph-sdk/
 ### Python SDK
 - **Language**: Python 3.10+
 - **Package Manager**: uv (recommended) or pip
-- **Dependencies**: requests, pydantic, python-dotenv, aiohttp, beautifulsoup4
+- **Core Dependencies**: requests, pydantic, python-dotenv, aiohttp
+- **Optional Dependencies**:
+  - `html`: beautifulsoup4 (for HTML validation when using `website_html`)
+  - `langchain`: langchain, langchain-community, langchain-scrapegraph (for Langchain integrations)
 - **Testing**: pytest, pytest-asyncio, pytest-mock, aioresponses
 - **Code Quality**: black, isort, ruff, mypy, pre-commit
 - **Documentation**: mkdocs, mkdocs-material
838 Bytes
Binary file not shown.

scrapegraph-js/.gitignore

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+lerna-debug.log*
+.pnpm-debug.log*
+
+# Diagnostic reports (https://nodejs.org/api/report.html)
+report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+*.lcov
+
+# nyc test coverage
+.nyc_output
+
+# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
+.grunt
+
+# Bower dependency directory (https://bower.io/)
+bower_components
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (https://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directories
+node_modules/
+jspm_packages/
+
+# Snowpack dependency directory (https://snowpack.dev/)
+web_modules/
+
+# TypeScript cache
+*.tsbuildinfo
+
+# Optional npm cache directory
+.npm
+
+# Optional eslint cache
+.eslintcache
+
+# Optional stylelint cache
+.stylelintcache
+
+# Microbundle cache
+.rpt2_cache/
+.rts2_cache_cjs/
+.rts2_cache_es/
+.rts2_cache_umd/
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# dotenv environment variable files
+.env
+.env.development.local
+.env.test.local
+.env.production.local
+.env.local
+
+# parcel-bundler cache (https://parceljs.org/)
+.cache
+.parcel-cache
+
+# Next.js build output
+.next
+out
+
+# Nuxt.js build / generate output
+.nuxt
+dist
+
+# Gatsby files
+.cache/
+# Comment in the public line in if your project uses Gatsby and not Next.js
+# https://nextjs.org/blog/next-9-1#public-directory-support
+# public
+
+# vuepress build output
+.vuepress/dist
+
+# vuepress v2.x temp and cache directory
+.temp
+.cache
+
+# Docusaurus cache and generated files
+.docusaurus
+
+# Serverless directories
+.serverless/
+
+# FuseBox cache
+.fusebox/
+
+# DynamoDB Local files
+.dynamodb/
+
+# TernJS port file
+.tern-port
+
+# Stores VSCode versions used for testing VSCode extensions
+.vscode-test
+
+# yarn v2
+.yarn/cache
+.yarn/unplugged
+.yarn/build-state.yml
+.yarn/install-state.gz
+.pnp.*

scrapegraph-js/.prettierignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/node_modules

scrapegraph-js/.prettierrc.json

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+{
+  "semi": true,
+  "singleQuote": true,
+  "trailingComma": "es5",
+  "tabWidth": 2,
+  "useTabs": false,
+  "printWidth": 110,
+  "bracketSpacing": true,
+  "arrowParens": "always",
+  "quoteProps": "preserve"
+}
