57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,63 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.2.0] - 2025-10-24

### Added
- **Structured JSON Output Format (Phase 1)**: Integration-friendly JSON format optimized for VSCode extensions and UIs
  - New `diffgraph/structured_export.py` module for structured data transformation
  - Automatic file categorization: `auto_generated`, `documentation`, `configuration`, `source_code`
  - Rich metadata including git diff stats (additions/deletions per file)
  - Impact radius calculation from dependency graphs
  - Clean separation of files and components with explicit graph structure
  - Complete graph validation (all edge targets exist as nodes)
  - Pattern-based classification with 40+ common patterns
- Comprehensive test suite (`test_structured_export.py`) for structured export
- Design documentation in `docs/planning/` for future enhancements

### Changed
- **JSON format now uses structured output by default** (a breaking change for consumers of the JSON export; pickle and GraphML exports still use the NetworkX format)
  - `--format graph --graph-format json`: Now outputs structured format
  - `--format graph --graph-format pickle`: Still uses NetworkX format
  - `--format graph --graph-format graphml`: Still uses NetworkX format
- Updated README with structured JSON examples and usage patterns
- Enhanced documentation to explain categorization and structure

### Technical Details
- Phase 1 implementation uses existing analysis data
- Advanced fields (complexity, line numbers, parameters) reserved for Phase 2
- External dependency nodes reserved for Phase 2
- Advanced relationship detection (REST/RPC/pub-sub) reserved for Phase 2
- Structure designed for iterative enhancement without breaking changes

## [1.1.0] - 2025-10-24

### Added
- **Graph Data Export Feature**: Export complete NetworkX graph data structure to file
  - New `--format` option to choose between HTML and graph output formats
  - New `--graph-format` option to select serialization format (json, pickle, graphml)
  - Support for JSON export (human-readable, widely compatible)
  - Support for Pickle export (Python-native, preserves exact data structures)
  - Support for GraphML export (standard graph format for analysis tools)
  - New `diffgraph/graph_export.py` module with export/import functions
  - `export_to_dict()` method added to GraphManager for serialization
  - `load_graph_from_json()` and `load_graph_from_pickle()` functions for loading exported data
- Comprehensive test suite (`test_graph_export.py`) for graph export functionality
- Example usage script (`example_usage.py`) demonstrating how to use exported data
- Automated test script (`test_cli_manual.sh`) for easy feature validation
- Documentation: `GRAPH_EXPORT_FEATURE.md` and `TESTING_GUIDE.md`

### Changed
- Updated `--output` option description to reflect format-aware default paths
- Enhanced README with graph export documentation and usage examples
- Updated feature list to include graph data export capabilities

### Technical Details
- Exported data includes file nodes, component nodes, dependency graphs, and metadata
- All graph data can be loaded back into GraphManager for programmatic analysis
- NetworkX graphs are serialized using node-link format for compatibility
- Backward compatible: existing HTML output functionality unchanged

## [1.0.0] - 2025-08-06

### Changed
187 changes: 187 additions & 0 deletions GRAPH_EXPORT_FEATURE.md
@@ -0,0 +1,187 @@
# Graph Export Feature

## Overview

The DiffGraph CLI now supports exporting the complete NetworkX graph data structure directly to a file, so other programs can access and analyze the code change data.

## What's New

### CLI Options

- `--format` / `-f`: Choose output format (`html` or `graph`)
- `--graph-format`: Choose serialization format for graph export (`json`, `pickle`, or `graphml`)
- `--output` / `-o`: Output file path (auto-detects extension based on format)
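
The format-aware default path can be pictured with a small sketch. This is not the code in `diffgraph/cli.py`: the function name `default_output_path` and the extension table are hypothetical, and only the `diffgraph.html` and `diffgraph.json` defaults are stated in the README changes of this PR. The `.pkl` and `.graphml` extensions are assumed from the usage examples.

```python
from pathlib import Path

# Hypothetical extension table. Only diffgraph.html and diffgraph.json are
# documented defaults; .pkl and .graphml are assumed from the usage examples.
_GRAPH_EXTENSIONS = {"json": ".json", "pickle": ".pkl", "graphml": ".graphml"}


def default_output_path(output_format: str = "html", graph_format: str = "json") -> Path:
    """Pick a default file name when no --output value is given (illustrative only)."""
    if output_format == "html":
        return Path("diffgraph.html")
    if output_format == "graph":
        if graph_format not in _GRAPH_EXTENSIONS:
            raise ValueError(f"unknown graph format: {graph_format!r}")
        return Path("diffgraph" + _GRAPH_EXTENSIONS[graph_format])
    raise ValueError(f"unknown output format: {output_format!r}")
```

Keeping the mapping in one table means a new serialization format would only need one extra entry, which is one plausible reason to structure the option handling this way.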

### Usage Examples

```bash
# Export as JSON (default for graph format)
wild diff --format graph --output analysis.json

# Export as pickle
wild diff --format graph --graph-format pickle --output analysis.pkl

# Export as GraphML
wild diff --format graph --graph-format graphml --output analysis.graphml

# HTML output still works as before (default)
wild diff --output report.html
```

## Exported Data Structure

The exported graph data includes:

1. **File Nodes**: All analyzed files with their metadata
   - Path
   - Status (pending/processing/processed/error)
   - Change type (added/deleted/modified/unchanged)
   - Summary
   - Components list

2. **Component Nodes**: All code components (classes, functions, methods)
   - Name
   - File path
   - Change type
   - Component type (container/function/method)
   - Parent component (for nested components)
   - Summary
   - Dependencies
   - Dependents

3. **Graph Structures**: NetworkX directed graphs
   - File dependency graph
   - Component dependency graph

4. **Metadata**
   - Version information
   - Processing status
   - List of processed files

## JSON Format Example

```json
{
  "version": "1.0",
  "file_nodes": {
    "app/main.py": {
      "path": "app/main.py",
      "status": "processed",
      "change_type": "modified",
      "summary": "Modified main application file",
      "error": null,
      "components": []
    }
  },
  "component_nodes": {
    "app/main.py::MyClass": {
      "name": "MyClass",
      "file_path": "app/main.py",
      "change_type": "modified",
      "component_type": "container",
      "parent": null,
      "summary": "Main application class",
      "dependencies": [],
      "dependents": []
    }
  },
  "file_graph": { ... },
  "component_graph": { ... },
  "processed_files": ["app/main.py"]
}
```
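
The changelog in this PR says NetworkX graphs are serialized in node-link format, so the elided `file_graph` and `component_graph` values presumably hold node-link dictionaries. The sketch below reads such a dictionary without importing NetworkX. It assumes the usual node-link key names (`nodes` entries with an `id`, and an edge list called `links` or `edges` depending on the NetworkX version and arguments). The helper name `adjacency_from_node_link` is hypothetical, and the exact keys DiffGraph writes are not confirmed by the source.

```python
from collections import defaultdict


def adjacency_from_node_link(data: dict) -> dict:
    """Build {node_id: [successor_ids]} from a node-link style dictionary."""
    adjacency = defaultdict(list)
    for node in data.get("nodes", []):
        adjacency[node["id"]]  # touch the key so isolated nodes are kept
    # Assumption: the edge list is stored under "links" or, failing that, "edges".
    for edge in data.get("links", data.get("edges", [])):
        adjacency[edge["source"]].append(edge["target"])
    return dict(adjacency)


# Made-up sample payload, not real DiffGraph output.
sample_file_graph = {
    "directed": True,
    "nodes": [{"id": "app/main.py"}, {"id": "app/utils.py"}, {"id": "app/unused.py"}],
    "links": [{"source": "app/main.py", "target": "app/utils.py"}],
}
file_adjacency = adjacency_from_node_link(sample_file_graph)
```

A plain adjacency mapping like this is enough for simple traversals in environments where installing NetworkX is not an option.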

## Using Exported Data

### Loading Graph Data

```python
from diffgraph.graph_export import load_graph_from_json

# Load exported data
graph_manager = load_graph_from_json('analysis.json')

# Access file information
for file_path, file_node in graph_manager.file_nodes.items():
    print(f"{file_path}: {file_node.change_type.value}")

# Access component information
for component_id, component in graph_manager.component_nodes.items():
    print(f"{component.name}: {len(component.dependencies)} dependencies")
```

### Analyzing with NetworkX

```python
import networkx as nx

# Get the component dependency graph (graph_manager comes from the loading example above)
graph = graph_manager.component_graph

# Find the most connected component
degree_centrality = nx.degree_centrality(graph)
most_connected = max(degree_centrality.items(), key=lambda x: x[1])

# Find a cycle (find_cycle returns the edges of one cycle, not all cycles)
try:
    cycle = nx.find_cycle(graph)
    print(f"Found a circular dependency: {cycle}")
except nx.NetworkXNoCycle:
    print("No circular dependencies found")
```

## Implementation Details

### New Files

- `diffgraph/graph_export.py`: Core export/import functionality
  - `export_graph()`: Main export function
  - `export_graph_to_json()`: JSON serialization
  - `export_graph_to_pickle()`: Pickle serialization
  - `export_graph_to_graphml()`: GraphML serialization
  - `load_graph_from_json()`: Load from JSON
  - `load_graph_from_pickle()`: Load from pickle

### Modified Files

- `diffgraph/cli.py`: Added new CLI options and conditional output logic
- `diffgraph/graph_manager.py`: Added `export_to_dict()` method
- `README.md`: Updated documentation with new features

### Test Files

- `test_graph_export.py`: Comprehensive tests for export/import functionality
- `example_usage.py`: Example script showing how to use exported data

## Benefits

1. **Programmatic Access**: Other tools can now consume DiffGraph analysis results
2. **Data Persistence**: Save analysis for later review or comparison
3. **Integration**: Easy integration with CI/CD pipelines and automated workflows
4. **Flexibility**: Multiple format options for different use cases
5. **Compatibility**: Standard formats (JSON, GraphML) work with various tools

## Testing

Run the test suite:

```bash
python test_graph_export.py
```

Try the example:

```bash
# Export some changes
wild diff --format graph --output my-changes.json

# Analyze the exported data
python example_usage.py my-changes.json
```

## Backward Compatibility

All existing functionality is preserved. The default behavior remains unchanged:
- Default output format is still HTML
- Existing CLI options work as before
- No breaking changes to the API
114 changes: 111 additions & 3 deletions README.md
@@ -6,6 +6,7 @@ DiffGraph-CLI is a powerful command-line tool that visualizes code changes using

- 📊 Visualizes code changes as a dependency graph
- 🤖 AI-powered analysis of code changes
- 💾 Export graph data in multiple formats (JSON, Pickle, GraphML)
- 🌙 Dark mode support
- 📝 Markdown-formatted summaries
- 🔍 Syntax highlighting for code blocks
@@ -56,24 +57,131 @@ This will:
### Command-line Options

- `--api-key`: Specify your OpenAI API key (defaults to OPENAI_API_KEY environment variable)
- `--output` or `-o`: Specify the output file path (default: diffgraph.html for HTML, diffgraph.json for graph)
- `--format` or `-f`: Output format: `html` (default) or `graph`
- `--graph-format`: Graph serialization format when using `--format graph`: `json` (default), `pickle`, or `graphml`
- `--no-open`: Don't automatically open the HTML report in browser
- `--version`: Show version information

Examples:
```bash
# Generate HTML report (default)
wild --output my-report.html --no-open

# Export graph data as JSON
wild --format graph --output graph-data.json

# Export graph data as pickle
wild --format graph --graph-format pickle --output graph-data.pkl

# Export graph data as GraphML
wild --format graph --graph-format graphml --output graph-data.graphml
```

## 📊 Output Formats

### HTML Report (default)
The generated HTML report includes:
- A summary of code changes
- A Mermaid.js dependency graph
- Syntax-highlighted code blocks
- Dark mode support
- Responsive design for all screen sizes

### Graph Data Export
When using `--format graph`, the tool exports graph data, allowing other programs to programmatically analyze the code changes:

**Supported formats:**
- **JSON** (default): Structured, integration-friendly format optimized for VSCode extensions and UIs
- **Pickle**: Python-specific NetworkX format that preserves exact data structures
- **GraphML**: Standard graph format compatible with many graph analysis tools

#### Structured JSON Format (Default)

The JSON export provides a clean, categorized structure ideal for integrations:

**File Categorization:**
- **auto_generated**: Lock files, build artifacts (excluded from review)
- **documentation**: Markdown, docs with cross-references to code
- **configuration**: Config files with structured change tracking
- **source_code**: Source files with full dependency graphs
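
The changelog describes this categorization as pattern-based, with 40+ common patterns. A minimal sketch of that idea follows, assuming glob-style matching and a first-match-wins order. The handful of patterns here are illustrative guesses, not the list shipped in `diffgraph/structured_export.py`, and `categorize` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns only; the real tool ships its own, longer list.
CATEGORY_PATTERNS = [
    ("auto_generated", ["*.lock", "package-lock.json", "*.min.js", "dist/*", "build/*"]),
    ("documentation", ["*.md", "*.rst", "docs/*"]),
    ("configuration", ["*.yml", "*.yaml", "*.toml", "*.ini", "*.cfg"]),
]


def categorize(path: str) -> str:
    """Return the first category whose pattern matches the path or its file name."""
    name = PurePosixPath(path).name
    for category, patterns in CATEGORY_PATTERNS:
        for pattern in patterns:
            if fnmatchcase(path, pattern) or fnmatchcase(name, pattern):
                return category
    # Anything unmatched is treated as source code in this sketch.
    return "source_code"
```

Checking `auto_generated` first keeps files such as `package-lock.json` out of the other buckets; whether the actual implementation orders its checks this way is an assumption.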

**Exported data includes:**
- File-level dependency graph with additions/deletions
- Component-level dependency graph (functions, classes, methods)
- Change types for all nodes and edges
- Impact radius (number of dependent components)
- Git diff statistics per file
- Comprehensive metadata
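
Impact radius is described above as the number of dependent components. One way to compute such a number from an edge list is sketched below, under two assumptions that the source does not confirm: an edge's `source` depends on its `target`, and transitive dependents count as well as direct ones. The function name `impact_radius` is hypothetical.

```python
from collections import defaultdict, deque


def impact_radius(edges: list, component_id: str) -> int:
    """Count components that depend on component_id, directly or transitively."""
    # Reverse index: for each target, the set of components that depend on it.
    dependents = defaultdict(set)
    for edge in edges:
        dependents[edge["target"]].add(edge["source"])

    seen = set()
    queue = deque([component_id])
    while queue:  # breadth-first walk over reverse edges
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != component_id and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return len(seen)


# Made-up edges: b calls a, c calls b, d calls an unrelated component x.
example_edges = [
    {"source": "b", "target": "a"},
    {"source": "c", "target": "b"},
    {"source": "d", "target": "x"},
]
```

If only direct dependents should count, the result is simply the size of `dependents[component_id]`; the `seen` set also keeps the walk finite when the graph contains cycles.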

**Example JSON structure:**
```json
{
  "version": "2.0",
  "metadata": {
    "analyzed_at": "2025-10-24T23:00:00Z",
    "total_files_changed": 12,
    "total_additions": 1296,
    "total_deletions": 28
  },
  "auto_generated": [...],
  "documentation": {...},
  "configuration": {...},
  "source_code": {
    "files": {
      "nodes": [{"path": "...", "additions": 10, ...}],
      "edges": [{"source": "...", "target": "...", "relationship": "imports"}]
    },
    "components": {
      "nodes": [{"id": "...", "name": "...", "impact_radius": 5, ...}],
      "edges": [{"source": "...", "target": "...", "relationship": "calls"}]
    }
  }
}
```
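
Because the JSON export changed shape between releases, a consumer that may receive either the earlier `"version": "1.0"` layout (with `file_nodes` and `component_nodes`) or the structured layout shown here could branch on the content before parsing further. The sketch below is one hedged way to do that; `detect_export_flavor` and its return labels are hypothetical, and the key names are taken only from the two examples in this PR.

```python
def detect_export_flavor(data: dict) -> str:
    """Return 'structured' for the 2.0 layout or 'legacy' for the 1.0 layout."""
    version = str(data.get("version", ""))
    if version.startswith("2") and "source_code" in data:
        return "structured"
    if "file_nodes" in data and "component_nodes" in data:
        return "legacy"
    raise ValueError(f"unrecognized DiffGraph export (version={version!r})")
```

Checking for the expected top-level keys as well as the version string makes the check fail loudly on unrelated JSON files instead of silently misreading them.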

**Using structured JSON data:**
```python
import json

# Load the structured JSON
with open('diffgraph.json', 'r') as f:
    data = json.load(f)

# Access categorized files
print(f"Source files: {len(data['source_code']['files']['nodes'])}")
print(f"Documentation: {len(data['documentation'])}")
print(f"Auto-generated: {len(data['auto_generated'])}")

# Access components
for component in data['source_code']['components']['nodes']:
    print(f"{component['name']} ({component['component_type']})")
    print(f"  Impact radius: {component['impact_radius']}")
    print(f"  Change type: {component['change_type']}")

# Access dependencies
for edge in data['source_code']['components']['edges']:
    print(f"{edge['source']} -> {edge['target']} ({edge['relationship']})")
```
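
The changelog states that the structured export is validated so that all edge targets exist as nodes. A consumer that wants to re-check this on its side could use something like the sketch below. It assumes file nodes are identified by `path` and component nodes by `id`, as in the example structure, and it checks edge sources as well as targets. Both function names are hypothetical.

```python
def find_dangling_edges(graph: dict, node_key: str) -> list:
    """Return edges whose source or target is not listed among the graph's nodes."""
    known = {node[node_key] for node in graph.get("nodes", [])}
    return [
        edge
        for edge in graph.get("edges", [])
        if edge["source"] not in known or edge["target"] not in known
    ]


def validate_structured_export(data: dict) -> dict:
    """Collect dangling edges for the file graph and the component graph."""
    source_code = data.get("source_code", {})
    return {
        "files": find_dangling_edges(source_code.get("files", {}), "path"),
        "components": find_dangling_edges(source_code.get("components", {}), "id"),
    }
```

An empty list under both keys means every edge refers to known nodes; any entries returned point at the exact edges worth inspecting.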

#### NetworkX Format (Pickle/GraphML)

For advanced analysis or Python-specific use cases:

```python
from diffgraph.graph_export import load_graph_from_pickle
import networkx as nx

# Load NetworkX format
graph_manager = load_graph_from_pickle('diffgraph.pkl')

# Use NetworkX algorithms
print(f"Total components: {graph_manager.component_graph.number_of_nodes()}")
print(f"Component dependencies: {graph_manager.component_graph.number_of_edges()}")
```

**See also**: [Structured Output Design](docs/planning/STRUCTURED_OUTPUT_DESIGN.md) for complete schema specification

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.