Feature: output graph data as JSON #11

avikalpg · 2025-10-29T01:30:56Z

Summary by CodeRabbit

Release Notes - Version 1.2.0

New Features
- Added graph data export in multiple formats: JSON, Pickle, and GraphML.
- Introduced structured JSON output format with enhanced metadata and categorization (Phase 1).
- Added CLI options: --format (html|graph), --graph-format (json|pickle|graphml), and --output for custom export paths.
Documentation
- Expanded README with export usage examples and output format specifications.
- Added comprehensive testing guide for export functionality.
Tests
- New test suites validating export/import workflows and data integrity across all formats.

Add ability to export complete networkx graph data structure to file formats that other programs can consume programmatically. New Features: - Add --format option to choose between 'html' (default) and 'graph' output - Add --graph-format option to select serialization (json/pickle/graphml) - Support for JSON export (human-readable, widely compatible) - Support for Pickle export (Python-native, exact data preservation) - Support for GraphML export (standard graph format for analysis tools) New Files: - diffgraph/graph_export.py: Core export/import functionality - test_graph_export.py: Comprehensive test suite - example_usage.py: Example script for using exported data - test_cli_manual.sh: Automated testing script - GRAPH_EXPORT_FEATURE.md: Feature documentation - TESTING_GUIDE.md: Testing instructions Modified Files: - diffgraph/cli.py: Added new CLI options and conditional output logic - diffgraph/graph_manager.py: Added export_to_dict() method - README.md: Updated documentation with new features - CHANGELOG.md: Added v1.1.0 release notes - setup.py: Bumped version to 1.1.0 - diffgraph/__init__.py: Updated version to 1.1.0 Technical Details: - Exported data includes file nodes, component nodes, graphs, and metadata - All formats support round-trip (export → load) with data integrity - NetworkX graphs serialized using node-link format for compatibility - Backward compatible: existing HTML functionality unchanged Testing: - All unit tests pass (test_graph_export.py) - Automated test suite validates all formats - Example usage script demonstrates practical use cases

Add comprehensive design documents for new integration-friendly JSON output format to docs/planning/ folder to keep root directory clean. Documents: - STRUCTURED_OUTPUT_DESIGN.md: Complete schema specification - File categorization (source/docs/config/auto-generated) - Rich component metadata (complexity, impact radius) - External dependency nodes (APIs, databases, services) - Comprehensive relationship types (REST, RPC, pub/sub) - Cross-references between docs and code - Phase-based implementation strategy - PHASE1_IMPLEMENTATION_NOTES.md: Implementation guide - Implementation decisions and rationale - Phase 1 scope (basic restructuring with existing data) - Phase 2/3 future enhancements - Testing strategy and success criteria - Next session pickup instructions These documents preserve design rationale and guide future implementation without cluttering the codebase root.

Add integration-friendly structured JSON format optimized for VSCode extensions, web UIs, and other tool integrations. This is Phase 1 using existing analysis data without requiring new AI analysis. New Features: - Structured JSON output with clean categorization and rich metadata - Automatic file classification into 4 categories: * auto_generated: Lock files, build artifacts (excluded from review) * documentation: Docs with potential cross-references to code * configuration: Config files with change tracking * source_code: Source files with full dependency graphs - Git diff stats (additions/deletions) extracted per file - Impact radius calculation from dependency graph - Complete graph structure (all edge targets exist as nodes) - Pattern-based classification with 40+ common file patterns New Files: - diffgraph/structured_export.py: Core transformation logic (400+ lines) * File categorization and classification * Component and edge transformation * Git stats extraction * Structured format generation - test_structured_export.py: Comprehensive test suite * File classification tests * Component transformation tests * Full export validation * Graph completeness tests * Empty graph handling Modified Files: - diffgraph/cli.py: Route JSON format to structured export - README.md: Document structured format with examples - CHANGELOG.md: Added v1.2.0 release notes - setup.py: Bumped version to 1.2.0 - diffgraph/__init__.py: Updated version to 1.2.0 Breaking Changes: - JSON format now outputs structured format (v2.0 schema) - NetworkX format still available via pickle/graphml Technical Details: - Phase 1: Uses existing data, leaves advanced fields as null - Relationship types: imports, calls, extends, implements - Change types: added, deleted, modified, unchanged - Clean separation of files graph and components graph - All tests pass (5/5 test suites) Phase 2 Planned: - Full codebase impact analysis - External dependency nodes (APIs, databases) - Advanced relationships (REST, RPC, pub/sub) - Complexity metrics and line numbers - Cross-reference detection See docs/planning/STRUCTURED_OUTPUT_DESIGN.md for complete specification.

…id', but just 'parent'. refactor: Clean up whitespace in structured_export.py This commit removes unnecessary whitespace throughout the structured_export.py file, improving code readability without altering functionality. The changes include adjustments to spacing in function definitions, docstrings, and return statements, ensuring a more consistent coding style.

coderabbitai · 2025-10-29T01:31:04Z

Walkthrough

This PR introduces comprehensive graph export capabilities for DiffGraph, adding support for multiple serialization formats (JSON, Pickle, GraphML), a new structured JSON output format for integration with external tools, enhanced CLI options for format selection, and a complete test suite with documentation and planning guides.

Changes

Cohort / File(s)	Summary
Version & Package Updates `diffgraph/__init__.py`, `setup.py`	Bumps version from 0.1.0/1.0.0 to 1.2.0
Core Export Functionality `diffgraph/graph_export.py`	New module with export/load functions for JSON, Pickle, GraphML formats; includes graph data preparation and reconstruction helpers
Structured JSON Export `diffgraph/structured_export.py`	New module for transforming graph data into structured JSON format with file classification, language detection, relationship inference, and metadata aggregation
GraphManager Enhancement `diffgraph/graph_manager.py`	Adds `export_to_dict()` method to serialize graph structures into JSON-compatible dictionaries
CLI Enhancements `diffgraph/cli.py`	Adds `--format` (html\|graph), `--graph-format` (json\|pickle\|graphml), `--output` options; implements conditional export logic for graph formats
Core Documentation `README.md`, `CHANGELOG.md`	Updates feature list, adds export capability documentation, includes usage examples for JSON/Pickle/GraphML/HTML outputs, and release notes for 1.2.0
Planning & Design Docs `docs/planning/PHASE1_IMPLEMENTATION_NOTES.md`, `docs/planning/STRUCTURED_OUTPUT_DESIGN.md`, `GRAPH_EXPORT_FEATURE.md`, `TESTING_GUIDE.md`	Comprehensive documentation of Phase 1 implementation strategy, structured output schema design, feature specifications, and testing procedures
Test Suite `test_graph_export.py`, `test_structured_export.py`, `test_cli_manual.sh`	Unit tests for export/import workflows, structured export validation, manual CLI testing with round-trip integrity checks
Example & Usage `example_usage.py`	Demo script for loading and analyzing exported graph data with statistics and component connectivity analysis

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant GraphManager
    participant Exporter as graph_export/<br/>structured_export
    participant FileSystem

    User->>CLI: diffgraph --format graph --graph-format json
    CLI->>GraphManager: access graph data
    alt Structured JSON
        CLI->>Exporter: export_structured_json()
        Exporter->>GraphManager: query nodes/edges
        Exporter->>Exporter: classify files, transform nodes
        Exporter->>FileSystem: write JSON
    else Standard Export
        CLI->>Exporter: export_graph_to_json()
        Exporter->>GraphManager: export_to_dict()
        Exporter->>FileSystem: write JSON
    end
    FileSystem-->>User: export file path
    
    Note over Exporter: Supports JSON, Pickle, GraphML<br/>with load_graph functions for reconstruction

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

diffgraph/structured_export.py: New module with extensive transformation logic, pattern matching, and Git integration; validate correctness of file classification, relationship inference, and schema conformance
diffgraph/graph_export.py: Graph serialization/deserialization across multiple formats; verify data integrity in round-trip conversions and compatibility with GraphML standard
diffgraph/cli.py: Parameter routing logic and conditional export paths; ensure backward compatibility with HTML output and correct error handling for invalid format combinations
test_structured_export.py & test_graph_export.py: Comprehensive test coverage but requires validation that edge cases (empty graphs, missing files, malformed data) are properly handled
Planning documents (STRUCTURED_OUTPUT_DESIGN.md, PHASE1_IMPLEMENTATION_NOTES.md): Validate alignment between design intent and implementation, especially Phase 1/2 boundaries and placeholder fields

Possibly related PRs

DiffGraph Fix: Components within files are actually detected and added to graph. Tooltip to show summary of changes. Colors fixed. #3: Extends GraphManager to accept and store component metadata fields (summary, dependencies, dependents); directly complements this PR's serialization of those same fields in export/structured-export modules.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "Feature: output graph data as JSON" is partially related to the changeset. The PR does introduce JSON output capabilities, which is a legitimate and significant part of the changes. However, the title does not capture the full scope of the implementation, which includes multiple export formats (JSON, pickle, GraphML), a new structured JSON format specifically designed for Phase 1 with enhanced metadata and categorization, multiple new modules, and comprehensive CLI enhancements. The title focuses on one output format while omitting the multi-format nature and the "structured" aspect of the JSON output mentioned in the CHANGELOG as the primary feature.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/output-graph-data

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

diffgraph/graph_export.py (1)

109-166: Code duplication noted.

This function duplicates the logic in GraphManager.export_to_dict() (lines 305-360 in graph_manager.py). See the comment on that file for refactoring suggestions.

🧹 Nitpick comments (6)

CHANGELOG.md (1)

8-36: Well-documented release notes!

The changelog clearly documents the Phase 1 structured JSON output format and breaking changes. The phased approach and technical details provide good context for users.

Optional style improvement: Consider hyphenating "backwards-compatible" on line 23 per standard English compound adjective usage.
-  - **JSON format now uses structured output by default** (breaking change for JSON, but backwards compatible overall)
+  - **JSON format now uses structured output by default** (breaking change for JSON, but backwards-compatible overall)

docs/planning/STRUCTURED_OUTPUT_DESIGN.md (1)

1-643: Excellent design document!

This comprehensive design document provides a clear roadmap for the structured JSON output format, with well-defined phases and implementation strategy. The distinction between Phase 1 (basic restructuring) and Phase 2 (enhanced analysis) sets realistic expectations.

Optional markdown improvements for better rendering:

Add language specifier to the code block at line 93

Remove spaces inside emphasis markers (lines 126, 155)

These are minor formatting issues that don't affect the excellent content.

test_graph_export.py (1)

1-1: Shebang without executable permission.

The file has a shebang line but is not marked as executable. Either make the file executable or remove the shebang since this is typically imported as a module.
# Option 1: Make it executable
chmod +x test_graph_export.py

# Option 2: Remove the shebang if not needed
As per static analysis.

diffgraph/graph_manager.py (1)

324-331: Simplify component serialization logic.

The component list comprehension uses nested conditionals with hasattr checks that may not be necessary. The FileNode.components field is typed as List[Dict], suggesting components should already be dictionaries.

If components are always dictionaries (as the type hint suggests), simplify to:
-'components': [
-    {
-        'name': c.name if hasattr(c, 'name') else str(c),
-        'change_type': c.change_type if hasattr(c, 'change_type') else 'unknown',
-        'summary': c.summary if hasattr(c, 'summary') else None
-    } if hasattr(c, '__dict__') else str(c)
-    for c in (node.components or [])
-]
+'components': node.components or []
If components can be mixed types, add a comment explaining why and consider validating the type earlier in the flow.

example_usage.py (1)

47-47: Rename unused loop variable.

The loop variable component_id is not used in the loop body.

Apply this diff:
-    for component_id, component_node in graph_manager.component_nodes.items():
+    for _, component_node in graph_manager.component_nodes.items():
Similar issue at line 64.

docs/planning/PHASE1_IMPLEMENTATION_NOTES.md (1)

300-314: Add language identifier to fenced code block.

The commit message template code block (line 300) is missing a language identifier.

Apply this diff:

-```
+```text
 feat: Add structured JSON output format (Phase 1)
 ...


</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: CodeRabbit UI

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between a43abac1e9bb30104aab181cc68e1b306abe4a70 and d43d668f05e8b913c665d7b312c000c4e9875033.

</details>

<details>
<summary>📒 Files selected for processing (16)</summary>

* `CHANGELOG.md` (1 hunks)
* `GRAPH_EXPORT_FEATURE.md` (1 hunks)
* `README.md` (2 hunks)
* `TESTING_GUIDE.md` (1 hunks)
* `diffgraph/__init__.py` (1 hunks)
* `diffgraph/cli.py` (3 hunks)
* `diffgraph/graph_export.py` (1 hunks)
* `diffgraph/graph_manager.py` (1 hunks)
* `diffgraph/structured_export.py` (1 hunks)
* `docs/planning/PHASE1_IMPLEMENTATION_NOTES.md` (1 hunks)
* `docs/planning/STRUCTURED_OUTPUT_DESIGN.md` (1 hunks)
* `example_usage.py` (1 hunks)
* `setup.py` (1 hunks)
* `test_cli_manual.sh` (1 hunks)
* `test_graph_export.py` (1 hunks)
* `test_structured_export.py` (1 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧬 Code graph analysis (6)</summary>

<details>
<summary>test_graph_export.py (2)</summary><blockquote>

<details>
<summary>diffgraph/graph_manager.py (8)</summary>

* `GraphManager` (54-360)
* `ChangeType` (8-13)
* `FileStatus` (15-20)
* `add_file` (85-94)
* `add_component` (96-122)
* `add_component_dependency` (124-135)
* `mark_processed` (150-156)
* `export_to_dict` (305-360)

</details>
<details>
<summary>diffgraph/graph_export.py (3)</summary>

* `export_graph` (249-273)
* `load_graph_from_json` (169-182)
* `load_graph_from_pickle` (185-198)

</details>

</blockquote></details>
<details>
<summary>example_usage.py (1)</summary><blockquote>

<details>
<summary>diffgraph/graph_export.py (1)</summary>

* `load_graph_from_json` (169-182)

</details>

</blockquote></details>
<details>
<summary>diffgraph/cli.py (2)</summary><blockquote>

<details>
<summary>diffgraph/graph_export.py (1)</summary>

* `export_graph` (249-273)

</details>
<details>
<summary>diffgraph/structured_export.py (1)</summary>

* `export_structured_json` (403-433)

</details>

</blockquote></details>
<details>
<summary>diffgraph/graph_export.py (1)</summary><blockquote>

<details>
<summary>diffgraph/graph_manager.py (5)</summary>

* `GraphManager` (54-360)
* `FileNode` (23-34)
* `ComponentNode` (37-52)
* `ChangeType` (8-13)
* `FileStatus` (15-20)

</details>

</blockquote></details>
<details>
<summary>test_structured_export.py (2)</summary><blockquote>

<details>
<summary>diffgraph/graph_manager.py (6)</summary>

* `GraphManager` (54-360)
* `ChangeType` (8-13)
* `add_file` (85-94)
* `add_component` (96-122)
* `add_component_dependency` (124-135)
* `mark_processed` (150-156)

</details>
<details>
<summary>diffgraph/structured_export.py (5)</summary>

* `classify_file` (53-75)
* `export_structured_json` (403-433)
* `transform_component_node` (175-202)
* `transform_file_node` (205-224)
* `transform_to_structured_format` (312-400)

</details>

</blockquote></details>
<details>
<summary>diffgraph/structured_export.py (1)</summary><blockquote>

<details>
<summary>diffgraph/graph_manager.py (4)</summary>

* `GraphManager` (54-360)
* `FileNode` (23-34)
* `ComponentNode` (37-52)
* `ChangeType` (8-13)

</details>

</blockquote></details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>CHANGELOG.md</summary>

[uncategorized] ~23-~23: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efault** (breaking change for JSON, but backwards compatible overall)   - `--format graph --graph-fo...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>docs/planning/PHASE1_IMPLEMENTATION_NOTES.md</summary>

300-300: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>
<details>
<summary>docs/planning/STRUCTURED_OUTPUT_DESIGN.md</summary>

93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

126-126: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

---

155-155: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

</details>

</details>
<details>
<summary>🪛 Ruff (0.14.2)</summary>

<details>
<summary>test_graph_export.py</summary>

1-1: Shebang is present but file is not executable

(EXE001)

</details>
<details>
<summary>example_usage.py</summary>

1-1: Shebang is present but file is not executable

(EXE001)

---

47-47: Loop control variable `component_id` not used within loop body

Rename unused `component_id` to `_component_id`

(B007)

---

64-64: Loop control variable `component_id` not used within loop body

Rename unused `component_id` to `_component_id`

(B007)

</details>
<details>
<summary>diffgraph/cli.py</summary>

238-238: `subprocess` call: check for execution of untrusted input

(S603)

---

238-238: Starting a process with a partial executable path

(S607)

---

240-240: Starting a process without a shell

(S606)

---

242-242: `subprocess` call: check for execution of untrusted input

(S603)

---

242-242: Starting a process with a partial executable path

(S607)

</details>
<details>
<summary>diffgraph/graph_export.py</summary>

196-196: `pickle` and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

---

273-273: Avoid specifying long messages outside the exception class

(TRY003)

</details>
<details>
<summary>test_structured_export.py</summary>

1-1: Shebang is present but file is not executable

(EXE001)

</details>
<details>
<summary>diffgraph/structured_export.py</summary>

90-90: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

---

91-91: `subprocess` call: check for execution of untrusted input

(S603)

---

100-100: Do not catch blind exception: `Exception`

(BLE001)

---

177-177: Unused function argument: `graph_manager`

(ARG001)

---

248-248: Unused function argument: `graph_manager`

(ARG001)

---

405-405: PEP 484 prohibits implicit `Optional`

Convert to `T | None`

(RUF013)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (11)</summary><blockquote>

<details>
<summary>setup.py (1)</summary><blockquote>

`5-5`: **LGTM!**

The version bump to 1.2.0 appropriately reflects the new graph export features being added in this PR.

</blockquote></details>
<details>
<summary>TESTING_GUIDE.md (1)</summary><blockquote>

`1-302`: **Excellent testing documentation!**

This comprehensive testing guide provides clear instructions for verifying the new graph export feature across multiple scenarios. The step-by-step examples and troubleshooting section will be valuable for both users and contributors.

</blockquote></details>
<details>
<summary>GRAPH_EXPORT_FEATURE.md (1)</summary><blockquote>

`1-187`: **Comprehensive feature documentation!**

This document provides excellent coverage of the graph export feature, including clear usage examples, data structure descriptions, and implementation details. The backward compatibility note is particularly helpful.

</blockquote></details>
<details>
<summary>diffgraph/__init__.py (1)</summary><blockquote>

`5-5`: **LGTM!**

Version bump to 1.2.0 is consistent with setup.py and appropriately reflects the new features.

</blockquote></details>
<details>
<summary>test_graph_export.py (1)</summary><blockquote>

`13-111`: **Well-structured test coverage!**

The test function comprehensively validates all export formats (JSON, pickle, GraphML) with proper assertions for data integrity and file creation. The cleanup step is a good practice.

</blockquote></details>
<details>
<summary>README.md (2)</summary><blockquote>

`9-79`: **Clear documentation of new features!**

The updated usage section effectively communicates the new graph export capabilities with clear examples for all supported formats (JSON, Pickle, GraphML).

---

`81-184`: **Excellent output format documentation!**

The new "Output Formats" section provides comprehensive coverage of both HTML reports and graph data export. The structured JSON format explanation with categorization and code examples is particularly helpful for users integrating with the tool.

</blockquote></details>
<details>
<summary>diffgraph/cli.py (1)</summary><blockquote>

`211-242`: **LGTM! Clean format-based routing.**

The implementation correctly branches between graph export and HTML report generation based on the `--format` flag. The use of structured JSON for `--graph-format json` and NetworkX format for pickle/graphml is well-aligned with the Phase 1 design.



Note: The static analysis warnings about subprocess calls (lines 238, 240, 242) are false positives—these commands open the browser with system utilities and don't execute untrusted input.

</blockquote></details>
<details>
<summary>test_structured_export.py (1)</summary><blockquote>

`19-284`: **Excellent test coverage for Phase 1!**

The test suite comprehensively validates:
- File classification across all categories (auto-generated, docs, config, source)
- Component transformation with Phase 2 null fields
- End-to-end structured export with proper categorization
- Graph completeness (all edges reference valid nodes)
- Empty graph handling

This provides strong validation of the Phase 1 structured export implementation.



Minor: Consider making the file executable since it has a shebang (line 1).

</blockquote></details>
<details>
<summary>diffgraph/graph_export.py (1)</summary><blockquote>

`185-198`: **Document pickle security considerations.**

Pickle can be unsafe when deserializing untrusted data. Since this function loads graph data that may come from external sources, consider documenting the security implications.



Add a security warning to the docstring:
```diff
 def load_graph_from_pickle(pickle_path: str) -> GraphManager:
     """
     Load graph data from a pickle file.
+    
+    ⚠️  Security Warning: Only load pickle files from trusted sources.
+    Pickle can execute arbitrary code during deserialization.
     
     Args:
         pickle_path: Path to the pickle file

Alternatively, if pickle files are only used internally or in trusted environments, document that assumption.

diffgraph/structured_export.py (1)

175-202: Unused parameter is acceptable for Phase 1.

The graph_manager parameter is currently unused but is likely needed for Phase 2 enhancements (relationship type detection based on component analysis). Keeping it maintains a consistent API.

Similar unused parameter at line 248 in transform_file_edge is also acceptable for Phase 2 extensibility.

coderabbitai · 2025-10-29T01:35:05Z

diffgraph/graph_manager.py

+    def export_to_dict(self) -> dict:
+        """
+        Export the graph manager state to a dictionary.
+        This is useful for serialization and can be passed to graph_export functions.
+        
+        Returns:
+            Dictionary containing all graph data and metadata
+        """
+        from networkx.readwrite import json_graph
+
+        # Convert file nodes to serializable format
+        file_nodes_data = {}
+        for file_path, node in self.file_nodes.items():
+            file_nodes_data[file_path] = {
+                'path': node.path,
+                'status': node.status.value,
+                'change_type': node.change_type.value,
+                'summary': node.summary,
+                'error': node.error,
+                'components': [
+                    {
+                        'name': c.name if hasattr(c, 'name') else str(c),
+                        'change_type': c.change_type if hasattr(c, 'change_type') else 'unknown',
+                        'summary': c.summary if hasattr(c, 'summary') else None
+                    } if hasattr(c, '__dict__') else str(c)
+                    for c in (node.components or [])
+                ]
+            }
+
+        # Convert component nodes to serializable format
+        component_nodes_data = {}
+        for component_id, node in self.component_nodes.items():
+            component_nodes_data[component_id] = {
+                'name': node.name,
+                'file_path': node.file_path,
+                'change_type': node.change_type.value,
+                'component_type': node.component_type,
+                'parent': node.parent,
+                'summary': node.summary,
+                'dependencies': node.dependencies,
+                'dependents': node.dependents
+            }
+
+        # Convert graphs to node-link format
+        file_graph_data = json_graph.node_link_data(self.file_graph, edges="links")
+        component_graph_data = json_graph.node_link_data(self.component_graph, edges="links")
+
+        # Combine all data
+        return {
+            'version': '1.0',
+            'file_nodes': file_nodes_data,
+            'component_nodes': component_nodes_data,
+            'file_graph': file_graph_data,
+            'component_graph': component_graph_data,
+            'processed_files': list(self.processed_files)
+        }


⚠️ Potential issue | 🟠 Major

Code duplication with graph_export.py.

The export_to_dict() method duplicates the serialization logic in diffgraph/graph_export.py's _prepare_graph_data() function (lines 109-166). Both functions perform identical transformations of file_nodes, component_nodes, and graphs.

Consider one of these approaches:

Option 1 (Recommended): Make _prepare_graph_data call export_to_dict

# In diffgraph/graph_export.py def _prepare_graph_data(graph_manager: GraphManager) -> Dict[str, Any]: - # Convert file nodes to serializable format - file_nodes_data = {} - for file_path, node in graph_manager.file_nodes.items(): - ... - # ... rest of duplication - return data + """Prepare graph data for serialization by delegating to GraphManager.""" + return graph_manager.export_to_dict()

Option 2: Remove export_to_dict and keep logic in graph_export.py

If export_to_dict isn't used elsewhere, consider removing it entirely and keeping serialization logic centralized in the graph_export module.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In diffgraph/graph_manager.py around lines 305-360, export_to_dict duplicates the serialization logic already implemented in diffgraph/graph_export.py::_prepare_graph_data (lines ~109-166); replace the duplicated implementation by delegating to that centralized function (import _prepare_graph_data from diffgraph.graph_export and return its result) or, if export_to_dict is unused elsewhere, remove this method and update callers to use graph_export._prepare_graph_data directly; ensure the function signatures match, import paths are correct, update any unit tests or call sites, and keep returned structure/version identical to avoid breaking callers.

coderabbitai · 2025-10-29T01:35:05Z

diffgraph/structured_export.py

+    try:
+        cmd = ['git', 'diff', '--numstat'] + diff_args + ['--', file_path]
+        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
+
+        if result.returncode == 0 and result.stdout.strip():
+            # Output format: "additions\tdeletions\tfilename"
+            parts = result.stdout.strip().split('\t')
+            if len(parts) >= 2:
+                additions = int(parts[0]) if parts[0] != '-' else 0
+                deletions = int(parts[1]) if parts[1] != '-' else 0
+                return {'additions': additions, 'deletions': deletions}
+    except Exception as e:
+        print(f"Warning: Could not get stats for {file_path}: {e}")
+
+    return {'additions': 0, 'deletions': 0}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify git diff arguments are sanitized.

The get_file_stats function passes diff_args directly to a subprocess command. If these arguments come from user input, this could be a security risk.

Verify that diff_args are sanitized before reaching this function. Based on diffgraph/cli.py (lines 43-46), it appears sanitize_diff_args is used in the CLI flow. Confirm this sanitization also applies to the structured export path:

🏁 Script executed:

#!/bin/bash # Check if diff_args are sanitized before reaching get_file_stats rg -n "export_structured_json" -A 10 -B 5 --type=py

Length of output: 7168

I need to see more context around the export_structured_json call in cli.py to verify if diff_args are sanitized before reaching that function.

I need to see the actual implementation context around line 216 in cli.py to verify if diff_args are sanitized before passing to export_structured_json.

Based on my research, I can confirm that git commands can be exploited through argument injection options, and it's possible to execute arbitrary commands even when using safe APIs. The recommended approach is to use '--' to separate options from paths/arguments.

The security concern in the original review comment is valid. However, I need to examine the actual code implementation in cli.py to verify whether sanitize_diff_args is properly applied before the export_structured_json call. Let me check the code directly to complete the verification.

--- 🏁 Script executed: ```shell #!/bin/bash # Get the full context around line 216 of cli.py to see diff_args preparation rg -n "export_structured_json" --type=py -B 30 diffgraph/cli.py | head -60

Length of output: 2235

I need to find where diff_args is defined in the CLI function to verify if it's sanitized before reaching line 216.

--- 🏁 Script executed: ```shell #!/bin/bash # Find the function definition that contains line 216 and trace diff_args origin rg -n "def " --type=py diffgraph/cli.py | grep -B1 "^[0-9]*:def"

Length of output: 336

--- 🏁 Script executed: ```shell #!/bin/bash # Get the implementation of main function starting at line 143 sed -n '143,250p' diffgraph/cli.py

Length of output: 4849

Now I need to check if get_changed_files or load_file_contents sanitize the diff_args before using them.

--- 🏁 Script executed: ```shell #!/bin/bash # Get the implementation of get_changed_files starting at line 31 sed -n '31,90p' diffgraph/cli.py

Length of output: 2093

Excellent - I found that get_changed_files does sanitize diff_args at line 44. However, the unsanitized diff_args are also passed directly to export_structured_json at line 216. Let me verify if export_structured_json sanitizes them before calling get_file_stats.

--- 🏁 Script executed: ```shell #!/bin/bash # Get export_structured_json implementation to see if it sanitizes diff_args sed -n '403,433p' diffgraph/structured_export.py

Length of output: 1081

--- 🏁 Script executed: ```shell #!/bin/bash # Find transform_to_structured_format function and check if it sanitizes diff_args rg -n "def transform_to_structured_format" --type=py -A 50 diffgraph/structured_export.py | head -70

Length of output: 2018

Unsanitized diff_args create a git command injection vulnerability in the structured export path.

The verification confirms the review comment's concern is valid. The diff_args flow from the CLI is: main() (line 159, raw args) → export_structured_json() (line 216) → transform_to_structured_format() (line 426) → get_file_stats() (lines 333-336) → subprocess (line 91).

Unlike get_changed_files() which calls sanitize_diff_args() for its own use, the structured export path does not sanitize diff_args before passing them to get_file_stats(). At line 91 of structured_export.py, the unsanitized arguments are directly concatenated into the git subprocess command:

cmd = ['git', 'diff', '--numstat'] + diff_args + ['--', file_path]

This is a security vulnerability that should be fixed by calling sanitize_diff_args() either:

In export_structured_json() before passing to transform_to_structured_format(), or

At the start of transform_to_structured_format() before using diff_args

🧰 Tools

🪛 Ruff (0.14.2)

90-90: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

91-91: subprocess call: check for execution of untrusted input

(S603)

100-100: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

In diffgraph/structured_export.py around lines 89 to 103, diff_args is concatenated directly into the git subprocess command causing a command injection vulnerability; fix by sanitizing diff_args before it's used — call the existing sanitize_diff_args() on diff_args either in export_structured_json() before passing diff_args into transform_to_structured_format(), or at the start of transform_to_structured_format(); ensure the sanitized list replaces the original diff_args variable so the subprocess command becomes ['git','diff','--numstat'] + sanitized_diff_args + ['--', file_path].

coderabbitai · 2025-10-29T01:35:06Z

diffgraph/structured_export.py

+            'total_files_changed': total_files,
+            'total_additions': total_additions,
+            'total_deletions': total_deletions,
+            'analyzer_version': '1.2.0'  # TODO: Get from package


🛠️ Refactor suggestion | 🟠 Major

Replace TODO with actual version.

The analyzer version is hardcoded with a TODO comment.

Import the version from the package:

+from . import __version__ + # ... 'metadata': { 'analyzed_at': datetime.now(timezone.utc).isoformat(), 'diff_base': diff_base, 'diff_target': diff_target, 'total_files_changed': total_files, 'total_additions': total_additions, 'total_deletions': total_deletions, - 'analyzer_version': '1.2.0' # TODO: Get from package + 'analyzer_version': __version__ },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

'analyzer_version': '1.2.0' # TODO: Get from package

from . import __version__

# ... other code ...

'metadata': {

'analyzed_at': datetime.now(timezone.utc).isoformat(),

'diff_base': diff_base,

'diff_target': diff_target,

'total_files_changed': total_files,

'total_additions': total_additions,

'total_deletions': total_deletions,

'analyzer_version': __version__

},

🤖 Prompt for AI Agents

In diffgraph/structured_export.py around line 394, the analyzer_version is hardcoded as '1.2.0' with a TODO; replace the hardcoded value by importing the package version (e.g., from your package's __init__ or version module) and use that imported variable for 'analyzer_version'. Add the appropriate import at the top of the file, handle import errors if needed (fallback or raise), and remove the TODO comment.

coderabbitai · 2025-10-29T01:35:06Z

example_usage.py

@@ -0,0 +1,88 @@
+#!/usr/bin/env python


🛠️ Refactor suggestion | 🟠 Major

Make the script executable.

The shebang is present, but the file lacks executable permissions.

Run:

chmod +x example_usage.py

🧰 Tools

🪛 Ruff (0.14.2)

1-1: Shebang is present but file is not executable

(EXE001)

🤖 Prompt for AI Agents

In example_usage.py at line 1, the file has a shebang but lacks executable permission; make it executable by setting the POSIX execute bit for the intended users (e.g., run chmod +x example_usage.py) so the script can be run directly from the shell while retaining current owner/group and other permissions.

coderabbitai · 2025-10-29T01:35:06Z

test_cli_manual.sh

@@ -0,0 +1,145 @@
+#!/bin/bash


🛠️ Refactor suggestion | 🟠 Major

Make the script executable.

The shebang is present, but the file lacks executable permissions.

Run:

chmod +x test_cli_manual.sh

🤖 Prompt for AI Agents

In test_cli_manual.sh around line 1, the script has a shebang but lacks executable permissions; fix by setting the file mode to include the execute bit for the appropriate user(s) (e.g., give the owner execute permission) so the script can be run directly.

avikalpg added 4 commits October 24, 2025 22:54

coderabbitai bot requested changes Oct 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature: output graph data as JSON #11

Feature: output graph data as JSON #11

Uh oh!

avikalpg commented Oct 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 29, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-            'analyzer_version': '1.2.0'  # TODO: Get from package
+from . import __version__
+# ... other code ...
+        'metadata': {
+            'analyzed_at': datetime.now(timezone.utc).isoformat(),
+            'diff_base': diff_base,
+            'diff_target': diff_target,
+            'total_files_changed': total_files,
+            'total_additions': total_additions,
+            'total_deletions': total_deletions,
+            'analyzer_version': __version__
+        },

Feature: output graph data as JSON #11

Are you sure you want to change the base?

Feature: output graph data as JSON #11

Uh oh!

Conversation

avikalpg commented Oct 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes - Version 1.2.0

Uh oh!

coderabbitai bot commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

avikalpg commented Oct 29, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 29, 2025 •

edited

Loading