GitHub Repository Dependency Scanner

A comprehensive tool for scanning GitHub repositories and building dependency graphs using Neo4j. This tool can parse import statements across multiple programming languages and create detailed dependency visualizations.

Features

Multi-language Support: Parses dependencies from Java, Python, JavaScript, TypeScript, Kotlin, Groovy, XML, Gradle, YAML, and more
GitHub Integration: Directly scans GitHub repositories using the GitHub API
Neo4j Graph Database: Stores dependency relationships in a graph database for complex queries
Advanced Analysis: Detects circular dependencies, identifies most dependent files, and analyzes dependency patterns
Visualization: Generates charts and network graphs for dependency analysis
Export Capabilities: Exports dependency data to JSON format for further analysis
Security Analysis: Vulnerability scanning and upgrade path analysis with AI-powered recommendations
Agentic Graph Traversal: Intelligent dependency chain analysis to find transitive vulnerabilities

Prerequisites

Python 3.8 or higher (Python 3.13 requires specific package versions)
Neo4j Database (local or cloud instance)
GitHub API token (optional, for higher rate limits)
OpenAI API key (optional, for AI-powered upgrade analysis)

Installation

Windows Installation (Handling Compilation Issues)

If you're having compilation issues on Windows, try these approaches in order:

Option 1: Install Core Dependencies Only (Recommended)

# Install core packages (no compilation needed)
pip install neo4j==5.15.0 requests==2.31.0 beautifulsoup4==4.12.2 pygithub==2.1.1 python-dotenv==1.0.0 tqdm==4.66.1 openai==1.57.0 aiohttp==3.10.11 packaging==24.2

# Try lxml with pre-compiled wheels
pip install lxml==5.3.0 --only-binary=:all:

What works with core dependencies:

✅ Scanning GitHub repositories
✅ Parsing import statements
✅ Storing dependency graphs in Neo4j
✅ Running analysis queries
✅ Exporting data to JSON
✅ Vulnerability scanning and security analysis
✅ AI-powered upgrade recommendations
❌ Creating charts and network visualizations

Option 2: Full Installation with Pre-compiled Wheels

pip install -r requirements.txt --only-binary=:all:

Option 3: Use the Installation Script

python install_dependencies.py

Option 4: Manual Step-by-Step Installation

If the above options fail, install packages one by one:

# Step 1: Core packages
pip install neo4j==5.15.0
pip install requests==2.31.0
pip install beautifulsoup4==4.12.2
pip install pygithub==2.1.1
pip install python-dotenv==1.0.0
pip install tqdm==4.66.1
pip install openai==1.57.0
pip install aiohttp==3.10.11
pip install packaging==24.2

# Step 2: Try lxml with different approaches
pip install lxml==5.3.0 --only-binary=:all:
# If that fails, try (quoted so the shell does not treat ">" as a redirect):
pip install "lxml>=5.0.0" --only-binary=:all:
# Or skip lxml entirely (beautifulsoup4 will still work)

# Step 3: Visualization packages (optional)
pip install networkx==3.2.1 --only-binary=:all:
pip install matplotlib==3.9.3 --only-binary=:all:
pip install seaborn==0.13.0 --only-binary=:all:
pip install pandas==2.2.3 --only-binary=:all:

Troubleshooting

If you get compilation errors:

Install Visual Studio Build Tools:
- Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
- Install with "C++ build tools" workload

Use Conda instead of pip:

conda install numpy pandas matplotlib seaborn lxml
pip install neo4j requests beautifulsoup4 pygithub python-dotenv tqdm

Use a different Python version:
- Try Python 3.9 or 3.10 (better wheel support)
- Python 3.13 requires specific package versions (see requirements.txt)
Skip problematic packages:
- The core functionality works without visualization packages
- You can add them later when you have the build tools

If lxml fails:

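The scanner will still work without lxml. BeautifulSoup4 can fall back to Python's built-in html.parser, which is slower and parses XML files with HTML rules; BeautifulSoup's dedicated XML parser is only available when lxml is installed.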

Set up Neo4j

Install Neo4j Desktop or use Neo4j AuraDB (cloud)
Create a new database
Note down the connection details (URI, username, password)

Configure environment variables (optional)

Create a .env file in the project root:

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
GITHUB_TOKEN=your_github_token
OPENAI_API_KEY=your_openai_api_key
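
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them from the process environment and falls back to the local defaults shown above. The `Settings` class and `load_settings` function are hypothetical names, not part of this project; in practice the values would first be loaded from `.env` with python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Connection settings (hypothetical container, not the project's own)."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: Optional[str]
    github_token: Optional[str]
    openai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the .env example, using assumed defaults."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        # Empty strings are treated the same as missing values.
        neo4j_password=env.get("NEO4J_PASSWORD") or None,
        github_token=env.get("GITHUB_TOKEN") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.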

Usage

Basic Usage

Scan the Spring Framework repository (default):

python main.py

Advanced Usage

# Scan a different repository
python main.py --repo-owner microsoft --repo-name vscode

# Clear database before scanning
python main.py --clear-db

# Export graph data to JSON
python main.py --export-graph

# Run analysis queries after scanning
python main.py --analyze

# Limit number of files for testing
python main.py --max-files 100

# Combine multiple options
python main.py --clear-db --export-graph --analyze
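
For readers who want to see how such flags fit together, the following is a rough sketch of an `argparse` parser that accepts the options listed above. It is not the actual contents of `main.py`; in particular, the default owner and name (`spring-projects` / `spring-framework`) are assumptions inferred from the statement that Spring Framework is the default repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(
        description="Scan a GitHub repository into a Neo4j dependency graph"
    )
    # Assumed defaults: the README only says Spring Framework is the default.
    parser.add_argument("--repo-owner", default="spring-projects")
    parser.add_argument("--repo-name", default="spring-framework")
    parser.add_argument("--clear-db", action="store_true",
                        help="clear the database before scanning")
    parser.add_argument("--export-graph", action="store_true",
                        help="export graph data to JSON")
    parser.add_argument("--analyze", action="store_true",
                        help="run analysis queries after scanning")
    parser.add_argument("--max-files", type=int, default=None,
                        help="limit the number of files processed")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

The boolean flags use `store_true`, so they are off unless given, which matches the way the examples combine them freely.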

Visualization

Generate visualizations from existing graph data:

# Create all visualizations
python visualization.py

# Specify custom graph data file and output directory
python visualization.py --graph-data my_graph.json --output-dir my_visualizations

Configuration

Supported File Types

The scanner supports the following file extensions:

Java: .java
Python: .py
JavaScript: .js, .jsx
TypeScript: .ts, .tsx
Kotlin: .kt
Groovy: .groovy
XML: .xml
Gradle: .gradle
Properties: .properties
YAML: .yml, .yaml
Markdown: .md
Text: .txt

Excluded Directories

The following directories are automatically excluded from scanning:

.git, node_modules, target, build, dist
.gradle, out, bin, obj, __pycache__
.pytest_cache, .idea, .vscode, coverage
docs, documentation
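
The two lists above (supported extensions and excluded directories) amount to a simple path filter. A minimal sketch of that filter follows; the names `SUPPORTED_EXTENSIONS`, `EXCLUDED_DIRS`, and `should_scan` are illustrative and may not match what the project's configuration module uses.

```python
from pathlib import PurePosixPath

# Values copied from the lists above; the constant names are assumptions.
SUPPORTED_EXTENSIONS = {
    ".java", ".py", ".js", ".jsx", ".ts", ".tsx", ".kt", ".groovy",
    ".xml", ".gradle", ".properties", ".yml", ".yaml", ".md", ".txt",
}
EXCLUDED_DIRS = {
    ".git", "node_modules", "target", "build", "dist", ".gradle", "out",
    "bin", "obj", "__pycache__", ".pytest_cache", ".idea", ".vscode",
    "coverage", "docs", "documentation",
}


def should_scan(path: str) -> bool:
    """Return True if a repository-relative path passes both filters."""
    p = PurePosixPath(path)
    # Only directory components are checked, not the file name itself.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    return p.suffix.lower() in SUPPORTED_EXTENSIONS
```

Checking only the directory components means a file such as `build.gradle` is still scanned even though a directory named `build` would be skipped.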

Import Patterns

The tool uses regex patterns to detect imports in different languages:

Java: import, import static, package statements
Python: import, from ... import statements
JavaScript/TypeScript: import, require statements
Kotlin/Groovy: import, package statements
XML: Namespace declarations
Gradle: Dependency declarations
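
To make the regex approach concrete, here is a small sketch covering Python and Java only. The patterns are simplified illustrations written for this example, not the ones shipped with the tool, and they deliberately ignore edge cases such as `import a, b` or imports inside comments and strings.

```python
import re
from typing import List, Tuple

# Simplified, illustrative patterns (assumptions, not the tool's own).
IMPORT_PATTERNS = {
    "python": [
        re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE),
        re.compile(r"^\s*from\s+([\w.]+)\s+import\b", re.MULTILINE),
    ],
    "java": [
        re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;",
                   re.MULTILINE),
    ],
}


def find_imports(source: str, language: str) -> List[Tuple[int, str]]:
    """Return (line_number, imported_name) pairs sorted by line number."""
    found = []
    for pattern in IMPORT_PATTERNS.get(language, []):
        for match in pattern.finditer(source):
            # Count newlines before the captured name to get its line.
            line_number = source.count("\n", 0, match.start(1)) + 1
            found.append((line_number, match.group(1)))
    return sorted(found)
```

The line number is computed from the captured name rather than the start of the match, because `\s*` can also consume blank lines that precede the statement.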

Neo4j Database Schema

Nodes

File Node (:File)
- Properties: path, name, extension, language, size, last_modified
Module Node (:Module)
- Properties: name, type

Relationships

DEPENDS_ON (:File)-[:DEPENDS_ON]->(:File)
- Properties: type, import_statement, line_number
DEPENDS_ON (:File)-[:DEPENDS_ON]->(:Module)
- Properties: type, import_statement, line_number
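
One way to write a dependency into this schema is a parameterized Cypher `MERGE` statement. The sketch below only builds the query text and its parameters; the function name `dependency_statement` and the exact query shape are assumptions for illustration, not the project's actual database code.

```python
from typing import Any, Dict, Tuple


def dependency_statement(
    source_path: str,
    target: str,
    *,
    internal: bool,
    import_statement: str,
    line_number: int,
    dep_type: str = "import",
) -> Tuple[str, Dict[str, Any]]:
    """Build a Cypher query and parameters for one DEPENDS_ON edge."""
    # Internal targets are :File nodes keyed by path; external ones are
    # :Module nodes keyed by name, as in the schema above.
    label, key = ("File", "path") if internal else ("Module", "name")
    query = (
        "MERGE (src:File {path: $source_path}) "
        f"MERGE (dst:{label} {{{key}: $target}}) "
        "MERGE (src)-[r:DEPENDS_ON {import_statement: $import_statement, "
        "line_number: $line_number}]->(dst) "
        "SET r.type = $type"
    )
    params = {
        "source_path": source_path,
        "target": target,
        "import_statement": import_statement,
        "line_number": line_number,
        "type": dep_type,
    }
    return query, params
```

With the official Neo4j Python driver, the returned pair could be passed to `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps repeated scans from duplicating nodes and relationships.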

Analysis Queries

The tool provides several built-in analysis queries:

Circular Dependencies: Detects circular dependency chains
Most Dependent Files: Files with the most outgoing dependencies
Most Depended On Files: Files with the most incoming dependencies
External Dependencies: Most commonly used external modules
Language Distribution: File count by programming language
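
The built-in queries run against Neo4j, but the underlying ideas can be shown on a plain edge list. The sketch below is an in-memory illustration, not the tool's implementation: `find_cycle` uses a depth-first search to return one circular chain, and `most_depended_on` counts incoming edges. Both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (source file, target file)


def find_cycle(edges: List[Edge]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = defaultdict(int)
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GREY:
                # Back edge: the cycle is the path from nxt to here.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def most_depended_on(edges: List[Edge], top: int = 5) -> List[Tuple[str, int]]:
    """Rank targets by their number of incoming dependencies."""
    return Counter(dst for _, dst in edges).most_common(top)
```

The recursive search is fine for small graphs; for a repository with very deep dependency chains an iterative version, or the database query itself, would be the safer choice.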

Output Files

JSON Export

The --export-graph option creates a JSON file with:

{
  "files": [...],
  "internal_dependencies": [...],
  "modules": [...],
  "external_dependencies": [...],
  "statistics": {...}
}
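
A quick way to sanity-check an export is to load it and count the entries under each top-level key. The sketch below assumes only the key names shown above; the shape of the individual entries is not documented here, so the code does not look inside them. `load_export` and `summarize_export` are hypothetical helper names.

```python
import json
from typing import Any, Dict

# Top-level list-valued keys from the export format above.
EXPECTED_KEYS = (
    "files",
    "internal_dependencies",
    "modules",
    "external_dependencies",
)


def load_export(path: str) -> Dict[str, Any]:
    """Read an exported graph file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_export(graph: Dict[str, Any]) -> Dict[str, int]:
    """Count entries per top-level key; missing keys count as zero."""
    return {key: len(graph.get(key, [])) for key in EXPECTED_KEYS}
```

For example, `summarize_export(load_export("my_graph.json"))` would return a dictionary of four counts, where `my_graph.json` stands in for whatever file the export step produced.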

Visualizations

The visualization module creates:

language_distribution.png: Pie chart of files by language
dependency_types.png: Bar chart of dependency types
dependency_network.png: Network graph of dependencies
top_dependencies.png: Top most depended on files
summary_statistics.txt: Text summary of statistics

Examples

Spring Framework Analysis

# Full analysis of Spring Framework
python main.py --clear-db --export-graph --analyze

This will:

Clear the Neo4j database
Scan the Spring Framework repository
Parse all Java, XML, and Gradle files
Create dependency relationships
Export data to JSON
Run analysis queries

Custom Repository

# Analyze a different repository
python main.py --repo-owner facebook --repo-name react --clear-db --export-graph

Visualization Only

# Create visualizations from existing data
python visualization.py --output-dir spring_analysis

Troubleshooting

Common Issues

Neo4j Connection Error:
- Verify Neo4j is running
- Check connection details in .env file
- Ensure firewall allows connection
GitHub API Rate Limits:
- Add GitHub token to .env file
- Token provides higher rate limits
Memory Issues:
- Use --max-files to limit processing
- Increase system memory if needed
Import Detection Issues:
- Check file extensions are supported
- Verify import patterns in config.py

Performance Tips

Use --max-files for testing with large repositories
Run Neo4j on SSD for better performance
Increase Neo4j memory settings for large graphs
Use GitHub token to avoid rate limiting
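
For context on the last tip: GitHub's REST API documents a much lower hourly request limit for unauthenticated clients (60 requests) than for token-authenticated ones (5,000 requests). This project relies on PyGithub, which handles authentication itself, so the following is only a sketch of what an authenticated request header looks like; `github_headers` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def github_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers, adding a token only when one is configured."""
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests are counted against the higher limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Leaving the `Authorization` header out entirely when no token is set keeps anonymous scanning working, just at the lower limit.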

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Spring Framework team for the example repository
Neo4j team for the graph database
GitHub for the API access
Python community for the excellent libraries used

Security Features

Vulnerability Scanning

The tool includes advanced security analysis capabilities:

Vulnerability Scanner: Checks packages against known security advisories
Upgrade Analyzer: AI-powered recommendations for safe upgrade paths
Agentic Graph Traversal: Intelligent analysis of transitive dependencies
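
The idea behind transitive analysis can be illustrated without Neo4j or an AI model: starting from a vulnerable package, walk the dependency edges backwards to find everything that reaches it. The sketch below is a plain breadth-first search over an edge list, offered as an approximation of the concept rather than the project's traversal logic; `affected_by` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (dependent, dependency)


def affected_by(edges: List[Edge], vulnerable: str) -> Dict[str, List[str]]:
    """Map each affected node to one shortest chain ending at `vulnerable`."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        dependents[dst].add(src)  # reverse the edge direction

    chains: Dict[str, List[str]] = {vulnerable: [vulnerable]}
    queue = deque([vulnerable])
    while queue:
        node = queue.popleft()
        # Sorted for deterministic output when several parents exist.
        for parent in sorted(dependents.get(node, ())):
            if parent not in chains:
                chains[parent] = [parent] + chains[node]
                queue.append(parent)

    del chains[vulnerable]  # report only the nodes that depend on it
    return chains
```

Each returned chain reads from the affected file down to the vulnerable package, which is the path a reviewer would follow when deciding where an upgrade needs to be tested.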

Running Security Analysis

# Run vulnerability scan on current graph
python vulnerability_scanner.py

# Analyze upgrade options with AI
python upgrade_analyzer.py

# Run comprehensive security tests
python test_system_vulnerability_analysis.py

Dependencies

Core Dependencies

neo4j: Graph database driver
requests: HTTP library for API calls
beautifulsoup4: HTML/XML parsing
pygithub: GitHub API wrapper
python-dotenv: Environment variable management
tqdm: Progress bars
openai: AI-powered analysis
aiohttp: Asynchronous HTTP client
packaging: Version parsing and comparison

Optional Dependencies (for faster XML parsing and visualization)

lxml: Fast XML parsing
networkx: Network analysis
matplotlib: Plotting library
seaborn: Statistical visualization
pandas: Data analysis

Python Version Compatibility

Python 3.8-3.12: Use standard package versions
Python 3.13: Requires specific versions (see requirements.txt):
- pandas>=2.2.0
- lxml>=5.0.0
- matplotlib>=3.9.0

Support

For issues and questions:

Check the troubleshooting section
Review the configuration options
Open an issue on GitHub
Check the Neo4j documentation for database-specific questions
For security features, see the vulnerability scanning documentation

