A comprehensive tool for scanning GitHub repositories and building dependency graphs using Neo4j. This tool can parse import statements across multiple programming languages and create detailed dependency visualizations.
- Multi-language Support: Parses dependencies from Java, Python, JavaScript, TypeScript, Kotlin, Groovy, XML, Gradle, YAML, and more
- GitHub Integration: Directly scans GitHub repositories using the GitHub API
- Neo4j Graph Database: Stores dependency relationships in a graph database for complex queries
- Advanced Analysis: Detects circular dependencies, identifies the most dependent files, and analyzes dependency patterns
- Visualization: Generates charts and network graphs for dependency analysis
- Export Capabilities: Exports dependency data to JSON format for further analysis
- Security Analysis: Vulnerability scanning and upgrade path analysis with AI-powered recommendations
- Agentic Graph Traversal: Intelligent dependency chain analysis to find transitive vulnerabilities
- Python 3.8 or higher (Python 3.13 requires specific package versions)
- Neo4j Database (local or cloud instance)
- GitHub API token (optional, for higher rate limits)
- OpenAI API key (optional, for AI-powered upgrade analysis)
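Only the Neo4j connection is mandatory; the GitHub token and OpenAI key are optional. The minimal sketch below shows one way a script could enforce that split. It assumes the settings arrive as environment variables named as in the project's `.env` example (`NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`, `GITHUB_TOKEN`, `OPENAI_API_KEY`). `Settings` and `load_settings` are hypothetical names, not part of the tool.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical: variables that must be present for the scanner to reach Neo4j.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


@dataclass
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    github_token: Optional[str] = None    # optional: raises GitHub API rate limits
    openai_api_key: Optional[str] = None  # optional: enables AI upgrade analysis


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing Neo4j settings; treat the API keys as optional."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return Settings(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_user=env["NEO4J_USER"],
        neo4j_password=env["NEO4J_PASSWORD"],
        # Empty strings are treated the same as unset values.
        github_token=env.get("GITHUB_TOKEN") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

With python-dotenv installed, calling `load_dotenv()` before `load_settings()` copies the `.env` values into `os.environ`, so the same check covers both sources.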
If you're having compilation issues on Windows, try these approaches in order:
```bash
# Install core packages (no compilation needed)
pip install neo4j==5.15.0 requests==2.31.0 beautifulsoup4==4.12.2 pygithub==2.1.1 python-dotenv==1.0.0 tqdm==4.66.1 openai==1.57.0 aiohttp==3.10.11 packaging==24.2

# Try lxml with pre-compiled wheels
pip install lxml==5.3.0 --only-binary=:all:
```

What works with core dependencies:
- ✅ Scanning GitHub repositories
- ✅ Parsing import statements
- ✅ Storing dependency graphs in Neo4j
- ✅ Running analysis queries
- ✅ Exporting data to JSON
- ✅ Vulnerability scanning and security analysis
- ✅ AI-powered upgrade recommendations
- ❌ Creating charts and network visualizations
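Alternatively, install everything from `requirements.txt` using pre-built wheels only, or run the installer script:

```bash
pip install -r requirements.txt --only-binary=:all:
python install_dependencies.py
```

If the above options fail, install packages one by one: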
```bash
# Step 1: Core packages
pip install neo4j==5.15.0
pip install requests==2.31.0
pip install beautifulsoup4==4.12.2
pip install pygithub==2.1.1
pip install python-dotenv==1.0.0
pip install tqdm==4.66.1
pip install openai==1.57.0
pip install aiohttp==3.10.11
pip install packaging==24.2

# Step 2: Try lxml with different approaches
pip install lxml==5.3.0 --only-binary=:all:
# If that fails, try (quoted so the shell does not treat ">" as a redirect):
pip install "lxml>=5.0.0" --only-binary=:all:
# Or skip lxml entirely (beautifulsoup4 will still work)

# Step 3: Visualization packages (optional)
pip install networkx==3.2.1 --only-binary=:all:
pip install matplotlib==3.9.3 --only-binary=:all:
pip install seaborn==0.13.0 --only-binary=:all:
pip install pandas==2.2.3 --only-binary=:all:
```

If installation still fails, try one of these approaches:

- Install Visual Studio Build Tools:
  - Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
  - Install with the "C++ build tools" workload
- Use Conda instead of pip:
  ```bash
  conda install numpy pandas matplotlib seaborn lxml
  pip install neo4j requests beautifulsoup4 pygithub python-dotenv tqdm openai aiohttp packaging
  ```
- Use a different Python version:
  - Try Python 3.9 or 3.10 (better wheel support)
  - Python 3.13 requires specific package versions (see requirements.txt)
- Skip problematic packages:
  - The core functionality works without the visualization packages
  - You can add them later once the build tools are installed
The scanner will still work without lxml. BeautifulSoup4 can parse HTML/XML without it, just slightly slower.
- Install Neo4j Desktop or use Neo4j AuraDB (cloud)
- Create a new database
- Note down the connection details (URI, username, password)
Create a .env file in the project root:
```
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
GITHUB_TOKEN=your_github_token
OPENAI_API_KEY=your_openai_api_key
```

Scan the Spring Framework repository (default):
```bash
python main.py

# Scan a different repository
python main.py --repo-owner microsoft --repo-name vscode

# Clear database before scanning
python main.py --clear-db

# Export graph data to JSON
python main.py --export-graph

# Run analysis queries after scanning
python main.py --analyze

# Limit number of files for testing
python main.py --max-files 100

# Combine multiple options
python main.py --clear-db --export-graph --analyze
```

Generate visualizations from existing graph data:
```bash
# Create all visualizations
python visualization.py

# Specify custom graph data file and output directory
python visualization.py --graph-data my_graph.json --output-dir my_visualizations
```

The scanner supports the following file extensions:
- Java: `.java`
- Python: `.py`
- JavaScript: `.js`, `.jsx`
- TypeScript: `.ts`, `.tsx`
- Kotlin: `.kt`
- Groovy: `.groovy`
- XML: `.xml`
- Gradle: `.gradle`
- Properties: `.properties`
- YAML: `.yml`, `.yaml`
- Markdown: `.md`
- Text: `.txt`
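The extension list above amounts to a lookup table from file suffix to language. A minimal sketch, assuming a hypothetical `EXTENSION_LANGUAGE` mapping and `detect_language` helper (the tool's own mapping may be organized differently):

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping that mirrors the supported-extension list.
EXTENSION_LANGUAGE = {
    ".java": "Java",
    ".py": "Python",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".kt": "Kotlin",
    ".groovy": "Groovy",
    ".xml": "XML",
    ".gradle": "Gradle",
    ".properties": "Properties",
    ".yml": "YAML", ".yaml": "YAML",
    ".md": "Markdown",
    ".txt": "Text",
}


def detect_language(path: str) -> Optional[str]:
    """Return the language for a repository path, or None if it is unsupported."""
    # GitHub paths use forward slashes, so PurePosixPath parses them on any OS.
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_LANGUAGE.get(suffix)
```

Files whose suffix is not in the table (for example `.png`) return `None` and can simply be skipped.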
The following directories are automatically excluded from scanning:
`.git`, `node_modules`, `target`, `build`, `dist`, `.gradle`, `out`, `bin`, `obj`, `__pycache__`, `.pytest_cache`, `.idea`, `.vscode`, `coverage`, `docs`, `documentation`
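A directory filter like this can be applied to every path returned from the repository tree. The sketch below is hypothetical (`EXCLUDED_DIRS` and `is_excluded` are illustrative names) and checks only the directory components of a path, so a file whose own name matches an entry is not dropped:

```python
from pathlib import PurePosixPath

# Hypothetical set mirroring the excluded-directory list.
EXCLUDED_DIRS = frozenset({
    ".git", "node_modules", "target", "build", "dist", ".gradle", "out", "bin",
    "obj", "__pycache__", ".pytest_cache", ".idea", ".vscode", "coverage",
    "docs", "documentation",
})


def is_excluded(path: str) -> bool:
    """True if any directory on the path is in EXCLUDED_DIRS."""
    # parts[:-1] drops the file name and keeps only the directories.
    return any(part in EXCLUDED_DIRS for part in PurePosixPath(path).parts[:-1])
```

Using a `frozenset` keeps each membership check constant-time, which matters when filtering tens of thousands of paths.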
The tool uses regex patterns to detect imports in different languages:
- Java: `import`, `import static`, and `package` statements
- Python: `import` and `from ... import` statements
- JavaScript/TypeScript: `import` and `require` statements
- Kotlin/Groovy: `import` and `package` statements
- XML: Namespace declarations
- Gradle: Dependency declarations
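As an illustration of regex-based import detection, the sketch below handles Java `import`/`import static` lines and Python `import`/`from ... import` lines, and reports 1-based line numbers (the kind of value stored in the `line_number` property). The patterns and the `find_imports` helper are assumptions for illustration; the tool's actual patterns are defined in `config.py` and may differ.

```python
import re
from typing import List, Tuple

# Illustrative patterns; [ \t] is used instead of \s so a match never spans lines.
JAVA_IMPORT = re.compile(
    r"^[ \t]*import[ \t]+(?:static[ \t]+)?([\w.]+(?:\.\*)?)[ \t]*;", re.MULTILINE
)
PYTHON_IMPORT = re.compile(
    r"^[ \t]*(?:from[ \t]+([\w.]+)[ \t]+import\b|import[ \t]+([\w.]+))", re.MULTILINE
)


def find_imports(source: str, pattern: re.Pattern) -> List[Tuple[int, str]]:
    """Return (1-based line number, imported name) pairs found by `pattern`."""
    results = []
    for match in pattern.finditer(source):
        # Exactly one capture group is filled, depending on which branch matched.
        name = next(group for group in match.groups() if group)
        line_number = source.count("\n", 0, match.start()) + 1
        results.append((line_number, name))
    return results
```

Anchoring each pattern at the start of a line (`^` with `re.MULTILINE`) avoids most false positives from the word "import" appearing inside strings or comments mid-line, although a regex approach cannot rule them out entirely.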
The dependency graph stores these nodes and relationships:

- File node (`:File`)
  - Properties: `path`, `name`, `extension`, `language`, `size`, `last_modified`
- Module node (`:Module`)
  - Properties: `name`, `type`
- DEPENDS_ON (`(:File)-[:DEPENDS_ON]->(:File)`)
  - Properties: `type`, `import_statement`, `line_number`
- DEPENDS_ON (`(:File)-[:DEPENDS_ON]->(:Module)`)
  - Properties: `type`, `import_statement`, `line_number`
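The schema above maps naturally onto Cypher `MERGE` statements. The sketch below builds one parameterized statement per `DEPENDS_ON` edge. It assumes `File` nodes are keyed by `path` and `Module` nodes by `name`; `depends_on_query` and `TARGET_KEYS` are hypothetical names, not the tool's API.

```python
from typing import Any, Dict, Tuple

# Hypothetical: the property used to MERGE each target label.
TARGET_KEYS = {"File": "path", "Module": "name"}


def depends_on_query(
    src_path: str,
    target_label: str,
    target_key: str,
    dep_type: str,
    import_statement: str,
    line_number: int,
) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher statement for one DEPENDS_ON edge."""
    if target_label not in TARGET_KEYS:
        # Labels cannot be query parameters, so only whitelisted labels are interpolated.
        raise ValueError(f"Unsupported target label: {target_label}")
    key = TARGET_KEYS[target_label]
    query = (
        "MERGE (src:File {path: $src_path}) "
        f"MERGE (dst:{target_label} {{{key}: $target}}) "
        "MERGE (src)-[r:DEPENDS_ON]->(dst) "
        "SET r.type = $dep_type, r.import_statement = $import_statement, "
        "r.line_number = $line_number"
    )
    params = {
        "src_path": src_path,
        "target": target_key,
        "dep_type": dep_type,
        "import_statement": import_statement,
        "line_number": line_number,
    }
    return query, params
```

With the official Neo4j Python driver, the result could be executed as `session.run(query, params)`. Merging on the node pair keeps a single edge per source and target, with the most recent import statement overwriting earlier ones; keying the relationship on `line_number` instead would keep one edge per import line.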
The tool provides several built-in analysis queries:
- Circular Dependencies: Detects circular dependency chains
- Most Dependent Files: Files with the most outgoing dependencies
- Most Depended On Files: Files with the most incoming dependencies
- External Dependencies: Most commonly used external modules
- Language Distribution: File count by programming language
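The analysis queries run inside Neo4j. To illustrate what circular-dependency detection involves, the sketch below runs a depth-first search over an in-memory adjacency map (file path to the paths it depends on) and reports the cycle closed by each back edge it meets. It does not enumerate every elementary cycle, and `find_cycles` is a hypothetical helper, not the tool's implementation.

```python
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the cycle closed by each back edge found during DFS."""
    color: Dict[str, int] = {}
    path: List[str] = []
    cycles: List[List[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the path, so path[dep:] + dep is a cycle.
                start = path.index(dep)
                cycles.append(path[start:] + [dep])
            elif state == WHITE:
                visit(dep)
        path.pop()
        color[node] = BLACK

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return cycles
```

The recursive form keeps the sketch short; for very deep dependency chains an explicit stack would avoid Python's recursion limit.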
The `--export-graph` option creates a JSON file with:

```json
{
  "files": [...],
  "internal_dependencies": [...],
  "modules": [...],
  "external_dependencies": [...],
  "statistics": {...}
}
```

The visualization module creates:
- `language_distribution.png`: Pie chart of files by language
- `dependency_types.png`: Bar chart of dependency types
- `dependency_network.png`: Network graph of dependencies
- `top_dependencies.png`: Top most depended-on files
- `summary_statistics.txt`: Text summary of statistics
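The exported JSON can also be consumed directly. The sketch below computes the same per-language counts that the language distribution chart is based on. It assumes each entry in `"files"` carries a `"language"` field (mirroring the `language` property on `:File` nodes); that field name and the `language_distribution` helper are assumptions, not a documented export format.

```python
import json
from collections import Counter
from typing import Dict


def language_distribution(graph_json: str) -> Dict[str, int]:
    """Count exported files per language, most common first."""
    data = json.loads(graph_json)
    # Files without a language value are grouped under "unknown".
    counts = Counter(f.get("language") or "unknown" for f in data.get("files", []))
    return dict(counts.most_common())
```

For example, `language_distribution(open("my_graph.json").read())` would summarize an exported file such as the one passed to `visualization.py --graph-data`.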
```bash
# Full analysis of Spring Framework
python main.py --clear-db --export-graph --analyze
```

This will:
- Clear the Neo4j database
- Scan the Spring Framework repository
- Parse all Java, XML, and Gradle files
- Create dependency relationships
- Export data to JSON
- Run analysis queries
```bash
# Analyze a different repository
python main.py --repo-owner facebook --repo-name react --clear-db --export-graph

# Create visualizations from existing data
python visualization.py --output-dir spring_analysis
```

Troubleshooting common issues:

- Neo4j Connection Error:
  - Verify that Neo4j is running
  - Check the connection details in the `.env` file
  - Ensure your firewall allows the connection
- GitHub API Rate Limits:
  - Add a GitHub token to the `.env` file
  - Authenticated requests get a much higher limit (5,000 requests per hour instead of 60 for unauthenticated requests)
- Memory Issues:
  - Use `--max-files` to limit processing
  - Increase system memory if needed
- Import Detection Issues:
  - Check that the file extensions are supported
  - Verify the import patterns in `config.py`
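When the rate limit is hit, GitHub reports the remaining quota and the reset time (Unix epoch seconds) in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. A minimal sketch of turning those headers into a pause duration; `seconds_until_reset` is a hypothetical helper, and whether the scanner already backs off this way is not stated here.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next GitHub API call (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

With `requests`, `response.headers` can be passed in directly, since it is a case-insensitive mapping.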
Performance tips:

- Use `--max-files` for testing with large repositories
- Run Neo4j on an SSD for better performance
- Increase Neo4j memory settings for large graphs
- Use GitHub token to avoid rate limiting
To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Spring Framework team for the example repository
- Neo4j team for the graph database
- GitHub for the API access
- Python community for the excellent libraries used
The tool includes advanced security analysis capabilities:
- Vulnerability Scanner: Checks packages against known security advisories
- Upgrade Analyzer: AI-powered recommendations for safe upgrade paths
- Agentic Graph Traversal: Intelligent analysis of transitive dependencies
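Finding transitive vulnerabilities means following `DEPENDS_ON` edges beyond a file's direct imports. The sketch below illustrates that core step with a breadth-first search over an in-memory adjacency map, given a set of module names already flagged by a vulnerability check. It returns the shortest dependency chain to each reachable vulnerable module. `find_vulnerable_chains` is a hypothetical helper, not the tool's agentic implementation.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Optional


def find_vulnerable_chains(
    graph: Dict[str, List[str]], start: str, vulnerable: AbstractSet[str]
) -> Dict[str, List[str]]:
    """Map each vulnerable node reachable from `start` to its shortest chain."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    chains: Dict[str, List[str]] = {}
    while queue:
        node = queue.popleft()
        if node in vulnerable and node != start:
            # Walk parent links back to the start to rebuild the chain.
            chain, current = [], node
            while current is not None:
                chain.append(current)
                current = parents[current]
            chains[node] = chain[::-1]
        for dep in graph.get(node, []):
            if dep not in parents:  # first visit is the shortest path in BFS
                parents[dep] = node
                queue.append(dep)
    return chains
```

BFS is used because the first time a node is reached is along a shortest path, so each reported chain is the most direct route from the file to the vulnerable package.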
```bash
# Run vulnerability scan on current graph
python vulnerability_scanner.py

# Analyze upgrade options with AI
python upgrade_analyzer.py

# Run comprehensive security tests
python test_system_vulnerability_analysis.py
```

Dependencies:

- neo4j: Graph database driver
- requests: HTTP library for API calls
- beautifulsoup4: HTML/XML parsing
- pygithub: GitHub API wrapper
- python-dotenv: Environment variable management
- tqdm: Progress bars
- openai: AI-powered analysis
- aiohttp: Asynchronous HTTP client
- packaging: Version parsing and comparison
- lxml: Fast XML parsing
- networkx: Network analysis
- matplotlib: Plotting library
- seaborn: Statistical visualization
- pandas: Data analysis
Python version compatibility:

- Python 3.8-3.12: Use standard package versions
- Python 3.13: Requires specific versions (see requirements.txt):
  - `pandas>=2.2.0`
  - `lxml>=5.0.0`
  - `matplotlib>=3.9.0`
For issues and questions:
- Check the troubleshooting section
- Review the configuration options
- Open an issue on GitHub
- Check the Neo4j documentation for database-specific questions
- For security features, see the vulnerability scanning documentation