Advanced Reddit Data Collection & Analysis Platform
A modern, cross-platform desktop application for intelligent Reddit data scraping with cloud integration, sentiment analysis, and real-time monitoring.
SupaScrapeR is a professional-grade Reddit data collection tool built with Electron and React. It provides researchers, analysts, and developers with powerful tools to gather, analyze, and store Reddit data at scale through an intuitive graphical interface.
Centralized dashboard with real-time metrics and scraping controls
Use Cases:
- Market research and competitive analysis
- Social sentiment tracking and brand monitoring
- Academic research and data science projects
- Content strategy and trend analysis
- Community engagement insights
Technology Stack:
- Frontend: React 18, TypeScript, Tailwind CSS
- Backend: Electron 28, Node.js
- Database: Supabase (PostgreSQL)
- Reddit API: PRAW (Python Reddit API Wrapper)
- Analytics: VADER Sentiment Analysis
Multiple Scraping Strategies
- Keyword Search: Target specific topics with automated trending keyword discovery via Google Trends integration
- DeepScan Mode: Analyze high-engagement posts based on comment count and activity metrics
- Hybrid Mode: Combine both approaches for comprehensive data coverage
- Smart Filtering: Built-in deduplication prevents redundant data collection
Configurable Performance
- Adjustable batch sizes for memory optimization
- Real-time progress tracking with detailed metrics
- Automatic retry mechanisms for network failures
- Rate limiting to comply with Reddit API guidelines
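The retry and rate-limit handling happens inside the app, but as a rough illustration of the pattern the list above describes, here is a minimal exponential-backoff sketch in Python (the function names are hypothetical and not part of SupaScrapeR's API):

```python
import time

def fetch_with_retry(fetch_fn, max_attempts=4, base_delay=2.0):
    """Call fetch_fn, retrying on failure with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_fn()
        except Exception as exc:  # e.g. network errors or HTTP 429 from Reddit
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```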
Centralized User Management
- Secure authentication system with Supabase Auth
- Encrypted credential storage
- Cross-device profile synchronization
- Community preset sharing
Personal Data Storage
- Each user maintains their own Supabase database instance
- Full control over data retention and access
- Automatic cloud backup and synchronization
- Export capabilities for data portability
Sentiment Analysis
- VADER-based sentiment scoring for posts and comments
- Aggregate sentiment trends across time periods
- Emotion detection and classification
- Custom sentiment threshold configuration
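Because scoring is VADER-based, the stored sentiment values behave like VADER's standard compound score. You can reproduce a score outside the app with the `vaderSentiment` package (this mirrors the technique, not SupaScrapeR's internal code):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("This new EV is honestly fantastic, but the price hurts.")
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
# compound ranges from -1 (most negative) to +1 (most positive)
```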
Comprehensive analytics with collection trends and performance metrics
Real-Time Monitoring
- Live collection statistics and progress metrics
- System resource usage monitoring (CPU, RAM)
- Success/failure rate tracking
- Historical performance data
Cross-Platform Desktop App
- Native performance on Windows, macOS, and Linux
- Electron-based architecture for consistent experience
- Automatic updates via GitHub releases
- Offline credential management
Customizable Interface
- Dark and light theme support
- Adjustable font sizes for accessibility
- Collapsible widgets and dashboard customization
- Discord Rich Presence integration (optional)
Community Features
- Share custom scraping presets with other users
- Download community-created configurations
- Rate and report presets
- Preset versioning and updates
Browse and download presets shared by the community
End-to-End Encryption
- All user credentials encrypted at rest
- AES-256 encryption for sensitive data
- Secure key derivation using user authentication
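SupaScrapeR's own encryption code is not shown here, but the scheme described (AES-256 with a key derived from user authentication) looks roughly like this Python sketch using the `cryptography` package. Treat it as an illustration of the technique, not the app's actual implementation:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_credential(plaintext: str, passphrase: str) -> dict:
    """Derive a 256-bit key from a passphrase and encrypt with AES-GCM."""
    salt = os.urandom(16)
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
    key = kdf.derive(passphrase.encode())
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode(), None)
    # salt and nonce are not secret; store them alongside the ciphertext
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}
```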
Access Control
- Row-Level Security (RLS) via Supabase
- User-specific data isolation
- Configurable credential persistence
- Automatic log cleanup options
Minimum:
- OS: Windows 10, macOS 10.14 (Mojave), Ubuntu 20.04 or equivalent
- RAM: 4GB
- Storage: 500MB free disk space
- Internet: Stable broadband connection
Recommended:
- OS: Windows 11, macOS 12 (Monterey), Ubuntu 22.04
- RAM: 8GB or more
- Storage: 1GB free disk space
- Internet: High-speed connection (for faster data collection)
For Development:
- Node.js: 16.x or higher
- Python: 3.10 exactly (3.11 and newer are not supported due to spaCy compatibility)
- Git: Latest version
- Package Manager: npm 8+ or yarn 1.22+
Recommended for most users
Download the latest stable release from the Releases Page.
Requirements:
- Windows 10 or later
- 4GB RAM minimum (8GB recommended)
- 500MB free disk space
Installation Steps:
- Download SupaScrapeR-Setup-x.x.x.exe
- Run the installer
- If Windows SmartScreen appears:
  - Click "More info"
  - Click "Run anyway"
- Follow the installation wizard
- Launch SupaScrapeR from the Start Menu or desktop shortcut
Note: The SmartScreen warning appears because the application is not code-signed. The software is safe when downloaded from the official GitHub releases.
Requirements:
- macOS 10.14 (Mojave) or later
- 4GB RAM minimum (8GB recommended)
- 500MB free disk space
Installation Steps:
- Download SupaScrapeR-x.x.x.dmg
- Open the downloaded DMG file
- Drag SupaScrapeR.app to the Applications folder
- First launch (important):
  - Right-click (or Control-click) on SupaScrapeR.app
  - Select "Open" from the context menu
  - Click "Open" in the security dialog
If macOS continues to block the application:
```bash
xattr -dr com.apple.quarantine "/Applications/SupaScrapeR.app"
```

Requirements:
- Modern Linux distribution (Ubuntu 20.04+, Fedora 35+, etc.)
- 4GB RAM minimum (8GB recommended)
- 500MB free disk space
Installation Steps:
- Download SupaScrapeR-x.x.x.AppImage
- Make it executable:

```bash
chmod +x SupaScrapeR-x.x.x.AppImage
```

- Run the application:

```bash
./SupaScrapeR-x.x.x.AppImage
```

For developers who want to build from source or contribute to the project.
- Node.js 16 or higher
- Python 3.10 (required for spaCy compatibility)
- Git
- npm or yarn
Important: Python 3.10 is specifically required. Newer versions (3.11+) may have compatibility issues with spaCy 3.7.2.
```bash
# Clone the repository
git clone https://github.com/kennethhuang7/SupaScrapeR.git
cd SupaScrapeR

# Install Node.js dependencies
npm install

# Install Python dependencies
pip install -r requirements.txt

# Download spaCy language model (optional, for enhanced NLP features)
python -m spacy download en_core_web_sm
```

Run with hot-reload:
```bash
npm run electron-dev
```

This starts:
- Vite dev server on http://localhost:5173
- Electron app with React DevTools enabled
- Hot module replacement for instant updates
Create installer for your platform:
```bash
npm run dist
```

Output will be in the dist-electron/ directory.
Platform-specific builds:
- Windows: Requires Windows or Wine
- macOS: Requires macOS (code signing requires Apple Developer account)
- Linux: Can be built on any platform
SupaScrapeR requires two Supabase instances:
- Central Database (already configured in the app) - Handles user authentication and profiles
- Personal Database (you create this) - Stores your collected Reddit data
Step 1: Create Supabase Account
- Go to supabase.com
- Sign up for a free account
- Create a new project
Step 2: Configure Database Schema
Run the following SQL commands in your Supabase SQL Editor:
Enable UUID Extension:
```sql
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
```

Create Posts Table:

```sql
CREATE TABLE reddit_posts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  post_id TEXT NOT NULL,
  title TEXT,
  body TEXT,
  url TEXT,
  permalink TEXT,
  score INTEGER,
  upvote_ratio DOUBLE PRECISION NOT NULL,
  num_comments INTEGER NOT NULL,
  created_utc TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  author TEXT,
  subreddit TEXT NOT NULL,
  sentiment DOUBLE PRECISION NOT NULL,
  comments JSONB NOT NULL,
  live BOOLEAN NOT NULL
);
```

Create Performance Index:

```sql
CREATE UNIQUE INDEX idx_reddit_posts_post_id ON reddit_posts(post_id);
```

Step 3: Obtain Database Credentials
In your Supabase project dashboard:
- Navigate to Settings → API
- Copy your Project URL (format: https://xxxxx.supabase.co)
- Copy your service_role key (NOT the anon key)
Important: Keep your service role key secure. Never commit it to version control or share it publicly.
Step 4: Verify Setup
Run this verification query in SQL Editor:
```sql
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'reddit_posts'
ORDER BY ordinal_position;
```

You should see all the columns listed above.
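If you also want to confirm connectivity from outside the SQL Editor, a minimal check with the supabase-py client looks like this (the environment variable names here are placeholders of your choosing):

```python
import os
from supabase import create_client

client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

# On a freshly created table this should return an empty result set without raising
result = client.table("reddit_posts").select("post_id").limit(1).execute()
print(result.data)
```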
Reddit requires API credentials for all applications accessing their platform.
Step 1: Access Reddit App Preferences
- Log in to your Reddit account at reddit.com
- Go to reddit.com/prefs/apps
- Scroll to the bottom and click "Create App" or "Create Another App"
Step 2: Configure Application
Fill out the form with these settings:
- Name: SupaScrapeR (or any name you prefer)
- App type: Select "script" (this is critical)
- Description: Optional
- About URL: Leave blank
- Redirect URI: http://localhost:8080
Click "Create app"
Step 3: Save Your Credentials
After creation, you'll see a box with your app information:
- Client ID: 14-character string directly under the app name
- Client Secret: 27-character string next to "secret"
- User Agent: Create your own in this format: SupaScrapeR/2.0 by YourRedditUsername
Example User Agent: SupaScrapeR/2.0 by john_doe
Important: Store these credentials securely. You'll need them when first launching SupaScrapeR.
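SupaScrapeR collects these values during first-launch setup. If you want to sanity-check them yourself beforehand, they map directly onto a read-only PRAW client (the placeholder strings below are yours to fill in):

```python
import praw

reddit = praw.Reddit(
    client_id="your_14_char_client_id",
    client_secret="your_27_char_client_secret",
    user_agent="SupaScrapeR/2.0 by YourRedditUsername",
)
print(reddit.read_only)  # True for script apps used without a username/password
print(next(reddit.subreddit("all").hot(limit=1)).title)  # simple connectivity test
```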
Step 1: Initial Configuration
On first launch, SupaScrapeR will guide you through setup:
- Data Location: Choose where to store app configuration files (default recommended for most users)
- Account Creation/Login:
- Create a new account OR
- Log in with existing credentials
- Enable "Keep me signed in" for convenience (credentials are encrypted locally)
Step 2: Enter Credentials
You'll need to provide:
Supabase Credentials:
- Project URL: https://xxxxx.supabase.co
- Service Role Key: Your Supabase service_role key
Reddit API Credentials:
- Client ID: 14-character string from Reddit app
- Client Secret: 27-character string from Reddit app
- User Agent: Your custom user agent string
Step 3: Verify Configuration
The app will test your credentials and ensure connectivity to both services.
SupaScrapeR offers three data collection strategies:
Best for: Topic-specific research, brand monitoring, targeted data collection
How it works:
- Enter base keywords (e.g., "electric vehicles, tesla, EV")
- App fetches related trending keywords from Google Trends
- Select which keywords to include in your search
- App searches specified subreddits for posts matching keywords
- Collects post content, comments, and metadata
Configuration:
- Batch size: 5-50 posts per batch (adjust based on available RAM)
- Keyword count: 1-20 keywords recommended
- Subreddit selection: Use presets or custom lists
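Step 4 of the flow above amounts to a subreddit search per keyword. A simplified PRAW sketch of that step (the app's real implementation adds trend discovery, batching, and deduplication; the subreddit and limit below are arbitrary):

```python
import praw

reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="SupaScrapeR/2.0 by you")

keywords = ["electric vehicles", "tesla", "EV"]
for keyword in keywords:
    # Search one target subreddit for recent posts matching the keyword
    for post in reddit.subreddit("cars").search(keyword, sort="new", limit=10):
        print(post.id, post.num_comments, post.title[:60])
```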
Best for: Finding viral content, engagement analysis, trending discussions
How it works:
- Fetches newest posts from selected subreddits
- Analyzes engagement metrics (comments, upvotes, activity)
- Identifies high-engagement posts
- Collects detailed data including full comment threads
Configuration:
- Batch size: 5-100 posts per batch
- Engagement threshold: Configurable in settings
- Update interval: For continuous monitoring
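Conceptually, DeepScan is "fetch new posts, keep the high-engagement ones." A stripped-down sketch of that idea (the comment-count threshold is an arbitrary illustrative value; the real threshold is configurable in settings):

```python
import praw

reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="SupaScrapeR/2.0 by you")

ENGAGEMENT_THRESHOLD = 50  # minimum comment count; illustrative value only

# Pull the newest posts and keep only the high-engagement ones
candidates = reddit.subreddit("technology").new(limit=100)
hot_posts = [p for p in candidates if p.num_comments >= ENGAGEMENT_THRESHOLD]

# Process the most-discussed posts first
for post in sorted(hot_posts, key=lambda p: p.num_comments, reverse=True):
    print(post.num_comments, post.score, post.title[:60])
```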
Best for: Comprehensive data collection, research projects, trend analysis
How it works:
- Runs both Keyword Search and DeepScan sequentially
- Provides maximum coverage of relevant content
- Automatically deduplicates posts collected by both methods
Recommended for: Most research applications requiring thorough data collection
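Deduplication leans on the unique index on post_id created during database setup: writing the same post twice can be expressed as an upsert so the second copy updates rather than duplicates the row. A sketch with supabase-py (whether the app upserts or pre-filters in memory is an internal detail; the row values are dummy data):

```python
import os
from supabase import create_client

client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

row = {
    "post_id": "1abcd2",          # Reddit's base-36 post ID
    "title": "Example post",
    "subreddit": "cars",
    "num_comments": 42,
    "upvote_ratio": 0.97,
    "created_utc": "2024-01-01T12:00:00",
    "sentiment": 0.0,
    "comments": [],
    "live": True,
}

# on_conflict="post_id" means a repeated run updates the existing row
# instead of violating the unique index and failing
client.table("reddit_posts").upsert(row, on_conflict="post_id").execute()
```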
Presets allow you to save and share subreddit configurations.
- Navigate to Presets page
- Click "Create New Preset"
- Configure:
- Preset name and description
- List of subreddits (one per line)
- Scraping mode (keyword, deepscan, or both)
- Visibility (private or community)
- Save preset
Create and configure custom scraping presets
- Navigate to Community page
- Browse available presets
- Click "Download" on any preset
- Preset appears in your Presets list
- Select it when configuring a scraping session
- Create a preset with "Community" visibility enabled
- Other users can discover and download it
- Presets can be rated and reported
- Top-rated presets appear first in Community page
The scraping interface provides real-time feedback:
Progress Indicators:
- Keyword progress bar (if using keyword mode)
- Subreddit progress bar
- Posts collected (current batch)
- Overall completion percentage
Performance Metrics:
- Posts per second collection rate
- Success/failure counts
- CPU and RAM usage
- Network activity
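The CPU/RAM figures are the kind of numbers you can cross-check yourself with psutil if the dashboard readings ever look suspicious (this is not SupaScrapeR's internal monitoring code):

```python
import os
import psutil

print(f"System CPU: {psutil.cpu_percent(interval=1):.1f}%")
print(f"System RAM: {psutil.virtual_memory().percent:.1f}% used")

# Memory used by the current process, in MB
process = psutil.Process(os.getpid())
print(f"This process: {process.memory_info().rss / 1024 / 1024:.1f} MB")
```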
Real-time progress tracking with detailed logs and system metrics
Live Logging:
- Real-time activity feed
- Error messages with details
- Duplicate detection notifications
- Completion status
Controls:
- Stop scraping (safe shutdown)
- Pause/resume (for continuous mode)
- Export current session data
- View collected posts
Browse and analyze your collected Reddit posts
SupaScrapeR uses a hybrid architecture combining Electron's main/renderer process separation with a Python backend for Reddit API interaction.
```
┌─────────────────────────────────────────────────┐
│                  Electron Main                  │
│  ┌───────────────────────────────────────────┐  │
│  │            React Renderer (UI)            │  │
│  │  - TypeScript + React 18                  │  │
│  │  - Tailwind CSS styling                   │  │
│  │  - Real-time state management             │  │
│  └───────────────────────────────────────────┘  │
│                        ↕                        │
│  ┌───────────────────────────────────────────┐  │
│  │          Main Process (Node.js)           │  │
│  │  - IPC communication                      │  │
│  │  - Window management                      │  │
│  │  - Auto-updates                           │  │
│  │  - System integration                     │  │
│  └───────────────────────────────────────────┘  │
│                        ↕                        │
│  ┌───────────────────────────────────────────┐  │
│  │          Python Scraper Backend           │  │
│  │  - PRAW (Reddit API)                      │  │
│  │  - VADER sentiment analysis               │  │
│  │  - Data processing                        │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
                         ↕
            ┌────────────────────────┐
            │     Supabase Cloud     │
            │ - PostgreSQL database  │
            │ - Authentication       │
            │ - Real-time sync       │
            └────────────────────────┘
```

Frontend (React)
- src/pages/: Main application pages
- src/components/: Reusable UI components
- src/services/: Business logic and API clients
- src/lib/: Utility functions and configurations
Backend (Electron Main)
- electron/main.js: Application entry point
- electron/preload.js: Renderer↔Main bridge
- electron/autoUpdater.js: Update management
- electron/services/: Core services (logging, Discord RPC)
Python Backend
- scripts/scraper.py: Main scraping logic
- scripts/reddit_client.py: Reddit API wrapper
- scripts/data_processor.py: Post/comment processing
Data Flow:
- User interacts with React UI
- UI sends IPC message to Main process
- Main process spawns Python scraper subprocess
- Python collects data from Reddit API
- Python processes and uploads to Supabase
- Main process receives progress updates
- UI updates in real-time via IPC events
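The exact message format exchanged between the Python scraper and the Electron main process is not documented here. A common pattern for steps 6 and 7, and a reasonable mental model, is the subprocess printing one JSON object per line to stdout for the main process to parse and forward to the UI. A hypothetical sketch of the Python side (event names and fields are invented for illustration):

```python
import json
import sys
import time

def emit(event: str, **payload):
    """Write one JSON object per line to stdout (hypothetical message shape)."""
    print(json.dumps({"event": event, **payload}), flush=True)

emit("session_start", mode="keyword", subreddits=3)
for i in range(1, 4):
    time.sleep(0.1)  # stand-in for actual scraping work
    emit("progress", posts_collected=i * 5, percent=round(i / 3 * 100))
emit("session_complete", total_posts=15, failures=0)
```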
Prerequisites:
- Node.js 16+
- Python 3.10
- Git
Setup Steps:
```bash
# Clone repository
git clone https://github.com/kennethhuang7/SupaScrapeR.git
cd SupaScrapeR

# Install dependencies
npm install
pip install -r requirements.txt

# Optional: Install spaCy for enhanced NLP
python -m spacy download en_core_web_sm

# Run in development mode
npm run electron-dev

# Build for production
npm run dist
```

Project structure:

```
SupaScrapeR/
├── electron/              # Electron main process
│   ├── main.js            # Application entry point
│   ├── preload.js         # Context bridge
│   ├── autoUpdater.js     # Update system
│   └── services/          # Core services
│       ├── errorLogger.js
│       └── discordRPC.js
├── src/                   # React frontend
│   ├── components/        # UI components
│   ├── pages/             # Application pages
│   ├── services/          # Business logic
│   ├── lib/               # Utilities
│   └── styles/            # CSS/Tailwind
├── scripts/               # Python backend
│   ├── scraper.py         # Main scraping logic
│   ├── reddit_client.py   # Reddit API wrapper
│   └── data_processor.py  # Data processing
├── public/                # Static assets
├── assets/                # App icons and images
├── requirements.txt       # Python dependencies
├── package.json           # Node.js dependencies
└── README.md              # This file
```

TypeScript/React:
- Follow React functional component patterns
- Use TypeScript for type safety
- Tailwind CSS for styling (no inline styles)
- ESLint configuration provided
Python:
- PEP 8 style guide
- Type hints for function signatures
- Docstrings for classes and functions
Commits:
- Use conventional commits format
- Examples: feat: add auto-update system, fix: resolve login issue
We welcome contributions! Here's how you can help:
Reporting Bugs:
- Check if the issue already exists
- Create a new issue with:
- Clear description
- Steps to reproduce
- Expected vs actual behavior
- System information (OS, version, etc.)
- Error logs if applicable
Suggesting Features:
- Open an issue with "Feature Request" label
- Describe the feature and use case
- Explain why it would be valuable
Submitting Pull Requests:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Test thoroughly
- Commit with clear messages
- Push to your fork
- Open a Pull Request with:
- Description of changes
- Related issue numbers
- Screenshots/videos if UI changes
Development Guidelines:
- Maintain code style consistency
- Add tests for new features
- Update documentation as needed
- Keep commits focused and atomic
For maintainers publishing new versions:
- Update Version:

```bash
# Update version in package.json
# Example: "2.0.0" → "2.0.1" (bug fix)
#          "2.0.0" → "2.1.0" (new feature)
#          "2.0.0" → "3.0.0" (breaking change)
```

- Build Installer:

```bash
npm run dist
```

- Commit and Tag:

```bash
git add .
git commit -m "Release v2.0.1"
git tag v2.0.1
git push origin main --tags
```

- Create GitHub Release:
  - Go to GitHub Releases page
  - Click "Create a new release"
  - Tag: v2.0.1 (must match package.json)
  - Title: SupaScrapeR v2.0.1
  - Description: List changes and fixes
  - Upload installer from dist-electron/
  - Publish release
- Auto-Update:
  - Users will be notified automatically
  - In-app update dialog will appear
  - One-click update installation
Windows:
- Run as Administrator
- Check Windows Defender hasn't quarantined the app
- Ensure Visual C++ Redistributables are installed
macOS:
- Remove quarantine flag: xattr -dr com.apple.quarantine /Applications/SupaScrapeR.app
- Check System Preferences → Security & Privacy
- Ensure macOS 10.14 or later
Linux:
- Verify AppImage has execute permissions
- Check for missing system libraries: ldd SupaScrapeR.AppImage
- Try running from terminal to see error messages
"Invalid credentials":
- Verify Supabase URL format: https://xxxxx.supabase.co
- Ensure you are using the service_role key (not the anon key)
- Check project isn't paused in Supabase dashboard
- Verify Reddit API credentials are correct
"Database connection failed":
- Test Supabase connection in their dashboard
- Verify Row Level Security policies are configured
- Check firewall isn't blocking connections
- Ensure stable internet connection
"No posts collected":
- Verify subreddit names are correct (case-sensitive)
- Check subreddit hasn't banned your Reddit account
- Reduce batch size if memory errors occur
- Verify Reddit API rate limits haven't been exceeded
"Out of memory errors":
- Reduce batch sizes in settings:
- 4GB RAM: Use batch size 5
- 8GB RAM: Use batch size 10-25
- 16GB+ RAM: Use batch size 25-50
- Close other applications
- Restart the application
"Network timeout errors":
- Check internet connection stability
- Reduce batch size to lower concurrent requests
- Wait 10-15 minutes if rate limited
- Try different time of day (Reddit traffic peaks affect API)
"Update check failed":
- Verify internet connection
- Check GitHub isn't blocked by firewall
- Manually check releases page
- Disable VPN temporarily if using one
"Update download failed":
- Check available disk space (need ~200MB free)
- Temporarily disable antivirus
- Download installer manually from GitHub releases
Slow Collection Speed:
- Reduce number of subreddits
- Use fewer keywords
- Increase batch size (if RAM available)
- Check CPU/RAM usage in Task Manager
High Memory Usage:
- Decrease batch sizes
- Reduce number of concurrent operations
- Close unused browser tabs
- Restart app periodically for long sessions
Database Slow:
- Check Supabase project isn't paused
- Verify not exceeding free tier limits
- Consider upgrading Supabase plan for high volume
- Add database indexes for frequent queries
Before requesting help:
- Check error logs:
  - Windows: %APPDATA%\SupaScrapeR\logs\
  - macOS: ~/Library/Application Support/SupaScrapeR/logs/
  - Linux: ~/.config/SupaScrapeR/logs/
- Search existing GitHub issues
- Try the troubleshooting steps above
Creating a support request:
- Go to GitHub Issues
- Click "New Issue"
- Include:
- Operating system and version
- SupaScrapeR version
- Exact error message
- Steps to reproduce
- Relevant log excerpts (redact credentials)
- What you've already tried
Do NOT include:
- Reddit API credentials
- Supabase keys
- Passwords or personal information
This project is licensed under the MIT License.
MIT License Summary:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
- ❌ No warranty provided
- ❌ No liability accepted
See the LICENSE file for full details.
Kenneth Huang - Creator and Lead Developer
- GitHub: @kennethhuang7
- LinkedIn: Kenneth Huang
Built with:
- Electron - Cross-platform desktop framework
- React - UI library
- Supabase - Backend and database
- PRAW - Reddit API wrapper
- Tailwind CSS - Utility-first CSS framework






