Tamizh

Scrapebase + Permit.io: Web Scraping with API-First Authorization

This is a submission for the Permit.io Authorization Challenge: Permissions Redefined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

How It Works

The core authentication and authorization flow:

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role (free_user, pro_user, or admin)
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│ permitAuth │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
         Permission policies defined in Permit.io dashboard
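
The six steps above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the project's actual middleware: `KEY_TO_ROLE`, `roleForKey`, `handleRequest`, the placeholder API keys, and the 401 response for unknown keys are all hypothetical. The synchronous `check` callback stands in for the asynchronous call to Permit.io, and the user-sync step is omitted.

```typescript
// Hypothetical sketch of the request flow; all names are illustrative only.
type Role = "free_user" | "pro_user" | "admin";
type Action = "scrape_basic" | "scrape_advanced";

// Stand-in for the policy decision point (the real check is an async Permit.io call).
type PolicyCheck = (role: Role, action: Action, resource: string) => boolean;

interface ScrapeRequest {
  apiKey?: string; // value of the x-api-key header
  advanced?: boolean; // request body flag
}

// Placeholder keys; real keys would come from configuration.
const KEY_TO_ROLE: Record<string, Role> = {
  "demo-free-key": "free_user",
  "demo-pro-key": "pro_user",
  "demo-admin-key": "admin",
};

// Step 3: map the API key to a role, ignoring inherited object properties.
function roleForKey(apiKey: string | undefined): Role | undefined {
  if (apiKey === undefined) return undefined;
  return Object.prototype.hasOwnProperty.call(KEY_TO_ROLE, apiKey)
    ? KEY_TO_ROLE[apiKey]
    : undefined;
}

// Steps 5 and 6: choose the action, ask the policy check, return an HTTP status.
function handleRequest(req: ScrapeRequest, check: PolicyCheck): number {
  const role = roleForKey(req.apiKey);
  if (role === undefined) {
    return 401; // assumed behavior for a missing or unknown key
  }
  const action: Action = req.advanced ? "scrape_advanced" : "scrape_basic";
  return check(role, action, "website") ? 200 : 403;
}
```

Passing the policy check in as a parameter keeps the decision outside the handler, which is the same separation the diagram shows between the middleware and the PDP.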

Demo

Scrapebase Demo

You can test the API using the following endpoints:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

Project Repo

Scrapebase with Permit.io Authorization

A powerful web scraping API with fine-grained authorization controls powered by Permit.io. This project demonstrates how to implement sophisticated authorization patterns in a real-world API service.

Demo: https://scrapebase-permit.up.railway.app/

Features

  • Tiered Access Control: Different permissions for Free, Pro, and Admin users
  • Resource-Based Authorization: Control access based on target domains
  • Rate Limiting: Tier-specific rate limits enforced through policies
  • Advanced Scraping Features: Premium capabilities restricted to Pro users
  • Real-time Policy Updates: Changes to permissions take effect immediately
  • Audit Logging: Track all authorization decisions

Quick Start

  1. Clone the repository:
git clone https://github.com/0xtamizh/scrapebase-permit-IO
cd scrapebase-permit-IO
  2. Install dependencies:
npm install
  3. Set up environment variables:
cp .env.example .env

Edit .env with your Permit.io API key and other configurations:

PERMIT_API_KEY=your_permit_api_key
ADMIN_API_KEY=2025DEVChallenge_admin
USER_API_KEY=2025DEVChallenge_user
  4. Start the development server:
npm run dev
  5. Visit http://localhost:3000 to access the testing UI
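
For the environment-variable step, one option is to validate the three variables at startup so that a missing key fails fast instead of surfacing later as a denied request. The sketch below is an assumption, not code from the repository: `AppConfig`, `requireVar`, and `loadConfig` are hypothetical names, and only the variable names come from the .env example above.

```typescript
// Hypothetical config loader; only the variable names come from the .env example.
interface AppConfig {
  permitApiKey: string;
  adminApiKey: string;
  userApiKey: string;
}

// Shape of an environment object such as Node's process.env.
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    permitApiKey: requireVar(env, "PERMIT_API_KEY"),
    adminApiKey: requireVar(env, "ADMIN_API_KEY"),
    userApiKey: requireVar(env, "USER_API_KEY"),
  };
}
```

In a Node process this would typically be called once as `loadConfig(process.env)` before the server starts listening.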

Testing the Authorization Features

Test Credentials

Admin User:

  • Username: admin
  • API Key…

My Journey

The Problem with Traditional Authorization

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. When I started this project, I wanted to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

I chose to build a web scraping service because it presents meaningful access control requirements:

  1. Tiered service levels that mirror real-world SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through a domain blacklist system

The Power of API-First Authorization

The key insight that drove this project was the separation of concerns: business logic should be distinct from authorization decisions. By using Permit.io, I was able to:

  1. Define all permission policies in one place
  2. Enforce consistent access control across all endpoints
  3. Update policies without changing application code

The implementation was straightforward - here's the core middleware that powers the authorization flow:

// Map API key to user role
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// Sync user to Permit.io
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});

// Check permission for the synced user
const action = req.body.advanced ? 'scrape_advanced' : 'scrape_basic';
const permissionCheck = await permit.check(userKey, action, 'website');
if (!permissionCheck) {
  return res.status(403).json({
    success: false,
    error: 'Access denied by Permit.io'
  });
}

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: { is_blacklisted: isBlacklistedDomain }
};
const permissionCheck = await permit.check(user.key, action, resource);

The cloud PDP responded with HTTP 501 (Not Implemented) errors because it only supports basic RBAC. I had to simplify to a pure RBAC approach:

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
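
With resource attributes unavailable on the cloud PDP, one option is to evaluate the blacklist in application code before or after the RBAC check. The following is a rough sketch of such a lookup. It assumes the blacklist is a plain list of domain names and that subdomains of a listed domain should also match; neither detail is confirmed by the project, and the function and its parameters are hypothetical.

```typescript
// Hypothetical helper: the blacklist format and the matching rule are assumptions.
function isBlacklistedDomain(hostname: string, blacklist: string[]): boolean {
  // Normalize case and drop a trailing dot (fully qualified form).
  const host = hostname.toLowerCase().replace(/\.$/, "");
  return blacklist.some((entry) => {
    const blocked = entry.toLowerCase();
    // Match the listed domain itself or any subdomain of it.
    return host === blocked || host.endsWith("." + blocked);
  });
}
```

Matching on a leading dot avoids false positives such as `notexample.org` being treated as a subdomain of `example.org`.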

Role Assignment

Another challenge was ensuring roles were properly synchronized and recognized. The solution was two-fold:

  1. Properly sync users with their role information
  2. Manually configure role permissions in the Permit.io dashboard
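
To keep the synced user data consistent across API keys, the payload can be built in one place. This is a minimal sketch that mirrors the object shape passed to `permit.api.syncUser` in the middleware excerpt earlier in this post; `Tier`, `SyncUserPayload`, and `buildSyncPayload` are hypothetical names introduced for illustration.

```typescript
// Hypothetical builder mirroring the payload shape from the middleware excerpt.
type Tier = "free_user" | "pro_user" | "admin";

interface SyncUserPayload {
  key: string;
  email: string;
  attributes: { tier: Tier; roles: Tier[] };
}

function buildSyncPayload(userKey: string, tier: Tier): SyncUserPayload {
  return {
    key: userKey,
    // Synthetic address derived from the user key, as in the excerpt.
    email: `${userKey}@scrapebase.xyz`,
    attributes: { tier, roles: [tier] },
  };
}
```

The result would then be handed to the sync call, for example `await permit.api.syncUser(buildSyncPayload(userKey, tier))`.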

Using Permit.io for Authorization

Setting up Permit.io involved these key steps:

  1. Creating a project in the Permit.io dashboard
  2. Defining resources (website), actions (scrape_basic, scrape_advanced), and roles (free_user, pro_user, admin)
  3. Configuring the permission matrix in the dashboard
  4. Integrating the Permit.io SDK into my application

Here's the role-based capability matrix I implemented:

Feature Free User Pro User Admin
Basic Scraping
Advanced Scraping
Text Cleaning
AI Summarization
View Blacklist
Manage Blacklist
Access Blacklisted Domains
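
For unit tests or documentation, part of the matrix can be mirrored locally as a lookup table. The sketch below covers only the two scraping actions on the `website` resource. The grants are an assumption inferred from the demo requests and the feature list (free users get basic scraping, Pro and Admin also get advanced scraping); the authoritative configuration lives in the Permit.io dashboard, and `WEBSITE_GRANTS` and `isAllowed` are hypothetical names.

```typescript
// Hypothetical local mirror of part of the permission matrix (assumed grants).
type Role = "free_user" | "pro_user" | "admin";
type Action = "scrape_basic" | "scrape_advanced";

const WEBSITE_GRANTS: Record<Role, Action[]> = {
  free_user: ["scrape_basic"],
  pro_user: ["scrape_basic", "scrape_advanced"],
  admin: ["scrape_basic", "scrape_advanced"],
};

// True when the role is granted the action on the website resource.
function isAllowed(role: Role, action: Action): boolean {
  return WEBSITE_GRANTS[role].includes(action);
}
```

A table like this should not replace the remote check; it is only useful as an expectation to test the dashboard configuration against.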

Permission Enforcement

Permissions are enforced in two places:

  1. The permitAuth middleware for API endpoints:
const permissionCheck = await permit.check(user.key, action, 'website');
if (!permissionCheck) {
  return res.status(403).json({ success: false, error: 'Access denied' });
}
  2. Directly in route handlers for specific features:
// src/routes/summarize.ts
if (summarize) {
  const userTier = req.user?.attributes?.tier;
  if (userTier !== 'pro_user' && userTier !== 'admin') {
    return res.status(403).json({
      success: false,
      error: 'Access denied',
      details: 'Text summarization is only available for Pro and Admin users'
    });
  }
}
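
If several route handlers need a tier check like the one in `src/routes/summarize.ts`, the comparison could be factored into a small helper. This is a sketch under assumptions rather than code from the repository: `Denial` and `denyUnlessTier` are hypothetical names, and the response body shape is copied from the handler above.

```typescript
// Hypothetical helper that centralizes in-handler tier checks.
type Tier = "free_user" | "pro_user" | "admin";

interface Denial {
  success: false;
  error: string;
  details: string;
}

// Returns null when the tier is allowed, otherwise the body for a 403 response.
function denyUnlessTier(
  userTier: Tier | undefined,
  allowed: Tier[],
  details: string
): Denial | null {
  if (userTier !== undefined && allowed.includes(userTier)) {
    return null;
  }
  return { success: false, error: "Access denied", details };
}
```

A handler would call it as `const denial = denyUnlessTier(req.user?.attributes?.tier, ['pro_user', 'admin'], '...')` and return `res.status(403).json(denial)` when the result is not null. Note that a check like this lives in application code, so changing which tiers may use a feature requires a code change, unlike the policies held in Permit.io.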

What I Learned

Building Scrapebase with Permit.io taught me how to:

  1. Separate authorization concerns from business logic
  2. Implement role-based access control with external policy management
  3. Design a flexible permission system that doesn't require code changes to update policies

The advantages of this approach are clear:

  1. Separation of concerns: Business logic remains focused on core functionality while authorization is handled externally
  2. Adaptable policies: Permissions can be updated without code changes or redeployments
  3. Consistent enforcement: Authorization decisions follow the same rules across all application endpoints
  4. Improved security: Centralized policy management reduces the risk of inconsistent permission checks
  5. Developer experience: Cleaner codebase with reduced authorization-related complexity

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

Scrapebase demonstrates how modern SaaS apps can delegate complex authorization to a specialized service like Permit.io, allowing developers to focus on core features while maintaining robust access controls.
