Jonathan Geiger

Posted on • Originally published at socialkit.dev

How to Extract YouTube Shorts Video Details Using Puppeteer

YouTube Shorts have transformed the social media landscape, generating billions of views with their bite-sized vertical video format. However, extracting metadata and analytics data from these short-form videos presents unique technical challenges that differ from regular YouTube videos.

The YouTube Shorts URL Challenge

The fundamental difference between YouTube Shorts and regular videos lies in their URL structure:

YouTube Shorts URL Pattern:

https://www.youtube.com/shorts/VIDEO_ID 

Regular YouTube URL Pattern:

https://www.youtube.com/watch?v=VIDEO_ID 

To extract video details from YouTube Shorts, we must convert the Shorts URL to the regular format, as the detailed metadata interface is only accessible through the standard video player.

URL Conversion Implementation

const convertShortsToRegularUrl = (url) => {
  // Define regex patterns for both URL types
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

  if (shortsPattern.test(url)) {
    // Extract video ID from Shorts URL
    const videoId = url.match(shortsPattern)[1];
    return `https://www.youtube.com/watch?v=${videoId}`;
  } else if (regularPattern.test(url)) {
    // Already a regular YouTube URL
    return url;
  } else {
    throw new Error('Invalid YouTube URL format');
  }
};

// Example usage
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
const regularUrl = convertShortsToRegularUrl(shortsUrl);
// Result: 'https://www.youtube.com/watch?v=ABC123xyz'

Project Setup

Initialize your YouTube Shorts scraper project:

mkdir youtube-shorts-details-scraper
cd youtube-shorts-details-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin helps bypass detection mechanisms that might block automated browser sessions.

Basic YouTube Shorts Details Extractor

Here's a fundamental implementation that converts Shorts URLs and extracts core video metadata:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Enable stealth mode to avoid detection
puppeteer.use(StealthPlugin());

// page.waitForTimeout() was removed in recent Puppeteer releases, so pause with a plain timer
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const convertShortsToRegularUrl = (url) => {
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

  if (shortsPattern.test(url)) {
    const videoId = url.match(shortsPattern)[1];
    return `https://www.youtube.com/watch?v=${videoId}`;
  } else if (regularPattern.test(url)) {
    return url;
  } else {
    throw new Error('Invalid YouTube URL format');
  }
};

const extractYouTubeShortsDetails = async (shortsUrl) => {
  // Convert Shorts URL to regular YouTube format
  const url = convertShortsToRegularUrl(shortsUrl);
  console.log(`Converting: ${shortsUrl} -> ${url}`);

  const browser = await puppeteer.launch({
    headless: 'new',
    ignoreDefaultArgs: ['--enable-automation'],
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Navigate to converted YouTube URL
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await sleep(1500);

    // Handle cookie consent banner
    try {
      // Return the result to Node.js: console.log inside evaluate() prints in the browser
      const bannerClosed = await page.evaluate(() => {
        const cookieButton = document.querySelector(
          'button[aria-label*="cookies"]'
        );
        if (!cookieButton) return false;
        cookieButton.click();
        return true;
      });
      console.log(
        bannerClosed ? 'Cookie banner closed' : 'No cookie banner detected'
      );
      await sleep(1000);
    } catch (e) {
      console.log('Cookie banner check failed:', e.message);
    }

    // Scroll to trigger content loading
    await page.evaluate(() => window.scrollBy(0, 300));
    await sleep(800);

    // Extract video details
    const videoDetails = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        // \d+ comes first so unformatted numbers such as "1234567" match in full
        const match = cleanText.match(/\d+(?:,\d{3})*(?:\.\d+)?[KMB]?/);
        if (!match) return 0;

        const numStr = match[0];
        const suffix = numStr.slice(-1);

        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num)) return 0;
          switch (suffix) {
            case 'K':
              return Math.floor(num * 1000);
            case 'M':
              return Math.floor(num * 1000000);
            case 'B':
              return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          return isNaN(num) ? 0 : Math.floor(num);
        }
        return 0;
      };

      const data = {};

      // Extract video title
      const titleElement = document.querySelector(
        'h1.ytd-watch-metadata yt-formatted-string'
      );
      data.title = titleElement ? titleElement.textContent.trim() : '';

      // Extract channel information
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Extract view count
      const viewsElement = document.querySelector('#info span[class*="view"]');
      data.views = viewsElement ? extractNumber(viewsElement.textContent) : 0;

      // Extract likes
      const likesElement = document.querySelector(
        '.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
      );
      data.likes = likesElement ? extractNumber(likesElement.textContent) : 0;

      return data;
    });

    return {
      originalUrl: shortsUrl,
      convertedUrl: url,
      ...videoDetails,
    };
  } catch (error) {
    console.error('Error extracting Shorts details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';

extractYouTubeShortsDetails(shortsUrl)
  .then((details) => console.log('Shorts Details:', details))
  .catch((error) => console.error('Extraction failed:', error));
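The count parsing is the part of the extractor that is easiest to get subtly wrong, and it can be checked outside the browser. Here is a minimal standalone sketch, assuming English-locale formatting (comma thousands separators and `K`, `M`, `B` suffixes). The name `parseAbbreviatedCount` is hypothetical and not part of the script above.

```javascript
// Hypothetical helper: converts display strings such as "1.2M views" to integers.
// Assumes English-locale formatting; other locales need their own rules.
const parseAbbreviatedCount = (text) => {
  if (!text) return 0;

  // Digits with optional thousands groups and decimals, then an optional suffix
  const match = text.match(/(\d+(?:,\d{3})*(?:\.\d+)?)([KMB])?/);
  if (!match) return 0;

  const value = parseFloat(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return 0;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const multiplier = match[2] ? multipliers[match[2]] : 1;

  // Math.round guards against floating-point undershoot in the multiplication
  return Math.round(value * multiplier);
};

console.log(parseAbbreviatedCount('1,234,567 views')); // 1234567
console.log(parseAbbreviatedCount('1.2M views')); // 1200000
console.log(parseAbbreviatedCount('15K')); // 15000
console.log(parseAbbreviatedCount('Like')); // 0
```

Abbreviated counts are rounded by YouTube before they reach the page, so a value parsed from `1.2M` is an approximation, not the exact total.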

Comprehensive YouTube Shorts Metadata Extraction

For production use, here's an advanced implementation with robust error handling and comprehensive data extraction:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

// page.waitForTimeout() was removed in recent Puppeteer releases, so pause with a plain timer
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const convertShortsToRegularUrl = (url) => {
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

  if (shortsPattern.test(url)) {
    const videoId = url.match(shortsPattern)[1];
    return `https://www.youtube.com/watch?v=${videoId}`;
  } else if (regularPattern.test(url)) {
    return url;
  } else {
    throw new Error('Invalid YouTube URL format');
  }
};

const scrapeYouTubeShortsDetails = async (shortsUrl) => {
  const url = convertShortsToRegularUrl(shortsUrl);
  console.log(`Processing: ${shortsUrl} -> ${url}`);

  const browser = await puppeteer.launch({
    headless: 'new',
    ignoreDefaultArgs: ['--enable-automation'],
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 1024,
      deviceScaleFactor: 1,
    });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await sleep(1500);

    // Handle cookie banner
    try {
      await page.evaluate(() => {
        const cookieButton = document.querySelector(
          'button[aria-label*="cookies"]'
        );
        if (cookieButton) {
          cookieButton.click();
        }
      });
      await sleep(1000);
    } catch (e) {
      console.log('Cookie banner check failed:', e.message);
    }

    // Scroll to load below-the-fold content
    try {
      await page.evaluate(() => window.scrollBy(0, 300));
      await sleep(800);
    } catch (e) {
      console.log('Could not scroll page');
    }

    // Extract comprehensive metadata
    const metadata = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        // \d+ comes first so unformatted numbers such as "1234567" match in full
        const match = cleanText.match(/\d+(?:,\d{3})*(?:\.\d+)?[KMB]?/);
        if (!match) return 0;

        const numStr = match[0];
        const suffix = numStr.slice(-1);

        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999) return 0;
          switch (suffix) {
            case 'K':
              return Math.floor(num * 1000);
            case 'M':
              return Math.floor(num * 1000000);
            case 'B':
              return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999999999) return 0;
          return Math.floor(num);
        }
        return 0;
      };

      const data = {
        title: '',
        channelName: '',
        channelLink: '',
        views: 0,
        likes: 0,
        comments: 0,
        publishDate: '',
        description: '',
        thumbnailUrl: '',
      };

      // Extract title
      const titleElement = document.querySelector(
        'h1.ytd-watch-metadata yt-formatted-string'
      );
      if (titleElement) {
        data.title = titleElement.textContent.trim();
      }

      // Extract channel information
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Extract views with multiple fallback selectors
      const viewsSelectors = [
        '#info span[class*="view"]',
        '#info .style-scope.yt-formatted-string',
        '#info .view-count',
      ];

      for (const selector of viewsSelectors) {
        const viewsElement = document.querySelector(selector);
        if (viewsElement && viewsElement.textContent.trim()) {
          const text = viewsElement.textContent.trim();
          if (
            text.includes('view') ||
            /[\d,]+[KMB]?\s*(views?|watching)/i.test(text)
          ) {
            data.views = extractNumber(text);
            break;
          }
        }
      }

      // Extract likes
      const likesElement = document.querySelector(
        '.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
      );
      if (likesElement) {
        data.likes = extractNumber(likesElement.textContent);
      }

      // Extract comments count
      const commentsElement = document.querySelector('#title #count span');
      if (commentsElement) {
        data.comments = extractNumber(commentsElement.textContent);
      }

      // Extract publish date
      const publishElement = document.querySelector(
        'ytd-watch-metadata #info-strings yt-formatted-string:nth-child(2)'
      );
      if (publishElement) {
        data.publishDate = publishElement.textContent.trim();
      }

      // Extract description, adding an ellipsis only when it is actually truncated
      const descriptionElement = document.querySelector(
        'ytd-watch-metadata #description-text'
      );
      if (descriptionElement) {
        const description = descriptionElement.textContent.trim();
        data.description =
          description.length > 300
            ? description.substring(0, 300) + '...'
            : description;
      }

      // Extract thumbnail
      const thumbnailElement = document.querySelector('video');
      if (thumbnailElement) {
        data.thumbnailUrl = thumbnailElement.poster || '';
      }

      return data;
    });

    return {
      originalShortsUrl: shortsUrl,
      convertedUrl: url,
      extractedAt: new Date().toISOString(),
      ...metadata,
    };
  } catch (error) {
    console.error('Error scraping Shorts details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';

scrapeYouTubeShortsDetails(shortsUrl)
  .then((details) => {
    console.log('Shorts Title:', details.title);
    console.log('Channel:', details.channelName);
    console.log('Views:', details.views.toLocaleString());
    console.log('Likes:', details.likes.toLocaleString());
  })
  .catch((error) => console.error('Failed to scrape Shorts details:', error));
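The views lookup above tries several selectors in order, and the same pattern suits any field whose markup shifts between page versions. Below is a minimal sketch of that pattern as a reusable function. `firstMatchingText` is a hypothetical name, and `root` is assumed to be anything with a `querySelector` method, such as `document` inside `page.evaluate`. A small stand-in object replaces the DOM so the sketch runs under plain Node.js.

```javascript
// Hypothetical helper: returns the trimmed text of the first selector whose
// element has non-empty text and passes the optional `accept` predicate.
const firstMatchingText = (root, selectors, accept = () => true) => {
  for (const selector of selectors) {
    const element = root.querySelector(selector);
    const text =
      element && element.textContent ? element.textContent.trim() : '';
    if (text && accept(text)) return text;
  }
  return '';
};

// Stand-in for a DOM: only the second views selector "exists" here.
const fakeRoot = {
  nodes: {
    '#info .view-count': { textContent: '  1,234 views ' },
    '#title #count span': { textContent: '56' },
  },
  querySelector(selector) {
    return this.nodes[selector] || null;
  },
};

const viewsText = firstMatchingText(
  fakeRoot,
  ['#info span[class*="view"]', '#info .view-count'],
  (text) => /views?/i.test(text)
);
console.log(viewsText); // "1,234 views"
```

Functions defined in Node.js are not visible inside `page.evaluate`, so a helper like this has to be declared inside the callback that runs in the page.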

Technical Challenges and Solutions

1. URL Structure Conversion

The most critical aspect of YouTube Shorts scraping is proper URL conversion:

// URL validation and conversion
const validateAndConvertUrl = (url) => {
  try {
    const urlObj = new URL(url);
    if (
      urlObj.hostname !== 'www.youtube.com' &&
      urlObj.hostname !== 'youtube.com'
    ) {
      throw new Error('Not a YouTube URL');
    }
    return convertShortsToRegularUrl(url);
  } catch (error) {
    throw new Error(`Invalid URL: ${error.message}`);
  }
};
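The hostname check above rejects share links such as `youtu.be/VIDEO_ID` and mobile links on `m.youtube.com`. The sketch below parses the input with the standard `URL` class and normalizes those variants as well. The function name `toWatchUrl`, the list of accepted hosts, and the ID character set (letters, digits, `_`, `-`) are assumptions based on commonly seen links, not documented guarantees.

```javascript
// Assumed set of hostnames that serve /shorts/ and /watch paths
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com']);

// Assumed ID alphabet; YouTube does not formally specify it
const ID_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: returns a canonical watch URL or throws
const toWatchUrl = (input) => {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }

  const segments = parsed.pathname.split('/').filter(Boolean);
  let videoId = null;

  if (parsed.hostname === 'youtu.be') {
    // Short share link: the ID is the first path segment
    videoId = segments[0];
  } else if (YOUTUBE_HOSTS.has(parsed.hostname)) {
    if (segments[0] === 'shorts') {
      videoId = segments[1];
    } else if (segments[0] === 'watch') {
      videoId = parsed.searchParams.get('v');
    }
  }

  if (!videoId || !ID_PATTERN.test(videoId)) {
    throw new Error(`No YouTube video ID found in: ${input}`);
  }
  return `https://www.youtube.com/watch?v=${videoId}`;
};

console.log(toWatchUrl('https://www.youtube.com/shorts/ABC123xyz?feature=share'));
// https://www.youtube.com/watch?v=ABC123xyz
console.log(toWatchUrl('https://youtu.be/ABC123xyz'));
// https://www.youtube.com/watch?v=ABC123xyz
```

Parsing the path and query separately also drops tracking parameters such as `feature=share`, so the returned URL is always in the same canonical form.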

2. Dynamic Content Loading

YouTube Shorts interface elements load asynchronously:

// Wait for critical elements before extraction
await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
await page.waitForSelector('#info', { timeout: 5000 });

// Additional wait for engagement metrics
await page.evaluate(() => window.scrollBy(0, 300));
// page.waitForTimeout() was removed in recent Puppeteer releases; use a plain timer
await new Promise((resolve) => setTimeout(resolve, 800));
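Fixed pauses either waste time or fire too early. One alternative is to poll for a condition until a deadline passes. Here is a minimal sketch in plain JavaScript, where `pollUntil` and its option names are hypothetical and `check` stands for any probe, for example a `page.evaluate` call that returns the view-count text once it is non-empty.

```javascript
// Hypothetical helper: calls `check` repeatedly until it returns a truthy
// value, or rejects once another attempt would exceed the deadline.
const pollUntil = async (check, { timeoutMs = 5000, intervalMs = 100 } = {}) => {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
};

// Demo with a stand-in probe that succeeds on the third call
let calls = 0;
const demo = pollUntil(() => (++calls >= 3 ? 'ready' : null), { intervalMs: 10 });
demo.then((value) => console.log(value, calls)); // ready 3
```

Puppeteer's own `page.waitForFunction` covers the same need when the condition can be evaluated entirely inside the page.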

Alternative: SocialKit YouTube Stats API

For production applications requiring reliable YouTube Shorts analytics, consider SocialKit's YouTube Stats API:

curl "https://api.socialkit.dev/youtube/stats?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ" 

Example Response

{
  "success": true,
  "data": {
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
    "channelName": "Rick Astley",
    "channelLink": "https://youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
    "views": 1428567890,
    "likes": 16234567,
    "comments": 4567890
  }
}
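The same request can be made from Node.js. The sketch below takes the endpoint and the `access_key` and `url` parameter names from the curl command, and the `success` and `data` fields from the example response. Everything else is an assumption: the helper names `buildStatsUrl` and `fetchVideoStats`, the error handling, and a runtime with global `fetch` (Node.js 18 or later).

```javascript
const STATS_ENDPOINT = 'https://api.socialkit.dev/youtube/stats';

// Builds the request URL; searchParams percent-encodes the nested video URL
// so its own "?" and "=" cannot be confused with the outer query string.
const buildStatsUrl = (accessKey, videoUrl) => {
  const endpoint = new URL(STATS_ENDPOINT);
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', videoUrl);
  return endpoint.toString();
};

// Hypothetical client: resolves with the `data` object from the response
const fetchVideoStats = async (accessKey, videoUrl) => {
  const response = await fetch(buildStatsUrl(accessKey, videoUrl));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const body = await response.json();
  if (!body.success) {
    throw new Error('API reported an unsuccessful request');
  }
  return body.data;
};

console.log(
  buildStatsUrl('YOUR_ACCESS_KEY', 'https://youtube.com/watch?v=dQw4w9WgXcQ')
);
// https://api.socialkit.dev/youtube/stats?access_key=YOUR_ACCESS_KEY&url=https%3A%2F%2Fyoutube.com%2Fwatch%3Fv%3DdQw4w9WgXcQ
```

Calling `fetchVideoStats(key, shortsUrl).then((stats) => console.log(stats.views))` would then print the view count, assuming the response matches the example shown above.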

API Benefits:

  • Automatic URL handling: Processes both Shorts and regular YouTube URLs
  • No conversion needed: Handles URL transformation internally
  • Consistent data structure: Standardized response format across all video types
  • Real-time accuracy: Always up-to-date with current video statistics
  • Scale-ready: Handle thousands of Shorts without rate limits
  • Global availability: Works worldwide without geo-restrictions

Free YouTube Tools

Need instant access to YouTube Shorts data? Try our free tools:

YouTube Video Summarizer Tool

Get AI-powered insights with our free YouTube Video Summarizer tool:

  • Analyze YouTube Shorts content with AI-powered summaries
  • Extract key themes and trending topics from short-form videos
  • Identify viral content patterns for your own content strategy
  • Get instant insights without any setup or registration required

YouTube Transcript Extractor Tool

Extract content from Shorts with our free YouTube Transcript Extractor tool:

  • Extract transcripts from YouTube Shorts automatically
  • Get timestamped segments for precise content analysis
  • Perfect for accessibility and content repurposing
  • 100% free with support for both Shorts and regular videos

Both tools automatically handle YouTube Shorts URLs and provide immediate value for content creators, social media managers, and digital marketers.

Conclusion

Extracting YouTube Shorts video details with Puppeteer hinges on one step: converting the Shorts URL into a standard YouTube watch URL. That conversion gives access to YouTube's full metadata interface, from which views, likes, comments, and other analytics data can be extracted.
