Automated Google Maps Business Scraper
<h3><strong>Description:</strong></h3><p>This tool is a <strong>powerful Node.js & Puppeteer-based scraper</strong> that automates the extraction of business data from Google Maps. It allows users to provide <strong>keywords, locations, or geolocation coordinates</strong> and retrieves structured business information including <strong>names, addresses, phone numbers, websites, and distances</strong> within a specified radius.</p><p>Designed for marketers, researchers, or businesses needing <strong>bulk business data</strong>, the scraper supports <strong>real-time progress updates</strong>, <strong>duplicate filtering</strong>, and <strong>invalid data handling</strong>, ensuring reliable and efficient data collection.</p><hr><h3><strong>Functionalities:</strong></h3><ul><li><p>🔍 <strong>Keyword & Location Search</strong> – Input any keyword and a location or use live geolocation to target specific areas.</p></li><li><p>🏢 <strong>Business Data Extraction</strong> – Scrapes <strong>name, phone number, website, address</strong>, and calculates <strong>distance</strong> from a given point.</p></li><li><p>📏 <strong>Radius Filtering</strong> – Only returns businesses within a specified radius in meters/kilometers.</p></li><li><p>🔁 <strong>Duplicate Handling</strong> – Automatically detects and skips duplicate entries.</p></li><li><p>🚫 <strong>Invalid Data Filtering</strong> – Skips entries with missing or malformed addresses.</p></li><li><p>📡 <strong>Real-Time Streaming</strong> – Uses <strong>Server-Sent Events (SSE)</strong> to stream progress and results live to the client.</p></li><li><p>🧭 <strong>Geocoding & Distance Calculation</strong> – Converts locations to coordinates and calculates distances using <strong>Node-Geocoder</strong> + <strong>Geolib</strong>.</p></li><li><p>🗄️ <strong>Caching</strong> – Stores geocoding results with <strong>Node-Cache</strong> for 24 hours to speed up repeated searches.</p></li><li><p>⚙️ <strong>Error Handling & Logging</strong> – 
Detailed logs for debugging, automatic screenshots on scraping errors.</p></li><li><p>🖥️ <strong>Headless Browser Automation</strong> – Puppeteer navigates Google Maps, scrolls listings, and extracts relevant data efficiently.</p></li></ul><hr>
<h3><strong>Technology Stack:</strong></h3><p><strong>Frontend / Client:</strong></p><ul><li><p>⚛️ <strong>React.js</strong> – Dynamic user interface (<code>maps-scraper-client/src/App.js</code>)</p></li><li><p>🌐 <strong>HTML5 & CSS3</strong> – Public pages (<code>index.html</code>, <code>styles.css</code>)</p></li><li><p>🖥️ <strong>JavaScript</strong> – Client-side logic (<code>script.js</code>)</p></li><li><p>🔄 <strong>Server-Sent Events (SSE)</strong> – Real-time log & data streaming to the browser</p></li></ul>
<p><strong>Backend / Scraper Engine:</strong></p><ul><li><p>🟢 <strong>Node.js</strong> with <strong>Express.js</strong> – API endpoints (<code>/api/scrape</code>, <code>/api/scrape-stream</code>)</p></li><li><p>🤖 <strong>Puppeteer</strong> – Automated browser for Google Maps navigation & scraping</p></li><li><p>🗺️ <strong>Node-Geocoder</strong> + <strong>LocationIQ API</strong> – Converts typed locations into latitude/longitude</p></li><li><p>📏 <strong>Geolib</strong> – Distance calculations between user location and businesses</p></li><li><p>🗄️ <strong>Node-Cache</strong> – Stores geocoding results for repeated searches</p></li><li><p>🌐 – Handles API requests safely</p></li></ul>
<p><strong>Additional Tools / Features:</strong></p><ul><li><p>📂 Automatic scrolling & panel handling to extract all visible businesses</p></li><li><p>🔄 Deduplication & invalid address skipping for cleaner datasets</p></li><li><p>📸 Automatic screenshots for debugging failures</p></li><li><p>⏱️ Configurable maximum results & timeout handling</p></li></ul><hr>
<h3><strong>Project Structure:</strong></h3>
<pre><code>app/
├── page.tsx                  # Main page (orchestrates all components)
├── api/
│   ├── scrape/route.ts       # SSE streaming scraper endpoint
│   └── export/
│       ├── csv/route.ts      # CSV export
│       ├── xlsx/route.ts     # Excel export
│       └── pdf/route.ts      # PDF export
components/
├── scraper-form.tsx          # Search form with all filters
├── counters.tsx              # Live statistics
├── log-terminal.tsx          # Real-time logs
├── results-table.tsx         # Data display
├── export-section.tsx        # Export buttons
├── header.tsx / footer.tsx
└── particle-background.tsx
lib/
├── maps-scraper.ts           # Core Playwright scraper
├── email-extractor.ts        # Email extraction logic
├── selectors-config.ts       # Selector definitions & auto-update tracking
└── rate-limit.ts             # Rate limiting logic</code></pre><hr>
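<p>The radius filtering, duplicate handling, and invalid data filtering listed under Functionalities can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function names (<code>haversineMeters</code>, <code>filterBusinesses</code>) and the record shape (<code>name</code>, <code>address</code>, <code>latitude</code>, <code>longitude</code>) are assumptions made for the example. The tool is described as using Geolib for distances; the sketch swaps in a hand-written haversine formula so it runs without dependencies.</p>

```javascript
// Illustrative sketch only. All names and the record shape are hypothetical.
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in meters between two { latitude, longitude } points.
function haversineMeters(a, b) {
  const dLat = toRadians(b.latitude - a.latitude);
  const dLon = toRadians(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.latitude)) *
      Math.cos(toRadians(b.latitude)) *
      Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Keeps businesses that have a usable address, are not duplicates, and lie
// within radiusMeters of origin. Adds a rounded `distance` field in meters.
function filterBusinesses(businesses, origin, radiusMeters) {
  const seen = new Set();
  const kept = [];
  for (const biz of businesses) {
    // Invalid data filtering: skip missing or blank addresses.
    if (typeof biz.address !== "string" || biz.address.trim() === "") continue;

    // Duplicate handling: normalize name + address into a comparison key.
    const key = `${biz.name}|${biz.address}`
      .toLowerCase()
      .replace(/\s+/g, " ")
      .trim();
    if (seen.has(key)) continue;
    seen.add(key);

    // Radius filtering: drop entries with unusable coordinates or too far away.
    const distance = Math.round(haversineMeters(origin, biz));
    if (!Number.isFinite(distance) || distance > radiusMeters) continue;

    kept.push({ ...biz, distance });
  }
  return kept;
}
```

<p>Calling <code>filterBusinesses(results, { latitude, longitude }, 1000)</code> would return only the unique, valid entries within one kilometer of the given point. Keying duplicates on a normalized name and address is one possible choice; a phone number or a Maps place identifier could serve equally well if available.</p>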