ScrapeForge API

Enterprise web scraping that handles even heavily protected sites

Enterprise-Grade Reliability
ScrapeForge handles the most challenging sites with a 99.8% success rate, JavaScript rendering, and residential proxy rotation. Perfect for mission-critical data extraction.
POST
https://www.searchhive.dev/api/v1/scrapeforge

Scrape any website with enterprise-grade reliability and JavaScript support

Status Codes

200: Successfully scraped the target URL
400: Invalid request parameters or malformed URL
403: Target site blocked the request
429: Rate limit exceeded; retry after a delay
500: Internal server error; contact support
504: Timeout; the target site took too long to respond

Quick Start

Get started with ScrapeForge in under 2 minutes. Simply provide a URL and get clean, structured data.

Basic scraping with JavaScript rendering

curl -X POST https://www.searchhive.dev/api/v1/scrapeforge \
  -H "Authorization": "Bearer: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "render_js": true,
    "wait_for": "#product-list",
    "extract_links": true,
    "follow_redirects": true
  }'
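
Only url is required. For static pages you can skip the optional flags entirely; per Best Practices below, leaving render_js off saves credits. A minimal request (the target URL is illustrative):

Basic scraping without JavaScript rendering

curl -X POST https://www.searchhive.dev/api/v1/scrapeforge \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/about"}'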

Core Features

JavaScript Rendering

Full browser rendering with Chromium for SPAs and dynamic content

• React/Vue/Angular apps
• Dynamic content loading
• AJAX requests

Residential Proxies

Premium residential proxy network for high success rates

• 99.8% success rate
• Global IP rotation
• Anti-detection

Smart Retry Logic

Intelligent retry system with exponential backoff

• Auto-retry failures
• Rate limit handling
• Optimal timing

Enterprise Security

Bank-grade security and compliance for sensitive operations

• SOC 2 compliant
• Data encryption
• Audit logs

Common Use Cases

E-commerce Data

Product details, pricing, inventory, reviews

Lead Generation

Contact information, company data, social profiles

Market Research

Competitor analysis, market trends, industry data

Content Monitoring

Brand mentions, news articles, social media

Key Parameters

Essential ScrapeForge Parameters

url (string, required)
The URL to scrape. Must be a valid HTTP/HTTPS URL.
Example: "https://example.com/products"

render_js (boolean, optional)
Execute JavaScript on the page before scraping.
Example: true

wait_for (string, optional)
CSS selector or XPath to wait for before scraping.
Example: "#product-list"

extract_links (boolean, optional)
Extract all links found on the page.
Example: true

follow_redirects (boolean, optional)
Follow HTTP redirects automatically.
Example: true

Response Format

ScrapeForge Response Fields

content (string)
The scraped HTML content of the page.
Example: "<html><body>...</body></html>"

text_content (string)
Plain text content extracted from the HTML.
Example: "Welcome to our product catalog..."

links (array)
Array of links found on the page (when extract_links is true).
Example: [{"url": "...", "text": "...", "type": "..."}]

load_time (float)
Time taken to load and scrape the page, in seconds.
Example: 2.34

status_code (integer)
HTTP status code returned by the target server.
Example: 200

credits_used (integer)
Number of API credits consumed by this request.
Example: 5
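
Putting those fields together, a successful response looks like the following; the concrete values are the examples above, and elided values are left elided:

{
  "content": "<html><body>...</body></html>",
  "text_content": "Welcome to our product catalog...",
  "links": [{"url": "...", "text": "...", "type": "..."}],
  "load_time": 2.34,
  "status_code": 200,
  "credits_used": 5
}

Assuming the fields are returned at the top level as listed (the docs do not show a wrapper object), a jq one-liner pulls out just the link URLs:

curl -s -X POST https://www.searchhive.dev/api/v1/scrapeforge \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/products", "extract_links": true}' \
  | jq -r '.links[].url'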

Bulk Scraping

Process multiple URLs simultaneously with intelligent load balancing and error handling.

Bulk scraping multiple URLs
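
The exact bulk request shape is not shown here, so treat this as a sketch: it assumes the endpoint accepts a urls array (parameter name assumed) alongside the same per-request options as single scrapes.

# NOTE: the "urls" array parameter is an assumption; confirm against the bulk endpoint docs
curl -X POST https://www.searchhive.dev/api/v1/scrapeforge \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/products?page=1",
      "https://example.com/products?page=2",
      "https://example.com/products?page=3"
    ],
    "render_js": true,
    "wait_for": "#product-list"
  }'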

Bulk Scraping Benefits

• Process up to 100 URLs per request
• Intelligent concurrency control
• Automatic retry for failed requests
• Consolidated billing and reporting

99.8% success rate · 0.8s average response time · 50M+ pages scraped · 24/7 uptime

Best Practices

Recommended Practices

• Use specific selectors: wait for specific elements with the wait_for parameter.
• Enable JS rendering selectively: only use render_js when necessary, to save credits.
• Handle failures gracefully: implement proper error handling and retry logic (see the retry sketch under Status Codes).
• Respect rate limits: stay within your plan's concurrent request limits.

Common Pitfalls

• Scraping too frequently: balance data freshness with rate limiting.
• Ignoring robots.txt: respect website policies and terms of service.
• Not handling dynamic content: use render_js for JavaScript-heavy sites.
• Missing error handling: always check status codes and handle failures.