Web Scraping APIs benchmark
We developed a benchmark to test selected Web Scraping APIs. It involves scraping various web pages that are commonly targeted in web scraping workflows. The results let us evaluate Web Scraping APIs in terms of reliability, proxy quality, speed and cost.
The Python script we used to run the benchmark is publicly available in a GitHub repository. It can also be used to run a scraping job with Scraping Fish API by providing an input file with a list of URLs to scrape.
Methodology overview
The benchmark includes URLs from 5 categories:
- Alexa: URLs from the top 1,000 Alexa rank
- Amazon: Amazon product URLs
- Google: Google search queries
- Instagram: the top 10 Instagram profiles (as of 2022)
- Similarweb: websites from the Similarweb ranking (excluding adult and Russian websites)
For each category, we made 1,000 requests and recorded:
- ✅ successful URLs
- ❌ failed URLs
- ⛔️ blocked URLs
- ⏱ average URL processing time (seconds/URL)
- 💰 cost of running the benchmark (1,000 requests)
More details on the methodology and instructions to reproduce the results are provided in the GitHub repository.
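The recorded metrics can be aggregated roughly as follows. This is a minimal sketch, not the benchmark script itself: the `Result` record, the outcome labels, and the choice to average processing time over successful URLs only are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Result:
    """Hypothetical per-URL record; field names are illustrative."""
    url: str
    outcome: str    # one of "success", "failed", "blocked"
    seconds: float  # wall-clock processing time for this URL


def summarize(results, cost_usd):
    """Turn per-URL results into the five metrics reported per category."""
    total = len(results)

    def share(outcome):
        # Percentage of all requests that ended with the given outcome.
        return 100.0 * sum(r.outcome == outcome for r in results) / total

    # Assumption: time is averaged over successful URLs only.
    times = [r.seconds for r in results if r.outcome == "success"]
    return {
        "successful_pct": share("success"),
        "failed_pct": share("failed"),
        "blocked_pct": share("blocked"),
        "avg_seconds_per_url": mean(times) if times else None,
        "cost_usd": cost_usd,
    }
```

Calling `summarize(results, cost_usd=2.0)` on the 1,000 results of one category would yield one table row.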
Results
Scraping Fish 🐟
Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time | 💰 Cost |
---|---|---|---|---|---|
Alexa | 99.9% | 0.1% | 0% | 2.63 | $2 |
Amazon | 100.0% | 0% | 0% | 3.37 | $2 |
Google | 100.0% | 0% | 0% | 1.63 | $2 |
Instagram | 99.9% | 0.1% | 0% | 1.9 | $2 |
Similarweb | 100.0% | 0% | 0% | 2.50 | $2 |
Total | 99.96% | 0.04% | 0.0% | 2.4 | $10 |
📝 $0.002 per successfully scraped URL. The highest overall success rate and the best processing time.
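The Total row appears to be the unweighted mean of the five category rows, which is reasonable given that each category has the same number of requests. A quick check under that assumption, using the Scraping Fish figures from the table above:

```python
# (success %, seconds/URL) per category, copied from the Scraping Fish table.
rows = {
    "Alexa": (99.9, 2.63),
    "Amazon": (100.0, 3.37),
    "Google": (100.0, 1.63),
    "Instagram": (99.9, 1.9),
    "Similarweb": (100.0, 2.50),
}

# Unweighted means across categories (assumed aggregation method).
success_total = sum(s for s, _ in rows.values()) / len(rows)
time_total = sum(t for _, t in rows.values()) / len(rows)

# $2 buys 1,000 requests, so the unit price is $2 / 1,000.
cost_per_url = 2 / 1000

print(f"{success_total:.2f}% success, {time_total:.1f} s/URL, ${cost_per_url:.3f}/URL")
```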
Other Web Scraping APIs
ScrapingAnt
Benchmarks were run using the --api "https://api.scrapingant.com/v1/general/?proxy_type=residential&" parameter, with the code adjusted to pass the API key as a header instead of a query parameter.
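The difference between the two ways of passing the API key can be sketched as below. This is an illustration only: the `url` and `api_key` query parameter names and the `x-api-key` header name are assumptions that should be checked against each provider's documentation, and `build_request` is a hypothetical helper, not a function from the benchmark script.

```python
from urllib.parse import quote
from urllib.request import Request


def build_request(api_base, target_url, api_key, key_in_header=False):
    """Build (but do not send) a request to a scraping API.

    api_base is expected to end with '?' or '&', like the --api values
    quoted in this article, so parameters can simply be appended.
    """
    # Percent-encode the target URL so its own '?' and '&' survive intact.
    url = f"{api_base}url={quote(target_url, safe='')}"
    headers = {}
    if key_in_header:
        headers["x-api-key"] = api_key  # assumed header name
    else:
        url += f"&api_key={quote(api_key, safe='')}"  # assumed parameter name
    return Request(url, headers=headers)


req = build_request(
    "https://api.scrapingant.com/v1/general/?proxy_type=residential&",
    "https://example.com/page?id=1",
    "YOUR_API_KEY",
    key_in_header=True,
)
print(req.full_url)
```

Keeping the key out of the query string also keeps it out of logged request URLs, which is a common reason APIs prefer a header.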
Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time | 💰 Cost |
---|---|---|---|---|---|
Alexa | 100.0% | 0% | 0% | 6.92 | $19 |
Amazon | 98.0% | 2.0% | 0% | 9.84 | $19 |
Google | 95.0% | 5.0% | 0% | 13.80 | $19 |
Instagram | 99.5% | 0.5% | 0% | 6.76 | $19 |
Similarweb | 96.0% | 4.0% | 0% | 7.40 | $19 |
Total | 97.7% | 2.3% | 0.0% | 8.94 | $49 |
📝 $49 Startup subscription required to scrape 5,000 URLs in total (each consuming 50 or 250 API credits) and using 5 concurrent connections.
ScrapingBee
Benchmarks were run using the --api "https://app.scrapingbee.com/api/v1/?premium_proxy=true&" parameter, with the custom_google parameter set to true for the Google benchmark.
Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time | 💰 Cost |
---|---|---|---|---|---|
Alexa | 81.0% | 18.0% | 1.0% | 4.86 | $99 |
Amazon | 99.0% | 1.0% | 0% | 11.48 | $99 |
Google | 100.0% | 0% | 0% | 3.74 | $99 |
Instagram | 99.0% | 1.0% | 0% | 18.52 | $99 |
Similarweb | 90.0% | 8.0% | 2.0% | 4.70 | $99 |
Total | 93.8% | 5.6% | 0.6% | 8.66 | $99 |
📝 $99 Startup subscription required to scrape 5,000 URLs in total (each consuming 10, 20, or 25 API credits) and using 5 concurrent connections.
ScraperAPI
Benchmarks were run using the --api "http://api.scraperapi.com/?premium=true&" parameter.
Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time | 💰 Cost |
---|---|---|---|---|---|
Alexa | 95.5% | 4.5% | 0% | 7.19 | $49 |
Amazon | 96.0% | 4.0% | 0% | 10.97 | $49 |
Google | 100.0% | 0% | 0% | 4.50 | $49 |
Instagram* | 0.0% | 100.0% | 0% | - | - |
Similarweb | 90.0% | 8.0% | 2.0% | 4.70 | $49 |
Total | 76.3% | 23.3% | 0.4% | 6.84 | $49 |
* Scraping Instagram is not allowed and returns 403 status code.
📝 $49 Hobby subscription required to scrape 5,000 URLs in total (each consuming 10 or 25 API credits) and using 5 concurrent connections.
Conclusions
Scraping Fish 🐟 achieved the highest total success rate of 99.96% with the best average processing time of 2.4 seconds/URL, as reported in the results table. Moreover, thanks to Scraping Fish API's simple and transparent pricing, the total cost of running the benchmark was roughly 5 to 10 times lower than for the other tested APIs.
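The cost comparison follows directly from the Total rows of the result tables. A small sketch of the arithmetic, using only the totals reported above:

```python
# Total benchmark cost in USD per API, taken from the "Total" rows.
total_cost = {
    "Scraping Fish": 10,
    "ScrapingAnt": 49,
    "ScrapingBee": 99,
    "ScraperAPI": 49,
}

baseline = total_cost["Scraping Fish"]

# How many times more expensive each other API was for the same benchmark.
cost_ratio = {
    name: cost / baseline
    for name, cost in total_cost.items()
    if name != "Scraping Fish"
}

for name, ratio in cost_ratio.items():
    print(f"{name}: {ratio:.1f}x the Scraping Fish cost")
```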
Try Scraping Fish API
To run the scraping script for your use case, you can get a starter pack of 1,000 API requests for only $2.