I started a new project that queries the Envato API for all WordPress themes.
Limited Theme Index
First of all – it’s not possible to find a list of all themes anywhere. Not the API, not the website itself. Pagination is limited to 60 pages, and with only 30 items per page – each theme category is limited to having 1800 themes.
I managed to almost get around that by using Node.js and Cheerio to scrape every category, filtering by price bracket (for example, /wordpress/creative/photography with price brackets [0, 30], [30, 40], [40, 50] and so on). From that scrape, I got a list of around 9200 theme IDs out of the roughly 9700 themes currently on ThemeForest. For me – that’s enough to experiment with my idea, for now. If everything works out, I can spend a few days making a more robust scraper. I don’t get why I should be building a scraper to do this in the first place, but that’s just where the fun really begins.
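The price-bracket trick only works if every bracket stays under the 1800-item cap, and neighbouring brackets share a boundary (30 sits in both [0, 30] and [30, 40]), so the same theme can show up twice. Here is a minimal sketch of that bookkeeping in Python rather than the Node.js I actually used. The function names and the `max_price` ceiling are made up for illustration, and the real query parameters for the price filter are left out on purpose.

```python
def price_brackets(first_upper=30, step=10, max_price=100):
    """Return (low, high) pairs: (0, first_upper), then fixed-width steps.

    max_price is a hypothetical ceiling, not a value taken from ThemeForest.
    """
    brackets = [(0, first_upper)]
    low = first_upper
    while low < max_price:
        high = min(low + step, max_price)
        brackets.append((low, high))
        low = high
    return brackets


def merge_ids(id_lists):
    """Merge the ID lists scraped per bracket, dropping duplicates.

    A dict keeps first-seen order, so the result is stable between runs.
    """
    seen = {}
    for ids in id_lists:
        for theme_id in ids:
            seen.setdefault(theme_id, None)
    return list(seen)
```

Each bracket is scraped on its own, and `merge_ids` collapses the overlap at the boundaries into one list of unique theme IDs.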
After I got all the theme IDs without a hiccup (scraped 7000 items in under 10 seconds), I thought “yay” – I can finally move away from scraping HTML to the more open part of the web: the Envato API.
So I pulled down the Guzzle package with Composer and wrote a very nice little script that pulls down theme information from the Envato API asynchronously. Patting myself on the back, I started the script, which made around 100 requests per second.
At that rate, I’ll be able to fetch all the information I need about the themes in under 30 seconds. Perfect.
Or is it….
I hit a brick wall at 100 km/h (62.1371 mph for you Americans). The Envato API banned me, and not just from the API: I was banned from the whole Envato ecosystem. I thought it was going to be a fun few days while I tried to unban myself through the wonderful customer support that Envato has always provided 🙃 – but it wasn’t that bad. The ban lasted for just a couple of minutes.
Okay, so sending 100 requests isn’t a good idea. Let’s try only 3 (instead of 10) concurrent requests, and no more than 10 in a row. And sure enough, just as before – after 75 requests – ban ban ban.
At this point, I’m wondering what the API key is for. To avoid waiting for the ban to clear, I’ve just been changing my VPN location. So with the same API key, I can still use multiple IPs to bombard the server?
Anyhow, the Envato API is very cautious. So let’s try a very cautious approach:
- 1 Concurrent Request
- Sleep for 10 seconds after every 20 requests
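The two rules above can be sketched roughly like this. It is a Python illustration, not my actual Guzzle script; `fetch` stands in for whatever makes the real API call, and `fetch_cautiously` and its parameters are names I made up for the example.

```python
import time


def fetch_cautiously(ids, fetch, batch_size=20, pause=10.0, sleep=time.sleep):
    """Call fetch(theme_id) one at a time, pausing after every batch_size requests.

    `fetch` is a placeholder for the real API call; `sleep` is injectable so the
    schedule can be checked without actually waiting.
    """
    results = {}
    for count, theme_id in enumerate(ids, start=1):
        results[theme_id] = fetch(theme_id)
        # Pause after every full batch, but not after the very last request.
        if count % batch_size == 0 and count < len(ids):
            sleep(pause)
    return results
```

With the defaults that means one request in flight at any time and a 10-second nap after requests 20, 40, 60 and so on.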
I managed to fetch 180 themes that way! Woo! Aaaaaaand – an error 🎉🎉🎉
Error is good, error is progress.
404 Not Found
When I was banned I received a 403 Forbidden. This time the console says I’ve hit a 404 on ID 2862929. Open up ThemeForest and – sure enough – that item has been removed. I don’t think a 404 is the best kind of response in this case. It makes Guzzle think it hit a wrong URL, but the request is fine; it’s just that the item behind that URL has been removed.
On top of the error being a 404, the response also isn’t JSON. So now I have to parse the response to figure out which item was “not found” 🤦♂️.
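One way to avoid digging through a non-JSON body is to keep the theme ID next to the request, so a 404 can be logged as "removed" and a 403 treated as a ban. This is a rough Python sketch of that idea, not what my script does. `fetch` is a hypothetical callable returning a `(status, body)` pair, and `Banned` and `collect` are names invented for the example.

```python
class Banned(Exception):
    """Raised on a 403, which is what I got back while banned."""


def collect(ids, fetch):
    """Sort responses into fetched themes and removed items.

    `fetch(theme_id)` is a placeholder that returns (status_code, body).
    """
    themes, removed = {}, []
    for theme_id in ids:
        status, body = fetch(theme_id)
        if status == 200:
            themes[theme_id] = body
        elif status == 404:
            # The ID is already known here, so the body never needs parsing.
            removed.append(theme_id)
        elif status == 403:
            raise Banned(f"banned while fetching {theme_id}")
        else:
            raise RuntimeError(f"unexpected status {status} for {theme_id}")
    return themes, removed
```

Removed items end up in their own list and the run carries on, while a ban stops everything so the script can back off.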
After I fixed the 404 handling – I’m banned again 🤡
Ok, let me try a turtle approach this time:
- Make 5 requests
- Wait for them to complete
- Sleep for 5 seconds
I managed to pull information for 300 themes from the API in 5 minutes. That’s just not viable for requesting info about 9000 themes.
I’ve finally figured out the Envato DDoS protection mechanism.
Querying 5 results at a time without sleeping gets you a ban at around 200 requests.
Querying 5 results with sleep(1) between every request is all fine and dandy with the great guardian protector. Yay, no ban. Maybe some day I’ll become friends with the great guardian…
In the end, I managed to optimize it down to 3 concurrent requests with a 5-second sleep after every 50 requests (I tried bumping that to 75 requests, but got a ban after 800 results). That seems to be working.
More than 3 concurrent requests are going to result in a ban very quickly. Not sleeping in between also results in a ban in no time.
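For anyone who wants to try the same limits, here is a minimal Python sketch of "3 concurrent requests, 5-second sleep every 50 requests". My real script uses Guzzle, so treat this as an illustration of the schedule only. `fetch` is a placeholder for the actual API call, and the numbers are just the ones that worked for me, not documented limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_throttled(ids, fetch, concurrency=3, batch_size=50, pause=5.0,
                    sleep=time.sleep):
    """Fetch ids in batches with a bounded number of requests in flight.

    `fetch(theme_id)` is a placeholder for the real API call.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for start in range(0, len(ids), batch_size):
            batch = ids[start:start + batch_size]
            # The pool runs at most `concurrency` calls at once;
            # map() yields results in the same order as `batch`.
            for theme_id, result in zip(batch, pool.map(fetch, batch)):
                results[theme_id] = result
            # Sleep between batches, but not after the final one.
            if start + batch_size < len(ids):
                sleep(pause)
    return results
```

Raising `concurrency` or `batch_size` is exactly the kind of bumping that got me banned, so change one at a time.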
So be careful – don’t use the API too much, k?
I don’t want to bash anyone, but I think this is the worst API I’ve ever worked with.
In contrast, a few days ago I bombarded the WordPress.org API with 20 concurrent requests around 10000 times in less than 10 minutes (while debugging my application) and the WordPress API didn’t even blink. There is no API key, just ask, and they shalt provide.
From my past experiences with Envato, all I can say is that this is probably as good as it gets. I’ll try contacting support to figure out if there is a way I can avoid being banned, but I don’t have much hope.
50 requests complete in about 22 seconds in total, so downloading the whole list is going to take about an hour.
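The "about an hour" figure is just arithmetic on those numbers. Here it is as a quick Python check, assuming the 22 seconds already includes the 5-second sleep and that every batch takes about the same time. `estimated_minutes` is a helper named for this example.

```python
import math


def estimated_minutes(total_items, batch_size=50, seconds_per_batch=22):
    """Estimate the total run time in minutes for a batched, throttled fetch."""
    batches = math.ceil(total_items / batch_size)
    return batches * seconds_per_batch / 60


print(estimated_minutes(9000))  # 180 batches * 22 s = 3960 s = 66.0 minutes
```

For the roughly 9200 IDs I scraped it comes out a bit over 67 minutes, so "about an hour" holds as long as no ban interrupts the run.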