Suppose I want to crawl example.com and its server responds with a 4xx or 5xx status code (e.g. 403 Forbidden). From my tests, it appears the Apify crawler does not recognize that the page load has failed and incorrectly records the load as successful (true) in the request object.
This brings me to several questions:
- Am I understanding this correctly: does the Apify crawler only retry a page load after a timeout?
- I expect I will want to update the intercept request function to set willLoad to false when the server response is a 4xx or 5xx error. Is that right?
- I would like to keep trying to load a page (e.g. up to n tries) until it returns a 200 response. Ideally, Apify would slow its throttle each time there is an error so that we don't hammer the server. Is there a setting/feature for this, or do I have to build it?
- I would like to retire a proxy (at least temporarily) if a server responds with an error code (especially 4xx errors). Is there a setting for this, or do I have to build it?
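To make the intercept question concrete, here is roughly the check I'm imagining. willLoad is the flag from the docs; whether the response status code is actually available at that point is exactly what I'm unsure about, and markFailedLoads is just my own name for the idea:

```javascript
// Sketch only: my guess at the shape of the check, not confirmed Apify API.
// Assumes the hook can see both the request object and the response status.
function markFailedLoads(request, statusCode) {
  // Treat any 4xx or 5xx response as a failed load so it can be retried.
  if (statusCode >= 400 && statusCode < 600) {
    request.willLoad = false;
  }
  return request;
}
```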
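For the retry/throttle question, this is what I would build myself if there's no built-in setting: retry with exponential backoff until a 200 comes back. loadPage here is a stand-in for whatever function actually fetches the page and returns a status code:

```javascript
// My own sketch, not an Apify feature: retry with exponential backoff.
async function loadWithBackoff(loadPage, url, maxTries = 5, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < maxTries; attempt += 1) {
    const status = await loadPage(url);
    if (status === 200) return attempt + 1; // number of tries it took
    // Double the wait after every failure so we don't hammer the server.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw new Error(`Gave up on ${url} after ${maxTries} tries`);
}
```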
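And for proxy retirement, the same caveat: this is the sort of thing I'd write if there's no setting for it. A pool benches a proxy for a cooldown period whenever it gets an error response; all the names here are mine, not Apify's:

```javascript
// Hypothetical proxy pool that temporarily retires proxies after errors.
class ProxyPool {
  constructor(proxies, cooldownMs = 10 * 60 * 1000) {
    this.proxies = proxies;
    this.cooldownMs = cooldownMs;
    this.benchedUntil = new Map(); // proxy -> timestamp when usable again
  }

  // Call when a proxy gets a 4xx/5xx: retire it until the cooldown expires.
  reportError(proxy) {
    this.benchedUntil.set(proxy, Date.now() + this.cooldownMs);
  }

  // Pick a random proxy whose cooldown (if any) has expired.
  getProxy() {
    const now = Date.now();
    const usable = this.proxies.filter((p) => (this.benchedUntil.get(p) || 0) <= now);
    if (usable.length === 0) throw new Error('All proxies are benched');
    return usable[Math.floor(Math.random() * usable.length)];
  }
}
```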
Apologies if this is explained elsewhere; I've been looking through the docs, the forum, and the library without finding this topic.