Navigation Timeout Exceeded: 30000ms

#1

I’m having an issue while building a broken-link crawler. I’m trying to crawl tvo.org, a site of about 16,000 pages.

I’m currently working from the jancurn/find-broken-links actor, which I modified to run non-headless browser instances so that PDFs can be crawled without errors. I also modified the utils.enqueueLinks selector to omit hrefs that start with mailto or javascript (a:not([href^="mailto"]):not([href^="javascript"])).
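
For reference, the same filter can be written as a plain predicate, which may be handy if hrefs are collected by hand instead of through a selector. This is only a sketch: isCrawlableHref is a hypothetical helper name, not part of the Apify SDK, and unlike the CSS selector it ignores letter case and surrounding whitespace and matches the scheme including its colon.

```javascript
// Hypothetical helper (not part of the Apify SDK): returns true for hrefs
// worth enqueueing and false for mailto: and javascript: links.
function isCrawlableHref(href) {
    if (typeof href !== 'string') return false;
    // Normalize so that " MAILTO:..." is treated like "mailto:...".
    const value = href.trim().toLowerCase();
    if (value === '') return false;
    return !value.startsWith('mailto:') && !value.startsWith('javascript:');
}

// Example input; the addresses are made up for illustration.
const hrefs = [
    'https://tvo.org/',
    'mailto:someone@example.com',
    'javascript:void(0)',
    '/about',
];
const crawlable = hrefs.filter(isCrawlableHref);
console.log(crawlable);
```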

After a while, some pages start failing with the error Navigation Timeout Exceeded: 30000ms. I noticed that the autoscaled pool state reports:
"cpuInfo":{"isOverloaded":true,"maxOverloadedRatio":0.4,"actualRatio":0.8309705561613958}
So I’ve tinkered with puppeteerPoolOptions and autoscaledPoolOptions, but I’m still getting the navigation timeout errors.
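
To make the kind of tinkering concrete, here is a rough sketch of conservative settings. The numbers are illustrative guesses, not the values from this actor, and the inner option names (maxConcurrency, maxOpenPagesPerInstance, retireInstanceAfterRequestCount) are assumed to match the Apify SDK version in use, so check them against its documentation. buildCrawlerOptions is a hypothetical helper.

```javascript
// Hypothetical helper: builds the two option objects mentioned above.
// All numeric values are assumptions chosen to reduce CPU pressure.
function buildCrawlerOptions(maxConcurrency) {
    return {
        // Cap how many pages are processed in parallel.
        autoscaledPoolOptions: {
            minConcurrency: 1,
            maxConcurrency,
        },
        // Keep few tabs per browser and recycle browsers regularly.
        puppeteerPoolOptions: {
            maxOpenPagesPerInstance: 5,
            retireInstanceAfterRequestCount: 50,
        },
    };
}

const crawlerOptions = buildCrawlerOptions(4);
console.log(JSON.stringify(crawlerOptions, null, 2));
```

The returned object would be spread into the PuppeteerCrawler constructor options alongside the request queue and the page handler.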

Here is my source for the actor:

Anyone have any ideas to remedy this problem?

Thanks,
Wilfred

#2

I forgot to mention that I’ve tried adding this to my Puppeteer crawler:

gotoFunction: async ({ request, page }) => {
    return page.goto(request.url, { waitUntil: ['load', 'networkidle2'] });
},

#3

Hi Wilfred, can you try to disable the “Use spare CPU capacity” setting on your actor? And also, I see you are using XVFB but not the apify/actor-node-chrome-xvfb Docker image, why is that?

#4

Also, does it happen on all pages or just some of them? Perhaps the waitUntil: ["load", "networkidle2"] setting is not right for some pages, e.g. pages that keep loading content forever.
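
One possible way to cope with such pages, sketched under assumptions: try the strict wait first and, only if the navigation times out, retry with the looser domcontentloaded event. gotoWithFallback is a hypothetical helper, and the timeout check relies on Puppeteer naming its error TimeoutError or mentioning "timeout" in the message.

```javascript
// Hypothetical helper: navigate with a strict wait condition and fall back
// to a looser one when the first attempt times out.
async function gotoWithFallback(page, url, timeoutMs = 30000) {
    try {
        return await page.goto(url, {
            waitUntil: ['load', 'networkidle2'],
            timeout: timeoutMs,
        });
    } catch (err) {
        // Assumption: timeouts are recognizable by name or message.
        const isTimeout = Boolean(err)
            && (err.name === 'TimeoutError' || /timeout/i.test(String(err.message)));
        if (!isTimeout) throw err; // e.g. DNS failures should still surface
        // Second attempt: do not wait for the network to go quiet.
        return page.goto(url, {
            waitUntil: 'domcontentloaded',
            timeout: timeoutMs,
        });
    }
}

// Possible use inside the crawler options:
// gotoFunction: ({ request, page }) => gotoWithFallback(page, request.url),
```

For a broken-link check the response status is what matters, so the looser wait on the retry should still be enough to tell whether the page exists. Note that the retry loads the page a second time, which adds load on a crawler that is already CPU-bound.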

#5

Hi Jan,

I’ll try disabling the “Use spare CPU capacity” setting. I am using XVFB, but I had to modify the Dockerfile because of permission errors.

I added this to the Dockerfile:

RUN ["chmod", "+x", "start_xvfb_and_run_cmd.sh"]
RUN ["chmod", "+x", "start_actor.sh"]

This issue happens only on some pages. I added waitUntil because, after changing puppeteerPoolOptions, I was getting a disconnect error on some pages, so I thought the instances were being closed before the pages had fully loaded. Here is the error:

Error: Navigation failed because browser has disconnected!