Cheerio - task not halting


#1

I have a task that is supposed to crawl a list of URLs and then stop at the end. The list is around 3,500-4,000 URLs in total, but the task keeps running to around 40k requests before timing out.

Any suggestions on how to modify the code below to prevent this from happening?

```json
{
    "startUrls": [{
        "requestsFromUrl": "https://docs.google.com/spreadsheets/d/id/gviz/tq?tqx=out:csv&sheet=Auto"
    }],
    "useRequestQueue": false,
    "pseudoUrls": [],
    "linkSelector": "a",
    "pageFunction": "async ({ $, request }) => {\n    const name = $('span#a').text().trim();\n    const normal_price = $('span#b').text().trim();\n    const stock = $('span#1').text().trim();\n    const eta = $('span#c').text().trim();\n    const nla = $('div#2').text().trim();\n    const backorder = $('div#9').text().trim();\n    const minorder = $('span#3').text().trim();\n    const swatch = $('span#4').text().trim();\n    const manufacturer = $('a#5').text().trim();\n    const sku = $('span#6').text().trim();\n    const notes = $('#7').text().trim();\n    const wasprice = $('#8').text().trim();\n    const sale_price = $('#9').text().trim();\n    return { url: request.url, name, normal_price, stock, eta, nla, backorder, minorder, swatch, manufacturer, sku, notes, wasprice, sale_price };\n}",
    "proxyConfiguration": {},
    "debugLog": false,
    "ignoreSslErrors": false,
    "maxPagesPerCrawl": 6000,
    "maxResultsPerCrawl": 6000,
    "maxCrawlingDepth": 1
}
```
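One detail worth noting in the config above (a guess at the mechanism, not a confirmed fix): with `"linkSelector": "a"` and `"maxCrawlingDepth": 1`, the crawler may enqueue every link found on each of the ~4,000 start pages, which would explain the request count ballooning toward 40k. If the spreadsheet already lists every page to scrape, disabling link enqueuing should cap the crawl at the size of the start list. A minimal sketch of the changed fields:

```json
{
    "linkSelector": "",
    "pseudoUrls": [],
    "maxCrawlingDepth": 0
}
```

With an empty `linkSelector`, no new requests are added beyond `startUrls`, so `maxPagesPerCrawl` should never be reached.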


#2

Hello @PSA,

thanks for reporting this. It is a known issue that has already been fixed upstream, but the changes have not been propagated to apify/crawler-cheerio yet. I'll let you know once that happens.

Sorry for the inconvenience.


#3

OK great, thanks. I'll keep an eye out for updates.