"BasicCrawler: handleRequestFunction ..." and "ENOENT: no such file or directory ..." error


#1

Hi,
if I run the same example again (see post below), I get following two errors:

  1. A “BasicCrawler: handleRequestFunction …” error.
  2. A “ENOENT: no such file or directory …” error.

About 2.: Regarding the second error, I discovered that there had already been an "ENOENT" error when I installed Apify.

What do these two errors mean, and how can I fix them?

Regards, Wolfgang


#2

Hello Wolfgang,

what version of Node.js and NPM are you using?

You can find out by running:
node -v
and
npm -v

npm install apify in an empty folder works fine for me on a Mac.

Nevertheless, you can create a package.json simply by calling

npm init -y
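
For example, setting up a fresh project from scratch (the folder name is just an example):

mkdir my-crawler
cd my-crawler
npm init -y          # creates a default package.json
npm install apify    # installs the Apify SDK into the project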

Let me know how that goes 🙂


#3

About 2.: This was it, thanks. I have installed and initialized the folder according to your advice / "Getting Started", and it works.

About 1.: Modifying the "PuppeteerCrawler" example https://sdk.apify.com/docs/examples/puppeteercrawler for the page https://www.wg-gesucht.de/wg-zimmer-in-Berlin-gesucht.8.0.1.0.html, I still get the "BasicCrawler: handleRequestFunction ..." error. By debugging the script, I found out that it has something to do with the $$eval method of the Puppeteer page object. What is wrong with my modifications? Please see below.

Regards, Wolfgang

const Apify = require('apify');

Apify.main(async () => {
    // Create and initialize an instance of the RequestList class that contains the start URL.
    const requestList = new Apify.RequestList({
        sources: [
            { url: 'https://www.wg-gesucht.de/wg-zimmer-in-Berlin-gesucht.8.0.1.0.html' },
            //{ url: 'https://news.ycombinator.com/' },
        ],
    });
    await requestList.initialize();

    // Apify.openRequestQueue() is a factory to get a preconfigured RequestQueue instance.
    const requestQueue = await Apify.openRequestQueue();

    const crawler = new Apify.PuppeteerCrawler({
        // The crawler will first fetch start URLs from the RequestList
        // and then the newly discovered URLs from the RequestQueue.
        requestList,
        requestQueue,

        // Here you can set options that are passed to the Apify.launchPuppeteer() function.
        // For example, you can set "slowMo" to slow down Puppeteer operations to simplify debugging.
        launchPuppeteerOptions: { slowMo: 500 },

        // Stop crawling after several pages.
        maxRequestsPerCrawl: 10,

        // This function will be called for each URL to crawl.
        // Here you can write the Puppeteer scripts you are familiar with,
        // with the exception that browsers and pages are automatically managed by the Apify SDK.
        // The function accepts a single parameter, which is an object with the following fields:
        // - request: an instance of the Request class with information such as URL and HTTP method
        // - page: Puppeteer's Page object (see https://pptr.dev/#show=api-class-page)
        handlePageFunction: async ({ request, page }) => {
            console.log(`Processing ${request.url}...`);

            // A function to be evaluated by Puppeteer within the browser context.
            const pageFunction = ($posts) => {
                const data = [];

                // We're getting the title and url of each flat on wg-gesucht.
                $posts.forEach(($post) => {
                    data.push({
                        title: $post.querySelector('h3[class*="title"] > .detailansicht'),
                        url: $post.querySelector('h3[class*="title"] > .detailansicht').href,
                        //title: $post.querySelector('.title a').innerText,
                        //rank: $post.querySelector('.rank').innerText,
                        //href: $post.querySelector('.title a').href,
                    });
                });

                return data;
            };
            const data = await page.$$eval('div.panel-default:not(.panel-hidden)');
            //const data = await page.$$eval('.athing', pageFunction);

            // Store the results to the default dataset.
            await Apify.pushData(data);

            /*
            // Find the link to the next page using Puppeteer functions.
            let nextHref;
            try {
                nextHref = await page.$eval('.morelink', el => el.href);
            } catch (err) {
                console.log(`${request.url} is the last page!`);
                return;
            }

            // Enqueue the link to the RequestQueue.
            await requestQueue.addRequest(new Apify.Request({ url: nextHref }));
            */
        },

        // This function is called if the page processing failed more than maxRequestRetries+1 times.
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed too many times`);
        },
    });

    // Run the crawler and wait for it to finish.
    await crawler.run();

    console.log('Crawler finished.');
});


#4

Hello Wolfgang,

by replacing

const data = await page.$$eval('.athing', pageFunction);

with your

const data = await page.$$eval('div.panel-default:not(.panel-hidden)');

you dropped pageFunction, the second argument. It should be fine once you add it back.
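
For reference, page.$$eval(selector, pageFunction) runs Array.from(document.querySelectorAll(selector)) inside the browser and passes the resulting array to your pageFunction, so with your selector the call should look like this:

const data = await page.$$eval('div.panel-default:not(.panel-hidden)', pageFunction);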

Best Regards,
Ondra