Detect or Not Detect Chrome Headless


#1

Despite the tips described in the article "It is not possible to detect and block Chrome headless", it seems that Chrome Headless can be detected.

For example, these 2 sites detect and block Chrome Headless:

To do this, they use the DataDome Real-Time Bot Protection service.
I have no idea how it detects Chrome Headless.
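
For reference, two signals are commonly cited for this kind of check from page JavaScript: the default user agent of headless Chrome contains the token "HeadlessChrome", and navigator.webdriver is true when the browser is driven by automation (which flags automation in general, not headless mode specifically). Nothing here confirms that DataDome relies on either of them. The sketch below only shows what such a check could look like; the function name looksHeadless and the navigator-like sample objects are made up for illustration.

```javascript
// Hypothetical helper, not DataDome's method: inspects a navigator-like
// object and returns the names of the signals that look automated.
function looksHeadless(nav) {
    const reasons = [];

    // Headless Chrome's default user agent contains "HeadlessChrome".
    if (/HeadlessChrome/.test(nav.userAgent || '')) {
        reasons.push('userAgent');
    }

    // navigator.webdriver is true when the browser is controlled by automation.
    if (nav.webdriver === true) {
        reasons.push('webdriver');
    }

    return reasons;
}

// Sample inputs (assumed values, for illustration only).
const headlessLike = {
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/70.0.3538.77 Safari/537.36',
    webdriver: true,
};
const regularLike = {
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
    webdriver: false,
};

console.log(looksHeadless(headlessLike));
console.log(looksHeadless(regularLike));
```

In a real page the argument would be window.navigator. A protection service may well combine many more signals than these two.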


#2

Are you sure it’s the headless Chrome being detected and not just the IP being blocked or some heuristic that checks user behavior doing the blocking?


#3

Yes, I’m pretty sure.
The IP is not blocked, because I can access these pages with the Chrome browser without any problems.
It could be some heuristic that checks user behavior, but from a user point of view I do nothing different between headless mode and browser mode (no mouse clicks, for example).


#4

So when running locally, it works when

{ headless: false }

but fails when

{ headless: true }

?


#5

I didn’t test with { headless: true }, only with { headless: false }. I’ll test it.
I simply viewed these pages with the Chrome browser.


#6

By default, the system runs with the Chromium browser. It’s more stable, but it’s not the same as Chrome. You can run with Chrome if you pass

{ useChrome: true }

to Apify.launchPuppeteer() or to launchPuppeteerOptions.


#7

Below is my test code.
I get a result with headless: false, whether or not { useChrome: true } is set.
I get NO result with headless: true, whether or not { useChrome: true } is set.

    /* eslint-disable no-console */

    const Apify = require('apify');

    Apify.main(async () => {
        const sources = [
            { url: 'https://programmetv.ouest-france.fr/' },
        ];

        const requestList = new Apify.RequestList({ sources });
        await requestList.initialize();

        const crawler = new Apify.PuppeteerCrawler({
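            launchPuppeteerOptions: {
                // Run Chrome instead of the default Chromium.
                useChrome: true,
                // Set to true to reproduce the case where no result comes back.
                headless: false,
                ignoreHTTPSErrors: false,
                // Slow down each Puppeteer operation by 500 ms.
                slowMo: 500,
            },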
            requestList,
            handlePageFunction: async ({ page, request }) => {
                console.log(`=>>> Processing ${request.url}...`);

                const title = await page.title();
                console.log(`=>>> Title of ${request.url}: ${title}`);
            },
        });

        await crawler.run();

        console.log('Crawler finished.');
    });
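
To avoid editing the options by hand for every run, the four useChrome/headless combinations described above could be generated up front. This is only a minimal sketch: the helper name buildLaunchOptionMatrix is made up, and it builds plain option objects without launching anything.

```javascript
// Hypothetical helper: returns one options object per combination of
// useChrome and headless, on top of some shared base options.
function buildLaunchOptionMatrix(base = {}) {
    const matrix = [];
    for (const useChrome of [false, true]) {
        for (const headless of [false, true]) {
            matrix.push({ ...base, useChrome, headless });
        }
    }
    return matrix;
}

// Shared options taken from the test code above.
const matrix = buildLaunchOptionMatrix({ ignoreHTTPSErrors: false, slowMo: 500 });

// Print each combination so the runs can be matched to their results.
matrix.forEach((options) => console.log(JSON.stringify(options)));
```

Each entry can then be passed as launchPuppeteerOptions to a new Apify.PuppeteerCrawler, as in the test code, to see which combinations return a title.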
