Detect or Not Detect Chrome Headless

#1

Despite the tips described in the article “It is not possible to detect and block Chrome headless”, it seems possible to detect Chrome Headless.

For example, these 2 sites detect and block Chrome Headless:

To do this, they use the DataDome Real-Time Bot Protection service.
I have no idea how it detects Chrome Headless.

#2

Are you sure it’s the headless Chrome being detected and not just the IP being blocked or some heuristic that checks user behavior doing the blocking?

#3

Yes, I’m pretty sure.
The IP is not blocked, because I can access these pages with the Chrome browser without any problems.
Perhaps some heuristic that checks user behavior is involved, but from a user’s point of view I do nothing differently between headless mode and browser mode (no mouse clicks, for example).

#4

So when running locally, it works when

{ headless: false }

but fails when

{ headless: true }

?

#5

I didn’t test with { headless: true }, only with { headless: false }. I’ll test it.
I simply viewed these pages with the Chrome browser.

#6

By default, the system runs with the Chromium browser. It’s more stable, but it’s not the same as Chrome. You can run with Chrome if you pass

{ useChrome: true }

to Apify.launchPuppeteer() or to launchPuppeteerOptions.
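
As a rough sketch of the two call sites, assuming the SDK version used in this thread: the helper name buildLaunchOptions is made up for illustration, and the actual Apify call is left as a comment because it needs the apify package and a browser.

```javascript
// Hypothetical helper: one options object reused at both call sites.
function buildLaunchOptions(overrides = {}) {
    return {
        useChrome: true, // run Chrome instead of the default Chromium
        headless: false,
        ...overrides,
    };
}

const options = buildLaunchOptions();

// Call site 1 (needs the apify package, so it is left as a comment here):
//   const browser = await Apify.launchPuppeteer(options);

// Call site 2: the same object goes under launchPuppeteerOptions in the
// configuration passed to new Apify.PuppeteerCrawler(...).
const crawlerConfig = { launchPuppeteerOptions: options };

console.log(crawlerConfig);
```

Building the object once keeps both call sites in sync when you flip headless or useChrome while testing.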

#7

Below is my test code.
I get a result with headless: false, whether or not { useChrome: true } is set.
I get NO result with headless: true, whether or not { useChrome: true } is set.

    /* eslint-disable no-console */

    const Apify = require('apify');

    Apify.main(async () => {
        const sources = [
            { url: 'https://programmetv.ouest-france.fr/' },
        ];

        const requestList = new Apify.RequestList({ sources });
        await requestList.initialize();

        const crawler = new Apify.PuppeteerCrawler({
            launchPuppeteerOptions: {
                useChrome: true,
                headless: false,
                ignoreHTTPSErrors: false,
                slowMo: 500,
            },
            requestList,
            handlePageFunction: async ({ page, request }) => {
                console.log(`=>>> Processing ${request.url}...`);

                const title = await page.title();
                console.log(`=>>> Title of ${request.url}: ${title}`);
            },
        });

        await crawler.run();

        console.log('Crawler finished.');
    });

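
The four useChrome/headless combinations tested above could also be enumerated in code instead of editing the options by hand. This is only a sketch: launchMatrix and runMatrix are hypothetical names, and probe stands for whatever function launches the browser with the given options and returns the page title.

```javascript
// Enumerate every { useChrome, headless } combination to test.
function launchMatrix() {
    const combos = [];
    for (const useChrome of [false, true]) {
        for (const headless of [false, true]) {
            combos.push({ useChrome, headless });
        }
    }
    return combos;
}

// `probe` is a hypothetical async callback supplied by the caller: it should
// launch a browser with the given options and resolve to the page title.
async function runMatrix(probe) {
    const results = [];
    for (const options of launchMatrix()) {
        results.push({ ...options, title: await probe(options) });
    }
    return results;
}

console.log(launchMatrix());
```

Running the combinations one after another keeps the results comparable, since only the launch options change between runs.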

#8

I see, I get the same behavior. However, I thought that you had already implemented all the tips from the article you mentioned earlier. Without those, it is definitely possible to detect headless Chrome easily.

Nevertheless, you can easily bypass this by using the apify/actor-node-chrome-xvfb base image and running the actor in { headless: false } mode.

We’ll try to investigate what exactly causes headless to be detected.

#9

I confirm that I have implemented all the advice mentioned in the article, but headless Chrome is still being detected.

I use Apify with NodeJS on my Windows system. Unless I’m mistaken, I can’t use the apify/actor-node-chrome-xvfb base image and run the actor with NodeJS.

I am curious and very interested in the results of your investigation.

#10

Did you try the hideWebDriver function from the SDK utils? It can help.
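
For context, helpers of this kind generally work by overriding the navigator.webdriver flag before any page script can read it. The snippet below is a rough illustration of that idea, not the SDK’s actual implementation; maskWebdriver and the fake navigator object are made up so that it runs in plain Node.js.

```javascript
// Hypothetical illustration: replace the `webdriver` property with a getter
// that always reports false.
function maskWebdriver(nav) {
    Object.defineProperty(nav, 'webdriver', {
        configurable: true,
        get: () => false,
    });
    return nav;
}

// Stand-in for the browser's navigator object (an assumption for this sketch).
const fakeNavigator = { webdriver: true };
maskWebdriver(fakeNavigator);

console.log(fakeNavigator.webdriver); // false
```

In a real page, a patch like this would typically be injected with Puppeteer’s page.evaluateOnNewDocument(), so that it is in place before the site’s own scripts run. Hiding this one flag does not cover other signals a detection service may check.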

#11

Yes, but unfortunately without success.

const Apify = require('apify');

Apify.main(async () => {
    const browser = await Apify.launchPuppeteer({
        // it's OK, we have the right title
        // headless: false,
        // it's KO, we have 'You have been blocked' as the title
        headless: true,
        ignoreHTTPSErrors: false,
        slowMo: 500,
    });
    const page = await browser.newPage();

    await Apify.utils.puppeteer.hideWebDriver(page);

    await page.goto('https://programmetv.ouest-france.fr/');

    const title = await page.title();
    console.log(`=>>> Title: ${title}`);

    await page.close();
    await browser.close();
});

Results:
With headless: false

INFO: System info {"apifyVersion":"0.11.8","apifyClientVersion":"0.5.5","osType":"Windows_NT","nodeVersion":"v10.15.0"}
WARNING: Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="D:\Developpement\NodeJS\CrawlUpDyn\apify_storage"
INFO: Launching Puppeteer {"headless":false,"ignoreHTTPSErrors":false,"slowMo":500,"args":["--no-sandbox","--enable-resource-load-scheduler=false","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"],"defaultViewport":{"width":1366,"height":768}}
=>>> Title: Programme TV ce soir

With headless: true

INFO: System info {"apifyVersion":"0.11.8","apifyClientVersion":"0.5.5","osType":"Windows_NT","nodeVersion":"v10.15.0"}
WARNING: Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="D:\Developpement\NodeJS\CrawlUpDyn\apify_storage"
INFO: Launching Puppeteer {"headless":true,"ignoreHTTPSErrors":false,"slowMo":500,"args":["--no-sandbox","--enable-resource-load-scheduler=false","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"],"defaultViewport":{"width":1366,"height":768}}
=>>> Title: You have been blocked
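
To check the outcome in code rather than by reading the logs, the page title can be compared against the block page’s title. This is a minimal sketch that assumes the block page always uses the title seen above; isBlocked is a hypothetical helper, not part of the SDK.

```javascript
// Title of the block page observed in this thread.
const BLOCKED_TITLE = 'You have been blocked';

// Returns true when the given page title matches the block page's title,
// ignoring case and surrounding whitespace.
function isBlocked(title) {
    return title.trim().toLowerCase() === BLOCKED_TITLE.toLowerCase();
}

console.log(isBlocked('Programme TV ce soir'));  // false
console.log(isBlocked('You have been blocked')); // true
```

The title passed in would come from the same page.title() call used in the test code.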

#12

I went through the conversation, and there is one hint that may help you.

On your local machine you can always use headful mode; you don’t need apify/actor-node-chrome-xvfb there. But you do have to use the apify/actor-node-chrome-xvfb Docker image if you want to run a headful browser on the Apify platform. I hope it helps.