Zillow Crawler fails


#1

Hi,

I’m attempting to use the Zillow crawler (https://www.apify.com/petr_cermak/qNTpd-api-zillow-com), but it fails immediately with a Captcha error, saying “Please verify you are a human to continue”.

In the results, the URL is:

https://www.zillow.com/captchaPerimeterX/?url=%2Fhomes%2Ffor_sale%2FPortland-OR_rb%2F%3FfromHomePage%3Dtrue%26shouldFireSellPageImplicitClaimGA%3Dfalse%26fromHomePageTab%3Dbuy&uuid=d189ba30-dcac-11e8-9a39-0fe3c602dc27&vid=

and the errorInfo is: Error invoking user-provided 'pageFunction': Error: TypeError: null is not an object (evaluating 'ud.repeats')
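The URL above shows that the request was redirected to Zillow's PerimeterX CAPTCHA page, with the originally requested page stored (percent-encoded) in the `url` query parameter. A minimal sketch for spotting such redirects and recovering the original target; `parseCaptchaRedirect` is a hypothetical helper, not part of the Zillow crawler, and it assumes a runtime with the WHATWG `URL` class (Node.js 10+ or a modern browser; an older page context may not provide it).

```javascript
// Hypothetical helper: recognise a redirect to Zillow's PerimeterX CAPTCHA
// page and return the absolute URL that was originally requested.
// Returns null when the page is not a CAPTCHA redirect.
function parseCaptchaRedirect(href) {
  const u = new URL(href);
  if (!u.pathname.startsWith("/captchaPerimeterX")) {
    return null; // a normal page, nothing to recover
  }
  // searchParams.get() already percent-decodes the value
  const original = u.searchParams.get("url");
  // The stored value is a path, so resolve it against the CAPTCHA page's origin
  return original ? new URL(original, u.origin).href : null;
}
```

Applied to the URL above, it returns `https://www.zillow.com/homes/for_sale/Portland-OR_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy`, which is the search page that should be retried.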

How can this be resolved?

@petr_cermak, do you have any suggestions?

Thank you!!


#2

Hi Will,

If you try it again, this issue should already be fixed.
You might just need to use proxies to make the crawler more effective.


#3

Hi Petr,

I just copied the latest version of the Zillow scraper to my account and tried running the crawl with both Apify Proxy (automatic) and Apify Proxy (Selected Groups). The error persists:

Error invoking user-provided 'pageFunction': Error: TypeError: null is not an object (evaluating 'ud.repeats')

I added some console logging to the pageFunction, as follows:

    // Log the request data that the pageFunction later reads ud.repeats from
    var ud = context.request.interceptRequestData;
    console.log("Printing ud");
    console.log(ud);

And I see the following:

    [2018-10-31 19:44:34.672: S0000001] Capturing snapshots to: screenshot_2018-10-31T19-44-34.672_reqMJ0uJyhNvhndv3q.(png|html)
    [2018-10-31 19:44:34.794: S0000001] ON CONSOLE MESSAGE | Printing ud
    [2018-10-31 19:44:34.794: S0000001] ON CONSOLE MESSAGE | null
    [2018-10-31 19:44:34.794: S0000001] ON CONSOLE MESSAGE | re-enqueuing because of ReCaptcha...
    [2018-10-31 19:44:34.794: S0000001] ERROR: Error invoking user-provided 'pageFunction': Error: TypeError: null is not an object (evaluating 'ud.repeats')

So context.request.interceptRequestData is null, and the pageFunction crashes when it reads ud.repeats while re-enqueuing the CAPTCHA page. Any ideas?
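The TypeError is thrown because the pageFunction reads `ud.repeats` while `ud` is `null`. A defensive guard avoids the crash; below is a minimal sketch, where `nextRetryData` and `maxRepeats` are hypothetical names (not from the crawler's code), and how the result is handed to the crawler's re-enqueue call is left out because that depends on the crawler itself. It is written in ES5 in case the page context is an older engine.

```javascript
// Hypothetical helper: compute the retry data for a request that landed on a
// CAPTCHA page. `ud` stands in for context.request.interceptRequestData,
// which the log above shows can be null, so it is checked before use.
function nextRetryData(ud, maxRepeats) {
  // A missing object or missing counter is treated as "no retries yet"
  var repeats = (ud && typeof ud.repeats === "number") ? ud.repeats : 0;
  if (repeats >= maxRepeats) {
    return null; // retry budget exhausted: stop re-enqueuing
  }
  // Copy the existing fields so other data attached to the request survives
  var next = {};
  if (ud) {
    for (var key in ud) {
      if (Object.prototype.hasOwnProperty.call(ud, key)) {
        next[key] = ud[key];
      }
    }
  }
  next.repeats = repeats + 1;
  return next;
}

console.log(nextRetryData(null, 3));           // { repeats: 1 }
console.log(nextRetryData({ repeats: 3 }, 3)); // null
```

Treating `null` as "zero retries so far" keeps the first CAPTCHA hit retryable, while the `maxRepeats` cap stops the crawler from re-enqueuing the same blocked page forever.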

Thanks,
Will


#4

What worked for me was using Apify Proxy (Selected Groups) with only one of the proxy groups selected, rather than all of them. Thank you, Petr!