Newbie: Thought this simple task would work... but

#1

Hi guys, help needed please!

Just playing around with Apify to see if it will do what I need. As an example, I thought I would extract all the H1s from a page.

Here is my pageFunction:

async function pageFunction(context) {
    const { request, log, skipLinks } = context;
    const title = document.querySelector('title').textContent;
    const $headings = $("h1");
    log.info(URL: ${request.url} TITLE: ${title} );

    return {
        venue: $headings.text(),
    }
}

When I run the task I get the error below, even though there are plenty of <H1 class="…">…</H1> elements on the page:

ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://webpagetocrawl.com/events-list.html","retryCount":1}
2019-04-27T11:51:49.396Z TypeError: Cannot read property 'text' of null
2019-04-27T11:51:49.398Z at Object.pageFunction (:9:30)
2019-04-27T11:51:49.400Z at puppeteer_evaluation_script:6:67

Any ideas? Help appreciated.

#2

Hi @scraper123,

You need to define $ as jQuery via context.jQuery, and you are missing the backticks around the template literal: log.info(`URL: ${request.url} TITLE: ${title}`).
Here is working code:

async function pageFunction(context) {
    const { request, log, skipLinks } = context;
    // jQuery is exposed on the context; assign it to $ before using it
    const $ = context.jQuery;
    const title = document.querySelector("title").textContent;
    const $headings = $("h1");
    log.info(`URL: ${request.url} TITLE: ${title}`);
    return {
        venue: $headings.text(),
    };
}
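
By the way, if you would rather get each H1 as its own value instead of one concatenated string, a small sketch like this should also work (the venues field name and the map/get chain are just one way to do it, not something from the tutorial):

async function pageFunction(context) {
    const { request, log } = context;
    const $ = context.jQuery;
    // Collect the trimmed text of every H1 into a plain array
    const venues = $("h1")
        .map((i, el) => $(el).text().trim())
        .get();
    log.info(`URL: ${request.url} H1 COUNT: ${venues.length}`);
    return { venues };
}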

I hope it helps.

#3

Hi @drobnikj,

That worked great, thank you. I had been using the Apify example, which didn't have the jQuery line in it (see https://apify.com/docs/scraping/web-scraper-tutorial#scraping-practice---getting-the-data).

Thanks again!!

#4

No problem,

jQuery is mentioned there with an example; you can check https://apify.com/docs/scraping/web-scraper-tutorial#scraping-title--description--last-run-date-and-number-of-runs