Image URLs - data:image/gif;base64

#1

Hi All,

I have an issue that I can’t get my head around.

I’m using an actor https://my.apify.com/actors/oJAas3xzBSoCRHqHE

The problem is that in the returned data, some of the image URLs come back as a data:image/gif;base64 value instead of a real URL.

This does not happen when I use a legacy crawler.

URL that returns these results:

https://www.tesco.com/groceries/en-GB/shop/fresh-food/cheese/speciality-and-continental-cheese/flavoured-cheese?page=1&count=48

This is the page function:

function pageFunction(context) {
    var $ = context.jQuery;
    var results = [];
    $(".product-list--list-item").each(function() {
        results.push({
            a_product: $(this).find('div.product-details--wrapper > div > a').attr('href'),
            b_title: $(this).find('div.product-details--wrapper > div > a').text().trim(),
            c_price: $(this).find('div.product-controls--wrapper > form > div > div.price-details--wrapper > div.price-control-wrapper > div > div > span > span.value').text().trim(),
            d_price_unit: $(this).find('.price-per-quantity-weight').text().trim(),
            Image_Url: $(this).find('.product-image').attr('src')
        });
    });
    return results;
}

Any insight would be appreciated

Thank you

#2

Hi @Jonathan_Gillmor,

The problem is that the page lazy-loads its images: each product initially shows a placeholder, and the real image is loaded only after you scroll to it in the product listing.
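One way to confirm that lazy-loading is the cause is to check whether an extracted src is an inline data: URI rather than a real URL. This is only a minimal sketch: the helper names isPlaceholderSrc and countPlaceholders are made up for illustration and are not part of Apify or jQuery, and the image_url key is an assumption about how the results are named.

```javascript
// Hypothetical helper: returns true when src is missing, empty, or an
// inline data: URI (the kind of placeholder a lazy-loading page puts
// into <img src> until the real image is fetched).
function isPlaceholderSrc(src) {
    if (typeof src !== 'string') return true;
    var trimmed = src.trim();
    return trimmed === '' || /^data:/i.test(trimmed);
}

// Hypothetical helper: counts scraped items whose image_url is still
// a placeholder. Assumes each item stores the src under image_url.
function countPlaceholders(results) {
    return results.filter(function (item) {
        return isPlaceholderSrc(item.image_url);
    }).length;
}
```

If countPlaceholders(results) is greater than zero after a run, some images had probably not been loaded yet when the data was extracted.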

To solve this issue, you can scroll down in a Page function. Just add these three lines of code before extracting data:

await new Promise(resolve => setTimeout(resolve, 1000));
$("html, body").animate({ scrollTop: $(document).height() }, 3000);
await new Promise(resolve => setTimeout(resolve, 1000));

By the way, you also have to add async before function pageFunction(context) {, because the snippet uses await.
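Note that in the three-line snippet above the animation is given 3000 ms while the pause after it is 1000 ms, and .animate() returns immediately, so extraction can start before the scroll has reached the bottom. If some images still come back as placeholders, one option is to wait for the animation's completion callback instead. The following is a sketch under assumptions: sleep and scrollToBottomAndWait are hypothetical names, $ is assumed to be context.jQuery, and the settleMs pause is an arbitrary guess at how long the images need to load.

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Hypothetical helper: scroll to the bottom of the page and wait until
// jQuery reports that the animation has finished, then pause briefly.
// $ is assumed to be context.jQuery; settleMs is an arbitrary extra wait.
async function scrollToBottomAndWait($, durationMs = 3000, settleMs = 1000) {
    await new Promise(function (resolve) {
        // The complete callback fires once per matched element (html and
        // body); calling resolve more than once is harmless.
        $('html, body').animate({ scrollTop: $(document).height() }, durationMs, resolve);
    });
    await sleep(settleMs);
}
```

Inside the page function this would be called as await scrollToBottomAndWait(context.jQuery); before the extraction loop.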

#3

This is what I have changed but it has not helped:

async (context) => {
    var $ = context.jQuery;
    var results = [];
    $(".product-list--list-item").each(function() {
        results.push({
         a_product: $(this).find('div.product-details--wrapper > div > a').attr('href'),
         b_title: $(this).find('div.product-details--wrapper > div > a').text().trim(),
         c_price: $(this).find('div.product-controls--wrapper > form > div > div.price-details--wrapper > div.price-control-wrapper > div > div > span > span.value').text().trim(),  
         d_price_unit: $(this).find('.price-per-quantity-weight').text().trim(),
         image_url: $(this).find('.product-image').attr('src')
       });
    });
    await new Promise(resolve => setTimeout(resolve, 1000));
    $("html, body").animate({ scrollTop: $(document).height() }, 3000);
    await new Promise(resolve => setTimeout(resolve, 1000));

    return results;
}

I’m not sure where I went wrong

#4

You have to add the scrolling code before the data extraction, as follows:

async function pageFunction(context) {
    var $ = context.jQuery;
    var results = [];

    // Scroll to the bottom first so the lazy-loaded images get a real src.
    await new Promise(resolve => setTimeout(resolve, 1000));
    $("html, body").animate({ scrollTop: $(document).height() }, 3000);
    await new Promise(resolve => setTimeout(resolve, 1000));

    // Extract the data only after the page has been scrolled.
    $(".product-list--list-item").each(function() {
        results.push({
            a_product: $(this).find('div.product-details--wrapper > div > a').attr('href'),
            b_title: $(this).find('div.product-details--wrapper > div > a').text().trim(),
            c_price: $(this).find('div.product-controls--wrapper > form > div > div.price-details--wrapper > div.price-control-wrapper > div > div > span > span.value').text().trim(),
            d_price_unit: $(this).find('.price-per-quantity-weight').text().trim(),
            image_url: $(this).find('.product-image').attr('src')
        });
    });
    return results;
}

#5

Thank You

Also, should it be:

async function pageFunction(context) {

or

async (context) => {

#6

Hi @Jonathan_Gillmor,

Both syntaxes are correct; use whichever you prefer.
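As a small illustration of why the two forms are interchangeable here, both produce a function that returns a Promise. The names declaredForm and arrowForm and the url field are stand-ins invented for this sketch, not anything the scraper defines.

```javascript
// Form 1: async function declaration.
async function declaredForm(context) {
    return context.url;
}

// Form 2: async arrow function assigned to a variable.
const arrowForm = async (context) => {
    return context.url;
};

// Both calls return a Promise that resolves to the same value.
const fromDeclared = declaredForm({ url: 'https://example.com' });
const fromArrow = arrowForm({ url: 'https://example.com' });
```

The general difference is that an arrow function has no this of its own. That does not matter for the outer page function in this thread, but the callback passed to .each() should stay a regular function because it reads $(this).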

#7

Thanks guys for all the help.