Empty result with multiple start URLs


#1

Hi,

A very basic crawler-question:

The function below crawls the first URL fine, but returns blank results for URLs 2, 3, etc. I have tried following the guide on importing a list from an external source (with Google Sheets), but the results are the same.

I also tried inserting dummy-test in the URL, but can't get it to work either.

Any tips?

Best

Simon

First URL:
https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2830&kontantpris.min=50000000&salgsperiode.min=150

Second URL:
https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2800&kontantpris.min=60000000&salgsperiode.min=130

Function:

function pageFunction(context) {
    var $ = context.jQuery;
    var result = [];
    // Each property card on the results page becomes one record.
    $(".ci--property").each(function () {
        result.push({
            propertytype: $(this).find(".ci__head").text().trim(),
            address: $(this).find(".ci__info--address .h4").text().trim(),
            city: $(this).find(".ci__info--address span").text().trim(),
            askingprice: $(this).find("span.h4").text().trim()
        });
    });
    return result;
}

#2

Hi @simon,

It should work. If you want to start the crawler with multiple start URLs using the API, you need to pass:

{
  "startUrls": [
    {
      "key": "",
      "value": "https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2830&kontantpris.min=50000000&salgsperiode.min=150"
    },
    {
      "key": "",
      "value": "https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2800&kontantpris.min=60000000&salgsperiode.min=130"
    }
  ]
}

as the body payload of the API run execution call (API, JS client).
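
If the URLs come from a plain list (for example, a column of spreadsheet cells), a small helper could build that payload in one go. This is only a sketch: buildStartUrlsPayload is a hypothetical name, and the only thing it assumes is the startUrls shape shown above. Sending the payload to the API is left out.

```javascript
// Hypothetical helper: wraps a plain list of URLs in the startUrls
// structure shown above (one { key, value } object per URL).
function buildStartUrlsPayload(urls) {
    return {
        startUrls: urls.map(function (url) {
            return { key: "", value: url };
        })
    };
}

var payload = buildStartUrlsPayload([
    "https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2830&kontantpris.min=50000000&salgsperiode.min=150",
    "https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2800&kontantpris.min=60000000&salgsperiode.min=130"
]);

// Serialize once and send as a single request body.
var body = JSON.stringify(payload, null, 2);
console.log(body);
```

This way all URLs travel in one request body rather than one call per URL.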

Let me know if it helps.


#3

Hi dropnikj,

Thanks for the reply! I made a flow in intestogram which takes one cell in Google Sheets and makes an API call, and that works. But there the payload is only one value; if I choose multiple, it fails after the first one. Very strange. This could be a solution, but then I would have to make a separate call for each URL instead of just delivering one package.

Under basic setup I have:

Start urls: URL
START https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2830&kontantpris.min=5000000&salgsperiode.min=150

Lyngby https://www.boligsiden.dk/salgspris/tilsalg/alle/1?postnummer=2800&kontantpris.min=6000000&salgsperiode.min=130

Does the label have something to do with the output?

Best Simon


#4

Hello @simon,

The labels are used to distinguish between different URLs. You can find them in the pageFunction under context.request.label. This is useful when different pages have different structures and you need to change the scraping code dynamically.

They have no effect on the output.
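
If you do want to see which start URL each row came from, you could copy the label into the records yourself. A minimal sketch, assuming the same pageFunction(context) signature and selectors as in the question; the field name source is hypothetical:

```javascript
function pageFunction(context) {
    var $ = context.jQuery;
    // Label given to the start URL in the crawler setup, e.g. "START" or "Lyngby".
    var label = context.request.label;
    var result = [];
    $(".ci--property").each(function () {
        result.push({
            source: label, // hypothetical field name, added for illustration
            address: $(this).find(".ci__info--address .h4").text().trim(),
            askingprice: $(this).find("span.h4").text().trim()
        });
    });
    return result;
}
```

With this in place, each record carries the label of the start URL it was scraped from, which makes it easier to spot which URL returned no rows.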