Passing a start URL in PHP

I'm a noob here and to Apify. I have two actors: one is a web-scraper actor and the other is a legacy-phantomjs-crawler. When I run either one from the dashboard, I get JSON results similar to:

{
  "flightNumber": "1027",
  "departureAirport1": "CLT",
  "departureTime1": " 11:52 AM ",
  "arrivalAirport1": "JFK",
  "arrivalTime1": " 2:00 PM ",
  "departureAirport2": "JFK",
  "departureTime2": " 3:00 PM ",
  "arrivalAirport2": "CLT",
  "arrivalTime2": " 5:14 PM "
}

which is what I should get.

However, when I run the following PHP code for my legacy-phantomjs-crawler:

// Build the crawler input: startUrls is a list of key/value pairs
$data = array(
    "startUrls" => [
        array("key" => "startUrl", "value" => "|2246|2019,6,23&ref=search")
    ]
);
$data_json = json_encode($data);

// Send the input as a JSON POST request (endpoint URL omitted)
$ch = curl_init('');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_json);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data_json)
));

// With CURLOPT_RETURNTRANSFER set, curl_exec returns the response body as a string
$result_json = curl_exec($ch);

echo $result_json;

I don't get any JSON result.

Two questions:

  1. Is that the proper way to pass a start URL, or should I do it some other way?

  2. Is there some reason why $result_json is not returned and displayed?

Thanks in advance.


Hello Alan,

I'm not very familiar with PHP, but from the looks of it, you're passing the data correctly. You can always verify this in your task: open RUNS, find the runs labeled API, and inspect their INPUT.
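To compare against what shows up under INPUT, here is a minimal JavaScript sketch of the two request bodies. The legacy crawler's { key, value } shape mirrors the PHP array in the question; the Web Scraper's { url } shape is my assumption based on its input form, so please double-check it against the actor's INPUT tab. The helper names are hypothetical.

```javascript
// Hypothetical helpers that build the JSON request body for each actor.

// legacy-phantomjs-crawler: startUrls is a list of { key, value } objects,
// mirroring the PHP array in the question.
function legacyCrawlerInput(startUrl) {
    return { startUrls: [{ key: 'startUrl', value: startUrl }] };
}

// Web Scraper: assumed to take startUrls as a list of { url } objects.
function webScraperInput(startUrl) {
    return { startUrls: [{ url: startUrl }] };
}

// The start URL value exactly as it appears in the question.
const startUrl = '|2246|2019,6,23&ref=search';

console.log(JSON.stringify(legacyCrawlerInput(startUrl)));
// {"startUrls":[{"key":"startUrl","value":"|2246|2019,6,23&ref=search"}]}
console.log(JSON.stringify(webScraperInput(startUrl)));
// {"startUrls":[{"url":"|2246|2019,6,23&ref=search"}]}
```

For this input, PHP's json_encode produces the same string as the first line, so you can compare it directly with the INPUT of an API run.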

As for your second question, I suggest using the Web Scraper actor instead of the legacy one, as we will no longer update the legacy-phantomjs-crawler.

The run-sync endpoint only returns the OUTPUT record, so to get your data from the Web Scraper this way, you need to save the data that should be sent as the response to the key-value store under the OUTPUT key. It's mentioned in the docs:

The HTTP response contains actor’s OUTPUT record from its default key-value store.
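To show what that means on the client side, here is a JavaScript sketch of calling run-sync and reading the OUTPUT record. callRunSync is a hypothetical helper, runSyncUrl stands for whatever endpoint URL you already use, and treating an empty body as "no OUTPUT record was saved" is an assumption. The same two checks apply to your PHP code: if curl_exec returns false, curl_error($ch) tells you why the request itself failed.

```javascript
// Minimal sketch: POST an input to a run-sync endpoint and parse the reply.
// `runSyncUrl` is a placeholder for your actor's endpoint URL (assumption).
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped for testing.
async function callRunSync(runSyncUrl, input, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(runSyncUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(input),
    });
    const text = await response.text();

    // A non-2xx status means the request or the run failed.
    if (!response.ok) {
        throw new Error(`run-sync failed with HTTP ${response.status}: ${text}`);
    }

    // Assumption: if the actor saved no OUTPUT record, the body is empty,
    // which looks exactly like "no JSON result" when echoed.
    return text.trim() === '' ? null : JSON.parse(text);
}
```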

Now, to save the OUTPUT with the Web Scraper, you just call

await context.setValue('OUTPUT', { foo: 'bar' });

at the end of your pageFunction.
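Put together, a pageFunction might look like the JavaScript sketch below. The CSS selectors are hypothetical placeholders (use the ones your current pageFunction already relies on), and it assumes jQuery is enabled in the scraper settings so that context.jQuery is available.

```javascript
// Sketch of a Web Scraper pageFunction that saves its result as OUTPUT.
// Selectors are hypothetical; replace them with ones matching the target page.
async function pageFunction(context) {
    const $ = context.jQuery;
    const text = (selector) => $(selector).first().text();

    const result = {
        flightNumber: text('.flight-number'),
        departureAirport1: text('.departure-airport'),
        arrivalAirport1: text('.arrival-airport'),
    };

    // Save the result under OUTPUT so the run-sync endpoint can return it.
    await context.setValue('OUTPUT', result);

    // Returning the object still stores it in the dataset as usual.
    return result;
}
```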

Note that this only works when scraping a single page, because each subsequent call to setValue overwrites the OUTPUT. We're working on adding a parameter to the run-sync API that lets you choose whether to return the OUTPUT or the whole Dataset.

Hope this helps.