Getting latest data set from task


#1

I have recently needed to move from a crawler to a cheerio actor and I am having problems getting the data out.

With the crawlers I could use lastexec in the URL to get the latest dataset, but I can’t see this option for actors. Is there any way to get the latest dataset as JSON from a static URL?


#2

Hi there,

this feature is definitely planned and will be released in the near future, but unfortunately I can’t give you an exact date yet. We will let you know here in the discussion once it’s deployed.

Possible workarounds (with some overhead) could be:

  • either to have another actor started after the cheerio crawler that copies the data into a named dataset, which has a fixed URL,
  • or to have another actor that finds the last run, downloads its data and returns it as output. This actor could be used as a synchronous API https://www.apify.com/docs/api/v2#/reference/actors/run-actor-synchronously but the number of items returned would be limited.

Marek


#3

Hello again!

I have good news: we prioritised this request because more people were waiting for it, and it has just been deployed to production.

Check the docs: https://www.apify.com/docs/api/v2#/reference/actors/last-run-object-and-its-storages

The endpoint you want is now available at:

/v2/acts/[ACTOR_ID]/runs/last/dataset/items?token=[YOUR_TOKEN]&status=SUCCEEDED
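
As a rough illustration, the path above could be turned into a static URL and fetched as in the Python sketch below. It assumes the API host is https://api.apify.com (the path above does not include a host) and that the response body is JSON; the function names are made up for this example and are not part of any Apify library.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: public API host. The post above gives only the path.
API_BASE = "https://api.apify.com"


def last_run_items_url(actor_id, token, status="SUCCEEDED"):
    """Build the static URL for the dataset items of the actor's last run.

    Hypothetical helper: only the path and the `token` and `status`
    query parameters come from the post above.
    """
    query = urlencode({"token": token, "status": status})
    path = f"/v2/acts/{quote(actor_id, safe='')}/runs/last/dataset/items"
    return f"{API_BASE}{path}?{query}"


def fetch_last_run_items(actor_id, token):
    """Download and parse the items (assumes the response body is JSON)."""
    with urlopen(last_run_items_url(actor_id, token)) as resp:
        return json.load(resp)
```

Because the URL depends only on the actor ID and the token, it stays the same from run to run, so it can be saved once and reused wherever the latest data is needed.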

Let me know if you have any questions.
Best,
Marek


#4

Brilliant! Will check it out now, thanks.