SOLVED: Revisit a specific URL on each actor run


#1

Hi!
I would like to scrape articles from https://www.msn.com/de-at/autos/motorrad/

This is the main URL on which all articles are listed. Each article URL matches this format:

https://www.msn.com/de-at/autos/nachrichten/[.*]
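The `[.*]` part looks like Apify's pseudo-URL notation, where the text in square brackets is a regular expression. As a dependency-free sketch, the same check can be written as a plain regular expression. `isArticleUrl` and `ARTICLE_URL_PATTERN` are hypothetical names, and using `.+` instead of `.*` is an assumption so that the bare `/nachrichten/` index page is not counted as an article.

```javascript
// Hypothetical helper: matches article URLs of the form
// https://www.msn.com/de-at/autos/nachrichten/<non-empty rest>
const ARTICLE_URL_PATTERN = /^https:\/\/www\.msn\.com\/de-at\/autos\/nachrichten\/.+/;

function isArticleUrl(url) {
    return ARTICLE_URL_PATTERN.test(url);
}

console.log(isArticleUrl('https://www.msn.com/de-at/autos/nachrichten/some-article')); // true
console.log(isArticleUrl('https://www.msn.com/de-at/autos/motorrad/')); // false
```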

So I would like the actor to revisit this base URL on each run to check for new subpages. But as I understand it, after the actor visits https://www.msn.com/de-at/autos/motorrad/ for the first time, it marks the URL as handled and won't revisit it until I clear the requestQueue. Is that right?
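The behaviour described here can be illustrated with a deliberately simplified, hypothetical in-memory queue. `MiniRequestQueue` is not the Apify implementation; it only assumes that a persistent queue remembers every `uniqueKey` it has seen (defaulting to the URL) and ignores repeats. The `wasAlreadyPresent` flag name is modeled on the field Apify's `addRequest` reports.

```javascript
// Hypothetical, simplified model of a persistent request queue that
// deduplicates on uniqueKey. This is NOT the Apify implementation.
class MiniRequestQueue {
    constructor() {
        this.seenKeys = new Set(); // every uniqueKey ever added, kept between runs
        this.pending = [];         // requests not yet handled
    }

    addRequest({ url, uniqueKey = url }) {
        if (this.seenKeys.has(uniqueKey)) {
            return { wasAlreadyPresent: true }; // duplicate: silently ignored
        }
        this.seenKeys.add(uniqueKey);
        this.pending.push({ url, uniqueKey });
        return { wasAlreadyPresent: false };
    }

    fetchNextRequest() {
        return this.pending.shift() || null;
    }
}

const queue = new MiniRequestQueue();
const base = 'https://www.msn.com/de-at/autos/motorrad/';

// Run 1: the base URL is new, so it is queued and then handled.
console.log(queue.addRequest({ url: base }).wasAlreadyPresent); // false
queue.fetchNextRequest();

// Run 2 (same queue): the same URL is treated as a duplicate.
console.log(queue.addRequest({ url: base }).wasAlreadyPresent); // true
console.log(queue.fetchNextRequest()); // null, nothing left to crawl
```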


#2

I think I have found a solution to my problem:

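```javascript
const Apify = require('apify');
const { sha256 } = require('js-sha256');
const { log } = Apify.utils;

// Runs inside Apify.main(), after: const requestQueue = await Apify.openRequestQueue();
const startRequestId = sha256(new Date().toISOString()); // differs on every run
log.debug(`startRequestId: ${startRequestId}`);

// The queue deduplicates on uniqueKey (defaults to the normalized URL), so a fresh key forces a revisit.
const startRequest = new Apify.Request({ url: 'https://www.msn.com/de-at/autos/motorrad/', uniqueKey: startRequestId });
await requestQueue.addRequest(startRequest); // optional 2nd arg takes options like { forefront }, not an id
```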

I generate a unique ID by hashing the current date and time. Because the timestamp is different on every actor run, the same URL gets a different uniqueKey each time, so the requestQueue does not treat it as a duplicate and adds it again.
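The same idea can be sketched without the js-sha256 dependency by using Node's built-in `crypto` module; this is a swapped-in choice, not what the post above uses. `makeRunUniqueKey` is a hypothetical helper, and unlike the snippet above it also hashes the URL, on the assumption that several start URLs might be added in the same run and could otherwise share a timestamp.

```javascript
const crypto = require('crypto');

// Hypothetical helper: per-run uniqueKey = SHA-256 of "<url>|<ISO timestamp>".
// Uses Node's built-in crypto module instead of the js-sha256 package.
function makeRunUniqueKey(url, runDate = new Date()) {
    return crypto
        .createHash('sha256')
        .update(`${url}|${runDate.toISOString()}`)
        .digest('hex');
}

const url = 'https://www.msn.com/de-at/autos/motorrad/';
const keyRun1 = makeRunUniqueKey(url, new Date('2020-01-01T00:00:00Z'));
const keyRun2 = makeRunUniqueKey(url, new Date('2020-01-02T00:00:00Z'));
console.log(keyRun1 !== keyRun2); // true
```

Within one run the key is stable for a given URL and timestamp, while a new run produces a new timestamp and therefore a new key.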