Dump to Firebase fails because of CORS

#1

Using the examples in the library where result data is saved to a Firebase database, both the AJAX version and the non-AJAX version fail when the crawled site uses CORS protection.

I can get the results just fine, but when I try to save them to the Firebase database I get this error:

{"readyState":0,"status":0,"statusText":"Error: SecurityError: DOM Exception 18"}

You can try it on a Twitter profile page, for example; you'll get that error.
Are there any special headers I could set to alleviate this problem?

#2

Hi @b2times4,

Can you share some code with me? It will really help.
In the meantime, if you are using the legacy crawler, you can try the "Disable web security" flag.
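For reference, that flag is just a boolean toggle in the legacy crawler's advanced settings. A rough sketch of the relevant settings, assuming the toggle is exposed under a key such as disableWebSecurity (the key names here are assumptions; verify against the crawler's settings UI or API docs):

// Rough sketch of a legacy crawler's advanced settings (key names are
// assumptions; check the crawler's settings UI / API docs for exact names).
var crawlerSettings = {
    injectJQuery: true,        // makes context.jQuery available in pageFunction
    disableWebSecurity: true   // the "Disable web security" toggle
};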

#3

Thanks for responding.
I'm actually using your example from the library where data is posted to Firebase through AJAX and the Firebase REST API (https://apify.com/drobnikj/DreoL-api-generic).

If you run that on http://www.example.com as in the example, it works perfectly, because example.com has no such protection.

As soon as you try the same example on https://twitter.com/CharlottedWitte, for example, it generates this error:

[2019-04-29 21:33:11.254: S0000001] ON CONSOLE MESSAGE | FAILED!!
[2019-04-29 21:33:11.254: S0000001] ON CONSOLE MESSAGE | {"readyState":0,"status":0,"statusText":"Error: SecurityError: DOM Exception 18"}

This happens even with the "Disable web security" flag enabled.

You can try the code below. I have a Firebase database set up with read and write access; you'll see the error in your logs when you try to scrape that Twitter link.

function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    /**
     * Set your DB url and auth secret
     * doc for Firebase REST API: https://firebase.google.com/docs/reference/rest/database/
     * You can find auth secrets in Project Settings > Service accounts > Database secrets
     **/
    var dbUrl = "https://apifytest-5041f.firebaseio.com/results.json";
    // Function that calls the Firebase DB REST API and finishes the pageFunction
    var saveToFirebaseDbAndFinish = function(result) {
        context.willFinishLater();
        var resultToSave = result;
        resultToSave.url = context.request.loadedUrl;
        $.ajax({
            url: dbUrl,
            accept: "application/json",
            method: 'POST',
            contentType: "application/json; charset=utf-8",
            dataType: "json",
            data: JSON.stringify(resultToSave),
        }).success(function(data) {
            context.finish(result);
        }).fail(function(data) {
            console.log("FAILED!!");
            console.log(JSON.stringify(data)); // log for debugging
            context.finish(result);
        });
    };

    var result = {
        myValue: "test"
    };

    saveToFirebaseDbAndFinish(result);
    return result;
}

#4

I see, Twitter blocks these AJAX requests.

Never mind, you can enqueue a new page with a random unique key and the URL of example.com, where you can upload the data. You need to pass the data along using interceptRequestData.

function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    /**
     * Set your DB url and auth secret
     * doc for Firebase REST API: https://firebase.google.com/docs/reference/rest/database/
     * You can find auth secrets in Project Settings > Service accounts > Database secrets
     **/
    var pageResult = {
        url: context.request.loadedUrl,
        myValue: "test"
    };
    context.enqueuePage({
        url: "http://example.com",
        label: 'toFirebase',
        uniqueKey: Math.random().toString(), // unique string key so the request is not deduplicated
        interceptRequestData: pageResult,
    });
    
    if (context.request.label === 'toFirebase') {
        var dbUrl = "https://apifytest-5041f.firebaseio.com/results.json";
        // Function that calls the Firebase DB REST API and finishes the pageFunction
        var result = context.request.interceptRequestData;
        var saveToFirebaseDbAndFinish = function(result) {
            context.willFinishLater();
            var resultToSave = result;
            // resultToSave.url = context.request.loadedUrl;
            $.ajax({
                url: dbUrl,
                accept: "application/json",
                method: 'POST',
                contentType: "application/json; charset=utf-8",
                dataType: "json",
                data: JSON.stringify(resultToSave),
            }).success(function(data) {
                context.finish(result);
            }).fail(function(data) {
                console.log("FAILED!!");
                console.log(JSON.stringify(data)); // log for debugging
                context.finish(result);
            });
        };

        saveToFirebaseDbAndFinish(result);
        return result;
    }
}

#5

Thank you so much, this certainly put me on the right track.
Obviously this needed a small tweak to prevent www.example.com from getting enqueued endlessly (when the crawler hits www.example.com, it would enqueue it again, and again, …).
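The tweak boils down to a guard around the enqueue call, something like this (just a sketch, reusing the pageResult and label from the code above):

    // Only enqueue the Firebase upload page from the pages we actually scrape,
    // so the example.com upload page does not keep re-enqueuing itself.
    if (context.request.label !== 'toFirebase') {
        context.enqueuePage({
            url: "http://example.com",
            label: 'toFirebase',
            uniqueKey: Math.random().toString(),
            interceptRequestData: pageResult,
        });
    }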

Can you tell me when this way of crawling is going to be completely removed from Apify in favor of Actors / Tasks?
I tried to convert this to a new Actor, but after it logged an error saying $.ajax is not a function, I gave up 🙂
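
For anyone hitting the same $.ajax is not a function error: that usually just means jQuery is not available in the new Actor's runtime, so the Firebase call has to be made from Node.js instead of from the page. A minimal sketch of the same POST, assuming a runtime where global fetch is available (Node 18+); any other HTTP client would work the same way, and the URL is the same test database as above:

// Minimal sketch: POST the result to the Firebase Realtime Database REST API
// from Node.js, so the CORS/CSP rules of the crawled page no longer apply.
// Assumes global fetch (Node 18+).
const dbUrl = "https://apifytest-5041f.firebaseio.com/results.json";

async function saveToFirebaseDb(result) {
    const response = await fetch(dbUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(result),
    });
    if (!response.ok) {
        throw new Error("Firebase POST failed: " + response.status);
    }
    // The REST API responds with the generated push key, e.g. { "name": "-Lx..." }
    return response.json();
}

// Example usage from the actor's code:
// await saveToFirebaseDb({ url: "https://twitter.com/CharlottedWitte", myValue: "test" });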