Wrong character encoding of JSON response



I query a website that responds with UTF-8 encoded JSON, but when I use JSON.parse($('body pre').text()) the result comes out mangled as (apparently) Windows-1252, and the page screenshot also displays it as Windows-1252 (and not ISO-8859-1 as I first thought: “é” becomes “Ã©” in both, but “’” (U+2019) becomes “â€™” only in the former).

How can I fix that? Is there a way to force reading or parsing the response as UTF-8?

Edit: I tried the unescape(encodeURIComponent(s)) / decodeURIComponent(escape(s)) fix, without success.


I finally got it to work using decodeURIComponent(escape(mystring)), but only partially: on many fields of the JSON response it fails with

Error invoking user-provided ‘pageFunction’: Error: URIError: URI error

The solution was to run it not on the whole response but only on the fields I’m interested in. There are still a few errors, but the rate is low enough that I can probably eradicate them with a few replace() calls.
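For reference, a minimal sketch of the per-field approach (the field name and the fallback behaviour are my assumptions, not the original code): wrapping the conversion in a try/catch means a URIError on one field no longer aborts the whole pageFunction.

```javascript
// Sketch: apply the mojibake fix field by field instead of on the
// whole response, falling back to the raw value when decoding fails.
function tryFix(value) {
  try {
    // Re-encode the wrongly decoded string, then decode it as UTF-8.
    return decodeURIComponent(escape(value));
  } catch (e) {
    // URIError: the escaped form was not valid percent-encoded UTF-8.
    return value;
  }
}

// Hypothetical field access, for illustration only:
const record = { title: 'RÃ©sultat' }; // mojibake for « Résultat »
const title = tryFix(record.title);
```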


Nah, it seems that trick only handles ISO-8859-1 → UTF-8 conversion, and here for some reason the charset is Windows-1252 (which overlaps ISO-8859-1 everywhere except the 0x80–0x9F range, hence the partial improvement).
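That matches what escape() actually produces: ISO-8859-1 characters (code points ≤ U+00FF) come out as %XX byte escapes, while the Windows-1252-only characters in the 0x80–0x9F row map to code points above U+00FF and come out as non-standard %uXXXX sequences, which decodeURIComponent() rejects. A quick illustration:

```javascript
// 'Ã©' is UTF-8 'é' misread as ISO-8859-1/Windows-1252: both chars
// are <= U+00FF, so escape() yields plain byte escapes.
const ok = escape('Ã©');   // '%C3%A9' — decodes fine as UTF-8

// 'â€™' is UTF-8 '’' misread as Windows-1252: '€' (U+20AC) and
// '™' (U+2122) escape to %uXXXX, so decodeURIComponent() throws.
const bad = escape('â€™'); // '%E2%u20AC%u2122' — raises URIError
```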

Anyway, my browser is able to detect the right charset, so Apify should be able to as well, shouldn’t it? Or is there a way to force the encoding of the scraped URL’s response? The Content-Type is only set to application/json, with no charset defined.
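I don’t know of an Apify switch for this, but as a general-purpose sketch (plain fetch, not an Apify API): grab the raw bytes and decode them explicitly, bypassing charset sniffing altogether.

```javascript
// Fetch the raw bytes and decode them as UTF-8 ourselves, ignoring
// whatever charset gets guessed from the missing header.
async function fetchJsonUtf8(url) {
  const res = await fetch(url);
  const bytes = await res.arrayBuffer();
  return JSON.parse(new TextDecoder('utf-8').decode(bytes));
}

// The decoding step itself: the bytes 0xC3 0xA9 are UTF-8 for 'é'.
const sample = new TextDecoder('utf-8').decode(new Uint8Array([0xC3, 0xA9]));
```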


OK, I ended up with the following:
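The snippet didn’t survive in the post; here is a sketch of what it likely looks like, based on the explanation below (the function name and the lookup-table form are mine — the post describes chained replace() calls, which are equivalent):

```javascript
// Map each Windows-1252 character from the 0x80–0x9F rows (which
// escape() emits as %uXXXX) back to its single-byte %XX escape.
const cp1252Escapes = {
  '%u20AC': '%80', '%u201A': '%82', '%u0192': '%83', '%u201E': '%84',
  '%u2026': '%85', '%u2020': '%86', '%u2021': '%87', '%u02C6': '%88',
  '%u2030': '%89', '%u0160': '%8A', '%u2039': '%8B', '%u0152': '%8C',
  '%u017D': '%8E', '%u2018': '%91', '%u2019': '%92', '%u201C': '%93',
  '%u201D': '%94', '%u2022': '%95', '%u2013': '%96', '%u2014': '%97',
  '%u02DC': '%98', '%u2122': '%99', '%u0161': '%9A', '%u203A': '%9B',
  '%u0153': '%9C', '%u017E': '%9E', '%u0178': '%9F',
};

function fixWindows1252(mystring) {
  // escape() percent-encodes the mis-decoded string, the replace()
  // pass restores the 0x80–0x9F bytes, and decodeURIComponent()
  // finally reads the whole thing as proper UTF-8.
  return decodeURIComponent(
    escape(mystring).replace(/%u[0-9A-F]{4}/g, (m) => cp1252Escapes[m] || m)
  );
}
```

With the table in place, one regex pass covers all 27 problem characters at once; any %uXXXX sequence not in the table is left alone and will still raise a URIError, matching the residual errors mentioned above.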


Not the most elegant solution, but it works.

(Basically, the confusion sits in the 0x80–0x9F rows, where Windows-1252 and UTF-8 disagree: escape(mystring) converts the wrongly-decoded Windows-1252 string into hexadecimal percent-encoding, but the characters from those rows come out as %uXXXX sequences rather than the %XX bytes we want, so multiple replace() calls turn each of them back into its single-byte Windows-1252 escape, and finally decodeURIComponent() decodes the whole string as correct UTF-8.)