JS, Encoding and XMLHttpRequest

I ran into quite a strange issue today. Basically, the problem was that a JavaScript file was encoded using ISO-8859-1 and contained some non-ASCII characters. It worked fine in a plain HTML script tag that included the external file. However, it did not work when the file was loaded using an XMLHttpRequest (MSXML has exactly the same behavior).
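Something along these lines reproduces the symptom; the file name is made up and this is a minimal sketch rather than the actual code:

var xhr = new XMLHttpRequest();
xhr.open("GET", "latin1-script.js", false); // synchronous, just to keep the sketch short
xhr.send(null);
// By the time responseText is read, the bytes have already been decoded into a string.
// With no charset in the Content-Type header the decoder falls back to UTF-8, so the
// ISO-8859-1 characters are already garbled before eval ever runs.
eval(xhr.responseText);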

My first assumption was that the encoding of the file was incorrect, and when I changed it to UTF-8 everything worked fine. However, it should work with ISO-8859-1 as well, so I changed the encoding back and started sniffing the HTTP headers.

It turns out that neither IIS nor Apache includes the encoding of js files in the HTTP response headers. My guess is that when the file was loaded through the HTML script tag the browser applied some heuristic to figure out that the file was actually ISO-8859-1. The same heuristic does not apply to XMLHttpRequest; it assumes that the file is UTF-8, which it obviously isn't.

So to get this to work you need to set:

Content-Type: application/x-javascript; charset=ISO-8859-1

I guess the issue is that the web servers do not know that js files are text files and therefore don’t bother with the encoding. Does anyone know how to configure web servers to automatically insert the correct charset in the HTTP header?
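For Apache, at least, the AddCharset directive from mod_mime looks like it should do the trick for static js files, though I have not verified this myself and I still don't know the IIS side of it:

AddCharset ISO-8859-1 .js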

This is a real-world problem that more and more people will run into as the web moves towards using XMLHttpRequest to load JSON and other text-based data formats.

  • http://me.eae.net Emil A Eklund

    UTF-8 is the standard encoding for XML files, so MSXML probably assumes that all files have that encoding if none is set.

    Having the web server set the charset for text files is quite painless; however, you must either manually specify the charset for each file, or assume that all files in a given location, or with a given name/extension, share the same charset.

    The problem is that there's no way to tell the difference between a UTF-8 and an ISO-8859-1 file unless there are non-ASCII characters in it (assuming there's no UTF-8 BOM), and even if it contains such characters the entire file must be scanned to determine the charset.

    The obvious solution here is, naturally, to always use UTF-8, as that will not only eliminate this kind of problem but also allow other languages to be used in the files, and users of such languages to use the technology. I can't think of a single reason for using the ISO-8859-* encodings anymore. They are as obsolete as Netscape 4 and should die a very painful death.

  • http://erik.eae.net Erik Arvidsson

    “UTF-8 is the standard encoding for XML files, so it MSXML probably assumes that all files have that encoding if none is set.”

    I'm aware of this, and it explains why encoding it in UTF-8 works fine.

    “I can’t think of a single reason for using the ISO-8859-* encodings anymore.”

    A customer said that they already had a lot of js files written using Latin-1 and that is the only reason I needed to investigate this. (Besides it being fun trying to learn how things work.)

  • http://me.eae.net Emil A Eklund

    Well, I guess that is a valid reason; you got me there.

  • http://www.take.no Henrik Kjelsberg

    Well!! If you're from Norway and making an application with a whole lot of ISO-8859-1 characters, you would be frustrated. I could convert all foreign characters into their hex values, but isn't that just going through a whole lot of trouble for no reason, that is, if it were possible to use ISO-8859-1 with XMLHttpRequest?

  • http://bijur.wordpress.com Daniel Wolff

    Man, this is S-O GREAT!

    You just saved my life!
    I was doing (with PHP) a utf8_encode() on every string that got in the way on the page… but if I wasn't using AJAX to load some content and display it on a page (like in an include()) the code turned into a mess!

    I never thought that all the content loaded by XHR (XMLHttpRequest) calls was being treated as UTF-8 only… now I've set the header with the correct content-type and the problem is gone!

    Thank you so much!
    Keep up with the excellent content on your blog, Erik… and again: Thank you!

  • Tom Davis

    If you have perl 5.8 or better compiled with perlio (standard) and can install the Encode module (it’s available on CPAN), it’s pretty easy to translate files from one encoding to another. Basically (from the Encode documentation):

    use Encode;

    # read ISO-8859-1 (Latin-1), write UTF-8; file names are placeholders
    open( LATIN, "<:encoding(iso-8859-1)", "input.js" ) or die $!;
    open( UTF8,  ">:utf8", "output.js" ) or die $!;
    while (<LATIN>) { print UTF8 $_; }
    close LATIN;
    close UTF8;

    I put all of that except the use clause into a sub and looped over all of the files I wanted to change: 743 files in less than five seconds.

    NOTE: I thought my files were ISO-8859-1, but it turns out they are Windows-1252, and that caused all kinds of problems, because smart quotes and em dashes are not part of ISO-8859-1 and so get converted to code points in UTF-8 that MANY applications recognize (correctly) as control characters, but that many browsers will (incorrectly) display as the smart quotes, etc. It might work regardless, but I got bitten. Rule of thumb: most Windows apps produce Windows-1252 rather than ISO-8859-1.

  • Pingback: Data Travelers Blog » AJAX #1

  • Laven

    Hi,
    I am having a problem passing foreign (Nordic) characters in a query string via XMLHttpRequest (AJAX). When I examine the query string in the server-side script, the foreign characters are blanked out; when I examine the query string in the AJAX function, all is fine. Could someone help, please!!

  • Jeff Koenke

    This was the exact problem I was having. Your solution saved me hours of investigation. THANKS!!!!! Will strongly consider switching to UTF-8 in the future.

    Thanks again.
    Jeff

  • Fernando

    Hi,

    I was in the middle of a related problem, serving data through XHR and almost lost in the encoding nightmare… ;-) Your suggestion showed me the light. Thanks.
