#1 2005-03-03 05:40:31

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Charset for Xinha

hello,

has anyone used Xinha or HTMLArea with different charsets?
does anybody have experience with utf8?

currently no charset is defined in the iframe, but one could be set using a new config variable. here is a small patch:

--- htmlarea.js (Revision 23)
+++ htmlarea.js (Arbeitskopie)
@@ -241,6 +241,9 @@
   if (this.baseURL && this.baseURL.match(/(.*)\/([^\/]+)/))
     this.baseURL = RegExp.$1 + "/";

+  // custom Charset for the iframe document
+  this.charSet = "";
+
   // URL-s
   this.imgURL = "images/";
   this.popupURL = "popups/";
@@ -1387,6 +1390,10 @@
       doc.open();
       var html = "<html>\n";
       html += "<head>\n";
+      if(editor.config.charSet != '')
+      {
+        html += "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + editor.config.charSet + "\">";
+      }
       if(typeof editor.config.baseHref != 'undefined')
       {
         html += "<base href=\"" + editor.config.baseHref + "\"/>";




...what happens if the user switches to Text-Mode?
then the charset from the Textarea will be used - which might be different!



or is it like this:
the iframe uses the same charset (if none is defined) as the parent page.
Edit: ...i just tested this... and it doesn't work
so imo we NEED a charset for the iframe if we would like to use utf-8.

thanks for any help on this...
niko

Last edited by niko (2005-03-03 06:36:47)


Niko

Offline

#2 2005-03-03 20:30:39

anzenews
Xinha Community Member
Registered: 2005-02-21
Posts: 41

Re: Charset for Xinha

I don't know about iframe, but textarea definitely uses the charset of the parent page (if none supplied). It would be really strange if iframe behaved differently.

Last edited by anzenews (2005-03-03 20:32:27)

Offline

#3 2005-03-04 03:18:08

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

anzenews wrote:

I don't know about iframe, but textarea definitely uses the charset of the parent page (if none supplied). It would be really strange if iframe behaved differently.

hmmm... ok, i take everything back....

it works without any change, the iframe uses the charset of the parent page....

thanks :D


Niko

Offline

#4 2005-03-05 00:36:37

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

niko wrote:

hello,

has anyone used Xinha or HTMLArea with different charsets?
does anybody have experience with utf8?

Yes, I use it as utf8.  The one problem with that though is that the translations are not (all) utf8 which kinda messes things up a bit.

When I finish the new translation system I would hope to see translations converted to utf8; possibly they could also be made available in other "common" character sets for the given language.


James Sleeman

Offline

#5 2005-03-05 04:40:05

Mirical Bernd
Xinha Community Member
Registered: 2005-02-17
Posts: 14

Re: Charset for Xinha

We are using utf8, as MSIE tended to forget all post-data when submitting a text which contained characters like (c) or (r) or TM or other Word-stuff, which led to database errors. Xinha seems to work like a charm with utf8.

Offline

#6 2005-03-05 07:53:42

anzenews
Xinha Community Member
Registered: 2005-02-21
Posts: 41

Re: Charset for Xinha

It was the same with htmlarea2 and ISO-8859-2 (Latin2). I even wrote some JS to convert offending characters before submitting. It worked, but it was clumsy. You wouldn't believe how many characters MSWord replaces automatically... :(

About converting codepages: there is a utility called iconv under Linux (I have no idea about Windows) that translates codepages rather nicely. You just say:
iconv -f iso-8859-2 -t utf8 somefile.txt > somefile.utf8.txt

If you wish, I can take care of that... Let me know.

Enjoy!

Anze

Offline

#7 2005-03-07 04:43:21

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

OK... utf-8-encoded pages are working great....

but i have a problem with non-utf-8-pages:
the SuperClean plugin doesn't work correctly when calling tidy.
this is because tidy is called with the parameter -utf8

when i now have a page that uses latin1, all chars like üä get replaced by ? when i call tidy.
A solution is calling tidy with the right charset.
But we would have to pass the right charset then to the php-script as get-parameter i think(?)

please comment on this - if this would be a solution - i could then write a patch...
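A minimal sketch of the GET-parameter idea. The helper name `buildTidyUrl` and the parameter name `charset` are assumptions made up for illustration; neither exists in SuperClean as far as this thread shows:

```javascript
// Hypothetical helper: append the page charset to the backend URL so the
// PHP script could pick the matching tidy option. The parameter name
// "charset" is an assumption, not an existing SuperClean parameter.
function buildTidyUrl(backendUrl, charset) {
  // Use "?" for the first parameter and "&" for any later one.
  var separator = backendUrl.indexOf("?") === -1 ? "?" : "&";
  return backendUrl + separator + "charset=" + encodeURIComponent(charset);
}

// Example call; "tidy.php" is a placeholder file name.
var exampleUrl = buildTidyUrl("tidy.php", "iso-8859-1");
```

The PHP side would still have to map the received value onto a tidy option, which is not covered here.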


Niko

Offline

#8 2005-03-07 05:07:58

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

niko wrote:

OK... utf-8-encoded pages are working great....

but i have a problem with non-utf-8-pages:
the SuperClean plugin doesn't work correctly when calling tidy.
this is because tidy is called with the parameter -utf8

when i now have a page that uses latin1, all chars like üä get replaced by ? when i call tidy.
A solution is calling tidy with the right charset.
But we would have to pass the right charset then to the php-script as get-parameter i think(?)

please comment on this - if this would be a solution - i could then write a patch...

Yes, I think we'll need to add a config variable to HTMLArea.Config to specify the character set that should be used, probably defaulting to utf8.  I can't find any way to get the character encoding of the page from javascript; if there were one, we could default it to the correct setting automagically.

Tricky bit is I'm not totally sure how the character set in the iframe is set currently.  Well, I do know, it's simply not set; what I don't know is whether that means the iframe inherits the page's character set or uses the browser's default (which is probably ISO-8859-1 for most).  It may be that we will also need to add a meta tag into the iframe to try and persuade it into using the correct encoding.
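As a rough sketch of the meta-tag idea: the function below builds the start of the iframe document and only emits the tag when a charset is configured. `buildIframeHead` is a made-up name, and `config.charSet` is the proposed variable, not something that exists in HTMLArea.Config yet:

```javascript
// Sketch only: build the opening of the editor iframe document.
// `config.charSet` is the proposed (not yet existing) config variable.
function buildIframeHead(config) {
  var html = "<html>\n<head>\n";
  // An empty or missing charSet means: leave it to the browser, as today.
  if (config.charSet) {
    html += '<meta http-equiv="Content-Type" content="text/html; charset=' +
            config.charSet + '">\n';
  }
  return html;
}

// Example call with an explicit charset.
var headWithCharset = buildIframeHead({ charSet: "utf-8" });
```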


James Sleeman

Offline

#9 2005-03-07 05:09:21

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

Ah, man, I'm going senile, I'm sure, I forgot you'd written all that in the first post ;)


James Sleeman

Offline

#10 2005-03-07 05:12:29

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

So to summarise my rambling: niko, if you would like to add the config variable and the meta tag to the iframe, and fix up whichever of those minor character set issues (such as in superclean) you can handle (you might try a search for utf, that should find any), then submit a patch as a ticket, that'd be just great :)


James Sleeman

Offline

#11 2005-03-07 05:32:58

anzenews
Xinha Community Member
Registered: 2005-02-21
Posts: 41

Re: Charset for Xinha

Why not just use utf8 everywhere? We could convert all the lang stuff to utf8 and set iframe's meta tag to utf8 - both are easy tasks.
That way it would be much easier to implement and maintain and nobody loses anything - utf8 is as good as all other charsets combined.
The only possible side effect is that the Xinha contents would be sent to the PHP script in utf8 and not in the charset that is specified on the parent page - but personally I wouldn't care about that.
Still better than specifying iframe's character set to something that is set in my language - and getting different encoding based on language setting.

Just my 0.02 EUR.

Last edited by anzenews (2005-03-07 05:34:10)

Offline

#12 2005-03-07 05:42:05

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

anzenews wrote:

Why not just use utf8 everywhere? We could convert all the lang stuff to utf8 and set iframe's meta tag to utf8 - both are easy tasks.

Problem is, that while learned developers such as we already use utf8, most of the world is really still using ISO-8859-1, and most of the asian world I believe is using the various other encodings (EUC-JP, BIG-5, Shift_JIS).


James Sleeman

Offline

#13 2005-03-07 05:49:00

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

Here's a link on character encoding which may be useful in this discussion
http://www.crazysquirrel.com/compgen/form-encoding.php


James Sleeman

Offline

#14 2005-03-07 08:41:46

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

gogo wrote:

I can't find any way to find the character encoding of the page from javascript or we could default it to the correct setting automagically.

I found some non-standard document-properties:

gecko:
document.characterSet
document.actualEncoding

ie:
document.charset

i don't know the difference between characterSet and actualEncoding.
The good thing about this is that if you switch the charset in your browser (in the view menu), this property gets updated too!

so we don't need a new config-setting!

basically this patch would be enough:

--- htmlarea.js (Revision 23)
+++ htmlarea.js (Arbeitskopie)
@@ -1387,6 +1387,12 @@
       doc.open();
       var html = "<html>\n";
       html += "<head>\n";
+      if (HTMLArea.is_gecko) {
+        var charSet = editor._mdoc.characterSet; //or should i use document.characterSet directly?
+      } else {
+        var charSet = editor._mdoc.charset;
+      }
+      html += "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + charSet + "\">";
       if(typeof editor.config.baseHref != 'undefined')
       {
         html += "<base href=\"" + editor.config.baseHref + "\"/>";

But there are surely better ways to implement this.
imho it would be a good thing to have an editor.charSet variable available (in plugins),
so it might be better to set editor.charSet somewhere else (i don't know where the right place is)
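One way to avoid the browser check in the patch is to test for the properties themselves. This is only a sketch: `detectCharset` and its `fallback` argument are made-up names, and it relies on nothing beyond the two properties listed above:

```javascript
// Sketch: read the charset from whichever property the document exposes.
// characterSet is the Gecko property and charset the IE one, as listed above.
// `fallback` is a hypothetical argument for browsers that expose neither.
function detectCharset(doc, fallback) {
  return doc.characterSet || doc.charset || fallback;
}

// Plain objects stand in for real documents in this example.
var fromGecko = detectCharset({ characterSet: "UTF-8" }, "");
var fromIE = detectCharset({ charset: "windows-1252" }, "");
```

The result could then be stored once (for example on the editor object) so plugins do not have to repeat the lookup.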

greets
niko


Niko

Offline

#15 2005-03-07 20:44:17

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

niko wrote:
gogo wrote:

I can't find any way to find the character encoding of the page from javascript or we could default it to the correct setting automagically.

I found some non-standard document-properties:
...
The difference between characterSet and actualEncoding i don't know.

Sweet :)  I suspect that actualEncoding is the encoding returned by the server in the response while characterSet is what it is being displayed as (possibly modified by a meta tag, or user-selection).

so we don't need a new config-setting!

I think it would be better, as you allude to, to have

HTMLArea.Config.charSet

config variable which is defaulted to the main document's charset as reported by those document properties (so that everybody can leave it as the default unless they have some peculiar need).  Makes it easier in the long run.
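A sketch of what such a default could look like. `Config` here is a stand-in written for this example, not the real HTMLArea.Config, and passing the document in as an argument is an assumption:

```javascript
// Stand-in for the proposed config object; NOT the real HTMLArea.Config.
function Config(doc) {
  // Default to the charset the main document reports, or "" if it reports none.
  this.charSet = (doc && (doc.characterSet || doc.charset)) || "";
}

// Most users would simply keep the default...
var config = new Config({ characterSet: "UTF-8" });

// ...while anyone with a peculiar need overrides it explicitly.
config.charSet = "ISO-8859-2";
```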


James Sleeman

Offline

#16 2005-03-10 03:04:30

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

...i had some troubles getting the SuperClean to work with utf-8.

...try using the SuperClean-Plugin with an ä for example - it won't work (even the current svn-head-version!)

this is what i found out:
HTMLArea._postback uses encode(data[i]) (about line 4351)
which makes ALWAYS %E4 out of my ä - ignoring any character-set  (correct would be %C3%A4)

a possible solution for that problem:
use the function encodeURIComponent instead of encode!
encodeURIComponent ALWAYS uses utf-8 encoding - even if the charset of the page is different.
(see http://www.js-examples.com/javascript/r … oplev.php)

and this is very good for the super-clean-tidy-part (and other plugins that call external applications) - as they ALWAYS get utf-8 data and we don't have to handle different charsets!
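The difference is easy to check in a JavaScript console. The sketch assumes that the `encode` mentioned above behaves like the built-in `escape()`, which is the standard function that produces `%E4`; that reading is a guess and is not confirmed by the post:

```javascript
// "\u00E4" is the letter ä, written as an escape so the file encoding
// of this snippet cannot influence the result.
var aUmlaut = "\u00E4";

// escape() emits one %XX per code unit below 256, i.e. the Latin-1 value.
var legacyEncoded = escape(aUmlaut);

// encodeURIComponent() always emits the UTF-8 bytes of the character.
var utf8Encoded = encodeURIComponent(aUmlaut);

console.log(legacyEncoded); // %E4
console.log(utf8Encoded);   // %C3%A4
```

On the receiving side this means the script can rely on utf-8 input regardless of the charset of the page that hosts the editor.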


Please report if this change from encode to encodeURIComponent would cause any problems.

thanks!
niko


Niko

Offline

#17 2005-03-11 00:11:42

gogo
Xinha Leader
From: New Zealand
Registered: 2005-02-11
Posts: 1,015
Website

Re: Charset for Xinha

Can you submit that as a ticket niko, thanks.


James Sleeman

Offline

#18 2005-03-11 02:56:49

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

ok, submitted:
http://xinha.gogo.co.nz/cgi-bin/trac.cgi/ticket/57

btw: james, i want to thank you for your work here!
It is really great that when i want to add/change anything in the code, i can just post a patch and it will be included! (no more hacking the code and applying all changes again and again :D)

niko


Niko

Offline

#19 2005-03-16 19:30:51

Wei
New member
Registered: 2005-03-16
Posts: 9

Re: Charset for Xinha

I think the naming of the language files should at least be logical, e.g

gb.js <= this is gb2312

i think it should be named

zh-CN.GB2312.js or
zh-CN.UTF-8.js for utf-8 charset

or something similar

In htmlarea, i had to write a hash that converts the ISO locale names (e.g. "zh-CN", "en-AU") into disorganized javascript language files.
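For illustration, the two approaches could look roughly like this. The table and both function names are hypothetical; only the `gb.js` entry is taken from this post, and it assumes that `gb.js` is the file a `zh-CN` user should get:

```javascript
// Hypothetical lookup table of the kind described above. Only the gb.js
// entry is taken from this thread; real tables would need more entries.
var LEGACY_LANG_FILES = { "zh-CN": "gb.js" };

// Current situation: locale names have to be translated by hand.
function legacyLangFile(locale) {
  return Object.prototype.hasOwnProperty.call(LEGACY_LANG_FILES, locale)
    ? LEGACY_LANG_FILES[locale]
    : null;
}

// Proposed naming scheme: the file name follows directly from locale and charset.
function langFileName(locale, charset) {
  return locale + "." + charset + ".js";
}

var proposedName = langFileName("zh-CN", "UTF-8");
```

With the proposed scheme no table is needed, because the name can be derived from the locale and the charset alone.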

Cheers, Wei.

Offline

#20 2005-03-17 02:52:38

niko
Xinha Authority
From: Salzburg/Austria
Registered: 2005-02-14
Posts: 338

Re: Charset for Xinha

gogo wants to write a completely new i18n system....
i don't know how it will work or how far along he is...


Niko

Offline
