UTF8 on display, wrong in database because twice converted (solved)

Jean-Luc
Topic Author
Offline
Senior Member

03 Jun 2014 14:40 - 13 Jun 2014 09:49 #1 by Jean-Luc

UTF8 on display, wrong in database because twice converted (solved) was created by Jean-Luc

Hello !

While testing Projeqtor 4.3.0 with an empty database, I added a client, and some descriptions (such as "designation") contain non-ASCII characters. That should be perfectly valid, since all the database fields are defined as "utf8_general_ci".
Display and update work without any problem, but in the database "résidence" appears as "rÃ©sidence". It is also wrong in a UTF-8 file exported from PhpMyAdmin.
If I correct the value directly in the DB, or insert a record containing "résidence", nothing is displayed on the Projeqtor screen! (But importing "rÃ©sidence" displays correctly.)

My whole configuration is UTF-8 (the default on Linux), so the only explanation is an erroneous conversion in the code (done in both directions, which is why display and update still work).
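
The suspected double conversion can be reproduced outside Projeqtor. The following is a minimal Python sketch, not Projeqtor code: the function names are hypothetical, and Latin-1 is assumed to be the single-byte charset that gets wrongly applied.

```python
def double_encode(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes misread as Latin-1 characters."""
    utf8_bytes = text.encode("utf-8")      # the correct UTF-8 bytes
    return utf8_bytes.decode("latin-1")    # each byte wrongly becomes one character


def undo_double_encoding(text: str) -> str:
    """Reverse the misreading.

    Raises UnicodeError when the text could not have come from double_encode.
    """
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(double_encode("résidence"))
    print(undo_double_encoding("rÃ©sidence"))
```

The same reversal could, in principle, be used to check whether a value seen in the database is a double-encoded form of the expected text.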

And this problem prevents any import of data coming from elsewhere.

Can you see what is going on?

Last edit: 13 Jun 2014 09:49 by Jean-Luc.

The topic has been locked.

babynus
Offline
Administrator
Moderator

03 Jun 2014 15:35 #2 by babynus

Replied by babynus on topic UTF8 on display, not in database

It is a database configuration issue.
Not only must the database be defined as utf8_general_ci, but the server must also be configured for UTF-8:

In my.ini:

[mysql]
default-character-set=utf8

[mysqld]
character-set-server = utf8

Babynus
Administrator of ProjeQtOr web site

Jean-Luc
Topic Author
Offline
Senior Member

03 Jun 2014 16:16 #3 by Jean-Luc

Replied by Jean-Luc on topic UTF8 on display, not in database

As I said, the whole system (currently a local server on my PC) is UTF-8 (Linux Mint, not Windows!). I have several other databases, some of them many years old, and I have never seen such a problem.

Nevertheless, I added "character-set-server = utf8" under [mysqld] in /etc/mysql/my.cnf (not my.ini, which doesn't exist), restarted the mysql service, restarted the apache2 service, then closed my Projeqtor session and reopened it.
No change. The defect remains.
So the problem is elsewhere, sorry.

(For info, "default-character-set" is deprecated.)

Jean-Luc
Topic Author
Offline
Senior Member

04 Jun 2014 10:48 #4 by Jean-Luc

Replied by Jean-Luc on topic UTF8 on display, not in database

I looked inside the database: the single character "é" is registered with 4 (four !) bytes, as “Ã©” UTF8-coded.

So there is a function in Projeqtor which, on input, treats the incoming Unicode (UTF-8) string as an ANSI string and then converts it to Unicode!
And the inverse process runs on reading, so nothing looks wrong from the Projeqtor front end!
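
The behaviour described here (the round trip looks fine, the stored bytes are wrong, and a correctly stored row shows nothing) can be modelled in a few lines of Python. This is only a sketch under assumptions: `store` and `load` are hypothetical names, not Projeqtor functions; the misapplied charset is assumed to be Latin-1; and returning an empty string for undecodable bytes is just a stand-in for the "nothing is displayed" symptom.

```python
def store(text: str) -> bytes:
    """Write path: UTF-8 bytes are misread as Latin-1, then saved as UTF-8."""
    misread = text.encode("utf-8").decode("latin-1")
    return misread.encode("utf-8")


def load(stored: bytes) -> str:
    """Read path: the inverse conversion, applied to whatever the column holds."""
    try:
        as_latin1 = stored.decode("utf-8").encode("latin-1")
        return as_latin1.decode("utf-8")
    except UnicodeError:
        # Assumption: a value that cannot be decoded ends up displayed as nothing.
        return ""


if __name__ == "__main__":
    row = store("résidence")
    print(row.hex().upper())                         # bytes actually held in the column
    print(load(row))                                 # round trip through the application
    print(repr(load("résidence".encode("utf-8"))))   # row inserted correctly by hand
```

In this model the two wrong conversions cancel each other out, which would explain why the error is visible only from outside the application (PhpMyAdmin, exports, imports).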

I repeat: this occurs on a PC where the whole system is UTF-8, including the general MySQL settings:

Server: Localhost via UNIX socket
Server version: 5.5.35-0ubuntu0.12.04.2
Protocol version: 10
User: xxxx@localhost
MySQL charset: UTF-8 Unicode (utf8)

… I also tested on a remote server (OVH) with almost the same OS (local is Linux Mint 13, the server is Ubuntu 12.04 Server), but with Projeqtor 4.2.0. Exactly the same defect!

babynus
Offline
Administrator
Moderator

06 Jun 2014 16:08 #5 by babynus

Replied by babynus on topic UTF8 on display, not in database

If "é" is encoded with 4 bytes in your database, then it is not in UTF8, but UTF16.
This may explain your issues.
I can confirm that with the parameters described in this post, characters are correctly stored (this works fine in the Demo / Track databases).

Babynus
Administrator of ProjeQtOr web site

Jean-Luc
Topic Author
Offline
Senior Member

06 Jun 2014 16:53 #6 by Jean-Luc

Replied by Jean-Luc on topic UTF8 on display, not in database

babynus wrote: If "é" is encoded with 4 bytes in your database, then it is not in UTF8, but UTF16.
This may explain your issues.
I can confirm that with the parameters described in this post, characters are correctly stored (this works fine in the Demo / Track databases).

No, no, no and no, sorry!

Please read the explanations carefully. I said: « the single character "é" is registered with 4 (four !) bytes, as “Ã©” UTF8-coded »

The four bytes are the two UTF-8 sequences (2+2=4) for Ã and ©; this has nothing to do with another encoding (and 4 bytes for a true « é » would be UTF-32, not UTF-16(*)!)

So I confirm that the storage is wrong!!!

(*) UTF-16 stores every character up to U+FFFF in 2 bytes, ASCII included; characters beyond U+FFFF take 4 bytes (a surrogate pair).

To be entirely clear:

hex values for "é":
in ANSI : E9
in UTF-8 : C3A9 (= ANSI string for "Ã©") That's what I should have as stored string.
in UTF-16 : 00E9
in UTF-32 : 000000E9

hex values for "Ã©"
in ANSI : C3A9
in UTF-8 : C383C2A9 <= I've THIS as stored string !!!!!
in UTF-16 : 00C300A9
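
These values can be rechecked with a few lines of Python. The sketch below assumes Latin-1 for "ANSI" and big-endian byte order without a BOM for UTF-16 and UTF-32; the helper name `hex_table` is made up for the example. Note that Ã is U+00C3, whose UTF-8 encoding is C3 83.

```python
# Assumed mapping from the labels used in the post to Python codec names.
ENCODINGS = {
    "ANSI": "latin-1",
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-be",
    "UTF-32": "utf-32-be",
}


def hex_table(text: str) -> dict:
    """Return the uppercase hex dump of text in each encoding."""
    return {name: text.encode(codec).hex().upper()
            for name, codec in ENCODINGS.items()}


if __name__ == "__main__":
    for sample in ("é", "Ã©"):
        print(sample, hex_table(sample))
```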

Moderators: babynus, protion
