From: Eric Blake
Date: Mon, 21 Dec 2015 10:49:27 -0700
Subject: Re: [Qemu-devel] [PATCH] gtk: use setlocale() for LC_MESSAGES only
To: Markus Armbruster, Alberto Garcia
Cc: Kevin Wolf, Gerd Hoffmann, qemu-devel@nongnu.org
Message-ID: <56783BA7.7030700@redhat.com>
In-Reply-To: <878u4ry1jw.fsf@blackfin.pond.sub.org>

On 12/18/2015 12:55 PM, Markus Armbruster wrote:
> Alberto Garcia writes:
>
>>>>> We do however have translations for a few simple strings for the GTK+
>>>>> menu items, so in order to run QEMU using the C locale, and yet have a
>>>>> translated UI let's use setlocale() for LC_MESSAGES only.
>>>>>
>>>> Not sure why I noticed it only now and if it's related to any recent
>>>> package upgrade on my side (using RHEL 7), but I noticed that
>>>> non-ASCII characters in the GTK UI strings are broken for me and git
>>>> bisect pointed to this commit.
>>>
>>> I guess we need to set LC_CTYPE too.
>>
>> That affects functions in ctype.h (isalpha(), islower(), isupper(), ...)
>> I guess that's safe?

Gnulib introduces functions named c_isalpha(), c_islower(), and so
forth, which behave identically regardless of the current locale,
precisely because locale-dependent definitions of which byte sequences
form a valid character can cause undesirable behavior.  I don't know if
glib does the same, but it does indeed have the potential to affect us,
in at least util/id.c:id_wellformed().  It would be weird to let the
user's choice of locale determine which ids they can create.

>
> If we're guessing, then I guess it isn't.  But we shouldn't be guessing.
>
> "LC_CTYPE affects the behavior of the character handling functions and
> the multibyte and wide character functions."
>
> I doubt there's much use for the latter in QEMU itself, but in
> libraries, all bets are off.  I guess this is what actually screws up
> GTK.
>
> We do use the former.  LC_CTYPE set to some sufficiently funky locale is
> bound to upset these uses.
>
> In short: nope, we can't just set LC_CTYPE, at least not without further
> analysis.

In fact, if LC_CTYPE and LC_COLLATE are incompatible, then strcoll() has
undefined behavior.  GNU coreutils warns:

    Unless otherwise specified, all comparisons use the character
    collating sequence specified by the ‘LC_COLLATE’ locale.(1)
    [...]
    (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
    ‘en_US’), then ‘sort’ may produce output that is sorted differently
    than you’re accustomed to.
    In that case, set the ‘LC_ALL’ environment variable to ‘C’.  Note
    that setting only ‘LC_COLLATE’ has two problems.  First, it is
    ineffective if ‘LC_ALL’ is also set.  Second, it has undefined
    behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is set to
    an incompatible value.  For example, you get undefined behavior if
    ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.

Off-hand, we are specifically NOT calling setlocale() for the categories
that we want to leave in the C locale, so we don't have to worry about
LC_ALL throwing us off.  And I'm hard-pressed to think of an example
where LC_COLLATE=C while LC_CTYPE is a multibyte character set will
cause unusual sorting artifacts (the one that coreutils is warning
against is when you have two incompatibly different multibyte character
sets involved, whereas our case is a multibyte character set for display
but a unibyte set for collation).  But it is indeed a can of worms that
requires special analysis.
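To make the id_wellformed() concern concrete, here is a minimal sketch
of a check that cannot be influenced by LC_CTYPE.  It is NOT the code in
util/id.c: the helper names (ascii_isalpha(), ascii_isdigit(),
id_ok_ascii(), init_messages_locale_only(), ctype_is_c_locale()) and the
accepted character set are assumptions made up for illustration, written
in the spirit of gnulib's c_isalpha().  The setlocale(LC_MESSAGES, "")
call is likewise my guess at what "LC_MESSAGES only" looks like in
practice; LC_MESSAGES is a POSIX category, not an ISO C one.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Locale-independent ASCII tests; unlike isalpha()/isdigit(), the
 * result never depends on the current LC_CTYPE setting. */
static bool ascii_isalpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool ascii_isdigit(char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical well-formedness rule (an assumption, not QEMU's):
 * an ASCII letter, followed by ASCII letters, digits, '-', '.' or '_'. */
static bool id_ok_ascii(const char *id)
{
    if (!ascii_isalpha(id[0])) {
        return false;
    }
    for (const char *p = id + 1; *p; p++) {
        if (!ascii_isalpha(*p) && !ascii_isdigit(*p)
            && *p != '-' && *p != '.' && *p != '_') {
            return false;
        }
    }
    return true;
}

/* Pick up the user's locale for message translation only; every other
 * category keeps the "C" locale that a program starts with. */
static void init_messages_locale_only(void)
{
    setlocale(LC_MESSAGES, "");
}

/* Query (without changing) the current LC_CTYPE locale. */
static bool ctype_is_c_locale(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    return cur != NULL && strcmp(cur, "C") == 0;
}
```

With helpers along these lines, the set of accepted ids stays the same
whether or not LC_CTYPE is later switched to the user's locale, which is
the property I would want before touching LC_CTYPE at all.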
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org