* [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
@ 2014-06-04  7:52 Brian Norris
  2014-06-04  7:52 ` [PATCH 1/2] kconfig: lxdialog: fix spelling Brian Norris
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Brian Norris @ 2014-06-04  7:52 UTC (permalink / raw)
  To: linux-kbuild; +Cc: Brian Norris, Yann E. MORIN, Artem Bityutskiy

Hi,

The first patch is trivial.

The second is inspired by a long-standing bugzilla entry:

  https://bugzilla.kernel.org/show_bug.cgi?id=43067

The MTD_NAND_CAFE Kconfig symbol (drivers/mtd/nand/Kconfig) has description
text which uses a multi-byte UTF-8 character: the 'É' in 'CAFÉ'. This
character (and other similar >8bit UTF-8 characters) is not handled
correctly by many of the kernel configuration tools (notably 'make nconfig'
and 'make xconfig'). nconfig was especially broken, as it would completely
drop any menu entry which had non-ASCII characters, as well as ALL
subsequent entries in the same window (!!).

The fix for nconfig is to allow linking against the "wide" ncurses library.
I did not bother learning Qt well enough to fix 'make xconfig'; it still
appears broken w.r.t. wide characters, and makes liberal use of the
QString::latin1() conversion for potentially non-Latin strings.

Notably, this issue is not very obvious for the common user. For instance,
on Ubuntu one might install libncurses5-dev, which is sufficient for getting
'menuconfig' to compile/link/run just fine. It is easy to miss the fact that
Unicode handling is incorrect, because the behavior is undefined (usually
just junk characters, but nconfig silently drops data), and nothing
informs users that they should have installed libncursesw5-dev instead.

Ideally, we could drop support for linking against legacy ncurses, and
instead require ncursesw, but that might be painful to enforce for all users
(i.e., nearly everyone who configures kernels). I welcome any thoughts on
improving this state for others (like me, for a long time) who don't realize
that they should install the ncursesw development package in order to get
21st century support for unicode help text.

Brian

Brian Norris (2):
  kconfig: lxdialog: fix spelling
  kconfig: nconfig: fix multi-byte UTF handling

 scripts/kconfig/Makefile          |    3 ++-
 scripts/kconfig/lxdialog/dialog.h |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
1.7.9.5



* [PATCH 1/2] kconfig: lxdialog: fix spelling
  2014-06-04  7:52 [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Brian Norris
@ 2014-06-04  7:52 ` Brian Norris
  2014-06-04  7:52 ` [PATCH 2/2] kconfig: nconfig: fix multi-byte UTF handling Brian Norris
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Brian Norris @ 2014-06-04  7:52 UTC (permalink / raw)
  To: linux-kbuild; +Cc: Brian Norris, Yann E. MORIN

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: linux-kbuild@vger.kernel.org
---
 scripts/kconfig/lxdialog/dialog.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/lxdialog/dialog.h b/scripts/kconfig/lxdialog/dialog.h
index b4343d384926..fcffd5b41fb0 100644
--- a/scripts/kconfig/lxdialog/dialog.h
+++ b/scripts/kconfig/lxdialog/dialog.h
@@ -170,7 +170,7 @@ char item_tag(void);
 /* item list manipulation for lxdialog use */
 #define MAXITEMSTR 200
 struct dialog_item {
-	char str[MAXITEMSTR];	/* promtp displayed */
+	char str[MAXITEMSTR];	/* prompt displayed */
 	char tag;
 	void *data;	/* pointer to menu item - used by menubox+checklist */
 	int selected;	/* Set to 1 by dialog_*() function if selected. */
-- 
1.7.9.5



* [PATCH 2/2] kconfig: nconfig: fix multi-byte UTF handling
  2014-06-04  7:52 [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Brian Norris
  2014-06-04  7:52 ` [PATCH 1/2] kconfig: lxdialog: fix spelling Brian Norris
@ 2014-06-04  7:52 ` Brian Norris
  2014-06-06 13:18   ` Sam Ravnborg
  2014-06-05  1:53 ` [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Martin Walch
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2014-06-04  7:52 UTC (permalink / raw)
  To: linux-kbuild; +Cc: Brian Norris, Artem Bityutskiy, Yann E. MORIN, Martin Walch

Currently, Kconfig descriptions that use multi-byte UTF-8 characters
(such as MTD_NAND_CAFE) will have their menu entries dropped from the
'make nconfig' ncurses menu, and all subsequent entries in the same
window will be omitted. This seems to be due to the ncurses 'menu'
library, which does not traditionally handle UTF-8 >8-bit characters
properly.

The ncursesw library ('w' is for "wide") is written to handle these
UTF-8 characters, and is practically a drop-in replacement at the source
level. Use it by default, if available.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=43067
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: linux-kbuild@vger.kernel.org
Cc: Martin Walch <walch.martin@web.de>
---
 scripts/kconfig/Makefile |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index 844bc9da08da..0171566cc12a 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -220,7 +220,8 @@ HOSTCFLAGS_gconf.o	= `pkg-config --cflags gtk+-2.0 gmodule-2.0 libglade-2.0` \
 HOSTLOADLIBES_mconf   = $(shell $(CONFIG_SHELL) $(check-lxdialog) -ldflags $(HOSTCC))
 
 HOSTLOADLIBES_nconf	= $(shell \
-				pkg-config --libs menu panel ncurses 2>/dev/null \
+				pkg-config --libs menuw panelw ncursesw 2>/dev/null \
+				|| pkg-config --libs menu panel ncurses 2>/dev/null \
 				|| echo "-lmenu -lpanel -lncurses"  )
 $(obj)/qconf.o: $(obj)/.tmp_qtcheck
 
-- 
1.7.9.5



* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-06-04  7:52 [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Brian Norris
  2014-06-04  7:52 ` [PATCH 1/2] kconfig: lxdialog: fix spelling Brian Norris
  2014-06-04  7:52 ` [PATCH 2/2] kconfig: nconfig: fix multi-byte UTF handling Brian Norris
@ 2014-06-05  1:53 ` Martin Walch
  2014-06-06 13:16   ` Sam Ravnborg
  2014-06-06 13:17 ` Sam Ravnborg
  2014-07-10  8:52 ` Brian Norris
  4 siblings, 1 reply; 11+ messages in thread
From: Martin Walch @ 2014-06-05  1:53 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-kbuild, Yann E. MORIN, Artem Bityutskiy

On Wednesday 04 June 2014 00:52:29 Brian Norris wrote:
> The second is inspired by a long-standing bugzilla entry:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=43067
> 
> The MTD_NAND_CAFE Kconfig symbol (drivers/mtd/nand/Kconfig) has description
> text which uses a multi-byte UTF-8 character: the 'É' in 'CAFÉ'. This
> character (and other similar >8bit UTF-8 characters) is not handled
> correctly by many of the kernel configuration tools (notably 'make nconfig'
> and 'make xconfig'). nconfig was especially broken, as it would completely
> drop any menu entry which had non-ASCII characters, as well as ALL
> subsequent entries in the same window (!!).

Hi,

so far I have not seen any solid hint that the configuration system was
designed with support for anything beyond 7 bit ASCII characters in mind.
Except some "of course we use UTF-8 for everything in the 21st century"
ranting, I have also not seen any commonly accepted decision that it should
use any other character set.

Currently there are 14145 symbols in the mainline kernel and I know of only
two that do not use exclusively 7 bit ASCII characters. One is MTD_NAND_CAFE
which prompts with "NAND support for OLPC CAFÉ chip" and reads in the help
text "Use NAND flash attached to the CAFÉ chip designed for the OLPC
laptop.", the other one is HID_XINMO, which has a UTF-8 "no-break space" in
the help text "[..]Say Y here[..]" (after the Y). I guess the latter one
is only accidentally there.
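
A check for such stray bytes can be sketched in a few lines of C. This is
only an illustration and not part of the kconfig sources; the function name
first_non_ascii is hypothetical. It reports the offset of the first byte at
or above 0x80, which is how any UTF-8 encoded non-ASCII character (the 'É'
or a no-break space) shows up in a Kconfig string.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the byte offset of the first byte outside
 * 7-bit ASCII in the NUL-terminated string s, or -1 if every byte is
 * below 0x80.
 */
long first_non_ascii(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t i;

	for (i = 0; p[i] != '\0'; i++) {
		if (p[i] >= 0x80)
			return (long)i;
	}
	return -1;
}
```

Called on a prompt string, a result of -1 means the text is plain ASCII; any
other value points at the byte that a 7-bit-only tool may mishandle.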

One reason for this is probably that there is currently no reliable UTF-8
support in the configuration system. Of course, this does not answer the
question whether Kconfig files should accept UTF-8 characters or not.

IMO such a change (use UTF-8) should be agreed upon by a wide audience, because
it affects every user of the configuration system, and in particular every
kernel developer.

As I am no expert in character encoding, please correct me if I am wrong
about any of the following.

While I think that using UTF-8 is often a good idea, I also think that it is
a bad idea to just hack UTF-8 support into the configuration system without
careful consideration and code review: ASCII is a least common denominator
that is compatible with most character sets in regular use. Currently it
hardly matters what character encoding the terminal uses and what the
font supports as long as it is 7 bit ASCII compatible.

As far as I see, deciding for UTF-8 is an "all-in" thing. It is not feasible
to then allow anything beside UTF-8. This will force any user to use a
terminal and a font that support UTF-8.

For UTF-8 support, the whole code base of the configuration system should
be revisited, because as far as I know it currently assumes in some places
that the size of one character equals sizeof(char), although most of the
time this will not hurt.
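
The gap between bytes and characters can be shown with a minimal sketch. It
is not taken from the kconfig code; utf8_codepoints is a hypothetical name,
and the function assumes its input is valid UTF-8. It counts characters by
skipping continuation bytes (bit pattern 10xxxxxx), whereas strlen() counts
bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the code points in a NUL-terminated string
 * that is assumed to be valid UTF-8. Every byte that is not a
 * continuation byte (10xxxxxx) starts a new code point.
 */
size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	}
	return n;
}
```

For "CAFÉ" encoded as UTF-8, strlen() returns 5 while utf8_codepoints()
returns 4, so any code that sizes or truncates a column by strlen() is off
by one for that prompt.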

Furthermore, consistent UTF-8 support is hard with flex as it does not really
support wide characters. Of course you can make flex accept them, but a
16 bit character will be treated as two 8 bit characters. In flex, this is
probably not too much of a drawback, but it is ugly.
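
What a byte-oriented scanner sees for the 'É' is the pair 0xC3 0x89 rather
than one character. The sketch below shows how such a pair maps back to a
single code point. It is an assumption-laden illustration only: the name
utf8_decode2 is hypothetical, it handles two-byte sequences only, and
nothing like it exists in the flex-generated lexer.

```c
/*
 * Hypothetical helper: decode a two-byte UTF-8 sequence
 * (110xxxxx 10xxxxxx) into its code point. Returns -1 if the bytes do
 * not form a valid two-byte sequence, including the overlong lead
 * bytes 0xC0 and 0xC1.
 */
long utf8_decode2(unsigned char lead, unsigned char cont)
{
	if ((lead & 0xE0) != 0xC0 || lead < 0xC2)
		return -1;
	if ((cont & 0xC0) != 0x80)
		return -1;
	/* Five payload bits from the lead byte, six from the continuation. */
	return ((long)(lead & 0x1F) << 6) | (long)(cont & 0x3F);
}
```

Here utf8_decode2(0xC3, 0x89) yields 0xC9, which is U+00C9 (É). A scanner
that only passes such bytes through unchanged does not need this step, which
matches the remark that the byte-wise view is ugly rather than harmful.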

Assuming that UTF-8 is the preferred character encoding, where should this
apply? Only in help texts? Also in comments and in menu prompts? How about
expansion variables? Default values? Symbol names? (the latter would force
the C preprocessor to use that character set, which will probably not happen)

Anyway, I think it would help to have a clear specification (i.e. a
documented decision), no matter if with or without UTF-8.

Regards,
Martin Walch
-- 



* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-06-05  1:53 ` [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Martin Walch
@ 2014-06-06 13:16   ` Sam Ravnborg
  0 siblings, 0 replies; 11+ messages in thread
From: Sam Ravnborg @ 2014-06-06 13:16 UTC (permalink / raw)
  To: Martin Walch; +Cc: Brian Norris, linux-kbuild, Yann E. MORIN, Artem Bityutskiy

> 
> Assuming that UTF-8 is the preferred character encoding, where should this
> apply? Only in help texts? Also in comments and in menu prompts?

UTF-8 should be supported in all display texts.
So help + prompts.

The Kconfig type "string" is not UTF-8 enabled.

Any help in auditing Kconfig for this - and testing too - would be appreciated.

	Sam


* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-06-04  7:52 [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Brian Norris
                   ` (2 preceding siblings ...)
  2014-06-05  1:53 ` [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Martin Walch
@ 2014-06-06 13:17 ` Sam Ravnborg
  2014-07-10  8:52 ` Brian Norris
  4 siblings, 0 replies; 11+ messages in thread
From: Sam Ravnborg @ 2014-06-06 13:17 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-kbuild, Yann E. MORIN, Artem Bityutskiy

On Wed, Jun 04, 2014 at 12:52:29AM -0700, Brian Norris wrote:
> Hi,
> 
> The first patch is trivial.
> 
> The second is inspired by a long-standing bugzilla entry:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=43067
> 
> The MTD_NAND_CAFE Kconfig symbol (drivers/mtd/nand/Kconfig) has description
> text which uses a multi-byte UTF-8 character: the 'É' in 'CAFÉ'. This
> character (and other similar >8bit UTF-8 characters) is not handled
> correctly by many of the kernel configuration tools (notably 'make nconfig'
> and 'make xconfig'). nconfig was especially broken, as it would completely
> drop any menu entry which had non-ASCII characters, as well as ALL
> subsequent entries in the same window (!!).
Thanks for fixing this.

	Sam


* Re: [PATCH 2/2] kconfig: nconfig: fix multi-byte UTF handling
  2014-06-04  7:52 ` [PATCH 2/2] kconfig: nconfig: fix multi-byte UTF handling Brian Norris
@ 2014-06-06 13:18   ` Sam Ravnborg
  0 siblings, 0 replies; 11+ messages in thread
From: Sam Ravnborg @ 2014-06-06 13:18 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-kbuild, Artem Bityutskiy, Yann E. MORIN, Martin Walch

On Wed, Jun 04, 2014 at 12:52:31AM -0700, Brian Norris wrote:
> Currently, Kconfig descriptions that use multi-byte UTF-8 characters
> (such as MTD_NAND_CAFE) will have their menu entries dropped from the
> 'make nconfig' ncurses menu, and all subsequent entries in the same
> window will be omitted. This seems to be due to the ncurses 'menu'
> library, which does not traditionally handle UTF-8 >8-bit characters
> properly.
> 
> The ncursesw library ('w' is for "wide") is written to handle these
> UTF-8 characters, and is practically a drop-in replacement at the source
> level. Use it by default, if available.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=43067
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: linux-kbuild@vger.kernel.org
> Cc: Martin Walch <walch.martin@web.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>


* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-06-04  7:52 [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig Brian Norris
                   ` (3 preceding siblings ...)
  2014-06-06 13:17 ` Sam Ravnborg
@ 2014-07-10  8:52 ` Brian Norris
  2014-08-20 16:40   ` Brian Norris
  4 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2014-07-10  8:52 UTC (permalink / raw)
  To: linux-kbuild; +Cc: Brian Norris, Yann E. MORIN, Artem Bityutskiy, Sam Ravnborg

Hi Yann,

Ping? The patches are simple.



* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-07-10  8:52 ` Brian Norris
@ 2014-08-20 16:40   ` Brian Norris
  2014-08-22 11:02     ` Michal Marek
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2014-08-20 16:40 UTC (permalink / raw)
  To: Yann E. MORIN; +Cc: Artem Bityutskiy, Sam Ravnborg, linux-kbuild

Hi Yann,

On Thu, Jul 10, 2014 at 01:52:29AM -0700, Brian Norris wrote:
> Ping? The patches are simple.

Ping.



* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-08-20 16:40   ` Brian Norris
@ 2014-08-22 11:02     ` Michal Marek
  2014-08-24  5:17       ` Brian Norris
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Marek @ 2014-08-22 11:02 UTC (permalink / raw)
  To: Brian Norris, Yann E. MORIN; +Cc: Artem Bityutskiy, Sam Ravnborg, linux-kbuild

On 20.8.2014 18:40, Brian Norris wrote:
> Hi Yann,
> 
> On Thu, Jul 10, 2014 at 01:52:29AM -0700, Brian Norris wrote:
>> Ping? The patches are simple.
> 
> Ping.

I applied the patches to my kbuild tree.

Michal



* Re: [PATCH 0/2] kconfig: fix multi-byte UTF handling in nconfig
  2014-08-22 11:02     ` Michal Marek
@ 2014-08-24  5:17       ` Brian Norris
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Norris @ 2014-08-24  5:17 UTC (permalink / raw)
  To: Michal Marek; +Cc: Yann E. MORIN, Artem Bityutskiy, Sam Ravnborg, linux-kbuild

On Fri, Aug 22, 2014 at 01:02:40PM +0200, Michal Marek wrote:
> On 20.8.2014 18:40, Brian Norris wrote:
> > Hi Yann,
> > 
> > On Thu, Jul 10, 2014 at 01:52:29AM -0700, Brian Norris wrote:
> >> Ping? The patches are simple.
> > 
> > Ping.
> 
> I applied the patches to my kbuild tree.

Great, thanks!

Brian

