From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f68.google.com ([74.125.83.68]:37000 "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752607AbdK1Qa5 (ORCPT ); Tue, 28 Nov 2017 11:30:57 -0500 Received: by mail-pg0-f68.google.com with SMTP id y6so149550pgp.4 for ; Tue, 28 Nov 2017 08:30:56 -0800 (PST) Message-ID: <1511886654.2822.6.camel@dubeyko.com> Subject: Re: [PATCH] hfsplus: fix decomposition of Hangul characters From: Viacheslav Dubeyko To: "Ernesto A." =?ISO-8859-1?Q?Fern=E1ndez?= Cc: linux-fsdevel@vger.kernel.org, tchou , htl10@users.sourceforge.net, Alexander Viro , Andrew Morton Date: Tue, 28 Nov 2017 08:30:54 -0800 In-Reply-To: <20171128150234.GA1268@debian.home> References: <1510906805-2142-1-git-send-email-tchou@synology.com> <20171119005704.GA3495@debian.home> <080024a85dc413b72c181c6e75bdc736@synology.com> <20171123113230.GA5581@debian.home> <1028889c01cdd513c2cf8cabf6c1762e@synology.com> <20171124114525.GA3265@debian.home> <510d6873cbcd4d93f0902c62e3a2c22d@synology.com> <20171127193652.GA3690@debian.home> <1511822453.2830.7.camel@slavad-ubuntu-14.04> <20171128150234.GA1268@debian.home> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote: > On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote: > > > > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > > > > > > It would be good to have some feedback on the use of code under > > > the > > > Unicode License, I don't know if that can be a problem. > > > > > > To Ting-Chang Hou: This patch is slightly different from the one > > > you > > > tested with your mac. You are still tagged as Tested-by, but you > > > may > > > want to check again that I have not broken anything. > > > > > > Thanks, > > > Ernest > > > > > > -- >8 -- > > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > > > > > Files created under macOS become essentially unusable under linux > > > if > > > their names contain any Hangul, and vice versa. > > > > > > This happens because the normalization of Hangul characters is a > > > special > > > case: it can be done algorithmically, without need of a table. > > > The > > > hfsplus module deals with Hangul correctly when composing, but > > > completely forgets about it when performing a decomposition. > > > > > > Solve this by using the Hangul decomposition function provided in > > > the > > > Unicode Standard. It's under the Unicode License, compatible with > > > the > > > GPL. > > > > > I believe it makes sense to share more details about algorithm > > details. > > I think that the Unicode standard contains some explanation how to > > decompose correctly. It will be great to have some brief citation > > in the > > patch description. > The algorithm is very simple, the best way to understand it is just > looking at the code. I don't know the first thing about Korean > writing, so > I don't think I should attempt to explain why the decomposition is > done > this way. If somebody else is interested in the details, they can > follow > the citation in the header comment of the decompose_hangul function. > You should share the link where it can be found. Finally, the comment needs in more detailed explanation. > > > > > > > > This patch will cause trouble for Hangul filenames already > > > created by > > > the module in the past. This shouldn't be a special concern > > > because the > > > main purpose of the module was always sharing with macOS. If a > > > user > > > actually needs to access such a file the nodecompose mount option > > > should > > > be enough. > > > > > > Reported-by: Ting-Chang Hou > > > Tested-by: Ting-Chang Hou > > > Signed-off-by: Ernesto A. Fernández > > com> > > > --- > > >  fs/hfsplus/unicode.c | 62 > > > +++++++++++++++++++++++++++++++++++++++++++++++----- > > >  1 file changed, 56 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > > index dfa90c2..7ceae91 100644 > > > --- a/fs/hfsplus/unicode.c > > > +++ b/fs/hfsplus/unicode.c > > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct > > > super_block *sb, const char *astr, int len, > > >   return size; > > >  } > > >   > > > -/* Decomposes a single unicode character. */ > > > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > > > +/* Decomposes a non-Hangul unicode character. */ > > > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > > >  { > > >   int off; > > >   > > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t > > > uc, int *size) > > >   return hfsplus_decompose_table + (off / 4); > > >  } > > >   > > > +/* > > > + * Decomposes a Hangul unicode character. > > > + * > > > + * This function was adapted from sample code in section 3.12 of > > > the > > > + * Unicode Standard, version 10.0. > > > + * > > > + * Copyright (C) 1991-2017 Unicode, Inc.  All rights > > > reserved.  Distributed > > > + * under the Terms of Use in http://www.unicode.org/copyright.ht > > > ml. > > > + */ > > > +static int decompose_hangul(wchar_t uc, u16 *result) > > > +{ > > > + int index; > > > + int l, v, t; > > > + > > > + index = uc - Hangul_SBase; > > > + if (index < 0 || index >= Hangul_SCount) > > > + return 0; > > > + > > > + l = Hangul_LBase + index / Hangul_NCount; > > > + v = Hangul_VBase + (index % Hangul_NCount) / > > > Hangul_TCount; > > > + t = Hangul_TBase + index % Hangul_TCount; > > > + > > > + result[0] = l; > > > + result[1] = v; > > > + if (t != Hangul_TBase) { > > > + result[2] = t; > > > + return 3; > > > + } > > > + return 2; > > > +} > > > + > > > +/* Decomposes a single unicode character. */ > > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 > > > *hangul_buffer) > > > +{ > > > + u16 *result; > > > + > > > + /* Hangul is handled separately */ > > > + result = hangul_buffer; > > > + *size = decompose_hangul(uc, result); > > Is it possible to distinguish the different cases beforehand and to > > select one proper function for execution without the execution both > > ones > > in the worst case? > The decompose_hangul and decompose_nonhangul functions are called > only once, > so I would expect the compiler to inline them both. The result should > be > similar to your suggestion. > It will be much better to have: If (hangful) {     } else {     } Currently, the code looks weird. Especially, decompose_hangul() is able to distinguish two different cases. So, it's easy to extract the check. > > > > > > > > + if (*size == 0) > > > + /* not Hangul */ > > > + result = decompose_nonhangul(uc, size); > > I think it makes sense to use the brackets here. If I remember > > correctly, this is the code style's requirement. > I couldn't find such a requirement, but I did find some other places > within > the hfsplus code where the comments are done my way. For example, in > the > hfsplus_cat_write_inode() function of the inode.c file. > The hfsplus code contains many awful places. It doesn't mean that it needs to follow the bad style examples. If you have only one string: if (condition)     then it's OK not to use the brackets. Otherwise, you should include the branch into the brackets. Thanks, Vyacheslav Dubeyko.