linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Viacheslav Dubeyko <slava@dubeyko.com>
To: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, tchou <tchou@synology.com>,
	htl10@users.sourceforge.net,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] hfsplus: fix decomposition of Hangul characters
Date: Tue, 28 Nov 2017 08:30:54 -0800	[thread overview]
Message-ID: <1511886654.2822.6.camel@dubeyko.com> (raw)
In-Reply-To: <20171128150234.GA1268@debian.home>

On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote:
> On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote:
> > 
> > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote:
> > > 
> > > It would be good to have some feedback on the use of code under
> > > the
> > > Unicode License, I don't know if that can be a problem.
> > > 
> > > To Ting-Chang Hou: This patch is slightly different from the one
> > > you
> > > tested with your mac. You are still tagged as Tested-by, but you
> > > may
> > > want to check again that I have not broken anything.
> > > 
> > > Thanks,
> > > Ernest
> > > 
> > > -- >8 --
> > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters
> > > 
> > > Files created under macOS become essentially unusable under linux
> > > if
> > > their names contain any Hangul, and vice versa.
> > > 
> > > This happens because the normalization of Hangul characters is a
> > > special
> > > case: it can be done algorithmically, without need of a table.
> > > The
> > > hfsplus module deals with Hangul correctly when composing, but
> > > completely forgets about it when performing a decomposition.
> > > 
> > > Solve this by using the Hangul decomposition function provided in
> > > the
> > > Unicode Standard. It's under the Unicode License, compatible with
> > > the
> > > GPL.
> > > 
> > I believe it makes sense to share more details about algorithm
> > details.
> > I think that the Unicode standard contains some explanation how to
> > decompose correctly. It will be great to have some brief citation
> > in the
> > patch description.
> The algorithm is very simple, the best way to understand it is just
> looking at the code. I don't know the first thing about Korean
> writing, so
> I don't think I should attempt to explain why the decomposition is
> done
> this way. If somebody else is interested in the details, they can
> follow
> the citation in the header comment of the decompose_hangul function.
> 

You should share the link where it can be found. Finally, the comment
needs in more detailed explanation.


> > 
> > > 
> > > This patch will cause trouble for Hangul filenames already
> > > created by
> > > the module in the past. This shouldn't be a special concern
> > > because the
> > > main purpose of the module was always sharing with macOS. If a
> > > user
> > > actually needs to access such a file the nodecompose mount option
> > > should
> > > be enough.
> > > 
> > > Reported-by: Ting-Chang Hou <tchou@synology.com>
> > > Tested-by: Ting-Chang Hou <tchou@synology.com>
> > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.
> > > com>
> > > ---
> > >  fs/hfsplus/unicode.c | 62
> > > +++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 56 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> > > index dfa90c2..7ceae91 100644
> > > --- a/fs/hfsplus/unicode.c
> > > +++ b/fs/hfsplus/unicode.c
> > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct
> > > super_block *sb, const char *astr, int len,
> > >  	return size;
> > >  }
> > >  
> > > -/* Decomposes a single unicode character. */
> > > -static inline u16 *decompose_unichar(wchar_t uc, int *size)
> > > +/* Decomposes a non-Hangul unicode character. */
> > > +static u16 *decompose_nonhangul(wchar_t uc, int *size)
> > >  {
> > >  	int off;
> > >  
> > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t
> > > uc, int *size)
> > >  	return hfsplus_decompose_table + (off / 4);
> > >  }
> > >  
> > > +/*
> > > + * Decomposes a Hangul unicode character.
> > > + *
> > > + * This function was adapted from sample code in section 3.12 of
> > > the
> > > + * Unicode Standard, version 10.0.
> > > + *
> > > + * Copyright (C) 1991-2017 Unicode, Inc.  All rights
> > > reserved.  Distributed
> > > + * under the Terms of Use in http://www.unicode.org/copyright.ht
> > > ml.
> > > + */
> > > +static int decompose_hangul(wchar_t uc, u16 *result)
> > > +{
> > > +	int index;
> > > +	int l, v, t;
> > > +
> > > +	index = uc - Hangul_SBase;
> > > +	if (index < 0 || index >= Hangul_SCount)
> > > +		return 0;
> > > +
> > > +	l = Hangul_LBase + index / Hangul_NCount;
> > > +	v = Hangul_VBase + (index % Hangul_NCount) /
> > > Hangul_TCount;
> > > +	t = Hangul_TBase + index % Hangul_TCount;
> > > +
> > > +	result[0] = l;
> > > +	result[1] = v;
> > > +	if (t != Hangul_TBase) {
> > > +		result[2] = t;
> > > +		return 3;
> > > +	}
> > > +	return 2;
> > > +}
> > > +
> > > +/* Decomposes a single unicode character. */
> > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16
> > > *hangul_buffer)
> > > +{
> > > +	u16 *result;
> > > +
> > > +	/* Hangul is handled separately */
> > > +	result = hangul_buffer;
> > > +	*size = decompose_hangul(uc, result);
> > Is it possible to distinguish the different cases beforehand and to
> > select one proper function for execution without the execution both
> > ones
> > in the worst case?
> The decompose_hangul and decompose_nonhangul functions are called
> only once,
> so I would expect the compiler to inline them both. The result should
> be
> similar to your suggestion.
> 


It will be much better to have:

If (hangful) {
    <decompose_hangul>
} else {
    <decompose_nonhangul>
}

Currently, the code looks weird. Especially, decompose_hangul() is able
to distinguish two different cases. So, it's easy to extract the check.

> > 
> > > 
> > > +	if (*size == 0)
> > > +		/* not Hangul */
> > > +		result = decompose_nonhangul(uc, size);
> > I think it makes sense to use the brackets here. If I remember
> > correctly, this is the code style's requirement.
> I couldn't find such a requirement, but I did find some other places
> within
> the hfsplus code where the comments are done my way. For example, in
> the
> hfsplus_cat_write_inode() function of the inode.c file.
> 


The hfsplus code contains many awful places. It doesn't mean that it
needs to follow the bad style examples. If you have only one string:

if (condition)
    <do_something>

then it's OK not to use the brackets. Otherwise, you should include the
branch into the brackets.

Thanks,
Vyacheslav Dubeyko.

  reply	other threads:[~2017-11-28 16:30 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-17  8:20 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Ting-Chang Hou
2017-11-19  0:57 ` Ernesto A. Fernández
2017-11-23  3:57   ` tchou
2017-11-23  4:21     ` Viacheslav Dubeyko
2017-11-23  6:05       ` tchou
2017-11-23  6:23         ` Viacheslav Dubeyko
2017-11-23  6:34           ` tchou
2017-11-23 11:32     ` Ernesto A. Fernández
2017-11-23 18:36       ` Viacheslav Dubeyko
2017-11-23 22:20         ` Ernesto A. Fernández
2017-11-24  7:25       ` tchou
2017-11-24 11:45         ` Ernesto A. Fernández
2017-11-27  2:07           ` tchou
2017-11-27 19:36             ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández
2017-11-27 22:40               ` Viacheslav Dubeyko
2017-11-28 15:02                 ` Ernesto A. Fernández
2017-11-28 16:30                   ` Viacheslav Dubeyko [this message]
2017-11-28 18:15                     ` Ernesto A. Fernández
2018-08-23 18:29               ` Ernesto A. Fernández
2018-08-24  1:20                 ` tchou
     [not found] <1999104027.7153972.1511885718712.ref@mail.yahoo.com>
2017-11-28 16:15 ` Hin-Tak Leung
2017-11-28 17:50   ` Viacheslav Dubeyko
2017-11-28 18:51   ` Ernesto A. Fernández
     [not found] <114108759.7167494.1511886580033.ref@mail.yahoo.com>
2017-11-28 16:29 ` Hin-Tak Leung
     [not found] <1047295440.7535049.1511905720846.ref@mail.yahoo.com>
2017-11-28 21:48 ` Hin-Tak Leung
2017-11-28 23:10   ` Ernesto A. Fernández
     [not found] <1691800505.7838608.1511945866988.ref@mail.yahoo.com>
2017-11-29  8:57 ` Hin-Tak Leung
2017-11-29 14:42   ` Ernesto A. Fernández
2017-11-29 16:28     ` Viacheslav Dubeyko
2017-11-30  4:03       ` Ernesto A. Fernández
     [not found] <748914583.7912009.1511948801051.ref@mail.yahoo.com>
2017-11-29  9:46 ` Hin-Tak Leung
2018-07-17 22:09 Ernesto A. Fernández

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1511886654.2822.6.camel@dubeyko.com \
    --to=slava@dubeyko.com \
    --cc=akpm@linux-foundation.org \
    --cc=ernesto.mnd.fernandez@gmail.com \
    --cc=htl10@users.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tchou@synology.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).