From: Viacheslav Dubeyko <slava@dubeyko.com>
To: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, tchou <tchou@synology.com>,
htl10@users.sourceforge.net,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] hfsplus: fix decomposition of Hangul characters
Date: Tue, 28 Nov 2017 08:30:54 -0800 [thread overview]
Message-ID: <1511886654.2822.6.camel@dubeyko.com> (raw)
In-Reply-To: <20171128150234.GA1268@debian.home>
On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote:
> On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote:
> >
> > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote:
> > >
> > > It would be good to have some feedback on the use of code under
> > > the
> > > Unicode License, I don't know if that can be a problem.
> > >
> > > To Ting-Chang Hou: This patch is slightly different from the one
> > > you
> > > tested with your mac. You are still tagged as Tested-by, but you
> > > may
> > > want to check again that I have not broken anything.
> > >
> > > Thanks,
> > > Ernest
> > >
> > > -- >8 --
> > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters
> > >
> > > Files created under macOS become essentially unusable under linux
> > > if
> > > their names contain any Hangul, and vice versa.
> > >
> > > This happens because the normalization of Hangul characters is a
> > > special
> > > case: it can be done algorithmically, without need of a table.
> > > The
> > > hfsplus module deals with Hangul correctly when composing, but
> > > completely forgets about it when performing a decomposition.
> > >
> > > Solve this by using the Hangul decomposition function provided in
> > > the
> > > Unicode Standard. It's under the Unicode License, compatible with
> > > the
> > > GPL.
> > >
> > I believe it makes sense to share more details about algorithm
> > details.
> > I think that the Unicode standard contains some explanation how to
> > decompose correctly. It will be great to have some brief citation
> > in the
> > patch description.
> The algorithm is very simple, the best way to understand it is just
> looking at the code. I don't know the first thing about Korean
> writing, so
> I don't think I should attempt to explain why the decomposition is
> done
> this way. If somebody else is interested in the details, they can
> follow
> the citation in the header comment of the decompose_hangul function.
>
You should share the link where it can be found. Finally, the comment
needs in more detailed explanation.
> >
> > >
> > > This patch will cause trouble for Hangul filenames already
> > > created by
> > > the module in the past. This shouldn't be a special concern
> > > because the
> > > main purpose of the module was always sharing with macOS. If a
> > > user
> > > actually needs to access such a file the nodecompose mount option
> > > should
> > > be enough.
> > >
> > > Reported-by: Ting-Chang Hou <tchou@synology.com>
> > > Tested-by: Ting-Chang Hou <tchou@synology.com>
> > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.
> > > com>
> > > ---
> > > fs/hfsplus/unicode.c | 62
> > > +++++++++++++++++++++++++++++++++++++++++++++++-----
> > > 1 file changed, 56 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> > > index dfa90c2..7ceae91 100644
> > > --- a/fs/hfsplus/unicode.c
> > > +++ b/fs/hfsplus/unicode.c
> > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct
> > > super_block *sb, const char *astr, int len,
> > > return size;
> > > }
> > >
> > > -/* Decomposes a single unicode character. */
> > > -static inline u16 *decompose_unichar(wchar_t uc, int *size)
> > > +/* Decomposes a non-Hangul unicode character. */
> > > +static u16 *decompose_nonhangul(wchar_t uc, int *size)
> > > {
> > > int off;
> > >
> > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t
> > > uc, int *size)
> > > return hfsplus_decompose_table + (off / 4);
> > > }
> > >
> > > +/*
> > > + * Decomposes a Hangul unicode character.
> > > + *
> > > + * This function was adapted from sample code in section 3.12 of
> > > the
> > > + * Unicode Standard, version 10.0.
> > > + *
> > > + * Copyright (C) 1991-2017 Unicode, Inc. All rights
> > > reserved. Distributed
> > > + * under the Terms of Use in http://www.unicode.org/copyright.ht
> > > ml.
> > > + */
> > > +static int decompose_hangul(wchar_t uc, u16 *result)
> > > +{
> > > + int index;
> > > + int l, v, t;
> > > +
> > > + index = uc - Hangul_SBase;
> > > + if (index < 0 || index >= Hangul_SCount)
> > > + return 0;
> > > +
> > > + l = Hangul_LBase + index / Hangul_NCount;
> > > + v = Hangul_VBase + (index % Hangul_NCount) /
> > > Hangul_TCount;
> > > + t = Hangul_TBase + index % Hangul_TCount;
> > > +
> > > + result[0] = l;
> > > + result[1] = v;
> > > + if (t != Hangul_TBase) {
> > > + result[2] = t;
> > > + return 3;
> > > + }
> > > + return 2;
> > > +}
> > > +
> > > +/* Decomposes a single unicode character. */
> > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16
> > > *hangul_buffer)
> > > +{
> > > + u16 *result;
> > > +
> > > + /* Hangul is handled separately */
> > > + result = hangul_buffer;
> > > + *size = decompose_hangul(uc, result);
> > Is it possible to distinguish the different cases beforehand and to
> > select one proper function for execution without the execution both
> > ones
> > in the worst case?
> The decompose_hangul and decompose_nonhangul functions are called
> only once,
> so I would expect the compiler to inline them both. The result should
> be
> similar to your suggestion.
>
It will be much better to have:
If (hangful) {
<decompose_hangul>
} else {
<decompose_nonhangul>
}
Currently, the code looks weird. Especially, decompose_hangul() is able
to distinguish two different cases. So, it's easy to extract the check.
> >
> > >
> > > + if (*size == 0)
> > > + /* not Hangul */
> > > + result = decompose_nonhangul(uc, size);
> > I think it makes sense to use the brackets here. If I remember
> > correctly, this is the code style's requirement.
> I couldn't find such a requirement, but I did find some other places
> within
> the hfsplus code where the comments are done my way. For example, in
> the
> hfsplus_cat_write_inode() function of the inode.c file.
>
The hfsplus code contains many awful places. It doesn't mean that it
needs to follow the bad style examples. If you have only one string:
if (condition)
<do_something>
then it's OK not to use the brackets. Otherwise, you should include the
branch into the brackets.
Thanks,
Vyacheslav Dubeyko.
next prev parent reply other threads:[~2017-11-28 16:30 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-17 8:20 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Ting-Chang Hou
2017-11-19 0:57 ` Ernesto A. Fernández
2017-11-23 3:57 ` tchou
2017-11-23 4:21 ` Viacheslav Dubeyko
2017-11-23 6:05 ` tchou
2017-11-23 6:23 ` Viacheslav Dubeyko
2017-11-23 6:34 ` tchou
2017-11-23 11:32 ` Ernesto A. Fernández
2017-11-23 18:36 ` Viacheslav Dubeyko
2017-11-23 22:20 ` Ernesto A. Fernández
2017-11-24 7:25 ` tchou
2017-11-24 11:45 ` Ernesto A. Fernández
2017-11-27 2:07 ` tchou
2017-11-27 19:36 ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández
2017-11-27 22:40 ` Viacheslav Dubeyko
2017-11-28 15:02 ` Ernesto A. Fernández
2017-11-28 16:30 ` Viacheslav Dubeyko [this message]
2017-11-28 18:15 ` Ernesto A. Fernández
2018-08-23 18:29 ` Ernesto A. Fernández
2018-08-24 1:20 ` tchou
[not found] <1999104027.7153972.1511885718712.ref@mail.yahoo.com>
2017-11-28 16:15 ` Hin-Tak Leung
2017-11-28 17:50 ` Viacheslav Dubeyko
2017-11-28 18:51 ` Ernesto A. Fernández
[not found] <114108759.7167494.1511886580033.ref@mail.yahoo.com>
2017-11-28 16:29 ` Hin-Tak Leung
[not found] <1047295440.7535049.1511905720846.ref@mail.yahoo.com>
2017-11-28 21:48 ` Hin-Tak Leung
2017-11-28 23:10 ` Ernesto A. Fernández
[not found] <1691800505.7838608.1511945866988.ref@mail.yahoo.com>
2017-11-29 8:57 ` Hin-Tak Leung
2017-11-29 14:42 ` Ernesto A. Fernández
2017-11-29 16:28 ` Viacheslav Dubeyko
2017-11-30 4:03 ` Ernesto A. Fernández
[not found] <748914583.7912009.1511948801051.ref@mail.yahoo.com>
2017-11-29 9:46 ` Hin-Tak Leung
2018-07-17 22:09 Ernesto A. Fernández
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1511886654.2822.6.camel@dubeyko.com \
--to=slava@dubeyko.com \
--cc=akpm@linux-foundation.org \
--cc=ernesto.mnd.fernandez@gmail.com \
--cc=htl10@users.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tchou@synology.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).