From: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
To: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>,
	tchou <tchou@synology.com>,
	linux-fsdevel@vger.kernel.org,
	linux-fsdevel-owner@vger.kernel.org, htl10@users.sourceforge.net
Subject: Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name
Date: Thu, 23 Nov 2017 19:20:11 -0300
Message-ID: <20171123222009.GA1269@debian.home>
In-Reply-To: <1511462197.2541.24.camel@dubeyko.com>

On Thu, Nov 23, 2017 at 10:36:37AM -0800, Viacheslav Dubeyko wrote:
> On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote:
> > Hi:
> > 
> > your issue seems to be in the decomposition of hangul characters, not in
> > the recomposition before printing. The hfsplus module on linux is saving
> > the name of your actor as AC F5 C7 20, without performing any
> > decomposition at all.
> > 
> > The reason your patch hides the bug is because it causes linux to present
> > filenames as decomposed utf8, so it is not necessary to decompose again
> > before working with them. But the issue is still there, and you will most
> > likely run into trouble if you make a hangul filename in linux and try
> > to work with it in MacOS.
> > 
> > Reviewing the code it would seem that the developers completely forgot
> > the hangul characters had their own rules for decomposition. It's weird
> > because they did the composition part correctly.
> > 
> > I've made a quick draft of a patch, mostly by copying the code provided
> > in the unicode web. I don't think we can actually use it on a 
> 
> 
> Could you please share the link for "the unicode web"?
> 
> Thanks,
> Vyacheslav Dubeyko.

To be clear, I'm not asking for review yet. I only need someone to test
the draft, because I don't have a Mac. With that understood, the code
comes from the latest version of the Unicode Standard:

www.unicode.org/versions/Unicode10.0.0/

You want section 3.12, "Conjoining Jamo Behavior".

> 
> 
> > release, but it should be enough to check if I'm right. It works fine on
> > linux, but I don't have a mac, so it would be great if you could test it
> > for me.
> > 
> > Thanks,
> > Ernest
> > 
> > (By the way, there is no reason you should have to use the nodecompose
> > mount option, as the other reviewer suggested. Using that option will
> > have a similar effect to that of your patch. It will hide the problem,
> > but if you create a hangul filename on linux with that option you
> > probably won't be able to use it on a mac.)
> > 
> > ---
> > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> > index dfa90c2..9006c61 100644
> > --- a/fs/hfsplus/unicode.c
> > +++ b/fs/hfsplus/unicode.c
> > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block
> > *sb, const char *astr, int len,
> >  	return size;
> >  }
> >  
> > -/* Decomposes a single unicode character. */
> > +/* Decomposes a single non-Hangul unicode character. */
> >  static inline u16 *decompose_unichar(wchar_t uc, int *size)
> >  {
> >  	int off;
> > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc,
> > int *size)
> >  	return hfsplus_decompose_table + (off / 4);
> >  }
> >  
> > +/* Decomposes a Hangul unicode character. */
> > +int decompose_hangul(wchar_t uc, u16 *result)
> > +{
> > +	int index;
> > +	int l, v, t;
> > +
> > +	index = uc - Hangul_SBase;
> > +	if (index < 0 || index >= Hangul_SCount)
> > +		return 0;
> > +
> > +	l = Hangul_LBase + index / Hangul_NCount;
> > +	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
> > +	t = Hangul_TBase + index % Hangul_TCount;
> > +
> > +	result[0] = l;
> > +	result[1] = v;
> > +	if (t != Hangul_TBase) {
> > +		result[2] = t;
> > +		return 3;
> > +	}
> > +	return 2;
> > +}
> > +
> >  int hfsplus_asc2uni(struct super_block *sb,
> >  		    struct hfsplus_unistr *ustr, int max_unistr_len,
> >  		    const char *astr, int len)
> > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb,
> >  	int size, dsize, decompose;
> >  	u16 *dstr, outlen = 0;
> >  	wchar_t c;
> > +	u16 hangul_buf[3];
> >  
> >  	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
> >  	while (outlen < max_unistr_len && len > 0) {
> >  		size = asc2unichar(sb, astr, len, &c);
> >  
> > -		if (decompose)
> > -			dstr = decompose_unichar(c, &dsize);
> > -		else
> > +		if (decompose) {
> > +			/* Hangul is handled separately */
> > +			dstr = &hangul_buf[0];
> > +			dsize = decompose_hangul(c, dstr);
> > +			if (dsize == 0)
> > +				/* not Hangul */
> > +				dstr = decompose_unichar(c, &dsize);
> > +		} else {
> >  			dstr = NULL;
> > +		}
> > +
> >  		if (dstr) {
> >  			if (outlen + dsize > max_unistr_len)
> >  				break;
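
For anyone who wants to check the arithmetic without building a kernel,
here is a minimal user-space sketch of the same Hangul decomposition. It
is not the patch itself: the function name hangul_decompose and the
constant names are hypothetical, chosen for this example only. The
constant values are the ones given in section 3.12 of the Unicode
Standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constant values from the Unicode Standard, section 3.12. */
enum {
	S_BASE = 0xAC00,		/* first precomposed syllable */
	L_BASE = 0x1100,		/* first leading consonant */
	V_BASE = 0x1161,		/* first vowel */
	T_BASE = 0x11A7,		/* one before the first trailing consonant */
	L_COUNT = 19,
	V_COUNT = 21,
	T_COUNT = 28,
	N_COUNT = V_COUNT * T_COUNT,	/* 588 */
	S_COUNT = L_COUNT * N_COUNT	/* 11172 */
};

/*
 * Hypothetical helper, named for this example only.
 * Decomposes one precomposed Hangul syllable into its jamo.
 * Returns the number of UTF-16 units written to out (2 or 3),
 * or 0 if cp is not a precomposed Hangul syllable.
 */
static int hangul_decompose(uint32_t cp, uint16_t out[3])
{
	int index, t;

	assert(out != NULL);
	if (cp < S_BASE || cp >= S_BASE + S_COUNT)
		return 0;

	index = (int)(cp - S_BASE);
	out[0] = (uint16_t)(L_BASE + index / N_COUNT);
	out[1] = (uint16_t)(V_BASE + (index % N_COUNT) / T_COUNT);

	/* A trailing index of 0 means the syllable has no final consonant. */
	t = index % T_COUNT;
	if (t == 0)
		return 2;
	out[2] = (uint16_t)(T_BASE + t);
	return 3;
}
```

For the name discussed in this thread, U+ACF5 U+C720, the sketch gives
1100 1169 11BC followed by 110B 1172. If the diagnosis above is right,
that is the sequence a fixed hfsplus module should end up storing.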
