* [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name @ 2017-11-17 8:20 Ting-Chang Hou 2017-11-19 0:57 ` Ernesto A. Fernández 0 siblings, 1 reply; 21+ messages in thread From: Ting-Chang Hou @ 2017-11-17 8:20 UTC (permalink / raw) To: linux-fsdevel; +Cc: Ting-Chang Hou The unicode of hangul from macOS is decomposed. There has a bug that mistake decomposed unicode for composed when change unicode to ascii, so it cannot recognize the hangul correctly. Signed-off-by: Ting-Chang Hou <tchou@synology.com> --- fs/hfsplus/unicode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c index dfa90c2..2daf7b0 100644 --- a/fs/hfsplus/unicode.c +++ b/fs/hfsplus/unicode.c @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, ustrlen = be16_to_cpu(ustr->length); len = *len_p; ce1 = NULL; - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); while (ustrlen > 0) { c0 = be16_to_cpu(*ip++); -- 2.7.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-17 8:20 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Ting-Chang Hou @ 2017-11-19 0:57 ` Ernesto A. Fernández 2017-11-23 3:57 ` tchou 0 siblings, 1 reply; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-19 0:57 UTC (permalink / raw) To: Ting-Chang Hou; +Cc: linux-fsdevel, Ernesto A. Fernández On Fri, Nov 17, 2017 at 04:20:05PM +0800, Ting-Chang Hou wrote: > The unicode of hangul from macOS is decomposed. There has a bug that > mistake decomposed unicode for composed when change unicode to ascii, > so it cannot recognize the hangul correctly. > > Signed-off-by: Ting-Chang Hou <tchou@synology.com> > --- > fs/hfsplus/unicode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > index dfa90c2..2daf7b0 100644 > --- a/fs/hfsplus/unicode.c > +++ b/fs/hfsplus/unicode.c > @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, > ustrlen = be16_to_cpu(ustr->length); > len = *len_p; > ce1 = NULL; > - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); I'm not sure this is a mistake. The developers probably wanted the filenames to be recomposed before being presented in utf8. With your patch, if you try the following (with the default mount options): touch Á ls | hexdump -C the utf8 output filename will be using the combining accent (CC 81) instead of the Á character (C3 81). This is a bit annoying because it won't print correctly in my terminal anymore. What is it exactly that you are trying to fix? You mention an issue with hangul characters, but I failed to trigger it. Could you expand on that? > while (ustrlen > 0) { > c0 = be16_to_cpu(*ip++); > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-19 0:57 ` Ernesto A. Fernández @ 2017-11-23 3:57 ` tchou 2017-11-23 4:21 ` Viacheslav Dubeyko 2017-11-23 11:32 ` Ernesto A. Fernández 0 siblings, 2 replies; 21+ messages in thread From: tchou @ 2017-11-23 3:57 UTC (permalink / raw) To: Ernesto A. Fernández Cc: linux-fsdevel, linux-fsdevel-owner, slava, htl10 > Could you please share the examples of incorrect and correct > behaviour? > What is it exactly that you are trying to fix? You mention an issue > with > hangul characters, but I failed to trigger it. Could you expand on > that? > Hi all, There is an example.I use Mac mini to format my usb disk to HFS+ and touch the file with filename "공유"(a Korean actor, https://goo.gl/VcBsrn) on it. After it, I mount the usb disk on my ubuntu(Linux 4.14.0+) and get the following error message when trying to ls and cp the file: ls: cannot access 공유: No such file or directory cp: cannot stat ‘공유’: No such file or directory It seem's a problem for a long time(https://goo.gl/LiWGe5). After applying my patch, I can ls and cp the file correctly. > On Fri, Nov 17, 2017 at 04:20:05PM +0800, Ting-Chang Hou wrote: >> The unicode of hangul from macOS is decomposed. There has a bug that >> mistake decomposed unicode for composed when change unicode to ascii, >> so it cannot recognize the hangul correctly. >> >> Signed-off-by: Ting-Chang Hou <tchou@synology.com> >> --- >> fs/hfsplus/unicode.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c >> index dfa90c2..2daf7b0 100644 >> --- a/fs/hfsplus/unicode.c >> +++ b/fs/hfsplus/unicode.c >> @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, >> ustrlen = be16_to_cpu(ustr->length); >> len = *len_p; >> ce1 = NULL; >> - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); >> + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > > I'm not sure this is a mistake. The developers probably wanted the > filenames to be recomposed before being presented in utf8. With your > patch, > if you try the following (with the default mount options): > > touch Á > ls | hexdump -C > > the utf8 output filename will be using the combining accent (CC 81) > instead > of the Á character (C3 81). This is a bit annoying because it won't > print > correctly in my terminal anymore. I'm not exatly know why combining accent cannot print correctly in terminal and how to avoid it. Whether apply my patch or not, my terminal cannot print the hangul charactor correctly. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 3:57 ` tchou @ 2017-11-23 4:21 ` Viacheslav Dubeyko 2017-11-23 6:05 ` tchou 2017-11-23 11:32 ` Ernesto A. Fernández 1 sibling, 1 reply; 21+ messages in thread From: Viacheslav Dubeyko @ 2017-11-23 4:21 UTC (permalink / raw) To: tchou, Ernesto A. Fernández Cc: linux-fsdevel, linux-fsdevel-owner, htl10 On Thu, 2017-11-23 at 11:57 +0800, tchou wrote: > > > > Could you please share the examples of incorrect and correct > > behaviour? > > > > What is it exactly that you are trying to fix? You mention an > > issue > > with > > hangul characters, but I failed to trigger it. Could you expand on > > that? > > > Hi all, > There is an example.I use Mac mini to format my usb disk to HFS+ and > touch > the file with filename "공유"(a Korean actor, https://goo.gl/VcBsrn) > on > it. > After it, I mount the usb disk on my ubuntu(Linux 4.14.0+) and get > the > following error message when trying to ls and cp the file: > Could you share the all mount options that you used? It looks that you simply don't use the proper mount options set under the Linux. Because MacOS X will mount properly always. Thanks, Vyacheslav Dubeyko. > ls: cannot access 공유: No such file or directory > cp: cannot stat ‘공유’: No such file or directory > > It seem's a problem for a long time(https://goo.gl/LiWGe5). > After applying my patch, I can ls and cp the file correctly. > > > > > On Fri, Nov 17, 2017 at 04:20:05PM +0800, Ting-Chang Hou wrote: > > > > > > The unicode of hangul from macOS is decomposed. There has a bug > > > that > > > mistake decomposed unicode for composed when change unicode to > > > ascii, > > > so it cannot recognize the hangul correctly. > > > > > > Signed-off-by: Ting-Chang Hou <tchou@synology.com> > > > --- > > > fs/hfsplus/unicode.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > > index dfa90c2..2daf7b0 100644 > > > --- a/fs/hfsplus/unicode.c > > > +++ b/fs/hfsplus/unicode.c > > > @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, > > > ustrlen = be16_to_cpu(ustr->length); > > > len = *len_p; > > > ce1 = NULL; > > > - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, > > > &HFSPLUS_SB(sb)->flags); > > > + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, > > > &HFSPLUS_SB(sb)->flags); > > I'm not sure this is a mistake. The developers probably wanted the > > filenames to be recomposed before being presented in utf8. With > > your > > patch, > > if you try the following (with the default mount options): > > > > touch Á > > ls | hexdump -C > > > > the utf8 output filename will be using the combining accent (CC > > 81) > > instead > > of the Á character (C3 81). This is a bit annoying because it > > won't > > print > > correctly in my terminal anymore. > I'm not exatly know why combining accent cannot print correctly in > terminal > and how to avoid it. Whether apply my patch or not, my terminal > cannot > print the hangul charactor correctly. > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 4:21 ` Viacheslav Dubeyko @ 2017-11-23 6:05 ` tchou 2017-11-23 6:23 ` Viacheslav Dubeyko 0 siblings, 1 reply; 21+ messages in thread From: tchou @ 2017-11-23 6:05 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: "Ernesto A." Fernández, linux-fsdevel, linux-fsdevel-owner, htl10 Viacheslav Dubeyko 於 2017-11-23 12:21 寫到: > On Thu, 2017-11-23 at 11:57 +0800, tchou wrote: >> > >> > Could you please share the examples of incorrect and correct >> > behaviour? >> > >> > What is it exactly that you are trying to fix? You mention an >> > issue >> > with >> > hangul characters, but I failed to trigger it. Could you expand on >> > that? >> > >> Hi all, >> There is an example.I use Mac mini to format my usb disk to HFS+ and >> touch >> the file with filename "공유"(a Korean actor, https://goo.gl/VcBsrn) >> on >> it. >> After it, I mount the usb disk on my ubuntu(Linux 4.14.0+) and get >> the >> following error message when trying to ls and cp the file: >> > > > Could you share the all mount options that you used? It looks that you > simply don't use the proper mount options set under the Linux. Because > MacOS X will mount properly always. > > Thanks, > Vyacheslav Dubeyko. > Hi Vyacheslav, I use the default mount option: mount /dev/sdc1 /mnt/hfs /dev/sdc1 on /mnt/hfs type hfsplus (rw,relatime,umask=22,uid=0,gid=0,nls=utf8) Thanks, TCHou > > >> ls: cannot access 공유: No such file or directory >> cp: cannot stat ‘공유’: No such file or directory >> >> It seem's a problem for a long time(https://goo.gl/LiWGe5). >> After applying my patch, I can ls and cp the file correctly. >> >> > >> > On Fri, Nov 17, 2017 at 04:20:05PM +0800, Ting-Chang Hou wrote: >> > > >> > > The unicode of hangul from macOS is decomposed. There has a bug >> > > that >> > > mistake decomposed unicode for composed when change unicode to >> > > ascii, >> > > so it cannot recognize the hangul correctly. >> > > >> > > Signed-off-by: Ting-Chang Hou <tchou@synology.com> >> > > --- >> > > fs/hfsplus/unicode.c | 2 +- >> > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > >> > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c >> > > index dfa90c2..2daf7b0 100644 >> > > --- a/fs/hfsplus/unicode.c >> > > +++ b/fs/hfsplus/unicode.c >> > > @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, >> > > ustrlen = be16_to_cpu(ustr->length); >> > > len = *len_p; >> > > ce1 = NULL; >> > > - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, >> > > &HFSPLUS_SB(sb)->flags); >> > > + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, >> > > &HFSPLUS_SB(sb)->flags); >> > I'm not sure this is a mistake. The developers probably wanted the >> > filenames to be recomposed before being presented in utf8. With >> > your >> > patch, >> > if you try the following (with the default mount options): >> > >> > touch Á >> > ls | hexdump -C >> > >> > the utf8 output filename will be using the combining accent (CC >> > 81) >> > instead >> > of the Á character (C3 81). This is a bit annoying because it >> > won't >> > print >> > correctly in my terminal anymore. >> I'm not exatly know why combining accent cannot print correctly in >> terminal >> and how to avoid it. Whether apply my patch or not, my terminal >> cannot >> print the hangul charactor correctly. >> >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 6:05 ` tchou @ 2017-11-23 6:23 ` Viacheslav Dubeyko 2017-11-23 6:34 ` tchou 0 siblings, 1 reply; 21+ messages in thread From: Viacheslav Dubeyko @ 2017-11-23 6:23 UTC (permalink / raw) To: tchou Cc: "Ernesto A." Fernández, linux-fsdevel, linux-fsdevel-owner, htl10 On Thu, 2017-11-23 at 14:05 +0800, tchou wrote: > Viacheslav Dubeyko 於 2017-11-23 12:21 寫到: > > > > On Thu, 2017-11-23 at 11:57 +0800, tchou wrote: > > > > > > > > > > > > > > > Could you please share the examples of incorrect and correct > > > > behaviour? > > > > > > > > What is it exactly that you are trying to fix? You mention an > > > > issue > > > > with > > > > hangul characters, but I failed to trigger it. Could you expand > > > > on > > > > that? > > > > > > > Hi all, > > > There is an example.I use Mac mini to format my usb disk to HFS+ > > > and > > > touch > > > the file with filename "공유"(a Korean actor, https://goo.gl/VcBsrn > > > ) > > > on > > > it. > > > After it, I mount the usb disk on my ubuntu(Linux 4.14.0+) and > > > get > > > the > > > following error message when trying to ls and cp the file: > > > > > > > Could you share the all mount options that you used? It looks that > > you > > simply don't use the proper mount options set under the Linux. > > Because > > MacOS X will mount properly always. > > > > Thanks, > > Vyacheslav Dubeyko. > > > Hi Vyacheslav, > > I use the default mount option: > > mount /dev/sdc1 /mnt/hfs > /dev/sdc1 on /mnt/hfs type hfsplus > (rw,relatime,umask=22,uid=0,gid=0,nls=utf8) > Please, take a look into mount options description [1]. Especially, take a look into decompose/nodecompose options. You simply needs to use the proper mount option for your case. Thanks, Vyacheslav Dubeyko. [1] https://www.kernel.org/doc/Documentation/filesystems/hfsplus.txt > Thanks, > TCHou > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 6:23 ` Viacheslav Dubeyko @ 2017-11-23 6:34 ` tchou 0 siblings, 0 replies; 21+ messages in thread From: tchou @ 2017-11-23 6:34 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: "\"Ernesto A.\"" Fernández, linux-fsdevel, linux-fsdevel-owner, htl10 Viacheslav Dubeyko 於 2017-11-23 14:23 寫到: > On Thu, 2017-11-23 at 14:05 +0800, tchou wrote: >> Viacheslav Dubeyko 於 2017-11-23 12:21 寫到: >> > >> > On Thu, 2017-11-23 at 11:57 +0800, tchou wrote: >> > > >> > > > >> > > > >> > > > Could you please share the examples of incorrect and correct >> > > > behaviour? >> > > > >> > > > What is it exactly that you are trying to fix? You mention an >> > > > issue >> > > > with >> > > > hangul characters, but I failed to trigger it. Could you expand >> > > > on >> > > > that? >> > > > >> > > Hi all, >> > > There is an example.I use Mac mini to format my usb disk to HFS+ >> > > and >> > > touch >> > > the file with filename "공유"(a Korean actor, https://goo.gl/VcBsrn >> > > ) >> > > on >> > > it. >> > > After it, I mount the usb disk on my ubuntu(Linux 4.14.0+) and >> > > get >> > > the >> > > following error message when trying to ls and cp the file: >> > > >> > >> > Could you share the all mount options that you used? It looks that >> > you >> > simply don't use the proper mount options set under the Linux. >> > Because >> > MacOS X will mount properly always. >> > >> > Thanks, >> > Vyacheslav Dubeyko. >> > >> Hi Vyacheslav, >> >> I use the default mount option: >> >> mount /dev/sdc1 /mnt/hfs >> /dev/sdc1 on /mnt/hfs type hfsplus >> (rw,relatime,umask=22,uid=0,gid=0,nls=utf8) >> > > > Please, take a look into mount options description [1]. Especially, > take a look into decompose/nodecompose options. You simply needs to use > the proper mount option for your case. > I got it. It's my mistake. Thank you all for making me right. Thanks, TCHou > Thanks, > Vyacheslav Dubeyko. > > > [1] https://www.kernel.org/doc/Documentation/filesystems/hfsplus.txt > >> Thanks, >> TCHou >> > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 3:57 ` tchou 2017-11-23 4:21 ` Viacheslav Dubeyko @ 2017-11-23 11:32 ` Ernesto A. Fernández 2017-11-23 18:36 ` Viacheslav Dubeyko 2017-11-24 7:25 ` tchou 1 sibling, 2 replies; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-23 11:32 UTC (permalink / raw) To: tchou Cc: linux-fsdevel, linux-fsdevel-owner, slava, htl10, Ernesto A. Fernández Hi: your issue seems to be in the decomposition of hangul characters, not in the recomposition before printing. The hfsplus module on linux is saving the name of your actor as AC F5 C7 20, without performing any decomposition at all. The reason your patch hides the bug is because it causes linux to present filenames as decomposed utf8, so it is not necessary to decompose again before working with them. But the issue is still there, and you will most likely run into trouble if you make a hangul filename in linux and try to work with it in MacOS. Reviewing the code it would seem that the developers completely forgot the hangul characters had their own rules for decomposition. It's weird because they did the composition part correctly. I've made a quick draft of a patch, mostly by copying the code provided in the unicode web. I don't think we can actually use it on a release, but it should be enough to check if I'm right. It works fine on linux, but I don't have a mac, so it would be great if you could test it for me. Thanks, Ernest (By the way, there is no reason you should have to use the nodecompose mount option, as the other reviewer suggested. Using that option will have a similar effect to that of your patch. It will hide the problem, but if you create a hangul filename on linux with that option you probably won't be able to use it on a mac.) --- diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c index dfa90c2..9006c61 100644 --- a/fs/hfsplus/unicode.c +++ b/fs/hfsplus/unicode.c @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len, return size; } -/* Decomposes a single unicode character. */ +/* Decomposes a single non-Hangul unicode character. */ static inline u16 *decompose_unichar(wchar_t uc, int *size) { int off; @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size) return hfsplus_decompose_table + (off / 4); } +/* Decomposes a Hangul unicode character. */ +int decompose_hangul(wchar_t uc, u16 *result) +{ + int index; + int l, v, t; + + index = uc - Hangul_SBase; + if (index < 0 || index >= Hangul_SCount) + return 0; + + l = Hangul_LBase + index / Hangul_NCount; + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; + t = Hangul_TBase + index % Hangul_TCount; + + result[0] = l; + result[1] = v; + if (t != Hangul_TBase) { + result[2] = t; + return 3; + } + return 2; +} + int hfsplus_asc2uni(struct super_block *sb, struct hfsplus_unistr *ustr, int max_unistr_len, const char *astr, int len) @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, int size, dsize, decompose; u16 *dstr, outlen = 0; wchar_t c; + u16 hangul_buf[3]; decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); while (outlen < max_unistr_len && len > 0) { size = asc2unichar(sb, astr, len, &c); - if (decompose) - dstr = decompose_unichar(c, &dsize); - else + if (decompose) { + /* Hangul is handled separately */ + dstr = &hangul_buf[0]; + dsize = decompose_hangul(c, dstr); + if (dsize == 0) + /* not Hangul */ + dstr = decompose_unichar(c, &dsize); + } else { dstr = NULL; + } + if (dstr) { if (outlen + dsize > max_unistr_len) break; ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 11:32 ` Ernesto A. Fernández @ 2017-11-23 18:36 ` Viacheslav Dubeyko 2017-11-23 22:20 ` Ernesto A. Fernández 2017-11-24 7:25 ` tchou 1 sibling, 1 reply; 21+ messages in thread From: Viacheslav Dubeyko @ 2017-11-23 18:36 UTC (permalink / raw) To: Ernesto A. Fernández, tchou Cc: linux-fsdevel, linux-fsdevel-owner, htl10 On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote: > Hi: > > your issue seems to be in the decomposition of hangul characters, not > in > the recomposition before printing. The hfsplus module on linux is > saving > the name of your actor as AC F5 C7 20, without performing any > decomposition at all. > > The reason your patch hides the bug is because it causes linux to > present > filenames as decomposed utf8, so it is not necessary to decompose > again > before working with them. But the issue is still there, and you will > most > likely run into trouble if you make a hangul filename in linux and > try > to work with it in MacOS. > > Reviewing the code it would seem that the developers completely > forgot > the hangul characters had their own rules for decomposition. It's > weird > because they did the composition part correctly. > > I've made a quick draft of a patch, mostly by copying the code > provided > in the unicode web. I don't think we can actually use it on a Could you please share the link for "the unicode web"? Thanks, Vyacheslav Dubeyko. > release, > but it should be enough to check if I'm right. It works fine on > linux, > but I don't have a mac, so it would be great if you could test it for > me. > > Thanks, > Ernest > > (By the way, there is no reason you should have to use the > nodecompose > mount option, as the other reviewer suggested. Using that option will > have a similar effect to that of your patch. It will hide the > problem, > but if you create a hangul filename on linux with that option you > probably won't be able to use it on a mac.) > > --- > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > index dfa90c2..9006c61 100644 > --- a/fs/hfsplus/unicode.c > +++ b/fs/hfsplus/unicode.c > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block > *sb, const char *astr, int len, > return size; > } > > -/* Decomposes a single unicode character. */ > +/* Decomposes a single non-Hangul unicode character. */ > static inline u16 *decompose_unichar(wchar_t uc, int *size) > { > int off; > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, > int *size) > return hfsplus_decompose_table + (off / 4); > } > > +/* Decomposes a Hangul unicode character. */ > +int decompose_hangul(wchar_t uc, u16 *result) > +{ > + int index; > + int l, v, t; > + > + index = uc - Hangul_SBase; > + if (index < 0 || index >= Hangul_SCount) > + return 0; > + > + l = Hangul_LBase + index / Hangul_NCount; > + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > + t = Hangul_TBase + index % Hangul_TCount; > + > + result[0] = l; > + result[1] = v; > + if (t != Hangul_TBase) { > + result[2] = t; > + return 3; > + } > + return 2; > +} > + > int hfsplus_asc2uni(struct super_block *sb, > struct hfsplus_unistr *ustr, int max_unistr_len, > const char *astr, int len) > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, > int size, dsize, decompose; > u16 *dstr, outlen = 0; > wchar_t c; > + u16 hangul_buf[3]; > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, > &HFSPLUS_SB(sb)->flags); > while (outlen < max_unistr_len && len > 0) { > size = asc2unichar(sb, astr, len, &c); > > - if (decompose) > - dstr = decompose_unichar(c, &dsize); > - else > + if (decompose) { > + /* Hangul is handled separately */ > + dstr = &hangul_buf[0]; > + dsize = decompose_hangul(c, dstr); > + if (dsize == 0) > + /* not Hangul */ > + dstr = decompose_unichar(c, &dsize); > + } else { > dstr = NULL; > + } > + > if (dstr) { > if (outlen + dsize > max_unistr_len) > break; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 18:36 ` Viacheslav Dubeyko @ 2017-11-23 22:20 ` Ernesto A. Fernández 0 siblings, 0 replies; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-23 22:20 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: Ernesto A. Fernández, tchou, linux-fsdevel, linux-fsdevel-owner, htl10 On Thu, Nov 23, 2017 at 10:36:37AM -0800, Viacheslav Dubeyko wrote: > On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote: > > Hi: > > > > your issue seems to be in the decomposition of hangul characters, not > > in > > the recomposition before printing. The hfsplus module on linux is > > saving > > the name of your actor as AC F5 C7 20, without performing any > > decomposition at all. > > > > The reason your patch hides the bug is because it causes linux to > > present > > filenames as decomposed utf8, so it is not necessary to decompose > > again > > before working with them. But the issue is still there, and you will > > most > > likely run into trouble if you make a hangul filename in linux and > > try > > to work with it in MacOS. > > > > Reviewing the code it would seem that the developers completely > > forgot > > the hangul characters had their own rules for decomposition. It's > > weird > > because they did the composition part correctly. > > > > I've made a quick draft of a patch, mostly by copying the code > > provided > > in the unicode web. I don't think we can actually use it on a > > > Could you please share the link for "the unicode web"? > > Thanks, > Vyacheslav Dubeyko. I'm not asking for any reviews yet, just testing because I don't have a Mac. As long as that's clear, this is the latest version of Unicode: www.unicode.org/versions/Unicode10.0.0/ You want section 3.12. > > > > release, > > but it should be enough to check if I'm right. It works fine on > > linux, > > but I don't have a mac, so it would be great if you could test it for > > me. > > > > Thanks, > > Ernest > > > > (By the way, there is no reason you should have to use the > > nodecompose > > mount option, as the other reviewer suggested. Using that option will > > have a similar effect to that of your patch. It will hide the > > problem, > > but if you create a hangul filename on linux with that option you > > probably won't be able to use it on a mac.) > > > > --- > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > index dfa90c2..9006c61 100644 > > --- a/fs/hfsplus/unicode.c > > +++ b/fs/hfsplus/unicode.c > > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block > > *sb, const char *astr, int len, > > return size; > > } > > > > -/* Decomposes a single unicode character. */ > > +/* Decomposes a single non-Hangul unicode character. */ > > static inline u16 *decompose_unichar(wchar_t uc, int *size) > > { > > int off; > > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, > > int *size) > > return hfsplus_decompose_table + (off / 4); > > } > > > > +/* Decomposes a Hangul unicode character. */ > > +int decompose_hangul(wchar_t uc, u16 *result) > > +{ > > + int index; > > + int l, v, t; > > + > > + index = uc - Hangul_SBase; > > + if (index < 0 || index >= Hangul_SCount) > > + return 0; > > + > > + l = Hangul_LBase + index / Hangul_NCount; > > + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > > + t = Hangul_TBase + index % Hangul_TCount; > > + > > + result[0] = l; > > + result[1] = v; > > + if (t != Hangul_TBase) { > > + result[2] = t; > > + return 3; > > + } > > + return 2; > > +} > > + > > int hfsplus_asc2uni(struct super_block *sb, > > struct hfsplus_unistr *ustr, int max_unistr_len, > > const char *astr, int len) > > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, > > int size, dsize, decompose; > > u16 *dstr, outlen = 0; > > wchar_t c; > > + u16 hangul_buf[3]; > > > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, > > &HFSPLUS_SB(sb)->flags); > > while (outlen < max_unistr_len && len > 0) { > > size = asc2unichar(sb, astr, len, &c); > > > > - if (decompose) > > - dstr = decompose_unichar(c, &dsize); > > - else > > + if (decompose) { > > + /* Hangul is handled separately */ > > + dstr = &hangul_buf[0]; > > + dsize = decompose_hangul(c, dstr); > > + if (dsize == 0) > > + /* not Hangul */ > > + dstr = decompose_unichar(c, &dsize); > > + } else { > > dstr = NULL; > > + } > > + > > if (dstr) { > > if (outlen + dsize > max_unistr_len) > > break; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-23 11:32 ` Ernesto A. Fernández 2017-11-23 18:36 ` Viacheslav Dubeyko @ 2017-11-24 7:25 ` tchou 2017-11-24 11:45 ` Ernesto A. Fernández 1 sibling, 1 reply; 21+ messages in thread From: tchou @ 2017-11-24 7:25 UTC (permalink / raw) To: Ernesto A. Fernández Cc: linux-fsdevel, linux-fsdevel-owner, slava, htl10 Ernesto A. Fernández 於 2017-11-23 19:32 寫到: > Hi: > > your issue seems to be in the decomposition of hangul characters, not > in > the recomposition before printing. The hfsplus module on linux is > saving > the name of your actor as AC F5 C7 20, without performing any > decomposition at all. > > The reason your patch hides the bug is because it causes linux to > present > filenames as decomposed utf8, so it is not necessary to decompose again > before working with them. But the issue is still there, and you will > most > likely run into trouble if you make a hangul filename in linux and try > to work with it in MacOS. > > Reviewing the code it would seem that the developers completely forgot > the hangul characters had their own rules for decomposition. It's weird > because they did the composition part correctly. > > I've made a quick draft of a patch, mostly by copying the code provided > in the unicode web. I don't think we can actually use it on a release, > but it should be enough to check if I'm right. It works fine on linux, > but I don't have a mac, so it would be great if you could test it for > me. > > Thanks, > Ernest > > (By the way, there is no reason you should have to use the nodecompose > mount option, as the other reviewer suggested. Using that option will > have a similar effect to that of your patch. It will hide the problem, > but if you create a hangul filename on linux with that option you > probably won't be able to use it on a mac.) > > --- > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > index dfa90c2..9006c61 100644 > --- a/fs/hfsplus/unicode.c > +++ b/fs/hfsplus/unicode.c > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block > *sb, const char *astr, int len, > return size; > } > > -/* Decomposes a single unicode character. */ > +/* Decomposes a single non-Hangul unicode character. */ > static inline u16 *decompose_unichar(wchar_t uc, int *size) > { > int off; > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, > int *size) > return hfsplus_decompose_table + (off / 4); > } > > +/* Decomposes a Hangul unicode character. */ > +int decompose_hangul(wchar_t uc, u16 *result) > +{ > + int index; > + int l, v, t; > + > + index = uc - Hangul_SBase; > + if (index < 0 || index >= Hangul_SCount) > + return 0; > + > + l = Hangul_LBase + index / Hangul_NCount; > + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > + t = Hangul_TBase + index % Hangul_TCount; > + > + result[0] = l; > + result[1] = v; > + if (t != Hangul_TBase) { > + result[2] = t; > + return 3; > + } > + return 2; > +} > + > int hfsplus_asc2uni(struct super_block *sb, > struct hfsplus_unistr *ustr, int max_unistr_len, > const char *astr, int len) > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, > int size, dsize, decompose; > u16 *dstr, outlen = 0; > wchar_t c; > + u16 hangul_buf[3]; > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, > &HFSPLUS_SB(sb)->flags); > while (outlen < max_unistr_len && len > 0) { > size = asc2unichar(sb, astr, len, &c); > > - if (decompose) > - dstr = decompose_unichar(c, &dsize); > - else > + if (decompose) { > + /* Hangul is handled separately */ > + dstr = &hangul_buf[0]; > + dsize = decompose_hangul(c, dstr); > + if (dsize == 0) > + /* not Hangul */ > + dstr = decompose_unichar(c, &dsize); > + } else { > dstr = NULL; > + } > + > if (dstr) { > if (outlen + dsize > max_unistr_len) > break; Hi, Thank you for your explanation and your draft. I test four versions of hfsplus module: -------------------------------- 1. origin linux 2. apply my patch 3. mount with nodecompose option 4. apply your patch -------------------------------- There are the results of the test: --------------------------------------------------------------------- Linux can ls and cp the file from mac correctly with module 2,3,4. Mac cannot cp the files correctly which touched by module 1,2,3. Mac can cp the file correctly which touched by module with your patch. Mac can ls the files which touch by all modules. --------------------------------------------------------------------- It seems that everything goes well with your patch. The result is just like you say, my patch or mount with nodecompose option cannot perform normally. By the way, are the decompose procedures of function hfsplus_hash_dentry() and hfsplus_compare_dentry() need to modify as well? Thanks, TCHou ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-24 7:25 ` tchou @ 2017-11-24 11:45 ` Ernesto A. Fernández 2017-11-27 2:07 ` tchou 0 siblings, 1 reply; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-24 11:45 UTC (permalink / raw) To: tchou Cc: linux-fsdevel, linux-fsdevel-owner, slava, htl10, Ernesto A. Fernández On Fri, Nov 24, 2017 at 03:25:40PM +0800, tchou wrote: > Ernesto A. Fernández 於 2017-11-23 19:32 寫到: > >Hi: > > > >your issue seems to be in the decomposition of hangul characters, not in > >the recomposition before printing. The hfsplus module on linux is saving > >the name of your actor as AC F5 C7 20, without performing any > >decomposition at all. > > > >The reason your patch hides the bug is because it causes linux to present > >filenames as decomposed utf8, so it is not necessary to decompose again > >before working with them. But the issue is still there, and you will most > >likely run into trouble if you make a hangul filename in linux and try > >to work with it in MacOS. > > > >Reviewing the code it would seem that the developers completely forgot > >the hangul characters had their own rules for decomposition. It's weird > >because they did the composition part correctly. > > > >I've made a quick draft of a patch, mostly by copying the code provided > >in the unicode web. I don't think we can actually use it on a release, > >but it should be enough to check if I'm right. It works fine on linux, > >but I don't have a mac, so it would be great if you could test it for me. > > > >Thanks, > >Ernest > > > >(By the way, there is no reason you should have to use the nodecompose > >mount option, as the other reviewer suggested. Using that option will > >have a similar effect to that of your patch. It will hide the problem, > >but if you create a hangul filename on linux with that option you > >probably won't be able to use it on a mac.) > > > >--- > >diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > >index dfa90c2..9006c61 100644 > >--- a/fs/hfsplus/unicode.c > >+++ b/fs/hfsplus/unicode.c > >@@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block > >*sb, const char *astr, int len, > > return size; > > } > > > >-/* Decomposes a single unicode character. */ > >+/* Decomposes a single non-Hangul unicode character. */ > > static inline u16 *decompose_unichar(wchar_t uc, int *size) > > { > > int off; > >@@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int > >*size) > > return hfsplus_decompose_table + (off / 4); > > } > > > >+/* Decomposes a Hangul unicode character. */ > >+int decompose_hangul(wchar_t uc, u16 *result) > >+{ > >+ int index; > >+ int l, v, t; > >+ > >+ index = uc - Hangul_SBase; > >+ if (index < 0 || index >= Hangul_SCount) > >+ return 0; > >+ > >+ l = Hangul_LBase + index / Hangul_NCount; > >+ v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > >+ t = Hangul_TBase + index % Hangul_TCount; > >+ > >+ result[0] = l; > >+ result[1] = v; > >+ if (t != Hangul_TBase) { > >+ result[2] = t; > >+ return 3; > >+ } > >+ return 2; > >+} > >+ > > int hfsplus_asc2uni(struct super_block *sb, > > struct hfsplus_unistr *ustr, int max_unistr_len, > > const char *astr, int len) > >@@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, > > int size, dsize, decompose; > > u16 *dstr, outlen = 0; > > wchar_t c; > >+ u16 hangul_buf[3]; > > > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > > while (outlen < max_unistr_len && len > 0) { > > size = asc2unichar(sb, astr, len, &c); > > > >- if (decompose) > >- dstr = decompose_unichar(c, &dsize); > >- else > >+ if (decompose) { > >+ /* Hangul is handled separately */ > >+ dstr = &hangul_buf[0]; > >+ dsize = decompose_hangul(c, dstr); > >+ if (dsize == 0) > >+ /* not Hangul */ > >+ dstr = decompose_unichar(c, &dsize); > >+ } else { > > dstr = NULL; > >+ } > >+ > > if (dstr) { > > if (outlen + dsize > max_unistr_len) > > break; > > Hi, > > Thank you for your explanation and your draft. > I test four versions of hfsplus module: > -------------------------------- > 1. origin linux > 2. apply my patch > 3. mount with nodecompose option > 4. apply your patch > -------------------------------- It seems you were very thorough, thank you. > There are the results of the test: > --------------------------------------------------------------------- > Linux can ls and cp the file from mac correctly with module 2,3,4. > Mac cannot cp the files correctly which touched by module 1,2,3. > Mac can cp the file correctly which touched by module with your patch. > Mac can ls the files which touch by all modules. > --------------------------------------------------------------------- > > It seems that everything goes well with your patch. The result is just > like you say, my patch or mount with nodecompose option cannot perform > normally. That's good to hear. I will submit a patch as soon as I can figure out the licensing issues. If you want I can credit you as reporter and tester. > > By the way, are the decompose procedures of function hfsplus_hash_dentry() > and hfsplus_compare_dentry() need to modify as well? You are correct, good thing you noticed. I'll add that when I resend the patch. > Thanks, > TCHou ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name 2017-11-24 11:45 ` Ernesto A. Fernández @ 2017-11-27 2:07 ` tchou 2017-11-27 19:36 ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández 0 siblings, 1 reply; 21+ messages in thread From: tchou @ 2017-11-27 2:07 UTC (permalink / raw) To: Ernesto A. Fernández Cc: linux-fsdevel, linux-fsdevel-owner, slava, htl10 Ernesto A. Fernández 於 2017-11-24 19:45 寫到: > On Fri, Nov 24, 2017 at 03:25:40PM +0800, tchou wrote: >> Ernesto A. Fernández 於 2017-11-23 19:32 寫到: >> >Hi: >> > >> >your issue seems to be in the decomposition of hangul characters, not in >> >the recomposition before printing. The hfsplus module on linux is saving >> >the name of your actor as AC F5 C7 20, without performing any >> >decomposition at all. >> > >> >The reason your patch hides the bug is because it causes linux to present >> >filenames as decomposed utf8, so it is not necessary to decompose again >> >before working with them. But the issue is still there, and you will most >> >likely run into trouble if you make a hangul filename in linux and try >> >to work with it in MacOS. >> > >> >Reviewing the code it would seem that the developers completely forgot >> >the hangul characters had their own rules for decomposition. It's weird >> >because they did the composition part correctly. >> > >> >I've made a quick draft of a patch, mostly by copying the code provided >> >in the unicode web. I don't think we can actually use it on a release, >> >but it should be enough to check if I'm right. It works fine on linux, >> >but I don't have a mac, so it would be great if you could test it for me. >> > >> >Thanks, >> >Ernest >> > >> >(By the way, there is no reason you should have to use the nodecompose >> >mount option, as the other reviewer suggested. Using that option will >> >have a similar effect to that of your patch. It will hide the problem, >> >but if you create a hangul filename on linux with that option you >> >probably won't be able to use it on a mac.) >> > >> >--- >> >diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c >> >index dfa90c2..9006c61 100644 >> >--- a/fs/hfsplus/unicode.c >> >+++ b/fs/hfsplus/unicode.c >> >@@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block >> >*sb, const char *astr, int len, >> > return size; >> > } >> > >> >-/* Decomposes a single unicode character. */ >> >+/* Decomposes a single non-Hangul unicode character. */ >> > static inline u16 *decompose_unichar(wchar_t uc, int *size) >> > { >> > int off; >> >@@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int >> >*size) >> > return hfsplus_decompose_table + (off / 4); >> > } >> > >> >+/* Decomposes a Hangul unicode character. */ >> >+int decompose_hangul(wchar_t uc, u16 *result) >> >+{ >> >+ int index; >> >+ int l, v, t; >> >+ >> >+ index = uc - Hangul_SBase; >> >+ if (index < 0 || index >= Hangul_SCount) >> >+ return 0; >> >+ >> >+ l = Hangul_LBase + index / Hangul_NCount; >> >+ v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; >> >+ t = Hangul_TBase + index % Hangul_TCount; >> >+ >> >+ result[0] = l; >> >+ result[1] = v; >> >+ if (t != Hangul_TBase) { >> >+ result[2] = t; >> >+ return 3; >> >+ } >> >+ return 2; >> >+} >> >+ >> > int hfsplus_asc2uni(struct super_block *sb, >> > struct hfsplus_unistr *ustr, int max_unistr_len, >> > const char *astr, int len) >> >@@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb, >> > int size, dsize, decompose; >> > u16 *dstr, outlen = 0; >> > wchar_t c; >> >+ u16 hangul_buf[3]; >> > >> > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); >> > while (outlen < max_unistr_len && len > 0) { >> > size = asc2unichar(sb, astr, len, &c); >> > >> >- if (decompose) >> >- dstr = decompose_unichar(c, &dsize); >> >- else >> >+ if (decompose) { >> >+ /* Hangul is handled separately */ >> >+ dstr = &hangul_buf[0]; >> >+ dsize = decompose_hangul(c, dstr); >> >+ if (dsize == 0) >> >+ /* not Hangul */ >> >+ dstr = decompose_unichar(c, &dsize); >> >+ } else { >> > dstr = NULL; >> >+ } >> >+ >> > if (dstr) { >> > if (outlen + dsize > max_unistr_len) >> > break; >> >> Hi, >> >> Thank you for your explanation and your draft. >> I test four versions of hfsplus module: >> -------------------------------- >> 1. origin linux >> 2. apply my patch >> 3. mount with nodecompose option >> 4. apply your patch >> -------------------------------- > > It seems you were very thorough, thank you. > >> There are the results of the test: >> --------------------------------------------------------------------- >> Linux can ls and cp the file from mac correctly with module 2,3,4. >> Mac cannot cp the files correctly which touched by module 1,2,3. >> Mac can cp the file correctly which touched by module with your patch. >> Mac can ls the files which touch by all modules. >> --------------------------------------------------------------------- >> >> It seems that everything goes well with your patch. The result is just >> like you say, my patch or mount with nodecompose option cannot perform >> normally. > > That's good to hear. I will submit a patch as soon as I can figure out > the licensing issues. If you want I can credit you as reporter and > tester. > OK, thanks a lot. >> >> By the way, are the decompose procedures of function >> hfsplus_hash_dentry() >> and hfsplus_compare_dentry() need to modify as well? > > You are correct, good thing you noticed. I'll add that when I resend > the > patch. > >> Thanks, >> TCHou ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-27 2:07 ` tchou @ 2017-11-27 19:36 ` Ernesto A. Fernández 2017-11-27 22:40 ` Viacheslav Dubeyko 2018-08-23 18:29 ` Ernesto A. Fernández 0 siblings, 2 replies; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-27 19:36 UTC (permalink / raw) To: linux-fsdevel Cc: tchou, slava, htl10, Alexander Viro, Andrew Morton, Ernesto A. Fernández It would be good to have some feedback on the use of code under the Unicode License, I don't know if that can be a problem. To Ting-Chang Hou: This patch is slightly different from the one you tested with your mac. You are still tagged as Tested-by, but you may want to check again that I have not broken anything. Thanks, Ernest -- >8 -- Subject: [PATCH] hfsplus: fix decomposition of Hangul characters Files created under macOS become essentially unusable under linux if their names contain any Hangul, and vice versa. This happens because the normalization of Hangul characters is a special case: it can be done algorithmically, without need of a table. The hfsplus module deals with Hangul correctly when composing, but completely forgets about it when performing a decomposition. Solve this by using the Hangul decomposition function provided in the Unicode Standard. It's under the Unicode License, compatible with the GPL. This patch will cause trouble for Hangul filenames already created by the module in the past. This shouldn't be a special concern because the main purpose of the module was always sharing with macOS. If a user actually needs to access such a file the nodecompose mount option should be enough. Reported-by: Ting-Chang Hou <tchou@synology.com> Tested-by: Ting-Chang Hou <tchou@synology.com> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> --- fs/hfsplus/unicode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 56 insertions(+), 6 deletions(-) diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c index dfa90c2..7ceae91 100644 --- a/fs/hfsplus/unicode.c +++ b/fs/hfsplus/unicode.c @@ -272,8 +272,8 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len, return size; } -/* Decomposes a single unicode character. */ -static inline u16 *decompose_unichar(wchar_t uc, int *size) +/* Decomposes a non-Hangul unicode character. */ +static u16 *decompose_nonhangul(wchar_t uc, int *size) { int off; @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size) return hfsplus_decompose_table + (off / 4); } +/* + * Decomposes a Hangul unicode character. + * + * This function was adapted from sample code in section 3.12 of the + * Unicode Standard, version 10.0. + * + * Copyright (C) 1991-2017 Unicode, Inc. All rights reserved. Distributed + * under the Terms of Use in http://www.unicode.org/copyright.html. + */ +static int decompose_hangul(wchar_t uc, u16 *result) +{ + int index; + int l, v, t; + + index = uc - Hangul_SBase; + if (index < 0 || index >= Hangul_SCount) + return 0; + + l = Hangul_LBase + index / Hangul_NCount; + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; + t = Hangul_TBase + index % Hangul_TCount; + + result[0] = l; + result[1] = v; + if (t != Hangul_TBase) { + result[2] = t; + return 3; + } + return 2; +} + +/* Decomposes a single unicode character. */ +static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer) +{ + u16 *result; + + /* Hangul is handled separately */ + result = hangul_buffer; + *size = decompose_hangul(uc, result); + if (*size == 0) + /* not Hangul */ + result = decompose_nonhangul(uc, size); + return result; +} + int hfsplus_asc2uni(struct super_block *sb, struct hfsplus_unistr *ustr, int max_unistr_len, const char *astr, int len) @@ -303,13 +348,14 @@ int hfsplus_asc2uni(struct super_block *sb, int size, dsize, decompose; u16 *dstr, outlen = 0; wchar_t c; + u16 dhangul[3]; decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); while (outlen < max_unistr_len && len > 0) { size = asc2unichar(sb, astr, len, &c); if (decompose) - dstr = decompose_unichar(c, &dsize); + dstr = decompose_unichar(c, &dsize, &dhangul[0]); else dstr = NULL; if (dstr) { @@ -344,6 +390,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) unsigned long hash; wchar_t c; u16 c2; + u16 dhangul[3]; casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); @@ -357,7 +404,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) len -= size; if (decompose) - dstr = decompose_unichar(c, &dsize); + dstr = decompose_unichar(c, &dsize, &dhangul[0]); else dstr = NULL; if (dstr) { @@ -396,6 +443,7 @@ int hfsplus_compare_dentry(const struct dentry *dentry, const char *astr1, *astr2; u16 c1, c2; wchar_t c; + u16 dhangul_1[3], dhangul_2[3]; casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); @@ -413,7 +461,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, len1 -= size; if (decompose) - dstr1 = decompose_unichar(c, &dsize1); + dstr1 = decompose_unichar(c, &dsize1, + &dhangul_1[0]); if (!decompose || !dstr1) { c1 = c; dstr1 = &c1; @@ -427,7 +476,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, len2 -= size; if (decompose) - dstr2 = decompose_unichar(c, &dsize2); + dstr2 = decompose_unichar(c, &dsize2, + &dhangul_2[0]); if (!decompose || !dstr2) { c2 = c; dstr2 = &c2; -- 2.1.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-27 19:36 ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández @ 2017-11-27 22:40 ` Viacheslav Dubeyko 2017-11-28 15:02 ` Ernesto A. Fernández 2018-08-23 18:29 ` Ernesto A. Fernández 1 sibling, 1 reply; 21+ messages in thread From: Viacheslav Dubeyko @ 2017-11-27 22:40 UTC (permalink / raw) To: Ernesto A. Fernández Cc: linux-fsdevel, tchou, htl10, Alexander Viro, Andrew Morton On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > It would be good to have some feedback on the use of code under the > Unicode License, I don't know if that can be a problem. > > To Ting-Chang Hou: This patch is slightly different from the one you > tested with your mac. You are still tagged as Tested-by, but you may > want to check again that I have not broken anything. > > Thanks, > Ernest > > -- >8 -- > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > Files created under macOS become essentially unusable under linux if > their names contain any Hangul, and vice versa. > > This happens because the normalization of Hangul characters is a special > case: it can be done algorithmically, without need of a table. The > hfsplus module deals with Hangul correctly when composing, but > completely forgets about it when performing a decomposition. > > Solve this by using the Hangul decomposition function provided in the > Unicode Standard. It's under the Unicode License, compatible with the > GPL. > I believe it makes sense to share more details about algorithm details. I think that the Unicode standard contains some explanation how to decompose correctly. It will be great to have some brief citation in the patch description. > This patch will cause trouble for Hangul filenames already created by > the module in the past. This shouldn't be a special concern because the > main purpose of the module was always sharing with macOS. If a user > actually needs to access such a file the nodecompose mount option should > be enough. > > Reported-by: Ting-Chang Hou <tchou@synology.com> > Tested-by: Ting-Chang Hou <tchou@synology.com> > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> > --- > fs/hfsplus/unicode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 56 insertions(+), 6 deletions(-) > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > index dfa90c2..7ceae91 100644 > --- a/fs/hfsplus/unicode.c > +++ b/fs/hfsplus/unicode.c > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len, > return size; > } > > -/* Decomposes a single unicode character. */ > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > +/* Decomposes a non-Hangul unicode character. */ > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > { > int off; > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size) > return hfsplus_decompose_table + (off / 4); > } > > +/* > + * Decomposes a Hangul unicode character. > + * > + * This function was adapted from sample code in section 3.12 of the > + * Unicode Standard, version 10.0. > + * > + * Copyright (C) 1991-2017 Unicode, Inc. All rights reserved. Distributed > + * under the Terms of Use in http://www.unicode.org/copyright.html. > + */ > +static int decompose_hangul(wchar_t uc, u16 *result) > +{ > + int index; > + int l, v, t; > + > + index = uc - Hangul_SBase; > + if (index < 0 || index >= Hangul_SCount) > + return 0; > + > + l = Hangul_LBase + index / Hangul_NCount; > + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > + t = Hangul_TBase + index % Hangul_TCount; > + > + result[0] = l; > + result[1] = v; > + if (t != Hangul_TBase) { > + result[2] = t; > + return 3; > + } > + return 2; > +} > + > +/* Decomposes a single unicode character. */ > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer) > +{ > + u16 *result; > + > + /* Hangul is handled separately */ > + result = hangul_buffer; > + *size = decompose_hangul(uc, result); Is it possible to distinguish the different cases beforehand and to select one proper function for execution without the execution both ones in the worst case? > + if (*size == 0) > + /* not Hangul */ > + result = decompose_nonhangul(uc, size); I think it makes sense to use the brackets here. If I remember correctly, this is the code style's requirement. Thanks, Vyacheslav Dubeyko. > + return result; > +} > + > int hfsplus_asc2uni(struct super_block *sb, > struct hfsplus_unistr *ustr, int max_unistr_len, > const char *astr, int len) > @@ -303,13 +348,14 @@ int hfsplus_asc2uni(struct super_block *sb, > int size, dsize, decompose; > u16 *dstr, outlen = 0; > wchar_t c; > + u16 dhangul[3]; > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > while (outlen < max_unistr_len && len > 0) { > size = asc2unichar(sb, astr, len, &c); > > if (decompose) > - dstr = decompose_unichar(c, &dsize); > + dstr = decompose_unichar(c, &dsize, &dhangul[0]); > else > dstr = NULL; > if (dstr) { > @@ -344,6 +390,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) > unsigned long hash; > wchar_t c; > u16 c2; > + u16 dhangul[3]; > > casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > @@ -357,7 +404,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) > len -= size; > > if (decompose) > - dstr = decompose_unichar(c, &dsize); > + dstr = decompose_unichar(c, &dsize, &dhangul[0]); > else > dstr = NULL; > if (dstr) { > @@ -396,6 +443,7 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > const char *astr1, *astr2; > u16 c1, c2; > wchar_t c; > + u16 dhangul_1[3], dhangul_2[3]; > > casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > @@ -413,7 +461,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > len1 -= size; > > if (decompose) > - dstr1 = decompose_unichar(c, &dsize1); > + dstr1 = decompose_unichar(c, &dsize1, > + &dhangul_1[0]); > if (!decompose || !dstr1) { > c1 = c; > dstr1 = &c1; > @@ -427,7 +476,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > len2 -= size; > > if (decompose) > - dstr2 = decompose_unichar(c, &dsize2); > + dstr2 = decompose_unichar(c, &dsize2, > + &dhangul_2[0]); > if (!decompose || !dstr2) { > c2 = c; > dstr2 = &c2; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-27 22:40 ` Viacheslav Dubeyko @ 2017-11-28 15:02 ` Ernesto A. Fernández 2017-11-28 16:30 ` Viacheslav Dubeyko 0 siblings, 1 reply; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-28 15:02 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: linux-fsdevel, tchou, htl10, Alexander Viro, Andrew Morton, Ernesto A. Fernández On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote: > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > > It would be good to have some feedback on the use of code under the > > Unicode License, I don't know if that can be a problem. > > > > To Ting-Chang Hou: This patch is slightly different from the one you > > tested with your mac. You are still tagged as Tested-by, but you may > > want to check again that I have not broken anything. > > > > Thanks, > > Ernest > > > > -- >8 -- > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > > > Files created under macOS become essentially unusable under linux if > > their names contain any Hangul, and vice versa. > > > > This happens because the normalization of Hangul characters is a special > > case: it can be done algorithmically, without need of a table. The > > hfsplus module deals with Hangul correctly when composing, but > > completely forgets about it when performing a decomposition. > > > > Solve this by using the Hangul decomposition function provided in the > > Unicode Standard. It's under the Unicode License, compatible with the > > GPL. > > > > I believe it makes sense to share more details about algorithm details. > I think that the Unicode standard contains some explanation how to > decompose correctly. It will be great to have some brief citation in the > patch description. The algorithm is very simple, the best way to understand it is just looking at the code. I don't know the first thing about Korean writing, so I don't think I should attempt to explain why the decomposition is done this way. If somebody else is interested in the details, they can follow the citation in the header comment of the decompose_hangul function. > > This patch will cause trouble for Hangul filenames already created by > > the module in the past. This shouldn't be a special concern because the > > main purpose of the module was always sharing with macOS. If a user > > actually needs to access such a file the nodecompose mount option should > > be enough. > > > > Reported-by: Ting-Chang Hou <tchou@synology.com> > > Tested-by: Ting-Chang Hou <tchou@synology.com> > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> > > --- > > fs/hfsplus/unicode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 56 insertions(+), 6 deletions(-) > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > index dfa90c2..7ceae91 100644 > > --- a/fs/hfsplus/unicode.c > > +++ b/fs/hfsplus/unicode.c > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len, > > return size; > > } > > > > -/* Decomposes a single unicode character. */ > > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > > +/* Decomposes a non-Hangul unicode character. */ > > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > > { > > int off; > > > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size) > > return hfsplus_decompose_table + (off / 4); > > } > > > > +/* > > + * Decomposes a Hangul unicode character. > > + * > > + * This function was adapted from sample code in section 3.12 of the > > + * Unicode Standard, version 10.0. > > + * > > + * Copyright (C) 1991-2017 Unicode, Inc. All rights reserved. Distributed > > + * under the Terms of Use in http://www.unicode.org/copyright.html. > > + */ > > +static int decompose_hangul(wchar_t uc, u16 *result) > > +{ > > + int index; > > + int l, v, t; > > + > > + index = uc - Hangul_SBase; > > + if (index < 0 || index >= Hangul_SCount) > > + return 0; > > + > > + l = Hangul_LBase + index / Hangul_NCount; > > + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; > > + t = Hangul_TBase + index % Hangul_TCount; > > + > > + result[0] = l; > > + result[1] = v; > > + if (t != Hangul_TBase) { > > + result[2] = t; > > + return 3; > > + } > > + return 2; > > +} > > + > > +/* Decomposes a single unicode character. */ > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer) > > +{ > > + u16 *result; > > + > > + /* Hangul is handled separately */ > > + result = hangul_buffer; > > + *size = decompose_hangul(uc, result); > > Is it possible to distinguish the different cases beforehand and to > select one proper function for execution without the execution both ones > in the worst case? The decompose_hangul and decompose_nonhangul functions are called only once, so I would expect the compiler to inline them both. The result should be similar to your suggestion. > > + if (*size == 0) > > + /* not Hangul */ > > + result = decompose_nonhangul(uc, size); > > I think it makes sense to use the brackets here. If I remember > correctly, this is the code style's requirement. I couldn't find such a requirement, but I did find some other places within the hfsplus code where the comments are done my way. For example, in the hfsplus_cat_write_inode() function of the inode.c file. > Thanks, > Vyacheslav Dubeyko. Thanks, Ernest > > + return result; > > +} > > + > > int hfsplus_asc2uni(struct super_block *sb, > > struct hfsplus_unistr *ustr, int max_unistr_len, > > const char *astr, int len) > > @@ -303,13 +348,14 @@ int hfsplus_asc2uni(struct super_block *sb, > > int size, dsize, decompose; > > u16 *dstr, outlen = 0; > > wchar_t c; > > + u16 dhangul[3]; > > > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > > while (outlen < max_unistr_len && len > 0) { > > size = asc2unichar(sb, astr, len, &c); > > > > if (decompose) > > - dstr = decompose_unichar(c, &dsize); > > + dstr = decompose_unichar(c, &dsize, &dhangul[0]); > > else > > dstr = NULL; > > if (dstr) { > > @@ -344,6 +390,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) > > unsigned long hash; > > wchar_t c; > > u16 c2; > > + u16 dhangul[3]; > > > > casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > > @@ -357,7 +404,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) > > len -= size; > > > > if (decompose) > > - dstr = decompose_unichar(c, &dsize); > > + dstr = decompose_unichar(c, &dsize, &dhangul[0]); > > else > > dstr = NULL; > > if (dstr) { > > @@ -396,6 +443,7 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > > const char *astr1, *astr2; > > u16 c1, c2; > > wchar_t c; > > + u16 dhangul_1[3], dhangul_2[3]; > > > > casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); > > decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); > > @@ -413,7 +461,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > > len1 -= size; > > > > if (decompose) > > - dstr1 = decompose_unichar(c, &dsize1); > > + dstr1 = decompose_unichar(c, &dsize1, > > + &dhangul_1[0]); > > if (!decompose || !dstr1) { > > c1 = c; > > dstr1 = &c1; > > @@ -427,7 +476,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, > > len2 -= size; > > > > if (decompose) > > - dstr2 = decompose_unichar(c, &dsize2); > > + dstr2 = decompose_unichar(c, &dsize2, > > + &dhangul_2[0]); > > if (!decompose || !dstr2) { > > c2 = c; > > dstr2 = &c2; > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-28 15:02 ` Ernesto A. Fernández @ 2017-11-28 16:30 ` Viacheslav Dubeyko 2017-11-28 18:15 ` Ernesto A. Fernández 0 siblings, 1 reply; 21+ messages in thread From: Viacheslav Dubeyko @ 2017-11-28 16:30 UTC (permalink / raw) To: Ernesto A. Fernández Cc: linux-fsdevel, tchou, htl10, Alexander Viro, Andrew Morton On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote: > On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote: > > > > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > > > > > > It would be good to have some feedback on the use of code under > > > the > > > Unicode License, I don't know if that can be a problem. > > > > > > To Ting-Chang Hou: This patch is slightly different from the one > > > you > > > tested with your mac. You are still tagged as Tested-by, but you > > > may > > > want to check again that I have not broken anything. > > > > > > Thanks, > > > Ernest > > > > > > -- >8 -- > > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > > > > > Files created under macOS become essentially unusable under linux > > > if > > > their names contain any Hangul, and vice versa. > > > > > > This happens because the normalization of Hangul characters is a > > > special > > > case: it can be done algorithmically, without need of a table. > > > The > > > hfsplus module deals with Hangul correctly when composing, but > > > completely forgets about it when performing a decomposition. > > > > > > Solve this by using the Hangul decomposition function provided in > > > the > > > Unicode Standard. It's under the Unicode License, compatible with > > > the > > > GPL. > > > > > I believe it makes sense to share more details about algorithm > > details. > > I think that the Unicode standard contains some explanation how to > > decompose correctly. It will be great to have some brief citation > > in the > > patch description. > The algorithm is very simple, the best way to understand it is just > looking at the code. I don't know the first thing about Korean > writing, so > I don't think I should attempt to explain why the decomposition is > done > this way. If somebody else is interested in the details, they can > follow > the citation in the header comment of the decompose_hangul function. > You should share the link where it can be found. Finally, the comment needs in more detailed explanation. > > > > > > > > This patch will cause trouble for Hangul filenames already > > > created by > > > the module in the past. This shouldn't be a special concern > > > because the > > > main purpose of the module was always sharing with macOS. If a > > > user > > > actually needs to access such a file the nodecompose mount option > > > should > > > be enough. > > > > > > Reported-by: Ting-Chang Hou <tchou@synology.com> > > > Tested-by: Ting-Chang Hou <tchou@synology.com> > > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail. > > > com> > > > --- > > > fs/hfsplus/unicode.c | 62 > > > +++++++++++++++++++++++++++++++++++++++++++++++----- > > > 1 file changed, 56 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > > index dfa90c2..7ceae91 100644 > > > --- a/fs/hfsplus/unicode.c > > > +++ b/fs/hfsplus/unicode.c > > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct > > > super_block *sb, const char *astr, int len, > > > return size; > > > } > > > > > > -/* Decomposes a single unicode character. */ > > > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > > > +/* Decomposes a non-Hangul unicode character. */ > > > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > > > { > > > int off; > > > > > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t > > > uc, int *size) > > > return hfsplus_decompose_table + (off / 4); > > > } > > > > > > +/* > > > + * Decomposes a Hangul unicode character. > > > + * > > > + * This function was adapted from sample code in section 3.12 of > > > the > > > + * Unicode Standard, version 10.0. > > > + * > > > + * Copyright (C) 1991-2017 Unicode, Inc. All rights > > > reserved. Distributed > > > + * under the Terms of Use in http://www.unicode.org/copyright.ht > > > ml. > > > + */ > > > +static int decompose_hangul(wchar_t uc, u16 *result) > > > +{ > > > + int index; > > > + int l, v, t; > > > + > > > + index = uc - Hangul_SBase; > > > + if (index < 0 || index >= Hangul_SCount) > > > + return 0; > > > + > > > + l = Hangul_LBase + index / Hangul_NCount; > > > + v = Hangul_VBase + (index % Hangul_NCount) / > > > Hangul_TCount; > > > + t = Hangul_TBase + index % Hangul_TCount; > > > + > > > + result[0] = l; > > > + result[1] = v; > > > + if (t != Hangul_TBase) { > > > + result[2] = t; > > > + return 3; > > > + } > > > + return 2; > > > +} > > > + > > > +/* Decomposes a single unicode character. */ > > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 > > > *hangul_buffer) > > > +{ > > > + u16 *result; > > > + > > > + /* Hangul is handled separately */ > > > + result = hangul_buffer; > > > + *size = decompose_hangul(uc, result); > > Is it possible to distinguish the different cases beforehand and to > > select one proper function for execution without the execution both > > ones > > in the worst case? > The decompose_hangul and decompose_nonhangul functions are called > only once, > so I would expect the compiler to inline them both. The result should > be > similar to your suggestion. > It will be much better to have: If (hangful) { <decompose_hangul> } else { <decompose_nonhangul> } Currently, the code looks weird. Especially, decompose_hangul() is able to distinguish two different cases. So, it's easy to extract the check. > > > > > > > > + if (*size == 0) > > > + /* not Hangul */ > > > + result = decompose_nonhangul(uc, size); > > I think it makes sense to use the brackets here. If I remember > > correctly, this is the code style's requirement. > I couldn't find such a requirement, but I did find some other places > within > the hfsplus code where the comments are done my way. For example, in > the > hfsplus_cat_write_inode() function of the inode.c file. > The hfsplus code contains many awful places. It doesn't mean that it needs to follow the bad style examples. If you have only one string: if (condition) <do_something> then it's OK not to use the brackets. Otherwise, you should include the branch into the brackets. Thanks, Vyacheslav Dubeyko. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-28 16:30 ` Viacheslav Dubeyko @ 2017-11-28 18:15 ` Ernesto A. Fernández 0 siblings, 0 replies; 21+ messages in thread From: Ernesto A. Fernández @ 2017-11-28 18:15 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: linux-fsdevel, tchou, htl10, Alexander Viro, Andrew Morton, Ernesto A. Fernández On Tue, Nov 28, 2017 at 08:30:54AM -0800, Viacheslav Dubeyko wrote: > On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote: > > On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote: > > > > > > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > > > > > > > > It would be good to have some feedback on the use of code under > > > > the > > > > Unicode License, I don't know if that can be a problem. > > > > > > > > To Ting-Chang Hou: This patch is slightly different from the one > > > > you > > > > tested with your mac. You are still tagged as Tested-by, but you > > > > may > > > > want to check again that I have not broken anything. > > > > > > > > Thanks, > > > > Ernest > > > > > > > > -- >8 -- > > > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > > > > > > > Files created under macOS become essentially unusable under linux > > > > if > > > > their names contain any Hangul, and vice versa. > > > > > > > > This happens because the normalization of Hangul characters is a > > > > special > > > > case: it can be done algorithmically, without need of a table. > > > > The > > > > hfsplus module deals with Hangul correctly when composing, but > > > > completely forgets about it when performing a decomposition. > > > > > > > > Solve this by using the Hangul decomposition function provided in > > > > the > > > > Unicode Standard. It's under the Unicode License, compatible with > > > > the > > > > GPL. > > > > > > > I believe it makes sense to share more details about algorithm > > > details. > > > I think that the Unicode standard contains some explanation how to > > > decompose correctly. It will be great to have some brief citation > > > in the > > > patch description. > > The algorithm is very simple, the best way to understand it is just > > looking at the code. I don't know the first thing about Korean > > writing, so > > I don't think I should attempt to explain why the decomposition is > > done > > this way. If somebody else is interested in the details, they can > > follow > > the citation in the header comment of the decompose_hangul function. > > > > You should share the link where it can be found. Is that necessary? I think people can find the Unicode Standard on their own. Even if I did provide a link, it's a pdf, so people would still need to look up the right sections. It's not going to save anybody much work. > Finally, the comment > needs in more detailed explanation. Are you talking about the function header comment? I don't know if I follow. Perhaps I could explain that the return value is the size, but it seems pretty obvious from the use. > > > > > > > > > > > This patch will cause trouble for Hangul filenames already > > > > created by > > > > the module in the past. This shouldn't be a special concern > > > > because the > > > > main purpose of the module was always sharing with macOS. If a > > > > user > > > > actually needs to access such a file the nodecompose mount option > > > > should > > > > be enough. > > > > > > > > Reported-by: Ting-Chang Hou <tchou@synology.com> > > > > Tested-by: Ting-Chang Hou <tchou@synology.com> > > > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail. > > > > com> > > > > --- > > > > fs/hfsplus/unicode.c | 62 > > > > +++++++++++++++++++++++++++++++++++++++++++++++----- > > > > 1 file changed, 56 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > > > index dfa90c2..7ceae91 100644 > > > > --- a/fs/hfsplus/unicode.c > > > > +++ b/fs/hfsplus/unicode.c > > > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct > > > > super_block *sb, const char *astr, int len, > > > > return size; > > > > } > > > > > > > > -/* Decomposes a single unicode character. */ > > > > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > > > > +/* Decomposes a non-Hangul unicode character. */ > > > > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > > > > { > > > > int off; > > > > > > > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t > > > > uc, int *size) > > > > return hfsplus_decompose_table + (off / 4); > > > > } > > > > > > > > +/* > > > > + * Decomposes a Hangul unicode character. > > > > + * > > > > + * This function was adapted from sample code in section 3.12 of > > > > the > > > > + * Unicode Standard, version 10.0. > > > > + * > > > > + * Copyright (C) 1991-2017 Unicode, Inc. All rights > > > > reserved. Distributed > > > > + * under the Terms of Use in http://www.unicode.org/copyright.ht > > > > ml. > > > > + */ > > > > +static int decompose_hangul(wchar_t uc, u16 *result) > > > > +{ > > > > + int index; > > > > + int l, v, t; > > > > + > > > > + index = uc - Hangul_SBase; > > > > + if (index < 0 || index >= Hangul_SCount) > > > > + return 0; > > > > + > > > > + l = Hangul_LBase + index / Hangul_NCount; > > > > + v = Hangul_VBase + (index % Hangul_NCount) / > > > > Hangul_TCount; > > > > + t = Hangul_TBase + index % Hangul_TCount; > > > > + > > > > + result[0] = l; > > > > + result[1] = v; > > > > + if (t != Hangul_TBase) { > > > > + result[2] = t; > > > > + return 3; > > > > + } > > > > + return 2; > > > > +} > > > > + > > > > +/* Decomposes a single unicode character. */ > > > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 > > > > *hangul_buffer) > > > > +{ > > > > + u16 *result; > > > > + > > > > + /* Hangul is handled separately */ > > > > + result = hangul_buffer; > > > > + *size = decompose_hangul(uc, result); > > > Is it possible to distinguish the different cases beforehand and to > > > select one proper function for execution without the execution both > > > ones > > > in the worst case? > > The decompose_hangul and decompose_nonhangul functions are called > > only once, > > so I would expect the compiler to inline them both. The result should > > be > > similar to your suggestion. > > > > > It will be much better to have: > > If (hangful) { > <decompose_hangul> > } else { > <decompose_nonhangul> > } > > Currently, the code looks weird. Especially, decompose_hangul() is able > to distinguish two different cases. So, it's easy to extract the check. OK, your previous objection seemed to be about efficiency, not style. Moving the Hangul check outside the function would require a reader of decompose_unichar() to figure out what Hangul_SBase and Hangul_SCount are, when they may not be interested in the Hangul case at all. My version may be weird but I think it's easier to understand. Keep in mind that the old decompose_unichar function was also checking if the character was Hangul, and doing nothing if that was the case. So this is not a big departure from the existing code. Finally, there is the issue of the license; I don't know if I should be splitting decompose_hangul() since it would make it harder to give attribution. > > > > > > > > > > > + if (*size == 0) > > > > + /* not Hangul */ > > > > + result = decompose_nonhangul(uc, size); > > > I think it makes sense to use the brackets here. If I remember > > > correctly, this is the code style's requirement. > > I couldn't find such a requirement, but I did find some other places > > within > > the hfsplus code where the comments are done my way. For example, in > > the > > hfsplus_cat_write_inode() function of the inode.c file. > > > > > The hfsplus code contains many awful places. I'd say most of the older code is decent and easy to understand. > It doesn't mean that it > needs to follow the bad style examples. If you have only one string: > > if (condition) > <do_something> > > then it's OK not to use the brackets. Otherwise, you should include the > branch into the brackets. I looked around outside of hfsplus, and it seems that both styles are in use. The most charming example I found was in the rb_next_postorder() function of lib/rbtree.c, where one branch of a conditional uses your style and the other uses mine. If you can actually find the place in the syle guide where this is required then I will make the change. Otherwise I would prefer to leave it as it is. > Thanks, > Vyacheslav Dubeyko. > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2017-11-27 19:36 ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández 2017-11-27 22:40 ` Viacheslav Dubeyko @ 2018-08-23 18:29 ` Ernesto A. Fernández 2018-08-24 1:20 ` tchou 1 sibling, 1 reply; 21+ messages in thread From: Ernesto A. Fernández @ 2018-08-23 18:29 UTC (permalink / raw) To: tchou; +Cc: slava, linux-fsdevel, htl10 Hi again: On Mon, Nov 27, 2017 at 04:36:54PM -0300, Ernesto A. Fernández wrote: > It would be good to have some feedback on the use of code under the > Unicode License, I don't know if that can be a problem. > > To Ting-Chang Hou: This patch is slightly different from the one you > tested with your mac. You are still tagged as Tested-by, but you may > want to check again that I have not broken anything. > > Thanks, > Ernest > > -- >8 -- > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > Files created under macOS become essentially unusable under linux if > their names contain any Hangul, and vice versa. > > This happens because the normalization of Hangul characters is a special > case: it can be done algorithmically, without need of a table. The > hfsplus module deals with Hangul correctly when composing, but > completely forgets about it when performing a decomposition. > > Solve this by using the Hangul decomposition function provided in the > Unicode Standard. It's under the Unicode License, compatible with the > GPL. > > This patch will cause trouble for Hangul filenames already created by > the module in the past. This shouldn't be a special concern because the > main purpose of the module was always sharing with macOS. If a user > actually needs to access such a file the nodecompose mount option should > be enough. > > Reported-by: Ting-Chang Hou <tchou@synology.com> > Tested-by: Ting-Chang Hou <tchou@synology.com> > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> > It took a while but this patch has been merged now. Thanks again for your report. Ernest ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] hfsplus: fix decomposition of Hangul characters 2018-08-23 18:29 ` Ernesto A. Fernández @ 2018-08-24 1:20 ` tchou 0 siblings, 0 replies; 21+ messages in thread From: tchou @ 2018-08-24 1:20 UTC (permalink / raw) To: Ernesto A. Fernández Cc: slava, linux-fsdevel, htl10, linux-fsdevel-owner Hi, Ernesto A. Fernández 於 2018-08-24 02:29 寫到: > Hi again: > > On Mon, Nov 27, 2017 at 04:36:54PM -0300, Ernesto A. Fernández wrote: >> It would be good to have some feedback on the use of code under the >> Unicode License, I don't know if that can be a problem. >> >> To Ting-Chang Hou: This patch is slightly different from the one you >> tested with your mac. You are still tagged as Tested-by, but you may >> want to check again that I have not broken anything. >> >> Thanks, >> Ernest >> >> -- >8 -- >> Subject: [PATCH] hfsplus: fix decomposition of Hangul characters >> >> Files created under macOS become essentially unusable under linux if >> their names contain any Hangul, and vice versa. >> >> This happens because the normalization of Hangul characters is a >> special >> case: it can be done algorithmically, without need of a table. The >> hfsplus module deals with Hangul correctly when composing, but >> completely forgets about it when performing a decomposition. >> >> Solve this by using the Hangul decomposition function provided in the >> Unicode Standard. It's under the Unicode License, compatible with the >> GPL. >> >> This patch will cause trouble for Hangul filenames already created by >> the module in the past. This shouldn't be a special concern because >> the >> main purpose of the module was always sharing with macOS. If a user >> actually needs to access such a file the nodecompose mount option >> should >> be enough. >> >> Reported-by: Ting-Chang Hou <tchou@synology.com> >> Tested-by: Ting-Chang Hou <tchou@synology.com> >> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> >> > > It took a while but this patch has been merged now. Thanks again for > your > report. And thanks for your patch, too. > > Ernest ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name @ 2017-11-17 19:33 Slava Dubeyko 0 siblings, 0 replies; 21+ messages in thread From: Slava Dubeyko @ 2017-11-17 19:33 UTC (permalink / raw) To: tchou, Hin-Tak Leung; +Cc: linux-fsdevel Hi Ting-Chang, Sorry, I need to reapet my e-mail. Could you please share the examples of incorrect and corect behavior? Are you sure that we need in such fix because it is tne mount option. Hi Hin-Tak, Could you please test the fix? Thanks, Vyacheslav Dubeyko. >> From: Ting-Chang Hou <tchou@synology.com> >> Date: Fri, Nov 17, 2017 at 12:20 AM >> Subject: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name >> To: linux-fsdevel@vger.kernel.org >> Cc: Ting-Chang Hou <tchou@synology.com> >> >> The unicode of hangul from macOS is decomposed. There has a bug that >> mistake decomposed unicode for composed when change unicode to ascii, >> so it cannot recognize the hangul correctly. >> >> Signed-off-by: Ting-Chang Hou <tchou@synology.com> >> --- >> fs/hfsplus/unicode.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c >> index dfa90c2..2daf7b0 100644 >> --- a/fs/hfsplus/unicode.c >> +++ b/fs/hfsplus/unicode.c >> @@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb, >> ustrlen = be16_to_cpu(ustr->length); >> len = *len_p; >> ce1 = NULL; >> - compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); >> + compose = test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); >> >> while (ustrlen > 0) { >> c0 = be16_to_cpu(*ip++); >> -- >> 2.7.4 ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2018-08-24 4:53 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-11-17 8:20 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Ting-Chang Hou 2017-11-19 0:57 ` Ernesto A. Fernández 2017-11-23 3:57 ` tchou 2017-11-23 4:21 ` Viacheslav Dubeyko 2017-11-23 6:05 ` tchou 2017-11-23 6:23 ` Viacheslav Dubeyko 2017-11-23 6:34 ` tchou 2017-11-23 11:32 ` Ernesto A. Fernández 2017-11-23 18:36 ` Viacheslav Dubeyko 2017-11-23 22:20 ` Ernesto A. Fernández 2017-11-24 7:25 ` tchou 2017-11-24 11:45 ` Ernesto A. Fernández 2017-11-27 2:07 ` tchou 2017-11-27 19:36 ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández 2017-11-27 22:40 ` Viacheslav Dubeyko 2017-11-28 15:02 ` Ernesto A. Fernández 2017-11-28 16:30 ` Viacheslav Dubeyko 2017-11-28 18:15 ` Ernesto A. Fernández 2018-08-23 18:29 ` Ernesto A. Fernández 2018-08-24 1:20 ` tchou 2017-11-17 19:33 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Slava Dubeyko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.