linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hfsplus: fix decomposition of Hangul characters
@ 2018-07-17 22:09 Ernesto A. Fernández
  0 siblings, 0 replies; 19+ messages in thread
From: Ernesto A. Fernández @ 2018-07-17 22:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Andrew Morton, Ting-Chang Hou

Files created under macOS cannot be opened under linux if their names
contain Korean characters, and vice versa.

The Korean alphabet is special because its normalization is done without
a table. The module deals with it correctly when composing, but forgets
about it for the decomposition.

Fix this using the Hangul decomposition function provided in the Unicode
Standard. The code fits a bit awkwardly because it requires a buffer,
while all the other normalizations are returned as pointers to the
decomposition table. This is actually also a bug because reordering may
still be needed, but for now leave it as it is.

The patch will cause trouble for Hangul filenames already created by the
module in the past. This shouldn't really be concern because its main
purpose was always sharing with macOS. If a user actually needs to
access such a file the nodecompose mount option should be enough.

Reported-and-tested-by: Ting-Chang Hou <tchou@synology.com>
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
---
 fs/hfsplus/unicode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
index dfa90c21948f..c8d1b2be7854 100644
--- a/fs/hfsplus/unicode.c
+++ b/fs/hfsplus/unicode.c
@@ -272,8 +272,8 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
 	return size;
 }
 
-/* Decomposes a single unicode character. */
-static inline u16 *decompose_unichar(wchar_t uc, int *size)
+/* Decomposes a non-Hangul unicode character. */
+static u16 *hfsplus_decompose_nonhangul(wchar_t uc, int *size)
 {
 	int off;
 
@@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size)
 	return hfsplus_decompose_table + (off / 4);
 }
 
+/*
+ * Try to decompose a unicode character as Hangul. Return 0 if @uc is not
+ * precomposed Hangul, otherwise return the length of the decomposition.
+ *
+ * This function was adapted from sample code from the Unicode Standard
+ * Annex #15: Unicode Normalization Forms, version 3.2.0.
+ *
+ * Copyright (C) 1991-2018 Unicode, Inc.  All rights reserved.  Distributed
+ * under the Terms of Use in http://www.unicode.org/copyright.html.
+ */
+static int hfsplus_try_decompose_hangul(wchar_t uc, u16 *result)
+{
+	int index;
+	int l, v, t;
+
+	index = uc - Hangul_SBase;
+	if (index < 0 || index >= Hangul_SCount)
+		return 0;
+
+	l = Hangul_LBase + index / Hangul_NCount;
+	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
+	t = Hangul_TBase + index % Hangul_TCount;
+
+	result[0] = l;
+	result[1] = v;
+	if (t != Hangul_TBase) {
+		result[2] = t;
+		return 3;
+	}
+	return 2;
+}
+
+/* Decomposes a single unicode character. */
+static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer)
+{
+	u16 *result;
+
+	/* Hangul is handled separately */
+	result = hangul_buffer;
+	*size = hfsplus_try_decompose_hangul(uc, result);
+	if (*size == 0)
+		result = hfsplus_decompose_nonhangul(uc, size);
+	return result;
+}
+
 int hfsplus_asc2uni(struct super_block *sb,
 		    struct hfsplus_unistr *ustr, int max_unistr_len,
 		    const char *astr, int len)
@@ -303,13 +348,14 @@ int hfsplus_asc2uni(struct super_block *sb,
 	int size, dsize, decompose;
 	u16 *dstr, outlen = 0;
 	wchar_t c;
+	u16 dhangul[3];
 
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
 	while (outlen < max_unistr_len && len > 0) {
 		size = asc2unichar(sb, astr, len, &c);
 
 		if (decompose)
-			dstr = decompose_unichar(c, &dsize);
+			dstr = decompose_unichar(c, &dsize, dhangul);
 		else
 			dstr = NULL;
 		if (dstr) {
@@ -344,6 +390,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str)
 	unsigned long hash;
 	wchar_t c;
 	u16 c2;
+	u16 dhangul[3];
 
 	casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
@@ -357,7 +404,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str)
 		len -= size;
 
 		if (decompose)
-			dstr = decompose_unichar(c, &dsize);
+			dstr = decompose_unichar(c, &dsize, dhangul);
 		else
 			dstr = NULL;
 		if (dstr) {
@@ -396,6 +443,7 @@ int hfsplus_compare_dentry(const struct dentry *dentry,
 	const char *astr1, *astr2;
 	u16 c1, c2;
 	wchar_t c;
+	u16 dhangul_1[3], dhangul_2[3];
 
 	casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
@@ -413,7 +461,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry,
 			len1 -= size;
 
 			if (decompose)
-				dstr1 = decompose_unichar(c, &dsize1);
+				dstr1 = decompose_unichar(c, &dsize1,
+							  dhangul_1);
 			if (!decompose || !dstr1) {
 				c1 = c;
 				dstr1 = &c1;
@@ -427,7 +476,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry,
 			len2 -= size;
 
 			if (decompose)
-				dstr2 = decompose_unichar(c, &dsize2);
+				dstr2 = decompose_unichar(c, &dsize2,
+							  dhangul_2);
 			if (!decompose || !dstr2) {
 				c2 = c;
 				dstr2 = &c2;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread
[parent not found: <748914583.7912009.1511948801051.ref@mail.yahoo.com>]
[parent not found: <1691800505.7838608.1511945866988.ref@mail.yahoo.com>]
[parent not found: <1047295440.7535049.1511905720846.ref@mail.yahoo.com>]
[parent not found: <114108759.7167494.1511886580033.ref@mail.yahoo.com>]
* [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name
@ 2017-11-17  8:20 Ting-Chang Hou
  2017-11-19  0:57 ` Ernesto A. Fernández
  0 siblings, 1 reply; 19+ messages in thread
From: Ting-Chang Hou @ 2017-11-17  8:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Ting-Chang Hou

The unicode of hangul from macOS is decomposed. There has a bug that
mistake decomposed unicode for composed when change unicode to ascii,
so it cannot recognize the hangul correctly.

Signed-off-by: Ting-Chang Hou <tchou@synology.com>
---
 fs/hfsplus/unicode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
index dfa90c2..2daf7b0 100644
--- a/fs/hfsplus/unicode.c
+++ b/fs/hfsplus/unicode.c
@@ -135,7 +135,7 @@ int hfsplus_uni2asc(struct super_block *sb,
 	ustrlen = be16_to_cpu(ustr->length);
 	len = *len_p;
 	ce1 = NULL;
-	compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
+	compose = test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
 
 	while (ustrlen > 0) {
 		c0 = be16_to_cpu(*ip++);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-08-24  4:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1999104027.7153972.1511885718712.ref@mail.yahoo.com>
2017-11-28 16:15 ` [PATCH] hfsplus: fix decomposition of Hangul characters Hin-Tak Leung
2017-11-28 17:50   ` Viacheslav Dubeyko
2017-11-28 18:51   ` Ernesto A. Fernández
2018-07-17 22:09 Ernesto A. Fernández
     [not found] <748914583.7912009.1511948801051.ref@mail.yahoo.com>
2017-11-29  9:46 ` Hin-Tak Leung
     [not found] <1691800505.7838608.1511945866988.ref@mail.yahoo.com>
2017-11-29  8:57 ` Hin-Tak Leung
2017-11-29 14:42   ` Ernesto A. Fernández
2017-11-29 16:28     ` Viacheslav Dubeyko
2017-11-30  4:03       ` Ernesto A. Fernández
     [not found] <1047295440.7535049.1511905720846.ref@mail.yahoo.com>
2017-11-28 21:48 ` Hin-Tak Leung
2017-11-28 23:10   ` Ernesto A. Fernández
     [not found] <114108759.7167494.1511886580033.ref@mail.yahoo.com>
2017-11-28 16:29 ` Hin-Tak Leung
  -- strict thread matches above, loose matches on Subject: below --
2017-11-17  8:20 [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name Ting-Chang Hou
2017-11-19  0:57 ` Ernesto A. Fernández
2017-11-23  3:57   ` tchou
2017-11-23 11:32     ` Ernesto A. Fernández
2017-11-24  7:25       ` tchou
2017-11-24 11:45         ` Ernesto A. Fernández
2017-11-27  2:07           ` tchou
2017-11-27 19:36             ` [PATCH] hfsplus: fix decomposition of Hangul characters Ernesto A. Fernández
2017-11-27 22:40               ` Viacheslav Dubeyko
2017-11-28 15:02                 ` Ernesto A. Fernández
2017-11-28 16:30                   ` Viacheslav Dubeyko
2017-11-28 18:15                     ` Ernesto A. Fernández
2018-08-23 18:29               ` Ernesto A. Fernández
2018-08-24  1:20                 ` tchou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).