linux-ext4.vger.kernel.org archive mirror
* [PATCH v5 0/4] Make UTF-8 encoding loadable
@ 2021-03-29 20:42 Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 1/4] fs: unicode: Use strscpy() instead of strncpy() Shreeya Patel
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 20:42 UTC
  To: tytso, adilger.kernel, jaegeuk, chao, krisman, ebiggers, drosen,
	ebiggers, yuchao0
  Cc: linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie for the unicode normalization functions, and it is not
necessary to carry this large table in the kernel when no filesystem is
using it.
The goal is to make the UTF-8 encoding loadable by converting it into a
module and adding a layer between the filesystems and the utf8 module. This
layer will load the module whenever any filesystem that needs unicode is
mounted.

The first patch in the series resolves a warning reported by the kernel
test robot by using strscpy() instead of strncpy().

Unicode is the subsystem and utf8 is a character encoding for the
subsystem. The second and third patches therefore rename the functions and
the file from utf8 to unicode, to make the difference between the UTF-8
module and the unicode layer easier to see.

The last patch in the series adds the layer and the utf8 module. It also
uses static calls, which give a performance benefit compared to indirect
calls through function pointers.

---
Changes in v5
  - Remove patch which adds NULL check in ext4/super.c and f2fs/super.c
    before calling unicode_unload().
  - Rename global variables and default static call functions for better
    understanding
  - Make only config UNICODE_UTF8 visible and config UNICODE to be always
    enabled provided UNICODE_UTF8 is enabled.  
  - Improve the documentation for Kconfig
  - Improve the commit message.
 
Changes in v4
  - Return error from the static calls instead of doing nothing and
    succeeding even without loading the module.
  - Remove the complete usage of utf8_ops and use static calls at all
    places.
  - Restore the static calls to default values when module is unloaded.
  - Decrement the reference of module after calling the unload function.
  - Remove spinlock as there will be no race conditions after removing
    utf8_ops.

Changes in v3
  - Add a patch which checks if utf8 is loaded before calling utf8_unload()
    in ext4 and f2fs filesystems
  - Return error if strscpy() returns value < 0
  - Correct the conditions to prevent NULL pointer dereference while
    accessing functions via utf8_ops variable.
  - Add spinlock to avoid race conditions.
  - Use static_call() for preventing speculative execution attacks.

Changes in v2
  - Remove the duplicate file from the last patch.
  - Make the wrapper functions inline.
  - Remove msleep and use try_module_get() and module_put()
    for ensuring that module is loaded correctly and also
    doesn't get unloaded while in use.
  - Resolve the warning reported by kernel test robot.
  - Resolve all the checkpatch.pl warnings.

Shreeya Patel (4):
  fs: unicode: Use strscpy() instead of strncpy()
  fs: unicode: Rename function names from utf8 to unicode
  fs: unicode: Rename utf8-core file to unicode-core
  fs: unicode: Add utf8 module and a unicode layer

 fs/ext4/hash.c                             |   2 +-
 fs/ext4/namei.c                            |  12 +-
 fs/ext4/super.c                            |   6 +-
 fs/f2fs/dir.c                              |  12 +-
 fs/f2fs/super.c                            |   6 +-
 fs/libfs.c                                 |   6 +-
 fs/unicode/Kconfig                         |  17 ++-
 fs/unicode/Makefile                        |   5 +-
 fs/unicode/unicode-core.c                  |  80 +++++++++++++
 fs/unicode/{utf8-core.c => unicode-utf8.c} |  90 +++++++++------
 fs/unicode/utf8-selftest.c                 |   8 +-
 include/linux/unicode.h                    | 127 ++++++++++++++++++---
 12 files changed, 291 insertions(+), 80 deletions(-)
 create mode 100644 fs/unicode/unicode-core.c
 rename fs/unicode/{utf8-core.c => unicode-utf8.c} (59%)

-- 
2.30.1



* [PATCH v5 1/4] fs: unicode: Use strscpy() instead of strncpy()
  2021-03-29 20:42 [PATCH v5 0/4] Make UTF-8 encoding loadable Shreeya Patel
@ 2021-03-29 20:42 ` Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode Shreeya Patel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 20:42 UTC
  To: tytso, adilger.kernel, jaegeuk, chao, krisman, ebiggers, drosen,
	ebiggers, yuchao0
  Cc: linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida, kernel test robot

The following warning was reported by the kernel test robot.

In function 'utf8_parse_version',
inlined from 'utf8_load' at fs/unicode/utf8mod.c:195:7:
>> fs/unicode/utf8mod.c:175:2: warning: 'strncpy' specified bound 12 equals
destination size [-Wstringop-truncation]
175 |  strncpy(version_string, version, sizeof(version_string));
    |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The -Wstringop-truncation warning highlights unintended uses of strncpy()
that drop the terminating NUL character when the source string does not
fit in the destination.
Unlike strncpy(), strscpy() always NUL-terminates the destination string
and returns an error when the source is truncated, hence use strscpy()
instead of strncpy().
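
The difference can be illustrated outside the kernel. The following is a
minimal userspace sketch, not the kernel implementation: sketch_strscpy()
is a hypothetical stand-in that only mimics the documented contract of
strscpy() (always terminate, return the copied length or -E2BIG on
truncation).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for strscpy(): copy at most size - 1
 * bytes from src, always NUL-terminate dst, and return the number of
 * bytes copied, or -E2BIG if size is 0 or src did not fit.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Length of src, looking at no more than size bytes. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* src plus its NUL does not fit: truncate but terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len + 1);	/* copies the terminating NUL too */
	return (long)len;
}
```

With a 12-byte buffer such as version_string, a 12-character input is
reported as an error and the buffer still ends in a NUL, whereas strncpy()
with the same bound would fill all 12 bytes and leave the buffer
unterminated.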

Fixes: 9d53690f0d4e5 ("unicode: implement higher level API for string handling")
Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 fs/unicode/utf8-core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index dc25823bfed9..f9e6a2718aba 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -179,8 +179,10 @@ static int utf8_parse_version(const char *version, unsigned int *maj,
 		{1, "%d.%d.%d"},
 		{0, NULL}
 	};
+	int ret = strscpy(version_string, version, sizeof(version_string));
 
-	strncpy(version_string, version, sizeof(version_string));
+	if (ret < 0)
+		return ret;
 
 	if (match_token(version_string, token, args) != 1)
 		return -EINVAL;
-- 
2.30.1



* [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode
  2021-03-29 20:42 [PATCH v5 0/4] Make UTF-8 encoding loadable Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 1/4] fs: unicode: Use strscpy() instead of strncpy() Shreeya Patel
@ 2021-03-29 20:42 ` Shreeya Patel
  2021-03-30  1:53   ` Eric Biggers
  2021-03-29 20:42 ` [PATCH v5 3/4] fs: unicode: Rename utf8-core file to unicode-core Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
  3 siblings, 1 reply; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 20:42 UTC
  To: tytso, adilger.kernel, jaegeuk, chao, krisman, ebiggers, drosen,
	ebiggers, yuchao0
  Cc: linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie for the unicode normalization functions, and it is not
necessary to carry this large table in the kernel when no filesystem is
using it.
The goal is to make the UTF-8 encoding loadable by converting it into a
module and adding a unicode subsystem layer between the filesystems and the
utf8 module.
This layer will load the module whenever any filesystem that needs unicode
is mounted.
utf8-core will be converted into this layer file in later patches, so
rename its functions from utf8 to unicode to mark them as unicode subsystem
layer functions. This is also the first step towards transforming the
utf8-core file into the unicode subsystem layer file.

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
Changes in v5
  - Improve the commit message.

 fs/ext4/hash.c             |  2 +-
 fs/ext4/namei.c            | 12 ++++----
 fs/ext4/super.c            |  6 ++--
 fs/f2fs/dir.c              | 12 ++++----
 fs/f2fs/super.c            |  6 ++--
 fs/libfs.c                 |  6 ++--
 fs/unicode/utf8-core.c     | 57 +++++++++++++++++++-------------------
 fs/unicode/utf8-selftest.c |  8 +++---
 include/linux/unicode.h    | 32 ++++++++++-----------
 9 files changed, 70 insertions(+), 71 deletions(-)

diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index a92eb79de0cc..8890a76abe86 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -285,7 +285,7 @@ int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
 		if (!buff)
 			return -ENOMEM;
 
-		dlen = utf8_casefold(um, &qstr, buff, PATH_MAX);
+		dlen = unicode_casefold(um, &qstr, buff, PATH_MAX);
 		if (dlen < 0) {
 			kfree(buff);
 			goto opaque_seq;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 686bf982c84e..dde5ce795416 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1290,9 +1290,9 @@ int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
 	int ret;
 
 	if (quick)
-		ret = utf8_strncasecmp_folded(um, name, entry);
+		ret = unicode_strncasecmp_folded(um, name, entry);
 	else
-		ret = utf8_strncasecmp(um, name, entry);
+		ret = unicode_strncasecmp(um, name, entry);
 
 	if (ret < 0) {
 		/* Handle invalid character sequence as either an error
@@ -1324,9 +1324,9 @@ void ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
 	if (!cf_name->name)
 		return;
 
-	len = utf8_casefold(dir->i_sb->s_encoding,
-			    iname, cf_name->name,
-			    EXT4_NAME_LEN);
+	len = unicode_casefold(dir->i_sb->s_encoding,
+			       iname, cf_name->name,
+			       EXT4_NAME_LEN);
 	if (len <= 0) {
 		kfree(cf_name->name);
 		cf_name->name = NULL;
@@ -2201,7 +2201,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 
 #ifdef CONFIG_UNICODE
 	if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
-	    sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name))
+	    sb->s_encoding && unicode_validate(sb->s_encoding, &dentry->d_name))
 		return -EINVAL;
 #endif
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ad34a37278cd..2fb845752c90 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1259,7 +1259,7 @@ static void ext4_put_super(struct super_block *sb)
 	fs_put_dax(sbi->s_daxdev);
 	fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 	kfree(sbi);
 }
@@ -4304,7 +4304,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount;
 		}
 
-		encoding = utf8_load(encoding_info->version);
+		encoding = unicode_load(encoding_info->version);
 		if (IS_ERR(encoding)) {
 			ext4_msg(sb, KERN_ERR,
 				 "can't mount with superblock charset: %s-%s "
@@ -5165,7 +5165,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		crypto_free_shash(sbi->s_chksum_driver);
 
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 
 #ifdef CONFIG_QUOTA
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index e6270a867be1..f160f9dd667d 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -84,10 +84,10 @@ int f2fs_init_casefolded_name(const struct inode *dir,
 						   GFP_NOFS);
 		if (!fname->cf_name.name)
 			return -ENOMEM;
-		fname->cf_name.len = utf8_casefold(sb->s_encoding,
-						   fname->usr_fname,
-						   fname->cf_name.name,
-						   F2FS_NAME_LEN);
+		fname->cf_name.len = unicode_casefold(sb->s_encoding,
+						      fname->usr_fname,
+						      fname->cf_name.name,
+						      F2FS_NAME_LEN);
 		if ((int)fname->cf_name.len <= 0) {
 			kfree(fname->cf_name.name);
 			fname->cf_name.name = NULL;
@@ -237,7 +237,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
 		entry.len = decrypted_name.len;
 	}
 
-	res = utf8_strncasecmp_folded(um, name, &entry);
+	res = unicode_strncasecmp_folded(um, name, &entry);
 	/*
 	 * In strict mode, ignore invalid names.  In non-strict mode,
 	 * fall back to treating them as opaque byte sequences.
@@ -246,7 +246,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
 		res = name->len == entry.len &&
 				memcmp(name->name, entry.name, name->len) == 0;
 	} else {
-		/* utf8_strncasecmp_folded returns 0 on match */
+		/* unicode_strncasecmp_folded returns 0 on match */
 		res = (res == 0);
 	}
 out:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 7069793752f1..b4a92e763e27 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1430,7 +1430,7 @@ static void f2fs_put_super(struct super_block *sb)
 	for (i = 0; i < NR_PAGE_TYPE; i++)
 		kvfree(sbi->write_io[i]);
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 	kfree(sbi);
 }
@@ -3560,7 +3560,7 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi)
 			return -EINVAL;
 		}
 
-		encoding = utf8_load(encoding_info->version);
+		encoding = unicode_load(encoding_info->version);
 		if (IS_ERR(encoding)) {
 			f2fs_err(sbi,
 				 "can't mount with superblock charset: %s-%s "
@@ -4073,7 +4073,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 		kvfree(sbi->write_io[i]);
 
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 	sb->s_encoding = NULL;
 #endif
 free_options:
diff --git a/fs/libfs.c b/fs/libfs.c
index e2de5401abca..766556165bb5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1404,7 +1404,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename.  If this happens, the VFS will eventually retry
 	 * the lookup, so it doesn't matter what ->d_compare() returns.
-	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
+	 * However, it's unsafe to call unicode_strncasecmp() with an unstable
 	 * string.  Therefore, we have to copy the name into a temporary buffer.
 	 */
 	if (len <= DNAME_INLINE_LEN - 1) {
@@ -1414,7 +1414,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	ret = utf8_strncasecmp(um, name, &qstr);
+	ret = unicode_strncasecmp(um, name, &qstr);
 	if (ret >= 0)
 		return ret;
 
@@ -1443,7 +1443,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	if (!dir || !needs_casefold(dir))
 		return 0;
 
-	ret = utf8_casefold_hash(um, dentry, str);
+	ret = unicode_casefold_hash(um, dentry, str);
 	if (ret < 0 && sb_has_strict_encoding(sb))
 		return -EINVAL;
 	return 0;
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index f9e6a2718aba..730dbaedf593 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -10,7 +10,7 @@
 
 #include "utf8n.h"
 
-int utf8_validate(const struct unicode_map *um, const struct qstr *str)
+int unicode_validate(const struct unicode_map *um, const struct qstr *str)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 
@@ -18,10 +18,10 @@ int utf8_validate(const struct unicode_map *um, const struct qstr *str)
 		return -1;
 	return 0;
 }
-EXPORT_SYMBOL(utf8_validate);
+EXPORT_SYMBOL(unicode_validate);
 
-int utf8_strncmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2)
+int unicode_strncmp(const struct unicode_map *um,
+		    const struct qstr *s1, const struct qstr *s2)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 	struct utf8cursor cur1, cur2;
@@ -45,10 +45,10 @@ int utf8_strncmp(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncmp);
+EXPORT_SYMBOL(unicode_strncmp);
 
-int utf8_strncasecmp(const struct unicode_map *um,
-		     const struct qstr *s1, const struct qstr *s2)
+int unicode_strncasecmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur1, cur2;
@@ -72,14 +72,14 @@ int utf8_strncasecmp(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncasecmp);
+EXPORT_SYMBOL(unicode_strncasecmp);
 
 /* String cf is expected to be a valid UTF-8 casefolded
  * string.
  */
-int utf8_strncasecmp_folded(const struct unicode_map *um,
-			    const struct qstr *cf,
-			    const struct qstr *s1)
+int unicode_strncasecmp_folded(const struct unicode_map *um,
+			       const struct qstr *cf,
+			       const struct qstr *s1)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur1;
@@ -100,10 +100,10 @@ int utf8_strncasecmp_folded(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncasecmp_folded);
+EXPORT_SYMBOL(unicode_strncasecmp_folded);
 
-int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
-		  unsigned char *dest, size_t dlen)
+int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+		     unsigned char *dest, size_t dlen)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur;
@@ -123,10 +123,10 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
-EXPORT_SYMBOL(utf8_casefold);
+EXPORT_SYMBOL(unicode_casefold);
 
-int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
-		       struct qstr *str)
+int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+			  struct qstr *str)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur;
@@ -144,10 +144,10 @@ int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
 	str->hash = end_name_hash(hash);
 	return 0;
 }
-EXPORT_SYMBOL(utf8_casefold_hash);
+EXPORT_SYMBOL(unicode_casefold_hash);
 
-int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
-		   unsigned char *dest, size_t dlen)
+int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+		      unsigned char *dest, size_t dlen)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 	struct utf8cursor cur;
@@ -167,11 +167,10 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
+EXPORT_SYMBOL(unicode_normalize);
 
-EXPORT_SYMBOL(utf8_normalize);
-
-static int utf8_parse_version(const char *version, unsigned int *maj,
-			      unsigned int *min, unsigned int *rev)
+static int unicode_parse_version(const char *version, unsigned int *maj,
+				 unsigned int *min, unsigned int *rev)
 {
 	substring_t args[3];
 	char version_string[12];
@@ -194,7 +193,7 @@ static int utf8_parse_version(const char *version, unsigned int *maj,
 	return 0;
 }
 
-struct unicode_map *utf8_load(const char *version)
+struct unicode_map *unicode_load(const char *version)
 {
 	struct unicode_map *um = NULL;
 	int unicode_version;
@@ -202,7 +201,7 @@ struct unicode_map *utf8_load(const char *version)
 	if (version) {
 		unsigned int maj, min, rev;
 
-		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
+		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
 			return ERR_PTR(-EINVAL);
 
 		if (!utf8version_is_supported(maj, min, rev))
@@ -227,12 +226,12 @@ struct unicode_map *utf8_load(const char *version)
 
 	return um;
 }
-EXPORT_SYMBOL(utf8_load);
+EXPORT_SYMBOL(unicode_load);
 
-void utf8_unload(struct unicode_map *um)
+void unicode_unload(struct unicode_map *um)
 {
 	kfree(um);
 }
-EXPORT_SYMBOL(utf8_unload);
+EXPORT_SYMBOL(unicode_unload);
 
 MODULE_LICENSE("GPL v2");
diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c
index 6fe8af7edccb..796c1ed922ea 100644
--- a/fs/unicode/utf8-selftest.c
+++ b/fs/unicode/utf8-selftest.c
@@ -235,7 +235,7 @@ static void check_utf8_nfdicf(void)
 static void check_utf8_comparisons(void)
 {
 	int i;
-	struct unicode_map *table = utf8_load("12.1.0");
+	struct unicode_map *table = unicode_load("12.1.0");
 
 	if (IS_ERR(table)) {
 		pr_err("%s: Unable to load utf8 %d.%d.%d. Skipping.\n",
@@ -249,7 +249,7 @@ static void check_utf8_comparisons(void)
 		const struct qstr s2 = {.name = nfdi_test_data[i].dec,
 					.len = sizeof(nfdi_test_data[i].dec)};
 
-		test_f(!utf8_strncmp(table, &s1, &s2),
+		test_f(!unicode_strncmp(table, &s1, &s2),
 		       "%s %s comparison mismatch\n", s1.name, s2.name);
 	}
 
@@ -259,11 +259,11 @@ static void check_utf8_comparisons(void)
 		const struct qstr s2 = {.name = nfdicf_test_data[i].ncf,
 					.len = sizeof(nfdicf_test_data[i].ncf)};
 
-		test_f(!utf8_strncasecmp(table, &s1, &s2),
+		test_f(!unicode_strncasecmp(table, &s1, &s2),
 		       "%s %s comparison mismatch\n", s1.name, s2.name);
 	}
 
-	utf8_unload(table);
+	unicode_unload(table);
 }
 
 static void check_supported_versions(void)
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 74484d44c755..de23f9ee720b 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -10,27 +10,27 @@ struct unicode_map {
 	int version;
 };
 
-int utf8_validate(const struct unicode_map *um, const struct qstr *str);
+int unicode_validate(const struct unicode_map *um, const struct qstr *str);
 
-int utf8_strncmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2);
+int unicode_strncmp(const struct unicode_map *um,
+		    const struct qstr *s1, const struct qstr *s2);
 
-int utf8_strncasecmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2);
-int utf8_strncasecmp_folded(const struct unicode_map *um,
-			    const struct qstr *cf,
-			    const struct qstr *s1);
+int unicode_strncasecmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2);
+int unicode_strncasecmp_folded(const struct unicode_map *um,
+			       const struct qstr *cf,
+			       const struct qstr *s1);
 
-int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
-		   unsigned char *dest, size_t dlen);
+int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+		      unsigned char *dest, size_t dlen);
 
-int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
-		  unsigned char *dest, size_t dlen);
+int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+		     unsigned char *dest, size_t dlen);
 
-int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
-		       struct qstr *str);
+int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+			  struct qstr *str);
 
-struct unicode_map *utf8_load(const char *version);
-void utf8_unload(struct unicode_map *um);
+struct unicode_map *unicode_load(const char *version);
+void unicode_unload(struct unicode_map *um);
 
 #endif /* _LINUX_UNICODE_H */
-- 
2.30.1



* [PATCH v5 3/4] fs: unicode: Rename utf8-core file to unicode-core
  2021-03-29 20:42 [PATCH v5 0/4] Make UTF-8 encoding loadable Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 1/4] fs: unicode: Use strscpy() instead of strncpy() Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode Shreeya Patel
@ 2021-03-29 20:42 ` Shreeya Patel
  2021-03-29 20:42 ` [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
  3 siblings, 0 replies; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 20:42 UTC
  To: tytso, adilger.kernel, jaegeuk, chao, krisman, ebiggers, drosen,
	ebiggers, yuchao0
  Cc: linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie for the unicode normalization functions, and it is not
necessary to carry this large table in the kernel when no filesystem is
using it.
The goal is to make the UTF-8 encoding loadable by converting it into a
module and adding a unicode subsystem layer between the filesystems and the
utf8 module.
This layer will load the module whenever any filesystem that needs unicode
is mounted.
Rename the file from utf8-core to unicode-core as a step towards
transforming it into the unicode subsystem layer file, and to make its role
easier to understand.
The implementation that makes unicode-core act as the layer will be added
in later patches.

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
Changes in v5
  - Improve the commit message.

 fs/unicode/Makefile                        | 2 +-
 fs/unicode/{utf8-core.c => unicode-core.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename fs/unicode/{utf8-core.c => unicode-core.c} (100%)

diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index b88aecc86550..fbf9a629ed0d 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -3,7 +3,7 @@
 obj-$(CONFIG_UNICODE) += unicode.o
 obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
-unicode-y := utf8-norm.o utf8-core.o
+unicode-y := utf8-norm.o unicode-core.o
 
 $(obj)/utf8-norm.o: $(obj)/utf8data.h
 
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/unicode-core.c
similarity index 100%
rename from fs/unicode/utf8-core.c
rename to fs/unicode/unicode-core.c
-- 
2.30.1



* [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-29 20:42 [PATCH v5 0/4] Make UTF-8 encoding loadable Shreeya Patel
                   ` (2 preceding siblings ...)
  2021-03-29 20:42 ` [PATCH v5 3/4] fs: unicode: Rename utf8-core file to unicode-core Shreeya Patel
@ 2021-03-29 20:42 ` Shreeya Patel
  2021-03-29 21:20   ` Gabriel Krisman Bertazi
  2021-03-30  2:01   ` Eric Biggers
  3 siblings, 2 replies; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 20:42 UTC
  To: tytso, adilger.kernel, jaegeuk, chao, krisman, ebiggers, drosen,
	ebiggers, yuchao0
  Cc: linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie for the unicode normalization functions.
It is not necessary to load this large table in the kernel if no
filesystem is using it, hence make the UTF-8 encoding loadable by
converting it into a module.
Modify the unicode-core file so that it acts as a layer for the unicode
subsystem. It will load the UTF-8 module and access its functions
whenever any filesystem that needs unicode is mounted.
Also, indirect calls through function pointers are slow, so use static
calls to avoid the overhead of repeated indirect calls. Static calls
improve performance by calling the functions directly instead of
indirectly.
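
The lifecycle of such a call target can be sketched in userspace. This is
an analogy only, under stated assumptions: it uses a plain function
pointer, so it does not reproduce the direct-call performance benefit that
static calls get from patching the call sites. All sketch_* names, the
simplified signature, the toy validity check and the -EIO default are
hypothetical and are not taken from this patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified signature; the real functions take a
 * struct unicode_map and a struct qstr. */
typedef int (*sketch_validate_fn)(const char *name, size_t len);

/* Default target while the encoding module is absent: fail instead of
 * silently succeeding. The error code is an arbitrary choice here. */
static int sketch_validate_default(const char *name, size_t len)
{
	(void)name;
	(void)len;
	return -EIO;
}

/* Toy "module" implementation: reject the byte 0xFF, which never occurs
 * in valid UTF-8. This is not a real UTF-8 validator. */
static int sketch_utf8_validate(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((unsigned char)name[i] == 0xFF)
			return -1;
	return 0;
}

/* Current call target; starts out pointing at the default. */
static sketch_validate_fn sketch_validate_target = sketch_validate_default;

/* Module load: point the call at the real implementation. */
static void sketch_module_loaded(void)
{
	sketch_validate_target = sketch_utf8_validate;
}

/* Module unload: restore the default so later calls fail cleanly. */
static void sketch_module_unloaded(void)
{
	sketch_validate_target = sketch_validate_default;
}

/* Layer entry point that a filesystem would call. */
static int sketch_unicode_validate(const char *name, size_t len)
{
	return sketch_validate_target(name, len);
}
```

Before sketch_module_loaded() runs, every call through the layer returns
the default error. After it runs, calls reach the module implementation,
and sketch_module_unloaded() restores the default, which mirrors the v4
change of restoring the static calls when the module is unloaded.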

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
Changes in v5
  - Rename global variables and default static call functions for better
    understanding
  - Make only config UNICODE_UTF8 visible and config UNICODE to be always
    enabled provided UNICODE_UTF8 is enabled.  
  - Improve the documentation for Kconfig
  - Improve the commit message.
 
Changes in v4
  - Return error from the static calls instead of doing nothing and
    succeeding even without loading the module.
  - Remove the complete usage of utf8_ops and use static calls at all
    places.
  - Restore the static calls to default values when module is unloaded.
  - Decrement the reference of module after calling the unload function.
  - Remove spinlock as there will be no race conditions after removing
    utf8_ops.

Changes in v3
  - Add a patch which checks if utf8 is loaded before calling utf8_unload()
    in ext4 and f2fs filesystems
  - Return error if strscpy() returns value < 0
  - Correct the conditions to prevent NULL pointer dereference while
    accessing functions via utf8_ops variable.
  - Add spinlock to avoid race conditions.
  - Use static_call() for preventing speculative execution attacks.

Changes in v2
  - Remove the duplicate file from the last patch.
  - Make the wrapper functions inline.
  - Remove msleep and use try_module_get() and module_put()
    for ensuring that module is loaded correctly and also
    doesn't get unloaded while in use.
  - Resolve the warning reported by kernel test robot.
  - Resolve all the checkpatch.pl warnings.


 fs/unicode/Kconfig        |  17 ++-
 fs/unicode/Makefile       |   5 +-
 fs/unicode/unicode-core.c | 241 +++++++----------------------------
 fs/unicode/unicode-utf8.c | 256 ++++++++++++++++++++++++++++++++++++++
 include/linux/unicode.h   | 123 +++++++++++++++---
 5 files changed, 426 insertions(+), 216 deletions(-)
 create mode 100644 fs/unicode/unicode-utf8.c

diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
index 2c27b9a5cd6c..ad4b837f2eb2 100644
--- a/fs/unicode/Kconfig
+++ b/fs/unicode/Kconfig
@@ -2,13 +2,26 @@
 #
 # UTF-8 normalization
 #
+# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
+# is enabled. This config option adds the unicode subsystem layer which loads
+# the UTF-8 module whenever any filesystem needs it.
 config UNICODE
-	bool "UTF-8 normalization and casefolding support"
+	bool
+
+# utf8data.h_shipped has a large database table which is an auto-generated
+# decodification trie for the unicode normalization functions and it is not
+# necessary to carry this large table in the kernel.
+# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
+# module and this module will be loaded by the unicode subsystem layer only
+# when any filesystem needs it.
+config UNICODE_UTF8
+	tristate "UTF-8 module"
 	help
 	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
 	  support.
+	select UNICODE
 
 config UNICODE_NORMALIZATION_SELFTEST
 	tristate "Test UTF-8 normalization support"
-	depends on UNICODE
+	depends on UNICODE_UTF8
 	default n
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index fbf9a629ed0d..49d50083e6ee 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -1,11 +1,14 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_UNICODE) += unicode.o
+obj-$(CONFIG_UNICODE_UTF8) += utf8.o
 obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
-unicode-y := utf8-norm.o unicode-core.o
+unicode-y := unicode-core.o
+utf8-y := unicode-utf8.o utf8-norm.o
 
 $(obj)/utf8-norm.o: $(obj)/utf8data.h
+$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o
 
 # In the normal build, the checked-in utf8data.h is just shipped.
 #
diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
index 730dbaedf593..07d42f471e42 100644
--- a/fs/unicode/unicode-core.c
+++ b/fs/unicode/unicode-core.c
@@ -1,237 +1,80 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/module.h>
 #include <linux/kernel.h>
-#include <linux/string.h>
 #include <linux/slab.h>
-#include <linux/parser.h>
 #include <linux/errno.h>
 #include <linux/unicode.h>
-#include <linux/stringhash.h>
 
-#include "utf8n.h"
+static struct module *utf8mod;
 
-int unicode_validate(const struct unicode_map *um, const struct qstr *str)
-{
-	const struct utf8data *data = utf8nfdi(um->version);
+DEFINE_STATIC_CALL(_unicode_validate, unicode_validate_default);
+EXPORT_STATIC_CALL(_unicode_validate);
 
-	if (utf8nlen(data, str->name, str->len) < 0)
-		return -1;
-	return 0;
-}
-EXPORT_SYMBOL(unicode_validate);
+DEFINE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
+EXPORT_STATIC_CALL(_unicode_strncmp);
 
-int unicode_strncmp(const struct unicode_map *um,
-		    const struct qstr *s1, const struct qstr *s2)
-{
-	const struct utf8data *data = utf8nfdi(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
+DEFINE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
+EXPORT_STATIC_CALL(_unicode_strncasecmp);
 
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
+DEFINE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
+EXPORT_STATIC_CALL(_unicode_strncasecmp_folded);
 
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
+DEFINE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
+EXPORT_STATIC_CALL(_unicode_normalize);
 
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
+DEFINE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
+EXPORT_STATIC_CALL(_unicode_casefold);
 
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
+DEFINE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
+EXPORT_STATIC_CALL(_unicode_casefold_hash);
 
-	return 0;
-}
-EXPORT_SYMBOL(unicode_strncmp);
+DEFINE_STATIC_CALL(_unicode_load, unicode_load_default);
+EXPORT_STATIC_CALL(_unicode_load);
 
-int unicode_strncasecmp(const struct unicode_map *um,
-			const struct qstr *s1, const struct qstr *s2)
+static int unicode_load_module(void)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
-
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
-}
-EXPORT_SYMBOL(unicode_strncasecmp);
-
-/* String cf is expected to be a valid UTF-8 casefolded
- * string.
- */
-int unicode_strncasecmp_folded(const struct unicode_map *um,
-			       const struct qstr *cf,
-			       const struct qstr *s1)
-{
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur1;
-	int c1, c2;
-	int i = 0;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = cf->name[i++];
-		if (c1 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
-}
-EXPORT_SYMBOL(unicode_strncasecmp_folded);
-
-int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
-		     unsigned char *dest, size_t dlen)
-{
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur;
-	size_t nlen = 0;
-
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
-
-	for (nlen = 0; nlen < dlen; nlen++) {
-		int c = utf8byte(&cur);
-
-		dest[nlen] = c;
-		if (!c)
-			return nlen;
-		if (c == -1)
-			break;
-	}
-	return -EINVAL;
-}
-EXPORT_SYMBOL(unicode_casefold);
+	int ret = request_module("utf8");
 
-int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
-			  struct qstr *str)
-{
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur;
-	int c;
-	unsigned long hash = init_name_hash(salt);
-
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
-
-	while ((c = utf8byte(&cur))) {
-		if (c < 0)
-			return -EINVAL;
-		hash = partial_name_hash((unsigned char)c, hash);
+	if (ret) {
+		pr_err("Failed to load UTF-8 module\n");
+		return ret;
 	}
-	str->hash = end_name_hash(hash);
 	return 0;
 }
-EXPORT_SYMBOL(unicode_casefold_hash);
 
-int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
-		      unsigned char *dest, size_t dlen)
+struct unicode_map *unicode_load(const char *version)
 {
-	const struct utf8data *data = utf8nfdi(um->version);
-	struct utf8cursor cur;
-	ssize_t nlen = 0;
-
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
+	int ret = unicode_load_module();
 
-	for (nlen = 0; nlen < dlen; nlen++) {
-		int c = utf8byte(&cur);
+	if (ret)
+		return ERR_PTR(ret);
 
-		dest[nlen] = c;
-		if (!c)
-			return nlen;
-		if (c == -1)
-			break;
-	}
-	return -EINVAL;
+	if (!try_module_get(utf8mod))
+		return ERR_PTR(-ENODEV);
+	else
+		return static_call(_unicode_load)(version);
 }
-EXPORT_SYMBOL(unicode_normalize);
+EXPORT_SYMBOL(unicode_load);
 
-static int unicode_parse_version(const char *version, unsigned int *maj,
-				 unsigned int *min, unsigned int *rev)
+void unicode_unload(struct unicode_map *um)
 {
-	substring_t args[3];
-	char version_string[12];
-	static const struct match_token token[] = {
-		{1, "%d.%d.%d"},
-		{0, NULL}
-	};
-	int ret = strscpy(version_string, version, sizeof(version_string));
-
-	if (ret < 0)
-		return ret;
-
-	if (match_token(version_string, token, args) != 1)
-		return -EINVAL;
-
-	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
-	    match_int(&args[2], rev))
-		return -EINVAL;
+	kfree(um);
 
-	return 0;
+	if (utf8mod)
+		module_put(utf8mod);
 }
+EXPORT_SYMBOL(unicode_unload);
 
-struct unicode_map *unicode_load(const char *version)
+void unicode_register(struct module *owner)
 {
-	struct unicode_map *um = NULL;
-	int unicode_version;
-
-	if (version) {
-		unsigned int maj, min, rev;
-
-		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
-			return ERR_PTR(-EINVAL);
-
-		if (!utf8version_is_supported(maj, min, rev))
-			return ERR_PTR(-EINVAL);
-
-		unicode_version = UNICODE_AGE(maj, min, rev);
-	} else {
-		unicode_version = utf8version_latest();
-		printk(KERN_WARNING"UTF-8 version not specified. "
-		       "Assuming latest supported version (%d.%d.%d).",
-		       (unicode_version >> 16) & 0xff,
-		       (unicode_version >> 8) & 0xff,
-		       (unicode_version & 0xff));
-	}
-
-	um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL);
-	if (!um)
-		return ERR_PTR(-ENOMEM);
-
-	um->charset = "UTF-8";
-	um->version = unicode_version;
-
-	return um;
+	utf8mod = owner;
 }
-EXPORT_SYMBOL(unicode_load);
+EXPORT_SYMBOL(unicode_register);
 
-void unicode_unload(struct unicode_map *um)
+void unicode_unregister(void)
 {
-	kfree(um);
+	utf8mod = NULL;
 }
-EXPORT_SYMBOL(unicode_unload);
+EXPORT_SYMBOL(unicode_unregister);
 
 MODULE_LICENSE("GPL v2");
diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c
new file mode 100644
index 000000000000..9c6b58239067
--- /dev/null
+++ b/fs/unicode/unicode-utf8.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/errno.h>
+#include <linux/unicode.h>
+#include <linux/stringhash.h>
+#include <linux/static_call.h>
+
+#include "utf8n.h"
+
+static int utf8_validate(const struct unicode_map *um, const struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+
+	if (utf8nlen(data, str->name, str->len) < 0)
+		return -1;
+	return 0;
+}
+
+static int utf8_strncmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+	struct utf8cursor cur1, cur2;
+	int c1, c2;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+static int utf8_strncasecmp(const struct unicode_map *um,
+			    const struct qstr *s1, const struct qstr *s2)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur1, cur2;
+	int c1, c2;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+/* String cf is expected to be a valid UTF-8 casefolded
+ * string.
+ */
+static int utf8_strncasecmp_folded(const struct unicode_map *um,
+				   const struct qstr *cf,
+				   const struct qstr *s1)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur1;
+	int c1, c2;
+	int i = 0;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = cf->name[i++];
+		if (c1 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+static int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
+			 unsigned char *dest, size_t dlen)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	size_t nlen = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	for (nlen = 0; nlen < dlen; nlen++) {
+		int c = utf8byte(&cur);
+
+		dest[nlen] = c;
+		if (!c)
+			return nlen;
+		if (c == -1)
+			break;
+	}
+	return -EINVAL;
+}
+
+static int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+			      struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	unsigned long hash = init_name_hash(salt);
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return -EINVAL;
+		hash = partial_name_hash((unsigned char)c, hash);
+	}
+	str->hash = end_name_hash(hash);
+	return 0;
+}
+
+static int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
+			  unsigned char *dest, size_t dlen)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+	struct utf8cursor cur;
+	ssize_t nlen = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	for (nlen = 0; nlen < dlen; nlen++) {
+		int c = utf8byte(&cur);
+
+		dest[nlen] = c;
+		if (!c)
+			return nlen;
+		if (c == -1)
+			break;
+	}
+	return -EINVAL;
+}
+
+static int utf8_parse_version(const char *version, unsigned int *maj,
+			      unsigned int *min, unsigned int *rev)
+{
+	substring_t args[3];
+	char version_string[12];
+	static const struct match_token token[] = {
+		{1, "%d.%d.%d"},
+		{0, NULL}
+	};
+
+	int ret = strscpy(version_string, version, sizeof(version_string));
+
+	if (ret < 0)
+		return ret;
+
+	if (match_token(version_string, token, args) != 1)
+		return -EINVAL;
+
+	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
+	    match_int(&args[2], rev))
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct unicode_map *utf8_load(const char *version)
+{
+	struct unicode_map *um = NULL;
+	int unicode_version;
+
+	if (version) {
+		unsigned int maj, min, rev;
+
+		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
+			return ERR_PTR(-EINVAL);
+
+		if (!utf8version_is_supported(maj, min, rev))
+			return ERR_PTR(-EINVAL);
+
+		unicode_version = UNICODE_AGE(maj, min, rev);
+	} else {
+		unicode_version = utf8version_latest();
+		pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).",
+			(unicode_version >> 16) & 0xff,
+			(unicode_version >> 8) & 0xff,
+			(unicode_version & 0xff));
+	}
+
+	um = kzalloc(sizeof(*um), GFP_KERNEL);
+	if (!um)
+		return ERR_PTR(-ENOMEM);
+
+	um->charset = "UTF-8";
+	um->version = unicode_version;
+
+	return um;
+}
+
+static int __init utf8_init(void)
+{
+	static_call_update(_unicode_validate, utf8_validate);
+	static_call_update(_unicode_strncmp, utf8_strncmp);
+	static_call_update(_unicode_strncasecmp, utf8_strncasecmp);
+	static_call_update(_unicode_strncasecmp_folded, utf8_strncasecmp_folded);
+	static_call_update(_unicode_normalize, utf8_normalize);
+	static_call_update(_unicode_casefold, utf8_casefold);
+	static_call_update(_unicode_casefold_hash, utf8_casefold_hash);
+	static_call_update(_unicode_load, utf8_load);
+
+	unicode_register(THIS_MODULE);
+	return 0;
+}
+
+static void __exit utf8_exit(void)
+{
+	static_call_update(_unicode_validate, unicode_validate_default);
+	static_call_update(_unicode_strncmp, unicode_strncmp_default);
+	static_call_update(_unicode_strncasecmp, unicode_strncasecmp_default);
+	static_call_update(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
+	static_call_update(_unicode_normalize, unicode_normalize_default);
+	static_call_update(_unicode_casefold, unicode_casefold_default);
+	static_call_update(_unicode_casefold_hash, unicode_casefold_hash_default);
+	static_call_update(_unicode_load, unicode_load_default);
+
+	unicode_unregister();
+}
+
+module_init(utf8_init);
+module_exit(utf8_exit);
+
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index de23f9ee720b..18a1d3db9de5 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -4,33 +4,128 @@
 
 #include <linux/init.h>
 #include <linux/dcache.h>
+#include <linux/static_call.h>
+
 
 struct unicode_map {
 	const char *charset;
 	int version;
 };
 
-int unicode_validate(const struct unicode_map *um, const struct qstr *str);
+static int unicode_warn_on(void)
+{
+	WARN_ON(1);
+	return -EIO;
+}
+
+static int unicode_validate_default(const struct unicode_map *um,
+				    const struct qstr *str)
+{
+	return unicode_warn_on();
+}
+
+static int unicode_strncmp_default(const struct unicode_map *um,
+				   const struct qstr *s1,
+				   const struct qstr *s2)
+{
+	return unicode_warn_on();
+}
+
+static int unicode_strncasecmp_default(const struct unicode_map *um,
+				       const struct qstr *s1,
+				       const struct qstr *s2)
+{
+	return unicode_warn_on();
+}
+
+static int unicode_strncasecmp_folded_default(const struct unicode_map *um,
+					      const struct qstr *cf,
+					      const struct qstr *s1)
+{
+	return unicode_warn_on();
+}
+
+static int unicode_normalize_default(const struct unicode_map *um,
+				     const struct qstr *str,
+				     unsigned char *dest, size_t dlen)
+{
+	return unicode_warn_on();
+}
+
+static int unicode_casefold_default(const struct unicode_map *um,
+				    const struct qstr *str,
+				    unsigned char *dest, size_t dlen)
+{
+	return unicode_warn_on();
+}
 
-int unicode_strncmp(const struct unicode_map *um,
-		    const struct qstr *s1, const struct qstr *s2);
+static int unicode_casefold_hash_default(const struct unicode_map *um,
+					 const void *salt, struct qstr *str)
+{
+	return unicode_warn_on();
+}
 
-int unicode_strncasecmp(const struct unicode_map *um,
-			const struct qstr *s1, const struct qstr *s2);
-int unicode_strncasecmp_folded(const struct unicode_map *um,
-			       const struct qstr *cf,
-			       const struct qstr *s1);
+static struct unicode_map *unicode_load_default(const char *version)
+{
+	unicode_warn_on();
+	return ERR_PTR(-EIO);
+}
 
-int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
-		      unsigned char *dest, size_t dlen);
+DECLARE_STATIC_CALL(_unicode_validate, unicode_validate_default);
+DECLARE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
+DECLARE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
+DECLARE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
+DECLARE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
+DECLARE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
+DECLARE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
+DECLARE_STATIC_CALL(_unicode_load, unicode_load_default);
 
-int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
-		     unsigned char *dest, size_t dlen);
+static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str)
+{
+	return static_call(_unicode_validate)(um, str);
+}
 
-int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
-			  struct qstr *str);
+static inline int unicode_strncmp(const struct unicode_map *um,
+				  const struct qstr *s1, const struct qstr *s2)
+{
+	return static_call(_unicode_strncmp)(um, s1, s2);
+}
+
+static inline int unicode_strncasecmp(const struct unicode_map *um,
+				      const struct qstr *s1, const struct qstr *s2)
+{
+	return static_call(_unicode_strncasecmp)(um, s1, s2);
+}
+
+static inline int unicode_strncasecmp_folded(const struct unicode_map *um,
+					     const struct qstr *cf,
+					     const struct qstr *s1)
+{
+	return static_call(_unicode_strncasecmp_folded)(um, cf, s1);
+}
+
+static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+				    unsigned char *dest, size_t dlen)
+{
+	return static_call(_unicode_normalize)(um, str, dest, dlen);
+}
+
+static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+				   unsigned char *dest, size_t dlen)
+{
+	return static_call(_unicode_casefold)(um, str, dest, dlen);
+}
+
+static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+					struct qstr *str)
+{
+	return static_call(_unicode_casefold_hash)(um, salt, str);
+}
 
 struct unicode_map *unicode_load(const char *version);
 void unicode_unload(struct unicode_map *um);
 
+void unicode_register(struct module *owner);
+void unicode_unregister(void);
+
 #endif /* _LINUX_UNICODE_H */
-- 
2.30.1



* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-29 20:42 ` [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
@ 2021-03-29 21:20   ` Gabriel Krisman Bertazi
  2021-03-29 22:38     ` Shreeya Patel
  2021-03-30  2:01   ` Eric Biggers
  1 sibling, 1 reply; 14+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-29 21:20 UTC (permalink / raw)
  To: Shreeya Patel
  Cc: tytso, adilger.kernel, jaegeuk, chao, ebiggers, drosen, ebiggers,
	yuchao0, linux-ext4, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, kernel, andre.almeida

Shreeya Patel <shreeya.patel@collabora.com> writes:

> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> It is not necessary to load this large table in the kernel if no
> filesystem is using it, hence make UTF-8 encoding loadable by converting
> it into a module.
> Modify the file called unicode-core which will act as a layer for
> unicode subsystem. It will load the UTF-8 module and access its functions
> whenever any filesystem that needs unicode is mounted.
> Also, indirect calls using function pointers are slow, use static calls to
> avoid overhead caused in case of repeated indirect calls. Static calls
> improve the performance by directly calling the functions as opposed to
> indirect calls.
>
> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
> ---
> Changes in v5
>   - Rename global variables and default static call functions for better
>     understanding
>   - Make only config UNICODE_UTF8 visible and config UNICODE to be always
>     enabled provided UNICODE_UTF8 is enabled.  
>   - Improve the documentation for Kconfig
>   - Improve the commit message.
>  
> Changes in v4
>   - Return error from the static calls instead of doing nothing and
>     succeeding even without loading the module.
>   - Remove the complete usage of utf8_ops and use static calls at all
>     places.
>   - Restore the static calls to default values when module is unloaded.
>   - Decrement the reference of module after calling the unload function.
>   - Remove spinlock as there will be no race conditions after removing
>     utf8_ops.
>
> Changes in v3
>   - Add a patch which checks if utf8 is loaded before calling utf8_unload()
>     in ext4 and f2fs filesystems
>   - Return error if strscpy() returns value < 0
>   - Correct the conditions to prevent NULL pointer dereference while
>     accessing functions via utf8_ops variable.
>   - Add spinlock to avoid race conditions.
>   - Use static_call() for preventing speculative execution attacks.
>
> Changes in v2
>   - Remove the duplicate file from the last patch.
>   - Make the wrapper functions inline.
>   - Remove msleep and use try_module_get() and module_put()
>     for ensuring that module is loaded correctly and also
>     doesn't get unloaded while in use.
>   - Resolve the warning reported by kernel test robot.
>   - Resolve all the checkpatch.pl warnings.
>
>
>  fs/unicode/Kconfig        |  17 ++-
>  fs/unicode/Makefile       |   5 +-
>  fs/unicode/unicode-core.c | 241 +++++++----------------------------
>  fs/unicode/unicode-utf8.c | 256 ++++++++++++++++++++++++++++++++++++++
>  include/linux/unicode.h   | 123 +++++++++++++++---
>  5 files changed, 426 insertions(+), 216 deletions(-)
>  create mode 100644 fs/unicode/unicode-utf8.c
>
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..ad4b837f2eb2 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -2,13 +2,26 @@
>  #
>  # UTF-8 normalization
>  #
> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> +# is enabled. This config option adds the unicode subsystem layer which loads
> +# the UTF-8 module whenever any filesystem needs it.
>  config UNICODE
> -	bool "UTF-8 normalization and casefolding support"
> +	bool
> +
> +# utf8data.h_shipped has a large database table which is an auto-generated
> +# decodification trie for the unicode normalization functions and it is not
> +# necessary to carry this large table in the kernel.
> +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
> +# module and this module will be loaded by the unicode subsystem layer only
> +# when any filesystem needs it.
> +config UNICODE_UTF8
> +	tristate "UTF-8 module"
>  	help
>  	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
>  	  support.
> +	select UNICODE
>  
>  config UNICODE_NORMALIZATION_SELFTEST
>  	tristate "Test UTF-8 normalization support"
> -	depends on UNICODE
> +	depends on UNICODE_UTF8
>  	default n
> diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
> index fbf9a629ed0d..49d50083e6ee 100644
> --- a/fs/unicode/Makefile
> +++ b/fs/unicode/Makefile
> @@ -1,11 +1,14 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
>  obj-$(CONFIG_UNICODE) += unicode.o
> +obj-$(CONFIG_UNICODE_UTF8) += utf8.o
>  obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
>  
> -unicode-y := utf8-norm.o unicode-core.o
> +unicode-y := unicode-core.o
> +utf8-y := unicode-utf8.o utf8-norm.o
>  
>  $(obj)/utf8-norm.o: $(obj)/utf8data.h
> +$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o
>  
>  # In the normal build, the checked-in utf8data.h is just shipped.
>  #
> diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
> index 730dbaedf593..07d42f471e42 100644
> --- a/fs/unicode/unicode-core.c
> +++ b/fs/unicode/unicode-core.c
> @@ -1,237 +1,80 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #include <linux/module.h>
>  #include <linux/kernel.h>
> -#include <linux/string.h>
>  #include <linux/slab.h>
> -#include <linux/parser.h>
>  #include <linux/errno.h>
>  #include <linux/unicode.h>
> -#include <linux/stringhash.h>
>  
> -#include "utf8n.h"
> +static struct module *utf8mod;
>  
> -int unicode_validate(const struct unicode_map *um, const struct qstr *str)
> -{
> -	const struct utf8data *data = utf8nfdi(um->version);
> +DEFINE_STATIC_CALL(_unicode_validate, unicode_validate_default);
> +EXPORT_STATIC_CALL(_unicode_validate);
>  
> -	if (utf8nlen(data, str->name, str->len) < 0)
> -		return -1;
> -	return 0;
> -}
> -EXPORT_SYMBOL(unicode_validate);
> +DEFINE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
> +EXPORT_STATIC_CALL(_unicode_strncmp);
>  
> -int unicode_strncmp(const struct unicode_map *um,
> -		    const struct qstr *s1, const struct qstr *s2)
> -{
> -	const struct utf8data *data = utf8nfdi(um->version);
> -	struct utf8cursor cur1, cur2;
> -	int c1, c2;
> +DEFINE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
> +EXPORT_STATIC_CALL(_unicode_strncasecmp);

Why are these here if the _default functions are defined in the header
file?  I think the definitions could be in this file. No?

> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> -		return -EINVAL;
> +DEFINE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
> +EXPORT_STATIC_CALL(_unicode_strncasecmp_folded);
>  
> -	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> -		return -EINVAL;
> +DEFINE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
> +EXPORT_STATIC_CALL(_unicode_normalize);
>  
> -	do {
> -		c1 = utf8byte(&cur1);
> -		c2 = utf8byte(&cur2);
> +DEFINE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
> +EXPORT_STATIC_CALL(_unicode_casefold);
>  
> -		if (c1 < 0 || c2 < 0)
> -			return -EINVAL;
> -		if (c1 != c2)
> -			return 1;
> -	} while (c1);
> +DEFINE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
> +EXPORT_STATIC_CALL(_unicode_casefold_hash);
>  
> -	return 0;
> -}
> -EXPORT_SYMBOL(unicode_strncmp);
> +DEFINE_STATIC_CALL(_unicode_load, unicode_load_default);
> +EXPORT_STATIC_CALL(_unicode_load);
>  
> -int unicode_strncasecmp(const struct unicode_map *um,
> -			const struct qstr *s1, const struct qstr *s2)
> +static int unicode_load_module(void)
>  {
> -	const struct utf8data *data = utf8nfdicf(um->version);
> -	struct utf8cursor cur1, cur2;
> -	int c1, c2;
> -
> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> -		return -EINVAL;
> -
> -	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> -		return -EINVAL;
> -
> -	do {
> -		c1 = utf8byte(&cur1);
> -		c2 = utf8byte(&cur2);
> -
> -		if (c1 < 0 || c2 < 0)
> -			return -EINVAL;
> -		if (c1 != c2)
> -			return 1;
> -	} while (c1);
> -
> -	return 0;
> -}
> -EXPORT_SYMBOL(unicode_strncasecmp);
> -
> -/* String cf is expected to be a valid UTF-8 casefolded
> - * string.
> - */
> -int unicode_strncasecmp_folded(const struct unicode_map *um,
> -			       const struct qstr *cf,
> -			       const struct qstr *s1)
> -{
> -	const struct utf8data *data = utf8nfdicf(um->version);
> -	struct utf8cursor cur1;
> -	int c1, c2;
> -	int i = 0;
> -
> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> -		return -EINVAL;
> -
> -	do {
> -		c1 = utf8byte(&cur1);
> -		c2 = cf->name[i++];
> -		if (c1 < 0)
> -			return -EINVAL;
> -		if (c1 != c2)
> -			return 1;
> -	} while (c1);
> -
> -	return 0;
> -}
> -EXPORT_SYMBOL(unicode_strncasecmp_folded);
> -
> -int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
> -		     unsigned char *dest, size_t dlen)
> -{
> -	const struct utf8data *data = utf8nfdicf(um->version);
> -	struct utf8cursor cur;
> -	size_t nlen = 0;
> -
> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> -		return -EINVAL;
> -
> -	for (nlen = 0; nlen < dlen; nlen++) {
> -		int c = utf8byte(&cur);
> -
> -		dest[nlen] = c;
> -		if (!c)
> -			return nlen;
> -		if (c == -1)
> -			break;
> -	}
> -	return -EINVAL;
> -}
> -EXPORT_SYMBOL(unicode_casefold);
> +	int ret = request_module("utf8");
>  
> -int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
> -			  struct qstr *str)
> -{
> -	const struct utf8data *data = utf8nfdicf(um->version);
> -	struct utf8cursor cur;
> -	int c;
> -	unsigned long hash = init_name_hash(salt);
> -
> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> -		return -EINVAL;
> -
> -	while ((c = utf8byte(&cur))) {
> -		if (c < 0)
> -			return -EINVAL;
> -		hash = partial_name_hash((unsigned char)c, hash);
> +	if (ret) {
> +		pr_err("Failed to load UTF-8 module\n");
> +		return ret;
>  	}
> -	str->hash = end_name_hash(hash);
>  	return 0;
>  }
> -EXPORT_SYMBOL(unicode_casefold_hash);
>  
> -int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
> -		      unsigned char *dest, size_t dlen)
> +struct unicode_map *unicode_load(const char *version)
>  {
> -	const struct utf8data *data = utf8nfdi(um->version);
> -	struct utf8cursor cur;
> -	ssize_t nlen = 0;
> -
> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> -		return -EINVAL;
> +	int ret = unicode_load_module();

Splitting this into two functions sounds unnecessary, since the other
function just calls request_module.  By the way, is there any protection
against calling request_module if the module is already loaded?  Surely
the call is not necessary in that case; perhaps
try_then_request_module(utf8mod, "utf8")?
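
For illustration only, here is a userspace sketch of the
try-then-request idea. The stubbed request_module(), the utf8_module
object, the request_count counter and unicode_get_module() are all
hypothetical stand-ins, not kernel code; only the shape of the macro is
meant to mirror the kernel's try_then_request_module().

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins: none of this is kernel code. */
struct module {
	const char *name;
};

static struct module utf8_module = { "utf8" };
static struct module *utf8mod;	/* set by the module's init in the patch */
static int request_count;	/* how many times the loader was invoked */

/* Stub: pretend modprobe ran and the module's init registered itself. */
static int request_module(const char *name)
{
	request_count++;
	if (strcmp(name, "utf8") != 0)
		return -1;
	utf8mod = &utf8_module;
	return 0;
}

/*
 * Use x if it is already set; otherwise request the module and then
 * re-evaluate x. Written without the GNU "?:" shorthand, so x may be
 * evaluated twice.
 */
#define try_then_request_module(x, mod) \
	((x) ? (x) : (request_module(mod), (x)))

/* Hypothetical helper: only asks for the module when it is missing. */
static struct module *unicode_get_module(void)
{
	return try_then_request_module(utf8mod, "utf8");
}
```

With something of this shape, a second mount would find utf8mod
already set and skip the module request entirely.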

> -	for (nlen = 0; nlen < dlen; nlen++) {
> -		int c = utf8byte(&cur);
> +	if (ret)
> +		return ERR_PTR(ret);
>  
> -		dest[nlen] = c;
> -		if (!c)
> -			return nlen;
> -		if (c == -1)
> -			break;
> -	}
> -	return -EINVAL;
> +	if (!try_module_get(utf8mod))

Can't unicode_unregister() be called between request_module() and
here, leaving you with a bogus utf8mod pointer? True, try_module_get()
checks for NULL, but if you are unlucky module_is_live() will break.
I still think utf8mod needs to be protected while you don't hold a
reference to the module.
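
One way to picture that protection is the userspace sketch below,
which assumes a mutex guards every access to utf8mod. The struct module
fields, unicode_get_module() and unicode_put_module() are hypothetical;
in the kernel the reference would be taken with try_module_get() under
whatever lock the series settles on.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a module: a refcount and a liveness flag. */
struct module {
	int refcnt;
	bool live;
};

static pthread_mutex_t utf8mod_lock = PTHREAD_MUTEX_INITIALIZER;
static struct module *utf8mod;

static void unicode_register(struct module *owner)
{
	pthread_mutex_lock(&utf8mod_lock);
	utf8mod = owner;
	pthread_mutex_unlock(&utf8mod_lock);
}

static void unicode_unregister(void)
{
	pthread_mutex_lock(&utf8mod_lock);
	utf8mod = NULL;
	pthread_mutex_unlock(&utf8mod_lock);
}

/*
 * Read the pointer and take the reference under the same lock that
 * unicode_unregister() takes, so the pointer cannot be cleared between
 * the read and the refcount increment.
 * Returns the module with a reference held, or NULL.
 */
static struct module *unicode_get_module(void)
{
	struct module *mod;

	pthread_mutex_lock(&utf8mod_lock);
	mod = utf8mod;
	if (mod && mod->live)
		mod->refcnt++;
	else
		mod = NULL;
	pthread_mutex_unlock(&utf8mod_lock);
	return mod;
}

/* Drop a reference previously returned by unicode_get_module(). */
static void unicode_put_module(struct module *mod)
{
	pthread_mutex_lock(&utf8mod_lock);
	mod->refcnt--;
	pthread_mutex_unlock(&utf8mod_lock);
}
```

The point of the model is only that the lookup and the reference grab
form one critical section; it says nothing about which kernel locking
primitive would be the right choice here.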

> +		return ERR_PTR(-ENODEV);
> +	else
> +		return static_call(_unicode_load)(version);
>  }
> -EXPORT_SYMBOL(unicode_normalize);
> +EXPORT_SYMBOL(unicode_load);
>  
> -static int unicode_parse_version(const char *version, unsigned int *maj,
> -				 unsigned int *min, unsigned int *rev)
> +void unicode_unload(struct unicode_map *um)
>  {
> -	substring_t args[3];
> -	char version_string[12];
> -	static const struct match_token token[] = {
> -		{1, "%d.%d.%d"},
> -		{0, NULL}
> -	};
> -	int ret = strscpy(version_string, version, sizeof(version_string));
> -
> -	if (ret < 0)
> -		return ret;
> -
> -	if (match_token(version_string, token, args) != 1)
> -		return -EINVAL;
> -
> -	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
> -	    match_int(&args[2], rev))
> -		return -EINVAL;
> +	kfree(um);
>  
> -	return 0;
> +	if (utf8mod)
> +		module_put(utf8mod);
>  }
> +EXPORT_SYMBOL(unicode_unload);
>  
> -struct unicode_map *unicode_load(const char *version)
> +void unicode_register(struct module *owner)
>  {
> -	struct unicode_map *um = NULL;
> -	int unicode_version;
> -
> -	if (version) {
> -		unsigned int maj, min, rev;
> -
> -		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
> -			return ERR_PTR(-EINVAL);
> -
> -		if (!utf8version_is_supported(maj, min, rev))
> -			return ERR_PTR(-EINVAL);
> -
> -		unicode_version = UNICODE_AGE(maj, min, rev);
> -	} else {
> -		unicode_version = utf8version_latest();
> -		printk(KERN_WARNING"UTF-8 version not specified. "
> -		       "Assuming latest supported version (%d.%d.%d).",
> -		       (unicode_version >> 16) & 0xff,
> -		       (unicode_version >> 8) & 0xff,
> -		       (unicode_version & 0xff));
> -	}
> -
> -	um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL);
> -	if (!um)
> -		return ERR_PTR(-ENOMEM);
> -
> -	um->charset = "UTF-8";
> -	um->version = unicode_version;
> -
> -	return um;
> +	utf8mod = owner;
>  }
> -EXPORT_SYMBOL(unicode_load);
> +EXPORT_SYMBOL(unicode_register);
>  
> -void unicode_unload(struct unicode_map *um)
> +void unicode_unregister(void)
>  {
> -	kfree(um);
> +	utf8mod = NULL;
>  }
> -EXPORT_SYMBOL(unicode_unload);
> +EXPORT_SYMBOL(unicode_unregister);
>  
>  MODULE_LICENSE("GPL v2");
> diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c
> new file mode 100644
> index 000000000000..9c6b58239067
> --- /dev/null
> +++ b/fs/unicode/unicode-utf8.c
> @@ -0,0 +1,256 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/slab.h>
> +#include <linux/parser.h>
> +#include <linux/errno.h>
> +#include <linux/unicode.h>
> +#include <linux/stringhash.h>
> +#include <linux/static_call.h>
> +
> +#include "utf8n.h"
> +
> +static int utf8_validate(const struct unicode_map *um, const struct qstr *str)
> +{
> +	const struct utf8data *data = utf8nfdi(um->version);
> +
> +	if (utf8nlen(data, str->name, str->len) < 0)
> +		return -1;
> +	return 0;
> +}
> +
> +static int utf8_strncmp(const struct unicode_map *um,
> +			const struct qstr *s1, const struct qstr *s2)
> +{
> +	const struct utf8data *data = utf8nfdi(um->version);
> +	struct utf8cursor cur1, cur2;
> +	int c1, c2;
> +
> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> +		return -EINVAL;
> +
> +	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> +		return -EINVAL;
> +
> +	do {
> +		c1 = utf8byte(&cur1);
> +		c2 = utf8byte(&cur2);
> +
> +		if (c1 < 0 || c2 < 0)
> +			return -EINVAL;
> +		if (c1 != c2)
> +			return 1;
> +	} while (c1);
> +
> +	return 0;
> +}
> +
> +static int utf8_strncasecmp(const struct unicode_map *um,
> +			    const struct qstr *s1, const struct qstr *s2)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur1, cur2;
> +	int c1, c2;
> +
> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> +		return -EINVAL;
> +
> +	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> +		return -EINVAL;
> +
> +	do {
> +		c1 = utf8byte(&cur1);
> +		c2 = utf8byte(&cur2);
> +
> +		if (c1 < 0 || c2 < 0)
> +			return -EINVAL;
> +		if (c1 != c2)
> +			return 1;
> +	} while (c1);
> +
> +	return 0;
> +}
> +
> +/* String cf is expected to be a valid UTF-8 casefolded
> + * string.
> + */
> +static int utf8_strncasecmp_folded(const struct unicode_map *um,
> +				   const struct qstr *cf,
> +				   const struct qstr *s1)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur1;
> +	int c1, c2;
> +	int i = 0;
> +
> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> +		return -EINVAL;
> +
> +	do {
> +		c1 = utf8byte(&cur1);
> +		c2 = cf->name[i++];
> +		if (c1 < 0)
> +			return -EINVAL;
> +		if (c1 != c2)
> +			return 1;
> +	} while (c1);
> +
> +	return 0;
> +}
> +
> +static int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
> +			 unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur;
> +	size_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> +		return -EINVAL;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		int c = utf8byte(&cur);
> +
> +		dest[nlen] = c;
> +		if (!c)
> +			return nlen;
> +		if (c == -1)
> +			break;
> +	}
> +	return -EINVAL;
> +}
> +
> +static int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
> +			      struct qstr *str)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur;
> +	int c;
> +	unsigned long hash = init_name_hash(salt);
> +
> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> +		return -EINVAL;
> +
> +	while ((c = utf8byte(&cur))) {
> +		if (c < 0)
> +			return -EINVAL;
> +		hash = partial_name_hash((unsigned char)c, hash);
> +	}
> +	str->hash = end_name_hash(hash);
> +	return 0;
> +}
> +
> +static int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
> +			  unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfdi(um->version);
> +	struct utf8cursor cur;
> +	ssize_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> +		return -EINVAL;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		int c = utf8byte(&cur);
> +
> +		dest[nlen] = c;
> +		if (!c)
> +			return nlen;
> +		if (c == -1)
> +			break;
> +	}
> +	return -EINVAL;
> +}
> +
> +static int utf8_parse_version(const char *version, unsigned int *maj,
> +			      unsigned int *min, unsigned int *rev)
> +{
> +	substring_t args[3];
> +	char version_string[12];
> +	static const struct match_token token[] = {
> +		{1, "%d.%d.%d"},
> +		{0, NULL}
> +	};
> +
> +	int ret = strscpy(version_string, version, sizeof(version_string));
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (match_token(version_string, token, args) != 1)
> +		return -EINVAL;
> +
> +	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
> +	    match_int(&args[2], rev))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static struct unicode_map *utf8_load(const char *version)
> +{
> +	struct unicode_map *um = NULL;
> +	int unicode_version;
> +
> +	if (version) {
> +		unsigned int maj, min, rev;
> +
> +		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
> +			return ERR_PTR(-EINVAL);
> +
> +		if (!utf8version_is_supported(maj, min, rev))
> +			return ERR_PTR(-EINVAL);
> +
> +		unicode_version = UNICODE_AGE(maj, min, rev);
> +	} else {
> +		unicode_version = utf8version_latest();
> +		pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).",
> +			(unicode_version >> 16) & 0xff,
> +			(unicode_version >> 8) & 0xff,
> +			(unicode_version & 0xff));
> +	}
> +
> +	um = kzalloc(sizeof(*um), GFP_KERNEL);
> +	if (!um)
> +		return ERR_PTR(-ENOMEM);
> +
> +	um->charset = "UTF-8";
> +	um->version = unicode_version;
> +
> +	return um;
> +}
> +
> +static int __init utf8_init(void)
> +{
> +	static_call_update(_unicode_validate, utf8_validate);
> +	static_call_update(_unicode_strncmp, utf8_strncmp);
> +	static_call_update(_unicode_strncasecmp, utf8_strncasecmp);
> +	static_call_update(_unicode_strncasecmp_folded, utf8_strncasecmp_folded);
> +	static_call_update(_unicode_normalize, utf8_normalize);
> +	static_call_update(_unicode_casefold, utf8_casefold);
> +	static_call_update(_unicode_casefold_hash, utf8_casefold_hash);
> +	static_call_update(_unicode_load, utf8_load);
> +
> +	unicode_register(THIS_MODULE);
> +	return 0;
> +}
> +
> +static void __exit utf8_exit(void)
> +{
> +	static_call_update(_unicode_validate, unicode_validate_default);
> +	static_call_update(_unicode_strncmp, unicode_strncmp_default);
> +	static_call_update(_unicode_strncasecmp, unicode_strncasecmp_default);
> +	static_call_update(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
> +	static_call_update(_unicode_normalize, unicode_normalize_default);
> +	static_call_update(_unicode_casefold, unicode_casefold_default);
> +	static_call_update(_unicode_casefold_hash, unicode_casefold_hash_default);
> +	static_call_update(_unicode_load, unicode_load_default);
> +
> +	unicode_unregister();
> +}
> +
> +module_init(utf8_init);
> +module_exit(utf8_exit);
> +
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/linux/unicode.h b/include/linux/unicode.h
> index de23f9ee720b..18a1d3db9de5 100644
> --- a/include/linux/unicode.h
> +++ b/include/linux/unicode.h
> @@ -4,33 +4,128 @@
>  
>  #include <linux/init.h>
>  #include <linux/dcache.h>
> +#include <linux/static_call.h>
> +
>  
>  struct unicode_map {
>  	const char *charset;
>  	int version;
>  };
>  
> -int unicode_validate(const struct unicode_map *um, const struct qstr *str);
> +static int unicode_warn_on(void)
> +{
> +	WARN_ON(1);
> +	return -EIO;
> +}

Creating this extra function adds the same number of lines as writing
`WARN_ON(1); return -EIO;` in each of the few handlers below, but
the latter would be clearer, and you already do it for
unicode_load_default anyway. :)
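
As a rough userspace sketch of that alternative, two of the handlers
could look like the following. WARN_ON() is stubbed with a hypothetical
warn_on_stub() helper and the structs are left opaque, so this only
illustrates the shape and is not kernel code:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's WARN_ON(). */
static inline int warn_on_stub(int cond, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s:%d\n", file, line);
	return cond;
}
#define WARN_ON(cond) warn_on_stub(!!(cond), __FILE__, __LINE__)

/* Opaque stand-ins so the prototypes match the ones in the patch. */
struct unicode_map;
struct qstr;

/* Each default handler warns and fails on its own, with no shared helper. */
static int unicode_validate_default(const struct unicode_map *um,
				    const struct qstr *str)
{
	(void)um;
	(void)str;
	WARN_ON(1);
	return -EIO;
}

static int unicode_strncmp_default(const struct unicode_map *um,
				   const struct qstr *s1,
				   const struct qstr *s2)
{
	(void)um;
	(void)s1;
	(void)s2;
	WARN_ON(1);
	return -EIO;
}
```

The line count per handler stays the same as with the helper, and the
error path is visible at the point where it happens.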
> +
> +static int unicode_validate_default(const struct unicode_map *um,
> +				    const struct qstr *str)
> +{
> +	return unicode_warn_on();
> +}
> +
> +static int unicode_strncmp_default(const struct unicode_map *um,
> +				   const struct qstr *s1,
> +				   const struct qstr *s2)
> +{
> +	return unicode_warn_on();
> +}
> +
> +static int unicode_strncasecmp_default(const struct unicode_map *um,
> +				       const struct qstr *s1,
> +				       const struct qstr *s2)
> +{
> +	return unicode_warn_on();
> +}
> +
> +static int unicode_strncasecmp_folded_default(const struct unicode_map *um,
> +					      const struct qstr *cf,
> +					      const struct qstr *s1)
> +{
> +	return unicode_warn_on();
> +}
> +
> +static int unicode_normalize_default(const struct unicode_map *um,
> +				     const struct qstr *str,
> +				     unsigned char *dest, size_t dlen)
> +{
> +	return unicode_warn_on();
> +}
> +
> +static int unicode_casefold_default(const struct unicode_map *um,
> +				    const struct qstr *str,
> +				    unsigned char *dest, size_t dlen)
> +{
> +	return unicode_warn_on();
> +}
>  
> -int unicode_strncmp(const struct unicode_map *um,
> -		    const struct qstr *s1, const struct qstr *s2);
> +static int unicode_casefold_hash_default(const struct unicode_map *um,
> +					 const void *salt, struct qstr *str)
> +{
> +	return unicode_warn_on();
> +}

Again, why isn't this in a .c file?  Does it need to be here?

>  
> -int unicode_strncasecmp(const struct unicode_map *um,
> -			const struct qstr *s1, const struct qstr *s2);
> -int unicode_strncasecmp_folded(const struct unicode_map *um,
> -			       const struct qstr *cf,
> -			       const struct qstr *s1);
> +static struct unicode_map *unicode_load_default(const char *version)
> +{
> +	unicode_warn_on();
> +	return ERR_PTR(-EIO);
> +}
>  
> -int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
> -		      unsigned char *dest, size_t dlen);
> +DECLARE_STATIC_CALL(_unicode_validate, unicode_validate_default);
> +DECLARE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
> +DECLARE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
> +DECLARE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
> +DECLARE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
> +DECLARE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
> +DECLARE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
> +DECLARE_STATIC_CALL(_unicode_load, unicode_load_default);

nit: I hate these functions starting with a single _. They are not common in the
rest of the kernel either.

> -int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
> -		     unsigned char *dest, size_t dlen);
> +static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str)
> +{
> +	return static_call(_unicode_validate)(um, str);
> +}
>  
> -int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
> -			  struct qstr *str);
> +static inline int unicode_strncmp(const struct unicode_map *um,
> +				  const struct qstr *s1, const struct qstr *s2)
> +{
> +	return static_call(_unicode_strncmp)(um, s1, s2);
> +}
> +
> +static inline int unicode_strncasecmp(const struct unicode_map *um,
> +				      const struct qstr *s1, const struct qstr *s2)
> +{
> +	return static_call(_unicode_strncasecmp)(um, s1, s2);
> +}
> +
> +static inline int unicode_strncasecmp_folded(const struct unicode_map *um,
> +					     const struct qstr *cf,
> +					     const struct qstr *s1)
> +{
> +	return static_call(_unicode_strncasecmp_folded)(um, cf, s1);
> +}
> +
> +static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
> +				    unsigned char *dest, size_t dlen)
> +{
> +	return static_call(_unicode_normalize)(um, str, dest, dlen);
> +}
> +
> +static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
> +				   unsigned char *dest, size_t dlen)
> +{
> +	return static_call(_unicode_casefold)(um, str, dest, dlen);
> +}
> +
> +static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
> +					struct qstr *str)
> +{
> +	return static_call(_unicode_casefold_hash)(um, salt, str);
> +}
>  
>  struct unicode_map *unicode_load(const char *version);
>  void unicode_unload(struct unicode_map *um);
>  
> +void unicode_register(struct module *owner);
> +void unicode_unregister(void);
> +
>  #endif /* _LINUX_UNICODE_H */

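Side note on the version word that utf8_load() stores in um->version:
judging by the shifts in the pr_warn() call, it packs major, minor and
revision into one byte each. A minimal userspace sketch, assuming the
local UNICODE_AGE() below mirrors the macro in fs/unicode/utf8n.h (the
version_*() and format_version() helpers are hypothetical names):

```c
#include <stdio.h>

/* Assumed layout: major in bits 16-23, minor in bits 8-15, revision in bits 0-7. */
#define UNICODE_AGE(maj, min, rev) \
	(((unsigned int)(maj) << 16) | ((unsigned int)(min) << 8) | (unsigned int)(rev))

/* Each field is one byte wide, so every extraction masks with 0xff. */
static unsigned int version_major(unsigned int v) { return (v >> 16) & 0xff; }
static unsigned int version_minor(unsigned int v) { return (v >> 8) & 0xff; }
static unsigned int version_rev(unsigned int v)   { return v & 0xff; }

/* Format "maj.min.rev" into buf; returns what snprintf() returns. */
static int format_version(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%u.%u.%u",
			version_major(v), version_minor(v), version_rev(v));
}
```

Any mask narrower than 0xff on the last field would drop bits from the
revision, so an odd revision number would be printed incorrectly.
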
-- 
Gabriel Krisman Bertazi


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-29 21:20   ` Gabriel Krisman Bertazi
@ 2021-03-29 22:38     ` Shreeya Patel
  2021-03-29 23:10       ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 14+ messages in thread
From: Shreeya Patel @ 2021-03-29 22:38 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: tytso, adilger.kernel, jaegeuk, chao, ebiggers, drosen, ebiggers,
	yuchao0, linux-ext4, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, kernel, andre.almeida


On 30/03/21 2:50 am, Gabriel Krisman Bertazi wrote:
> Shreeya Patel <shreeya.patel@collabora.com> writes:
>
>> utf8data.h_shipped has a large database table which is an auto-generated
>> decodification trie for the unicode normalization functions.
>> It is not necessary to load this large table in the kernel if no
>> filesystem is using it, hence make UTF-8 encoding loadable by converting
>> it into a module.
>> Modify the file called unicode-core which will act as a layer for
>> unicode subsystem. It will load the UTF-8 module and access its functions
>> whenever any filesystem that needs unicode is mounted.
>> Also, indirect calls using function pointers are slow, use static calls to
>> avoid overhead caused in case of repeated indirect calls. Static calls
>> improve the performance by directly calling the functions as opposed to
>> indirect calls.
>>
>> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
>> ---
>> Changes in v5
>>    - Rename global variables and default static call functions for better
>>      understanding
>>    - Make only config UNICODE_UTF8 visible and config UNICODE to be always
>>      enabled provided UNICODE_UTF8 is enabled.
>>    - Improve the documentation for Kconfig
>>    - Improve the commit message.
>>   
>> Changes in v4
>>    - Return error from the static calls instead of doing nothing and
>>      succeeding even without loading the module.
>>    - Remove the complete usage of utf8_ops and use static calls at all
>>      places.
>>    - Restore the static calls to default values when module is unloaded.
>>    - Decrement the reference of module after calling the unload function.
>>    - Remove spinlock as there will be no race conditions after removing
>>      utf8_ops.
>>
>> Changes in v3
>>    - Add a patch which checks if utf8 is loaded before calling utf8_unload()
>>      in ext4 and f2fs filesystems
>>    - Return error if strscpy() returns value < 0
>>    - Correct the conditions to prevent NULL pointer dereference while
>>      accessing functions via utf8_ops variable.
>>    - Add spinlock to avoid race conditions.
>>    - Use static_call() for preventing speculative execution attacks.
>>
>> Changes in v2
>>    - Remove the duplicate file from the last patch.
>>    - Make the wrapper functions inline.
>>    - Remove msleep and use try_module_get() and module_put()
>>      for ensuring that module is loaded correctly and also
>>      doesn't get unloaded while in use.
>>    - Resolve the warning reported by kernel test robot.
>>    - Resolve all the checkpatch.pl warnings.
>>
>>
>>   fs/unicode/Kconfig        |  17 ++-
>>   fs/unicode/Makefile       |   5 +-
>>   fs/unicode/unicode-core.c | 241 +++++++----------------------------
>>   fs/unicode/unicode-utf8.c | 256 ++++++++++++++++++++++++++++++++++++++
>>   include/linux/unicode.h   | 123 +++++++++++++++---
>>   5 files changed, 426 insertions(+), 216 deletions(-)
>>   create mode 100644 fs/unicode/unicode-utf8.c
>>
>> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
>> index 2c27b9a5cd6c..ad4b837f2eb2 100644
>> --- a/fs/unicode/Kconfig
>> +++ b/fs/unicode/Kconfig
>> @@ -2,13 +2,26 @@
>>   #
>>   # UTF-8 normalization
>>   #
>> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
>> +# is enabled. This config option adds the unicode subsystem layer which loads
>> +# the UTF-8 module whenever any filesystem needs it.
>>   config UNICODE
>> -	bool "UTF-8 normalization and casefolding support"
>> +	bool
>> +
>> +# utf8data.h_shipped has a large database table which is an auto-generated
>> +# decodification trie for the unicode normalization functions and it is not
>> +# necessary to carry this large table in the kernel.
>> +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
>> +# module and this module will be loaded by the unicode subsystem layer only
>> +# when any filesystem needs it.
>> +config UNICODE_UTF8
>> +	tristate "UTF-8 module"
>>   	help
>>   	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
>>   	  support.
>> +	select UNICODE
>>   
>>   config UNICODE_NORMALIZATION_SELFTEST
>>   	tristate "Test UTF-8 normalization support"
>> -	depends on UNICODE
>> +	depends on UNICODE_UTF8
>>   	default n
>> diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
>> index fbf9a629ed0d..49d50083e6ee 100644
>> --- a/fs/unicode/Makefile
>> +++ b/fs/unicode/Makefile
>> @@ -1,11 +1,14 @@
>>   # SPDX-License-Identifier: GPL-2.0
>>   
>>   obj-$(CONFIG_UNICODE) += unicode.o
>> +obj-$(CONFIG_UNICODE_UTF8) += utf8.o
>>   obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
>>   
>> -unicode-y := utf8-norm.o unicode-core.o
>> +unicode-y := unicode-core.o
>> +utf8-y := unicode-utf8.o utf8-norm.o
>>   
>>   $(obj)/utf8-norm.o: $(obj)/utf8data.h
>> +$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o
>>   
>>   # In the normal build, the checked-in utf8data.h is just shipped.
>>   #
>> diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
>> index 730dbaedf593..07d42f471e42 100644
>> --- a/fs/unicode/unicode-core.c
>> +++ b/fs/unicode/unicode-core.c
>> @@ -1,237 +1,80 @@
>>   /* SPDX-License-Identifier: GPL-2.0 */
>>   #include <linux/module.h>
>>   #include <linux/kernel.h>
>> -#include <linux/string.h>
>>   #include <linux/slab.h>
>> -#include <linux/parser.h>
>>   #include <linux/errno.h>
>>   #include <linux/unicode.h>
>> -#include <linux/stringhash.h>
>>   
>> -#include "utf8n.h"
>> +static struct module *utf8mod;
>>   
>> -int unicode_validate(const struct unicode_map *um, const struct qstr *str)
>> -{
>> -	const struct utf8data *data = utf8nfdi(um->version);
>> +DEFINE_STATIC_CALL(_unicode_validate, unicode_validate_default);
>> +EXPORT_STATIC_CALL(_unicode_validate);
>>   
>> -	if (utf8nlen(data, str->name, str->len) < 0)
>> -		return -1;
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL(unicode_validate);
>> +DEFINE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
>> +EXPORT_STATIC_CALL(_unicode_strncmp);
>>   
>> -int unicode_strncmp(const struct unicode_map *um,
>> -		    const struct qstr *s1, const struct qstr *s2)
>> -{
>> -	const struct utf8data *data = utf8nfdi(um->version);
>> -	struct utf8cursor cur1, cur2;
>> -	int c1, c2;
>> +DEFINE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
>> +EXPORT_STATIC_CALL(_unicode_strncasecmp);
> Why are these here if the _default functions are defined in the header
> file?  I think the definitions could be in this file. No?


The inline functions defined in the header file use these functions, so
they cannot be defined here in the .c file.


>> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> -		return -EINVAL;
>> +DEFINE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
>> +EXPORT_STATIC_CALL(_unicode_strncasecmp_folded);
>>   
>> -	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
>> -		return -EINVAL;
>> +DEFINE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
>> +EXPORT_STATIC_CALL(_unicode_normalize);
>>   
>> -	do {
>> -		c1 = utf8byte(&cur1);
>> -		c2 = utf8byte(&cur2);
>> +DEFINE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
>> +EXPORT_STATIC_CALL(_unicode_casefold);
>>   
>> -		if (c1 < 0 || c2 < 0)
>> -			return -EINVAL;
>> -		if (c1 != c2)
>> -			return 1;
>> -	} while (c1);
>> +DEFINE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
>> +EXPORT_STATIC_CALL(_unicode_casefold_hash);
>>   
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL(unicode_strncmp);
>> +DEFINE_STATIC_CALL(_unicode_load, unicode_load_default);
>> +EXPORT_STATIC_CALL(_unicode_load);
>>   
>> -int unicode_strncasecmp(const struct unicode_map *um,
>> -			const struct qstr *s1, const struct qstr *s2)
>> +static int unicode_load_module(void)
>>   {
>> -	const struct utf8data *data = utf8nfdicf(um->version);
>> -	struct utf8cursor cur1, cur2;
>> -	int c1, c2;
>> -
>> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> -		return -EINVAL;
>> -
>> -	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
>> -		return -EINVAL;
>> -
>> -	do {
>> -		c1 = utf8byte(&cur1);
>> -		c2 = utf8byte(&cur2);
>> -
>> -		if (c1 < 0 || c2 < 0)
>> -			return -EINVAL;
>> -		if (c1 != c2)
>> -			return 1;
>> -	} while (c1);
>> -
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL(unicode_strncasecmp);
>> -
>> -/* String cf is expected to be a valid UTF-8 casefolded
>> - * string.
>> - */
>> -int unicode_strncasecmp_folded(const struct unicode_map *um,
>> -			       const struct qstr *cf,
>> -			       const struct qstr *s1)
>> -{
>> -	const struct utf8data *data = utf8nfdicf(um->version);
>> -	struct utf8cursor cur1;
>> -	int c1, c2;
>> -	int i = 0;
>> -
>> -	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> -		return -EINVAL;
>> -
>> -	do {
>> -		c1 = utf8byte(&cur1);
>> -		c2 = cf->name[i++];
>> -		if (c1 < 0)
>> -			return -EINVAL;
>> -		if (c1 != c2)
>> -			return 1;
>> -	} while (c1);
>> -
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL(unicode_strncasecmp_folded);
>> -
>> -int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
>> -		     unsigned char *dest, size_t dlen)
>> -{
>> -	const struct utf8data *data = utf8nfdicf(um->version);
>> -	struct utf8cursor cur;
>> -	size_t nlen = 0;
>> -
>> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> -		return -EINVAL;
>> -
>> -	for (nlen = 0; nlen < dlen; nlen++) {
>> -		int c = utf8byte(&cur);
>> -
>> -		dest[nlen] = c;
>> -		if (!c)
>> -			return nlen;
>> -		if (c == -1)
>> -			break;
>> -	}
>> -	return -EINVAL;
>> -}
>> -EXPORT_SYMBOL(unicode_casefold);
>> +	int ret = request_module("utf8");
>>   
>> -int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
>> -			  struct qstr *str)
>> -{
>> -	const struct utf8data *data = utf8nfdicf(um->version);
>> -	struct utf8cursor cur;
>> -	int c;
>> -	unsigned long hash = init_name_hash(salt);
>> -
>> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> -		return -EINVAL;
>> -
>> -	while ((c = utf8byte(&cur))) {
>> -		if (c < 0)
>> -			return -EINVAL;
>> -		hash = partial_name_hash((unsigned char)c, hash);
>> +	if (ret) {
>> +		pr_err("Failed to load UTF-8 module\n");
>> +		return ret;
>>   	}
>> -	str->hash = end_name_hash(hash);
>>   	return 0;
>>   }
>> -EXPORT_SYMBOL(unicode_casefold_hash);
>>   
>> -int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
>> -		      unsigned char *dest, size_t dlen)
>> +struct unicode_map *unicode_load(const char *version)
>>   {
>> -	const struct utf8data *data = utf8nfdi(um->version);
>> -	struct utf8cursor cur;
>> -	ssize_t nlen = 0;
>> -
>> -	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> -		return -EINVAL;
>> +	int ret = unicode_load_module();
> Splitting this in two functions sounds unnecessary, since the other
> function just calls request_module.  By the way, is there any protection
> against calling request_module if the module is already loaded?  Surely
> that's not necessary, perhaps try_then_request_module(utf8mod, "utf8")?


Yes, try_then_request_module  would be a better choice.
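
For reference, a userspace model of what try_then_request_module() does:
it tests the expression first and only asks for the module when the
expression is still unset, then re-reads it. The loader below
(fake_request_module) and the struct module stand-in are hypothetical;
only the shape of the macro follows the kernel one.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct module. */
struct module {
	const char *name;
};

static struct module utf8_module = { "utf8" };
static struct module *utf8mod;	/* set once the encoding module registers */
static int load_requests;	/* counts simulated module load requests */

/* Simulated loader: pretends that loading "utf8" runs its init and registers it. */
static int fake_request_module(const char *name)
{
	(void)name;
	load_requests++;
	utf8mod = &utf8_module;
	return 0;
}

/* Test x first; request the module only if x is unset, then read x again. */
#define try_then_request_module(x, mod) \
	((x) ? (x) : (fake_request_module(mod), (x)))

static struct module *get_utf8_module(void)
{
	return try_then_request_module(utf8mod, "utf8");
}
```

Calling get_utf8_module() twice issues a single load request in this
model, because the second call already sees utf8mod set.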


>> -	for (nlen = 0; nlen < dlen; nlen++) {
>> -		int c = utf8byte(&cur);
>> +	if (ret)
>> +		return ERR_PTR(ret);
>>   
>> -		dest[nlen] = c;
>> -		if (!c)
>> -			return nlen;
>> -		if (c == -1)
>> -			break;
>> -	}
>> -	return -EINVAL;
>> +	if (!try_module_get(utf8mod))
> Can't module_unregister be called in between the register_module and
> here, and then you have a bogus utf8mod pointer? True, try_module_get
> checks for NULL, but if you are unlucky, module_is_live will break.
> I still think utf8mod needs to be protected while you don't have a
> reference to the module.
>
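
One possible way to close that window, sketched in userspace only: keep
utf8mod behind a lock and take the reference while the lock is held, so
the pointer cannot be cleared between the check and the get. Here a
pthread mutex stands in for the spinlock that v3 used, and the struct
module fields, unicode_get_module() and unicode_put_module() are
hypothetical names, not kernel API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module: a refcount and a liveness flag. */
struct module {
	int refcnt;
	bool live;
};

static pthread_mutex_t utf8mod_lock = PTHREAD_MUTEX_INITIALIZER;
static struct module *utf8mod;

static void unicode_register(struct module *owner)
{
	pthread_mutex_lock(&utf8mod_lock);
	utf8mod = owner;
	pthread_mutex_unlock(&utf8mod_lock);
}

static void unicode_unregister(void)
{
	pthread_mutex_lock(&utf8mod_lock);
	utf8mod = NULL;
	pthread_mutex_unlock(&utf8mod_lock);
}

/* Take a reference while the lock pins the pointer; NULL if none is registered. */
static struct module *unicode_get_module(void)
{
	struct module *mod;

	pthread_mutex_lock(&utf8mod_lock);
	mod = utf8mod;
	if (mod && mod->live)
		mod->refcnt++;
	else
		mod = NULL;
	pthread_mutex_unlock(&utf8mod_lock);
	return mod;
}

/* Drop a reference previously taken with unicode_get_module(). */
static void unicode_put_module(struct module *mod)
{
	pthread_mutex_lock(&utf8mod_lock);
	mod->refcnt--;
	pthread_mutex_unlock(&utf8mod_lock);
}
```

The caller keeps the returned pointer for the later put instead of
re-reading the global, which also avoids dropping a reference on a
module that has unregistered in the meantime.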
>> +		return ERR_PTR(-ENODEV);
>> +	else
>> +		return static_call(_unicode_load)(version);
>>   }
>> -EXPORT_SYMBOL(unicode_normalize);
>> +EXPORT_SYMBOL(unicode_load);
>>   
>> -static int unicode_parse_version(const char *version, unsigned int *maj,
>> -				 unsigned int *min, unsigned int *rev)
>> +void unicode_unload(struct unicode_map *um)
>>   {
>> -	substring_t args[3];
>> -	char version_string[12];
>> -	static const struct match_token token[] = {
>> -		{1, "%d.%d.%d"},
>> -		{0, NULL}
>> -	};
>> -	int ret = strscpy(version_string, version, sizeof(version_string));
>> -
>> -	if (ret < 0)
>> -		return ret;
>> -
>> -	if (match_token(version_string, token, args) != 1)
>> -		return -EINVAL;
>> -
>> -	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
>> -	    match_int(&args[2], rev))
>> -		return -EINVAL;
>> +	kfree(um);
>>   
>> -	return 0;
>> +	if (utf8mod)
>> +		module_put(utf8mod);
>>   }
>> +EXPORT_SYMBOL(unicode_unload);
>>   
>> -struct unicode_map *unicode_load(const char *version)
>> +void unicode_register(struct module *owner)
>>   {
>> -	struct unicode_map *um = NULL;
>> -	int unicode_version;
>> -
>> -	if (version) {
>> -		unsigned int maj, min, rev;
>> -
>> -		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
>> -			return ERR_PTR(-EINVAL);
>> -
>> -		if (!utf8version_is_supported(maj, min, rev))
>> -			return ERR_PTR(-EINVAL);
>> -
>> -		unicode_version = UNICODE_AGE(maj, min, rev);
>> -	} else {
>> -		unicode_version = utf8version_latest();
>> -		printk(KERN_WARNING"UTF-8 version not specified. "
>> -		       "Assuming latest supported version (%d.%d.%d).",
>> -		       (unicode_version >> 16) & 0xff,
>> -		       (unicode_version >> 8) & 0xff,
>> -		       (unicode_version & 0xff));
>> -	}
>> -
>> -	um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL);
>> -	if (!um)
>> -		return ERR_PTR(-ENOMEM);
>> -
>> -	um->charset = "UTF-8";
>> -	um->version = unicode_version;
>> -
>> -	return um;
>> +	utf8mod = owner;
>>   }
>> -EXPORT_SYMBOL(unicode_load);
>> +EXPORT_SYMBOL(unicode_register);
>>   
>> -void unicode_unload(struct unicode_map *um)
>> +void unicode_unregister(void)
>>   {
>> -	kfree(um);
>> +	utf8mod = NULL;
>>   }
>> -EXPORT_SYMBOL(unicode_unload);
>> +EXPORT_SYMBOL(unicode_unregister);
>>   
>>   MODULE_LICENSE("GPL v2");
>> diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c
>> new file mode 100644
>> index 000000000000..9c6b58239067
>> --- /dev/null
>> +++ b/fs/unicode/unicode-utf8.c
>> @@ -0,0 +1,256 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/module.h>
>> +#include <linux/kernel.h>
>> +#include <linux/string.h>
>> +#include <linux/slab.h>
>> +#include <linux/parser.h>
>> +#include <linux/errno.h>
>> +#include <linux/unicode.h>
>> +#include <linux/stringhash.h>
>> +#include <linux/static_call.h>
>> +
>> +#include "utf8n.h"
>> +
>> +static int utf8_validate(const struct unicode_map *um, const struct qstr *str)
>> +{
>> +	const struct utf8data *data = utf8nfdi(um->version);
>> +
>> +	if (utf8nlen(data, str->name, str->len) < 0)
>> +		return -1;
>> +	return 0;
>> +}
>> +
>> +static int utf8_strncmp(const struct unicode_map *um,
>> +			const struct qstr *s1, const struct qstr *s2)
>> +{
>> +	const struct utf8data *data = utf8nfdi(um->version);
>> +	struct utf8cursor cur1, cur2;
>> +	int c1, c2;
>> +
>> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> +		return -EINVAL;
>> +
>> +	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
>> +		return -EINVAL;
>> +
>> +	do {
>> +		c1 = utf8byte(&cur1);
>> +		c2 = utf8byte(&cur2);
>> +
>> +		if (c1 < 0 || c2 < 0)
>> +			return -EINVAL;
>> +		if (c1 != c2)
>> +			return 1;
>> +	} while (c1);
>> +
>> +	return 0;
>> +}
>> +
>> +static int utf8_strncasecmp(const struct unicode_map *um,
>> +			    const struct qstr *s1, const struct qstr *s2)
>> +{
>> +	const struct utf8data *data = utf8nfdicf(um->version);
>> +	struct utf8cursor cur1, cur2;
>> +	int c1, c2;
>> +
>> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> +		return -EINVAL;
>> +
>> +	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
>> +		return -EINVAL;
>> +
>> +	do {
>> +		c1 = utf8byte(&cur1);
>> +		c2 = utf8byte(&cur2);
>> +
>> +		if (c1 < 0 || c2 < 0)
>> +			return -EINVAL;
>> +		if (c1 != c2)
>> +			return 1;
>> +	} while (c1);
>> +
>> +	return 0;
>> +}
>> +
>> +/* String cf is expected to be a valid UTF-8 casefolded
>> + * string.
>> + */
>> +static int utf8_strncasecmp_folded(const struct unicode_map *um,
>> +				   const struct qstr *cf,
>> +				   const struct qstr *s1)
>> +{
>> +	const struct utf8data *data = utf8nfdicf(um->version);
>> +	struct utf8cursor cur1;
>> +	int c1, c2;
>> +	int i = 0;
>> +
>> +	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
>> +		return -EINVAL;
>> +
>> +	do {
>> +		c1 = utf8byte(&cur1);
>> +		c2 = cf->name[i++];
>> +		if (c1 < 0)
>> +			return -EINVAL;
>> +		if (c1 != c2)
>> +			return 1;
>> +	} while (c1);
>> +
>> +	return 0;
>> +}
>> +
>> +static int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
>> +			 unsigned char *dest, size_t dlen)
>> +{
>> +	const struct utf8data *data = utf8nfdicf(um->version);
>> +	struct utf8cursor cur;
>> +	size_t nlen = 0;
>> +
>> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> +		return -EINVAL;
>> +
>> +	for (nlen = 0; nlen < dlen; nlen++) {
>> +		int c = utf8byte(&cur);
>> +
>> +		dest[nlen] = c;
>> +		if (!c)
>> +			return nlen;
>> +		if (c == -1)
>> +			break;
>> +	}
>> +	return -EINVAL;
>> +}
>> +
>> +static int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
>> +			      struct qstr *str)
>> +{
>> +	const struct utf8data *data = utf8nfdicf(um->version);
>> +	struct utf8cursor cur;
>> +	int c;
>> +	unsigned long hash = init_name_hash(salt);
>> +
>> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> +		return -EINVAL;
>> +
>> +	while ((c = utf8byte(&cur))) {
>> +		if (c < 0)
>> +			return -EINVAL;
>> +		hash = partial_name_hash((unsigned char)c, hash);
>> +	}
>> +	str->hash = end_name_hash(hash);
>> +	return 0;
>> +}
>> +
>> +static int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
>> +			  unsigned char *dest, size_t dlen)
>> +{
>> +	const struct utf8data *data = utf8nfdi(um->version);
>> +	struct utf8cursor cur;
>> +	ssize_t nlen = 0;
>> +
>> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
>> +		return -EINVAL;
>> +
>> +	for (nlen = 0; nlen < dlen; nlen++) {
>> +		int c = utf8byte(&cur);
>> +
>> +		dest[nlen] = c;
>> +		if (!c)
>> +			return nlen;
>> +		if (c == -1)
>> +			break;
>> +	}
>> +	return -EINVAL;
>> +}
>> +
>> +static int utf8_parse_version(const char *version, unsigned int *maj,
>> +			      unsigned int *min, unsigned int *rev)
>> +{
>> +	substring_t args[3];
>> +	char version_string[12];
>> +	static const struct match_token token[] = {
>> +		{1, "%d.%d.%d"},
>> +		{0, NULL}
>> +	};
>> +
>> +	int ret = strscpy(version_string, version, sizeof(version_string));
>> +
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (match_token(version_string, token, args) != 1)
>> +		return -EINVAL;
>> +
>> +	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
>> +	    match_int(&args[2], rev))
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +static struct unicode_map *utf8_load(const char *version)
>> +{
>> +	struct unicode_map *um = NULL;
>> +	int unicode_version;
>> +
>> +	if (version) {
>> +		unsigned int maj, min, rev;
>> +
>> +		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
>> +			return ERR_PTR(-EINVAL);
>> +
>> +		if (!utf8version_is_supported(maj, min, rev))
>> +			return ERR_PTR(-EINVAL);
>> +
>> +		unicode_version = UNICODE_AGE(maj, min, rev);
>> +	} else {
>> +		unicode_version = utf8version_latest();
>> +		pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).",
>> +			(unicode_version >> 16) & 0xff,
>> +			(unicode_version >> 8) & 0xff,
>> +			(unicode_version & 0xff));
>> +	}
>> +
>> +	um = kzalloc(sizeof(*um), GFP_KERNEL);
>> +	if (!um)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	um->charset = "UTF-8";
>> +	um->version = unicode_version;
>> +
>> +	return um;
>> +}
>> +
>> +static int __init utf8_init(void)
>> +{
>> +	static_call_update(_unicode_validate, utf8_validate);
>> +	static_call_update(_unicode_strncmp, utf8_strncmp);
>> +	static_call_update(_unicode_strncasecmp, utf8_strncasecmp);
>> +	static_call_update(_unicode_strncasecmp_folded, utf8_strncasecmp_folded);
>> +	static_call_update(_unicode_normalize, utf8_normalize);
>> +	static_call_update(_unicode_casefold, utf8_casefold);
>> +	static_call_update(_unicode_casefold_hash, utf8_casefold_hash);
>> +	static_call_update(_unicode_load, utf8_load);
>> +
>> +	unicode_register(THIS_MODULE);
>> +	return 0;
>> +}
>> +
>> +static void __exit utf8_exit(void)
>> +{
>> +	static_call_update(_unicode_validate, unicode_validate_default);
>> +	static_call_update(_unicode_strncmp, unicode_strncmp_default);
>> +	static_call_update(_unicode_strncasecmp, unicode_strncasecmp_default);
>> +	static_call_update(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
>> +	static_call_update(_unicode_normalize, unicode_normalize_default);
>> +	static_call_update(_unicode_casefold, unicode_casefold_default);
>> +	static_call_update(_unicode_casefold_hash, unicode_casefold_hash_default);
>> +	static_call_update(_unicode_load, unicode_load_default);
>> +
>> +	unicode_unregister();
>> +}
>> +
>> +module_init(utf8_init);
>> +module_exit(utf8_exit);
>> +
>> +MODULE_LICENSE("GPL v2");
>> diff --git a/include/linux/unicode.h b/include/linux/unicode.h
>> index de23f9ee720b..18a1d3db9de5 100644
>> --- a/include/linux/unicode.h
>> +++ b/include/linux/unicode.h
>> @@ -4,33 +4,128 @@
>>   
>>   #include <linux/init.h>
>>   #include <linux/dcache.h>
>> +#include <linux/static_call.h>
>> +
>>   
>>   struct unicode_map {
>>   	const char *charset;
>>   	int version;
>>   };
>>   
>> -int unicode_validate(const struct unicode_map *um, const struct qstr *str);
>> +static int unicode_warn_on(void)
>> +{
>> +	WARN_ON(1);
>> +	return -EIO;
>> +}
> Creating this extra function takes the same number of lines as writing
> `WARN_ON(1); return -EIO;` in each of the few handlers below, but
> the latter would be clearer, and you already do it for
> unicode_load_default anyway. :)
>> +
>> +static int unicode_validate_default(const struct unicode_map *um,
>> +				    const struct qstr *str)
>> +{
>> +	return unicode_warn_on();
>> +}
>> +
>> +static int unicode_strncmp_default(const struct unicode_map *um,
>> +				   const struct qstr *s1,
>> +				   const struct qstr *s2)
>> +{
>> +	return unicode_warn_on();
>> +}
>> +
>> +static int unicode_strncasecmp_default(const struct unicode_map *um,
>> +				       const struct qstr *s1,
>> +				       const struct qstr *s2)
>> +{
>> +	return unicode_warn_on();
>> +}
>> +
>> +static int unicode_strncasecmp_folded_default(const struct unicode_map *um,
>> +					      const struct qstr *cf,
>> +					      const struct qstr *s1)
>> +{
>> +	return unicode_warn_on();
>> +}
>> +
>> +static int unicode_normalize_default(const struct unicode_map *um,
>> +				     const struct qstr *str,
>> +				     unsigned char *dest, size_t dlen)
>> +{
>> +	return unicode_warn_on();
>> +}
>> +
>> +static int unicode_casefold_default(const struct unicode_map *um,
>> +				    const struct qstr *str,
>> +				    unsigned char *dest, size_t dlen)
>> +{
>> +	return unicode_warn_on();
>> +}
>>   
>> -int unicode_strncmp(const struct unicode_map *um,
>> -		    const struct qstr *s1, const struct qstr *s2);
>> +static int unicode_casefold_hash_default(const struct unicode_map *um,
>> +					 const void *salt, struct qstr *str)
>> +{
>> +	return unicode_warn_on();
>> +}
> Again, why isn't this in a .c ?  Does it need to be here?
>
>>   
>> -int unicode_strncasecmp(const struct unicode_map *um,
>> -			const struct qstr *s1, const struct qstr *s2);
>> -int unicode_strncasecmp_folded(const struct unicode_map *um,
>> -			       const struct qstr *cf,
>> -			       const struct qstr *s1);
>> +static struct unicode_map *unicode_load_default(const char *version)
>> +{
>> +	unicode_warn_on();
>> +	return ERR_PTR(-EIO);
>> +}
>>   
>> -int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
>> -		      unsigned char *dest, size_t dlen);
>> +DECLARE_STATIC_CALL(_unicode_validate, unicode_validate_default);
>> +DECLARE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
>> +DECLARE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
>> +DECLARE_STATIC_CALL(_unicode_strncasecmp_folded, unicode_strncasecmp_folded_default);
>> +DECLARE_STATIC_CALL(_unicode_normalize, unicode_normalize_default);
>> +DECLARE_STATIC_CALL(_unicode_casefold, unicode_casefold_default);
>> +DECLARE_STATIC_CALL(_unicode_casefold_hash, unicode_casefold_hash_default);
>> +DECLARE_STATIC_CALL(_unicode_load, unicode_load_default);
> nit: I hate these functions starting with a single _.  They are not common in the
> rest of the kernel either.
>> -int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
>> -		     unsigned char *dest, size_t dlen);
>> +static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str)
>> +{
>> +	return static_call(_unicode_validate)(um, str);
>> +}
>>   
>> -int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
>> -			  struct qstr *str);
>> +static inline int unicode_strncmp(const struct unicode_map *um,
>> +				  const struct qstr *s1, const struct qstr *s2)
>> +{
>> +	return static_call(_unicode_strncmp)(um, s1, s2);
>> +}
>> +
>> +static inline int unicode_strncasecmp(const struct unicode_map *um,
>> +				      const struct qstr *s1, const struct qstr *s2)
>> +{
>> +	return static_call(_unicode_strncasecmp)(um, s1, s2);
>> +}
>> +
>> +static inline int unicode_strncasecmp_folded(const struct unicode_map *um,
>> +					     const struct qstr *cf,
>> +					     const struct qstr *s1)
>> +{
>> +	return static_call(_unicode_strncasecmp_folded)(um, cf, s1);
>> +}
>> +
>> +static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
>> +				    unsigned char *dest, size_t dlen)
>> +{
>> +	return static_call(_unicode_normalize)(um, str, dest, dlen);
>> +}
>> +
>> +static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
>> +				   unsigned char *dest, size_t dlen)
>> +{
>> +	return static_call(_unicode_casefold)(um, str, dest, dlen);
>> +}
>> +
>> +static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
>> +					struct qstr *str)
>> +{
>> +	return static_call(_unicode_casefold_hash)(um, salt, str);
>> +}
>>   
>>   struct unicode_map *unicode_load(const char *version);
>>   void unicode_unload(struct unicode_map *um);
>>   
>> +void unicode_register(struct module *owner);
>> +void unicode_unregister(void);
>> +
>>   #endif /* _LINUX_UNICODE_H */


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-29 22:38     ` Shreeya Patel
@ 2021-03-29 23:10       ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 14+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-29 23:10 UTC (permalink / raw)
  To: Shreeya Patel
  Cc: tytso, adilger.kernel, jaegeuk, chao, ebiggers, drosen, ebiggers,
	yuchao0, linux-ext4, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, kernel, andre.almeida

Shreeya Patel <shreeya.patel@collabora.com> writes:

> On 30/03/21 2:50 am, Gabriel Krisman Bertazi wrote:

>>> +DEFINE_STATIC_CALL(_unicode_strncmp, unicode_strncmp_default);
>>> +EXPORT_STATIC_CALL(_unicode_strncmp);
>>>   -int unicode_strncmp(const struct unicode_map *um,
>>> -		    const struct qstr *s1, const struct qstr *s2)
>>> -{
>>> -	const struct utf8data *data = utf8nfdi(um->version);
>>> -	struct utf8cursor cur1, cur2;
>>> -	int c1, c2;
>>> +DEFINE_STATIC_CALL(_unicode_strncasecmp, unicode_strncasecmp_default);
>>> +EXPORT_STATIC_CALL(_unicode_strncasecmp);
>> Why are these here if the _default functions are defined in the header
>> file?  I think the definitions could be in this file. No?
>
>
> The inline functions defined in the header file use these functions, so
> they cannot be defined here in the .c file.

That is not a problem.  It is regular C code, you can just move the
definition to the C code and add the declaration to the header file, and
it will work fine.
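
A rough userspace sketch of that split, for illustration only: every demo_*
name is made up, and a plain function pointer stands in for the static call.
The point is just that the inline wrapper compiles with nothing more than the
declarations in scope, so the default handler's body can live in the C file.

```c
#include <errno.h>
#include <stddef.h>

/* ---- what would go in the header (hypothetical demo_unicode.h) ---- */

struct demo_map {
	int version;
};

/* Declaration only; the body lives in the ".c" part below. */
int demo_validate_default(const struct demo_map *um, const char *str);

/* Dispatch slot: a function pointer standing in for the static call. */
extern int (*demo_validate_fn)(const struct demo_map *um, const char *str);

/* The inline wrapper needs only the declarations above to compile. */
static inline int demo_validate(const struct demo_map *um, const char *str)
{
	return demo_validate_fn(um, str);
}

/* ---- what would go in the C file (hypothetical demo_unicode.c) ---- */

/* Default handler: used while no encoding module is registered. */
int demo_validate_default(const struct demo_map *um, const char *str)
{
	(void)um;
	(void)str;
	return -EIO;	/* the kernel version would also WARN_ON(1) here */
}

int (*demo_validate_fn)(const struct demo_map *um, const char *str) =
	demo_validate_default;

/* Toy handler playing the role of the module's real implementation. */
static int demo_utf8_validate(const struct demo_map *um, const char *str)
{
	(void)um;
	return str ? 0 : -EINVAL;
}

/* Module load: point the slot at the real handler. */
void demo_register(void)
{
	demo_validate_fn = demo_utf8_validate;
}

/* Module unload: restore the default handler. */
void demo_unregister(void)
{
	demo_validate_fn = demo_validate_default;
}
```

Calling demo_validate() before demo_register() returns -EIO from the default
handler; after demo_register() it reaches the toy handler, and
demo_unregister() restores the default.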

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode
  2021-03-29 20:42 ` [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode Shreeya Patel
@ 2021-03-30  1:53   ` Eric Biggers
  2021-03-30  9:49     ` Shreeya Patel
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Biggers @ 2021-03-30  1:53 UTC (permalink / raw)
  To: Shreeya Patel
  Cc: tytso, adilger.kernel, jaegeuk, chao, krisman, drosen, yuchao0,
	linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

On Tue, Mar 30, 2021 at 02:12:38AM +0530, Shreeya Patel wrote:
> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions and it is not
> necessary to carry this large table in the kernel.
> Goal is to make UTF-8 encoding loadable by converting it into a module
> and adding a unicode subsystem layer between the filesystems and the
> utf8 module.
> This layer will load the module whenever any filesystem that
> needs unicode is mounted.
> utf8-core will be converted into this layer file in the future patches,
> hence rename the function names from utf8 to unicode which will denote the
> functions as the unicode subsystem layer functions and this will also be
> the first step towards the transformation of utf8-core file into the
> unicode subsystem layer file.
> 
> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
> ---
> Changes in v5
>   - Improve the commit message.

This didn't really answer my questions about the reason for this renaming.
Aren't the functions like unicode_casefold() still tied to UTF-8 (as opposed to
e.g. supporting both UTF-8 and UTF-16)?  Is that something you're trying to
change?

- Eric


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-29 20:42 ` [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
  2021-03-29 21:20   ` Gabriel Krisman Bertazi
@ 2021-03-30  2:01   ` Eric Biggers
  2021-03-30  2:16     ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Biggers @ 2021-03-30  2:01 UTC (permalink / raw)
  To: Shreeya Patel
  Cc: tytso, adilger.kernel, jaegeuk, chao, krisman, drosen, yuchao0,
	linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida

On Tue, Mar 30, 2021 at 02:12:40AM +0530, Shreeya Patel wrote:
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..ad4b837f2eb2 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -2,13 +2,26 @@
>  #
>  # UTF-8 normalization
>  #
> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> +# is enabled. This config option adds the unicode subsystem layer which loads
> +# the UTF-8 module whenever any filesystem needs it.
>  config UNICODE
> -	bool "UTF-8 normalization and casefolding support"
> +	bool
> +
> +# utf8data.h_shipped has a large database table which is an auto-generated
> +# decodification trie for the unicode normalization functions and it is not
> +# necessary to carry this large table in the kernel.
> +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
> +# module and this module will be loaded by the unicode subsystem layer only
> +# when any filesystem needs it.
> +config UNICODE_UTF8
> +	tristate "UTF-8 module"
>  	help
>  	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
>  	  support.
> +	select UNICODE

This seems problematic; it allows users to set CONFIG_EXT4_FS=y (or
CONFIG_F2FS_FS=y) but then CONFIG_UNICODE_UTF8=m.  Then the filesystem won't
work if the modules are located on the filesystem itself.

I think it should work analogously to CONFIG_FS_ENCRYPTION and
CONFIG_FS_ENCRYPTION_ALGS.  That is, CONFIG_UNICODE should be a user-selectable
bool, and then the tristate symbols CONFIG_EXT4_FS and CONFIG_F2FS_FS should
select the tristate symbol CONFIG_UNICODE_UTF8 if CONFIG_UNICODE.
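
A rough, untested Kconfig sketch of that arrangement. Help text is left out,
and the EXT4_FS and F2FS_FS entries show only the prompt and the added line:

```kconfig
# fs/unicode/Kconfig (sketch)
config UNICODE
	bool "UTF-8 normalization and casefolding support"

# No prompt: only ever enabled through "select" from a filesystem.
config UNICODE_UTF8
	tristate

# fs/ext4/Kconfig (sketch, other lines omitted)
config EXT4_FS
	tristate "The Extended 4 (ext4) filesystem"
	select UNICODE_UTF8 if UNICODE

# fs/f2fs/Kconfig (sketch, other lines omitted)
config F2FS_FS
	tristate "F2FS filesystem support"
	select UNICODE_UTF8 if UNICODE
```

Because a tristate "select" forces the selected symbol to at least the value
of the selecting one, CONFIG_EXT4_FS=y with CONFIG_UNICODE=y would give
CONFIG_UNICODE_UTF8=y, while CONFIG_EXT4_FS=m would still allow
CONFIG_UNICODE_UTF8=m.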

- Eric


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-30  2:01   ` Eric Biggers
@ 2021-03-30  2:16     ` Gabriel Krisman Bertazi
  2021-03-30  5:47       ` Eric Biggers
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-30  2:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Shreeya Patel, tytso, adilger.kernel, jaegeuk, chao, drosen,
	yuchao0, linux-ext4, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, kernel, andre.almeida

Eric Biggers <ebiggers@kernel.org> writes:

> On Tue, Mar 30, 2021 at 02:12:40AM +0530, Shreeya Patel wrote:
>> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
>> index 2c27b9a5cd6c..ad4b837f2eb2 100644
>> --- a/fs/unicode/Kconfig
>> +++ b/fs/unicode/Kconfig
>> @@ -2,13 +2,26 @@
>>  #
>>  # UTF-8 normalization
>>  #
>> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
>> +# is enabled. This config option adds the unicode subsystem layer which loads
>> +# the UTF-8 module whenever any filesystem needs it.
>>  config UNICODE
>> -	bool "UTF-8 normalization and casefolding support"
>> +	bool
>> +
>> +# utf8data.h_shipped has a large database table which is an auto-generated
>> +# decodification trie for the unicode normalization functions and it is not
>> +# necessary to carry this large table in the kernel.
>> +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
>> +# module and this module will be loaded by the unicode subsystem layer only
>> +# when any filesystem needs it.
>> +config UNICODE_UTF8
>> +	tristate "UTF-8 module"
>>  	help
>>  	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
>>  	  support.
>> +	select UNICODE
>
> This seems problematic; it allows users to set CONFIG_EXT4_FS=y (or
> CONFIG_F2FS_FS=y) but then CONFIG_UNICODE_UTF8=m.  Then the filesystem won't
> work if the modules are located on the filesystem itself.

Hi Eric,

Isn't this a user problem?  If the modules required to boot are on the
filesystem itself, you are in trouble.  But, if that is the case, your
rootfs is case-insensitive and you gotta have utf8 as built-in or have
it in an early userspace.

> I think it should work analogously to CONFIG_FS_ENCRYPTION and
> CONFIG_FS_ENCRYPTION_ALGS.  That is, CONFIG_UNICODE should be a user-selectable
> bool, and then the tristate symbols CONFIG_EXT4_FS and CONFIG_F2FS_FS should
> select the tristate symbol CONFIG_UNICODE_UTF8 if CONFIG_UNICODE.





-- 
Gabriel Krisman Bertazi


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-30  2:16     ` Gabriel Krisman Bertazi
@ 2021-03-30  5:47       ` Eric Biggers
  2021-03-30 16:00         ` Theodore Ts'o
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Biggers @ 2021-03-30  5:47 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Shreeya Patel, tytso, adilger.kernel, jaegeuk, chao, drosen,
	yuchao0, linux-ext4, linux-kernel, linux-f2fs-devel,
	linux-fsdevel, kernel, andre.almeida

On Mon, Mar 29, 2021 at 10:16:57PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Tue, Mar 30, 2021 at 02:12:40AM +0530, Shreeya Patel wrote:
> >> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> >> index 2c27b9a5cd6c..ad4b837f2eb2 100644
> >> --- a/fs/unicode/Kconfig
> >> +++ b/fs/unicode/Kconfig
> >> @@ -2,13 +2,26 @@
> >>  #
> >>  # UTF-8 normalization
> >>  #
> >> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> >> +# is enabled. This config option adds the unicode subsystem layer which loads
> >> +# the UTF-8 module whenever any filesystem needs it.
> >>  config UNICODE
> >> -	bool "UTF-8 normalization and casefolding support"
> >> +	bool
> >> +
> >> +# utf8data.h_shipped has a large database table which is an auto-generated
> >> +# decodification trie for the unicode normalization functions and it is not
> >> +# necessary to carry this large table in the kernel.
> >> +# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
> >> +# module and this module will be loaded by the unicode subsystem layer only
> >> +# when any filesystem needs it.
> >> +config UNICODE_UTF8
> >> +	tristate "UTF-8 module"
> >>  	help
> >>  	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
> >>  	  support.
> >> +	select UNICODE
> >
> > This seems problematic; it allows users to set CONFIG_EXT4_FS=y (or
> > CONFIG_F2FS_FS=y) but then CONFIG_UNICODE_UTF8=m.  Then the filesystem won't
> > work if the modules are located on the filesystem itself.
> 
> Hi Eric,
> 
> Isn't this a user problem?  If the modules required to boot are on the
> filesystem itself, you are in trouble.  But, if that is the case, your
> rootfs is case-insensitive and you gotta have utf8 as built-in or have
> it in an early userspace.
> 

We could make it the user's problem, but that seems rather unfriendly.
Especially because the utf8 module would be needed if the filesystem has the
casefold feature at all, regardless of whether any casefolded directories are
needed at boot time or not.  (Unless there is a plan to change that?)

- Eric


* Re: [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode
  2021-03-30  1:53   ` Eric Biggers
@ 2021-03-30  9:49     ` Shreeya Patel
  0 siblings, 0 replies; 14+ messages in thread
From: Shreeya Patel @ 2021-03-30  9:49 UTC (permalink / raw)
  To: Eric Biggers
  Cc: tytso, adilger.kernel, jaegeuk, chao, krisman, drosen, yuchao0,
	linux-ext4, linux-kernel, linux-f2fs-devel, linux-fsdevel,
	kernel, andre.almeida


On 30/03/21 7:23 am, Eric Biggers wrote:
> On Tue, Mar 30, 2021 at 02:12:38AM +0530, Shreeya Patel wrote:
>> utf8data.h_shipped has a large database table which is an auto-generated
>> decodification trie for the unicode normalization functions and it is not
>> necessary to carry this large table in the kernel.
>> Goal is to make UTF-8 encoding loadable by converting it into a module
>> and adding a unicode subsystem layer between the filesystems and the
>> utf8 module.
>> This layer will load the module whenever any filesystem that
>> needs unicode is mounted.
>> utf8-core will be converted into this layer file in the future patches,
>> hence rename the function names from utf8 to unicode which will denote the
>> functions as the unicode subsystem layer functions and this will also be
>> the first step towards the transformation of utf8-core file into the
>> unicode subsystem layer file.
>>
>> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
>> ---
>> Changes in v5
>>    - Improve the commit message.
> This didn't really answer my questions about the reason for this renaming.
> Aren't the functions like unicode_casefold() still tied to UTF-8 (as opposed to
> e.g. supporting both UTF-8 and UTF-16)?  Is that something you're trying to
> change?


Currently, the layer's functions are still tied to the UTF-8 encoding only.
But if we add UTF-16 support in the future, the layer file would have to
be changed accordingly to support both.
The unicode subsystem layer is a generic layer which connects the
filesystems and the UTF-8 module (and/or a UTF-16 module in the future).


>
> - Eric
>


* Re: [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer
  2021-03-30  5:47       ` Eric Biggers
@ 2021-03-30 16:00         ` Theodore Ts'o
  0 siblings, 0 replies; 14+ messages in thread
From: Theodore Ts'o @ 2021-03-30 16:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Gabriel Krisman Bertazi, Shreeya Patel, adilger.kernel, jaegeuk,
	chao, drosen, yuchao0, linux-ext4, linux-kernel,
	linux-f2fs-devel, linux-fsdevel, kernel, andre.almeida

On Mon, Mar 29, 2021 at 10:47:52PM -0700, Eric Biggers wrote:
> > Isn't this a user problem?  If the modules required to boot are on the
> > filesystem itself, you are in trouble.  But, if that is the case, your
> > rootfs is case-insensitive and you gotta have utf8 as built-in or have
> > it in an early userspace.
> 
> We could make it the user's problem, but that seems rather unfriendly.
> Especially because the utf8 module would be needed if the filesystem has the
> casefold feature at all, regardless of whether any casefolded directories are
> needed at boot time or not.  (Unless there is a plan to change that?)

I guess I'm not that worried, since the vast majority of desktop
distributions are using initial ramdisks these days.  And if someone
did build a monolithic kernel that couldn't mount the root file
system, they would figure that out pretty quickly.

The biggest problem they would have with trying to enable encryption
or casefolding on the root file system is that if they are using Grub,
older versions of Grub would see an unknown incompat feature, and
immediately have heartburn, and refuse to touch whatever file system
/boot is located on.  If the distribution has /boot as a stand-alone
partition, that won't be a problem, but if you have a single file
system on which the kernels and initrds are located, the moment you
try to set the encryption or casefold feature on the file
system, you're immediately hosed --- and if you do this on a laptop
while you are on an airplane, without thinking things through, and
without access to a rescue USB thumb drive, life can
get... interesting.  (Why, yes, I'm speaking from direct experience;
why do you ask?  :-)

So in comparison to making such a mistake, building a kernel that was
missing casefold, and needing to fall back to an older kernel is not
really that bad of a user experience.  You just have to fall back to the
distro kernel, which most kernel developers who are dogfooding
bleeding-edge kernels are probably smart enough to keep around.

We *could* teach ext4 to support mounting file systems that have
casefold, without having the unicode module loaded, which would make
things a bit better, but I'm not sure it's worth the effort.  We could
even make the argument that letting the system boot, and then having
access to some directories return ENOTSUPP would actually be a more
confusing user experience than a simple hard failure when we try
mounting the file system.

Cheers,

					- Ted



Thread overview: 14+ messages
2021-03-29 20:42 [PATCH v5 0/4] Make UTF-8 encoding loadable Shreeya Patel
2021-03-29 20:42 ` [PATCH v5 1/4] fs: unicode: Use strscpy() instead of strncpy() Shreeya Patel
2021-03-29 20:42 ` [PATCH v5 2/4] fs: unicode: Rename function names from utf8 to unicode Shreeya Patel
2021-03-30  1:53   ` Eric Biggers
2021-03-30  9:49     ` Shreeya Patel
2021-03-29 20:42 ` [PATCH v5 3/4] fs: unicode: Rename utf8-core file to unicode-core Shreeya Patel
2021-03-29 20:42 ` [PATCH v5 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
2021-03-29 21:20   ` Gabriel Krisman Bertazi
2021-03-29 22:38     ` Shreeya Patel
2021-03-29 23:10       ` Gabriel Krisman Bertazi
2021-03-30  2:01   ` Eric Biggers
2021-03-30  2:16     ` Gabriel Krisman Bertazi
2021-03-30  5:47       ` Eric Biggers
2021-03-30 16:00         ` Theodore Ts'o
