linux-mm.kvack.org archive mirror
* [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs
@ 2021-03-23 19:59 André Almeida
  2021-03-23 19:59 ` [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" André Almeida
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: André Almeida @ 2021-03-23 19:59 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg, André Almeida

Hello,

This patchset adds support for case-insensitive file name lookups in
tmpfs. The implementation (and even the commit message) was based on the
work done at b886ee3e778e ("ext4: Support case-insensitive file name
lookups").

* Use case

The use case for this feature is similar to the use case for ext4, to
better support compatibility layers (like Wine), particularly in
combination with sandboxing/container tools (like Flatpak). Those
containerization tools can share a subset of the host filesystem with an
application. In the container, the root directory and any parent
directories required for a shared directory are on tmpfs, with the
shared directories bind-mounted into the container's view of the
filesystem.

If the host filesystem is using case-insensitive directories, then the
application can do lookups inside those directories in a
case-insensitive way, without this needing to be implemented in
user-space. However, if the host is only sharing a subset of a
case-insensitive directory with the application, then the parent
directories of the mount point will be part of the container's root
tmpfs. When the application tries to do case-insensitive lookups of
those parent directories on a case-sensitive tmpfs, the lookup will
fail.

For example, if /srv/games is a case-insensitive directory on the host,
then applications will expect /srv/games/Steam/Half-Life and
/srv/games/steam/half-life to be interchangeable; but if the
container framework is only sharing /srv/games/Steam/Half-Life and
/srv/games/Steam/Portal (and not the rest of /srv/games) with the
container, with /srv, /srv/games and /srv/games/Steam as part of the
container's tmpfs root, then making /srv/games a case-insensitive
directory inside the container would be necessary to meet that
expectation.

* The patchset

Note that, since there's no on-disk information about this filesystem
(and thus, no mkfs support), we need to pass this information in the
mount options. This is the main difference from other filesystems that
support casefolding, like ext4 and f2fs. The directory attribute uses the
same value used by ext4/f2fs, so userspace tools like chattr already work
with this implementation.

- Patch 1 reverts the unexportation of casefolding functions for dentry
operations that are going to be used by tmpfs.

- Patch 2 does the wiring up of casefold functions inside tmpfs, along
with creating the mounting options for casefold support.

- Patch 3 adds ioctl support to tmpfs for getting/setting file flags.
This is needed since casefolding is enabled on a per-directory basis at
supported mount points, via directory flags.

- Patch 4 documents the new options, along with a usage example.
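
As an illustration of the mount option handling in patch 2, the sketch
below mimics how a "casefold=utf8-<version>" argument is split: the
"utf8-" prefix is required and the version must fit a 10-byte buffer.
The function name parse_casefold_arg() is hypothetical, and rejecting an
empty version here is my assumption (in the patch that case would be
caught later by utf8_load()).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define VERSION_MAX 10	/* mirrors "char version[10]" in patch 2 */

/*
 * Hypothetical userspace model of the Opt_casefold case: accept only
 * "utf8-<version>" and copy <version> out if it fits the buffer.
 * Returns 0 on success, -EINVAL otherwise.
 */
static int parse_casefold_arg(const char *arg, char version[VERSION_MAX])
{
	size_t len;

	if (strncmp(arg, "utf8-", 5) != 0)
		return -EINVAL;		/* only utf8 encodings are supported */

	len = strlen(arg + 5);
	if (len == 0 || len >= VERSION_MAX)
		return -EINVAL;		/* empty, or would be truncated */

	memcpy(version, arg + 5, len + 1);	/* include the NUL */
	return 0;
}
```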

This work is also available at
https://gitlab.collabora.com/tonyk/linux/-/tree/tmpfs-ic

* Testing

xfstests already has a test for casefold filesystems (generic/556). I
have adapted it to work with tmpfs in a hacky way and this work can be
found at https://gitlab.collabora.com/tonyk/xfstests. All tests succeed.

Once we reach common ground on the interface, I'll make it more
upstreamable so it can be merged along with the kernel work.

* FAQ

- Can't this be done in userspace?

Yes, but it's slow and can't guarantee correctness (imagine two files
named file.c and FILE.C; an app asks for FiLe.C, which one is the correct
one?).
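
To make the ambiguity concrete, here is a small sketch of what a
userspace emulation has to do: scan every entry of the directory and
compare names ignoring case. The helper ci_match_count() is invented for
this example and uses strcasecmp(), so it only folds ASCII; a real
implementation would need proper Unicode casefolding.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp */

/*
 * Hypothetical helper: count how many directory entries match the
 * wanted name when case is ignored. This is a linear scan, and more
 * than one hit means the lookup is ambiguous.
 */
static size_t ci_match_count(const char *const *names, size_t n,
			     const char *wanted)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (strcasecmp(names[i], wanted) == 0)
			hits++;

	return hits;
}
```

With entries "file.c" and "FILE.C" in the same directory, a request for
"FiLe.C" yields two candidates, and userspace has no principled way to
pick one. A casefolded directory never allows both names to exist.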

- Which changes are required in userspace?

Apart from the container tools that will use this feature, no change is
needed. Both mount and chattr already work with this patchset.

- This will completely obliterate my setup!
  
Casefold support in tmpfs is disabled by default.

Thanks,
	André

André Almeida (4):
  Revert "libfs: unexport generic_ci_d_compare() and
    generic_ci_d_hash()"
  mm: shmem: Support case-insensitive file name lookups
  mm: shmem: Add IOCTL support for tmpfs
  docs: tmpfs: Add casefold options

 Documentation/filesystems/tmpfs.rst |  26 +++++
 fs/libfs.c                          |   8 +-
 include/linux/fs.h                  |   5 +
 include/linux/shmem_fs.h            |   5 +
 mm/shmem.c                          | 175 +++++++++++++++++++++++++++-
 5 files changed, 213 insertions(+), 6 deletions(-)

-- 
2.31.0




* [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()"
  2021-03-23 19:59 [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs André Almeida
@ 2021-03-23 19:59 ` André Almeida
  2021-03-23 20:15   ` Matthew Wilcox
  2021-03-23 19:59 ` [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups André Almeida
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: André Almeida @ 2021-03-23 19:59 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg, André Almeida

This reverts commit 794c43f716845e2d48ce195ed5c4179a4e05ce5f.

To implement casefolding support in tmpfs, we need to set dentry
operations at the superblock level, given that tmpfs has no support for
fscrypt and we don't need to set operations on a per-dentry basis.
Revert this commit so we can access those exported functions from tmpfs
code.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
 fs/libfs.c         | 8 +++++---
 include/linux/fs.h | 5 +++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index e2de5401abca..d1d06494463a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1387,8 +1387,8 @@ static bool needs_casefold(const struct inode *dir)
  *
  * Return: 0 if names match, 1 if mismatch, or -ERRNO
  */
-static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
-				const char *str, const struct qstr *name)
+int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
+			  const char *str, const struct qstr *name)
 {
 	const struct dentry *parent = READ_ONCE(dentry->d_parent);
 	const struct inode *dir = READ_ONCE(parent->d_inode);
@@ -1425,6 +1425,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 		return 1;
 	return !!memcmp(str, name->name, len);
 }
+EXPORT_SYMBOL(generic_ci_d_compare);
 
 /**
  * generic_ci_d_hash - generic d_hash implementation for casefolding filesystems
@@ -1433,7 +1434,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
  *
  * Return: 0 if hash was successful or unchanged, and -EINVAL on error
  */
-static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
+int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 {
 	const struct inode *dir = READ_ONCE(dentry->d_inode);
 	struct super_block *sb = dentry->d_sb;
@@ -1448,6 +1449,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 		return -EINVAL;
 	return 0;
 }
+EXPORT_SYMBOL(generic_ci_d_hash);
 
 static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec8f3ddf4a6a..29a0c6371f7d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3313,6 +3313,11 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
 extern int generic_check_addressable(unsigned, u64);
 
+#ifdef CONFIG_UNICODE
+extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str);
+extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
+				const char *str, const struct qstr *name);
+#endif
 extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
 
 #ifdef CONFIG_MIGRATION
-- 
2.31.0




* [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
  2021-03-23 19:59 [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs André Almeida
  2021-03-23 19:59 ` [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" André Almeida
@ 2021-03-23 19:59 ` André Almeida
  2021-03-23 20:18   ` Gabriel Krisman Bertazi
  2021-03-23 23:19   ` Al Viro
  2021-03-23 19:59 ` [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs André Almeida
  2021-03-23 19:59 ` [RFC PATCH 4/4] docs: tmpfs: Add casefold options André Almeida
  3 siblings, 2 replies; 16+ messages in thread
From: André Almeida @ 2021-03-23 19:59 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg, André Almeida

This patch implements the support for case-insensitive file name lookups
in tmpfs, based on the encoding passed in the mount options.

A filesystem that has the casefold feature set is able to configure
directories with the +F (TMPFS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e., to
match a directory entry even if the name used by userspace is not a
byte-per-byte match with the stored name, but is an equivalent
case-insensitive version of the Unicode string. This operation is called
a case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
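
The rules in this paragraph can be summarized with a toy model. The
struct dir_model and model_set_casefold() below are hypothetical and
exist only for illustration; the error codes follow the checks that the
ioctl handler in patch 3 performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the checks look at. */
struct dir_model {
	bool mount_casefold;	/* mounted with casefold=<encoding> */
	bool is_dir;		/* the inode is a directory */
	unsigned int nr_entries;/* number of entries in the directory */
	bool casefolded;	/* current state of the +F attribute */
};

/* Toggle +F, refusing the cases that could create name collisions. */
static int model_set_casefold(struct dir_model *d, bool enable)
{
	if (enable) {
		if (!d->mount_casefold)
			return -EOPNOTSUPP;
		if (!d->is_dir)
			return -ENOTDIR;
		if (d->nr_entries)
			return -ENOTEMPTY;
	} else if (d->casefolded && d->nr_entries) {
		/* clearing the flag also requires an empty directory */
		return -ENOTEMPTY;
	}

	d->casefolded = enable;
	return 0;
}
```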

* dcache handling:

For a +F directory, tmpfs only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache regardless of which equivalent string was
used in a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
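
The idea can be sketched in plain C as follows. This is not the kernel
code: ci_hash() and ci_compare() are made-up names, the hash is FNV-1a
rather than the dcache hash, and folding is ASCII-only via tolower()
instead of the utf8 casefold tables. Because of that, the length check in
ci_compare() is only valid for this ASCII model; Unicode-equivalent names
can differ in byte length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only stand-in for Unicode casefolding (assumption). */
static unsigned char fold(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* Hash the folded bytes so that all equivalent names share a bucket. */
static uint32_t ci_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= fold((unsigned char)name[i]);
		h *= 16777619u;		/* FNV-1a prime */
	}
	return h;
}

/* Same convention as d_compare(): 0 if names match, 1 if they differ. */
static int ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 1;
	for (i = 0; i < alen; i++)
		if (fold((unsigned char)a[i]) != fold((unsigned char)b[i]))
			return 1;
	return 0;
}
```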

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the VFS layer to fix. We can live without that for now, and so does
everyone else.

The lookup() path in tmpfs creates negative dentries that are later
instantiated if the file is created. In that way, all files in tmpfs
have a dentry, given that the filesystem exists exclusively in memory.
As explained above, we don't have negative dentries for casefolded files,
so dentries are created at lookup() only if the parent directory isn't
casefolded. Otherwise, the dentry is created just before being
instantiated in the create path. In the remove path, dentries are
invalidated for casefolded files.

* Dealing with invalid sequences:

By default, when an invalid UTF-8 sequence is identified, tmpfs will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that file only. This means that case-insensitive
file name lookup will not work for that particular file. An optional flag
(cf_strict) can be set in the mount arguments, telling the filesystem
code and userspace tools to enforce the encoding. When that optional
flag is set, any attempt to create a file name using an invalid UTF-8
sequence will fail and return an error to userspace.
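
A rough userspace model of the two policies is shown below. The names
utf8_well_formed() and check_new_name() are hypothetical. The validator
only checks UTF-8 well-formedness (no overlong forms, no surrogates,
nothing above U+10FFFF), whereas the kernel's utf8_validate() checks the
name against the selected Unicode version, so treat this as an
approximation. In the patch the check also applies only inside casefolded
directories, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Well-formedness check only; an approximation of utf8_validate(). */
static bool utf8_well_formed(const unsigned char *s, size_t len)
{
	size_t i = 0, k, need;
	unsigned int cp, min;

	while (i < len) {
		unsigned char c = s[i];

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or invalid lead */
		}

		if (len - i <= need)
			return false;	/* sequence is truncated */

		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		/* reject overlong forms, out-of-range values, surrogates */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;

		i += need + 1;
	}
	return true;
}

/*
 * Strict mode refuses invalid names at creation time; otherwise the
 * name is accepted and simply treated as an opaque byte sequence.
 */
static int check_new_name(const char *name, size_t len, bool strict)
{
	if (strict && !utf8_well_formed((const unsigned char *)name, len))
		return -EINVAL;
	return 0;
}
```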

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
 include/linux/shmem_fs.h |  1 +
 mm/shmem.c               | 91 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d82b6f396588..29ee64352807 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -43,6 +43,7 @@ struct shmem_sb_info {
 	spinlock_t shrinklist_lock;   /* Protects shrinklist */
 	struct list_head shrinklist;  /* List of shinkable inodes */
 	unsigned long shrinklist_len; /* Length of shrinklist */
+	bool casefold;              /* If this mount point supports casefolding */
 };
 
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
diff --git a/mm/shmem.c b/mm/shmem.c
index b2db4ed0fbc7..20df81763995 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,6 +38,7 @@
 #include <linux/hugetlb.h>
 #include <linux/frontswap.h>
 #include <linux/fs_parser.h>
+#include <linux/unicode.h>
 
 #include <asm/tlbflush.h> /* for arch/microblaze update_mmu_cache() */
 
@@ -117,6 +118,8 @@ struct shmem_options {
 	bool full_inums;
 	int huge;
 	int seen;
+	struct unicode_map *encoding;
+	bool cf_strict;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
@@ -161,6 +164,13 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 	return sb->s_fs_info;
 }
 
+#ifdef CONFIG_UNICODE
+static const struct dentry_operations casefold_dentry_ops = {
+	.d_hash = generic_ci_d_hash,
+	.d_compare = generic_ci_d_compare,
+};
+#endif
+
 /*
  * shmem_file_setup pre-accounts the whole fixed size of a VM object,
  * for shared memory and for shared anonymous (/dev/zero) mappings
@@ -2859,8 +2869,18 @@ shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	struct inode *inode;
 	int error = -ENOSPC;
 
+#ifdef CONFIG_UNICODE
+	struct super_block *sb = dir->i_sb;
+
+	if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
+	    sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name))
+		return -EINVAL;
+#endif
+
 	inode = shmem_get_inode(dir->i_sb, dir, mode, dev, VM_NORESERVE);
 	if (inode) {
+		inode->i_flags |= dir->i_flags;
+
 		error = simple_acl_create(dir, inode);
 		if (error)
 			goto out_iput;
@@ -2870,6 +2890,9 @@ shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		if (error && error != -EOPNOTSUPP)
 			goto out_iput;
 
+		if (IS_CASEFOLDED(dir))
+			d_add(dentry, NULL);
+
 		error = 0;
 		dir->i_size += BOGO_DIRENT_SIZE;
 		dir->i_ctime = dir->i_mtime = current_time(dir);
@@ -2925,6 +2948,19 @@ static int shmem_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return shmem_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
 }
 
+static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
+{
+	if (dentry->d_name.len > NAME_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	if (IS_CASEFOLDED(dir))
+		return NULL;
+
+	d_add(dentry, NULL);
+
+	return NULL;
+}
+
 /*
  * Link a file..
  */
@@ -2946,6 +2982,9 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 			goto out;
 	}
 
+	if (IS_CASEFOLDED(dir))
+		d_add(dentry, NULL);
+
 	dir->i_size += BOGO_DIRENT_SIZE;
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
 	inc_nlink(inode);
@@ -2967,6 +3006,10 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
 	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
+
+	if (IS_CASEFOLDED(dir))
+		d_invalidate(dentry);
+
 	return 0;
 }
 
@@ -3128,6 +3171,8 @@ static int shmem_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	}
 	dir->i_size += BOGO_DIRENT_SIZE;
 	dir->i_ctime = dir->i_mtime = current_time(dir);
+	if (IS_CASEFOLDED(dir))
+		d_add(dentry, NULL);
 	d_instantiate(dentry, inode);
 	dget(dentry);
 	return 0;
@@ -3364,6 +3409,8 @@ enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_casefold,
+	Opt_cf_strict,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3385,6 +3432,8 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_string("casefold",	Opt_casefold),
+	fsparam_flag  ("cf_strict",	Opt_cf_strict),
 	{}
 };
 
@@ -3392,9 +3441,11 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 {
 	struct shmem_options *ctx = fc->fs_private;
 	struct fs_parse_result result;
+	struct unicode_map *encoding;
 	unsigned long long size;
+	char version[10];
 	char *rest;
-	int opt;
+	int opt, ret;
 
 	opt = fs_parse(fc, shmem_fs_parameters, param, &result);
 	if (opt < 0)
@@ -3468,6 +3519,23 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_casefold:
+		if (strncmp(param->string, "utf8-", 5))
+			return invalfc(fc, "Only utf8 encondings are supported");
+		ret = strscpy(version, param->string + 5, sizeof(version));
+		if (ret < 0)
+			return invalfc(fc, "Invalid enconding argument: %s", param->string);
+
+		encoding = utf8_load(version);
+		if (IS_ERR(encoding))
+			return invalfc(fc, "Invalid utf8 version: %s", version);
+		pr_info("tmpfs: Using encoding defined by mount options: %s\n",
+			param->string);
+		ctx->encoding = encoding;
+		break;
+	case Opt_cf_strict:
+		ctx->cf_strict = true;
+		break;
 	}
 	return 0;
 
@@ -3646,6 +3714,11 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+#ifdef CONFIG_UNICODE
+	if (sbinfo->casefold)
+		utf8_unload(sb->s_encoding);
+#endif
+
 	free_percpu(sbinfo->ino_batch);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
@@ -3686,6 +3759,18 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 	sb->s_export_op = &shmem_export_ops;
 	sb->s_flags |= SB_NOSEC;
+
+#ifdef CONFIG_UNICODE
+	if (ctx->encoding) {
+		sb->s_d_op = &casefold_dentry_ops;
+		sb->s_encoding = ctx->encoding;
+		if (ctx->cf_strict)
+			sb->s_encoding_flags = SB_ENC_STRICT_MODE_FL;
+		sbinfo->casefold = true;
+	} else if (ctx->cf_strict) {
+		pr_warn("tmpfs: casefold strict mode enabled without encoding, ignoring\n");
+	}
+#endif /* CONFIG_UNICODE */
 #else
 	sb->s_flags |= SB_NOUSER;
 #endif
@@ -3846,7 +3931,7 @@ static const struct inode_operations shmem_inode_operations = {
 static const struct inode_operations shmem_dir_inode_operations = {
 #ifdef CONFIG_TMPFS
 	.create		= shmem_create,
-	.lookup		= simple_lookup,
+	.lookup		= shmem_lookup,
 	.link		= shmem_link,
 	.unlink		= shmem_unlink,
 	.symlink	= shmem_symlink,
@@ -3912,6 +3997,8 @@ int shmem_init_fs_context(struct fs_context *fc)
 	ctx->mode = 0777 | S_ISVTX;
 	ctx->uid = current_fsuid();
 	ctx->gid = current_fsgid();
+	ctx->encoding = NULL;
+	ctx->cf_strict = false;
 
 	fc->fs_private = ctx;
 	fc->ops = &shmem_fs_context_ops;
-- 
2.31.0




* [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs
  2021-03-23 19:59 [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs André Almeida
  2021-03-23 19:59 ` [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" André Almeida
  2021-03-23 19:59 ` [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups André Almeida
@ 2021-03-23 19:59 ` André Almeida
  2021-03-23 22:15   ` Gabriel Krisman Bertazi
  2021-03-23 19:59 ` [RFC PATCH 4/4] docs: tmpfs: Add casefold options André Almeida
  3 siblings, 1 reply; 16+ messages in thread
From: André Almeida @ 2021-03-23 19:59 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg, André Almeida

Implement ioctl operations for files to set/get file flags. Implement
the only flag supported for now, S_CASEFOLD.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
 include/linux/shmem_fs.h |  4 ++
 mm/shmem.c               | 84 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 29ee64352807..2c89c5a66508 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -140,4 +140,8 @@ extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
 				 dst_addr)      ({ BUG(); 0; })
 #endif
 
+#define TMPFS_CASEFOLD_FL	0x40000000 /* Casefolded file */
+#define TMPFS_USER_FLS		TMPFS_CASEFOLD_FL /* Userspace supported flags */
+#define TMPFS_FLS		S_CASEFOLD /* Kernel supported flags */
+
 #endif
diff --git a/mm/shmem.c b/mm/shmem.c
index 20df81763995..2f2c996d215b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -258,6 +258,7 @@ static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages)
 static const struct super_operations shmem_ops;
 const struct address_space_operations shmem_aops;
 static const struct file_operations shmem_file_operations;
+static const struct file_operations shmem_dir_operations;
 static const struct inode_operations shmem_inode_operations;
 static const struct inode_operations shmem_dir_inode_operations;
 static const struct inode_operations shmem_special_inode_operations;
@@ -2347,7 +2348,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			/* Some things misbehave if size == 0 on a directory */
 			inode->i_size = 2 * BOGO_DIRENT_SIZE;
 			inode->i_op = &shmem_dir_inode_operations;
-			inode->i_fop = &simple_dir_operations;
+			inode->i_fop = &shmem_dir_operations;
 			break;
 		case S_IFLNK:
 			/*
@@ -2838,6 +2839,76 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	return error;
 }
 
+static long shmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	int ret;
+	u32 fsflags = 0, old, new = 0;
+	struct inode *inode = file_inode(file);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+
+	switch (cmd) {
+	case FS_IOC_GETFLAGS:
+		if ((inode->i_flags & S_CASEFOLD) && S_ISDIR(inode->i_mode))
+			fsflags |= TMPFS_CASEFOLD_FL;
+
+		if (put_user(fsflags, (int __user *)arg))
+			return -EFAULT;
+
+		return 0;
+
+	case FS_IOC_SETFLAGS:
+		if (get_user(fsflags, (int __user *)arg))
+			return -EFAULT;
+
+		old = inode->i_flags;
+
+		if (fsflags & ~TMPFS_USER_FLS)
+			return -EINVAL;
+
+		if (fsflags & TMPFS_CASEFOLD_FL) {
+			if (!sbinfo->casefold) {
+				pr_err("tmpfs: casefold disabled at this mount point\n");
+				return -EOPNOTSUPP;
+			}
+
+			if (!S_ISDIR(inode->i_mode))
+				return -ENOTDIR;
+
+			if (!simple_empty(file_dentry(file)))
+				return -ENOTEMPTY;
+
+			new |= S_CASEFOLD;
+		} else if (old & S_CASEFOLD) {
+			if (!simple_empty(file_dentry(file)))
+				return -ENOTEMPTY;
+		}
+
+		ret = mnt_want_write_file(file);
+		if (ret)
+			return ret;
+
+		inode_lock(inode);
+
+		ret = vfs_ioc_setflags_prepare(inode, old, new);
+		if (ret) {
+			inode_unlock(inode);
+			mnt_drop_write_file(file);
+			return ret;
+		}
+
+		inode_set_flags(inode, new, TMPFS_FLS);
+
+		inode_unlock(inode);
+		mnt_drop_write_file(file);
+		return 0;
+
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
 static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
@@ -3916,6 +3987,7 @@ static const struct file_operations shmem_file_operations = {
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= shmem_fallocate,
+	.unlocked_ioctl = shmem_ioctl,
 #endif
 };
 
@@ -3928,6 +4000,16 @@ static const struct inode_operations shmem_inode_operations = {
 #endif
 };
 
+static const struct file_operations shmem_dir_operations = {
+	.open		= dcache_dir_open,
+	.release	= dcache_dir_close,
+	.llseek		= dcache_dir_lseek,
+	.read		= generic_read_dir,
+	.iterate_shared	= dcache_readdir,
+	.fsync		= noop_fsync,
+	.unlocked_ioctl = shmem_ioctl,
+};
+
 static const struct inode_operations shmem_dir_inode_operations = {
 #ifdef CONFIG_TMPFS
 	.create		= shmem_create,
-- 
2.31.0




* [RFC PATCH 4/4] docs: tmpfs: Add casefold options
  2021-03-23 19:59 [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs André Almeida
                   ` (2 preceding siblings ...)
  2021-03-23 19:59 ` [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs André Almeida
@ 2021-03-23 19:59 ` André Almeida
  2021-03-23 21:58   ` Randy Dunlap
  2021-03-23 22:19   ` Gabriel Krisman Bertazi
  3 siblings, 2 replies; 16+ messages in thread
From: André Almeida @ 2021-03-23 19:59 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg, André Almeida

Document the mount options that enable casefold support in tmpfs.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
 Documentation/filesystems/tmpfs.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 0408c245785e..84c87c309bd7 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
 will give you tmpfs instance on /mytmpfs which can allocate 10GB
 RAM/SWAP in 10240 inodes and it is only accessible by root.
 
+tmpfs has the following mounting options for case-insesitive lookups support:
+
+=========   ==============================================================
+casefold    Enable casefold support at this mount point using the given
+            argument as enconding. Currently only utf8 encondings are supported.
+cf_strict   Enable strict casefolding at this mouting point (disabled by
+            default). This means that invalid strings should be reject by the
+            file system.
+=========   ==============================================================
+
+Note that this option doesn't enable casefold by default, one needs to set
+casefold flag per directory, setting the +F attribute in an empty directory. New
+directories within a casefolded one will inherit the flag.
+
+Example::
+
+    $ mount -t tmpfs -o casefold=utf8-12.1.0,cf_strict tmpfs /mytmpfs
+    $ cd /mytmpfs
+    $ touch a; touch A
+    $ ls
+    A  a
+    $ mkdir dir
+    $ chattr +F dir
+    $ touch dir/a; touch dir/A
+    $ ls dir
+    a
 
 :Author:
    Christoph Rohland <cr@sap.com>, 1.12.01
-- 
2.31.0




* Re: [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()"
  2021-03-23 19:59 ` [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" André Almeida
@ 2021-03-23 20:15   ` Matthew Wilcox
  2021-03-24 20:09     ` André Almeida
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2021-03-23 20:15 UTC (permalink / raw)
  To: André Almeida
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, krisman, smcv,
	kernel, linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

On Tue, Mar 23, 2021 at 04:59:38PM -0300, André Almeida wrote:
> This reverts commit 794c43f716845e2d48ce195ed5c4179a4e05ce5f.
> 
> To implement casefolding support in tmpfs, we need to set dentry
> operations at the superblock level, given that tmpfs has no support for
> fscrypt and we don't need to set operations on a per-dentry basis.
> Revert this commit so we can access those exported functions from tmpfs
> code.

But tmpfs / shmem are Kconfig bools, not tristate.  They can't be built
as modules, so there's no need to export the symbols.

> +#ifdef CONFIG_UNICODE
> +extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str);
> +extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
> +				const char *str, const struct qstr *name);
> +#endif

There's no need for the ifdef (it only causes unnecessary rebuilds) and
the 'extern' keyword is also unwelcome.



* Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
  2021-03-23 19:59 ` [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups André Almeida
@ 2021-03-23 20:18   ` Gabriel Krisman Bertazi
  2021-03-24 20:17     ` André Almeida
  2021-03-23 23:19   ` Al Viro
  1 sibling, 1 reply; 16+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-23 20:18 UTC (permalink / raw)
  To: André Almeida
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, smcv, kernel,
	linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

André Almeida <andrealmeid@collabora.com> writes:

> This patch implements the support for case-insensitive file name lookups
> in tmpfs, based on the encoding passed in the mount options.

Thanks for doing this.

>  
> +#ifdef CONFIG_UNICODE
> +static const struct dentry_operations casefold_dentry_ops = {
> +	.d_hash = generic_ci_d_hash,
> +	.d_compare = generic_ci_d_compare,
> +};
> +#endif

Why not reuse struct generic_ci_dentry_ops ?

> +
>  /*
>   * shmem_file_setup pre-accounts the whole fixed size of a VM object,
>   * for shared memory and for shared anonymous (/dev/zero) mappings
> @@ -2859,8 +2869,18 @@ shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
>  	struct inode *inode;
>  	int error = -ENOSPC;
>  
> +#ifdef CONFIG_UNICODE
> +	struct super_block *sb = dir->i_sb;
> +
> +	if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
> +	    sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name))
> +		return -EINVAL;
> +#endif
> +
>  	inode = shmem_get_inode(dir->i_sb, dir, mode, dev, VM_NORESERVE);
>  	if (inode) {
> +		inode->i_flags |= dir->i_flags;
> +
>  		error = simple_acl_create(dir, inode);
>  		if (error)
>  			goto out_iput;
> @@ -2870,6 +2890,9 @@ shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
>  		if (error && error != -EOPNOTSUPP)
>  			goto out_iput;
>  
> +		if (IS_CASEFOLDED(dir))
> +			d_add(dentry, NULL);
> +
>  		error = 0;
>  		dir->i_size += BOGO_DIRENT_SIZE;
>  		dir->i_ctime = dir->i_mtime = current_time(dir);
> @@ -2925,6 +2948,19 @@ static int shmem_create(struct user_namespace *mnt_userns, struct inode *dir,
>  	return shmem_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
>  }
>  
> +static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
> +{
> +	if (dentry->d_name.len > NAME_MAX)
> +		return ERR_PTR(-ENAMETOOLONG);
> +
> +	if (IS_CASEFOLDED(dir))
> +		return NULL;

I think this deserves a comment explaining why it is necessary.

> +
> +	d_add(dentry, NULL);
> +
> +	return NULL;
> +}
> +
>  /*
>   * Link a file..
>   */
> @@ -2946,6 +2982,9 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
>  			goto out;
>  	}
>  
> +	if (IS_CASEFOLDED(dir))
> +		d_add(dentry, NULL);
> +
>  	dir->i_size += BOGO_DIRENT_SIZE;
>  	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>  	inc_nlink(inode);
> @@ -2967,6 +3006,10 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
>  	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>  	drop_nlink(inode);
>  	dput(dentry);	/* Undo the count from "create" - this does all the work */
> +
> +	if (IS_CASEFOLDED(dir))
> +		d_invalidate(dentry);
> +
>  	return 0;
>  }
>  
> @@ -3128,6 +3171,8 @@ static int shmem_symlink(struct user_namespace *mnt_userns, struct inode *dir,
>  	}
>  	dir->i_size += BOGO_DIRENT_SIZE;
>  	dir->i_ctime = dir->i_mtime = current_time(dir);
> +	if (IS_CASEFOLDED(dir))
> +		d_add(dentry, NULL);
>  	d_instantiate(dentry, inode);
>  	dget(dentry);
>  	return 0;
> @@ -3364,6 +3409,8 @@ enum shmem_param {
>  	Opt_uid,
>  	Opt_inode32,
>  	Opt_inode64,
> +	Opt_casefold,
> +	Opt_cf_strict,
>  };
>  
>  static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3385,6 +3432,8 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>  	fsparam_u32   ("uid",		Opt_uid),
>  	fsparam_flag  ("inode32",	Opt_inode32),
>  	fsparam_flag  ("inode64",	Opt_inode64),
> +	fsparam_string("casefold",	Opt_casefold),
> +	fsparam_flag  ("cf_strict",	Opt_cf_strict),
>  	{}
>  };
>  
> @@ -3392,9 +3441,11 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  {
>  	struct shmem_options *ctx = fc->fs_private;
>  	struct fs_parse_result result;
> +	struct unicode_map *encoding;
>  	unsigned long long size;
> +	char version[10];
>  	char *rest;
> -	int opt;
> +	int opt, ret;
>  
>  	opt = fs_parse(fc, shmem_fs_parameters, param, &result);
>  	if (opt < 0)
> @@ -3468,6 +3519,23 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		ctx->full_inums = true;
>  		ctx->seen |= SHMEM_SEEN_INUMS;
>  		break;
> +	case Opt_casefold:
> +		if (strncmp(param->string, "utf8-", 5))
> +			return invalfc(fc, "Only utf8 encondings are supported");
> +		ret = strscpy(version, param->string + 5, sizeof(version));

Ugh.  Now we are doing two strscpy calls for the parse API (the other
one is in unicode_load).  Can we change the unicode_load API to reuse it?

thanks,

-- 
Gabriel Krisman Bertazi



* Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
  2021-03-23 19:59 ` [RFC PATCH 4/4] docs: tmpfs: Add casefold options André Almeida
@ 2021-03-23 21:58   ` Randy Dunlap
  2021-03-25 14:27     ` André Almeida
  2021-03-23 22:19   ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 16+ messages in thread
From: Randy Dunlap @ 2021-03-23 21:58 UTC (permalink / raw)
  To: André Almeida, Hugh Dickins, Andrew Morton, Alexander Viro
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, linux-kernel,
	Daniel Rosenberg

Hi--

On 3/23/21 12:59 PM, André Almeida wrote:
> Document mounting options to enable casefold support in tmpfs.
> 
> Signed-off-by: André Almeida <andrealmeid@collabora.com>
> ---
>  Documentation/filesystems/tmpfs.rst | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..84c87c309bd7 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
>  will give you tmpfs instance on /mytmpfs which can allocate 10GB
>  RAM/SWAP in 10240 inodes and it is only accessible by root.
>  
> +tmpfs has the following mounting options for case-insesitive lookups support:
> +
> +=========   ==============================================================
> +casefold    Enable casefold support at this mount point using the given
> +            argument as enconding. Currently only utf8 encondings are supported.

                           encoding.                      encodings

> +cf_strict   Enable strict casefolding at this mouting point (disabled by

                                                 mount

> +            default). This means that invalid strings should be reject by the

                                                                   rejected

> +            file system.
> +=========   ==============================================================
> +
> +Note that this option doesn't enable casefold by default, one needs to set

                                                    default; one needs to set the

> +casefold flag per directory, setting the +F attribute in an empty directory. New
> +directories within a casefolded one will inherit the flag.


-- 
~Randy




* Re: [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs
  2021-03-23 19:59 ` [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs André Almeida
@ 2021-03-23 22:15   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 16+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-23 22:15 UTC (permalink / raw)
  To: André Almeida
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, smcv, kernel,
	linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

André Almeida <andrealmeid@collabora.com> writes:

> Implement IOCTL operations for files to set/get file flags. Implement
> the only supported flag by now, that is S_CASEFOLD.
>
> Signed-off-by: André Almeida <andrealmeid@collabora.com>
> ---
>  include/linux/shmem_fs.h |  4 ++
>  mm/shmem.c               | 84 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 87 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 29ee64352807..2c89c5a66508 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -140,4 +140,8 @@ extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
>  				 dst_addr)      ({ BUG(); 0; })
>  #endif
>  
> +#define TMPFS_CASEFOLD_FL	0x40000000 /* Casefolded file */
> +#define TMPFS_USER_FLS		TMPFS_CASEFOLD_FL /* Userspace supported flags */
> +#define TMPFS_FLS		S_CASEFOLD /* Kernel supported flags */

Minor nit: FLS?  _FLAGS is short enough :).

> +
>  #endif
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 20df81763995..2f2c996d215b 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -258,6 +258,7 @@ static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages)
>  static const struct super_operations shmem_ops;
>  const struct address_space_operations shmem_aops;
>  static const struct file_operations shmem_file_operations;
> +static const struct file_operations shmem_dir_operations;
>  static const struct inode_operations shmem_inode_operations;
>  static const struct inode_operations shmem_dir_inode_operations;
>  static const struct inode_operations shmem_special_inode_operations;
> @@ -2347,7 +2348,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
>  			/* Some things misbehave if size == 0 on a directory */
>  			inode->i_size = 2 * BOGO_DIRENT_SIZE;
>  			inode->i_op = &shmem_dir_inode_operations;
> -			inode->i_fop = &simple_dir_operations;
> +			inode->i_fop = &shmem_dir_operations;
>  			break;
>  		case S_IFLNK:
>  			/*
> @@ -2838,6 +2839,76 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  	return error;
>  }
>  
> +static long shmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +	int ret;
> +	u32 fsflags = 0, old, new = 0;
> +	struct inode *inode = file_inode(file);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> +
> +	switch (cmd) {
> +	case FS_IOC_GETFLAGS:
> +		if ((inode->i_flags & S_CASEFOLD) && S_ISDIR(inode->i_mode))
> +			fsflags |= TMPFS_CASEFOLD_FL;
> +
> +		if (put_user(fsflags, (int __user *)arg))
> +			return -EFAULT;
> +
> +		return 0;
> +
> +	case FS_IOC_SETFLAGS:
> +		if (get_user(fsflags, (int __user *)arg))
> +			return -EFAULT;
> +
> +		old = inode->i_flags;
> +
> +		if (fsflags & ~TMPFS_USER_FLS)
> +			return -EINVAL;
> +
> +		if (fsflags & TMPFS_CASEFOLD_FL) {
> +			if (!sbinfo->casefold) {
> +				pr_err("tmpfs: casefold disabled at this mount point\n");

Minor nit: no point in logging an error here.  The user has simply not
enabled casefolding.  The error returned below should be enough.

> +				return -EOPNOTSUPP;
> +			}
> +
> +			if (!S_ISDIR(inode->i_mode))
> +				return -ENOTDIR;
> +
> +			if (!simple_empty(file_dentry(file)))
> +				return -ENOTEMPTY;
> +
> +			new |= S_CASEFOLD;
> +		} else if (old & S_CASEFOLD) {
> +			if (!simple_empty(file_dentry(file)))
> +				return -ENOTEMPTY;
> +		}
> +
> +		ret = mnt_want_write_file(file);
> +		if (ret)
> +			return ret;
> +
> +		inode_lock(inode);
> +
> +		ret = vfs_ioc_setflags_prepare(inode, old, new);
> +		if (ret) {
> +			inode_unlock(inode);
> +			mnt_drop_write_file(file);
> +			return ret;
> +		}
> +
> +		inode_set_flags(inode, new, TMPFS_FLS);
> +
> +		inode_unlock(inode);
> +		mnt_drop_write_file(file);
> +		return 0;
> +
> +	default:
> +		return -ENOTTY;
> +	}
> +
> +	return 0;
> +}
> +
>  static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
> @@ -3916,6 +3987,7 @@ static const struct file_operations shmem_file_operations = {
>  	.splice_read	= generic_file_splice_read,
>  	.splice_write	= iter_file_splice_write,
>  	.fallocate	= shmem_fallocate,
> +	.unlocked_ioctl = shmem_ioctl,
>  #endif
>  };
>  
> @@ -3928,6 +4000,16 @@ static const struct inode_operations shmem_inode_operations = {
>  #endif
>  };
>  
> +static const struct file_operations shmem_dir_operations = {
> +	.open		= dcache_dir_open,
> +	.release	= dcache_dir_close,
> +	.llseek		= dcache_dir_lseek,
> +	.read		= generic_read_dir,
> +	.iterate_shared	= dcache_readdir,
> +	.fsync		= noop_fsync,
> +	.unlocked_ioctl = shmem_ioctl,
> +};
> +
>  static const struct inode_operations shmem_dir_inode_operations = {
>  #ifdef CONFIG_TMPFS
>  	.create		= shmem_create,

-- 
Gabriel Krisman Bertazi



* Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
  2021-03-23 19:59 ` [RFC PATCH 4/4] docs: tmpfs: Add casefold options André Almeida
  2021-03-23 21:58   ` Randy Dunlap
@ 2021-03-23 22:19   ` Gabriel Krisman Bertazi
  2021-03-24 20:47     ` André Almeida
  1 sibling, 1 reply; 16+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-03-23 22:19 UTC (permalink / raw)
  To: André Almeida
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, smcv, kernel,
	linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

André Almeida <andrealmeid@collabora.com> writes:

> Document mounting options to enable casefold support in tmpfs.
>
> Signed-off-by: André Almeida <andrealmeid@collabora.com>
> ---
>  Documentation/filesystems/tmpfs.rst | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..84c87c309bd7 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
>  will give you tmpfs instance on /mytmpfs which can allocate 10GB
>  RAM/SWAP in 10240 inodes and it is only accessible by root.
>  
> +tmpfs has the following mounting options for case-insesitive lookups support:
> +
> +=========   ==============================================================
> +casefold    Enable casefold support at this mount point using the given
> +            argument as enconding. Currently only utf8 encondings are supported.
> +cf_strict   Enable strict casefolding at this mouting point (disabled by
> +            default). This means that invalid strings should be reject by the
> +            file system.

strict mode refers to the encoding, not exactly casefold.  Maybe we
could have a parameter encoding_flags that accepts the flag 'strict'.
This would make it closer to the ext4 interface.  Alternatively, call
this option strict_encoding.

> +=========   ==============================================================
> +
> +Note that this option doesn't enable casefold by default, one needs to set
> +casefold flag per directory, setting the +F attribute in an empty directory. New
> +directories within a casefolded one will inherit the flag.
> +
> +Example::
> +
> +    $ mount -t tmpfs -o casefold=utf8-12.1.0,cf_strict tmpfs /mytmpfs
> +    $ cd /mytmpfs
> +    $ touch a; touch A
> +    $ ls
> +    A  a
> +    $ mkdir dir
> +    $ chattr +F dir
> +    $ touch dir/a; touch dir/A
> +    $ ls dir
> +    a
>  
>  :Author:
>     Christoph Rohland <cr@sap.com>, 1.12.01
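
For reference, "chattr +F" in the example above boils down to a
FS_IOC_GETFLAGS / FS_IOC_SETFLAGS pair on the directory. A minimal
userspace sketch follows, assuming the tmpfs flag keeps the same value as
FS_CASEFOLD_FL (0x40000000, as TMPFS_CASEFOLD_FL does in patch 3/4). The
helper name set_casefold_flag() is hypothetical and not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Older UAPI headers may lack the flag; the value matches TMPFS_CASEFOLD_FL. */
#ifndef FS_CASEFOLD_FL
#define FS_CASEFOLD_FL 0x40000000
#endif

/*
 * Hypothetical helper: set the casefold flag on an already opened
 * directory. Returns 0 on success, -1 with errno set on failure.
 */
static int set_casefold_flag(int dirfd)
{
	int flags;

	/* Read the current flags so that existing bits are preserved. */
	if (ioctl(dirfd, FS_IOC_GETFLAGS, &flags) < 0)
		return -1;

	flags |= FS_CASEFOLD_FL;

	/* Write them back; the filesystem decides whether this is allowed. */
	return ioctl(dirfd, FS_IOC_SETFLAGS, &flags) < 0 ? -1 : 0;
}
```

With the RFC as posted, the second ioctl would be expected to fail with
EOPNOTSUPP if the mount lacks the casefold option, ENOTDIR for a
non-directory and ENOTEMPTY for a directory that already has entries.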

-- 
Gabriel Krisman Bertazi



* Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
  2021-03-23 19:59 ` [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups André Almeida
  2021-03-23 20:18   ` Gabriel Krisman Bertazi
@ 2021-03-23 23:19   ` Al Viro
  2021-03-24 20:44     ` André Almeida
  1 sibling, 1 reply; 16+ messages in thread
From: Al Viro @ 2021-03-23 23:19 UTC (permalink / raw)
  To: André Almeida
  Cc: Hugh Dickins, Andrew Morton, krisman, smcv, kernel, linux-mm,
	linux-fsdevel, linux-kernel, Daniel Rosenberg

On Tue, Mar 23, 2021 at 04:59:39PM -0300, André Almeida wrote:

> * dcache handling:
> 
> For a +F directory, tmpfs only stores the first equivalent name dentry
> used in the dcache. This is done to prevent unintentional duplication of
> dentries in the dcache, while also allowing the VFS code to quickly find
> the right entry in the cache despite which equivalent string was used in
> a previous lookup, without having to resort to ->lookup().
> 
> d_hash() of casefolded directories is implemented as the hash of the
> casefolded string, such that we always have a well-known bucket for all
> the equivalencies of the same string. d_compare() uses the
> utf8_strncasecmp() infrastructure, which handles the comparison of
> equivalent, same case, names as well.
> 
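
The hash/compare pairing described in the quoted paragraph can be
illustrated with a toy userspace sketch. It is ASCII-only and uses a
djb2-style hash purely as a stand-in; the real code folds UTF-8 names and
compares them with utf8_strncasecmp(). The names ci_hash() and ci_equal()
are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hash the folded form so that all equivalent names share one bucket. */
static unsigned long ci_hash(const char *name)
{
	unsigned long hash = 5381;

	for (; *name; name++)
		hash = hash * 33 + (unsigned char)tolower((unsigned char)*name);

	return hash;
}

/* Case-insensitive equality; returns 1 if the names are equivalent. */
static int ci_equal(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;

	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}
```

The point of hashing the folded string is that a lookup for any spelling
lands in the same bucket, where the comparison then decides equivalence.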
> For now, negative lookups are not inserted in the dcache, since they
> would need to be invalidated anyway, because we can't trust missing file
> dentries. This is bad for performance but requires some leveraging of
> the VFS layer to fix. We can live without that for now, and so does
> everyone else.

"For now"?  Not a single practical suggestion has ever materialized.
Pardon me, but by now I'm very sceptical about the odds of that
ever changing.  And no, I don't have any suggestions either.

> The lookup() path at tmpfs creates negatives dentries, that are later
> instantiated if the file is created. In that way, all files in tmpfs
> have a dentry given that the filesystem exists exclusively in memory.
> As explained above, we don't have negative dentries for casefold files,
> so dentries are created at lookup() iff files aren't casefolded. Else,
> the dentry is created just before being instantiated at create path.
> At the remove path, dentries are invalidated for casefolded files.

Umm...  What happens to those assertions if previously sane directory
gets case-buggered?  You've got an ioctl for doing just that...
Incidentally, that ioctl is obviously racy - result of that simple_empty() 
might have nothing to do with reality before it is returned to caller.
And while we are at it, simple_empty() doesn't check a damn thing about
negative dentries in there...



* Re: [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()"
  2021-03-23 20:15   ` Matthew Wilcox
@ 2021-03-24 20:09     ` André Almeida
  0 siblings, 0 replies; 16+ messages in thread
From: André Almeida @ 2021-03-24 20:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, krisman, smcv,
	kernel, linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

Hi Matthew,

On 23/03/21 at 17:15, Matthew Wilcox wrote:
> On Tue, Mar 23, 2021 at 04:59:38PM -0300, André Almeida wrote:
>> This reverts commit 794c43f716845e2d48ce195ed5c4179a4e05ce5f.
>>
>> For implementing casefolding support at tmpfs, it needs to set dentry
>> operations at superblock level, given that tmpfs has no support for
>> fscrypt and we don't need to set operations on a per-dentry basis.
>> Revert this commit so we can access those exported function from tmpfs
>> code.
> 
> But tmpfs / shmem are Kconfig bools, not tristate.  They can't be built
> as modules, so there's no need to export the symbols.
> 
>> +#ifdef CONFIG_UNICODE
>> +extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str);
>> +extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
>> +				const char *str, const struct qstr *name);
>> +#endif
> 
> There's no need for the ifdef (it only causes unnecessary rebuilds) and
> the 'extern' keyword is also unwelcome.
> 

Thank you. Instead of reverting the commit, I'll write a new commit that
does this properly.



* Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
  2021-03-23 20:18   ` Gabriel Krisman Bertazi
@ 2021-03-24 20:17     ` André Almeida
  0 siblings, 0 replies; 16+ messages in thread
From: André Almeida @ 2021-03-24 20:17 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, smcv, kernel,
	linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

On 23/03/21 at 17:18, Gabriel Krisman Bertazi wrote:
> André Almeida <andrealmeid@collabora.com> writes:
>>   	opt = fs_parse(fc, shmem_fs_parameters, param, &result);
>>   	if (opt < 0)
>> @@ -3468,6 +3519,23 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>>   		ctx->full_inums = true;
>>   		ctx->seen |= SHMEM_SEEN_INUMS;
>>   		break;
>> +	case Opt_casefold:
>> +		if (strncmp(param->string, "utf8-", 5))
>> +			return invalfc(fc, "Only utf8 encondings are supported");
>> +		ret = strscpy(version, param->string + 5, sizeof(version));
> 
> Ugh.  Now we are doing two strscpy for the parse api (in unicode_load).
> Can change the unicode_load api to reuse it?
> 

So instead of getting just the version number (e.g. "12.1.0") as a
parameter, utf8_load/unicode_load would get the full encoding string
(e.g. "utf8-12.1.0"), right?
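
To make the question concrete, here is a userspace sketch of a single
parsing step that takes the full "utf8-<version>" string and extracts the
version with one bounded copy. The helper name parse_encoding() and its
return convention are hypothetical; this is not the unicode_load() API.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: accept the full encoding string and copy the
 * version part into @version. Returns 0 on success, -1 if the prefix is
 * not "utf8-" or the version is empty or does not fit in @size bytes.
 */
static int parse_encoding(const char *arg, char *version, size_t size)
{
	static const char prefix[] = "utf8-";
	const size_t plen = sizeof(prefix) - 1;
	size_t vlen;

	if (strncmp(arg, prefix, plen) != 0)
		return -1;

	vlen = strlen(arg + plen);
	if (vlen == 0 || vlen >= size)
		return -1;

	/* vlen + 1 also copies the terminating NUL. */
	memcpy(version, arg + plen, vlen + 1);
	return 0;
}
```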



* Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
  2021-03-23 23:19   ` Al Viro
@ 2021-03-24 20:44     ` André Almeida
  0 siblings, 0 replies; 16+ messages in thread
From: André Almeida @ 2021-03-24 20:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Andrew Morton, krisman, smcv, kernel, linux-mm,
	linux-fsdevel, linux-kernel, Daniel Rosenberg

Hi Al Viro,

On 23/03/21 at 20:19, Al Viro wrote:
> On Tue, Mar 23, 2021 at 04:59:39PM -0300, André Almeida wrote:
> 
>> * dcache handling:
>>
>> For now, negative lookups are not inserted in the dcache, since they
>> would need to be invalidated anyway, because we can't trust missing file
>> dentries. This is bad for performance but requires some leveraging of
>> the VFS layer to fix. We can live without that for now, and so does
>> everyone else.
> 
> "For now"?  Not a single practical suggestion has ever materialized.
> Pardon me, but by now I'm very sceptical about the odds of that
> ever changing.  And no, I don't have any suggestions either.

Right, I'll reword this to reflect that there's no expectation that this
will be done, while keeping this performance issue documented.

> 
>> The lookup() path at tmpfs creates negatives dentries, that are later
>> instantiated if the file is created. In that way, all files in tmpfs
>> have a dentry given that the filesystem exists exclusively in memory.
>> As explained above, we don't have negative dentries for casefold files,
>> so dentries are created at lookup() iff files aren't casefolded. Else,
>> the dentry is created just before being instantiated at create path.
>> At the remove path, dentries are invalidated for casefolded files.
> 
> Umm...  What happens to those assertions if previously sane directory
> gets case-buggered?  You've got an ioctl for doing just that...
> Incidentally, that ioctl is obviously racy - result of that simple_empty()
> might have nothing to do with reality before it is returned to caller.
> And while we are at it, simple_empty() doesn't check a damn thing about
> negative dentries in there...
> 

Thanks for pointing out those issues. I'll move my lock in the IOCTL to
make it impossible to change directory attributes and add a file there at
the same time. As for the negative dentries that already existed in that
directory, I believe the way to solve this is by invalidating them all.
How does that sound to you?
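
As a rough userspace analogy of the locking idea (not kernel code, and
not what the next version will necessarily look like), the emptiness
check and the flag update happen under the same lock that entry creation
takes, so the check cannot go stale before the flag is set. The toy_dir
structure and both helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define DIR_CASEFOLD 0x1u

/* Toy stand-in for a directory: one lock guards entries and flags. */
struct toy_dir {
	pthread_mutex_t lock;
	unsigned int nr_entries;
	unsigned int flags;
};

/* Adding an entry takes the same lock as the flag change below. */
static int toy_add_entry(struct toy_dir *dir)
{
	pthread_mutex_lock(&dir->lock);
	dir->nr_entries++;
	pthread_mutex_unlock(&dir->lock);
	return 0;
}

/* Check emptiness and set the flag in one critical section. */
static int toy_set_casefold(struct toy_dir *dir)
{
	int ret = 0;

	pthread_mutex_lock(&dir->lock);
	if (dir->nr_entries != 0)
		ret = -ENOTEMPTY;
	else
		dir->flags |= DIR_CASEFOLD;
	pthread_mutex_unlock(&dir->lock);

	return ret;
}
```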

Thanks,
	André



* Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
  2021-03-23 22:19   ` Gabriel Krisman Bertazi
@ 2021-03-24 20:47     ` André Almeida
  0 siblings, 0 replies; 16+ messages in thread
From: André Almeida @ 2021-03-24 20:47 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Hugh Dickins, Andrew Morton, Alexander Viro, smcv, kernel,
	linux-mm, linux-fsdevel, linux-kernel, Daniel Rosenberg

Hi Gabriel,

On 23/03/21 at 19:19, Gabriel Krisman Bertazi wrote:
> André Almeida <andrealmeid@collabora.com> writes:
> 
>> Document mounting options to enable casefold support in tmpfs.
>>
>> Signed-off-by: André Almeida <andrealmeid@collabora.com>
>> ---
>>   Documentation/filesystems/tmpfs.rst | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
>> index 0408c245785e..84c87c309bd7 100644
>> --- a/Documentation/filesystems/tmpfs.rst
>> +++ b/Documentation/filesystems/tmpfs.rst
>> @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
>>   will give you tmpfs instance on /mytmpfs which can allocate 10GB
>>   RAM/SWAP in 10240 inodes and it is only accessible by root.
>>   
>> +tmpfs has the following mounting options for case-insesitive lookups support:
>> +
>> +=========   ==============================================================
>> +casefold    Enable casefold support at this mount point using the given
>> +            argument as enconding. Currently only utf8 encondings are supported.
>> +cf_strict   Enable strict casefolding at this mouting point (disabled by
>> +            default). This means that invalid strings should be reject by the
>> +            file system.
> 
> strict mode refers to the encoding, not exactly casefold.  Maybe we
> could have a parameter encoding_flags that accepts the flag 'strict'.
> This would make it closer to the ext4 interface.

What are the other encoding flags? Or is this more about having a
properly extensible interface?

> Alternatively, call this option strict_encoding.
> 

Thanks,
	André



* Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
  2021-03-23 21:58   ` Randy Dunlap
@ 2021-03-25 14:27     ` André Almeida
  0 siblings, 0 replies; 16+ messages in thread
From: André Almeida @ 2021-03-25 14:27 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: krisman, smcv, kernel, linux-mm, linux-fsdevel, Andrew Morton,
	Hugh Dickins, linux-kernel, Daniel Rosenberg, Alexander Viro

On 23/03/21 at 18:58, Randy Dunlap wrote:
> Hi--
> 
> On 3/23/21 12:59 PM, André Almeida wrote:
>> Document mounting options to enable casefold support in tmpfs.
>>
>> Signed-off-by: André Almeida <andrealmeid@collabora.com>
>> ---
>>   Documentation/filesystems/tmpfs.rst | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
>> index 0408c245785e..84c87c309bd7 100644
>> --- a/Documentation/filesystems/tmpfs.rst
>> +++ b/Documentation/filesystems/tmpfs.rst
>> @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
>>   will give you tmpfs instance on /mytmpfs which can allocate 10GB
>>   RAM/SWAP in 10240 inodes and it is only accessible by root.
>>   
>> +tmpfs has the following mounting options for case-insesitive lookups support:
>> +
>> +=========   ==============================================================
>> +casefold    Enable casefold support at this mount point using the given
>> +            argument as enconding. Currently only utf8 encondings are supported.
> 
>                             encoding.                      encodings
> 
>> +cf_strict   Enable strict casefolding at this mouting point (disabled by
> 
>                                                   mount
> 
>> +            default). This means that invalid strings should be reject by the
> 
>                                                                     rejected
> 
>> +            file system.
>> +=========   ==============================================================
>> +
>> +Note that this option doesn't enable casefold by default, one needs to set
> 
>                                                      default; one needs to set the
> 
>> +casefold flag per directory, setting the +F attribute in an empty directory. New
>> +directories within a casefolded one will inherit the flag.
> 
> 

Thanks for the feedback, Randy; all changes applied.



Thread overview: 16+ messages
2021-03-23 19:59 [RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs André Almeida
2021-03-23 19:59 ` [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" André Almeida
2021-03-23 20:15   ` Matthew Wilcox
2021-03-24 20:09     ` André Almeida
2021-03-23 19:59 ` [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups André Almeida
2021-03-23 20:18   ` Gabriel Krisman Bertazi
2021-03-24 20:17     ` André Almeida
2021-03-23 23:19   ` Al Viro
2021-03-24 20:44     ` André Almeida
2021-03-23 19:59 ` [RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs André Almeida
2021-03-23 22:15   ` Gabriel Krisman Bertazi
2021-03-23 19:59 ` [RFC PATCH 4/4] docs: tmpfs: Add casefold options André Almeida
2021-03-23 21:58   ` Randy Dunlap
2021-03-25 14:27     ` André Almeida
2021-03-23 22:19   ` Gabriel Krisman Bertazi
2021-03-24 20:47     ` André Almeida
