All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
@ 2023-07-19 22:19 ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, Gabriel Krisman Bertazi

Hi,

V3 applies the fixes suggested by Eric Biggers (thank you for your
review!).  Changelog inlined in the patches.

Retested with xfstests for ext4 and f2fs.

--
cover letter from v1.

This patchset enables negative dentries for case-insensitive directories
in ext4/f2fs.  It solves the corner cases for this feature, including
those already tested by fstests (generic/556).  It also solves an
existing bug with the existing implementation where old negative
dentries are left behind after a directory conversion to
case-insensitive.

Testing-wise, I ran sanity checks to show it properly uses the created
negative dentries, observed the expected performance increase of the
dentry cache hit, and showed it survives the quick group in fstests on
both f2fs and ext4 without regressions.

* Background

Negative dentries have always been disabled in case-insensitive
directories because, in their current form, they can't provide enough
assurances that all the case variations of a filename won't exist in a
directory, and the name-preserving case-insenstive semantics
during file creation prevents some negative dentries from being
instantiated unmodified.

Nevertheless, for the general case, the existing implementation would
already work with negative dentries, even though they are fully
disabled. That is: if the original lookup that created the dentry was
done in a case-insensitive way, the negative dentry can usually be
validated, since it assures that no other dcache entry exists, *and*
that no variation of the file exists on disk (since the lookup
failed). A following lookup would then be executed with the
case-insensitive-aware d_hash and d_lookup, which would find the right
negative dentry and use it.

The first corner case arises when a case-insensitive directory has
negative dentries that were created before the directory was flipped to
case-insensitive.  A directory must be empty to be converted, but it
doesn't mean the directory doesn't have negative dentry children.  If
that happens, the dangling dentries left behind can't assure that no
case-variation of the name exists. They only mean the exact name
doesn't exist.  A further lookup would incorrectly validate them.

The code below demonstrates the problem.  In this example $1 and $2 are
two strings, where:

      (i) $1 != $2
     (ii) casefold($1) == casefold($2)
    (iii) hash($1) == hash($2) == hash(casefold($1))

Then, the following sequence could potentially return a ENOENT, even
though the case-insensitive lookup should exist:

  mkdir  d      <- Case-sensitive directory
  touch  d/$1
  touch  d/$2
  unlink d/$1   <- leaves negative dentry  behind.
  unlink d/$2   <- leaves *another* negative dentry behind.
  chattr +F d   <- make 'd' case-insensitive.
  touch  d/$1   <- Both negative dentries could match. finds one of them,
		   and instantiate
  access d/$1   <- Find the other negative dentry, get -ENOENT.

In fact, this is a problem even on the current implementation, where
negative dentries for CI are disabled.  There was a bug reported by Al
Viro in 2020, where a directory might end up with dangling negative
dentries created during a case-sensitive lookup, because they existed
before the +F attribute was set.

It is hard to trigger the issue, because condition (iii) is hard to test
on an unmodified kernel.  By hacking the kernel to force the hash
collision, there are a few ways we can trigger this bizarre behavior in
case-insensitive directories through the insertion of negative dentries.

Another problem exists when turning a negative dentry to positive.  If
the negative dentry has a different case than what is currently being
used for lookup, the dentry cannot be reused without changing its name,
in order to guarantee filename-preserving semantics to userspace.  We
need to either change the name or invalidate the dentry. This issue is
currently avoided in mainline, since the negative dentry mechanism is
disabled.

* Proposal

The main idea is to differentiate negative dentries created in a
case-insensitive context from those created during a case-sensitive
lookup via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
the d_lookup hook.  Since the former can be used (except for the
name-preserving issue), d_revalidate will just check the flag to
quickly accept or reject the dentry.

A different solution would be to guarantee no negative dentry exists
during the case-sensitive to case-insensitive directory conversion (the
other direction is safe).  It has the following problems:

  1) It is not trivial to implement a race-free mechanism to ensure
  negative dentries won't be recreated immediately after invalidation
  while converting the directory.

  2) The knowledge whether the negative dentry is valid (i.e. comes from
  a case-insensitive lookup) is implicit on the fact that we are
  correctly invalidating dentries when converting the directory.

Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
solution to the problem.

But, as explained above, due to the filename preserving semantics, we
cannot just validate based on D_CASEFOLD_LOOKUP.

For that, one solution would be to invalidate the negative dentry when
it is decided to turn it positive, instead of reusing it. I implemented
that in the past (2018) but Al Viro made it clear we don't want to incur
costs on the VFS critical path for filesystems who don't care about
case-insensitiveness.

Instead, this patch invalidates negative dentries in casefold
directories in d_revalidate during creation lookups, iff the lookup name
is not exactly what is cached.  Other kinds of lookups wouldn't need
this limitation.

* caveats

1) Encryption

Negative dentries on case-insensitive encrypted directories are also
disabled.  No semantic change for them is intended in
this patchset; we just bypass the revalidation directly to fscrypt, for
positive dentries.  Encryption support is future work.

2) revalidate the cached dentry using the name under lookup

Validating based on the lookup name is strange for a cache.  the new
semantic is implemented by d_revalidate, to stay out of the critical
path of filesystems who don't care about case-insensitiveness, as much
as possible.  The only change is the addition of a new flavor of
d_revalidate.

* Tests

There are a tests in place for most of the corner cases in generic/556.
They mainly verify the name-preserving semantics.  The invalidation when
converting the directory is harder to test, because it is hard to force
the invalidation of specific cached dentries that occlude a dangling
invalid dentry.  I tested it with forcing the positive dentries to be
removed, but I'm not sure how to write an upstreamable test.

It also survives fstests quick group regression testing on both ext4 and
f2fs.

* Performance

The latency of lookups of non-existing files is obviously improved, as
would be expected.  The following numbers compare the execution time of 10^6
lookups of a non-existing file in a case-insensitive directory
pre-populated with 100k files in ext4.

Without the patch: 10.363s / 0.349s / 9.920s  (real/user/sys)
With the patch:     1.752s / 0.276s / 1.472s  (real/user/sys)

* patchset

Patch 1 introduces a new flavor of d_revalidate to provide the
filesystem with the name under lookup; Patch 2 introduces the new flag
to signal the dentry creation context; Patch 3 introduces a libfs helper
to revalidate negative dentries on case-insensitive directories; Patch 4
deals with encryption; Patch 5 cleans up the now redundant dentry
operations for case-insensitive with and without encryption; Finally,
Patch 6 and 7 enable support on case-insensitive directories
for ext4 and f2fs, respectively.

Gabriel Krisman Bertazi (7):
  fs: Expose name under lookup to d_revalidate hook
  fs: Add DCACHE_CASEFOLDED_NAME flag
  libfs: Validate negative dentries in case-insensitive directories
  libfs: Chain encryption checks after case-insensitive revalidation
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  ext4: Enable negative dentries on case-insensitive lookup
  f2fs: Enable negative dentries on case-insensitive lookup

 Documentation/filesystems/locking.rst |  3 +
 Documentation/filesystems/vfs.rst     | 12 ++++
 fs/dcache.c                           | 10 ++-
 fs/ext4/namei.c                       | 35 ++--------
 fs/f2fs/namei.c                       | 25 ++-----
 fs/libfs.c                            | 93 ++++++++++++++++++---------
 fs/namei.c                            | 23 ++++---
 include/linux/dcache.h                |  9 +++
 8 files changed, 116 insertions(+), 94 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
@ 2023-07-19 22:19 ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4, linux-f2fs-devel

Hi,

V3 applies the fixes suggested by Eric Biggers (thank you for your
review!).  Changelog inlined in the patches.

Retested with xfstests for ext4 and f2fs.

--
cover letter from v1.

This patchset enables negative dentries for case-insensitive directories
in ext4/f2fs.  It solves the corner cases for this feature, including
those already tested by fstests (generic/556).  It also solves an
existing bug with the existing implementation where old negative
dentries are left behind after a directory conversion to
case-insensitive.

Testing-wise, I ran sanity checks to show it properly uses the created
negative dentries, observed the expected performance increase of the
dentry cache hit, and showed it survives the quick group in fstests on
both f2fs and ext4 without regressions.

* Background

Negative dentries have always been disabled in case-insensitive
directories because, in their current form, they can't provide enough
assurances that all the case variations of a filename won't exist in a
directory, and the name-preserving case-insenstive semantics
during file creation prevents some negative dentries from being
instantiated unmodified.

Nevertheless, for the general case, the existing implementation would
already work with negative dentries, even though they are fully
disabled. That is: if the original lookup that created the dentry was
done in a case-insensitive way, the negative dentry can usually be
validated, since it assures that no other dcache entry exists, *and*
that no variation of the file exists on disk (since the lookup
failed). A following lookup would then be executed with the
case-insensitive-aware d_hash and d_lookup, which would find the right
negative dentry and use it.

The first corner case arises when a case-insensitive directory has
negative dentries that were created before the directory was flipped to
case-insensitive.  A directory must be empty to be converted, but it
doesn't mean the directory doesn't have negative dentry children.  If
that happens, the dangling dentries left behind can't assure that no
case-variation of the name exists. They only mean the exact name
doesn't exist.  A further lookup would incorrectly validate them.

The code below demonstrates the problem.  In this example $1 and $2 are
two strings, where:

      (i) $1 != $2
     (ii) casefold($1) == casefold($2)
    (iii) hash($1) == hash($2) == hash(casefold($1))

Then, the following sequence could potentially return a ENOENT, even
though the case-insensitive lookup should exist:

  mkdir  d      <- Case-sensitive directory
  touch  d/$1
  touch  d/$2
  unlink d/$1   <- leaves negative dentry  behind.
  unlink d/$2   <- leaves *another* negative dentry behind.
  chattr +F d   <- make 'd' case-insensitive.
  touch  d/$1   <- Both negative dentries could match. finds one of them,
		   and instantiate
  access d/$1   <- Find the other negative dentry, get -ENOENT.

In fact, this is a problem even on the current implementation, where
negative dentries for CI are disabled.  There was a bug reported by Al
Viro in 2020, where a directory might end up with dangling negative
dentries created during a case-sensitive lookup, because they existed
before the +F attribute was set.

It is hard to trigger the issue, because condition (iii) is hard to test
on an unmodified kernel.  By hacking the kernel to force the hash
collision, there are a few ways we can trigger this bizarre behavior in
case-insensitive directories through the insertion of negative dentries.

Another problem exists when turning a negative dentry to positive.  If
the negative dentry has a different case than what is currently being
used for lookup, the dentry cannot be reused without changing its name,
in order to guarantee filename-preserving semantics to userspace.  We
need to either change the name or invalidate the dentry. This issue is
currently avoided in mainline, since the negative dentry mechanism is
disabled.

* Proposal

The main idea is to differentiate negative dentries created in a
case-insensitive context from those created during a case-sensitive
lookup via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
the d_lookup hook.  Since the former can be used (except for the
name-preserving issue), d_revalidate will just check the flag to
quickly accept or reject the dentry.

A different solution would be to guarantee no negative dentry exists
during the case-sensitive to case-insensitive directory conversion (the
other direction is safe).  It has the following problems:

  1) It is not trivial to implement a race-free mechanism to ensure
  negative dentries won't be recreated immediately after invalidation
  while converting the directory.

  2) The knowledge whether the negative dentry is valid (i.e. comes from
  a case-insensitive lookup) is implicit on the fact that we are
  correctly invalidating dentries when converting the directory.

Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
solution to the problem.

But, as explained above, due to the filename preserving semantics, we
cannot just validate based on D_CASEFOLD_LOOKUP.

For that, one solution would be to invalidate the negative dentry when
it is decided to turn it positive, instead of reusing it. I implemented
that in the past (2018) but Al Viro made it clear we don't want to incur
costs on the VFS critical path for filesystems who don't care about
case-insensitiveness.

Instead, this patch invalidates negative dentries in casefold
directories in d_revalidate during creation lookups, iff the lookup name
is not exactly what is cached.  Other kinds of lookups wouldn't need
this limitation.

* caveats

1) Encryption

Negative dentries on case-insensitive encrypted directories are also
disabled.  No semantic change for them is intended in
this patchset; we just bypass the revalidation directly to fscrypt, for
positive dentries.  Encryption support is future work.

2) revalidate the cached dentry using the name under lookup

Validating based on the lookup name is strange for a cache.  the new
semantic is implemented by d_revalidate, to stay out of the critical
path of filesystems who don't care about case-insensitiveness, as much
as possible.  The only change is the addition of a new flavor of
d_revalidate.

* Tests

There are a tests in place for most of the corner cases in generic/556.
They mainly verify the name-preserving semantics.  The invalidation when
converting the directory is harder to test, because it is hard to force
the invalidation of specific cached dentries that occlude a dangling
invalid dentry.  I tested it with forcing the positive dentries to be
removed, but I'm not sure how to write an upstreamable test.

It also survives fstests quick group regression testing on both ext4 and
f2fs.

* Performance

The latency of lookups of non-existing files is obviously improved, as
would be expected.  The following numbers compare the execution time of 10^6
lookups of a non-existing file in a case-insensitive directory
pre-populated with 100k files in ext4.

Without the patch: 10.363s / 0.349s / 9.920s  (real/user/sys)
With the patch:     1.752s / 0.276s / 1.472s  (real/user/sys)

* patchset

Patch 1 introduces a new flavor of d_revalidate to provide the
filesystem with the name under lookup; Patch 2 introduces the new flag
to signal the dentry creation context; Patch 3 introduces a libfs helper
to revalidate negative dentries on case-insensitive directories; Patch 4
deals with encryption; Patch 5 cleans up the now redundant dentry
operations for case-insensitive with and without encryption; Finally,
Patch 6 and 7 enable support on case-insensitive directories
for ext4 and f2fs, respectively.

Gabriel Krisman Bertazi (7):
  fs: Expose name under lookup to d_revalidate hook
  fs: Add DCACHE_CASEFOLDED_NAME flag
  libfs: Validate negative dentries in case-insensitive directories
  libfs: Chain encryption checks after case-insensitive revalidation
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  ext4: Enable negative dentries on case-insensitive lookup
  f2fs: Enable negative dentries on case-insensitive lookup

 Documentation/filesystems/locking.rst |  3 +
 Documentation/filesystems/vfs.rst     | 12 ++++
 fs/dcache.c                           | 10 ++-
 fs/ext4/namei.c                       | 35 ++--------
 fs/f2fs/namei.c                       | 25 ++-----
 fs/libfs.c                            | 93 ++++++++++++++++++---------
 fs/namei.c                            | 23 ++++---
 include/linux/dcache.h                |  9 +++
 8 files changed, 116 insertions(+), 94 deletions(-)

-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/7] fs: Expose name under lookup to d_revalidate hook
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Negative dentries support on case-insensitive ext4/f2fs will require
access to the name under lookup to ensure it matches the dentry.  This
adds an optional new flavor of cached dentry revalidation hook to expose
this extra parameter.

I'm fine with extending d_revalidate instead of adding a new hook, if
it is considered cleaner and the approach is accepted.  I wrote a new
hook to simplify reviewing.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Document d_revalidate_name hook. (Eric)
---
 Documentation/filesystems/locking.rst |  3 +++
 Documentation/filesystems/vfs.rst     | 12 ++++++++++++
 fs/dcache.c                           |  2 +-
 fs/namei.c                            | 23 ++++++++++++++---------
 include/linux/dcache.h                |  1 +
 5 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ed148919e11a..d68997ba6584 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -18,6 +18,8 @@ dentry_operations
 prototypes::
 
 	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+				 unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
@@ -37,6 +39,7 @@ locking rules:
 ops		   rename_lock	->d_lock	may block	rcu-walk
 ================== ===========	========	==============	========
 d_revalidate:	   no		no		yes (ref-walk)	maybe
+d_revalidate_name: no		no		yes (ref-walk)	maybe
 d_weak_revalidate: no		no		yes	 	no
 d_hash		   no		no		no		maybe
 d_compare:	   yes		no		no		maybe
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index cb2a97e49872..34c842bd7cb2 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1252,6 +1252,8 @@ defined:
 
 	struct dentry_operations {
 		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+					 unsigned int);
 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
 		int (*d_hash)(const struct dentry *, struct qstr *);
 		int (*d_compare)(const struct dentry *,
@@ -1288,6 +1290,16 @@ defined:
 	return
 	-ECHILD and it will be called again in ref-walk mode.
 
+``d_revalidate_name``
+	Variant of d_revalidate that also provides the name under look-up.  Most
+	filesystems will keep it as NULL, unless there are particular semantics
+	for filenames encoding that need to be handled during dentry
+	revalidation.
+
+	When available, it is called in lieu of d_revalidate and has the same
+	locking rules and return semantics.  Refer to d_revalidate for more
+	information.
+
 ``d_weak_revalidate``
 	called when the VFS needs to revalidate a "jumped" dentry.  This
 	is called when a path-walk ends at dentry that was not acquired
diff --git a/fs/dcache.c b/fs/dcache.c
index 52e6d5fdab6b..98521862e58a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1928,7 +1928,7 @@ void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
 		dentry->d_flags |= DCACHE_OP_HASH;
 	if (op->d_compare)
 		dentry->d_flags |= DCACHE_OP_COMPARE;
-	if (op->d_revalidate)
+	if (op->d_revalidate || op->d_revalidate_name)
 		dentry->d_flags |= DCACHE_OP_REVALIDATE;
 	if (op->d_weak_revalidate)
 		dentry->d_flags |= DCACHE_OP_WEAK_REVALIDATE;
diff --git a/fs/namei.c b/fs/namei.c
index e56ff39a79bc..84df0ddd20db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -853,11 +853,16 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	return false;
 }
 
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct dentry *dentry,
+			       const struct qstr *name,
+			       unsigned int flags)
 {
-	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
+
+	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
+		if (dentry->d_op->d_revalidate_name)
+			return dentry->d_op->d_revalidate_name(dentry, name, flags);
 		return dentry->d_op->d_revalidate(dentry, flags);
-	else
+	} else
 		return 1;
 }
 
@@ -1565,7 +1570,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dentry, name, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1636,19 +1641,19 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(dentry, &nd->last, nd->flags);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
-			status = d_revalidate(dentry, nd->flags);
+			status = d_revalidate(dentry, &nd->last, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(dentry, &nd->last, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
@@ -1676,7 +1681,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dentry, name, flags);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -3421,7 +3426,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dentry, nd->flags);
+		error = d_revalidate(dentry, &nd->last, nd->flags);
 		if (likely(error > 0))
 			break;
 		if (error)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6b351e009f59..b6188f2e8950 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,6 +127,7 @@ enum dentry_d_lock_class
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate_name)(struct dentry *, const struct qstr *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 1/7] fs: Expose name under lookup to d_revalidate hook
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Negative dentries support on case-insensitive ext4/f2fs will require
access to the name under lookup to ensure it matches the dentry.  This
adds an optional new flavor of cached dentry revalidation hook to expose
this extra parameter.

I'm fine with extending d_revalidate instead of adding a new hook, if
it is considered cleaner and the approach is accepted.  I wrote a new
hook to simplify reviewing.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Document d_revalidate_name hook. (Eric)
---
 Documentation/filesystems/locking.rst |  3 +++
 Documentation/filesystems/vfs.rst     | 12 ++++++++++++
 fs/dcache.c                           |  2 +-
 fs/namei.c                            | 23 ++++++++++++++---------
 include/linux/dcache.h                |  1 +
 5 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ed148919e11a..d68997ba6584 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -18,6 +18,8 @@ dentry_operations
 prototypes::
 
 	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+				 unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
@@ -37,6 +39,7 @@ locking rules:
 ops		   rename_lock	->d_lock	may block	rcu-walk
 ================== ===========	========	==============	========
 d_revalidate:	   no		no		yes (ref-walk)	maybe
+d_revalidate_name: no		no		yes (ref-walk)	maybe
 d_weak_revalidate: no		no		yes	 	no
 d_hash		   no		no		no		maybe
 d_compare:	   yes		no		no		maybe
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index cb2a97e49872..34c842bd7cb2 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1252,6 +1252,8 @@ defined:
 
 	struct dentry_operations {
 		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+					 unsigned int);
 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
 		int (*d_hash)(const struct dentry *, struct qstr *);
 		int (*d_compare)(const struct dentry *,
@@ -1288,6 +1290,16 @@ defined:
 	return
 	-ECHILD and it will be called again in ref-walk mode.
 
+``d_revalidate_name``
+	Variant of d_revalidate that also provides the name under look-up.  Most
+	filesystems will keep it as NULL, unless there are particular semantics
+	for filenames encoding that need to be handled during dentry
+	revalidation.
+
+	When available, it is called in lieu of d_revalidate and has the same
+	locking rules and return semantics.  Refer to d_revalidate for more
+	information.
+
 ``d_weak_revalidate``
 	called when the VFS needs to revalidate a "jumped" dentry.  This
 	is called when a path-walk ends at dentry that was not acquired
diff --git a/fs/dcache.c b/fs/dcache.c
index 52e6d5fdab6b..98521862e58a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1928,7 +1928,7 @@ void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
 		dentry->d_flags |= DCACHE_OP_HASH;
 	if (op->d_compare)
 		dentry->d_flags |= DCACHE_OP_COMPARE;
-	if (op->d_revalidate)
+	if (op->d_revalidate || op->d_revalidate_name)
 		dentry->d_flags |= DCACHE_OP_REVALIDATE;
 	if (op->d_weak_revalidate)
 		dentry->d_flags |= DCACHE_OP_WEAK_REVALIDATE;
diff --git a/fs/namei.c b/fs/namei.c
index e56ff39a79bc..84df0ddd20db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -853,11 +853,16 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	return false;
 }
 
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct dentry *dentry,
+			       const struct qstr *name,
+			       unsigned int flags)
 {
-	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
+
+	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
+		if (dentry->d_op->d_revalidate_name)
+			return dentry->d_op->d_revalidate_name(dentry, name, flags);
 		return dentry->d_op->d_revalidate(dentry, flags);
-	else
+	} else
 		return 1;
 }
 
@@ -1565,7 +1570,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dentry, name, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1636,19 +1641,19 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(dentry, &nd->last, nd->flags);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
-			status = d_revalidate(dentry, nd->flags);
+			status = d_revalidate(dentry, &nd->last, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(dentry, &nd->last, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
@@ -1676,7 +1681,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dentry, name, flags);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -3421,7 +3426,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dentry, nd->flags);
+		error = d_revalidate(dentry, &nd->last, nd->flags);
 		if (likely(error > 0))
 			break;
 		if (error)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6b351e009f59..b6188f2e8950 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,6 +127,7 @@ enum dentry_d_lock_class
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate_name)(struct dentry *, const struct qstr *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 2/7] fs: Add DCACHE_CASEFOLDED_NAME flag
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

This flag marks a negative or positive dentry as being created after a
case-insensitive lookup operation.  It is useful to differentiate
dentries this way to detect whether the negative dentry can be trusted
during a case-insensitive lookup.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  -  Rename DCACHE_CASEFOLD_LOOKUP -> DCACHE_CASEFOLDED_NAME (Eric)
---
 fs/dcache.c            | 8 ++++++++
 include/linux/dcache.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 98521862e58a..5791489b589f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1958,6 +1958,14 @@ void d_set_fallthru(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_set_fallthru);
 
+void d_set_casefold_lookup(struct dentry *dentry)
+{
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_CASEFOLDED_NAME;
+	spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL(d_set_casefold_lookup);
+
 static unsigned d_flags_for_inode(struct inode *inode)
 {
 	unsigned add_flags = DCACHE_REGULAR_TYPE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b6188f2e8950..14aa0255bd04 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -209,6 +209,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_NOKEY_NAME		0x02000000 /* Encrypted name encoded without key */
 #define DCACHE_OP_REAL			0x04000000
+#define DCACHE_CASEFOLDED_NAME		0x08000000 /* Dentry comes from a casefold directory */
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
@@ -497,6 +498,13 @@ static inline bool d_is_fallthru(const struct dentry *dentry)
 	return dentry->d_flags & DCACHE_FALLTHRU;
 }
 
+extern void d_set_casefold_lookup(struct dentry *dentry);
+
+static inline bool d_is_casefold_lookup(const struct dentry *dentry)
+{
+	return dentry->d_flags & DCACHE_CASEFOLDED_NAME;
+}
+
 
 extern int sysctl_vfs_cache_pressure;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 2/7] fs: Add DCACHE_CASEFOLDED_NAME flag
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

This flag marks a negative or positive dentry as being created after a
case-insensitive lookup operation.  It is useful to differentiate
dentries this way to detect whether the negative dentry can be trusted
during a case-insensitive lookup.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  -  Rename DCACHE_CASEFOLD_LOOKUP -> DCACHE_CASEFOLDED_NAME (Eric)
---
 fs/dcache.c            | 8 ++++++++
 include/linux/dcache.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 98521862e58a..5791489b589f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1958,6 +1958,14 @@ void d_set_fallthru(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_set_fallthru);
 
+void d_set_casefold_lookup(struct dentry *dentry)
+{
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_CASEFOLDED_NAME;
+	spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL(d_set_casefold_lookup);
+
 static unsigned d_flags_for_inode(struct inode *inode)
 {
 	unsigned add_flags = DCACHE_REGULAR_TYPE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b6188f2e8950..14aa0255bd04 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -209,6 +209,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_NOKEY_NAME		0x02000000 /* Encrypted name encoded without key */
 #define DCACHE_OP_REAL			0x04000000
+#define DCACHE_CASEFOLDED_NAME		0x08000000 /* Dentry comes from a casefold directory */
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
@@ -497,6 +498,13 @@ static inline bool d_is_fallthru(const struct dentry *dentry)
 	return dentry->d_flags & DCACHE_FALLTHRU;
 }
 
+extern void d_set_casefold_lookup(struct dentry *dentry);
+
+static inline bool d_is_casefold_lookup(const struct dentry *dentry)
+{
+	return dentry->d_flags & DCACHE_CASEFOLDED_NAME;
+}
+
 
 extern int sysctl_vfs_cache_pressure;
 
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Introduce a dentry revalidation helper to be used by case-insensitive
filesystems to check if it is safe to reuse a negative dentry.

A negative dentry is safe to be reused on a case-insensitive lookup if
it was created during a case-insensitive lookup and this is not a lookup
that will instantiate a dentry. If this is a creation lookup, we also
need to make sure the name matches sensitively the name under lookup in
order to assure the name preserving semantics.

dentry->d_name is only checked by the case-insensitive d_revalidate hook
in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
d_revalidate is always called with the parent inode locked, and
therefore the name cannot change from under us.

d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
lookup_open and lookup_fast:

  - lookup_dcache always calls it with zeroed flags, with the exception
  of when coming from __lookup_hash, which needs the parent locked
  already, for instance in the open/creation path, which is locked in
  open_last_lookups.

  - In __lookup_slow, either the parent inode is locked by the
  caller (lookup_slow), or it is called with no
  flags (lookup_one/lookup_one_len).

  - lookup_open also requires the parent to be locked in the creation
  case, which is done in open_last_lookups.

  - lookup_fast will indeed be called with the parent unlocked, but it
  shouldn't be called with LOOKUP_CREATE.  Either it is called in the
  link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
  open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
  because it is only called on the !O_CREAT case, which means op->intent
  doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
  set).

Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
parents inodes are also be locked.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Add comments to all rejection cases (eric)
  - safeguard against filesystem creating dentries without LOOKUP flags
---
 fs/libfs.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..dd213f446427 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1462,9 +1462,57 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	return 0;
 }
 
+static inline int generic_ci_d_revalidate(struct dentry *dentry,
+					  const struct qstr *name,
+					  unsigned int flags)
+{
+	if (d_is_negative(dentry)) {
+		const struct dentry *parent = READ_ONCE(dentry->d_parent);
+		const struct inode *dir = READ_ONCE(parent->d_inode);
+
+		if (dir && needs_casefold(dir)) {
+			/*
+			 * Filesystems will call into d_revalidate without
+			 * setting LOOKUP_ flags even for file creation(see
+			 * lookup_one* variants).  Reject negative dentries in
+			 * this case, since we can't know for sure it won't be
+			 * used for creation.
+			 */
+			if (!flags)
+				return 0;
+
+			/*
+			 * Negative dentries created prior to turning the
+			 * directory case-insensitive cannot be trusted, since
+			 * they don't ensure any possible case version of the
+			 * filename doesn't exist.
+			 */
+			if (!d_is_casefold_lookup(dentry))
+				return 0;
+
+			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
+				/*
+				 * ->d_name won't change from under us in the
+				 * creation path only, since d_revalidate during
+				 * creation and renames is always called with
+				 * the parent inode locked.  It isn't the case
+				 * for all lookup callpaths, so ->d_name must
+				 * not be touched outside
+				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
+				 */
+				if (dentry->d_name.len != name->len ||
+				    memcmp(dentry->d_name.name, name->name, name->len))
+					return 0;
+			}
+		}
+	}
+	return 1;
+}
+
 static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
+	.d_revalidate_name = generic_ci_d_revalidate,
 };
 #endif
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Introduce a dentry revalidation helper to be used by case-insensitive
filesystems to check if it is safe to reuse a negative dentry.

A negative dentry is safe to be reused on a case-insensitive lookup if
it was created during a case-insensitive lookup and this is not a lookup
that will instantiate a dentry. If this is a creation lookup, we also
need to make sure the name matches sensitively the name under lookup in
order to assure the name preserving semantics.

dentry->d_name is only checked by the case-insensitive d_revalidate hook
in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
d_revalidate is always called with the parent inode locked, and
therefore the name cannot change from under us.

d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
lookup_open and lookup_fast:

  - lookup_dcache always calls it with zeroed flags, with the exception
  of when coming from __lookup_hash, which needs the parent locked
  already, for instance in the open/creation path, which is locked in
  open_last_lookups.

  - In __lookup_slow, either the parent inode is locked by the
  caller (lookup_slow), or it is called with no
  flags (lookup_one/lookup_one_len).

  - lookup_open also requires the parent to be locked in the creation
  case, which is done in open_last_lookups.

  - lookup_fast will indeed be called with the parent unlocked, but it
  shouldn't be called with LOOKUP_CREATE.  Either it is called in the
  link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
  open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
  because it is only called on the !O_CREAT case, which means op->intent
  doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
  set).

Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
parents inodes are also be locked.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Add comments to all rejection cases (eric)
  - safeguard against filesystem creating dentries without LOOKUP flags
---
 fs/libfs.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..dd213f446427 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1462,9 +1462,57 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	return 0;
 }
 
+static inline int generic_ci_d_revalidate(struct dentry *dentry,
+					  const struct qstr *name,
+					  unsigned int flags)
+{
+	if (d_is_negative(dentry)) {
+		const struct dentry *parent = READ_ONCE(dentry->d_parent);
+		const struct inode *dir = READ_ONCE(parent->d_inode);
+
+		if (dir && needs_casefold(dir)) {
+			/*
+			 * Filesystems will call into d_revalidate without
+			 * setting LOOKUP_ flags even for file creation(see
+			 * lookup_one* variants).  Reject negative dentries in
+			 * this case, since we can't know for sure it won't be
+			 * used for creation.
+			 */
+			if (!flags)
+				return 0;
+
+			/*
+			 * Negative dentries created prior to turning the
+			 * directory case-insensitive cannot be trusted, since
+			 * they don't ensure any possible case version of the
+			 * filename doesn't exist.
+			 */
+			if (!d_is_casefold_lookup(dentry))
+				return 0;
+
+			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
+				/*
+				 * ->d_name won't change from under us in the
+				 * creation path only, since d_revalidate during
+				 * creation and renames is always called with
+				 * the parent inode locked.  It isn't the case
+				 * for all lookup callpaths, so ->d_name must
+				 * not be touched outside
+				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
+				 */
+				if (dentry->d_name.len != name->len ||
+				    memcmp(dentry->d_name.name, name->name, name->len))
+					return 0;
+			}
+		}
+	}
+	return 1;
+}
+
 static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
+	.d_revalidate_name = generic_ci_d_revalidate,
 };
 #endif
 
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 4/7] libfs: Chain encryption checks after case-insensitive revalidation
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Support encrypted dentries in generic_ci_d_revalidate by chaining
fscrypt_d_revalidate at the tail of the d_revalidate.  This allows
filesystem to just call generic_ci_d_revalidate and let it handle any
case-insensitive dentry (encrypted or not).

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Enable negative dentries of encrypted filesystems (Eric)
---
 fs/libfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index dd213f446427..0e5d3bb1dddc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1506,7 +1506,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
 			}
 		}
 	}
-	return 1;
+	return fscrypt_d_revalidate(dentry, flags);
 }
 
 static const struct dentry_operations generic_ci_dentry_ops = {
@@ -1526,7 +1526,7 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
 static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
-	.d_revalidate = fscrypt_d_revalidate,
+	.d_revalidate_name = generic_ci_d_revalidate,
 };
 #endif
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 4/7] libfs: Chain encryption checks after case-insensitive revalidation
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Support encrypted dentries in generic_ci_d_revalidate by chaining
fscrypt_d_revalidate at the tail of the d_revalidate.  This allows
filesystem to just call generic_ci_d_revalidate and let it handle any
case-insensitive dentry (encrypted or not).

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Enable negative dentries of encrypted filesystems (Eric)
---
 fs/libfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index dd213f446427..0e5d3bb1dddc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1506,7 +1506,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
 			}
 		}
 	}
-	return 1;
+	return fscrypt_d_revalidate(dentry, flags);
 }
 
 static const struct dentry_operations generic_ci_dentry_ops = {
@@ -1526,7 +1526,7 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
 static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
-	.d_revalidate = fscrypt_d_revalidate,
+	.d_revalidate_name = generic_ci_d_revalidate,
 };
 #endif
 
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
equivalent.  Merge them together and simplify the setup code.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
changes since v2:
  - reword comment for clarity (Eric)
---
 fs/libfs.c | 45 +++++++++++++--------------------------------
 1 file changed, 13 insertions(+), 32 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 0e5d3bb1dddc..73cd06e7ff90 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1509,7 +1509,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
 	return fscrypt_d_revalidate(dentry, flags);
 }
 
-static const struct dentry_operations generic_ci_dentry_ops = {
+static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
 	.d_revalidate_name = generic_ci_d_revalidate,
@@ -1522,26 +1522,19 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
 };
 #endif
 
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
-	.d_hash = generic_ci_d_hash,
-	.d_compare = generic_ci_d_compare,
-	.d_revalidate_name = generic_ci_d_revalidate,
-};
-#endif
-
 /**
  * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
  * @dentry:	dentry to set ops on
  *
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively.  Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later.  As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
+ * Casefolded directories need some dentry_operations set, so that the dentries
+ * contained in them are handled case-insensitively.  Note that d_hash and
+ * d_compare are needed on the parent directory rather than on the dentries in
+ * it, and while the casefolding flag can be toggled on and off on an empty
+ * directory, dentry_operations can't be changed later.  As a result, if the
+ * filesystem has casefolding support enabled at all, we have to give all
+ * dentries the casefolding operations even if their inode doesn't have the
+ * casefolding flag currently (and thus the casefolding ops would be no-ops for
+ * now).
  *
  * Encryption works differently in that the only dentry operation it needs is
  * d_revalidate, which it only needs on dentries that have the no-key name flag.
@@ -1550,34 +1543,22 @@ static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
  * Finally, to maximize compatibility with overlayfs (which isn't compatible
  * with certain dentry operations) and to avoid taking an unnecessary
  * performance hit, we use custom dentry_operations for each possible
- * combination rather than always installing all operations.
+ * combination of operations rather than always installing them.
  */
 void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
 {
-#ifdef CONFIG_FS_ENCRYPTION
-	bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
 #if IS_ENABLED(CONFIG_UNICODE)
-	bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-	if (needs_encrypt_ops && needs_ci_ops) {
+	if (dentry->d_sb->s_encoding) {
 		d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
 		return;
 	}
 #endif
 #ifdef CONFIG_FS_ENCRYPTION
-	if (needs_encrypt_ops) {
+	if (dentry->d_flags & DCACHE_NOKEY_NAME) {
 		d_set_d_op(dentry, &generic_encrypted_dentry_ops);
 		return;
 	}
 #endif
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (needs_ci_ops) {
-		d_set_d_op(dentry, &generic_ci_dentry_ops);
-		return;
-	}
-#endif
 }
 EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
equivalent.  Merge them together and simplify the setup code.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
changes since v2:
  - reword comment for clarity (Eric)
---
 fs/libfs.c | 45 +++++++++++++--------------------------------
 1 file changed, 13 insertions(+), 32 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 0e5d3bb1dddc..73cd06e7ff90 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1509,7 +1509,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
 	return fscrypt_d_revalidate(dentry, flags);
 }
 
-static const struct dentry_operations generic_ci_dentry_ops = {
+static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
 	.d_revalidate_name = generic_ci_d_revalidate,
@@ -1522,26 +1522,19 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
 };
 #endif
 
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
-	.d_hash = generic_ci_d_hash,
-	.d_compare = generic_ci_d_compare,
-	.d_revalidate_name = generic_ci_d_revalidate,
-};
-#endif
-
 /**
  * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
  * @dentry:	dentry to set ops on
  *
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively.  Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later.  As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
+ * Casefolded directories need some dentry_operations set, so that the dentries
+ * contained in them are handled case-insensitively.  Note that d_hash and
+ * d_compare are needed on the parent directory rather than on the dentries in
+ * it, and while the casefolding flag can be toggled on and off on an empty
+ * directory, dentry_operations can't be changed later.  As a result, if the
+ * filesystem has casefolding support enabled at all, we have to give all
+ * dentries the casefolding operations even if their inode doesn't have the
+ * casefolding flag currently (and thus the casefolding ops would be no-ops for
+ * now).
  *
  * Encryption works differently in that the only dentry operation it needs is
  * d_revalidate, which it only needs on dentries that have the no-key name flag.
@@ -1550,34 +1543,22 @@ static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
  * Finally, to maximize compatibility with overlayfs (which isn't compatible
  * with certain dentry operations) and to avoid taking an unnecessary
  * performance hit, we use custom dentry_operations for each possible
- * combination rather than always installing all operations.
+ * combination of operations rather than always installing them.
  */
 void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
 {
-#ifdef CONFIG_FS_ENCRYPTION
-	bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
 #if IS_ENABLED(CONFIG_UNICODE)
-	bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-	if (needs_encrypt_ops && needs_ci_ops) {
+	if (dentry->d_sb->s_encoding) {
 		d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
 		return;
 	}
 #endif
 #ifdef CONFIG_FS_ENCRYPTION
-	if (needs_encrypt_ops) {
+	if (dentry->d_flags & DCACHE_NOKEY_NAME) {
 		d_set_d_op(dentry, &generic_encrypted_dentry_ops);
 		return;
 	}
 #endif
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (needs_ci_ops) {
-		d_set_d_op(dentry, &generic_ci_dentry_ops);
-		return;
-	}
-#endif
 }
 EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
 
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 6/7] ext4: Enable negative dentries on case-insensitive lookup
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Move dentry flag set closer to fscrypt code (Eric)
---
 fs/ext4/namei.c | 35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0caf6c730ce3..b22194a83e1a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1759,6 +1759,10 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir,
 
 	err = ext4_fname_prepare_lookup(dir, dentry, &fname);
 	generic_set_encrypted_ci_d_ops(dentry);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_set_casefold_lookup(dentry);
+
 	if (err == -ENOENT)
 		return NULL;
 	if (err)
@@ -1866,16 +1870,6 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
 		}
 	}
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (!inode && IS_CASEFOLDED(dir)) {
-		/* Eventually we want to call d_add_ci(dentry, NULL)
-		 * for negative dentries in the encoding case as
-		 * well.  For now, prevent the negative dentry
-		 * from being cached.
-		 */
-		return NULL;
-	}
-#endif
 	return d_splice_alias(inode, dentry);
 }
 
@@ -3206,17 +3200,6 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
 	ext4_fc_track_unlink(handle, dentry);
 	retval = ext4_mark_inode_dirty(handle, dir);
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at ext4_lookup(), when it is better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
-
 end_rmdir:
 	brelse(bh);
 	if (handle)
@@ -3317,16 +3300,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 		goto out_trace;
 
 	retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at ext4_lookup(), when it is  better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
 
 out_trace:
 	trace_ext4_unlink_exit(dentry, retval);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 6/7] ext4: Enable negative dentries on case-insensitive lookup
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Move dentry flag set closer to fscrypt code (Eric)
---
 fs/ext4/namei.c | 35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0caf6c730ce3..b22194a83e1a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1759,6 +1759,10 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir,
 
 	err = ext4_fname_prepare_lookup(dir, dentry, &fname);
 	generic_set_encrypted_ci_d_ops(dentry);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_set_casefold_lookup(dentry);
+
 	if (err == -ENOENT)
 		return NULL;
 	if (err)
@@ -1866,16 +1870,6 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
 		}
 	}
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (!inode && IS_CASEFOLDED(dir)) {
-		/* Eventually we want to call d_add_ci(dentry, NULL)
-		 * for negative dentries in the encoding case as
-		 * well.  For now, prevent the negative dentry
-		 * from being cached.
-		 */
-		return NULL;
-	}
-#endif
 	return d_splice_alias(inode, dentry);
 }
 
@@ -3206,17 +3200,6 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
 	ext4_fc_track_unlink(handle, dentry);
 	retval = ext4_mark_inode_dirty(handle, dir);
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at ext4_lookup(), when it is better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
-
 end_rmdir:
 	brelse(bh);
 	if (handle)
@@ -3317,16 +3300,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 		goto out_trace;
 
 	retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at ext4_lookup(), when it is  better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
 
 out_trace:
 	trace_ext4_unlink_exit(dentry, retval);
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 7/7] f2fs: Enable negative dentries on case-insensitive lookup
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Move dentry flag set closer to fscrypt code (Eric)
---
 fs/f2fs/namei.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index bee0568888da..fef8e2e77f75 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -533,6 +533,10 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
 
 	err = f2fs_prepare_lookup(dir, dentry, &fname);
 	generic_set_encrypted_ci_d_ops(dentry);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_set_casefold_lookup(dentry);
+
 	if (err == -ENOENT)
 		goto out_splice;
 	if (err)
@@ -578,17 +582,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
 		goto out_iput;
 	}
 out_splice:
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (!inode && IS_CASEFOLDED(dir)) {
-		/* Eventually we want to call d_add_ci(dentry, NULL)
-		 * for negative dentries in the encoding case as
-		 * well.  For now, prevent the negative dentry
-		 * from being cached.
-		 */
-		trace_f2fs_lookup_end(dir, dentry, ino, err);
-		return NULL;
-	}
-#endif
 	new = d_splice_alias(inode, dentry);
 	trace_f2fs_lookup_end(dir, !IS_ERR_OR_NULL(new) ? new : dentry,
 				ino, IS_ERR(new) ? PTR_ERR(new) : err);
@@ -641,16 +634,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
 	f2fs_delete_entry(de, page, dir, inode);
 	f2fs_unlock_op(sbi);
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at f2fs_lookup(), when it is better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
 	if (IS_DIRSYNC(dir))
 		f2fs_sync_fs(sbi->sb, 1);
 fail:
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [f2fs-dev] [PATCH v3 7/7] f2fs: Enable negative dentries on case-insensitive lookup
@ 2023-07-19 22:19   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-19 22:19 UTC (permalink / raw)
  To: viro, brauner, tytso, ebiggers, jaegeuk
  Cc: linux-fsdevel, Gabriel Krisman Bertazi, linux-ext4,
	Gabriel Krisman Bertazi, linux-f2fs-devel

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Move dentry flag set closer to fscrypt code (Eric)
---
 fs/f2fs/namei.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index bee0568888da..fef8e2e77f75 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -533,6 +533,10 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
 
 	err = f2fs_prepare_lookup(dir, dentry, &fname);
 	generic_set_encrypted_ci_d_ops(dentry);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_set_casefold_lookup(dentry);
+
 	if (err == -ENOENT)
 		goto out_splice;
 	if (err)
@@ -578,17 +582,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
 		goto out_iput;
 	}
 out_splice:
-#if IS_ENABLED(CONFIG_UNICODE)
-	if (!inode && IS_CASEFOLDED(dir)) {
-		/* Eventually we want to call d_add_ci(dentry, NULL)
-		 * for negative dentries in the encoding case as
-		 * well.  For now, prevent the negative dentry
-		 * from being cached.
-		 */
-		trace_f2fs_lookup_end(dir, dentry, ino, err);
-		return NULL;
-	}
-#endif
 	new = d_splice_alias(inode, dentry);
 	trace_f2fs_lookup_end(dir, !IS_ERR_OR_NULL(new) ? new : dentry,
 				ino, IS_ERR(new) ? PTR_ERR(new) : err);
@@ -641,16 +634,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
 	f2fs_delete_entry(de, page, dir, inode);
 	f2fs_unlock_op(sbi);
 
-#if IS_ENABLED(CONFIG_UNICODE)
-	/* VFS negative dentries are incompatible with Encoding and
-	 * Case-insensitiveness. Eventually we'll want avoid
-	 * invalidating the dentries here, alongside with returning the
-	 * negative dentries at f2fs_lookup(), when it is better
-	 * supported by the VFS for the CI case.
-	 */
-	if (IS_CASEFOLDED(dir))
-		d_invalidate(dentry);
-#endif
 	if (IS_DIRSYNC(dir))
 		f2fs_sync_fs(sbi->sb, 1);
 fail:
-- 
2.41.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-20  6:06     ` Eric Biggers
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  6:06 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: viro, brauner, tytso, jaegeuk, linux-fsdevel, linux-ext4,
	linux-f2fs-devel, Gabriel Krisman Bertazi

On Wed, Jul 19, 2023 at 06:19:14PM -0400, Gabriel Krisman Bertazi wrote:
> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> +					  const struct qstr *name,
> +					  unsigned int flags)

This shouldn't be 'inline', since it can't be inlined into anywhere anyway.

> +		if (dir && needs_casefold(dir)) {
> +			/*
> +			 * Filesystems will call into d_revalidate without
> +			 * setting LOOKUP_ flags even for file creation(see
> +			 * lookup_one* variants).  Reject negative dentries in
> +			 * this case, since we can't know for sure it won't be
> +			 * used for creation.
> +			 */
> +			if (!flags)
> +				return 0;
> +
> +			/*
> +			 * Negative dentries created prior to turning the
> +			 * directory case-insensitive cannot be trusted, since
> +			 * they don't ensure any possible case version of the
> +			 * filename doesn't exist.
> +			 */
> +			if (!d_is_casefold_lookup(dentry))
> +				return 0;
> +
> +			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> +				/*
> +				 * ->d_name won't change from under us in the
> +				 * creation path only, since d_revalidate during
> +				 * creation and renames is always called with
> +				 * the parent inode locked.  It isn't the case
> +				 * for all lookup callpaths, so ->d_name must
> +				 * not be touched outside
> +				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> +				 */
> +				if (dentry->d_name.len != name->len ||
> +				    memcmp(dentry->d_name.name, name->name, name->len))
> +					return 0;
> +			}
> +		}

This would be easier to follow if the '!flags' and 'flags & (LOOKUP_CREATE |
LOOKUP_RENAME_TARGET)' sections were adjacent to each other.  They group
together logically, since they both deal with the following problem: "when the
lookup might be for creation, invalidate the negative dentry if it might not be
a case-sensitive match".  (And it would help if there was a comment explaining
that problem.)  The d_is_casefold_lookup() check solves a different problem.

I'm also having trouble understanding exactly when ->d_name is stable here.
AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
without its parent's ->i_rwsem being held.  It happens when a subdirectory is
"found" under multiple names.  The VFS doesn't support directory hard links, so
if it finds a second link to a directory, it just moves the whole dentry tree to
the new location.  This can happen if a filesystem image is corrupted and
contains directory hard links.  Coincidentally, it can also happen in an
encrypted directory due to the no-key name => normal name transition...

But, negative dentries are never moved, at all.  (__d_move() even WARNs if you
ask it to move a negative dentry.)  That feels like that would make everything
else irrelevant here, though your patchset doesn't mention this.  I suppose the
problem would be what if the dentry is made positive concurrently.

I'm struggling to find an ideal solution here.  Maybe ->d_lock should just be
taken for the name comparison.  That is legal here, and it reliably synchronizes
with the dentry being moved, without having to consider anything else.

So, the following is probably what I'd do.  I'd be interested to hear your
thoughts, though:

			/*
			 * A negative dentry that was created before the
			 * directory became case-insensitive can't be trusted,
			 * since it doesn't ensure any possible case version of
			 * the filename doesn't exist.
			 */
			if (!d_is_casefold_lookup(dentry))
				return 0;

			/*
			 * If the lookup is for creation, then a negative dentry
			 * can only be reused if it's a case-sensitive match,
			 * not just a case-insensitive one.  This is required to
			 * provide case-preserving semantics.
			 *
			 * In some cases (lookup_one_*()), the lookup intent is
			 * unknown, resulting in flags==0 here.  So we have to
			 * assume that flags==0 is potentially a creation.
			 */
			if (flags == 0 ||
			    (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))) {
				bool match;

				spin_lock(&dentry->d_lock);
				match = (dentry->d_name.len == name->len &&
					 memcmp(dentry->d_name.name, name->name,
						name->len) == 0);
				spin_unlock(&dentry->d_lock);
				if (!match)
					return 0;
			}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-20  6:06     ` Eric Biggers
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  6:06 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

On Wed, Jul 19, 2023 at 06:19:14PM -0400, Gabriel Krisman Bertazi wrote:
> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> +					  const struct qstr *name,
> +					  unsigned int flags)

This shouldn't be 'inline', since it can't be inlined into anywhere anyway.

> +		if (dir && needs_casefold(dir)) {
> +			/*
> +			 * Filesystems will call into d_revalidate without
> +			 * setting LOOKUP_ flags even for file creation(see
> +			 * lookup_one* variants).  Reject negative dentries in
> +			 * this case, since we can't know for sure it won't be
> +			 * used for creation.
> +			 */
> +			if (!flags)
> +				return 0;
> +
> +			/*
> +			 * Negative dentries created prior to turning the
> +			 * directory case-insensitive cannot be trusted, since
> +			 * they don't ensure any possible case version of the
> +			 * filename doesn't exist.
> +			 */
> +			if (!d_is_casefold_lookup(dentry))
> +				return 0;
> +
> +			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> +				/*
> +				 * ->d_name won't change from under us in the
> +				 * creation path only, since d_revalidate during
> +				 * creation and renames is always called with
> +				 * the parent inode locked.  It isn't the case
> +				 * for all lookup callpaths, so ->d_name must
> +				 * not be touched outside
> +				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> +				 */
> +				if (dentry->d_name.len != name->len ||
> +				    memcmp(dentry->d_name.name, name->name, name->len))
> +					return 0;
> +			}
> +		}

This would be easier to follow if the '!flags' and 'flags & (LOOKUP_CREATE |
LOOKUP_RENAME_TARGET)' sections were adjacent to each other.  They group
together logically, since they both deal with the following problem: "when the
lookup might be for creation, invalidate the negative dentry if it might not be
a case-sensitive match".  (And it would help if there was a comment explaining
that problem.)  The d_is_casefold_lookup() check solves a different problem.

I'm also having trouble understanding exactly when ->d_name is stable here.
AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
without its parent's ->i_rwsem being held.  It happens when a subdirectory is
"found" under multiple names.  The VFS doesn't support directory hard links, so
if it finds a second link to a directory, it just moves the whole dentry tree to
the new location.  This can happen if a filesystem image is corrupted and
contains directory hard links.  Coincidentally, it can also happen in an
encrypted directory due to the no-key name => normal name transition...

But, negative dentries are never moved, at all.  (__d_move() even WARNs if you
ask it to move a negative dentry.)  That feels like that would make everything
else irrelevant here, though your patchset doesn't mention this.  I suppose the
problem would be what if the dentry is made positive concurrently.

I'm struggling to find an ideal solution here.  Maybe ->d_lock should just be
taken for the name comparison.  That is legal here, and it reliably synchronizes
with the dentry being moved, without having to consider anything else.

So, the following is probably what I'd do.  I'd be interested to hear your
thoughts, though:

			/*
			 * A negative dentry that was created before the
			 * directory became case-insensitive can't be trusted,
			 * since it doesn't ensure any possible case version of
			 * the filename doesn't exist.
			 */
			if (!d_is_casefold_lookup(dentry))
				return 0;

			/*
			 * If the lookup is for creation, then a negative dentry
			 * can only be reused if it's a case-sensitive match,
			 * not just a case-insensitive one.  This is required to
			 * provide case-preserving semantics.
			 *
			 * In some cases (lookup_one_*()), the lookup intent is
			 * unknown, resulting in flags==0 here.  So we have to
			 * assume that flags==0 is potentially a creation.
			 */
			if (flags == 0 ||
			    (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))) {
				bool match;

				spin_lock(&dentry->d_lock);
				match = (dentry->d_name.len == name->len &&
					 memcmp(dentry->d_name.name, name->name,
						name->len) == 0);
				spin_unlock(&dentry->d_lock);
				if (!match)
					return 0;
			}


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-20  6:06     ` [f2fs-dev] " Eric Biggers
@ 2023-07-20  6:41       ` Eric Biggers
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  6:41 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
> 
> I'm also having trouble understanding exactly when ->d_name is stable here.
> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
> "found" under multiple names.  The VFS doesn't support directory hard links, so
> if it finds a second link to a directory, it just moves the whole dentry tree to
> the new location.  This can happen if a filesystem image is corrupted and
> contains directory hard links.  Coincidentally, it can also happen in an
> encrypted directory due to the no-key name => normal name transition...

Sorry, I think I got this slightly wrong.  The move does happen with the
parent's ->i_rwsem held, but it's for read, not for write.  First, before
->lookup is called, the ->i_rwsem of the parent directory is taken for read.
->lookup() calls d_splice_alias() which can call __d_unalias() which does the
__d_move().  If the old alias is in a different directory (which cannot happen
in that fscrypt case, but can happen in the general "directory hard links"
case), __d_unalias() takes that directory's ->i_rwsem for read too.

So it looks like the parent's ->i_rwsem does indeed exclude moves of child
dentries, but only if it's taken for *write*.  So I guess you can rely on that;
it's just a bit more subtle than it first appears.  Though, some of your
explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
there is still a problem.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-20  6:41       ` Eric Biggers
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  6:41 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
> 
> I'm also having trouble understanding exactly when ->d_name is stable here.
> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
> "found" under multiple names.  The VFS doesn't support directory hard links, so
> if it finds a second link to a directory, it just moves the whole dentry tree to
> the new location.  This can happen if a filesystem image is corrupted and
> contains directory hard links.  Coincidentally, it can also happen in an
> encrypted directory due to the no-key name => normal name transition...

Sorry, I think I got this slightly wrong.  The move does happen with the
parent's ->i_rwsem held, but it's for read, not for write.  First, before
->lookup is called, the ->i_rwsem of the parent directory is taken for read.
->lookup() calls d_splice_alias() which can call __d_unalias() which does the
__d_move().  If the old alias is in a different directory (which cannot happen
in that fscrypt case, but can happen in the general "directory hard links"
case), __d_unalias() takes that directory's ->i_rwsem for read too.

So it looks like the parent's ->i_rwsem does indeed exclude moves of child
dentries, but only if it's taken for *write*.  So I guess you can rely on that;
it's just a bit more subtle than it first appears.  Though, some of your
explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
there is still a problem.

- Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
  2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-20  7:43   ` Eric Biggers
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  7:43 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: viro, brauner, tytso, jaegeuk, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

Sorry, one more thing...

On Wed, Jul 19, 2023 at 06:19:11PM -0400, Gabriel Krisman Bertazi wrote:
> 
> Another problem exists when turning a negative dentry to positive.  If
> the negative dentry has a different case than what is currently being
> used for lookup, the dentry cannot be reused without changing its name,
> in order to guarantee filename-preserving semantics to userspace.  We
> need to either change the name or invalidate the dentry. This issue is
> currently avoided in mainline, since the negative dentry mechanism is
> disabled.

Are you sure this problem even needs to be solved?

It actually isn't specific to negative dentries.  If you have a file "foo"
that's not in the dcache, and you open it (or look it up in any other way) as
"FOO", then the positive dentry that gets created is named "FOO".

As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
file open is "FOO", not the true name "foo".  This is true even for processes
that open it as "foo", as long as the dentry remains in the dcache.

No negative dentries involved at all!

Is your thinking that you just don't want to increase the number of ways in
which this behavior can occur?

Or, it looks like the positive dentry case is solvable using d_add_ci().
So maybe you are planning to do that?  It's not clear to me.

- Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
@ 2023-07-20  7:43   ` Eric Biggers
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-20  7:43 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4

Sorry, one more thing...

On Wed, Jul 19, 2023 at 06:19:11PM -0400, Gabriel Krisman Bertazi wrote:
> 
> Another problem exists when turning a negative dentry to positive.  If
> the negative dentry has a different case than what is currently being
> used for lookup, the dentry cannot be reused without changing its name,
> in order to guarantee filename-preserving semantics to userspace.  We
> need to either change the name or invalidate the dentry. This issue is
> currently avoided in mainline, since the negative dentry mechanism is
> disabled.

Are you sure this problem even needs to be solved?

It actually isn't specific to negative dentries.  If you have a file "foo"
that's not in the dcache, and you open it (or look it up in any other way) as
"FOO", then the positive dentry that gets created is named "FOO".

As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
file open is "FOO", not the true name "foo".  This is true even for processes
that open it as "foo", as long as the dentry remains in the dcache.

No negative dentries involved at all!

Is your thinking that you just don't want to increase the number of ways in
which this behavior can occur?

Or, it looks like the positive dentry case is solvable using d_add_ci().
So maybe you are planning to do that?  It's not clear to me.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
  2023-07-20  7:43   ` [f2fs-dev] " Eric Biggers
@ 2023-07-20 17:35     ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-20 17:35 UTC (permalink / raw)
  To: Eric Biggers
  Cc: viro, brauner, tytso, jaegeuk, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

Eric Biggers <ebiggers@kernel.org> writes:

>> Another problem exists when turning a negative dentry to positive.  If
>> the negative dentry has a different case than what is currently being
>> used for lookup, the dentry cannot be reused without changing its name,
>> in order to guarantee filename-preserving semantics to userspace.  We
>> need to either change the name or invalidate the dentry. This issue is
>> currently avoided in mainline, since the negative dentry mechanism is
>> disabled.
>
> Are you sure this problem even needs to be solved?

Yes, because we promise name-preserving semantics.  If we don't do it,
the name on the disk might be different than what was asked for, and tools
that rely on it (they exist) will break.  During initial testing, I've
had tools breaking with case-insensitive ext4 because they created a
file, did getdents and wanted to see exactly the name they used.

> It actually isn't specific to negative dentries.  If you have a file "foo"
> that's not in the dcache, and you open it (or look it up in any other way) as
> "FOO", then the positive dentry that gets created is named "FOO".
>
> As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
> file open is "FOO", not the true name "foo".  This is true even for processes
> that open it as "foo", as long as the dentry remains in the dcache.
>
> No negative dentries involved at all!

I totally agree it is goes beyond negative dentries, but this case is
particularly important because it is the only one (that I know of) where
the incorrect case might actually be written to the disk.  other cases,
like /proc/<pid>/fd/ can just display a different case to userspace,
which is confusing.  Still, the disk has the right version, exactly as
originally created.

I see the current /proc/$pid/fd/ semantics as a bug. In fact, I have/had
a bug report for bwrap/flatkpak breaking because it was mounting
something and immediately checking /proc/mounts to confirm it worked.  A
former team member tried fixing it a while ago, but we didn't follow up,
and I don't really love the way they did it.  I need to look into that.

> Or, it looks like the positive dentry case is solvable using d_add_ci().
> So maybe you are planning to do that?  It's not clear to me.

I want to use d_add_ci for the future, yes. It is not hard, but not
trivial, because there is an infinite recursion if d_add_ci uses
->d_compare() when doing the lookup for a duplicate.  We sent a patch to
fix d_add_ci a while ago, but it was rejected.  I need to revisit.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
@ 2023-07-20 17:35     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-20 17:35 UTC (permalink / raw)
  To: Eric Biggers
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4

Eric Biggers <ebiggers@kernel.org> writes:

>> Another problem exists when turning a negative dentry to positive.  If
>> the negative dentry has a different case than what is currently being
>> used for lookup, the dentry cannot be reused without changing its name,
>> in order to guarantee filename-preserving semantics to userspace.  We
>> need to either change the name or invalidate the dentry. This issue is
>> currently avoided in mainline, since the negative dentry mechanism is
>> disabled.
>
> Are you sure this problem even needs to be solved?

Yes, because we promise name-preserving semantics.  If we don't do it,
the name on the disk might be different than what was asked for, and tools
that rely on it (they exist) will break.  During initial testing, I've
had tools breaking with case-insensitive ext4 because they created a
file, did getdents and wanted to see exactly the name they used.

> It actually isn't specific to negative dentries.  If you have a file "foo"
> that's not in the dcache, and you open it (or look it up in any other way) as
> "FOO", then the positive dentry that gets created is named "FOO".
>
> As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
> file open is "FOO", not the true name "foo".  This is true even for processes
> that open it as "foo", as long as the dentry remains in the dcache.
>
> No negative dentries involved at all!

I totally agree it is goes beyond negative dentries, but this case is
particularly important because it is the only one (that I know of) where
the incorrect case might actually be written to the disk.  other cases,
like /proc/<pid>/fd/ can just display a different case to userspace,
which is confusing.  Still, the disk has the right version, exactly as
originally created.

I see the current /proc/$pid/fd/ semantics as a bug. In fact, I have/had
a bug report for bwrap/flatkpak breaking because it was mounting
something and immediately checking /proc/mounts to confirm it worked.  A
former team member tried fixing it a while ago, but we didn't follow up,
and I don't really love the way they did it.  I need to look into that.

> Or, it looks like the positive dentry case is solvable using d_add_ci().
> So maybe you are planning to do that?  It's not clear to me.

I want to use d_add_ci for the future, yes. It is not hard, but not
trivial, because there is an infinite recursion if d_add_ci uses
->d_compare() when doing the lookup for a duplicate.  We sent a patch to
fix d_add_ci a while ago, but it was rejected.  I need to revisit.

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
  2023-07-20 17:35     ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-21  3:12       ` Eric Biggers
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-21  3:12 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: viro, brauner, tytso, jaegeuk, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

On Thu, Jul 20, 2023 at 01:35:37PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> >> Another problem exists when turning a negative dentry to positive.  If
> >> the negative dentry has a different case than what is currently being
> >> used for lookup, the dentry cannot be reused without changing its name,
> >> in order to guarantee filename-preserving semantics to userspace.  We
> >> need to either change the name or invalidate the dentry. This issue is
> >> currently avoided in mainline, since the negative dentry mechanism is
> >> disabled.
> >
> > Are you sure this problem even needs to be solved?
> 
> Yes, because we promise name-preserving semantics.  If we don't do it,
> the name on the disk might be different than what was asked for, and tools
> that rely on it (they exist) will break.  During initial testing, I've
> had tools breaking with case-insensitive ext4 because they created a
> file, did getdents and wanted to see exactly the name they used.
> 
> > It actually isn't specific to negative dentries.  If you have a file "foo"
> > that's not in the dcache, and you open it (or look it up in any other way) as
> > "FOO", then the positive dentry that gets created is named "FOO".
> >
> > As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
> > file open is "FOO", not the true name "foo".  This is true even for processes
> > that open it as "foo", as long as the dentry remains in the dcache.
> >
> > No negative dentries involved at all!
> 
> I totally agree it is goes beyond negative dentries, but this case is
> particularly important because it is the only one (that I know of) where
> the incorrect case might actually be written to the disk.  other cases,
> like /proc/<pid>/fd/ can just display a different case to userspace,
> which is confusing.  Still, the disk has the right version, exactly as
> originally created.
> 
> I see the current /proc/$pid/fd/ semantics as a bug. In fact, I have/had
> a bug report for bwrap/flatkpak breaking because it was mounting
> something and immediately checking /proc/mounts to confirm it worked.  A
> former team member tried fixing it a while ago, but we didn't follow up,
> and I don't really love the way they did it.  I need to look into that.
> 
> > Or, it looks like the positive dentry case is solvable using d_add_ci().
> > So maybe you are planning to do that?  It's not clear to me.
> 
> I want to use d_add_ci for the future, yes. It is not hard, but not
> trivial, because there is an infinite recursion if d_add_ci uses
> ->d_compare() when doing the lookup for a duplicate.  We sent a patch to
> fix d_add_ci a while ago, but it was rejected.  I need to revisit.
> 

Thanks, I missed that choosing a different-case dentry actually changes the name
given to the new file.  This is because the filesystem is told about the name of
the file to create via the negative dentry that gets found/created -- not via
the original user-specified string.  It would help if you made this clear in a
code comment.  Taking the comment I suggested, I'd maybe revise it to:

			/*
			 * If the lookup is for creation, then a negative dentry
			 * can only be reused if it's a case-sensitive match,
                         * not just a case-insensitive one.  This is needed to
                         * make the new file be created with the name the user
                         * specified, preserving case.
			 *

- Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs
@ 2023-07-21  3:12       ` Eric Biggers
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-21  3:12 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4

On Thu, Jul 20, 2023 at 01:35:37PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> >> Another problem exists when turning a negative dentry to positive.  If
> >> the negative dentry has a different case than what is currently being
> >> used for lookup, the dentry cannot be reused without changing its name,
> >> in order to guarantee filename-preserving semantics to userspace.  We
> >> need to either change the name or invalidate the dentry. This issue is
> >> currently avoided in mainline, since the negative dentry mechanism is
> >> disabled.
> >
> > Are you sure this problem even needs to be solved?
> 
> Yes, because we promise name-preserving semantics.  If we don't do it,
> the name on the disk might be different than what was asked for, and tools
> that rely on it (they exist) will break.  During initial testing, I've
> had tools breaking with case-insensitive ext4 because they created a
> file, did getdents and wanted to see exactly the name they used.
> 
> > It actually isn't specific to negative dentries.  If you have a file "foo"
> > that's not in the dcache, and you open it (or look it up in any other way) as
> > "FOO", then the positive dentry that gets created is named "FOO".
> >
> > As a result, the name that shows up in /proc/$pid/fd/ for anyone who has the
> > file open is "FOO", not the true name "foo".  This is true even for processes
> > that open it as "foo", as long as the dentry remains in the dcache.
> >
> > No negative dentries involved at all!
> 
> I totally agree it is goes beyond negative dentries, but this case is
> particularly important because it is the only one (that I know of) where
> the incorrect case might actually be written to the disk.  other cases,
> like /proc/<pid>/fd/ can just display a different case to userspace,
> which is confusing.  Still, the disk has the right version, exactly as
> originally created.
> 
> I see the current /proc/$pid/fd/ semantics as a bug. In fact, I have/had
> a bug report for bwrap/flatkpak breaking because it was mounting
> something and immediately checking /proc/mounts to confirm it worked.  A
> former team member tried fixing it a while ago, but we didn't follow up,
> and I don't really love the way they did it.  I need to look into that.
> 
> > Or, it looks like the positive dentry case is solvable using d_add_ci().
> > So maybe you are planning to do that?  It's not clear to me.
> 
> I want to use d_add_ci for the future, yes. It is not hard, but not
> trivial, because there is an infinite recursion if d_add_ci uses
> ->d_compare() when doing the lookup for a duplicate.  We sent a patch to
> fix d_add_ci a while ago, but it was rejected.  I need to revisit.
> 

Thanks, I missed that choosing a different-case dentry actually changes the name
given to the new file.  This is because the filesystem is told about the name of
the file to create via the negative dentry that gets found/created -- not via
the original user-specified string.  It would help if you made this clear in a
code comment.  Taking the comment I suggested, I'd maybe revise it to:

			/*
			 * If the lookup is for creation, then a negative dentry
			 * can only be reused if it's a case-sensitive match,
                         * not just a case-insensitive one.  This is needed to
                         * make the new file be created with the name the user
                         * specified, preserving case.
			 *

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-20  6:41       ` Eric Biggers
@ 2023-07-21 20:16         ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-21 20:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

Eric Biggers <ebiggers@kernel.org> writes:

> On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> 
>> I'm also having trouble understanding exactly when ->d_name is stable here.
>> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> if it finds a second link to a directory, it just moves the whole dentry tree to
>> the new location.  This can happen if a filesystem image is corrupted and
>> contains directory hard links.  Coincidentally, it can also happen in an
>> encrypted directory due to the no-key name => normal name transition...
>
> Sorry, I think I got this slightly wrong.  The move does happen with the
> parent's ->i_rwsem held, but it's for read, not for write.  First, before
> ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
> ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
> __d_move().  If the old alias is in a different directory (which cannot happen
> in that fscrypt case, but can happen in the general "directory hard links"
> case), __d_unalias() takes that directory's ->i_rwsem for read too.
>
> So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> it's just a bit more subtle than it first appears.  Though, some of your
> explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
> either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
> there is still a problem.

I think I'm missing something on your clarification. I see your point
about __d_unalias, and I see in the case where alias->d_parent !=
dentry->d_parent we acquire the parent inode read lock:

static int __d_unalias(struct inode *inode,
		struct dentry *dentry, struct dentry *alias)
{
...
	m1 = &dentry->d_sb->s_vfs_rename_mutex;
	if (!inode_trylock_shared(alias->d_parent->d_inode))
		goto out_err;
}

And it seems to use that for __d_move. In this case, __d_move changes
from under us even with a read lock, which is dangerous.  I think I
agree with your first email more than the clarification.

In the lookup_slow then:

lookup_slow()
  d_lookup()
    d_splice_alias()
      __d_unalias()
        __d_move()

this __d_move Can do a dentry move and race with d_revalidate even
though it has the parent read lock.

> So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> dentries, but only if it's taken for *write*.  So I guess you can rely on that;

We can get away of it with acquiring the d_lock as you suggested, I
think.  But can you clarify the above? I wanna make sure I didn't miss
anything. I am indeed relying only on the read lock here, as you can see.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-21 20:16         ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-21 20:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

Eric Biggers <ebiggers@kernel.org> writes:

> On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> 
>> I'm also having trouble understanding exactly when ->d_name is stable here.
>> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> if it finds a second link to a directory, it just moves the whole dentry tree to
>> the new location.  This can happen if a filesystem image is corrupted and
>> contains directory hard links.  Coincidentally, it can also happen in an
>> encrypted directory due to the no-key name => normal name transition...
>
> Sorry, I think I got this slightly wrong.  The move does happen with the
> parent's ->i_rwsem held, but it's for read, not for write.  First, before
> ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
> ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
> __d_move().  If the old alias is in a different directory (which cannot happen
> in that fscrypt case, but can happen in the general "directory hard links"
> case), __d_unalias() takes that directory's ->i_rwsem for read too.
>
> So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> it's just a bit more subtle than it first appears.  Though, some of your
> explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
> either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
> there is still a problem.

I think I'm missing something on your clarification. I see your point
about __d_unalias, and I see in the case where alias->d_parent !=
dentry->d_parent we acquire the parent inode read lock:

static int __d_unalias(struct inode *inode,
		struct dentry *dentry, struct dentry *alias)
{
...
	m1 = &dentry->d_sb->s_vfs_rename_mutex;
	if (!inode_trylock_shared(alias->d_parent->d_inode))
		goto out_err;
}

And it seems to use that for __d_move. In this case, __d_move changes
from under us even with a read lock, which is dangerous.  I think I
agree with your first email more than the clarification.

In the lookup_slow then:

lookup_slow()
  d_lookup()
    d_splice_alias()
      __d_unalias()
        __d_move()

this __d_move Can do a dentry move and race with d_revalidate even
though it has the parent read lock.

> So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> dentries, but only if it's taken for *write*.  So I guess you can rely on that;

We can get away of it with acquiring the d_lock as you suggested, I
think.  But can you clarify the above? I wanna make sure I didn't miss
anything. I am indeed relying only on the read lock here, as you can see.

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-21 20:16         ` [f2fs-dev] " Gabriel Krisman Bertazi
@ 2023-07-22  4:29           ` Eric Biggers
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-22  4:29 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
> >> 
> >> I'm also having trouble understanding exactly when ->d_name is stable here.
> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
> >> if it finds a second link to a directory, it just moves the whole dentry tree to
> >> the new location.  This can happen if a filesystem image is corrupted and
> >> contains directory hard links.  Coincidentally, it can also happen in an
> >> encrypted directory due to the no-key name => normal name transition...
> >
> > Sorry, I think I got this slightly wrong.  The move does happen with the
> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
> > __d_move().  If the old alias is in a different directory (which cannot happen
> > in that fscrypt case, but can happen in the general "directory hard links"
> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
> >
> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> > it's just a bit more subtle than it first appears.  Though, some of your
> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
> > there is still a problem.
> 
> I think I'm missing something on your clarification. I see your point
> about __d_unalias, and I see in the case where alias->d_parent !=
> dentry->d_parent we acquire the parent inode read lock:
> 
> static int __d_unalias(struct inode *inode,
> 		struct dentry *dentry, struct dentry *alias)
> {
> ...
> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
> 		goto out_err;
> }
> 
> And it seems to use that for __d_move. In this case, __d_move changes
> from under us even with a read lock, which is dangerous.  I think I
> agree with your first email more than the clarification.
> 
> In the lookup_slow then:
> 
> lookup_slow()
>   d_lookup()
>     d_splice_alias()
>       __d_unalias()
>         __d_move()
> 
> this __d_move Can do a dentry move and race with d_revalidate even
> though it has the parent read lock.
> 
> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> 
> We can get away of it with acquiring the d_lock as you suggested, I
> think.  But can you clarify the above? I wanna make sure I didn't miss
> anything. I am indeed relying only on the read lock here, as you can see.

In my first email I thought that __d_move() can be called without the parent
inode's i_rwsem held at all.  In my second email I realized that it is always
called with at least a read (shared) lock.

The question is what kind of parent i_rwsem lock is guaranteed to be held by the
caller of ->d_revalidate() when the name comparison is done.  Based on the
above, it needs to be at least a write (exclusive) lock to exclude __d_move()
without taking d_lock.  However, your analysis (in the commit message of "libfs:
Validate negative dentries in case-insensitive directories") only talks about
i_rwsem being "taken", without saying whether it's for read or write.  One thing
you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
That seems like a problem, as it makes your analysis not correct.

- Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-22  4:29           ` Eric Biggers
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Biggers @ 2023-07-22  4:29 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
> >> 
> >> I'm also having trouble understanding exactly when ->d_name is stable here.
> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
> >> if it finds a second link to a directory, it just moves the whole dentry tree to
> >> the new location.  This can happen if a filesystem image is corrupted and
> >> contains directory hard links.  Coincidentally, it can also happen in an
> >> encrypted directory due to the no-key name => normal name transition...
> >
> > Sorry, I think I got this slightly wrong.  The move does happen with the
> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
> > __d_move().  If the old alias is in a different directory (which cannot happen
> > in that fscrypt case, but can happen in the general "directory hard links"
> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
> >
> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> > it's just a bit more subtle than it first appears.  Though, some of your
> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
> > there is still a problem.
> 
> I think I'm missing something on your clarification. I see your point
> about __d_unalias, and I see in the case where alias->d_parent !=
> dentry->d_parent we acquire the parent inode read lock:
> 
> static int __d_unalias(struct inode *inode,
> 		struct dentry *dentry, struct dentry *alias)
> {
> ...
> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
> 		goto out_err;
> }
> 
> And it seems to use that for __d_move. In this case, __d_move changes
> from under us even with a read lock, which is dangerous.  I think I
> agree with your first email more than the clarification.
> 
> In the lookup_slow then:
> 
> lookup_slow()
>   d_lookup()
>     d_splice_alias()
>       __d_unalias()
>         __d_move()
> 
> this __d_move Can do a dentry move and race with d_revalidate even
> though it has the parent read lock.
> 
> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
> 
> We can get away of it with acquiring the d_lock as you suggested, I
> think.  But can you clarify the above? I wanna make sure I didn't miss
> anything. I am indeed relying only on the read lock here, as you can see.

In my first email I thought that __d_move() can be called without the parent
inode's i_rwsem held at all.  In my second email I realized that it is always
called with at least a read (shared) lock.

The question is what kind of parent i_rwsem lock is guaranteed to be held by the
caller of ->d_revalidate() when the name comparison is done.  Based on the
above, it needs to be at least a write (exclusive) lock to exclude __d_move()
without taking d_lock.  However, your analysis (in the commit message of "libfs:
Validate negative dentries in case-insensitive directories") only talks about
i_rwsem being "taken", without saying whether it's for read or write.  One thing
you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
That seems like a problem, as it makes your analysis not correct.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
  2023-07-22  4:29           ` [f2fs-dev] " Eric Biggers
@ 2023-07-24 21:33             ` Gabriel Krisman Bertazi
  -1 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-24 21:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> >> 
>> >> I'm also having trouble understanding exactly when ->d_name is stable here.
>> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> >> if it finds a second link to a directory, it just moves the whole dentry tree to
>> >> the new location.  This can happen if a filesystem image is corrupted and
>> >> contains directory hard links.  Coincidentally, it can also happen in an
>> >> encrypted directory due to the no-key name => normal name transition...
>> >
>> > Sorry, I think I got this slightly wrong.  The move does happen with the
>> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
>> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
>> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
>> > __d_move().  If the old alias is in a different directory (which cannot happen
>> > in that fscrypt case, but can happen in the general "directory hard links"
>> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
>> >
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> > it's just a bit more subtle than it first appears.  Though, some of your
>> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
>> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
>> > there is still a problem.
>> 
>> I think I'm missing something on your clarification. I see your point
>> about __d_unalias, and I see in the case where alias->d_parent !=
>> dentry->d_parent we acquire the parent inode read lock:
>> 
>> static int __d_unalias(struct inode *inode,
>> 		struct dentry *dentry, struct dentry *alias)
>> {
>> ...
>> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
>> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
>> 		goto out_err;
>> }

>> this __d_move Can do a dentry move and race with d_revalidate even
>> though it has the parent read lock.
>> 
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> 
>> We can get away of it with acquiring the d_lock as you suggested, I
>> think.  But can you clarify the above? I wanna make sure I didn't miss
>> anything. I am indeed relying only on the read lock here, as you can see.
>
> In my first email I thought that __d_move() can be called without the parent
> inode's i_rwsem held at all.  In my second email I realized that it is always
> called with at least a read (shared) lock.

I see. Thank you.  We are on the same page now.   I was confused by
this part of your second comment:

>> > I guess you can rely on that; it's just a bit more subtle than it
>> > first appears.  Though, some of your explanation seems to assume
>> > that a read lock is sufficient ("In __lookup_slow, either the
>> > parent inode is locked by the caller (lookup_slow) ..."),

...because I was then failing to see, after learning about the __d_move
case, how I could rely on the inode read lock.  But, as I now realize,
__d_move is not called for negative dentries, so lookup_slow is indeed
safe.

> The question is what kind of parent i_rwsem lock is guaranteed to be held by the
> caller of ->d_revalidate() when the name comparison is done.  Based on the
> above, it needs to be at least a write (exclusive) lock to exclude __d_move()
> without taking d_lock.  However, your analysis (in the commit message of "libfs:
> Validate negative dentries in case-insensitive directories") only talks about
> i_rwsem being "taken", without saying whether it's for read or write.  One thing
> you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
> That seems like a problem, as it makes your analysis not correct.

My understanding and explanation was that a read lock should be enough
at all times, despite the __d_move case.  Any time d_revalidate is
called for a (LOOKUP_CREATE | LOOKUP_RENAME_TARGET), it holds at least
the read lock, preventing concurrent changes to d_name of negative
dentries.

I will review the places that touch ->d_name again and I will keep the
patch as-is and update my explanation to include this case.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
@ 2023-07-24 21:33             ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 32+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-24 21:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: brauner, tytso, linux-f2fs-devel, viro, linux-fsdevel, jaegeuk,
	linux-ext4, Gabriel Krisman Bertazi

Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> >> 
>> >> I'm also having trouble understanding exactly when ->d_name is stable here.
>> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> >> if it finds a second link to a directory, it just moves the whole dentry tree to
>> >> the new location.  This can happen if a filesystem image is corrupted and
>> >> contains directory hard links.  Coincidentally, it can also happen in an
>> >> encrypted directory due to the no-key name => normal name transition...
>> >
>> > Sorry, I think I got this slightly wrong.  The move does happen with the
>> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
>> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
>> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
>> > __d_move().  If the old alias is in a different directory (which cannot happen
>> > in that fscrypt case, but can happen in the general "directory hard links"
>> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
>> >
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> > it's just a bit more subtle than it first appears.  Though, some of your
>> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
>> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
>> > there is still a problem.
>> 
>> I think I'm missing something on your clarification. I see your point
>> about __d_unalias, and I see in the case where alias->d_parent !=
>> dentry->d_parent we acquire the parent inode read lock:
>> 
>> static int __d_unalias(struct inode *inode,
>> 		struct dentry *dentry, struct dentry *alias)
>> {
>> ...
>> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
>> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
>> 		goto out_err;
>> }

>> this __d_move Can do a dentry move and race with d_revalidate even
>> though it has the parent read lock.
>> 
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> 
>> We can get away of it with acquiring the d_lock as you suggested, I
>> think.  But can you clarify the above? I wanna make sure I didn't miss
>> anything. I am indeed relying only on the read lock here, as you can see.
>
> In my first email I thought that __d_move() can be called without the parent
> inode's i_rwsem held at all.  In my second email I realized that it is always
> called with at least a read (shared) lock.

I see. Thank you.  We are on the same page now.   I was confused by
this part of your second comment:

>> > I guess you can rely on that; it's just a bit more subtle than it
>> > first appears.  Though, some of your explanation seems to assume
>> > that a read lock is sufficient ("In __lookup_slow, either the
>> > parent inode is locked by the caller (lookup_slow) ..."),

...because I was then failing to see, after learning about the __d_move
case, how I could rely on the inode read lock.  But, as I now realize,
__d_move is not called for negative dentries, so lookup_slow is indeed
safe.

> The question is what kind of parent i_rwsem lock is guaranteed to be held by the
> caller of ->d_revalidate() when the name comparison is done.  Based on the
> above, it needs to be at least a write (exclusive) lock to exclude __d_move()
> without taking d_lock.  However, your analysis (in the commit message of "libfs:
> Validate negative dentries in case-insensitive directories") only talks about
> i_rwsem being "taken", without saying whether it's for read or write.  One thing
> you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
> That seems like a problem, as it makes your analysis not correct.

My understanding and explanation was that a read lock should be enough
at all times, despite the __d_move case.  Any time d_revalidate is
called for a (LOOKUP_CREATE | LOOKUP_RENAME_TARGET), it holds at least
the read lock, preventing concurrent changes to d_name of negative
dentries.

I will review the places that touch ->d_name again and I will keep the
patch as-is and update my explanation to include this case.

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-07-24 21:34 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-19 22:19 [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs Gabriel Krisman Bertazi
2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 1/7] fs: Expose name under lookup to d_revalidate hook Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 2/7] fs: Add DCACHE_CASEFOLDED_NAME flag Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-20  6:06   ` Eric Biggers
2023-07-20  6:06     ` [f2fs-dev] " Eric Biggers
2023-07-20  6:41     ` Eric Biggers
2023-07-20  6:41       ` Eric Biggers
2023-07-21 20:16       ` Gabriel Krisman Bertazi
2023-07-21 20:16         ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-22  4:29         ` Eric Biggers
2023-07-22  4:29           ` [f2fs-dev] " Eric Biggers
2023-07-24 21:33           ` Gabriel Krisman Bertazi
2023-07-24 21:33             ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 4/7] libfs: Chain encryption checks after case-insensitive revalidation Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 6/7] ext4: Enable negative dentries on case-insensitive lookup Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 7/7] f2fs: " Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-20  7:43 ` [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs Eric Biggers
2023-07-20  7:43   ` [f2fs-dev] " Eric Biggers
2023-07-20 17:35   ` Gabriel Krisman Bertazi
2023-07-20 17:35     ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-21  3:12     ` Eric Biggers
2023-07-21  3:12       ` [f2fs-dev] " Eric Biggers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.