linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] fs: rename S_KERNEL_FILE
@ 2022-01-28  7:47 Christoph Hellwig
  2022-01-28  9:01 ` Chaitanya Kulkarni
  2022-01-28 10:12 ` David Howells
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-01-28  7:47 UTC (permalink / raw)
  To: torvalds, viro, dhowells; +Cc: linux-fsdevel, linux-kernel, Amir Goldstein

S_KERNEL_FILE is grossly misnamed.  We have plenty of files hold open by
the kernel kernel using filp_open.  This flag OTOH signals the inode as
being a special snowflake that cachefiles holds onto that can't be
unlinked because of ..., erm, pixie dust.  So clearly mark it as such.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---

Changes since v1:
 - fix commit message typos

 fs/cachefiles/namei.c | 12 ++++++------
 fs/namei.c            |  2 +-
 include/linux/fs.h    |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 9bd692870617c..599dc13a7d9ab 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -20,8 +20,8 @@ static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
 	struct inode *inode = d_backing_inode(dentry);
 	bool can_use = false;
 
-	if (!(inode->i_flags & S_KERNEL_FILE)) {
-		inode->i_flags |= S_KERNEL_FILE;
+	if (!(inode->i_flags & S_CACHEFILE)) {
+		inode->i_flags |= S_CACHEFILE;
 		trace_cachefiles_mark_active(object, inode);
 		can_use = true;
 	} else {
@@ -51,7 +51,7 @@ static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
 {
 	struct inode *inode = d_backing_inode(dentry);
 
-	inode->i_flags &= ~S_KERNEL_FILE;
+	inode->i_flags &= ~S_CACHEFILE;
 	trace_cachefiles_mark_inactive(object, inode);
 }
 
@@ -742,7 +742,7 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
 		goto lookup_error;
 	if (d_is_negative(victim))
 		goto lookup_put;
-	if (d_inode(victim)->i_flags & S_KERNEL_FILE)
+	if (d_inode(victim)->i_flags & S_CACHEFILE)
 		goto lookup_busy;
 	return victim;
 
@@ -789,11 +789,11 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	/* check to see if someone is using this object */
 	inode = d_inode(victim);
 	inode_lock(inode);
-	if (inode->i_flags & S_KERNEL_FILE) {
+	if (inode->i_flags & S_CACHEFILE) {
 		ret = -EBUSY;
 	} else {
 		/* Stop the cache from picking it back up */
-		inode->i_flags |= S_KERNEL_FILE;
+		inode->i_flags |= S_CACHEFILE;
 		ret = 0;
 	}
 	inode_unlock(inode);
diff --git a/fs/namei.c b/fs/namei.c
index d81f04f8d8188..7402277ecc1f5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3959,7 +3959,7 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
 
 	error = -EBUSY;
 	if (is_local_mountpoint(dentry) ||
-	    (dentry->d_inode->i_flags & S_KERNEL_FILE))
+	    (dentry->d_inode->i_flags & S_CACHEFILE))
 		goto out;
 
 	error = security_inode_rmdir(dir, dentry);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8510da6cc6dc..099d7e03d46e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2173,7 +2173,7 @@ struct super_operations {
 #define S_ENCRYPTED	(1 << 14) /* Encrypted file (using fs/crypto/) */
 #define S_CASEFOLD	(1 << 15) /* Casefolded file */
 #define S_VERITY	(1 << 16) /* Verity file (using fs/verity/) */
-#define S_KERNEL_FILE	(1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
+#define S_CACHEFILE	(1 << 17) /* In use as cachefile, can't be unlinked */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28  7:47 [PATCH v2] fs: rename S_KERNEL_FILE Christoph Hellwig
@ 2022-01-28  9:01 ` Chaitanya Kulkarni
  2022-01-28 10:12 ` David Howells
  1 sibling, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2022-01-28  9:01 UTC (permalink / raw)
  To: Christoph Hellwig, torvalds, viro, dhowells
  Cc: linux-fsdevel, linux-kernel, Amir Goldstein

On 1/27/22 11:47 PM, Christoph Hellwig wrote:
> S_KERNEL_FILE is grossly misnamed.  We have plenty of files hold open by
> the kernel kernel using filp_open.  This flag OTOH signals the inode as
> being a special snowflake that cachefiles holds onto that can't be
> unlinked because of ..., erm, pixie dust.  So clearly mark it as such.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28  7:47 [PATCH v2] fs: rename S_KERNEL_FILE Christoph Hellwig
  2022-01-28  9:01 ` Chaitanya Kulkarni
@ 2022-01-28 10:12 ` David Howells
  2022-01-28 11:09   ` Amir Goldstein
  2022-01-28 11:35   ` David Howells
  1 sibling, 2 replies; 7+ messages in thread
From: David Howells @ 2022-01-28 10:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, torvalds, viro, linux-fsdevel, linux-kernel,
	Amir Goldstein, Chaitanya Kulkarni

Christoph Hellwig <hch@lst.de> wrote:

> S_KERNEL_FILE is grossly misnamed.  We have plenty of files hold

"held".

> open by the kernel kernel using filp_open.

You said "kernel" twice.

And so what?  Cachefiles holds files open by filp_open - but it can only do so
temporarily otherwise EMFILE/ENFILE and OOMs start to become a serious problem
because it could end up holding thousands of files open (one or two of the
xfstests cause this to happen).

Further, holding the file open *doesn't* prevent cachefilesd from trying to
cull the file to make space whilst it's "in use".

Yet further, I'm not holding the directories that form the cache layout open
with filp_open at all - I'm not reading them, so that would just be a waste of
resources - but I really don't want cachefilesd culling them because it sees
they're empty but cachefiles is holding them ready.

> This flag OTOH signals the inode as being a special snowflake that
> cachefiles holds onto that can't be unlinked because of ..., erm, pixie
> dust.

Really?  I presume you read the explanation I gave of the races that are a
consequence of splitting the driver between the kernel and userspace?  I could
avoid them - or at least mitigate them - by keeping my own list of all the
inodes in use by cachefiles so that cachefilesd can query it.  I did, in fact,
use to have such a list, but the core kernel already has such lists and the
facilities to translate pathnames into internal objects, so my stuff ought to
be redundant - all I need is one inode flag.

Further, that inode flag can be shared with anyone else who wants to do
something similar.  It's just an "I'm using this" lock.  There's no particular
reason to limit its use to cachefiles.  In fact, it is better if it is then
shared so that in the unlikely event that two drivers try to use the same
file, an error will occur.

I do use it to defend cachefiles against itself also.  In the event that
there's a bug or a race and it tries to reuse its own cache - particularly
with something like NFS that can have multiple superblocks for the same
source, just with different I/O parameters, for example - this can lead to
data corruption.  I try to defend against it in fscache also, but there can be
delayed effects due to object finalisation being done in the background after
fscache has returned to the netfs.

Now, I do think there's an argument to be made for splitting the flag into
two, as I advanced in a proposed patch.  One piece would just be an "I'm using
this", the other would be a "don't delete this" flag.  Both are of potential
use to other drivers.

I take it you'd prefer this to be done a different way?

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28 10:12 ` David Howells
@ 2022-01-28 11:09   ` Amir Goldstein
  2022-01-28 11:35   ` David Howells
  1 sibling, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2022-01-28 11:09 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, Linus Torvalds, Al Viro, linux-fsdevel,
	linux-kernel, Chaitanya Kulkarni, Miklos Szeredi

On Fri, Jan 28, 2022 at 12:12 PM David Howells <dhowells@redhat.com> wrote:
>
> Christoph Hellwig <hch@lst.de> wrote:
>
> > S_KERNEL_FILE is grossly misnamed.  We have plenty of files hold
>
> "held".
>
> > open by the kernel kernel using filp_open.
>
> You said "kernel" twice.
>
> And so what?  Cachefiles holds files open by filp_open - but it can only do so
> temporarily otherwise EMFILE/ENFILE and OOMs start to become a serious problem
> because it could end up holding thousands of files open (one or two of the
> xfstests cause this to happen).
>
> Further, holding the file open *doesn't* prevent cachefilesd from trying to
> cull the file to make space whilst it's "in use".
>
> Yet further, I'm not holding the directories that form the cache layout open
> with filp_open at all - I'm not reading them, so that would just be a waste of
> resources - but I really don't want cachefilesd culling them because it sees
> they're empty but cachefiles is holding them ready.
>
> > This flag OTOH signals the inode as being a special snowflake that
> > cachefiles holds onto that can't be unlinked because of ..., erm, pixie
> > dust.
>
> Really?  I presume you read the explanation I gave of the races that are a
> consequence of splitting the driver between the kernel and userspace?  I could
> avoid them - or at least mitigate them - by keeping my own list of all the
> inodes in use by cachefiles so that cachefilesd can query it.  I did, in fact,
> use to have such a list, but the core kernel already has such lists and the
> facilities to translate pathnames into internal objects, so my stuff ought to
> be redundant - all I need is one inode flag.
>
> Further, that inode flag can be shared with anyone else who wants to do
> something similar.  It's just an "I'm using this" lock.  There's no particular
> reason to limit its use to cachefiles.  In fact, it is better if it is then
> shared so that in the unlikely event that two drivers try to use the same
> file, an error will occur.
>

Good idea, but then the helpers to set the flag should not be internal
to cachefiles and the locking semantics should be clear.
FYI, overlayfs already takes an "exclusive lock" on upper/work dir
among all ovl instances.

How do you feel about hoisting ovl_inuse_* helpers to fs.h
and rename s/I_OVL_INUSE/I_EXCL_INUSE?

Whether deny rmdir should have its own flag or not I don't know,
but from ovl POV I *think* it should not be a problem to deny rmdir
for the ovl upper/work dirs as long as ovl is mounted(?).

From our experience, adding the exclusive lock caused regressions
in setups with container runtimes that had mount leaks bugs.
I am hoping that all those mount leaks bugs were fixed, but one never
knows what sort of regressions deny rmdir of upper/work may cause.

Another problem with generic deny of rmdir is that users getting
EBUSY have no way to figure out the reason.
At least for a specific subsystem (i.e. cachefiles) users should be able
to check if the denied dir is in the subsystem's inventory(?)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28 10:12 ` David Howells
  2022-01-28 11:09   ` Amir Goldstein
@ 2022-01-28 11:35   ` David Howells
  2022-01-28 13:17     ` Al Viro
  1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2022-01-28 11:35 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: dhowells, Christoph Hellwig, Linus Torvalds, Al Viro,
	linux-fsdevel, linux-kernel, Chaitanya Kulkarni, Miklos Szeredi

Amir Goldstein <amir73il@gmail.com> wrote:

> Good idea, but then the helpers to set the flag should not be internal
> to cachefiles and the locking semantics should be clear.

I could move them out, at least partially.  They do log some information
that's private to cachefiles through the tracepoint, but it's just one number
and could be passed in as a parameter.  I could move the tracepoint to
somewhere more generic.

> FYI, overlayfs already takes an "exclusive lock" on upper/work dir
> among all ovl instances.
> 
> How do you feel about hoisting ovl_inuse_* helpers to fs.h
> and rename s/I_OVL_INUSE/I_EXCL_INUSE?

Fine by me.  Sharing a cache with or through an overlay would make for very
fun coherency management.

> Whether deny rmdir should have its own flag or not I don't know,
> but from ovl POV I *think* it should not be a problem to deny rmdir
> for the ovl upper/work dirs as long as ovl is mounted(?).

What's the consequence of someone rearranging the directories directly in the
contributing dirs whilst there's an overlay over them?

> Another problem with generic deny of rmdir is that users getting
> EBUSY have no way to figure out the reason.
> At least for a specific subsystem (i.e. cachefiles) users should be able
> to check if the denied dir is in the subsystem's inventory(?)

I could add a tracepoint for that.  It could form a set with the cachefiles
tracepoints if I move those out too.

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28 11:35   ` David Howells
@ 2022-01-28 13:17     ` Al Viro
  2022-01-28 15:24       ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-01-28 13:17 UTC (permalink / raw)
  To: David Howells
  Cc: Amir Goldstein, Christoph Hellwig, Linus Torvalds, linux-fsdevel,
	linux-kernel, Chaitanya Kulkarni, Miklos Szeredi

On Fri, Jan 28, 2022 at 11:35:59AM +0000, David Howells wrote:

> > Whether deny rmdir should have its own flag or not I don't know,
> > but from ovl POV I *think* it should not be a problem to deny rmdir
> > for the ovl upper/work dirs as long as ovl is mounted(?).
> 
> What's the consequence of someone rearranging the directories directly in the
> contributing dirs whilst there's an overlay over them?

"Don't do it, then - presumably the kernel won't panic, but don't expect it to
try and invent nice semantics for the crap you are trying to pull"

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs: rename S_KERNEL_FILE
  2022-01-28 13:17     ` Al Viro
@ 2022-01-28 15:24       ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2022-01-28 15:24 UTC (permalink / raw)
  To: Al Viro
  Cc: David Howells, Christoph Hellwig, Linus Torvalds, linux-fsdevel,
	linux-kernel, Chaitanya Kulkarni, Miklos Szeredi

On Fri, Jan 28, 2022 at 3:17 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Jan 28, 2022 at 11:35:59AM +0000, David Howells wrote:
>
> > > Whether deny rmdir should have its own flag or not I don't know,
> > > but from ovl POV I *think* it should not be a problem to deny rmdir
> > > for the ovl upper/work dirs as long as ovl is mounted(?).
> >
> > What's the consequence of someone rearranging the directories directly in the
> > contributing dirs whilst there's an overlay over them?
>
> "Don't do it, then - presumably the kernel won't panic, but don't expect it to
> try and invent nice semantics for the crap you are trying to pull"

IIUC, I think that is the point Dave was trying to make.
Nothing good can come out of allowing users to manipulate the overlay
upper/work dirs, so denying rmdir on those dirs that are already marked
with the OVL_INUSE flag is probably not a bad idea anyway, so ovl
and cachefiles could potentially use the same flag with same semantics.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-28 15:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28  7:47 [PATCH v2] fs: rename S_KERNEL_FILE Christoph Hellwig
2022-01-28  9:01 ` Chaitanya Kulkarni
2022-01-28 10:12 ` David Howells
2022-01-28 11:09   ` Amir Goldstein
2022-01-28 11:35   ` David Howells
2022-01-28 13:17     ` Al Viro
2022-01-28 15:24       ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).