All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
@ 2023-06-15 11:38 Jan Kara
  2023-06-15 12:53 ` Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jan Kara @ 2023-06-15 11:38 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Al Viro, linux-fsdevel, Ted Tso, David Howells, Jan Kara

The reconfigure / remount code takes a lot of effort to protect
filesystem's reconfiguration code from racing writes on remounting
read-only. However during remounting read-only filesystem to read-write
mode userspace writes can start immediately once we clear SB_RDONLY
flag. This is inconvenient for example for ext4 because we need to do
some writes to the filesystem (such as preparation of quota files)
before we can take userspace writes so we are clearing SB_RDONLY flag
before we are fully ready to accept userpace writes and syzbot has found
a way to exploit this [1]. Also as far as I'm reading the code
the filesystem remount code was protected from racing writes in the
legacy mount path by the mount's MNT_READONLY flag so this is relatively
new problem. It is actually fairly easy to protect remount read-write
from racing writes using sb->s_readonly_remount flag so let's just do
that instead of having to workaround these races in the filesystem code.

[1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2b..6cd64961aa07 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -903,6 +903,7 @@ int reconfigure_super(struct fs_context *fc)
 	struct super_block *sb = fc->root->d_sb;
 	int retval;
 	bool remount_ro = false;
+	bool remount_rw = false;
 	bool force = fc->sb_flags & SB_FORCE;
 
 	if (fc->sb_flags_mask & ~MS_RMT_MASK)
@@ -920,7 +921,7 @@ int reconfigure_super(struct fs_context *fc)
 		    bdev_read_only(sb->s_bdev))
 			return -EACCES;
 #endif
-
+		remount_rw = !(fc->sb_flags & SB_RDONLY) && sb_rdonly(sb);
 		remount_ro = (fc->sb_flags & SB_RDONLY) && !sb_rdonly(sb);
 	}
 
@@ -950,6 +951,14 @@ int reconfigure_super(struct fs_context *fc)
 			if (retval)
 				return retval;
 		}
+	} else if (remount_rw) {
+		/*
+		 * We set s_readonly_remount here to protect filesystem's
+		 * reconfigure code from writes from userspace until
+		 * reconfigure finishes.
+		 */
+		sb->s_readonly_remount = 1;
+		smp_wmb();
 	}
 
 	if (fc->ops->reconfigure) {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 11:38 [PATCH] fs: Protect reconfiguration of sb read-write from racing writes Jan Kara
@ 2023-06-15 12:53 ` Christian Brauner
  2023-06-15 14:10   ` Theodore Ts'o
  2023-06-15 15:01 ` Christian Brauner
  2023-06-15 22:36 ` Dave Chinner
  2 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2023-06-15 12:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, Ted Tso, David Howells

On Thu, Jun 15, 2023 at 01:38:48PM +0200, Jan Kara wrote:
> The reconfigure / remount code takes a lot of effort to protect
> filesystem's reconfiguration code from racing writes on remounting
> read-only. However during remounting read-only filesystem to read-write
> mode userspace writes can start immediately once we clear SB_RDONLY
> flag. This is inconvenient for example for ext4 because we need to do
> some writes to the filesystem (such as preparation of quota files)
> before we can take userspace writes so we are clearing SB_RDONLY flag
> before we are fully ready to accept userpace writes and syzbot has found
> a way to exploit this [1]. Also as far as I'm reading the code
> the filesystem remount code was protected from racing writes in the
> legacy mount path by the mount's MNT_READONLY flag so this is relatively
> new problem. It is actually fairly easy to protect remount read-write
> from racing writes using sb->s_readonly_remount flag so let's just do
> that instead of having to workaround these races in the filesystem code.
> 
> [1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

So looking at the ext4 code this can only happen when you clear
SB_RDONLY in ->reconfigure() too early (and the mount isn't
MNT_READONLY). Afaict, this was fixed in:

a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")

by clearing SB_RDONLY late, right before returning from ->reconfigure()
when everything's ready. So your change is not about fixing that bug in
[1] it's about making the vfs give the guarantee that an fs is free to
clear SB_RDONLY because any ro<->rw transitions are protected via
s_readonly_remount. Correct? It seems ok to me just making sure.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 12:53 ` Christian Brauner
@ 2023-06-15 14:10   ` Theodore Ts'o
  2023-06-15 14:48     ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Ts'o @ 2023-06-15 14:10 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Al Viro, linux-fsdevel, David Howells

On Thu, Jun 15, 2023 at 02:53:53PM +0200, Christian Brauner wrote:
> 
> So looking at the ext4 code this can only happen when you clear
> SB_RDONLY in ->reconfigure() too early (and the mount isn't
> MNT_READONLY). Afaict, this was fixed in:
> 
> a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
> 
> by clearing SB_RDONLY late, right before returning from ->reconfigure()
> when everything's ready. So your change is not about fixing that bug in
> [1] it's about making the vfs give the guarantee that an fs is free to
> clear SB_RDONLY because any ro<->rw transitions are protected via
> s_readonly_remount. Correct? It seems ok to me just making sure.

Unfortunately we had to revert that commit because that broke
r/o->r/w writes when quota was enabled.  The problem is we need a way
of enabling file system writes for internal purposes (e.g., because
quota needs to set up quota inodes) but *not* allow userspace file
system writes to occur until we are fully done with the remount process.

See the discussion here:

	https://lore.kernel.org/all/20230608044056.GA1418535@mit.edu/

The problem with the current state of the tree is commit dea9d8f7643f
("ext4: only check dquot_initialize_needed() when debugging") has
caught real bugs in the past where the caller of
ext4_xattr_block_set() failed to call dquot_initialize(inode).  In
addition, shutting up the warning doesn't fix the problem that while
we hit this race where we have started remounting r/w, quota hasn't
been initialized, quota tracking will get silently dropped, leading to
the quota usage tracking no longer reflecting reality.

Jan's patch will fix this problem.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 14:10   ` Theodore Ts'o
@ 2023-06-15 14:48     ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2023-06-15 14:48 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christian Brauner, Jan Kara, Al Viro, linux-fsdevel, David Howells

On Thu 15-06-23 10:10:40, Theodore Ts'o wrote:
> On Thu, Jun 15, 2023 at 02:53:53PM +0200, Christian Brauner wrote:
> > 
> > So looking at the ext4 code this can only happen when you clear
> > SB_RDONLY in ->reconfigure() too early (and the mount isn't
> > MNT_READONLY). Afaict, this was fixed in:
> > 
> > a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
> > 
> > by clearing SB_RDONLY late, right before returning from ->reconfigure()
> > when everything's ready. So your change is not about fixing that bug in
> > [1] it's about making the vfs give the guarantee that an fs is free to
> > clear SB_RDONLY because any ro<->rw transitions are protected via
> > s_readonly_remount. Correct? It seems ok to me just making sure.
> 
> Unfortunately we had to revert that commit because that broke
> r/o->r/w writes when quota was enabled.  The problem is we need a way
> of enabling file system writes for internal purposes (e.g., because
> quota needs to set up quota inodes) but *not* allow userspace file
> system writes to occur until we are fully done with the remount process.
> 
> See the discussion here:
> 
> 	https://lore.kernel.org/all/20230608044056.GA1418535@mit.edu/
> 
> The problem with the current state of the tree is commit dea9d8f7643f
> ("ext4: only check dquot_initialize_needed() when debugging") has
> caught real bugs in the past where the caller of
> ext4_xattr_block_set() failed to call dquot_initialize(inode).  In
> addition, shutting up the warning doesn't fix the problem that while
> we hit this race where we have started remounting r/w, quota hasn't
> been initialized, quota tracking will get silently dropped, leading to
> the quota usage tracking no longer reflecting reality.
> 
> Jan's patch will fix this problem.

Yes, and to fully reveal the situation, we can indeed introduce
ext4-private superblock state "only internal writes allowed" which can be
used for quota setup. It just requires a bit of tidying and helper
adjustment (in fact I have such patches on my private branch). But it has
occured to me that generally it is easier if the filesystem's remount code
doesn't have to care about racing writers on rw<->ro transitions since we
have many filesystems and making sure all are getting this correct is ...
tedious.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 11:38 [PATCH] fs: Protect reconfiguration of sb read-write from racing writes Jan Kara
  2023-06-15 12:53 ` Christian Brauner
@ 2023-06-15 15:01 ` Christian Brauner
  2023-06-15 22:36 ` Dave Chinner
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2023-06-15 15:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Ted Tso, David Howells

On Thu, 15 Jun 2023 13:38:48 +0200, Jan Kara wrote:
> The reconfigure / remount code takes a lot of effort to protect
> filesystem's reconfiguration code from racing writes on remounting
> read-only. However during remounting read-only filesystem to read-write
> mode userspace writes can start immediately once we clear SB_RDONLY
> flag. This is inconvenient for example for ext4 because we need to do
> some writes to the filesystem (such as preparation of quota files)
> before we can take userspace writes so we are clearing SB_RDONLY flag
> before we are fully ready to accept userpace writes and syzbot has found
> a way to exploit this [1]. Also as far as I'm reading the code
> the filesystem remount code was protected from racing writes in the
> legacy mount path by the mount's MNT_READONLY flag so this is relatively
> new problem. It is actually fairly easy to protect remount read-write
> from racing writes using sb->s_readonly_remount flag so let's just do
> that instead of having to workaround these races in the filesystem code.
> 
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs: Protect reconfiguration of sb read-write from racing writes
      https://git.kernel.org/vfs/vfs/c/496de0b41695

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 11:38 [PATCH] fs: Protect reconfiguration of sb read-write from racing writes Jan Kara
  2023-06-15 12:53 ` Christian Brauner
  2023-06-15 15:01 ` Christian Brauner
@ 2023-06-15 22:36 ` Dave Chinner
  2023-06-16 16:37   ` Jan Kara
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2023-06-15 22:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Ted Tso, David Howells

On Thu, Jun 15, 2023 at 01:38:48PM +0200, Jan Kara wrote:
> The reconfigure / remount code takes a lot of effort to protect
> filesystem's reconfiguration code from racing writes on remounting
> read-only. However during remounting read-only filesystem to read-write
> mode userspace writes can start immediately once we clear SB_RDONLY
> flag. This is inconvenient for example for ext4 because we need to do
> some writes to the filesystem (such as preparation of quota files)
> before we can take userspace writes so we are clearing SB_RDONLY flag
> before we are fully ready to accept userpace writes and syzbot has found
> a way to exploit this [1]. Also as far as I'm reading the code
> the filesystem remount code was protected from racing writes in the
> legacy mount path by the mount's MNT_READONLY flag so this is relatively
> new problem. It is actually fairly easy to protect remount read-write
> from racing writes using sb->s_readonly_remount flag so let's just do
> that instead of having to workaround these races in the filesystem code.
> 
> [1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/super.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2b..6cd64961aa07 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -903,6 +903,7 @@ int reconfigure_super(struct fs_context *fc)
>  	struct super_block *sb = fc->root->d_sb;
>  	int retval;
>  	bool remount_ro = false;
> +	bool remount_rw = false;
>  	bool force = fc->sb_flags & SB_FORCE;
>  
>  	if (fc->sb_flags_mask & ~MS_RMT_MASK)
> @@ -920,7 +921,7 @@ int reconfigure_super(struct fs_context *fc)
>  		    bdev_read_only(sb->s_bdev))
>  			return -EACCES;
>  #endif
> -
> +		remount_rw = !(fc->sb_flags & SB_RDONLY) && sb_rdonly(sb);
>  		remount_ro = (fc->sb_flags & SB_RDONLY) && !sb_rdonly(sb);
>  	}
>  
> @@ -950,6 +951,14 @@ int reconfigure_super(struct fs_context *fc)
>  			if (retval)
>  				return retval;
>  		}
> +	} else if (remount_rw) {
> +		/*
> +		 * We set s_readonly_remount here to protect filesystem's
> +		 * reconfigure code from writes from userspace until
> +		 * reconfigure finishes.
> +		 */
> +		sb->s_readonly_remount = 1;
> +		smp_wmb();

What does the magic random memory barrier do? What is it ordering,
and what is it paired with?

This sort of thing is much better done with small helpers that
encapsulate the necessary memory barriers:

sb_set_readonly_remount()
sb_clear_readonly_remount()

alongside the helper that provides the read-side check and memory
barrier the write barrier is associated with.

I don't often ask for code to be cleaned up before a bug fix can be
added, but I think this is one of the important cases where it does
actually matter - we should never add undocumented memory barriers
in the code like this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-15 22:36 ` Dave Chinner
@ 2023-06-16 16:37   ` Jan Kara
  2023-06-16 22:48     ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2023-06-16 16:37 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Christian Brauner, Al Viro, linux-fsdevel, Ted Tso,
	David Howells

On Fri 16-06-23 08:36:53, Dave Chinner wrote:
> On Thu, Jun 15, 2023 at 01:38:48PM +0200, Jan Kara wrote:
> > The reconfigure / remount code takes a lot of effort to protect
> > filesystem's reconfiguration code from racing writes on remounting
> > read-only. However during remounting read-only filesystem to read-write
> > mode userspace writes can start immediately once we clear SB_RDONLY
> > flag. This is inconvenient for example for ext4 because we need to do
> > some writes to the filesystem (such as preparation of quota files)
> > before we can take userspace writes so we are clearing SB_RDONLY flag
> > before we are fully ready to accept userpace writes and syzbot has found
> > a way to exploit this [1]. Also as far as I'm reading the code
> > the filesystem remount code was protected from racing writes in the
> > legacy mount path by the mount's MNT_READONLY flag so this is relatively
> > new problem. It is actually fairly easy to protect remount read-write
> > from racing writes using sb->s_readonly_remount flag so let's just do
> > that instead of having to workaround these races in the filesystem code.
...
> > +	} else if (remount_rw) {
> > +		/*
> > +		 * We set s_readonly_remount here to protect filesystem's
> > +		 * reconfigure code from writes from userspace until
> > +		 * reconfigure finishes.
> > +		 */
> > +		sb->s_readonly_remount = 1;
> > +		smp_wmb();
> 
> What does the magic random memory barrier do? What is it ordering,
> and what is it paired with?
> 
> This sort of thing is much better done with small helpers that
> encapsulate the necessary memory barriers:
> 
> sb_set_readonly_remount()
> sb_clear_readonly_remount()
> 
> alongside the helper that provides the read-side check and memory
> barrier the write barrier is associated with.

Fair remark. The new code including barrier just copies what happens a few
lines above for remount read-only case (and what happens ib several other
places throughout VFS code). I agree having helpers for this and actually
documenting how the barriers are matching there is a good cleanup.

> I don't often ask for code to be cleaned up before a bug fix can be
> added, but I think this is one of the important cases where it does
> actually matter - we should never add undocumented memory barriers
> in the code like this...

I've talked to Christian and we'll queue this as a separate cleanup. I'll
post it shortly.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fs: Protect reconfiguration of sb read-write from racing writes
  2023-06-16 16:37   ` Jan Kara
@ 2023-06-16 22:48     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2023-06-16 22:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Ted Tso, David Howells

On Fri, Jun 16, 2023 at 06:37:00PM +0200, Jan Kara wrote:
> On Fri 16-06-23 08:36:53, Dave Chinner wrote:
> > On Thu, Jun 15, 2023 at 01:38:48PM +0200, Jan Kara wrote:
> > > The reconfigure / remount code takes a lot of effort to protect
> > > filesystem's reconfiguration code from racing writes on remounting
> > > read-only. However during remounting read-only filesystem to read-write
> > > mode userspace writes can start immediately once we clear SB_RDONLY
> > > flag. This is inconvenient for example for ext4 because we need to do
> > > some writes to the filesystem (such as preparation of quota files)
> > > before we can take userspace writes so we are clearing SB_RDONLY flag
> > > before we are fully ready to accept userpace writes and syzbot has found
> > > a way to exploit this [1]. Also as far as I'm reading the code
> > > the filesystem remount code was protected from racing writes in the
> > > legacy mount path by the mount's MNT_READONLY flag so this is relatively
> > > new problem. It is actually fairly easy to protect remount read-write
> > > from racing writes using sb->s_readonly_remount flag so let's just do
> > > that instead of having to workaround these races in the filesystem code.
> ...
> > > +	} else if (remount_rw) {
> > > +		/*
> > > +		 * We set s_readonly_remount here to protect filesystem's
> > > +		 * reconfigure code from writes from userspace until
> > > +		 * reconfigure finishes.
> > > +		 */
> > > +		sb->s_readonly_remount = 1;
> > > +		smp_wmb();
> > 
> > What does the magic random memory barrier do? What is it ordering,
> > and what is it paired with?
> > 
> > This sort of thing is much better done with small helpers that
> > encapsulate the necessary memory barriers:
> > 
> > sb_set_readonly_remount()
> > sb_clear_readonly_remount()
> > 
> > alongside the helper that provides the read-side check and memory
> > barrier the write barrier is associated with.
> 
> Fair remark. The new code including barrier just copies what happens a few
> lines above for remount read-only case (and what happens ib several other
> places throughout VFS code).

Yes, I saw that you'd copied that magic memory barrier. Good for
consistency, but it doesn't answer any of the above questions
either, so....

> I agree having helpers for this and actually
> documenting how the barriers are matching there is a good cleanup.
> 
> > I don't often ask for code to be cleaned up before a bug fix can be
> > added, but I think this is one of the important cases where it does
> > actually matter - we should never add undocumented memory barriers
> > in the code like this...
> 
> I've talked to Christian and we'll queue this as a separate cleanup. I'll
> post it shortly.

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-16 22:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-15 11:38 [PATCH] fs: Protect reconfiguration of sb read-write from racing writes Jan Kara
2023-06-15 12:53 ` Christian Brauner
2023-06-15 14:10   ` Theodore Ts'o
2023-06-15 14:48     ` Jan Kara
2023-06-15 15:01 ` Christian Brauner
2023-06-15 22:36 ` Dave Chinner
2023-06-16 16:37   ` Jan Kara
2023-06-16 22:48     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.