From: Al Viro <viro@zeniv.linux.org.uk>
To: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	Tejun Heo <tj@kernel.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Damien Le Moal <dlemoal@kernel.org>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-hardening@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super
Date: Thu, 14 Sep 2023 06:38:43 +0100
Message-ID: <20230914053843.GI800259@ZenIV>
In-Reply-To: <20230914023705.GH800259@ZenIV>

On Thu, Sep 14, 2023 at 03:37:05AM +0100, Al Viro wrote:
> On Thu, Sep 14, 2023 at 12:27:12AM +0100, Al Viro wrote:
> > On Wed, Sep 13, 2023 at 08:09:57AM -0300, Christoph Hellwig wrote:
> > > Releasing an anon dev_t is a very common thing when freeing a
> > > super_block, as that's done for basically any not block based file
> > > system (modulo the odd mtd special case).  So instead of requiring
> > > a special ->kill_sb helper and a lot of boilerplate in more complicated
> > > file systems, just release the anon dev_t in deactivate_locked_super if
> > > the super_block was using one.
> > > 
> > > As the freeing is done after the main call to kill_super_notify, this
> > > removes the need for having two slightly different call sites for it.
> > 
> > Huh?  At this stage in your series freeing is still in ->kill_sb()
> > instances, after the calls of kill_anon_super() you've turned into
> > the calls of generic_shutdown_super().
> > 
> > You do split it off into a separate method later in the series, but
> > at this point you are reopening the same UAF that had been dealt with
> > in dc3216b14160 "super: ensure valid info".
> > 
> > Either move the introduction of ->free_sb() before that one, or
> > split it into lifting put_anon_bdev() (left here) and getting rid
> > of kill_anon_super() (after ->free_sb() introduction).
> 
> Actually, looking at the final stage in the series, you still have
> kill_super_notify() done *AFTER* ->free_sb() call.  So the problem
> persists until the very end...

It's worse - look at the rationale for 2c18a63b760a "super: wait until
we passed kill super".  Basically, "don't remove from the lists
until after block device closing".  IOW, we have

* stuff that needs to be done before generic_shutdown_super() (things
like pinned dentries on ramfs, etc.)
* generic_shutdown_super() itself (dentry/inode eviction, optionally
->put_super())
* stuff that needs to be done before eviction from the lists (block
device closing, since 2c18a63b760a)
* eviction from the lists
* stuff that needs to be done *after* eviction from the lists.
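
For illustration, the ordering above can be written down as a toy userspace
model.  This is only a sketch: enum sb_phase, struct toy_sb and the toy_*()
helpers are hypothetical names made up for this example, not kernel
interfaces, and the only constraint it checks is the one taken from
2c18a63b760a (the device is closed while the superblock is still on the
lists).

```c
#include <assert.h>

/* Hypothetical phases mirroring the list above; not a kernel API. */
enum sb_phase {
	PHASE_PRE_SHUTDOWN,     /* e.g. dropping pinned dentries on ramfs */
	PHASE_GENERIC_SHUTDOWN, /* dentry/inode eviction, optional ->put_super() */
	PHASE_PRE_UNLIST,       /* e.g. closing the block device */
	PHASE_UNLIST,           /* eviction from the lists */
	PHASE_POST_UNLIST,      /* whatever must wait until after eviction */
	PHASE_COUNT
};

/* Toy stand-in for a superblock: just enough state to check ordering. */
struct toy_sb {
	enum sb_phase log[PHASE_COUNT]; /* phases in the order they ran */
	int nr_logged;
	int on_list;                    /* still visible on the lists? */
	int bdev_open;                  /* block device still open? */
};

static void toy_run_phase(struct toy_sb *sb, enum sb_phase phase)
{
	switch (phase) {
	case PHASE_PRE_UNLIST:
		sb->bdev_open = 0;      /* close the device */
		break;
	case PHASE_UNLIST:
		sb->on_list = 0;        /* drop from the lists */
		break;
	default:
		break;                  /* nothing to model for the rest */
	}
	sb->log[sb->nr_logged++] = phase;
}

/*
 * Run all phases in order.  Returns 1 if the device was closed while the
 * superblock was still on the lists, 0 otherwise.
 */
static int toy_teardown(struct toy_sb *sb)
{
	int closed_before_unlist = 0;

	for (int p = 0; p < PHASE_COUNT; p++) {
		toy_run_phase(sb, (enum sb_phase)p);
		if (p == PHASE_PRE_UNLIST)
			closed_before_unlist = sb->on_list && !sb->bdev_open;
	}
	return closed_before_unlist;
}
```

Calling toy_teardown() on a struct toy_sb initialized with on_list = 1 and
bdev_open = 1 walks the five phases in the listed order; swapping
PHASE_PRE_UNLIST and PHASE_UNLIST in the enum would make it return 0.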

BTW, this part of commit message in 2c18a63b760a is rather confused:
    Recent rework moved block device closing out of sb->put_super() and into
    sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and
    blkdev_put() can end up taking s_umount again.

That was *NOT* what a recent rework had done.  Block device closing had never
been inside ->put_super() - at no point since that (closing, that is) had been
introduced back in 0.97 ;-)  ->put_super() predates it (0.95c+).

The race is real, but the cause is not some kind of move of blkdev_put().
Your 2ea6f68932f7 "fs: use the super_block as holder when mounting file
systems" is where it actually came from.

Christoph, could you explain what the hell do we need that for?  It does
create the race in question and AFAICS 2c18a63b760a (and followups trying
to plug holes in it) had been nothing but headache.

Old logic: if a mount attempt with a different fs type happens, -EBUSY
is precisely correct - we would've gotten just that if mount() came
before umount().  If the type matches, we might
	1) come before deactivate_locked_super() by umount(2).
No problem, we succeed.
	2) come after the beginning of shutdown, but before the
removal from the list; fine, we'll wait for the sucker to be
unlocked (which happens in the end of generic_shutdown_super()),
notice it's dead and create a new superblock.  Since the only
part left on the umount side is closing the device, we are
just fine.
	3) come after the removal from the list.  So we won't
wait for the old superblock to be unlocked, other than that
it's exactly the same as (2).  It doesn't matter whether we
open the device before or after close by umount - same owner
anyway, no -EBUSY.

Your "owner shall be the superblock" breaks that...
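
A toy userspace model of the difference, as a sketch only.  It assumes that
under the old rule the exclusive holder was something shared by every mount
of one fs type, and that under the new rule it is the superblock itself;
struct toy_bdev and the toy_*() helpers are hypothetical names for this
example and do not correspond to the real block layer interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy block device: one exclusive holder, counted openers. */
struct toy_bdev {
	const void *holder;
	int openers;
};

/* Exclusive open: succeeds if unheld or already held by the same holder. */
static int toy_open_excl(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->openers++;
	return 0;
}

static void toy_close(struct toy_bdev *bdev)
{
	if (--bdev->openers == 0)
		bdev->holder = NULL;
}

/*
 * Cases (2) and (3) above: the new mount opens the device before umount
 * has closed it.  Returns what the racing mount would see.
 */
static int toy_mount_during_umount(const void *old_holder,
				   const void *new_holder)
{
	struct toy_bdev bdev = { 0 };
	int err;

	err = toy_open_excl(&bdev, old_holder); /* the original mount */
	if (err)
		return err;
	err = toy_open_excl(&bdev, new_holder); /* the racing mount */
	toy_close(&bdev);                       /* umount closes its reference */
	return err;
}
```

With one shared holder for both mounts the racing open returns 0; with two
distinct per-superblock holders it returns -EBUSY, which is the race the
message describes.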

If you want to mess with a _three_-way split of ->kill_sb(),
please start with writing down the rules re what should
go into each of those parts; such a writeup should go into
Documentation/filesystems/porting anyway, even if the
split is a two-way one, BTW.
