All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: Kees Cook <keescook@chromium.org>
Cc: tj@kernel.org, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, minchan@kernel.org, jeyu@kernel.org,
	shuah@kernel.org, bvanassche@acm.org, dan.j.williams@intel.com,
	joe@perches.com, tglx@linutronix.de, rostedt@goodmis.org,
	linux-spdx@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 09/12] sysfs: fix deadlock race with module removal
Date: Mon, 11 Oct 2021 15:26:02 -0700	[thread overview]
Message-ID: <YWS5+uyDQ4qklTcT@bombadil.infradead.org> (raw)
In-Reply-To: <202110051315.B2165F455@keescook>

On Tue, Oct 05, 2021 at 01:50:31PM -0700, Kees Cook wrote:
> On Mon, Sep 27, 2021 at 09:38:02AM -0700, Luis Chamberlain wrote:
> > A sketch of how this can happen follows, consider foo a local mutex
> > part of a driver, and used on the driver's module exit routine and
> > on one of its sysfs ops:
> > 
> > foo.c:
> > static DEFINE_MUTEX(foo);
> > static ssize_t foo_store(struct device *dev,
> > 			 struct device_attribute *attr,
> > 			 const char *buf, size_t count)
> > {
> > 	...
> > 	mutex_lock(&foo);
> > 	...
> > 	mutex_lock(&foo);
> > 	...
> > }
> > static DEVICE_ATTR_RW(foo);
> > ...
> > void foo_exit(void)
> > {
> > 	mutex_lock(&foo);
> > 	...
> > 	mutex_unlock(&foo);
> > }
> > module_exit(foo_exit);
> > 
> > And this can lead to this condition:
> > 
> > CPU A                              CPU B
> >                                    foo_store()
> > foo_exit()
> >   mutex_lock(&foo)
> >                                    mutex_lock(&foo)
> >    del_gendisk(some_struct->disk);
> >      device_del()
> >        device_remove_groups()
> 
> Please expand this further, where does device_remove_groups() end up
> waiting for that never happens?

Sure. How about:

Furthermore, device_remove_groups() will just go on trying to remove
the sysfs files, which are kernfs entries. The way kernfs deals with
removal is that it will wait until all active references for the files
being removed are done. The active reference is obtained through
kernfs_get_active(). Removal ends up waiting through kernfs_drain()
for the active references to be done, and that only happens if the
kernfs file ops can complete. If these kernfs ops / sysfs files
are waiting for a mutex which taken by the module's exit routine
prior to trying to remove the sysfs files we deadlock.

> > In this situation foo_store() is waiting for the mutex foo to
> > become unlocked, but that won't happen until module removal is complete.
> > But module removal won't complete until the sysfs file being poked at
> > completes which is waiting for a lock already held.
> > 
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +-
> >  fs/kernfs/dir.c                        | 44 ++++++++++++++++++----
> >  fs/kernfs/file.c                       |  6 ++-
> >  fs/kernfs/kernfs-internal.h            |  3 +-
> >  fs/kernfs/symlink.c                    |  3 +-
> >  fs/sysfs/dir.c                         |  2 +-
> >  fs/sysfs/file.c                        |  6 ++-
> >  fs/sysfs/group.c                       |  3 +-
> >  include/linux/kernfs.h                 | 14 ++++---
> >  include/linux/sysfs.h                  | 52 ++++++++++++++++++++------
> >  kernel/cgroup/cgroup.c                 |  2 +-
> >  11 files changed, 105 insertions(+), 34 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index b57b3db9a6a7..4edf3b37fd2c 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -209,7 +209,7 @@ static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
> >  
> >  	kn = __kernfs_create_file(parent_kn, rft->name, rft->mode,
> >  				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
> > -				  0, rft->kf_ops, rft, NULL, NULL);
> > +				  0, rft->kf_ops, rft, NULL, NULL, NULL);
> >  	if (IS_ERR(kn))
> >  		return PTR_ERR(kn);
> >  
> > @@ -2482,7 +2482,7 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
> >  
> >  	kn = __kernfs_create_file(parent_kn, name, 0444,
> >  				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
> > -				  &kf_mondata_ops, priv, NULL, NULL);
> > +				  &kf_mondata_ops, priv, NULL, NULL, NULL);
> >  	if (IS_ERR(kn))
> >  		return PTR_ERR(kn);
> >  
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index ba581429bf7b..e841201fd11b 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/slab.h>
> >  #include <linux/security.h>
> >  #include <linux/hash.h>
> > +#include <linux/module.h>
> >  
> >  #include "kernfs-internal.h"
> >  
> > @@ -414,12 +415,29 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn)
> >   */
> >  struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
> >  {
> > +	int v;
> > +
> >  	if (unlikely(!kn))
> >  		return NULL;
> >  
> >  	if (!atomic_inc_unless_negative(&kn->active))
> >  		return NULL;
> >  
> > +	/*
> > +	 * If a module created the kernfs_node, the module cannot possibly be
> > +	 * removed if the above atomic_inc_unless_negative() succeeded. So the
> > +	 * try_module_get() below is not to protect the lifetime of the module
> > +	 * as that is already guaranteed. The try_module_get() below is used
> > +	 * to ensure that we don't deadlock in case a kernfs operation and
> > +	 * module removal used a shared lock.
> > +	 */
> > +	if (!try_module_get(kn->owner)) {
> > +		v = atomic_dec_return(&kn->active);
> > +		if (unlikely(v == KN_DEACTIVATED_BIAS))
> > +			wake_up_all(&kernfs_root(kn)->deactivate_waitq);
> > +		return NULL;
> > +	}
> 
> The special casing in here makes me think this isn't happening the right
> place. (i.e this looks like an open-coded version of kernfs_put_active())

No, well you see, in effect the special care taken in
kernfs_put_active() *is* the right way to inform a waiter that
that the *taken* reference right above *also* is no longer active.

The special casing here is because we took the active reference
before the try_module_get() in the above atomic_inc_unless_negative()
call. Outside callers deal with this through kernfs_put_active().

We are special casing to deal with the deadlock case.

> > +
> >  	if (kernfs_lockdep(kn))
> >  		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
> >  	return kn;
> > @@ -442,6 +460,13 @@ void kernfs_put_active(struct kernfs_node *kn)
> >  	if (kernfs_lockdep(kn))
> >  		rwsem_release(&kn->dep_map, _RET_IP_);
> >  	v = atomic_dec_return(&kn->active);
> > +
> > +	/*
> > +	 * We prevent module exit *until* we know for sure all possible
> > +	 * kernfs ops are done.
> > +	 */
> > +	module_put(kn->owner);
> > +
> >  	if (likely(v != KN_DEACTIVATED_BIAS))
> >  		return;
> 
> What I don't understand, however, is what kernfs_get/put_active() is
> intending to do -- it looks like it's trying to provide an interruption
> point for open kernfs file operations?

It is essentially ensuring that removal does not happen if any ops
are being used.

> This all seems extremely complex for what seems like it should just be a
> global "am I being removed?" bool?

It used to be worse :) And Tejun has cleaned this up over time. Yes,
perhaps we can improve that more but, given how sensible this code
is I think such improvements should be made separately.

> Regardless, while I do see the logic of associating the module get/put
> with get/put of kernfs "active", why is it not better tied to strictly
> kernfs open/close? 

It's not just files, consider kernfs_iop_mkdir() which also calls
kernfs_get_active(). How about kernfs_fop_mmap()? And so, the common
denominator is actually kernfs_get_active().

> That would seem to be much simpler and not require
> any special handling?

Yes true, but it I think this would still leave open some other possible
deadlocks.

> For example, why does this not work?

It does for the write case for sure, but I haven't written tests for the
other odd cases, but suspect that would deadlock as well.

 Luis

  reply	other threads:[~2021-10-11 22:26 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27 16:37 [PATCH v8 00/12] syfs: generic deadlock fix with module removal Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 01/12] LICENSES: Add the copyleft-next-0.3.1 license Luis Chamberlain
     [not found]   ` <202110050907.35FBD2A1@keescook>
     [not found]     ` <YWR2ZrtzChamY1y4@bombadil.infradead.org>
2021-10-11 17:57       ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 02/12] testing: use the copyleft-next-0.3.1 SPDX tag Luis Chamberlain
2021-10-05 16:11   ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 03/12] selftests: add tests_sysfs module Luis Chamberlain
2021-10-05 14:16   ` Greg KH
2021-10-05 16:57     ` Tim.Bird
2021-10-11 17:40       ` Luis Chamberlain
2021-10-11 17:38     ` Luis Chamberlain
2021-10-07 14:23   ` Miroslav Benes
2021-10-11 19:11     ` Luis Chamberlain
     [not found]   ` <202110050912.3DF681ED@keescook>
2021-10-11 19:03     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 04/12] kernfs: add initial failure injection support Luis Chamberlain
2021-10-05 19:47   ` Kees Cook
2021-10-11 20:44     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 05/12] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
2021-10-05 19:51   ` Kees Cook
2021-10-11 20:56     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 06/12] kernel/module: add documentation for try_module_get() Luis Chamberlain
2021-10-05 19:58   ` Kees Cook
2021-10-11 21:16     ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 07/12] fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link() Luis Chamberlain
2021-10-05 19:59   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 08/12] fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() Luis Chamberlain
2021-10-05 16:05   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 09/12] sysfs: fix deadlock race with module removal Luis Chamberlain
2021-10-05  9:24   ` Ming Lei
2021-10-11 21:25     ` Luis Chamberlain
2021-10-12  0:20       ` Ming Lei
2021-10-12 21:18         ` Luis Chamberlain
2021-10-13  1:07           ` Ming Lei
2021-10-13 12:35             ` Luis Chamberlain
2021-10-13 15:04               ` Ming Lei
2021-10-13 21:16                 ` Luis Chamberlain
2021-10-05 20:50   ` Kees Cook
2021-10-11 22:26     ` Luis Chamberlain [this message]
2021-10-13 12:41       ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 10/12] test_sysfs: enable deadlock tests by default Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
2021-10-05 20:55   ` Kees Cook
2021-10-11 18:27     ` Luis Chamberlain
2021-10-14  1:55   ` Ming Lei
2021-10-14  2:11     ` Ming Lei
2021-10-14 20:24       ` Luis Chamberlain
2021-10-14 23:52         ` Ming Lei
2021-10-15  0:22           ` Luis Chamberlain
2021-10-15  8:36             ` Ming Lei
2021-10-15  8:52               ` Greg KH
2021-10-15 17:31               ` Luis Chamberlain
2021-10-16 11:28                 ` Ming Lei
2021-10-18 19:32                   ` Luis Chamberlain
2021-10-19  2:34                     ` Ming Lei
2021-10-19  6:23                       ` Miroslav Benes
2021-10-19  9:23                         ` Ming Lei
2021-10-20  6:43                           ` Miroslav Benes
2021-10-20  7:49                             ` Ming Lei
2021-10-20  8:19                               ` Miroslav Benes
2021-10-20  8:28                                 ` Greg KH
2021-10-25  9:58                                   ` Miroslav Benes
2021-10-20 10:09                                 ` Ming Lei
2021-10-26  8:48                                   ` Petr Mladek
2021-10-26 15:37                                     ` Ming Lei
2021-10-26 17:01                                       ` Luis Chamberlain
2021-10-27 11:57                                         ` Miroslav Benes
2021-10-27 14:27                                           ` Luis Chamberlain
2021-11-02 15:24                                           ` Petr Mladek
2021-11-02 16:25                                             ` Luis Chamberlain
2021-11-03  0:01                                               ` Ming Lei
2021-11-03 12:44                                                 ` Luis Chamberlain
2021-10-27 11:42                                       ` Miroslav Benes
2021-11-02 14:15                                       ` Petr Mladek
2021-11-02 14:51                                         ` Petr Mladek
2021-11-02 15:17                                           ` Ming Lei
2021-11-02 14:56                                         ` Ming Lei
2021-10-19 15:28                       ` Luis Chamberlain
2021-10-19 16:29                         ` Ming Lei
2021-10-19 19:36                           ` Luis Chamberlain
2021-10-20  1:15                             ` Ming Lei
2021-10-20 15:48                               ` Luis Chamberlain
2021-10-21  0:39                                 ` Ming Lei
2021-10-21 17:18                                   ` Luis Chamberlain
2021-10-22  0:05                                     ` Ming Lei
2021-10-19 15:50                       ` Luis Chamberlain
2021-10-19 16:25                         ` Greg KH
2021-10-19 16:30                           ` Luis Chamberlain
2021-10-19 17:28                             ` Greg KH
2021-10-19 19:46                               ` Luis Chamberlain
2021-10-19 16:39                         ` Ming Lei
2021-10-19 19:38                           ` Luis Chamberlain
2021-10-20  0:55                             ` Ming Lei
2021-09-27 16:38 ` [PATCH v8 12/12] zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal Luis Chamberlain
2021-10-05 20:57   ` Kees Cook
2021-10-11 18:28     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWS5+uyDQ4qklTcT@bombadil.infradead.org \
    --to=mcgrof@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bvanassche@acm.org \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jeyu@kernel.org \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-spdx@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.