All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Tejun Heo" <tj@kernel.org>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Minchan Kim" <minchan@kernel.org>,
	jeyu@kernel.org, shuah <shuah@kernel.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Masahiro Yamada" <masahiroy@kernel.org>,
	"Nick Desaulniers" <ndesaulniers@google.com>,
	yzaikin@google.com, "Nathan Chancellor" <nathan@kernel.org>,
	ojeda@kernel.org,
	"Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
	vitor@massaru.org, elver@google.com,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Alexander Potapenko" <glider@google.com>,
	rf@opensource.cirrus.com,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"David Laight" <David.Laight@aculab.com>,
	bvanassche@acm.org, jolsa@kernel.org,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	trishalfonso@google.com, andreyknvl@gmail.com,
	"Jiri Kosina" <jikos@kernel.org>,
	mbenes@suse.com, "Nitin Gupta" <ngupta@vflare.org>,
	"Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com>,
	"Reinette Chatre" <reinette.chatre@intel.com>,
	"Fenghua Yu" <fenghua.yu@intel.com>,
	"Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	lizefan.x@bytedance.com, "Johannes Weiner" <hannes@cmpxchg.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Krzysztof Wilczyński" <kw@linux.com>,
	senozhatsky@chromium.org, "Christoph Hellwig" <hch@lst.de>,
	"Joe Perches" <joe@perches.com>,
	hkallweit1@gmail.com, "Jens Axboe" <axboe@kernel.dk>,
	"Josh Poimboeuf" <jpoimboe@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Kees Cook" <keescook@chromium.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-spdx@vger.kernel.org,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	linux-block@vger.kernel.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	copyleft-next@lists.fedorahosted.org
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal
Date: Mon, 20 Sep 2021 13:52:21 -0700	[thread overview]
Message-ID: <CAPcyv4i0xEwMQ5kSK-xGroV7aZr3j1YNrGMVLiLMr3U8nFCMKA@mail.gmail.com> (raw)
In-Reply-To: <20210918050430.3671227-10-mcgrof@kernel.org>

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> When sysfs attributes use a lock also used on module removal we can
> race to deadlock. This happens when for instance a sysfs file on
> a driver is used, then at the same time we have module removal call
> trigger. The module removal call code holds a lock, and then the sysfs
> file entry waits for the same lock. While holding the lock the module
> removal tries to remove the sysfs entries, but these cannot be removed
> yet as one is waiting for a lock. This won't complete as the lock is
> already held. Likewise module removal cannot complete, and so we deadlock.
>
> This can now be easily reproducible with our sysfs selftest as follows:
>
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
>
> To fix this we extend the struct kernfs_node with a module reference and
> use the try_module_get() after kernfs_get_active() is called which
> protects integrity and the existence of the kernfs node during the
> operation.
>
> So long as the kernfs node is protected with kernfs_get_active() we know
> we can rely on its contents. And, as now just documented in the previous
> patch, we also now know that once kernfs_get_active() is called the module
> is also guarded to exist and cannot be removed.
>
> If try_module_get() fails we fail the operation on the kernfs node.
>
> We use a try method as a full lock means we'd then make our sysfs
> attributes busy us out from possible module removal, and so userspace
> could force denying module removal, a silly form of "DOS" against module
> removal. A try lock on the module removal ensures we give priority to
> module removal and interacting with sysfs attributes only comes second.
> Using a full lock could mean for instance that if you don't stop poking
> at sysfs files you cannot remove a module.
>
> Races between removal of sysfs files and the module are not possible
> given sysfs files are created by the same module, and when a sysfs file
> is being used kernfs prevents removal of the sysfs file. So if module
> removal is actually happening the removal would have to wait until
> the sysfs file operation is complete.
>
> This deadlock was first reported with the zram driver, however the live
> patching folks have acknowledged they have observed this as well with
> live patching, when a live patch is removed. I was then able to
> reproduce easily by creating a dedicated selftests.
>
> A sketch of how this can happen follows:
>
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()

This flow seems possible to trigger with:

   echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Luis Chamberlain <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Greg KH
	<gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	jeyu-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	shuah <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"Rafael J. Wysocki"
	<rafael-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Masahiro Yamada
	<masahiroy-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Nick Desaulniers
	<ndesaulniers-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	yzaikin-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	Nathan Chancellor
	<nathan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	ojeda-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Tetsuo Handa
	<penguin-kernel-1yMVhJb1mP/7nzcFbJAaVXf5DAMn2ifp@public.gmane.org>,
	vitor-HdruG0TBZYlAfugRpC6u6w@public.gmane.org,
	elver-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	Jarkko Sakkinen <jarkko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Alexander Potapenko
	<glider-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	rf-yzvPICuk2AA4QjBA90+/kJqQE7yCjDx5@public.gmane.org,
	Stephen Hemminger
	<stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>,
	David Laight
	<David.Laight-JxhZ9S5GRejQT0dZR+AlfA@public.gmane.org>,
	bvanassche-HInyCGIudOg@public.gmane.org,
	jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Andy Shevchenko <andriy.s>
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal
Date: Mon, 20 Sep 2021 13:52:21 -0700	[thread overview]
Message-ID: <CAPcyv4i0xEwMQ5kSK-xGroV7aZr3j1YNrGMVLiLMr3U8nFCMKA@mail.gmail.com> (raw)
In-Reply-To: <20210918050430.3671227-10-mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> When sysfs attributes use a lock also used on module removal we can
> race to deadlock. This happens when for instance a sysfs file on
> a driver is used, then at the same time we have module removal call
> trigger. The module removal call code holds a lock, and then the sysfs
> file entry waits for the same lock. While holding the lock the module
> removal tries to remove the sysfs entries, but these cannot be removed
> yet as one is waiting for a lock. This won't complete as the lock is
> already held. Likewise module removal cannot complete, and so we deadlock.
>
> This can now be easily reproducible with our sysfs selftest as follows:
>
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
>
> To fix this we extend the struct kernfs_node with a module reference and
> use the try_module_get() after kernfs_get_active() is called which
> protects integrity and the existence of the kernfs node during the
> operation.
>
> So long as the kernfs node is protected with kernfs_get_active() we know
> we can rely on its contents. And, as now just documented in the previous
> patch, we also now know that once kernfs_get_active() is called the module
> is also guarded to exist and cannot be removed.
>
> If try_module_get() fails we fail the operation on the kernfs node.
>
> We use a try method as a full lock means we'd then make our sysfs
> attributes busy us out from possible module removal, and so userspace
> could force denying module removal, a silly form of "DOS" against module
> removal. A try lock on the module removal ensures we give priority to
> module removal and interacting with sysfs attributes only comes second.
> Using a full lock could mean for instance that if you don't stop poking
> at sysfs files you cannot remove a module.
>
> Races between removal of sysfs files and the module are not possible
> given sysfs files are created by the same module, and when a sysfs file
> is being used kernfs prevents removal of the sysfs file. So if module
> removal is actually happening the removal would have to wait until
> the sysfs file operation is complete.
>
> This deadlock was first reported with the zram driver, however the live
> patching folks have acknowledged they have observed this as well with
> live patching, when a live patch is removed. I was then able to
> reproduce easily by creating a dedicated selftests.
>
> A sketch of how this can happen follows:
>
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()

This flow seems possible to trigger with:

   echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.

  parent reply	other threads:[~2021-09-21  1:43 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210918050430.3671227-1-mcgrof@kernel.org>
2021-09-18  5:04 ` [PATCH v7 01/12] LICENSES: Add the copyleft-next-0.3.1 license Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 02/12] testing: use the copyleft-next-0.3.1 SPDX tag Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 03/12] selftests: add tests_sysfs module Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-20 21:38   ` Bart Van Assche
2021-09-20 21:38     ` Bart Van Assche
2021-09-18  5:04 ` [PATCH v7 04/12] kernfs: add initial failure injection support Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 05/12] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 06/12] kernel/module: add documentation for try_module_get() Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 07/12] fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link() Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 08/12] fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 09/12] sysfs: fix deadlock race with module removal Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-20 17:53   ` Tejun Heo
2021-09-20 17:53     ` Tejun Heo
2021-09-20 19:15     ` Luis Chamberlain
2021-09-20 19:15       ` Luis Chamberlain
2021-09-20 19:22       ` Tejun Heo
2021-09-20 19:22         ` Tejun Heo
2021-09-20 19:38         ` Luis Chamberlain
2021-09-20 19:38           ` Luis Chamberlain
2021-09-20 20:52   ` Dan Williams [this message]
2021-09-20 20:52     ` Dan Williams
2021-09-20 21:17     ` Luis Chamberlain
2021-09-20 21:17       ` Luis Chamberlain
2021-09-20 21:55       ` Dan Williams
2021-09-20 21:55         ` Dan Williams
2021-09-21  0:03         ` Luis Chamberlain
2021-09-21  0:03           ` Luis Chamberlain
2021-09-20 21:36   ` Bart Van Assche
2021-09-20 21:36     ` Bart Van Assche
2021-09-20 21:43     ` Luis Chamberlain
2021-09-20 21:43       ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 10/12] test_sysfs: enable deadlock tests by default Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 11/12] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-18  5:04 ` [PATCH v7 12/12] zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal Luis Chamberlain
2021-09-18  5:04   ` Luis Chamberlain
2021-09-20 17:55 ` [PATCH v7 00/12 (RESEND)] syfs: generic deadlock fix with " Tejun Heo
2021-09-20 17:55   ` Tejun Heo
2021-09-27 15:11   ` Luis Chamberlain
2021-09-27 15:11     ` Luis Chamberlain
     [not found] <20210917194709.3562413-1-mcgrof@kernel.org>
     [not found] ` <20210917194709.3562413-10-mcgrof@kernel.org>
     [not found]   ` <c70dcb03e27e43c5b5311e184357df39@AcuMS.aculab.com>
2021-09-21 15:48     ` [PATCH v7 09/12] sysfs: fix deadlock race " Luis Chamberlain
2021-09-21 15:48       ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4i0xEwMQ5kSK-xGroV7aZr3j1YNrGMVLiLMr3U8nFCMKA@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=David.Laight@aculab.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@gmail.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=axboe@kernel.dk \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=bvanassche@acm.org \
    --cc=cgroups@vger.kernel.org \
    --cc=copyleft-next@lists.fedorahosted.org \
    --cc=daniel.vetter@ffwll.ch \
    --cc=elver@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=glider@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hch@lst.de \
    --cc=hkallweit1@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jarkko@kernel.org \
    --cc=jeyu@kernel.org \
    --cc=jikos@kernel.org \
    --cc=joe@perches.com \
    --cc=jolsa@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=kw@linux.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-spdx@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=masahiroy@kernel.org \
    --cc=mbenes@suse.com \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=ngupta@vflare.org \
    --cc=ojeda@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=reinette.chatre@intel.com \
    --cc=rf@opensource.cirrus.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=shuah@kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=trishalfonso@google.com \
    --cc=vitor@massaru.org \
    --cc=x86@kernel.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.