linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: minchan@kernel.org, jeyu@kernel.org, ngupta@vflare.org,
	sergey.senozhatsky.work@gmail.com, axboe@kernel.dk,
	mbenes@suse.com, jpoimboe@redhat.com, tglx@linutronix.de,
	keescook@chromium.org, jikos@kernel.org, rostedt@goodmis.org,
	peterz@infradead.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal
Date: Tue, 22 Jun 2021 09:32:08 -0700	[thread overview]
Message-ID: <20210622163208.epx4lf3pv2x2d5b4@garbanzo> (raw)
In-Reply-To: <YNGVI/vKSBAM8dlh@kroah.com>

On Tue, Jun 22, 2021 at 09:45:39AM +0200, Greg KH wrote:
> On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > When sysfs attributes use a lock also used on driver removal we can
> > potentially deadlock. This happens when for instance a sysfs file on
> > a driver is used, then at the same time we have driver removal trigger.
> > The driver removal code holds a lock, and then the sysfs file entry waits
> > for the same lock. While holding the lock the driver removal tries to
> > remove the sysfs entries, but these cannot be removed yet as one is
> > waiting for a lock. This won't complete as the lock is already held.
> > Likewise module removal cannot complete, and so we deadlock.
> 
> This is all about removing modules, not about "driver removal" from a
> device.  Please make the subject line here more explicit.

Sure.

> > To fix this we just *try* to get a refcount to the module when a shared
> > lock is used, prior to mucking with a sysfs attribute. If this fails we
> > just give up right away.
> > 
> > We use a try method as a full lock means we'd then make our sysfs attributes
> > busy us out from possible module removal, and so userspace could force denying
> > module removal, a silly form of "DOS" against module removal. A try lock on
> > the module removal ensures we give priority to module removal and interacting
> > with sysfs attributes only comes second. Using a full lock could mean for
> > instance that if you don't stop poking at sysfs files you cannot remove a
> > module.
> > 
> > This deadlock was first reported with the zram driver, a sketch of how
> > this can happen follows:
> > 
> > CPU A                              CPU B
> >                                    whatever_store()
> > module_unload
> >   mutex_lock(foo)
> >                                    mutex_lock(foo)
> >    del_gendisk(zram->disk);
> >      device_del()
> >        device_remove_groups()
> 
> Can you duplicate this in a real-world situation?
> 
> What tools remove the zram module from the system on the fly?

A customer did run into it through a series of automated tests. I was
able to finally reproduce with the instructions given below. I
simplified it given that the series of test the customer was running
was much more complex.

> > In this situation whatever_store() is waiting for the mutex foo to
> > become unlocked, but that won't happen until module removal is complete.
> > But module removal won't complete until the syfs file being poked completes
> > which is waiting for a lock already held.
> > 
> > This is a generic kernel issue with sysfs files which use any lock also
> > used on module removal. Different generic solutions have been proposed.
> > One approach proposed is by directly by augmenting attributes with module
> > information [0]. This patch implements a solution by adding macros with
> > the prefix MODULE_DEVICE_ATTR_*() which accomplish the same. Until we
> > don't have a generic agreed upon solution for this shared between drivers,
> > we must implement a fix for this on each driver.
> > 
> > We make zram use the new MODULE_DEVICE_ATTR_*() helpers, and completely
> > open code the solution for class attributes as there are only a few of
> > those.
> > 
> > This issue can be reproduced easily on the zram driver as follows:
> > 
> > Loop 1 on one terminal:
> > 
> > while true;
> > 	do modprobe zram;
> > 	modprobe -r zram;
> > done
> > 
> > Loop 2 on a second terminal:
> > while true; do
> > 	echo 1024 >  /sys/block/zram0/disksize;
> > 	echo 1 > /sys/block/zram0/reset;
> > done
> 
> As fun as this is, it's not a real workload, please do not pretend that
> it is.

Whoever said that it was? This is just a way to reproduce an issue which
was reported.

> And your code is still racy, see below.  You just made the window even
> smaller, which you still should be objecting to as you somehow feel this
> is a valid usecase :)
> 
> > @@ -2048,13 +2048,19 @@ static ssize_t hot_add_show(struct class *class,
> >  {
> >  	int ret;
> >  
> > +	if (!try_module_get(THIS_MODULE))
> > +		return -ENODEV;
> > +
> 
> You can not increment/decrement your own module's reference count and
> expect it to work properly, as it is still a race.

The goal here is to prevent an rmmod call if this succeeds. If it
succeeds then any subsequent rmmod will fail. Can you explain how
this is still racy?

  Luis

  reply	other threads:[~2021-06-22 16:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21 23:30 [PATCH v3 0/3] zram: fix few sysfs races Luis Chamberlain
2021-06-21 23:30 ` [PATCH v3 1/3] zram: fix crashes due to use of cpu hotplug multistate Luis Chamberlain
2021-06-22  7:39   ` Greg KH
2021-06-22 15:12     ` Luis Chamberlain
2021-06-21 23:36 ` [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal Luis Chamberlain
2021-06-22  7:41   ` Greg KH
2021-06-22 15:27     ` Luis Chamberlain
2021-06-22 16:27       ` Greg KH
2021-06-22 16:40         ` Luis Chamberlain
2021-06-22 16:51           ` Greg KH
2021-06-22 17:00             ` Luis Chamberlain
2021-06-22  7:45   ` Greg KH
2021-06-22 16:32     ` Luis Chamberlain [this message]
2021-06-22 17:16       ` Greg KH
2021-06-22 17:27         ` Luis Chamberlain
2021-06-22 18:05           ` Greg KH
2021-06-22 19:57             ` Luis Chamberlain
2021-06-21 23:36 ` [PATCH v3 3/3] drivers/base/core: refcount kobject and bus on device attribute read / store Luis Chamberlain
2021-06-22  7:46   ` Greg KH
2021-06-22 16:44     ` Luis Chamberlain
2021-06-22 16:48       ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210622163208.epx4lf3pv2x2d5b4@garbanzo \
    --to=mcgrof@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=jeyu@kernel.org \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbenes@suse.com \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).