linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Minchan Kim <minchan@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH V2 2/4] zram: don't fail to remove zram during unloading module
Date: Fri, 22 Oct 2021 08:38:28 +0800	[thread overview]
Message-ID: <YXIIBPMdA547wy1b@T590> (raw)
In-Reply-To: <YXH836VvSpnBnvuK@bombadil.infradead.org>

On Thu, Oct 21, 2021 at 04:50:55PM -0700, Luis Chamberlain wrote:
> On Wed, Oct 20, 2021 at 09:55:46AM +0800, Ming Lei wrote:
> > When zram module is started to be unloaded, no one should use all zram
> > disks at that time. But disk's show() or store() attribute method may be
> > running, this way is expected because device_del()(called from
> > del_gendisk) will drain all pending show()/store().
> 
> How about:
> 
> ----------
> When the zram module is being unloaded, no one should be using the
> zram disks. However even while being unloaded the zram module's
> sysfs attributes might be poked at to re-configure the zram module.

're-configure the zram module' is confused, and it should be zram
device.

> This is expected, and kernfs ensures that these operations complete
> before device_del() completes.

Otherwise, the above is pretty good, and I will take that in V3.

> ----------
> 
> > But reset_store() may set ->claim which will fail zram_remove(), when
> > this happens, the zram device will be leaked and the warning of 'Error:
> > Removing state 63 which has instances left.' is triggered during
> > unloading module.
> 
> Note: the "Removing state 63 which has instances left" does not
> necessarily mean the struct zram is leaked. It just means that the per
> cpu struct zcomp instances are leaked, given the CPU hotplug removal
> callback is in charge of cleaning them up. That this gave us a hint that
> we do in fact leak a struct zram device as well is separate from what
> the message means, but I do agree it should be *that* because as you
> noted, LTP does not yet try to make spaghetti with hot_add_show().
> 
> So, how about:
> 
> ----------
> When the reset sysfs op (reset_store()) gets called the zram->claim will
> be set, and this prevents zram_remove() from removing a zram device. If one
> is unloading the module and one races to run the reset sysfs op we end
> up leaking the struct zram device. We learned about this issue through
> the error "Error: Removing state 63 which has instances left". While
> this just means the any of the per cpu struct zcomp instances are
> leaked when we're removing the CPU hotplug multistate callback, the
> test case (running LTP zram02.sh twice in different terminals) that
> allows us to reproduce this issue only mucks with reseting the devices,
> not hot_add_show(). Setting zram->claim will prevent zram_remove() from
> completing successfully, and so on module removal zram_remove_cb() does
> not tell us when it failed to remove the full struct zram device and
> it leaks.
> ----------

commit log is for helping people to understand the change, and too long or
exposing too much details may not serve that purpose since we also talk with
code, and I believe the following words are clear enough for explaining the
problem, and it is short & straightforward, and won't make people terrified, :-)

	But reset_store() may set ->claim which will fail zram_remove(), when
	this happens, zram_reset_device() is bypassed, and zram->comp can't
	be destroyed, so the warning of 'Error: Removing state 63 which has
	instances left.' is triggered during unloading module.

> 
> This begs the question, should we not then also make zram_remove_cb()
> chatty on failure?

When zram_remove() is run from module unloading path, it shouldn't be
failed cause no one owns any zram disk since unloading module, that is
why I add WARN_ON_ONCE() in zram_remove_cb() for document benefit because
zram_remove() is called from two contexts. But it can be failed in case of
hot removing.

> 
> > Fixes the issue by not failing zram_remove() if ->claim is set, and
> > we actually need to do nothing in case that zram_reset() is running
> > since del_gendisk() will wait until zram_reset() is done.
> 
> Sure this all looks like a reasonable alternative to the issue without
> using a generic lock. It does beg the questions if this suffices for
> the other oddball sysfs ops, but we can take that up with time.

->claim is only set in reset_store(), so it is enough for avoiding
zram_remove failure during unloading module.

> 
> > Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> In so far as the code is concerned:
> 
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

Thanks for the review!

-- 
Ming


  reply	other threads:[~2021-10-22  0:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20  1:55 [PATCH V2 0/4] zram: fix two races and one zram leak Ming Lei
2021-10-20  1:55 ` [PATCH V2 1/4] zram: fix race between zram_reset_device() and disksize_store() Ming Lei
2021-10-21 23:03   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 2/4] zram: don't fail to remove zram during unloading module Ming Lei
2021-10-21 23:50   ` Luis Chamberlain
2021-10-22  0:38     ` Ming Lei [this message]
2021-10-20  1:55 ` [PATCH V2 3/4] zram: avoid race between zram_remove and disksize_store Ming Lei
2021-10-22 19:02   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 4/4] zram: replace fsync_bdev with sync_blockdev Ming Lei
2021-10-22 19:06   ` Luis Chamberlain
2021-10-20 21:40 ` [PATCH V2 0/4] zram: fix two races and one zram leak Minchan Kim
2021-10-21 17:39   ` Luis Chamberlain
2021-10-21 17:46     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXIIBPMdA547wy1b@T590 \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).