All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Minchan Kim <minchan@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH V2 2/4] zram: don't fail to remove zram during unloading module
Date: Fri, 22 Oct 2021 08:38:28 +0800	[thread overview]
Message-ID: <YXIIBPMdA547wy1b@T590> (raw)
In-Reply-To: <YXH836VvSpnBnvuK@bombadil.infradead.org>

On Thu, Oct 21, 2021 at 04:50:55PM -0700, Luis Chamberlain wrote:
> On Wed, Oct 20, 2021 at 09:55:46AM +0800, Ming Lei wrote:
> > When zram module is started to be unloaded, no one should use all zram
> > disks at that time. But disk's show() or store() attribute method may be
> > running, this way is expected because device_del()(called from
> > del_gendisk) will drain all pending show()/store().
> 
> How about:
> 
> ----------
> When the zram module is being unloaded, no one should be using the
> zram disks. However even while being unloaded the zram module's
> sysfs attributes might be poked at to re-configure the zram module.

're-configure the zram module' is confused, and it should be zram
device.

> This is expected, and kernfs ensures that these operations complete
> before device_del() completes.

Otherwise, the above is pretty good, and I will take that in V3.

> ----------
> 
> > But reset_store() may set ->claim which will fail zram_remove(), when
> > this happens, the zram device will be leaked and the warning of 'Error:
> > Removing state 63 which has instances left.' is triggered during
> > unloading module.
> 
> Note: the "Removing state 63 which has instances left" does not
> necessarily mean the struct zram is leaked. It just means that the per
> cpu struct zcomp instances are leaked, given the CPU hotplug removal
> callback is in charge of cleaning them up. That this gave us a hint that
> we do in fact leak a struct zram device as well is separate from what
> the message means, but I do agree it should be *that* because as you
> noted, LTP does not yet try to make spaghetti with hot_add_show().
> 
> So, how about:
> 
> ----------
> When the reset sysfs op (reset_store()) gets called the zram->claim will
> be set, and this prevents zram_remove() from removing a zram device. If one
> is unloading the module and one races to run the reset sysfs op we end
> up leaking the struct zram device. We learned about this issue through
> the error "Error: Removing state 63 which has instances left". While
> this just means the any of the per cpu struct zcomp instances are
> leaked when we're removing the CPU hotplug multistate callback, the
> test case (running LTP zram02.sh twice in different terminals) that
> allows us to reproduce this issue only mucks with reseting the devices,
> not hot_add_show(). Setting zram->claim will prevent zram_remove() from
> completing successfully, and so on module removal zram_remove_cb() does
> not tell us when it failed to remove the full struct zram device and
> it leaks.
> ----------

commit log is for helping people to understand the change, and too long or
exposing too much details may not serve that purpose since we also talk with
code, and I believe the following words are clear enough for explaining the
problem, and it is short & straightforward, and won't make people terrified, :-)

	But reset_store() may set ->claim which will fail zram_remove(), when
	this happens, zram_reset_device() is bypassed, and zram->comp can't
	be destroyed, so the warning of 'Error: Removing state 63 which has
	instances left.' is triggered during unloading module.

> 
> This begs the question, should we not then also make zram_remove_cb()
> chatty on failure?

When zram_remove() is run from module unloading path, it shouldn't be
failed cause no one owns any zram disk since unloading module, that is
why I add WARN_ON_ONCE() in zram_remove_cb() for document benefit because
zram_remove() is called from two contexts. But it can be failed in case of
hot removing.

> 
> > Fixes the issue by not failing zram_remove() if ->claim is set, and
> > we actually need to do nothing in case that zram_reset() is running
> > since del_gendisk() will wait until zram_reset() is done.
> 
> Sure this all looks like a reasonable alternative to the issue without
> using a generic lock. It does beg the questions if this suffices for
> the other oddball sysfs ops, but we can take that up with time.

->claim is only set in reset_store(), so it is enough for avoiding
zram_remove failure during unloading module.

> 
> > Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> In so far as the code is concerned:
> 
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

Thanks for the review!

-- 
Ming


  reply	other threads:[~2021-10-22  0:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20  1:55 [PATCH V2 0/4] zram: fix two races and one zram leak Ming Lei
2021-10-20  1:55 ` [PATCH V2 1/4] zram: fix race between zram_reset_device() and disksize_store() Ming Lei
2021-10-21 23:03   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 2/4] zram: don't fail to remove zram during unloading module Ming Lei
2021-10-21 23:50   ` Luis Chamberlain
2021-10-22  0:38     ` Ming Lei [this message]
2021-10-20  1:55 ` [PATCH V2 3/4] zram: avoid race between zram_remove and disksize_store Ming Lei
2021-10-22 19:02   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 4/4] zram: replace fsync_bdev with sync_blockdev Ming Lei
2021-10-22 19:06   ` Luis Chamberlain
2021-10-20 21:40 ` [PATCH V2 0/4] zram: fix two races and one zram leak Minchan Kim
2021-10-21 17:39   ` Luis Chamberlain
2021-10-21 17:46     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXIIBPMdA547wy1b@T590 \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.