linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Minchan Kim <minchan@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 2/4] zram: don't fail to remove zram during unloading module
Date: Thu, 21 Oct 2021 16:50:55 -0700	[thread overview]
Message-ID: <YXH836VvSpnBnvuK@bombadil.infradead.org> (raw)
In-Reply-To: <20211020015548.2374568-3-ming.lei@redhat.com>

On Wed, Oct 20, 2021 at 09:55:46AM +0800, Ming Lei wrote:
> When zram module is started to be unloaded, no one should use all zram
> disks at that time. But disk's show() or store() attribute method may be
> running, this way is expected because device_del()(called from
> del_gendisk) will drain all pending show()/store().

How about:

----------
When the zram module is being unloaded, no one should be using the
zram disks. However even while being unloaded the zram module's
sysfs attributes might be poked at to re-configure the zram module.
This is expected, and kernfs ensures that these operations complete
before device_del() completes.
----------

> But reset_store() may set ->claim which will fail zram_remove(), when
> this happens, the zram device will be leaked and the warning of 'Error:
> Removing state 63 which has instances left.' is triggered during
> unloading module.

Note: the "Removing state 63 which has instances left" does not
necessarily mean the struct zram is leaked. It just means that the per
cpu struct zcomp instances are leaked, given the CPU hotplug removal
callback is in charge of cleaning them up. That this gave us a hint that
we do in fact leak a struct zram device as well is separate from what
the message means, but I do agree it should be *that* because as you
noted, LTP does not yet try to make spaghetti with hot_add_show().

So, how about:

----------
When the reset sysfs op (reset_store()) gets called the zram->claim will
be set, and this prevents zram_remove() from removing a zram device. If one
is unloading the module and one races to run the reset sysfs op we end
up leaking the struct zram device. We learned about this issue through
the error "Error: Removing state 63 which has instances left". While
this just means the any of the per cpu struct zcomp instances are
leaked when we're removing the CPU hotplug multistate callback, the
test case (running LTP zram02.sh twice in different terminals) that
allows us to reproduce this issue only mucks with reseting the devices,
not hot_add_show(). Setting zram->claim will prevent zram_remove() from
completing successfully, and so on module removal zram_remove_cb() does
not tell us when it failed to remove the full struct zram device and
it leaks.
----------

This begs the question, should we not then also make zram_remove_cb()
chatty on failure?

> Fixes the issue by not failing zram_remove() if ->claim is set, and
> we actually need to do nothing in case that zram_reset() is running
> since del_gendisk() will wait until zram_reset() is done.

Sure this all looks like a reasonable alternative to the issue without
using a generic lock. It does beg the questions if this suffices for
the other oddball sysfs ops, but we can take that up with time.

> Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

In so far as the code is concerned:

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis

  reply	other threads:[~2021-10-21 23:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20  1:55 [PATCH V2 0/4] zram: fix two races and one zram leak Ming Lei
2021-10-20  1:55 ` [PATCH V2 1/4] zram: fix race between zram_reset_device() and disksize_store() Ming Lei
2021-10-21 23:03   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 2/4] zram: don't fail to remove zram during unloading module Ming Lei
2021-10-21 23:50   ` Luis Chamberlain [this message]
2021-10-22  0:38     ` Ming Lei
2021-10-20  1:55 ` [PATCH V2 3/4] zram: avoid race between zram_remove and disksize_store Ming Lei
2021-10-22 19:02   ` Luis Chamberlain
2021-10-20  1:55 ` [PATCH V2 4/4] zram: replace fsync_bdev with sync_blockdev Ming Lei
2021-10-22 19:06   ` Luis Chamberlain
2021-10-20 21:40 ` [PATCH V2 0/4] zram: fix two races and one zram leak Minchan Kim
2021-10-21 17:39   ` Luis Chamberlain
2021-10-21 17:46     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXH836VvSpnBnvuK@bombadil.infradead.org \
    --to=mcgrof@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).