From: Minchan Kim <minchan@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: keescook@chromium.org, dhowells@redhat.com, hch@infradead.org,
mbenes@suse.com, gregkh@linuxfoundation.org, ngupta@vflare.org,
sergey.senozhatsky.work@gmail.com, axboe@kernel.dk,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate
Date: Tue, 6 Apr 2021 18:23:46 -0700 [thread overview]
Message-ID: <YG0JouWqrJPHbpqz@google.com> (raw)
In-Reply-To: <20210406002909.GY4332@42.do-not-panic.com>
On Tue, Apr 06, 2021 at 12:29:09AM +0000, Luis Chamberlain wrote:
> On Mon, Apr 05, 2021 at 12:58:05PM -0700, Minchan Kim wrote:
> > On Mon, Apr 05, 2021 at 07:00:23PM +0000, Luis Chamberlain wrote:
> > > On Mon, Apr 05, 2021 at 10:07:24AM -0700, Minchan Kim wrote:
> > > > On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> > > > > And come to think of it the last patch I had sent with a new
> > > > > DECLARE_RWSEM(zram_unload) also has this same issue making most
> > > > > sysfs attributes rather fragile.
> > > >
> > > > Thanks for looking into this. I agree the single zram_index_rwlock is
> > > > not the right approach to fix it. However, I still hope we can find a
> > > > more generic solution to fix them all at once, since I see it as a zram
> > > > instance racing problem.
> > >
> > > They are 3 separate different problems. Related, but different.
> >
> > What are 3 different problems? I am asking since I remember only two:
> > one for CPU multistate and the other one for sysfs during rmmod.
>
> The third one is the race to use sysfs attributes while those routines
> then dereference the gendisk private_data.
First of all, thanks for keeping the discussion going, Luis.
That is the one I thought of as the race between sysfs and rmmod.
>
> > > If the idea then is to busy out rmmod if a sysfs attribute is being
> > > read, that could then mean rmmod can sometimes never complete. Hogging
> > > up / busying out sysfs attributes means the module cannot be removed.
> >
> > It's true, but is it a big problem? There are many cases where the
> > system just returns an error if it's busy and relies on the admin.
> > IMHO, rmmod should be one of them.
>
> It depends on the existing userspace scripts which are used to test, and
> the expectations they set. You would know the existing tests better, and
> since you are the maintainer, you decide.
>
> I at least know that for many other types of device drivers, rmmod is
> a sledgehammer.
>
> You decide. I just thought it would be good to highlight the effect now
> rather than us considering it later.
To me, the rmmod failure is not a big problem for zram, since returning
-EBUSY is a common pattern in the system. (Having said that, I agree it
would be best if we could avoid the fail-and-retry; IOW, -EBUSY should be
a last resort unless we have a nicer way.)
>
> > > Which is why the *try_module_get()* I think is much more suitable, as
> > > it will always fail if we're already going down.
> >
> > How does try_module_get() solve the problem?
>
> The try stuff only resolves the deadlock. The bdget() / bdput() resolves
> the race to access the gendisk private_data.
That's the one I missed in this discussion. Now I am reading your [2/2]
in the original patch. I thought it was just that the zram instance was
destroyed by the sysfs race, and that was why you had seen the deadlock.
I might be missing the point here, too.
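For readers following along, the bdget()/bdput() part of the idea can be
illustrated with a userspace toy model. This is a sketch only, not kernel
code: every name below (toy_disk, toy_bdget, and so on) is invented, and
the real chain of calls in the kernel is the del_gendisk() path quoted
later in this mail. The property it models is that once teardown has run,
a "get" fails, so a sysfs-style handler can never dereference stale
private_data:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model (NOT kernel code; every name here is invented)
 * of the bdget()/bdput() idea. */

struct toy_disk {
	int refcount;        /* stands in for the inode refcount   */
	bool dead;           /* set by the del_gendisk() analogue  */
	void *private_data;  /* stands in for gendisk private_data */
};

/* Analogue of bdget(): refuse to hand out a reference after teardown. */
static bool toy_bdget(struct toy_disk *d)
{
	if (d->dead)
		return false;
	d->refcount++;
	return true;
}

/* Analogue of bdput(): drop the reference again. */
static void toy_bdput(struct toy_disk *d)
{
	d->refcount--;
}

/* Analogue of del_gendisk() -> invalidate_inodes(): after this point
 * private_data is gone and toy_bdget() must fail. */
static void toy_del_gendisk(struct toy_disk *d)
{
	d->dead = true;
	d->private_data = NULL;
}

/* A sysfs-show-style reader: only touches private_data under a ref,
 * and bails out cleanly if it raced with teardown. */
static void *toy_show(struct toy_disk *d)
{
	void *p;

	if (!toy_bdget(d))
		return NULL;  /* raced with teardown: no deref, no crash */
	p = d->private_data;
	toy_bdput(d);
	return p;
}
```

Before toy_del_gendisk() runs, toy_show() returns the live private_data;
afterwards it returns NULL instead of dereferencing freed memory, which is
the property the real bdget() gives the sysfs handlers.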
Hmm, we are discussing several problems all at once. I feel it's time
to jump to a v2 with your approach at this point. You said there are
three different problems. As I asked, please write them down in more
detail with the code sequences, as we did in the other thread. If you
mean a deadlock, please spell out which specific locks deadlock.
That would make the discussion much easier. Let's discuss the issues
one by one, each in its own thread.
>
> > > > I see one of the problems is how I could make a new zram object's
> > > > attribute group for the zram knobs under /sys/block/zram0, since the
> > > > block layer already made the zram0 kobject via device_add_disk().
> > >
> > > Right.. well the sysfs attribute races uncovered here actually do
> > > apply to any block driver as well, which is why I was aiming
> > > for something generic if possible.
> >
> > It would be great, but that's not what we have atm, so I want to
> > proceed with a fix anyway.
>
> What is not the one we have atm? I *do* have a proposed generic solution
> for 2 of the 3 issues we have been discussing:
>
> a) deadlock on sysfs access
This is the one I didn't understand.
> b) gendisk private_data race
Yup.
>
> But so far Greg does not see enough justification for a), so we can either
> show how widespread this issue is (which I can do using coccinelle), or we
> just open code the try_module_get() / put() in each driver that needs it
> for now. Either way it would resolve the issue.
If it's a general problem for drivers, I agree it's worth addressing in
the core, unless the driver itself introduced the mess. I have no
opinion here yet, since I haven't understood the problem.
>
> As for b), given that I think even you had missed my attempt to
> generalize the bdget()/bdput() solution for any attribute type (did you
> see my dev_type_get() and dev_type_put() proposed changes?), I don't think
> this problem is yet well defined in a generic way for us to rule it out.
> It is, however, easier to simply open code this per driver that needs it
> for now, given that I don't think Greg is yet convinced the deadlock is
> a widespread issue. I am, however, pretty sure both races *do*
> exist outside of zram in many places.
That would be a good reason to propose a general solution.
>
> > > I am not sure if you missed the last hunks of the generic solution,
> > > but that would resolve the issue you noted. Here is the same approach
> > > but in a non-generic solution, specific to just one attribute so far
> > > and to zram:
> >
> > So the idea is to refcount the block_device's inode,
>
> Yes, that itself prevents races where the gendisk private_data becomes
> invalid. Why? Because a bdget() will not succeed after
> del_gendisk():
>
> del_gendisk() --> invalidate_partition() --> __invalidate_device() --> invalidate_inodes()
>
> > and the module_exit path
> > also checks the inode refcount to make rmmod fail?
>
> The try_module_get() approach resolves the deadlock race, but it does
> so in a lazy way. I mean lazy in that rmmod then wins over the sysfs knobs,
> so touching sysfs knobs won't make an rmmod fail. I think that's the more
> typically expected behaviour. Why? Because I find it odd that looping
> forever touching sysfs attributes should prevent a module removal. But
> that's a personal preference.
I agree with you that would be better, but let's see how cleanly it
can be done.
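The "lazy" semantics Luis describes (rmmod wins; a late sysfs access just
fails instead of pinning the module) can also be sketched as a userspace
toy model. Again, this is not kernel code: all names are invented, and -19
merely stands in for -ENODEV:

```c
#include <stdbool.h>

/* Userspace toy model (NOT kernel code; names invented) of the
 * try_module_get() semantics: once module exit has begun, further
 * "gets" fail, so a sysfs handler errors out rather than blocking
 * rmmod -- i.e. rmmod wins. */

struct toy_module {
	int refcount;  /* users currently inside a sysfs handler    */
	bool going;    /* set once the rmmod analogue has started   */
};

/* Analogue of try_module_get(): fails if the module is going away. */
static bool toy_try_module_get(struct toy_module *m)
{
	if (m->going)
		return false;
	m->refcount++;
	return true;
}

/* Analogue of module_put(). */
static void toy_module_put(struct toy_module *m)
{
	m->refcount--;
}

/* Analogue of the rmmod path: mark the module as going away. New
 * handler invocations fail from now on; in-flight ones drain out. */
static void toy_rmmod_begin(struct toy_module *m)
{
	m->going = true;
}

/* A sysfs-store-style handler; -19 plays the role of -ENODEV. */
static int toy_store(struct toy_module *m)
{
	if (!toy_try_module_get(m))
		return -19;  /* module going away: fail, don't block rmmod */
	/* ... the real attribute work would happen here ... */
	toy_module_put(m);
	return 0;
}
```

A loop hammering the attribute can never keep the refcount pinned across
toy_rmmod_begin(): the first store after it returns the error and removal
proceeds, which is exactly the behaviour being argued for above.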
FYI, please look at hot_remove_store(), which can also remove a zram
instance on demand. I am looking forward to seeing your v2.
Thanks for your patience, Luis.