linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Khazhy Kumykov <khazhy@google.com>
Cc: Jan Kara <jack@suse.cz>, Paolo Valente <paolo.valente@linaro.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] bfq: silence lockdep for bfqd/ioc lock inversion
Date: Thu, 15 Apr 2021 12:47:22 +0200	[thread overview]
Message-ID: <20210415104722.GB25217@quack2.suse.cz> (raw)
In-Reply-To: <CACGdZYJAoTjE3bbXS8ATRT6R3WHdY8P+q989RjKM_z_J7wpvyg@mail.gmail.com>

On Wed 14-04-21 11:33:14, Khazhy Kumykov wrote:
> On Wed, Apr 14, 2021 at 2:54 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 18-03-21 23:00:15, Khazhismel Kumykov wrote:
> > > lockdep warns of circular locking due to inversion between
> > > bfq_insert_requests and bfq_exit_icq. If we end freeing a request when
> > > merging, we *may* grab an ioc->lock if that request is the last refcount
> > > to that ioc. bfq_bio_merge also potentially could have this ordering.
> > > bfq_exit_icq, conversely, grabs bfqd but is always called with ioc->lock
> > > held.
> > >
> > > bfq_exit_icq may either be called from put_io_context_active with ioc
> > > refcount raised, ioc_release_fn after the last refcount was already
> > > dropped, or ioc_clear_queue, which is only called while queue is
> > > quiesced or exiting, so the inverted orderings should never conflict.
> > >
> > > Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as
> > > an extra scheduler")
> > >
> > > Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> >
> > I've just hit the same lockdep complaint. When looking at this another
> > option to solve this complaint seemed to be to modify bfq_bio_merge() like:
> >
> >         ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
> >
> > +       spin_unlock_irq(&bfqd->lock);
> >         if (free)
> >                 blk_mq_free_request(free);
> > -       spin_unlock_irq(&bfqd->lock);
> >
> >         return ret;
> >
> > to release request outside of bfqd->lock. Because AFAICT there's no good
> > reason why we are actually freeing the request under bfqd->lock. And it
> > would seem a bit safer than annotating-away the lockdep complaint (as much
> > as I don't see a problem with your analysis). Paolo?
> 
> If we can re-order the locking so we don't need the annotation, that
> seems better ("inversion is OK so long as either we're frozen or we
> have ioc refcount, and we only grab ioc->lock normally if we drop the
> last refcount" is a tad "clever"). Though we still need to deal with
> blk_mq_sched_try_insert_merge which can potentially free a request.

I see, right.

> (See the first stacktrace). Something simple that I wasn't sure of is:
> could we delay bfq_exit_icq work, then avoid the inversion? Simpler to
> analyze then.

That's problematic because ICQ (referencing BFQQs etc.) is going to be
freed after RCU grace period expires. So we cannot really postpone the
teardown of bfq_io_cq. What we could do is to modify
blk_mq_sched_try_insert_merge() so that it returns request to free
similarly to blk_mq_sched_try_merge().  Then we can free the request after
dropping bfqd->lock.

								Honza

> > > ---
> > >  block/bfq-iosched.c | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > Noticed this lockdep running xfstests (generic/464) on top of a bfq
> > > block device. I was also able to tease it out w/ binary trying to issue
> > > requests that would end up merging while rapidly swapping the active
> > > scheduler. As far as I could see, the deadlock would not actually occur,
> > > so this patch opts to change lock class for the inverted case.
> > >
> > > bfqd -> ioc :
> > > [ 2995.524557] __lock_acquire+0x18f5/0x2660
> > > [ 2995.524562] lock_acquire+0xb4/0x3a0
> > > [ 2995.524565] _raw_spin_lock_irqsave+0x3f/0x60
> > > [ 2995.524569] put_io_context+0x33/0x90.  -> ioc->lock grabbed
> > > [ 2995.524573] blk_mq_free_request+0x51/0x140
> > > [ 2995.524577] blk_put_request+0xe/0x10
> > > [ 2995.524580] blk_attempt_req_merge+0x1d/0x30
> > > [ 2995.524585] elv_attempt_insert_merge+0x56/0xa0
> > > [ 2995.524590] blk_mq_sched_try_insert_merge+0x4b/0x60
> > > [ 2995.524595] bfq_insert_requests+0x9e/0x18c0.    -> bfqd->lock grabbed
> > > [ 2995.524598] blk_mq_sched_insert_requests+0xd6/0x2b0
> > > [ 2995.524602] blk_mq_flush_plug_list+0x154/0x280
> > > [ 2995.524606] blk_finish_plug+0x40/0x60
> > > [ 2995.524609] ext4_writepages+0x696/0x1320
> > > [ 2995.524614] do_writepages+0x1c/0x80
> > > [ 2995.524621] __filemap_fdatawrite_range+0xd7/0x120
> > > [ 2995.524625] sync_file_range+0xac/0xf0
> > > [ 2995.524642] __x64_sys_sync_file_range+0x44/0x70
> > > [ 2995.524646] do_syscall_64+0x31/0x40
> > > [ 2995.524649] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > ioc -> bfqd
> > > [ 2995.524490] _raw_spin_lock_irqsave+0x3f/0x60
> > > [ 2995.524498] bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
> > > [ 2995.524512] put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
> > > [ 2995.524516] exit_io_context+0x48/0x50
> > > [ 2995.524519] do_exit+0x7e9/0xdd0
> > > [ 2995.524526] do_group_exit+0x54/0xc0
> > > [ 2995.524530] __x64_sys_exit_group+0x18/0x20
> > > [ 2995.524534] do_syscall_64+0x31/0x40
> > > [ 2995.524537] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > Another trace where we grab ioc -> bfqd through bfq_exit_icq is when
> > > changing elevator
> > >                -> #1 (&(&bfqd->lock)->rlock){-.-.}:
> > > [  646.890820]        lock_acquire+0x9b/0x140
> > > [  646.894868]        _raw_spin_lock_irqsave+0x3b/0x50
> > > [  646.899707]        bfq_exit_icq_bfqq+0x47/0x1f0
> > > [  646.904196]        bfq_exit_icq+0x21/0x30
> > > [  646.908160]        ioc_destroy_icq+0xf3/0x130
> > > [  646.912466]        ioc_clear_queue+0xb8/0x140
> > > [  646.916771]        elevator_switch_mq+0xa4/0x3c0
> > > [  646.921333]        elevator_switch+0x5f/0x340
> > >
> > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> > > index 95586137194e..cb50ac0ffe80 100644
> > > --- a/block/bfq-iosched.c
> > > +++ b/block/bfq-iosched.c
> > > @@ -5027,7 +5027,14 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
> > >       if (bfqq && bfqd) {
> > >               unsigned long flags;
> > >
> > > -             spin_lock_irqsave(&bfqd->lock, flags);
> > > +             /* bfq_exit_icq is usually called with ioc->lock held, which is
> > > +              * inverse order from elsewhere, which may grab ioc->lock
> > > +              * under bfqd->lock if we merge requests and drop the last ioc
> > > +              * refcount. Since exit_icq is either called with a refcount,
> > > +              * or with queue quiesced, use a differnet lock class to
> > > +              * silence lockdep
> > > +              */
> > > +             spin_lock_irqsave_nested(&bfqd->lock, flags, 1);
> > >               bfqq->bic = NULL;
> > >               bfq_exit_bfqq(bfqd, bfqq);
> > >               bic_set_bfqq(bic, NULL, is_sync);
> > > --
> > > 2.31.0.rc2.261.g7f71774620-goog
> > >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2021-04-15 10:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19  6:00 [PATCH] bfq: silence lockdep for bfqd/ioc lock inversion Khazhismel Kumykov
2021-04-14  9:54 ` Jan Kara
2021-04-14 18:33   ` Khazhy Kumykov
2021-04-15 10:47     ` Jan Kara [this message]
2021-05-05 13:35       ` Paolo Valente
2021-05-05 19:51         ` Khazhy Kumykov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210415104722.GB25217@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=khazhy@google.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).