linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Khazhy Kumykov <khazhy@google.com>
To: Jan Kara <jack@suse.cz>
Cc: Paolo Valente <paolo.valente@linaro.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] bfq: silence lockdep for bfqd/ioc lock inversion
Date: Wed, 14 Apr 2021 11:33:14 -0700	[thread overview]
Message-ID: <CACGdZYJAoTjE3bbXS8ATRT6R3WHdY8P+q989RjKM_z_J7wpvyg@mail.gmail.com> (raw)
In-Reply-To: <20210414095455.GA29760@quack2.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 5966 bytes --]

On Wed, Apr 14, 2021 at 2:54 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 18-03-21 23:00:15, Khazhismel Kumykov wrote:
> > lockdep warns of circular locking due to inversion between
> > bfq_insert_requests and bfq_exit_icq. If we end freeing a request when
> > merging, we *may* grab an ioc->lock if that request is the last refcount
> > to that ioc. bfq_bio_merge also potentially could have this ordering.
> > bfq_exit_icq, conversely, grabs bfqd but is always called with ioc->lock
> > held.
> >
> > bfq_exit_icq may either be called from put_io_context_active with ioc
> > refcount raised, ioc_release_fn after the last refcount was already
> > dropped, or ioc_clear_queue, which is only called while queue is
> > quiesced or exiting, so the inverted orderings should never conflict.
> >
> > Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as
> > an extra scheduler")
> >
> > Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
>
> I've just hit the same lockdep complaint. When looking at this another
> option to solve this complaint seemed to be to modify bfq_bio_merge() like:
>
>         ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
>
> +       spin_unlock_irq(&bfqd->lock);
>         if (free)
>                 blk_mq_free_request(free);
> -       spin_unlock_irq(&bfqd->lock);
>
>         return ret;
>
> to release request outside of bfqd->lock. Because AFAICT there's no good
> reason why we are actually freeing the request under bfqd->lock. And it
> would seem a bit safer than annotating-away the lockdep complaint (as much
> as I don't see a problem with your analysis). Paolo?

If we can re-order the locking so we don't need the annotation, that
seems better ("inversion is OK so long as either we're frozen or we
have ioc refcount, and we only grab ioc->lock normally if we drop the
last refcount" is a tad "clever"). Though we still need to deal with
blk_mq_sched_try_insert_merge which can potentially free a request.
(See the first stacktrace). Something simple that I wasn't sure of is:
could we delay bfq_exit_icq work, then avoid the inversion? Simpler to
analyze then.

Khazhy

>
>                                                                 Honza
> > ---
> >  block/bfq-iosched.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > Noticed this lockdep running xfstests (generic/464) on top of a bfq
> > block device. I was also able to tease it out w/ binary trying to issue
> > requests that would end up merging while rapidly swapping the active
> > scheduler. As far as I could see, the deadlock would not actually occur,
> > so this patch opts to change lock class for the inverted case.
> >
> > bfqd -> ioc :
> > [ 2995.524557] __lock_acquire+0x18f5/0x2660
> > [ 2995.524562] lock_acquire+0xb4/0x3a0
> > [ 2995.524565] _raw_spin_lock_irqsave+0x3f/0x60
> > [ 2995.524569] put_io_context+0x33/0x90.  -> ioc->lock grabbed
> > [ 2995.524573] blk_mq_free_request+0x51/0x140
> > [ 2995.524577] blk_put_request+0xe/0x10
> > [ 2995.524580] blk_attempt_req_merge+0x1d/0x30
> > [ 2995.524585] elv_attempt_insert_merge+0x56/0xa0
> > [ 2995.524590] blk_mq_sched_try_insert_merge+0x4b/0x60
> > [ 2995.524595] bfq_insert_requests+0x9e/0x18c0.    -> bfqd->lock grabbed
> > [ 2995.524598] blk_mq_sched_insert_requests+0xd6/0x2b0
> > [ 2995.524602] blk_mq_flush_plug_list+0x154/0x280
> > [ 2995.524606] blk_finish_plug+0x40/0x60
> > [ 2995.524609] ext4_writepages+0x696/0x1320
> > [ 2995.524614] do_writepages+0x1c/0x80
> > [ 2995.524621] __filemap_fdatawrite_range+0xd7/0x120
> > [ 2995.524625] sync_file_range+0xac/0xf0
> > [ 2995.524642] __x64_sys_sync_file_range+0x44/0x70
> > [ 2995.524646] do_syscall_64+0x31/0x40
> > [ 2995.524649] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > ioc -> bfqd
> > [ 2995.524490] _raw_spin_lock_irqsave+0x3f/0x60
> > [ 2995.524498] bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
> > [ 2995.524512] put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
> > [ 2995.524516] exit_io_context+0x48/0x50
> > [ 2995.524519] do_exit+0x7e9/0xdd0
> > [ 2995.524526] do_group_exit+0x54/0xc0
> > [ 2995.524530] __x64_sys_exit_group+0x18/0x20
> > [ 2995.524534] do_syscall_64+0x31/0x40
> > [ 2995.524537] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > Another trace where we grab ioc -> bfqd through bfq_exit_icq is when
> > changing elevator
> >                -> #1 (&(&bfqd->lock)->rlock){-.-.}:
> > [  646.890820]        lock_acquire+0x9b/0x140
> > [  646.894868]        _raw_spin_lock_irqsave+0x3b/0x50
> > [  646.899707]        bfq_exit_icq_bfqq+0x47/0x1f0
> > [  646.904196]        bfq_exit_icq+0x21/0x30
> > [  646.908160]        ioc_destroy_icq+0xf3/0x130
> > [  646.912466]        ioc_clear_queue+0xb8/0x140
> > [  646.916771]        elevator_switch_mq+0xa4/0x3c0
> > [  646.921333]        elevator_switch+0x5f/0x340
> >
> > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> > index 95586137194e..cb50ac0ffe80 100644
> > --- a/block/bfq-iosched.c
> > +++ b/block/bfq-iosched.c
> > @@ -5027,7 +5027,14 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
> >       if (bfqq && bfqd) {
> >               unsigned long flags;
> >
> > -             spin_lock_irqsave(&bfqd->lock, flags);
> > +             /* bfq_exit_icq is usually called with ioc->lock held, which is
> > +              * inverse order from elsewhere, which may grab ioc->lock
> > +              * under bfqd->lock if we merge requests and drop the last ioc
> > +              * refcount. Since exit_icq is either called with a refcount,
> > +              * or with queue quiesced, use a differnet lock class to
> > +              * silence lockdep
> > +              */
> > +             spin_lock_irqsave_nested(&bfqd->lock, flags, 1);
> >               bfqq->bic = NULL;
> >               bfq_exit_bfqq(bfqd, bfqq);
> >               bic_set_bfqq(bic, NULL, is_sync);
> > --
> > 2.31.0.rc2.261.g7f71774620-goog
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3996 bytes --]

  reply	other threads:[~2021-04-14 18:33 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19  6:00 [PATCH] bfq: silence lockdep for bfqd/ioc lock inversion Khazhismel Kumykov
2021-04-14  9:54 ` Jan Kara
2021-04-14 18:33   ` Khazhy Kumykov [this message]
2021-04-15 10:47     ` Jan Kara
2021-05-05 13:35       ` Paolo Valente
2021-05-05 19:51         ` Khazhy Kumykov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACGdZYJAoTjE3bbXS8ATRT6R3WHdY8P+q989RjKM_z_J7wpvyg@mail.gmail.com \
    --to=khazhy@google.com \
    --cc=axboe@kernel.dk \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).