linux-raid.vger.kernel.org archive mirror
From: Danny Shih <dannyshih@synology.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: axboe@kernel.dk, agk@redhat.com, dm-devel@redhat.com,
	song@kernel.org, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] Fix order when split bio and send remaining back to itself
Date: Thu, 31 Dec 2020 16:28:55 +0800
Message-ID: <a318fd04-4f8c-2aec-2a58-18f456c98ef0@synology.com>
In-Reply-To: <20201230233429.GA6456@redhat.com>


Mike Snitzer writes:
>> submit_bio_noacct_add_head() in block device layer when we want to
>> split bio and send remaining back to itself.
> Ordering aside, you cannot split more than once.  So your proposed fix
> to insert at head isn't valid because you're still implicitly allocating
> more than one bio from the bioset which could cause deadlock in a low
> memory situation.
>
> I had to deal with a comparable issue with DM core not too long ago, see
> this commit:
>
> commit ee1dfad5325ff1cfb2239e564cd411b3bfe8667a
> Author: Mike Snitzer <snitzer@redhat.com>
> Date:   Mon Sep 14 13:04:19 2020 -0400
>
>      dm: fix bio splitting and its bio completion order for regular IO
>
>      dm_queue_split() is removed because __split_and_process_bio() _must_
>      handle splitting bios to ensure proper bio submission and completion
>      ordering as a bio is split.
>
>      Otherwise, multiple recursive calls to ->submit_bio will cause multiple
>      split bios to be allocated from the same ->bio_split mempool at the same
>      time. This would result in deadlock in low memory conditions because no
>      progress could be made (only one bio is available in ->bio_split
>      mempool).
>
>      This fix has been verified to still fix the loss of performance, due
>      to excess splitting, that commit 120c9257f5f1 provided.
>
>      Fixes: 120c9257f5f1 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
>      Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
>      Reported-by: Ming Lei <ming.lei@redhat.com>
>      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>
> Basically you cannot split the same bio more than once without
> recursing.  Your elaborate documentation shows things going wrong quite
> early in step 3.  That additional split and recursing back to MD
> shouldn't happen before the first bio split completes.
>
> Seems the proper fix is to disallow max_sectors_kb to be imposed, via
> blk_queue_split(), if MD has further splitting constraints, via
> chunk_sectors, that negate max_sectors_kb anyway.
>
> Mike


Hi Mike,

I think you're right that a driver should not split the same bio more
than once without recursing when using the same mempool.

If a driver splits a bio only once, the out-of-order issue no longer exists.
(Therefore, this problem won't occur on DM devices.)

But MD devices do their own splitting from a private bioset
(mddev->bio_set or conf->bio_split), which is not the same bioset
that blk_queue_split() uses (i.e. q->bio_split). The deadlock you
mentioned might therefore not happen to them.
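
To make the mempool argument concrete, here is a toy user-space model
(plain Python, not kernel code). ToyPool, try_alloc() and split_twice()
are hypothetical names made up for this sketch, and the reserve of one
element is an assumption taken from the commit message quoted above. It
only shows why a second split from the same exhausted pool cannot make
progress while the first split is outstanding, whereas a split from a
separate pool can:

```python
class ToyPool:
    """Toy stand-in for a bioset mempool under memory pressure.

    Assumption: only the reserved elements are available, and an
    allocation that finds the reserve empty would block forever.
    """

    def __init__(self, name, reserve=1):
        self.name = name
        self.free = reserve

    def try_alloc(self):
        # Return False where a real mempool allocation would sleep.
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self):
        self.free += 1


def split_twice(first_pool, second_pool):
    """Allocate two split bios without completing the first one.

    Returns True if both allocations succeed, False if the second
    one would block while the first is still held.
    """
    held = []
    for pool in (first_pool, second_pool):
        if not pool.try_alloc():
            for p in held:
                p.release()
            return False
        held.append(pool)
    for p in held:
        p.release()
    return True


shared = ToyPool("q->bio_split")
same_pool_ok = split_twice(shared, shared)
private_pool_ok = split_twice(ToyPool("q->bio_split"),
                              ToyPool("mddev->bio_set"))
print(same_pool_ok, private_pool_ok)
```

In this model the same-pool case fails and the private-pool case
succeeds, which is the reason I say the deadlock "might not" apply to
MD. It says nothing about other ways the real code could stall.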

I think there are two solutions:

1. In case MD devices want to switch to q->bio_split someday without
    hitting this out-of-order issue, making them split only once would
    be a solution.

2. If MD devices should keep splitting the bio twice, so that the limits
    in blk_queue_split() and those of each RAID level (raid0, raid5,
    raid1, ...) are handled separately, I will try to find another
    solution.

    After reconsidering the problem, I no longer think my proposal is
    suitable:

    Suppose a bio is split into an A part and a B part.

    +------|------+
    |   A  |   B  |
    +------|------+

    I think a driver should make sure the A part is always handled
    before the B part. Inserting bios at the head of current->bio_list
    as they are submitted while handling the A part does make the bios
    generated from the A part be handled before the B part. However, it
    breaks the order among the bios generated from the A part.

    (Maybe I should find a way to put the B part at the head of
    bio_list_on_stack[1] while submitting it...)
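
The ordering problem described above can be sketched with a toy model
(plain Python, not the kernel's actual submit_bio_noacct() logic). The
names handle_a(), submit_children() and the labels A1..A3 are
hypothetical. It assumes the B part is already queued and the A part
generates three child bios in order:

```python
from collections import deque


def submit_children(pending, children, at_head):
    """Queue each child bio at the head or the tail of the pending list."""
    for child in children:
        if at_head:
            pending.appendleft(child)
        else:
            pending.append(child)


def drain(pending):
    """Return the order in which the pending bios would be handled."""
    order = []
    while pending:
        order.append(pending.popleft())
    return order


def handle_a(at_head):
    # Assumption: the remainder (B part) is already on the list when the
    # driver starts generating child bios for the A part.
    pending = deque(["B"])
    submit_children(pending, ["A1", "A2", "A3"], at_head)
    return drain(pending)


tail_order = handle_a(at_head=False)
head_order = handle_a(at_head=True)
print(tail_order)
print(head_order)
```

With tail insertion the model handles B before A1..A3 (the original
problem). With head insertion A's children do run before B, but in
reverse order, because each new child lands in front of the previous
one.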

Thanks for your comments.
I will try to figure out a better way to fix it in the next version.

Best regards,
Danny Shih


Thread overview: 13+ messages
2020-12-29  9:18 [PATCH 0/4] Fix order when split bio and send remaining back to itself dannyshih
2020-12-29  9:18 ` [PATCH 1/4] block: introduce submit_bio_noacct_add_head dannyshih
2020-12-30  0:00   ` John Stoffel
2020-12-30  9:51     ` Danny Shih
2020-12-30 17:06       ` John Stoffel
2020-12-30 17:53         ` antlists
2020-12-30 11:35     ` antlists
2020-12-30 16:53       ` John Stoffel
2020-12-29  9:18 ` [PATCH 2/4] block: use submit_bio_noacct_add_head for split bio sending back dannyshih
2020-12-29  9:18 ` [PATCH 3/4] dm: " dannyshih
2020-12-29  9:18 ` [PATCH 4/4] md: " dannyshih
2020-12-30 23:34 ` [PATCH 0/4] Fix order when split bio and send remaining back to itself Mike Snitzer
2020-12-31  8:28   ` Danny Shih [this message]
