From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: NeilBrown <neilb@suse.com>, Mike Snitzer <snitzer@redhat.com>,
	Jens Axboe <axboe@kernel.dk>,
	Jack Wang <jinpu.wang@profitbricks.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Pavel Machek <pavel@ucw.cz>
Subject: Re: blk: improve order of bio handling in generic_make_request()
Date: Wed, 8 Mar 2017 18:15:04 +0100
Message-ID: <CANr6vz_DMrVLAkCV+nhGHy21mGEAupocz+cJMK+mFdMJqUGawA@mail.gmail.com>
In-Reply-To: <alpine.LRH.2.02.1703081129080.17825@file01.intranet.prod.int.rdu2.redhat.com>

On 8 March 2017 at 17:40, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> On Wed, 8 Mar 2017, NeilBrown wrote:
> > I don't think this will fix the DM snapshot deadlock by itself.
> > Rather, it makes it possible for some internal changes to DM to fix it.
> > The DM change might be something vaguely like:
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 3086da5664f3..06ee0960e415 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
> >
> >       len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
> >
> > +     if (len < ci->sector_count) {
> > +             struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
>
> fs_bio_set is a shared bio set, so it is prone to deadlocks. For this
> change, we would need two bio sets per dm device, one for the split bio
> and one for the outgoing bio. (this also means having one more kernel
> thread per dm device)
>
> It would be possible to avoid having two bio sets if the incoming bio were
> the same as the outgoing bio (allocate a small structure, move bi_end_io
> and bi_private into it, replace bi_end_io and bi_private with pointers to
> device mapper and send the bio to the target driver), but it would need
> many more changes - basically rewrite the whole bio handling code in dm.c
> and in the targets.
>
> Mikulas
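
To make the single-bio variant quoted above concrete, here is a minimal
userspace sketch, not kernel code. Only the field names bi_end_io and
bi_private come from the text above; struct toy_bio, struct dm_hook,
dm_hook_bio(), dm_end_io() and submitter_end_io() are hypothetical names
made up for illustration, and malloc() merely stands in for whatever
allocator the real code would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct bio: only the two fields discussed above. */
struct toy_bio {
	void (*bi_end_io)(struct toy_bio *bio);
	void *bi_private;
};

/* Hypothetical "small structure" that keeps the submitter's completion state. */
struct dm_hook {
	void (*orig_end_io)(struct toy_bio *bio);
	void *orig_private;
};

static int dm_completions;	/* number of completions seen by the hook */

/* Completion installed by the mapper: restore the saved fields, then chain. */
static void dm_end_io(struct toy_bio *bio)
{
	struct dm_hook *hook = bio->bi_private;

	assert(hook != NULL);
	dm_completions++;
	bio->bi_end_io = hook->orig_end_io;
	bio->bi_private = hook->orig_private;
	free(hook);
	if (bio->bi_end_io)
		bio->bi_end_io(bio);
}

/* Take over the bio's completion; returns 0 on success, -1 if allocation fails. */
static int dm_hook_bio(struct toy_bio *bio)
{
	struct dm_hook *hook = malloc(sizeof(*hook));

	if (!hook)
		return -1;
	hook->orig_end_io = bio->bi_end_io;
	hook->orig_private = bio->bi_private;
	bio->bi_end_io = dm_end_io;
	bio->bi_private = hook;
	return 0;
}

/* Example submitter completion: bi_private points at an int "done" flag. */
static void submitter_end_io(struct toy_bio *bio)
{
	*(int *)bio->bi_private = 1;
}
```

The incoming bio is the outgoing bio here, so no second bio is allocated;
the cost is the extra small allocation per bio, which in the kernel would
itself need a deadlock-safe source.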

"back then" (see previously posted link into ML archive)
I suggested this:

...

A bit of conflict here may be that DM has all its own
split and clone and queue magic, and wants to process
"all of the bio" before returning back to generic_make_request().

To change that, __split_and_process_bio() and all its helpers
would need to learn to "push back" (pieces of) the bio they are
currently working on, and not push back via "DM_ENDIO_REQUEUE",
but by bio_list_add_head(&current->bio_lists->queue, piece_to_be_done_later).

Then, after they have processed each piece, they would
*return* all the way up to the top-level generic_make_request(),
where the recursion-to-iteration logic would then
make sure that all deeper-level bios, submitted via
recursive calls to generic_make_request(), will be processed before the
next, pushed back, piece of the "original incoming" bio.

And *not* do their own iteration over all pieces first.

Probably not as easy as dropping the while loop,
using bio_advance, and pushing that "advanced" bio back to
current->...queue?

static void __split_and_process_bio(struct mapped_device *md,
				    struct dm_table *map, struct bio *bio)
...
	ci.bio = bio;
	ci.sector_count = bio_sectors(bio);
	/* currently: iterate over all pieces right here */
	while (ci.sector_count && !error)
		error = __split_and_process_non_flush(&ci);
...
	/* instead: process one piece, push the remainder back, return */
	error = __split_and_process_non_flush(&ci);
	if (ci.sector_count) {
		bio_advance();
		bio_list_add_head(&current->bio_lists->queue, );
	}
...

Something like that, maybe?
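
To show the processing order this would give, here is a toy userspace
model, not kernel code. It assumes a two-list scheme: bios submitted by
recursive calls go on a "recursion" list that is drained before the
push-back "queue" list. That split, and all names below (toy_bio,
toy_lists, stacked_make_request(), run_demo(), MAX_IO_LEN), are
assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_IO_LEN 4	/* hypothetical per-target limit, in sectors */
#define POOL_SIZE 32	/* toy allocator: at most this many clones */

struct toy_bio {
	int level;		/* 0: stacked (dm-like) device, 1: underlying device */
	int sector, count;
	struct toy_bio *next;
};
struct toy_lists { struct toy_bio *recursion, *queue; };

static struct toy_lists *current_lists;	/* stands in for current->bio_lists */
static struct toy_bio pool[POOL_SIZE];
static int pool_used;
static char trace[1024];
static size_t trace_len;

static void toy_generic_make_request(struct toy_bio *bio);

static void list_add_tail(struct toy_bio **head, struct toy_bio *bio)
{
	while (*head)
		head = &(*head)->next;
	bio->next = NULL;
	*head = bio;
}

static void list_add_head(struct toy_bio **head, struct toy_bio *bio)
{
	bio->next = *head;
	*head = bio;
}

static struct toy_bio *list_pop(struct toy_bio **head)
{
	struct toy_bio *bio = *head;
	if (bio)
		*head = bio->next;
	return bio;
}

static void trace_add(const char *what, const struct toy_bio *bio)
{
	int n = snprintf(trace + trace_len, sizeof(trace) - trace_len, "%s%s:%d+%d",
			 trace_len ? " " : "", what, bio->sector, bio->count);
	assert(n > 0 && (size_t)n < sizeof(trace) - trace_len);
	trace_len += (size_t)n;
}

/* Stacked device: map ONE piece, push the remainder back, then return. */
static void stacked_make_request(struct toy_bio *bio)
{
	int len = bio->count < MAX_IO_LEN ? bio->count : MAX_IO_LEN;
	struct toy_bio *clone;

	assert(pool_used < POOL_SIZE);
	clone = &pool[pool_used++];
	*clone = (struct toy_bio){ .level = 1, .sector = bio->sector, .count = len };
	trace_add("map", clone);
	toy_generic_make_request(clone);	/* lands on the recursion list */
	bio->sector += len;			/* toy stand-in for bio_advance() */
	bio->count -= len;
	if (bio->count)
		list_add_head(&current_lists->queue, bio);
}

static void toy_generic_make_request(struct toy_bio *bio)
{
	struct toy_lists lists = { NULL, NULL };

	if (current_lists) {			/* recursive call: only queue it */
		list_add_tail(&current_lists->recursion, bio);
		return;
	}
	current_lists = &lists;
	do {
		if (bio->level == 0)
			stacked_make_request(bio);
		else
			trace_add("dev", bio);	/* underlying device sees the I/O */
		bio = list_pop(&lists.recursion);	/* deeper bios first */
		if (!bio)
			bio = list_pop(&lists.queue);	/* then pushed-back pieces */
	} while (bio);
	current_lists = NULL;
}

/* Submit one bio of nr_sectors to the stacked device; return the event order. */
static const char *run_demo(int nr_sectors)
{
	static struct toy_bio top;

	pool_used = 0;
	trace_len = 0;
	memset(trace, 0, sizeof(trace));
	top = (struct toy_bio){ .level = 0, .sector = 0, .count = nr_sectors };
	toy_generic_make_request(&top);
	return trace;
}
```

For a 10-sector bio, run_demo(10) yields
"map:0+4 dev:0+4 map:4+4 dev:4+4 map:8+2 dev:8+2": each piece reaches
the underlying device before the next piece of the original bio is even
mapped, so at most one clone is outstanding at any time.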


Needs to be adapted to this new and improved recursion-to-iteration
logic, obviously.  Would that be doable, or does device-mapper for some
reason really *need* its own iteration loop (which, because it is called
from generic_make_request(), won't be able to ever submit anything to
any device, ever, so needs all these helper threads just in case)?

    Lars

