linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@canonical.com>
To: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: linux-block <linux-block@vger.kernel.org>,
	Roland Kammerer <roland.kammerer@linbit.com>,
	Jens Axboe <axboe@kernel.dk>, NeilBrown <neilb@suse.com>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Shaohua Li <shli@kernel.org>, Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	"open list:DEVICE-MAPPER (LVM)" <dm-devel@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Takashi Iwai <tiwai@suse.de>, Jiri Kosina <jkosina@suse.cz>,
	Zheng Liu <gnehzuil.liu@gmail.com>,
	Keith Busch <keith.busch@intel.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"open list:BCACHE (BLOCK LAYER CACHE)"
	<linux-bcache@vger.kernel.org>,
	"open list:SOFTWARE RAID (Multiple Disks) SUPPORT" 
	<linux-raid@vger.kernel.org>
Subject: Re: [RFC] block: fix blk_queue_split() resource exhaustion
Date: Thu, 7 Jul 2016 21:14:59 +0800	[thread overview]
Message-ID: <CACVXFVNZD9Li9gf2r-jjPkSHin5XmSpWFseRt2-5SYdn4GYj0g@mail.gmail.com> (raw)
In-Reply-To: <20160707080328.GG13335@soda.linbit>

On Thu, Jul 7, 2016 at 4:03 PM, Lars Ellenberg
<lars.ellenberg@linbit.com> wrote:
> On Wed, Jul 06, 2016 at 11:57:51PM +0800, Ming Lei wrote:
>> > ==== my suggestion
>> >
>> > generic_make_request(bio_orig)
>> >                 NULL                    in-flight=0
>> > bio_orig        empty                   in-flight=0
>> > qA->make_request_fn(bio_orig)
>> >   blk_queue_split()
>> >   result:
>> >   bio_s, and bio_r stuffed away to head of remainder list.
>> >                                         in-flight=1
>> >   bio_c = bio_clone(bio_s)
>> >   generic_make_request(bio_c to qB)
>> >                 bio_c
>> > <-return
>> >                 bio_c
>> >   bio_list_pop()
>> >                 empty
>> > qB->make_request_fn(bio_c)
>> >   (Assume it does not clone, but only remap.
>> >   But it may also be a striping layer,
>> >   and queue more than one bio here.)
>> >   generic_make_request(bio_c to qC)
>> >                 bio_c
>> > <-return
>> >   bio_list_pop()
>> >                 empty
>> > qC->make_request_fn(bio_c)
>> >   generic_make_request(bio_c to qD)
>> >                 bio_c
>> > <-return
>> >   bio_list_pop()
>> >                 empty
>> > qD->make_request_fn(bio_c)
>> >         dispatches to hardware
>> > <-return
>> >                 empty
>> >    bio_list_pop()
>> >    NULL, great, lets pop from remainder list
>> > qA->make_request_fn(bio_r)              in-flight=?
>> >
>> >         May block, but only until completion of bio_c.
>> >         Which may already have happened.
>> >
>> >         *makes progress*
>>
>> I admit your solution is smart, but it isn't easy to prove it as correct
>> in theory.  But if the traversal can be mapped into pre-order traversal
>> of the above binary tree, it may be correct.
>
> What are you talking about.
> There is no tree.
> There is a single fifo.
> And I suggest to make that one fifo, and one lifo instead.

The implementation may use fifo(queue) or lifo(stack), but the traversal
order actually is pre-order on one binary tree if we think the splitted bio and
the cloned bio as left child, and the remainder bio as right child.

The big benifit of modeling the algorithem as tree is that the implementation
can be proved easily.

Your patch is very similar with non-recursive pre-order algorithem.

And the one line change approach is just the typical pre-order
recursive algorithem on binary tree.

Let's abstract the one line change into following pseudocode:

bio_list_add_head(current->bio_list, bio);

while (!bio_list_empty(current->bio_list)) {

       bio = bio_list_pop(current->bio_list);

        q->make_request_fn(q, bio);   //visit the node
}

q->make_request_fn(q, bio)
       -> blk_queue_split(split, bio)
              -> generic_make_request(remainder)
                    ->bio_list_add_head(current->bio_list, bio);
              -> push(split) & pop(split) & visit the split bio
                        ->generic_make_request(cloned_bio)
                              ->bio_list_add_head(current->bio_list, bio);

If you compare the above with the non-recursive pre-order traversal
implementation[1], it is basically same.

[1] http://algorithmsandme.in/2015/03/preorder-traversal-of-tree-without-recursion/

The main difference between oneline change and this patch is the submit
order in the following situation:

q->make_request_fn(bio)
          ->generic_make_request(cloned_bio0)
          ->generic_make_request(cloned_bio1)
          ->generic_make_request(cloned_bio2)

But the three BIOs are often submitted into different queues, so the order
may not make a big difference. Even they are submitted to same queue
(disk), the order change may not be a issue too because there is still
I/O scheduler.

And do we have stacked driver which submits several BIOs to same queue
in .make_request_fn()?

If there isn't such case, I suggest the oneline change, together with
the pre-order traversal comment since it is very simple.

Thanks,

>
>   |<------ original bio ----->|
>   |piece|----remainder--------|
>
>   |piece| is then processed, just as it was before,
>   all recursive submissions turned into iterative processing,
>   in the exact order they have been called recursively.
>   Until all deeper level submissions have been fully processed.
>
>   If deeper levels are again calling bio_queue_split, their
>   respective remainder are queued in front of the "top level"
>   remainder.
>
>   And only then, the remainders are processed,
>   just as if they did come in as "original bio", see above.
>
> So if it did make progress before,
> it will make progress now.
>
>     Lars
>

  reply	other threads:[~2016-07-07 13:15 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-22  8:22 [RFC] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-06-24 11:36 ` Ming Lei
2016-06-24 14:27   ` Lars Ellenberg
2016-06-24 15:15     ` Mike Snitzer
2016-06-28  8:24       ` Lars Ellenberg
2016-06-25  9:30     ` [RFC] " Ming Lei
2016-06-28  8:45       ` Lars Ellenberg
2016-07-02 10:03         ` Ming Lei
2016-07-02 10:28 ` Ming Lei
2016-07-04  8:20   ` Lars Ellenberg
2016-07-04 10:47     ` Ming Lei
2016-07-06 12:38       ` Lars Ellenberg
2016-07-06 15:57         ` Ming Lei
2016-07-07  8:03           ` Lars Ellenberg
2016-07-07 13:14             ` Ming Lei [this message]
2016-07-07  5:35 ` [dm-devel] " NeilBrown
2016-07-07  8:16   ` Lars Ellenberg
2016-07-07 12:39     ` Lars Ellenberg
2016-07-07 12:47       ` Mike Snitzer
2016-07-07 22:07     ` [dm-devel] [RFC] " NeilBrown
2016-07-08  8:02       ` Lars Ellenberg
2016-07-08  9:39         ` NeilBrown
2016-07-08 13:00           ` Lars Ellenberg
2016-07-08 13:59             ` Lars Ellenberg
2016-07-08 11:08       ` Ming Lei
2016-07-08 12:52         ` Lars Ellenberg
2016-07-08 13:05           ` Mike Snitzer
2016-07-07 12:45   ` Mike Snitzer
2016-07-07 22:40     ` NeilBrown
2016-07-07 14:36 ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACVXFVNZD9Li9gf2r-jjPkSHin5XmSpWFseRt2-5SYdn4GYj0g@mail.gmail.com \
    --to=ming.lei@canonical.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=gnehzuil.liu@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=keith.busch@intel.com \
    --cc=kent.overstreet@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=lars.ellenberg@linbit.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mingo@redhat.com \
    --cc=neilb@suse.com \
    --cc=peterz@infradead.org \
    --cc=roland.kammerer@linbit.com \
    --cc=shli@kernel.org \
    --cc=snitzer@redhat.com \
    --cc=tiwai@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).