All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Ming Lei <ming.lei@canonical.com>, NeilBrown <neilb@suse.com>,
	linux-block <linux-block@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	"open list:SOFTWARE RAID (Multiple Disks) SUPPORT"
	<linux-raid@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiri Kosina <jkosina@suse.cz>,
	"open list:BCACHE (BLOCK LAYER CACHE)"
	<linux-bcache@vger.kernel.org>,
	Zheng Liu <gnehzuil.liu@gmail.com>,
	Keith Busch <keith.busch@intel.com>, Takashi Iwai <tiwai@suse.de>,
	"open list:DEVICE-MAPPER (LVM)" <dm-devel@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Shaohua Li <shli@kernel.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Alasdair Kergon <agk@redhat.com>,
	Roland Kammerer <roland.kammerer@linbit.com>
Subject: Re: block: fix blk_queue_split() resource exhaustion
Date: Fri, 8 Jul 2016 09:05:51 -0400	[thread overview]
Message-ID: <20160708130550.GB9196@redhat.com> (raw)
In-Reply-To: <20160708125225.GV13335@soda.linbit>

On Fri, Jul 08 2016 at  8:52am -0400,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Fri, Jul 08, 2016 at 07:08:32PM +0800, Ming Lei wrote:
> > > So after processing a particular bio, we should then process all the
> > > 'child' bios - bios send to underlying devices.  Then the 'sibling'
> > > bios, that were split off, and then any remaining parents and ancestors.
> > 
> > IMHO, that is just what the oneline patch is doing, isn't it?
> > 
> > | diff --git a/block/blk-core.c b/block/blk-core.c
> >  | index 2475b1c7..a5623f6 100644
> >  | --- a/block/blk-core.c
> >  | +++ b/block/blk-core.c
> >  | @@ -2048,7 +2048,7 @@ blk_qc_t generic_make_request(struct bio *bio)
> >  |       * should be added at the tail
> >  |       */
> >  |      if (current->bio_list) {
> >  | -            bio_list_add(current->bio_list, bio);
> >  | +            bio_list_add_head(current->bio_list, bio);
> >  |              goto out;
> >  |      }
> 
> Almost, but not quite.
> 
> As explained earlier, this will re-order.
> It will still process bios in "deepest level first" order,
> but it will process "sibling" bios in reverse submission order.
> 
> Think "very large bio" submitted to a stripe set
> with small stripe width/stripe unit size.
> 
> So I'd expect this to be a performance hit in some scenarios,
> unless the stack at some deeper level does back-merging in its elevator.
> (If some driver is not able to merge stuff because of "reverse submission
> order" this can easily mean saturating IOPS of the physical device with
> small requests, throttling bandwidth to a minimum.)
> 
> That's why I mentioned it as "potential easy fix for the deadlock",
> but did not suggest it as the proper way to fix this.
> 
> If however the powers that be decide that this was a non-issue,
> we could use it this way.

No, we cannot do this.  With blk-mq it doesn't have any of the more
elaborate IO scheduling that request_fn request_queues have.  We should
not be knowingly mangling the order of IO with the thought that some
other layer will fix it up.

I think it best for you to rebase your work (against jens' for-4.8/core)
into a single coherent patch and resubmit for 4.8 inclusion.  I really
don't see a huge benefit to keeping neilb's suggestion split out -- but
if you or others do that's fine.

The only concern I have relative to DM is: DM doesn't use
blk_queue_split, so will it need to open-code setting
recursion/remainder in order to ensure forward progress?  neilb seemed
to think the rework in generic_make_request would "just work" for the
dm-snapshot deadlock case though so maybe this isn't a valid
concern... unfortunately we don't have a quick reproducer for that
dm-snapshot issue so it'll take a bit to prove.

Mike

  reply	other threads:[~2016-07-08 13:05 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-22  8:22 [RFC] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-06-22  8:22 ` Lars Ellenberg
2016-06-24 11:36 ` Ming Lei
2016-06-24 11:36   ` Ming Lei
2016-06-24 14:27   ` Lars Ellenberg
2016-06-24 14:27     ` Lars Ellenberg
2016-06-24 15:15     ` Mike Snitzer
2016-06-24 15:15       ` Mike Snitzer
2016-06-28  8:24       ` Lars Ellenberg
2016-06-28  8:24         ` Lars Ellenberg
2016-06-25  9:30     ` [RFC] " Ming Lei
2016-06-25  9:30       ` Ming Lei
2016-06-28  8:45       ` Lars Ellenberg
2016-06-28  8:45         ` Lars Ellenberg
2016-07-02 10:03         ` Ming Lei
2016-07-02 10:03           ` Ming Lei
2016-07-02 10:28 ` Ming Lei
2016-07-02 10:28   ` Ming Lei
2016-07-04  8:20   ` Lars Ellenberg
2016-07-04  8:20     ` Lars Ellenberg
2016-07-04 10:47     ` Ming Lei
2016-07-04 10:47       ` Ming Lei
2016-07-06 12:38       ` Lars Ellenberg
2016-07-06 12:38         ` Lars Ellenberg
2016-07-06 15:57         ` Ming Lei
2016-07-06 15:57           ` Ming Lei
2016-07-07  8:03           ` Lars Ellenberg
2016-07-07  8:03             ` Lars Ellenberg
2016-07-07  8:03             ` Lars Ellenberg
2016-07-07 13:14             ` Ming Lei
2016-07-07 13:14               ` Ming Lei
2016-07-07  5:35 ` [dm-devel] " NeilBrown
2016-07-07  5:35   ` NeilBrown
2016-07-07  8:16   ` Lars Ellenberg
2016-07-07 12:39     ` Lars Ellenberg
2016-07-07 12:39       ` Lars Ellenberg
2016-07-07 12:47       ` Mike Snitzer
2016-07-07 22:07     ` [dm-devel] [RFC] " NeilBrown
2016-07-08  8:02       ` Lars Ellenberg
2016-07-08  9:39         ` NeilBrown
2016-07-08 13:00           ` Lars Ellenberg
2016-07-08 13:24             ` Re[2]: " Pavel Goran
2016-07-08 13:59             ` Lars Ellenberg
2016-07-08 11:08       ` Ming Lei
2016-07-08 11:08         ` Ming Lei
2016-07-08 11:08         ` Ming Lei
2016-07-08 12:52         ` Lars Ellenberg
2016-07-08 12:52           ` Lars Ellenberg
2016-07-08 13:05           ` Mike Snitzer [this message]
2016-07-08 13:05             ` Mike Snitzer
2016-07-07 12:45   ` Mike Snitzer
2016-07-07 22:40     ` NeilBrown
2016-07-07 14:36 ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160708130550.GB9196@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=gnehzuil.liu@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=keith.busch@intel.com \
    --cc=kent.overstreet@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=lars.ellenberg@linbit.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@canonical.com \
    --cc=mingo@redhat.com \
    --cc=neilb@suse.com \
    --cc=peterz@infradead.org \
    --cc=roland.kammerer@linbit.com \
    --cc=shli@kernel.org \
    --cc=tiwai@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.