From: Kent Overstreet <kent.overstreet@gmail.com>
To: Ming Lei <ming.lei@canonical.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
"oleg.drokin@intel.com" <oleg.drokin@intel.com>,
Ming Lin-SSI <ming.l@ssi.samsung.com>,
"andreas.dilger@intel.com" <andreas.dilger@intel.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"minchan@kernel.org" <minchan@kernel.org>,
"jkosina@suse.cz" <jkosina@suse.cz>,
kernel list <linux-kernel@vger.kernel.org>,
"jim@jtan.com" <jim@jtan.com>,
"pjk1939@linux.vnet.ibm.com" <pjk1939@linux.vnet.ibm.com>,
"axboe@fb.com" <axboe@fb.com>,
"geoff@infradead.org" <geoff@infradead.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
"dpark@posteo.net" <dpark@posteo.net>,
Pavel Machek <pavel@ucw.cz>,
"ngupta@vflare.org" <ngupta@vflare.org>,
"hch@lst.de" <hch@lst.de>, "agk@redhat.com" <agk@redhat.com>
Subject: Re: 4.4-final: 28 bioset threads on small notebook
Date: Tue, 23 Feb 2016 18:23:52 -0900 [thread overview]
Message-ID: <20160224032352.GA22341@kmo-pixel> (raw)
In-Reply-To: <CACVXFVN+xV5_4zktCGpQUrnOtJ1EVOLPt5v4niuj9VUGL8uUEA@mail.gmail.com>
On Wed, Feb 24, 2016 at 10:48:10AM +0800, Ming Lei wrote:
> On Tue, Feb 23, 2016 at 10:54 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> > On Mon, Feb 22 2016 at 9:55pm -0500,
> > Ming Lei <ming.lei@canonical.com> wrote:
> >
> >> On Tue, Feb 23, 2016 at 6:58 AM, Kent Overstreet
> >> <kent.overstreet@gmail.com> wrote:
> >> > On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote:
> >> >> On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI <ming.l@ssi.samsung.com> wrote:
> >> >> >>-----Original Message-----
> >> >> >
> >> >> > So it's almost already "per request_queue"
> >> >>
> >> >> Yes, that is because of the following line:
> >> >>
> >> >> q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
> >> >>
> >> >> in blk_alloc_queue_node().
> >> >>
> >> >> Looks like this bio_set doesn't need to be per-request_queue, and
> >> >> now it is only used for fast-cloning bio for splitting, and one global
> >> >> split bio_set should be enough.
> >> >
> >> > It does have to be per request queue for stacking block devices (which includes
> >> > loopback).
> >>
> >> In commit df2cb6daa4(block: Avoid deadlocks with bio allocation by
> >> stacking drivers), deadlock in this situation has been avoided already.
> >> Or are there other issues with global bio_set? I appreciate if you may
> >> explain it a bit if there are.
> >
> > Even with commit df2cb6daa4 there is still risk of deadlocks (even
> > without low memory condition), see:
> > https://patchwork.kernel.org/patch/7398411/
>
> That is definitely another problem which isn't related with low memory,
> and I guess Kent means there might be deadlock risk in case of shared
> bio_set.
>
> >
> > (you may recall you blocked this patch with concerns about performance,
> > context switches, plug merging being compromised, etc.. to which I never
> > circled back to verify your concerns)
>
> I still remember that problem:
>
> 1) Process A
> - two bio(a, b) are splitted in dm's make_request funtion
> - bio(a) is submitted via generic_make_request(), so it is staged
> in current->bio_list
> - time t1
> - before bio(b) is submitted, down_write(&s->lock) is run and
> never return
>
> 2) Process B:
> - just during time t1, wait completion of bio(a) by down_write(&s->lock)
>
> Then Process A waits the lock which is acquired by B first, and the
> two bio(a, b)
> can't reach to driver/device at all.
>
> Looks that current->bio_list is fragile to locks from make_request function,
> and moving the lock into workqueue context should be helpful.
>
> And I am happy to continue to discuss this issue further.
>
> >
> > But it illustrates the type of problems that can occur when your rescue
> > infrastructure is shared across devices (in the context of df2cb6daa4,
> > current->bio_list contains bios from multiple devices).
> >
> > If a single splitting bio_set were shared across devices there would be
> > no guarantee of forward progress with complex stacked devices (one or
> > more devices could exhaust the reserve and starve out other devices in
> > the stack). So keeping the bio_set per request_queue isn't prone to
> > failure like a shared bio_set might be.
>
> Not consider the dm lock problem, from Kent's commit(df2cb6daa4) log and
> the patch, looks forward progress can be guaranteed for stacked devices
> with same bio_set, but better to get Kent's clarification.
>
> If forward progress can be guaranteed, percpu mempool might avoid
> easy exhausting, because it is reasonable to assume that one CPU can only
> provide a certain amount of bandwidth wrt. block transfer.
Generally speaking, with potential deadlocks like this I don't bother to work
out the specific scenario, it's enough to know that there's a shared resource
and multiple users that depend on each other... if you've got that, you'll have
a deadlock.
But, if you're curious: say we've got block devices a and b, when you submit to
a the bio will get passed down to b:
for the bioset itself: if a bio gets split when submitted to a, then needs to be
split again when it's submitted to b - you're allocating twice from the same
mempool, and the first allocation can't be freed until the original bio
completes. deadlock.
with the rescuer threads it's more subtle, but you just need a scenario where
the rescuer is required twice in a row. I'm not going to bother trying to work
out the details, but it's the same principle - you can end up in a situation
where you're blocked, and you need the rescuer thread to make forward progress
(or you'd deadlock - that's why it exists, right?) - well, what happens if that
happens twice in a row, and the second time you're running out of the rescuer
thread? oops.
next prev parent reply other threads:[~2016-02-24 3:23 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-11 10:49 4.4-rc: 28 bioset threads on small notebook Pavel Machek
2015-12-11 14:08 ` Mike Snitzer
2015-12-11 17:14 ` Pavel Machek
2016-02-20 17:40 ` 4.4-final: " Pavel Machek
2016-02-20 18:42 ` Pavel Machek
2016-02-20 19:51 ` Mike Snitzer
2016-02-20 20:04 ` Pavel Machek
2016-02-20 20:38 ` Mike Snitzer
2016-02-20 20:55 ` Pavel Machek
2016-02-21 4:15 ` Kent Overstreet
2016-02-21 6:43 ` Ming Lin-SSI
2016-02-21 9:40 ` Ming Lei
2016-02-22 22:58 ` Kent Overstreet
2016-02-23 2:55 ` Ming Lei
2016-02-23 14:54 ` Mike Snitzer
2016-02-24 2:48 ` Ming Lei
2016-02-24 3:23 ` Kent Overstreet [this message]
2016-02-23 20:45 ` Pavel Machek
2017-02-06 12:53 ` v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone Pavel Machek
2017-02-07 1:47 ` Kent Overstreet
2017-02-07 2:49 ` Kent Overstreet
2017-02-07 17:13 ` Mike Snitzer
2017-02-07 20:39 ` Pavel Machek
2017-02-08 3:12 ` Mike Galbraith
2017-02-08 4:58 ` Kent Overstreet
2017-02-08 6:22 ` [PATCH] block: Make rescuer threads per request_queue, not per bioset kbuild test robot
2017-02-08 6:23 ` kbuild test robot
2017-02-08 6:57 ` v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone Mike Galbraith
2017-02-08 16:34 ` Mike Snitzer
2017-02-09 21:25 ` Kent Overstreet
2017-02-14 16:34 ` [dm-devel] " Mikulas Patocka
2017-02-14 17:33 ` Mike Snitzer
2017-02-08 2:47 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160224032352.GA22341@kmo-pixel \
--to=kent.overstreet@gmail.com \
--cc=agk@redhat.com \
--cc=andreas.dilger@intel.com \
--cc=axboe@fb.com \
--cc=dm-devel@redhat.com \
--cc=dpark@posteo.net \
--cc=geoff@infradead.org \
--cc=hch@lst.de \
--cc=jim@jtan.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=minchan@kernel.org \
--cc=ming.l@ssi.samsung.com \
--cc=ming.lei@canonical.com \
--cc=ngupta@vflare.org \
--cc=oleg.drokin@intel.com \
--cc=pavel@ucw.cz \
--cc=pjk1939@linux.vnet.ibm.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).