From: NeilBrown <neilb@suse.com>
To: Mike Snitzer <snitzer@redhat.com>, Mikulas Patocka <mpatocka@redhat.com>
Cc: Jack Wang <jack.wang.usish@gmail.com>, Lars Ellenberg <lars.ellenberg@linbit.com>, Jens Axboe <axboe@kernel.dk>, linux-raid <linux-raid@vger.kernel.org>, Michael Wang <yun.wang@profitbricks.com>, Peter Zijlstra <peterz@infradead.org>, Jiri Kosina <jkosina@suse.cz>, Ming Lei <ming.lei@canonical.com>, linux-kernel@vger.kernel.org, Zheng Liu <gnehzuil.liu@gmail.com>, linux-block@vger.kernel.org, Takashi Iwai <tiwai@suse.de>, "linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>, Ingo Molnar <mingo@redhat.com>, Alasdair Kergon <agk@redhat.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, Keith Busch <keith.busch@intel.com>, device-mapper development <dm-devel@redhat.com>, Shaohua Li <shli@kernel.org>, Kent Overstreet <kent.overstreet@gmail.com>
Subject: Re: [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
Date: Sat, 07 Jan 2017 10:01:07 +1100
Message-ID: <877f67lrnw.fsf@notabene.neil.brown.name>
In-Reply-To: <20170106195216.GA15583@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4866 bytes --]

On Sat, Jan 07 2017, Mike Snitzer wrote:

> On Fri, Jan 06 2017 at 12:34pm -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>> On Fri, 6 Jan 2017, Mikulas Patocka wrote:
>>
>> > On Wed, 4 Jan 2017, Mike Snitzer wrote:
>> >
>> > > On Wed, Jan 04 2017 at 12:12am -0500,
>> > > NeilBrown <neilb@suse.com> wrote:
>> > >
>> > > > > Suggested-by: NeilBrown <neilb@suse.com>
>> > > > > Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
>> > > > > ---
>> > > > >  block/blk-core.c | 20 ++++++++++++++++++++
>> > > > >  1 file changed, 20 insertions(+)
>> > > > >
>> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
>> > > > > index 9e3ac56..47ef373 100644
>> > > > > --- a/block/blk-core.c
>> > > > > +++ b/block/blk-core.c
>> > > > > @@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
>> > > > > 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> > > > >
>> > > > > 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
>> > > > > +			struct bio_list lower, same, hold;
>> > > > > +
>> > > > > +			/* Create a fresh bio_list for all subordinate requests */
>> > > > > +			bio_list_init(&hold);
>> > > > > +			bio_list_merge(&hold, &bio_list_on_stack);
>> > > > > +			bio_list_init(&bio_list_on_stack);
>> > > > >
>> > > > > 			ret = q->make_request_fn(q, bio);
>> > > > >
>> > > > > 			blk_queue_exit(q);
>> > > > > +			/* sort new bios into those for a lower level
>> > > > > +			 * and those for the same level
>> > > > > +			 */
>> > > > > +			bio_list_init(&lower);
>> > > > > +			bio_list_init(&same);
>> > > > > +			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
>> > > > > +				if (q == bdev_get_queue(bio->bi_bdev))
>> > > > > +					bio_list_add(&same, bio);
>> > > > > +				else
>> > > > > +					bio_list_add(&lower, bio);
>> > > > > +			/* now assemble so we handle the lowest level first */
>> > > > > +			bio_list_merge(&bio_list_on_stack, &lower);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &same);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &hold);
>> > > > >
>> > > > > 			bio = bio_list_pop(current->bio_list);
>> > > > > 		} else {
>> > > > > --
>> > > > > 2.7.4
>> > >
>> > > Mikulas, would you be willing to try the below patch with the
>> > > dm-snapshot deadlock scenario and report back on whether it fixes that?
>> > >
>> > > Patch below looks to be the same as here:
>> > > https://marc.info/?l=linux-raid&m=148232453107685&q=p3
>> > >
>> > > Neil and/or others, if that isn't the patch that should be tested, please
>> > > provide a pointer to the latest.
>> > >
>> > > Thanks,
>> > > Mike
>> >
>> > The bad news is that this doesn't fix the snapshot deadlock.
>> >
>> > I created a test program for the snapshot deadlock bug (it was originally
>> > created years ago to test for a different bug, so it contains some cruft).
>> > You also need to insert "if (ci->sector_count) msleep(100);" at the end of
>> > __split_and_process_non_flush to make the kernel sleep when splitting the
>> > bio.
>> >
>> > And with the above patch, the snapshot deadlock bug still happens.
>
> That is really unfortunate.  Would be useful to dig in and understand
> why.  Because ordering of the IO in generic_make_request() really should
> take care of it.

I *think* you might be able to resolve this by changing
__split_and_process_bio() to only ever perform a single split.  No
looping.

i.e. if the bio is too big to handle directly, then split off the front
to a new bio, which you bio_chain to the original.  The original then
has bio_advance() called to step over the front, then
generic_make_request() so it is queued.

Then the code proceeds to __clone_and_map_data_bio() on the front that
got split off.  When that completes it *doesn't* loop round, but returns
into generic_make_request() which does the looping and makes sure to
handle the lowest-level bio next.

Something vaguely like this:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..06ee0960e415 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)

 	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);

+	if (len < ci->sector_count) {
+		struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
+
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		ci->sector_count = len;
+	}
+
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
 		return r;

though I haven't tested, and the change (if it works) should probably be
more fully integrated into the surrounding code.

You probably don't want to use "fs_bio_set" either - a target-local pool
would be best.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]