linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kent Overstreet <koverstreet@google.com>
To: linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com
Cc: Kent Overstreet <koverstreet@google.com>,
	tj@kernel.org, vgoyal@redhat.com, mpatocka@redhat.com,
	bharrosh@panasas.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers
Date: Tue, 28 Aug 2012 10:37:36 -0700	[thread overview]
Message-ID: <1346175456-1572-10-git-send-email-koverstreet@google.com> (raw)
In-Reply-To: <1346175456-1572-1-git-send-email-koverstreet@google.com>

Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 fs/bio.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/bio.h |  9 +++++++
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 31e637a..5d46318 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -285,6 +285,23 @@ void bio_reset(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_reset);
 
+static void bio_alloc_rescue(struct work_struct *work)
+{
+	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
+	struct bio *bio;
+
+	while (1) {
+		spin_lock(&bs->rescue_lock);
+		bio = bio_list_pop(&bs->rescue_list);
+		spin_unlock(&bs->rescue_lock);
+
+		if (!bio)
+			break;
+
+		generic_make_request(bio);
+	}
+}
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -307,6 +324,7 @@ EXPORT_SYMBOL(bio_reset);
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+	gfp_t saved_gfp = gfp_mask;
 	unsigned front_pad;
 	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
@@ -324,13 +342,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 		front_pad = 0;
 		inline_vecs = nr_iovecs;
 	} else {
+		/*
+		 * generic_make_request() converts recursion to iteration; this
+		 * means if we're running beneath it, any bios we allocate and
+		 * submit will not be submitted (and thus freed) until after we
+		 * return.
+		 *
+		 * This exposes us to a potential deadlock if we allocate
+		 * multiple bios from the same bio_set() while running
+		 * underneath generic_make_request(). If we were to allocate
+		 * multiple bios (say a stacking block driver that was splitting
+		 * bios), we would deadlock if we exhausted the mempool's
+		 * reserve.
+		 *
+		 * We solve this, and guarantee forward progress, with a rescuer
+		 * workqueue per bio_set. If we go to allocate and there are
+		 * bios on current->bio_list, we first try the allocation
+		 * without __GFP_WAIT; if that fails, we punt those bios we
+		 * would be blocking to the rescuer workqueue before we retry
+		 * with the original gfp_flags.
+		 */
+
+		if (current->bio_list && !bio_list_empty(current->bio_list))
+			gfp_mask &= ~__GFP_WAIT;
+retry:
 		p = mempool_alloc(bs->bio_pool, gfp_mask);
 		front_pad = bs->front_pad;
 		inline_vecs = BIO_INLINE_VECS;
 	}
 
 	if (unlikely(!p))
-		return NULL;
+		goto err;
 
 	bio = p + front_pad;
 	bio_init(bio);
@@ -351,6 +393,19 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 
 err_free:
 	mempool_free(p, bs->bio_pool);
+err:
+	if (gfp_mask != saved_gfp) {
+		gfp_mask = saved_gfp;
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_merge(&bs->rescue_list, current->bio_list);
+		bio_list_init(current->bio_list);
+		spin_unlock(&bs->rescue_lock);
+
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		goto retry;
+	}
+
 	return NULL;
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
@@ -1562,6 +1617,9 @@ static void biovec_free_pools(struct bio_set *bs)
 
 void bioset_free(struct bio_set *bs)
 {
+	if (bs->rescue_workqueue)
+		destroy_workqueue(bs->rescue_workqueue);
+
 	if (bs->bio_pool)
 		mempool_destroy(bs->bio_pool);
 
@@ -1597,6 +1655,10 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
 
 	bs->front_pad = front_pad;
 
+	spin_lock_init(&bs->rescue_lock);
+	bio_list_init(&bs->rescue_list);
+	INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
+
 	bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
 	if (!bs->bio_slab) {
 		kfree(bs);
@@ -1607,9 +1669,14 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
 	if (!bs->bio_pool)
 		goto bad;
 
-	if (!biovec_create_pools(bs, pool_size))
-		return bs;
+	if (biovec_create_pools(bs, pool_size))
+		goto bad;
+
+	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
+	if (!bs->rescue_workqueue)
+		goto bad;
 
+	return bs;
 bad:
 	bioset_free(bs);
 	return NULL;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3a8345e..84fdaac 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -492,6 +492,15 @@ struct bio_set {
 	mempool_t *bio_integrity_pool;
 #endif
 	mempool_t *bvec_pool;
+
+	/*
+	 * Deadlock avoidance for stacking block drivers: see comments in
+	 * bio_alloc_bioset() for details
+	 */
+	spinlock_t		rescue_lock;
+	struct bio_list		rescue_list;
+	struct work_struct	rescue_work;
+	struct workqueue_struct	*rescue_workqueue;
 };
 
 struct biovec_slab {
-- 
1.7.12


  parent reply	other threads:[~2012-08-28 17:38 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-28 17:37 [PATCH v7 0/9] Block cleanups, deadlock fix Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 1/9] block: Generalized bio pool freeing Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 2/9] dm: Use bioset's front_pad for dm_rq_clone_bio_info Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 3/9] block: Add bio_reset() Kent Overstreet
2012-08-28 20:31   ` Tejun Heo
2012-08-28 22:17     ` Kent Overstreet
2012-08-28 22:53       ` Kent Overstreet
2012-09-01  2:23       ` Tejun Heo
2012-09-05 20:13         ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 4/9] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
2012-08-28 20:32   ` Tejun Heo
2012-08-28 22:24     ` Kent Overstreet
2012-09-04  9:05     ` Jiri Kosina
2012-09-05 19:44       ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 5/9] block: Kill bi_destructor Kent Overstreet
2012-08-28 20:36   ` Tejun Heo
2012-08-28 22:07     ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 6/9] block: Consolidate bio_alloc_bioset(), bio_kmalloc() Kent Overstreet
2012-08-28 20:41   ` Tejun Heo
2012-08-28 22:03     ` Kent Overstreet
2012-09-01  2:17       ` Tejun Heo
2012-08-28 17:37 ` [PATCH v7 7/9] block: Add bio_clone_bioset(), bio_clone_kmalloc() Kent Overstreet
2012-08-28 20:44   ` Tejun Heo
2012-08-28 22:05     ` Kent Overstreet
2012-09-01  2:19       ` Tejun Heo
2012-08-28 17:37 ` [PATCH v7 8/9] block: Reorder struct bio_set Kent Overstreet
2012-08-28 17:37 ` Kent Overstreet [this message]
2012-08-28 20:49   ` [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers Tejun Heo
2012-08-28 22:28     ` Kent Overstreet
2012-08-28 23:01       ` Kent Overstreet
2012-08-29  1:31         ` Vivek Goyal
2012-08-29  3:25           ` Kent Overstreet
2012-08-29 12:57             ` Vivek Goyal
2012-08-29 14:39               ` [dm-devel] " Alasdair G Kergon
2012-08-29 16:26                 ` Kent Overstreet
2012-08-29 21:01                   ` John Stoffel
2012-08-29 21:08                     ` Kent Overstreet
2012-08-28 22:06   ` Vivek Goyal
2012-08-28 22:23     ` Kent Overstreet
2012-08-29 16:24   ` Mikulas Patocka
2012-08-29 16:50     ` Kent Overstreet
2012-08-29 16:57       ` [dm-devel] " Alasdair G Kergon
2012-08-29 17:07       ` Vivek Goyal
2012-08-29 17:13         ` Kent Overstreet
2012-08-29 17:23           ` [dm-devel] " Alasdair G Kergon
2012-08-29 17:32             ` Kent Overstreet
2012-08-30 22:07           ` Vivek Goyal
2012-08-31  1:43             ` Kent Overstreet
2012-08-31  1:55               ` Kent Overstreet
2012-08-31 15:01               ` Vivek Goyal
2012-09-03  1:26                 ` Kent Overstreet
2012-09-03 20:41               ` Mikulas Patocka
2012-09-04  3:41                 ` Kent Overstreet
2012-09-04 18:55                   ` Tejun Heo
2012-09-04 19:01                     ` Tejun Heo
2012-09-04 19:43                       ` Kent Overstreet
2012-09-04 19:42                     ` Kent Overstreet
2012-09-04 21:03                       ` Tejun Heo
2012-09-04 19:26                   ` Mikulas Patocka
2012-09-04 19:39                     ` Vivek Goyal
2012-09-04 19:51                     ` [PATCH] dm: Use bioset's front_pad for dm_target_io Kent Overstreet
2012-09-04 21:20                       ` Tejun Heo
2012-09-11 19:28                       ` [PATCH 2] " Mikulas Patocka
2012-09-11 19:50                         ` Kent Overstreet
2012-09-12 22:31                           ` Mikulas Patocka
2012-09-14 23:09                             ` [dm-devel] " Alasdair G Kergon
2012-09-01  2:13             ` [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers Tejun Heo
2012-09-03  1:34               ` [PATCH v2] " Kent Overstreet
2012-09-04 15:00               ` [PATCH v7 9/9] " Vivek Goyal
2012-09-03  0:49             ` Dave Chinner
2012-09-03  1:17               ` Kent Overstreet
2012-09-04 13:54               ` Vivek Goyal
2012-09-04 18:26                 ` Tejun Heo
2012-09-05  3:57                   ` Dave Chinner
2012-09-05  4:37                     ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1346175456-1572-10-git-send-email-koverstreet@google.com \
    --to=koverstreet@google.com \
    --cc=axboe@kernel.dk \
    --cc=bharrosh@panasas.com \
    --cc=dm-devel@redhat.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).