All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mikulas Patocka <mpatocka@redhat.com>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com, tj@kernel.org, vgoyal@redhat.com,
	bharrosh@panasas.com, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers
Date: Wed, 29 Aug 2012 12:24:43 -0400 (EDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.1208291210180.774@file.rdu.redhat.com> (raw)
In-Reply-To: <1346175456-1572-10-git-send-email-koverstreet@google.com>

Hi

This fixes the bio allocation problems, but doesn't fix a similar deadlock 
in device mapper when allocating from md->io_pool or other mempools in 
the target driver.

The problem is that majority of device mapper code assumes that if we 
submit a bio, that bio will be finished in a finite time. The commit 
d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 in 2.6.22 broke this assumption.

I suggest - instead of writing workarounds for this current->bio_list 
misbehavior, why not remove current->bio_list at all? We could revert 
d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1, allocate a per-device workqueue, 
test stack usage in generic_make_request, and if it is too high (more than 
half of the stack used, or so), put the bio to the target device's 
blockqueue.

That could be simpler than allocating per-bioset workqueue and it also 
solves more problems (possible deadlocks in dm).

Mikulas



On Tue, 28 Aug 2012, Kent Overstreet wrote:

> Previously, if we ever try to allocate more than once from the same bio
> set while running under generic_make_request() (i.e. a stacking block
> driver), we risk deadlock.
> 
> This is because of the code in generic_make_request() that converts
> recursion to iteration; any bios we submit won't actually be submitted
> (so they can complete and eventually be freed) until after we return -
> this means if we allocate a second bio, we're blocking the first one
> from ever being freed.
> 
> Thus if enough threads call into a stacking block driver at the same
> time with bios that need multiple splits, and the bio_set's reserve gets
> used up, we deadlock.
> 
> This can be worked around in the driver code - we could check if we're
> running under generic_make_request(), then mask out __GFP_WAIT when we
> go to allocate a bio, and if the allocation fails punt to workqueue and
> retry the allocation.
> 
> But this is tricky and not a generic solution. This patch solves it for
> all users by inverting the previously described technique. We allocate a
> rescuer workqueue for each bio_set, and then in the allocation code if
> there are bios on current->bio_list we would be blocking, we punt them
> to the rescuer workqueue to be submitted.
> 
> Tested it by forcing the rescue codepath to be taken (by disabling the
> first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
> of arbitrary bio splitting) and verified that the rescuer was being
> invoked.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> CC: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/bio.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  include/linux/bio.h |  9 +++++++
>  2 files changed, 79 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index 31e637a..5d46318 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -285,6 +285,23 @@ void bio_reset(struct bio *bio)
>  }
>  EXPORT_SYMBOL(bio_reset);
>  
> +static void bio_alloc_rescue(struct work_struct *work)
> +{
> +	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
> +	struct bio *bio;
> +
> +	while (1) {
> +		spin_lock(&bs->rescue_lock);
> +		bio = bio_list_pop(&bs->rescue_list);
> +		spin_unlock(&bs->rescue_lock);
> +
> +		if (!bio)
> +			break;
> +
> +		generic_make_request(bio);
> +	}
> +}
> +
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -307,6 +324,7 @@ EXPORT_SYMBOL(bio_reset);
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  {
> +	gfp_t saved_gfp = gfp_mask;
>  	unsigned front_pad;
>  	unsigned inline_vecs;
>  	unsigned long idx = BIO_POOL_NONE;
> @@ -324,13 +342,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  		front_pad = 0;
>  		inline_vecs = nr_iovecs;
>  	} else {
> +		/*
> +		 * generic_make_request() converts recursion to iteration; this
> +		 * means if we're running beneath it, any bios we allocate and
> +		 * submit will not be submitted (and thus freed) until after we
> +		 * return.
> +		 *
> +		 * This exposes us to a potential deadlock if we allocate
> +		 * multiple bios from the same bio_set() while running
> +		 * underneath generic_make_request(). If we were to allocate
> +		 * multiple bios (say a stacking block driver that was splitting
> +		 * bios), we would deadlock if we exhausted the mempool's
> +		 * reserve.
> +		 *
> +		 * We solve this, and guarantee forward progress, with a rescuer
> +		 * workqueue per bio_set. If we go to allocate and there are
> +		 * bios on current->bio_list, we first try the allocation
> +		 * without __GFP_WAIT; if that fails, we punt those bios we
> +		 * would be blocking to the rescuer workqueue before we retry
> +		 * with the original gfp_flags.
> +		 */
> +
> +		if (current->bio_list && !bio_list_empty(current->bio_list))
> +			gfp_mask &= ~__GFP_WAIT;
> +retry:
>  		p = mempool_alloc(bs->bio_pool, gfp_mask);
>  		front_pad = bs->front_pad;
>  		inline_vecs = BIO_INLINE_VECS;
>  	}
>  
>  	if (unlikely(!p))
> -		return NULL;
> +		goto err;
>  
>  	bio = p + front_pad;
>  	bio_init(bio);
> @@ -351,6 +393,19 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  
>  err_free:
>  	mempool_free(p, bs->bio_pool);
> +err:
> +	if (gfp_mask != saved_gfp) {
> +		gfp_mask = saved_gfp;
> +
> +		spin_lock(&bs->rescue_lock);
> +		bio_list_merge(&bs->rescue_list, current->bio_list);
> +		bio_list_init(current->bio_list);
> +		spin_unlock(&bs->rescue_lock);
> +
> +		queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +		goto retry;
> +	}
> +
>  	return NULL;
>  }
>  EXPORT_SYMBOL(bio_alloc_bioset);
> @@ -1562,6 +1617,9 @@ static void biovec_free_pools(struct bio_set *bs)
>  
>  void bioset_free(struct bio_set *bs)
>  {
> +	if (bs->rescue_workqueue)
> +		destroy_workqueue(bs->rescue_workqueue);
> +
>  	if (bs->bio_pool)
>  		mempool_destroy(bs->bio_pool);
>  
> @@ -1597,6 +1655,10 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
>  
>  	bs->front_pad = front_pad;
>  
> +	spin_lock_init(&bs->rescue_lock);
> +	bio_list_init(&bs->rescue_list);
> +	INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
> +
>  	bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
>  	if (!bs->bio_slab) {
>  		kfree(bs);
> @@ -1607,9 +1669,14 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
>  	if (!bs->bio_pool)
>  		goto bad;
>  
> -	if (!biovec_create_pools(bs, pool_size))
> -		return bs;
> +	if (biovec_create_pools(bs, pool_size))
> +		goto bad;
> +
> +	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
> +	if (!bs->rescue_workqueue)
> +		goto bad;
>  
> +	return bs;
>  bad:
>  	bioset_free(bs);
>  	return NULL;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 3a8345e..84fdaac 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -492,6 +492,15 @@ struct bio_set {
>  	mempool_t *bio_integrity_pool;
>  #endif
>  	mempool_t *bvec_pool;
> +
> +	/*
> +	 * Deadlock avoidance for stacking block drivers: see comments in
> +	 * bio_alloc_bioset() for details
> +	 */
> +	spinlock_t		rescue_lock;
> +	struct bio_list		rescue_list;
> +	struct work_struct	rescue_work;
> +	struct workqueue_struct	*rescue_workqueue;
>  };
>  
>  struct biovec_slab {
> -- 
> 1.7.12
> 

  parent reply	other threads:[~2012-08-29 16:25 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-28 17:37 [PATCH v7 0/9] Block cleanups, deadlock fix Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 1/9] block: Generalized bio pool freeing Kent Overstreet
2012-08-28 17:37   ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 2/9] dm: Use bioset's front_pad for dm_rq_clone_bio_info Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 3/9] block: Add bio_reset() Kent Overstreet
2012-08-28 17:37   ` Kent Overstreet
2012-08-28 20:31   ` Tejun Heo
2012-08-28 20:31     ` Tejun Heo
2012-08-28 22:17     ` Kent Overstreet
2012-08-28 22:53       ` Kent Overstreet
2012-08-28 22:53         ` Kent Overstreet
2012-09-01  2:23       ` Tejun Heo
2012-09-01  2:23         ` Tejun Heo
2012-09-05 20:13         ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 4/9] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
2012-08-28 20:32   ` Tejun Heo
2012-08-28 20:32     ` Tejun Heo
2012-08-28 22:24     ` Kent Overstreet
2012-09-04  9:05     ` Jiri Kosina
2012-09-04  9:05       ` Jiri Kosina
2012-09-05 19:44       ` Kent Overstreet
2012-09-05 19:44         ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 5/9] block: Kill bi_destructor Kent Overstreet
2012-08-28 20:36   ` Tejun Heo
2012-08-28 20:36     ` Tejun Heo
2012-08-28 22:07     ` Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 6/9] block: Consolidate bio_alloc_bioset(), bio_kmalloc() Kent Overstreet
2012-08-28 17:37   ` Kent Overstreet
2012-08-28 20:41   ` Tejun Heo
2012-08-28 20:41     ` Tejun Heo
2012-08-28 22:03     ` Kent Overstreet
2012-08-28 22:03       ` Kent Overstreet
2012-09-01  2:17       ` Tejun Heo
2012-08-28 17:37 ` [PATCH v7 7/9] block: Add bio_clone_bioset(), bio_clone_kmalloc() Kent Overstreet
2012-08-28 20:44   ` Tejun Heo
2012-08-28 20:44     ` Tejun Heo
2012-08-28 22:05     ` Kent Overstreet
2012-09-01  2:19       ` Tejun Heo
2012-09-01  2:19         ` Tejun Heo
2012-08-28 17:37 ` [PATCH v7 8/9] block: Reorder struct bio_set Kent Overstreet
2012-08-28 17:37 ` [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers Kent Overstreet
2012-08-28 20:49   ` Tejun Heo
2012-08-28 20:49     ` Tejun Heo
2012-08-28 22:28     ` Kent Overstreet
2012-08-28 23:01       ` Kent Overstreet
2012-08-28 23:01         ` Kent Overstreet
2012-08-29  1:31         ` Vivek Goyal
2012-08-29  1:31           ` Vivek Goyal
2012-08-29  3:25           ` Kent Overstreet
2012-08-29 12:57             ` Vivek Goyal
2012-08-29 12:57               ` Vivek Goyal
2012-08-29 14:39               ` [dm-devel] " Alasdair G Kergon
2012-08-29 14:39                 ` Alasdair G Kergon
2012-08-29 16:26                 ` Kent Overstreet
2012-08-29 16:26                   ` Kent Overstreet
2012-08-29 21:01                   ` John Stoffel
2012-08-29 21:08                     ` Kent Overstreet
2012-08-29 21:08                       ` Kent Overstreet
2012-08-28 22:06   ` Vivek Goyal
2012-08-28 22:06     ` Vivek Goyal
2012-08-28 22:23     ` Kent Overstreet
2012-08-28 22:23       ` Kent Overstreet
2012-08-29 16:24   ` Mikulas Patocka [this message]
2012-08-29 16:50     ` Kent Overstreet
2012-08-29 16:57       ` [dm-devel] " Alasdair G Kergon
2012-08-29 16:57         ` Alasdair G Kergon
2012-08-29 17:07       ` Vivek Goyal
2012-08-29 17:07         ` Vivek Goyal
2012-08-29 17:13         ` Kent Overstreet
2012-08-29 17:13           ` Kent Overstreet
2012-08-29 17:23           ` [dm-devel] " Alasdair G Kergon
2012-08-29 17:23             ` Alasdair G Kergon
2012-08-29 17:32             ` Kent Overstreet
2012-08-30 22:07           ` Vivek Goyal
2012-08-30 22:07             ` Vivek Goyal
2012-08-31  1:43             ` Kent Overstreet
2012-08-31  1:43               ` Kent Overstreet
2012-08-31  1:55               ` Kent Overstreet
2012-08-31  1:55                 ` Kent Overstreet
2012-08-31 15:01               ` Vivek Goyal
2012-09-03  1:26                 ` Kent Overstreet
2012-09-03  1:26                   ` Kent Overstreet
2012-09-03 20:41               ` Mikulas Patocka
2012-09-03 20:41                 ` Mikulas Patocka
2012-09-04  3:41                 ` Kent Overstreet
2012-09-04  3:41                   ` Kent Overstreet
2012-09-04 18:55                   ` Tejun Heo
2012-09-04 18:55                     ` Tejun Heo
2012-09-04 19:01                     ` Tejun Heo
2012-09-04 19:43                       ` Kent Overstreet
2012-09-04 19:43                         ` Kent Overstreet
2012-09-04 19:42                     ` Kent Overstreet
2012-09-04 21:03                       ` Tejun Heo
2012-09-04 21:03                         ` Tejun Heo
2012-09-04 19:26                   ` Mikulas Patocka
2012-09-04 19:26                     ` Mikulas Patocka
2012-09-04 19:39                     ` Vivek Goyal
2012-09-04 19:39                       ` Vivek Goyal
2012-09-04 19:51                     ` [PATCH] dm: Use bioset's front_pad for dm_target_io Kent Overstreet
2012-09-04 19:51                       ` Kent Overstreet
2012-09-04 21:20                       ` Tejun Heo
2012-09-04 21:20                         ` Tejun Heo
2012-09-11 19:28                       ` [PATCH 2] " Mikulas Patocka
2012-09-11 19:28                         ` Mikulas Patocka
2012-09-11 19:50                         ` Kent Overstreet
2012-09-11 19:50                           ` Kent Overstreet
2012-09-12 22:31                           ` Mikulas Patocka
2012-09-12 22:31                             ` Mikulas Patocka
2012-09-14 23:09                             ` [dm-devel] " Alasdair G Kergon
2012-09-14 23:09                               ` Alasdair G Kergon
2012-09-01  2:13             ` [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers Tejun Heo
2012-09-01  2:13               ` Tejun Heo
2012-09-03  1:34               ` [PATCH v2] " Kent Overstreet
2012-09-03  1:34                 ` Kent Overstreet
2012-09-04 15:00               ` [PATCH v7 9/9] " Vivek Goyal
2012-09-04 15:00                 ` Vivek Goyal
2012-09-03  0:49             ` Dave Chinner
2012-09-03  0:49               ` Dave Chinner
2012-09-03  1:17               ` Kent Overstreet
2012-09-03  1:17                 ` Kent Overstreet
2012-09-04 13:54               ` Vivek Goyal
2012-09-04 13:54                 ` Vivek Goyal
2012-09-04 18:26                 ` Tejun Heo
2012-09-04 18:26                   ` Tejun Heo
2012-09-05  3:57                   ` Dave Chinner
2012-09-05  3:57                     ` Dave Chinner
2012-09-05  4:37                     ` Tejun Heo
2012-09-05  4:37                       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.1208291210180.774@file.rdu.redhat.com \
    --to=mpatocka@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bharrosh@panasas.com \
    --cc=dm-devel@redhat.com \
    --cc=koverstreet@google.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.