linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Mikulas Patocka <mpatocka@redhat.com>, Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org,
	device-mapper development <dm-devel@redhat.com>,
	Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: [dm-devel] new patchset to eliminate DM's use of   BIOSET_NEED_RESCUER
Date: Wed, 22 Nov 2017 15:00:51 +1100	[thread overview]
Message-ID: <87bmjv0xos.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <alpine.LRH.2.02.1711212009040.28456@file01.intranet.prod.int.rdu2.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 11093 bytes --]

On Tue, Nov 21 2017, Mikulas Patocka wrote:

> On Tue, 21 Nov 2017, Mike Snitzer wrote:
>
>> On Tue, Nov 21 2017 at  4:23pm -0500,
>> Mikulas Patocka <mpatocka@redhat.com> wrote:
>> 
>> > This is not correct:
>> > 
>> >    2206 static void dm_wq_work(struct work_struct *work)
>> >    2207 {
>> >    2208         struct mapped_device *md = container_of(work, struct mapped_device, work);
>> >    2209         struct bio *bio;
>> >    2210         int srcu_idx;
>> >    2211         struct dm_table *map;
>> >    2212
>> >    2213         if (!bio_list_empty(&md->rescued)) {
>> >    2214                 struct bio_list list;
>> >    2215                 spin_lock_irq(&md->deferred_lock);
>> >    2216                 list = md->rescued;
>> >    2217                 bio_list_init(&md->rescued);
>> >    2218                 spin_unlock_irq(&md->deferred_lock);
>> >    2219                 while ((bio = bio_list_pop(&list)))
>> >    2220                         generic_make_request(bio);
>> >    2221         }
>> >    2222
>> >    2223         map = dm_get_live_table(md, &srcu_idx);
>> >    2224
>> >    2225         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
>> >    2226                 spin_lock_irq(&md->deferred_lock);
>> >    2227                 bio = bio_list_pop(&md->deferred);
>> >    2228                 spin_unlock_irq(&md->deferred_lock);
>> >    2229
>> >    2230                 if (!bio)
>> >    2231                         break;
>> >    2232
>> >    2233                 if (dm_request_based(md))
>> >    2234                         generic_make_request(bio);
>> >    2235                 else
>> >    2236                         __split_and_process_bio(md, map, bio);
>> >    2237         }
>> >    2238
>> >    2239         dm_put_live_table(md, srcu_idx);
>> >    2240 }
>> > 
>> > You can see that if we are in dm_wq_work in __split_and_process_bio, we 
>> > will not process md->rescued list.
>> 
>> Can you elaborate further?  We cannot be "in dm_wq_work in
>> __split_and_process_bio" simultaneously.  Do you mean as a side-effect
>> of scheduling away from __split_and_process_bio?
>> 
>> The more detail you can share the better.
>
> Suppose this scenario:
>
> * dm_wq_work calls __split_and_process_bio
> * __split_and_process_bio eventually reaches the function snapshot_map
> * snapshot_map attempts to take the snapshot lock
>
> * the snapshot lock could be released only if some bios submitted by the 
> snapshot driver to the underlying device complete
> * the bios submitted to the underlying device were already offloaded by 
> some other task and they are waiting on the list md->rescued
> * the bios waiting on md->rescued are not processed, because dm_wq_work is 
> blocked in snapshot_map (called from __split_and_process_bio)

Yes, I think you are right. 

I think the solution is to get rid of the dm_offload() infrastructure
and make it not necessary.
i.e. discard my patches
    dm: prepare to discontinue use of BIOSET_NEED_RESCUER
and
    dm: revise 'rescue' strategy for bio-based bioset allocations

And build on "dm: ensure bio submission follows a depth-first tree walk"
which was written after those and already makes dm_offload() less
important.

Since that "depth-first" patch, every request to the dm device, after
the initial splitting, allocates just one dm_target_io structure, and
makes just one __map_bio() call, and so will behave exactly the way
generic_make_request() expects and copes with - thus avoiding awkward
dependencies and deadlocks.  Except....

a/ If any target defines ->num_write_bios() to return >1,
   __clone_and_map_data_bio() will make multiple calls to alloc_tio()
   and __map_bio(), which might need rescuing.
   But no target defines num_write_bios, and none have since it was
   removed from dm-cache 4.5 years ago.
   Can we discard num_write_bios??

b/ If any target sets any of num_{flush,discard,write_same,write_zeroes}_bios
   to a value > 1, then __send_duplicate_bios() will also make multiple
   calls to alloc_tio() and __map_bio().
   Some do.
     dm-cache-target:  flush=2
     dm-snap: flush=2
     dm-stripe: discard, write_same, write_zeroes all set to 'stripes'.

These will only be a problem if the second (or subsequent) alloc_tio()
blocks waiting for an earlier allocation to complete.  This will only
be a problem if multiple threads are each trying to allocate multiple
dm_target_io from the same bioset at the same time.
This is rare and should be easier to address than the current
dm_offload() approach. 
One possibility would be to copy the approach taken by
crypt_alloc_buffer() which needs to allocate multiple entries from a
mempool.
It first tries the with GFP_NOWAIT.  If that fails it take a mutex and
tries with GFP_NOIO.  This mean only one thread will try to allocate
multiple bios at once, so there can be no deadlock.

Below are two RFC patches.  The first removes num_write_bios.
The second is incomplete and makes a stab are allocating multiple bios
at once safely.
A third would be needed to remove dm_offload() etc... but I cannot quite
fit that in today - must be off.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Wed, 22 Nov 2017 14:25:18 +1100
Subject: [PATCH] DM: remove num_write_bios target interface.

No target provides num_write_bios and none has done
since 2013.
Having the possibility of num_write_bios > 1 complicates
bio allocation.
So remove the interface and assume there is only one bio
needed.
If a target ever needs more, it must provide a suitable
bioset and allocate itself based on its particular needs.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/dm.c               | 22 ++++------------------
 include/linux/device-mapper.h | 15 ---------------
 2 files changed, 4 insertions(+), 33 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b20febd6cbc7..8c1a05609eea 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1323,27 +1323,13 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 {
 	struct bio *bio = ci->bio;
 	struct dm_target_io *tio;
-	unsigned target_bio_nr;
-	unsigned num_target_bios = 1;
 	int r = 0;
 
-	/*
-	 * Does the target want to receive duplicate copies of the bio?
-	 */
-	if (bio_data_dir(bio) == WRITE && ti->num_write_bios)
-		num_target_bios = ti->num_write_bios(ti, bio);
-
-	for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
-		tio = alloc_tio(ci, ti, target_bio_nr);
-		tio->len_ptr = len;
-		r = clone_bio(tio, bio, sector, *len);
-		if (r < 0) {
-			free_tio(tio);
-			break;
-		}
+	tio = alloc_tio(ci, ti, 0);
+	tio->len_ptr = len;
+	r = clone_bio(tio, bio, sector, *len);
+	if (r >= 0)
 		__map_bio(tio);
-	}
-
 	return r;
 }
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a5538433c927..5a68b366e664 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -220,14 +220,6 @@ struct target_type {
 #define DM_TARGET_WILDCARD		0x00000008
 #define dm_target_is_wildcard(type)	((type)->features & DM_TARGET_WILDCARD)
 
-/*
- * Some targets need to be sent the same WRITE bio severals times so
- * that they can send copies of it to different devices.  This function
- * examines any supplied bio and returns the number of copies of it the
- * target requires.
- */
-typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio);
-
 /*
  * A target implements own bio data integrity.
  */
@@ -291,13 +283,6 @@ struct dm_target {
 	 */
 	unsigned per_io_data_size;
 
-	/*
-	 * If defined, this function is called to find out how many
-	 * duplicate bios should be sent to the target when writing
-	 * data.
-	 */
-	dm_num_write_bios_fn num_write_bios;
-
 	/* target specific data */
 	void *private;
 
-- 
2.14.0.rc0.dirty


-----------------------------------
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8c1a05609eea..8762661df2ef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1265,8 +1265,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
 }
 
 static struct dm_target_io *alloc_tio(struct clone_info *ci,
-				      struct dm_target *ti,
-				      unsigned target_bio_nr)
+				      struct dm_target *ti)
 {
 	struct dm_target_io *tio;
 	struct bio *clone;
@@ -1276,34 +1275,66 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci,
 
 	tio->io = ci->io;
 	tio->ti = ti;
-	tio->target_bio_nr = target_bio_nr;
+	tio->target_bio_nr = 0;
 
 	return tio;
 }
 
-static void __clone_and_map_simple_bio(struct clone_info *ci,
-				       struct dm_target *ti,
-				       unsigned target_bio_nr, unsigned *len)
+static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
+				struct dm_target *ti, unsigned num_bios)
 {
-	struct dm_target_io *tio = alloc_tio(ci, ti, target_bio_nr);
-	struct bio *clone = &tio->clone;
+	int try;
 
-	tio->len_ptr = len;
+	for (try = 0; try < 2; try++) {
+		int bio_nr;
+		struct bio *bio;
+
+		if (try)
+			mutex_lock(&ci->md->table_devices_lock);
+		for (bio_nr = 0; bio_nr < num_bios; bio_nr++) {
+			bio = bio_alloc_bioset(try ? GFP_NOIO : GFP_NOWAIT,
+					       0, ci->md->bs);
+			if (bio) {
+				struct dm_target_io *tio;
+				bio_list_add(blist, bio);
+				tio = container_of(bio, struct dm_target_io, clone);
 
-	__bio_clone_fast(clone, ci->bio);
-	if (len)
-		bio_setup_sector(clone, ci->sector, *len);
+				tio->io = ci->io;
+				tio->ti = ti;
+				tio->target_bio_nr = bio_nr;
+			} else
+				break;
+		}
+		if (try)
+			mutex_unlock(&ci->md->table_devices_lock);
+		if (bio_nr == num_bios)
+			return;
 
-	__map_bio(tio);
+		while ((bio = bio_list_pop(blist)) != NULL)
+			bio_put(bio);
+	}
 }
 
 static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 				  unsigned num_bios, unsigned *len)
 {
-	unsigned target_bio_nr;
+	struct bio_list blist = BIO_EMPTY_LIST;
+	struct bio *bio;
 
-	for (target_bio_nr = 0; target_bio_nr < num_bios; target_bio_nr++)
-		__clone_and_map_simple_bio(ci, ti, target_bio_nr, len);
+	if (num_bios == 1)
+		bio_list_add(&blist, &alloc_tio(ci, ti)->clone);
+	else
+		alloc_multiple_bios(&blist, ci, ti, num_bios);
+
+	while ((bio = bio_list_pop(&blist)) != NULL) {
+		struct dm_target_io *tio = container_of(
+			bio, struct dm_target_io, clone);
+		tio->len_ptr = len;
+		__bio_clone_fast(bio, ci->bio);
+		if (len)
+			bio_setup_sector(bio, ci->sector, *len);
+		__map_bio(tio);
+	}
 }
 
 static int __send_empty_flush(struct clone_info *ci)
@@ -1325,7 +1356,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 	struct dm_target_io *tio;
 	int r = 0;
 
-	tio = alloc_tio(ci, ti, 0);
+	tio = alloc_tio(ci, ti);
 	tio->len_ptr = len;
 	r = clone_bio(tio, bio, sector, *len);
 	if (r >= 0)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  parent reply	other threads:[~2017-11-22  4:01 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-18  4:38 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-06-18  4:38 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
2017-06-18  4:38 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
2017-06-18  4:38 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
2017-06-18  4:38 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
2017-06-18  4:38 ` [PATCH 06/13] rbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-06-18  4:38 ` [PATCH 08/13] pktcdvd: " NeilBrown
2017-06-18  4:38 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
2017-06-18  4:38 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
2017-06-18  4:38 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-06-18  4:38 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
2017-06-18  4:38 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
2017-06-18  4:38 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
2017-06-18  4:38 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
2017-06-18 18:41 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning Jens Axboe
2017-06-18 21:36   ` NeilBrown
2017-11-20 16:43     ` Mike Snitzer
2017-11-21  0:34       ` [dm-devel] " NeilBrown
2017-11-21  1:35         ` Mike Snitzer
2017-11-21 12:10           ` Mike Snitzer
2017-11-21 12:43             ` Mike Snitzer
2017-11-21 19:47               ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] Mike Snitzer
2017-11-21 21:23                 ` [dm-devel] " Mikulas Patocka
2017-11-21 22:51                   ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER Mike Snitzer
2017-11-22  1:21                     ` Mikulas Patocka
2017-11-22  2:32                       ` Mike Snitzer
2017-11-22  4:00                       ` NeilBrown [this message]
2017-11-22  4:28                         ` Mike Snitzer
2017-11-22 21:18                           ` Mike Snitzer
2017-11-22 18:24                         ` [dm-devel] " Mikulas Patocka
2017-11-22 18:49                           ` Mike Snitzer
2017-11-23  5:12                           ` [dm-devel] " NeilBrown
2017-11-23 22:52                           ` [PATCH] dm: use cloned bio as head, not remainder, in __split_and_process_bio() NeilBrown
2017-11-27 14:23                             ` Mike Snitzer
2017-11-28 22:18                               ` [dm-devel] " NeilBrown
2017-11-21 23:03                   ` [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] NeilBrown
2017-11-21 19:44             ` [dm-devel] [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-11-21 19:50               ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bmjv0xos.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    --cc=zkabelac@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).