linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Damien Le Moal <damien.lemoal@wdc.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: [PATCH 4.19 43/44] dm zoned: Fix target BIO completion handling
Date: Tue, 18 Dec 2018 17:39:55 +0100	[thread overview]
Message-ID: <20181218163932.444455036@linuxfoundation.org> (raw)
In-Reply-To: <20181218163927.119623235@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Damien Le Moal <damien.lemoal@wdc.com>

commit d57f9da890696af1484f4a47f7f123560197865a upstream.

struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. the
clone BIO passed by dm core is large and splits into fragments by the
block layer), resulting in incorrect values and inconsistencies with the
BIO_CHAIN flag setting. This is turn triggers the BUG_ON() call:

BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);

in bio_remaining_done() called from bio_endio().

Fix this ensuring that bio_endio() is called only once for any target
BIO by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
complete.

Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-zoned-target.c |  122 +++++++++++++------------------------------
 1 file changed, 38 insertions(+), 84 deletions(-)

--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -20,7 +20,6 @@ struct dmz_bioctx {
 	struct dm_zone		*zone;
 	struct bio		*bio;
 	atomic_t		ref;
-	blk_status_t		status;
 };
 
 /*
@@ -78,65 +77,66 @@ static inline void dmz_bio_endio(struct
 {
 	struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
 
-	if (bioctx->status == BLK_STS_OK && status != BLK_STS_OK)
-		bioctx->status = status;
-	bio_endio(bio);
+	if (status != BLK_STS_OK && bio->bi_status == BLK_STS_OK)
+		bio->bi_status = status;
+
+	if (atomic_dec_and_test(&bioctx->ref)) {
+		struct dm_zone *zone = bioctx->zone;
+
+		if (zone) {
+			if (bio->bi_status != BLK_STS_OK &&
+			    bio_op(bio) == REQ_OP_WRITE &&
+			    dmz_is_seq(zone))
+				set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
+			dmz_deactivate_zone(zone);
+		}
+		bio_endio(bio);
+	}
 }
 
 /*
- * Partial clone read BIO completion callback. This terminates the
+ * Completion callback for an internally cloned target BIO. This terminates the
  * target BIO when there are no more references to its context.
  */
-static void dmz_read_bio_end_io(struct bio *bio)
+static void dmz_clone_endio(struct bio *clone)
 {
-	struct dmz_bioctx *bioctx = bio->bi_private;
-	blk_status_t status = bio->bi_status;
+	struct dmz_bioctx *bioctx = clone->bi_private;
+	blk_status_t status = clone->bi_status;
 
-	bio_put(bio);
+	bio_put(clone);
 	dmz_bio_endio(bioctx->bio, status);
 }
 
 /*
- * Issue a BIO to a zone. The BIO may only partially process the
+ * Issue a clone of a target BIO. The clone may only partially process the
  * original target BIO.
  */
-static int dmz_submit_read_bio(struct dmz_target *dmz, struct dm_zone *zone,
-			       struct bio *bio, sector_t chunk_block,
-			       unsigned int nr_blocks)
+static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
+			  struct bio *bio, sector_t chunk_block,
+			  unsigned int nr_blocks)
 {
 	struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-	sector_t sector;
 	struct bio *clone;
 
-	/* BIO remap sector */
-	sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
-
-	/* If the read is not partial, there is no need to clone the BIO */
-	if (nr_blocks == dmz_bio_blocks(bio)) {
-		/* Setup and submit the BIO */
-		bio->bi_iter.bi_sector = sector;
-		atomic_inc(&bioctx->ref);
-		generic_make_request(bio);
-		return 0;
-	}
-
-	/* Partial BIO: we need to clone the BIO */
 	clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
 	if (!clone)
 		return -ENOMEM;
 
-	/* Setup the clone */
-	clone->bi_iter.bi_sector = sector;
+	bio_set_dev(clone, dmz->dev->bdev);
+	clone->bi_iter.bi_sector =
+		dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
 	clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;
-	clone->bi_end_io = dmz_read_bio_end_io;
+	clone->bi_end_io = dmz_clone_endio;
 	clone->bi_private = bioctx;
 
 	bio_advance(bio, clone->bi_iter.bi_size);
 
-	/* Submit the clone */
 	atomic_inc(&bioctx->ref);
 	generic_make_request(clone);
 
+	if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
+		zone->wp_block += nr_blocks;
+
 	return 0;
 }
 
@@ -214,7 +214,7 @@ static int dmz_handle_read(struct dmz_ta
 		if (nr_blocks) {
 			/* Valid blocks found: read them */
 			nr_blocks = min_t(unsigned int, nr_blocks, end_block - chunk_block);
-			ret = dmz_submit_read_bio(dmz, rzone, bio, chunk_block, nr_blocks);
+			ret = dmz_submit_bio(dmz, rzone, bio, chunk_block, nr_blocks);
 			if (ret)
 				return ret;
 			chunk_block += nr_blocks;
@@ -229,25 +229,6 @@ static int dmz_handle_read(struct dmz_ta
 }
 
 /*
- * Issue a write BIO to a zone.
- */
-static void dmz_submit_write_bio(struct dmz_target *dmz, struct dm_zone *zone,
-				 struct bio *bio, sector_t chunk_block,
-				 unsigned int nr_blocks)
-{
-	struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
-	/* Setup and submit the BIO */
-	bio_set_dev(bio, dmz->dev->bdev);
-	bio->bi_iter.bi_sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
-	atomic_inc(&bioctx->ref);
-	generic_make_request(bio);
-
-	if (dmz_is_seq(zone))
-		zone->wp_block += nr_blocks;
-}
-
-/*
  * Write blocks directly in a data zone, at the write pointer.
  * If a buffer zone is assigned, invalidate the blocks written
  * in place.
@@ -265,7 +246,9 @@ static int dmz_handle_direct_write(struc
 		return -EROFS;
 
 	/* Submit write */
-	dmz_submit_write_bio(dmz, zone, bio, chunk_block, nr_blocks);
+	ret = dmz_submit_bio(dmz, zone, bio, chunk_block, nr_blocks);
+	if (ret)
+		return ret;
 
 	/*
 	 * Validate the blocks in the data zone and invalidate
@@ -301,7 +284,9 @@ static int dmz_handle_buffered_write(str
 		return -EROFS;
 
 	/* Submit write */
-	dmz_submit_write_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+	ret = dmz_submit_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+	if (ret)
+		return ret;
 
 	/*
 	 * Validate the blocks in the buffer zone
@@ -600,7 +585,6 @@ static int dmz_map(struct dm_target *ti,
 	bioctx->zone = NULL;
 	bioctx->bio = bio;
 	atomic_set(&bioctx->ref, 1);
-	bioctx->status = BLK_STS_OK;
 
 	/* Set the BIO pending in the flush list */
 	if (!nr_sectors && bio_op(bio) == REQ_OP_WRITE) {
@@ -624,35 +608,6 @@ static int dmz_map(struct dm_target *ti,
 }
 
 /*
- * Completed target BIO processing.
- */
-static int dmz_end_io(struct dm_target *ti, struct bio *bio, blk_status_t *error)
-{
-	struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
-	if (bioctx->status == BLK_STS_OK && *error)
-		bioctx->status = *error;
-
-	if (!atomic_dec_and_test(&bioctx->ref))
-		return DM_ENDIO_INCOMPLETE;
-
-	/* Done */
-	bio->bi_status = bioctx->status;
-
-	if (bioctx->zone) {
-		struct dm_zone *zone = bioctx->zone;
-
-		if (*error && bio_op(bio) == REQ_OP_WRITE) {
-			if (dmz_is_seq(zone))
-				set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
-		}
-		dmz_deactivate_zone(zone);
-	}
-
-	return DM_ENDIO_DONE;
-}
-
-/*
  * Get zoned device information.
  */
 static int dmz_get_zoned_device(struct dm_target *ti, char *path)
@@ -947,7 +902,6 @@ static struct target_type dmz_type = {
 	.ctr		 = dmz_ctr,
 	.dtr		 = dmz_dtr,
 	.map		 = dmz_map,
-	.end_io		 = dmz_end_io,
 	.io_hints	 = dmz_io_hints,
 	.prepare_ioctl	 = dmz_prepare_ioctl,
 	.postsuspend	 = dmz_suspend,



  parent reply	other threads:[~2018-12-18 16:42 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-18 16:39 [PATCH 4.19 00/44] 4.19.11-stable review Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 01/44] sched/pelt: Fix warning and clean up IRQ PELT config Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 02/44] scsi: raid_attrs: fix unused variable warning Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 03/44] staging: olpc_dcon: add a missing dependency Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 04/44] slimbus: ngd: mark PM functions as __maybe_unused Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 05/44] i2c: aspeed: fix build warning Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 06/44] ARM: dts: qcom-apq8064-arrow-sd-600eval fix graph_endpoint warning Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 07/44] drm/msm: fix address space warning Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 08/44] pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11 Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 09/44] aio: fix spectre gadget in lookup_ioctx Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 10/44] scripts/spdxcheck.py: always open files in binary mode Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 11/44] fs/iomap.c: get/put the page in iomap_page_create/release() Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 12/44] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 13/44] arm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 14/44] block/bio: Do not zero user pages Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 15/44] ovl: fix decode of dir file handle with multi lower layers Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 16/44] ovl: fix missing override creds in link of a metacopy upper Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 17/44] MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310 Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 18/44] mmc: core: use mrq->sbc when sending CMD23 for RPMB Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 19/44] mmc: sdhci-omap: Fix DCRC error handling during tuning Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 20/44] mmc: sdhci: fix the timeout check window for clock and reset Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 21/44] fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 22/44] ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 23/44] ARM: dts: bcm2837: Fix polarity of wifi reset GPIOs Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 24/44] dm thin: send event about thin-pool state change _after_ making it Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 25/44] dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty() Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 26/44] dm: call blk_queue_split() to impose device limits on bios Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 27/44] tracing: Fix memory leak in create_filter() Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 28/44] tracing: Fix memory leak in set_trigger_filter() Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 29/44] tracing: Fix memory leak of instance function hash filters Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 30/44] media: vb2: dont call __vb2_queue_cancel if vb2_start_streaming failed Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 31/44] powerpc/msi: Fix NULL pointer access in teardown code Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 32/44] powerpc: Look for "stdout-path" when setting up legacy consoles Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 33/44] drm/nouveau/kms: Fix memory leak in nv50_mstm_del() Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 34/44] drm/nouveau/kms/nv50-: also flush fb writes when rewinding push buffer Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 35/44] Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec" Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 36/44] drm/i915/gvt: Fix tiled memory decoding bug on BDW Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 37/44] drm/i915/execlists: Apply a full mb before execution for Braswell Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 38/44] drm/amdgpu/powerplay: Apply avfs cks-off voltages on VI Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 39/44] drm/amdkfd: add new vega10 pci ids Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 40/44] drm/amdgpu: add some additional " Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 41/44] drm/amdgpu: update smu firmware images for VI variants (v2) Greg Kroah-Hartman
2018-12-18 16:39 ` [PATCH 4.19 42/44] drm/amdgpu: update SMC firmware image for polaris10 variants Greg Kroah-Hartman
2018-12-18 16:39 ` Greg Kroah-Hartman [this message]
2018-12-18 16:39 ` [PATCH 4.19 44/44] x86/build: Fix compiler support check for CONFIG_RETPOLINE Greg Kroah-Hartman
2018-12-18 20:26 ` [PATCH 4.19 00/44] 4.19.11-stable review shuah
2018-12-19 13:19   ` Greg Kroah-Hartman
2018-12-18 21:10 ` Dan Rue
2018-12-19 13:19   ` Greg Kroah-Hartman
2018-12-19 15:01 ` Harsh Shandilya
2018-12-19 15:14   ` Greg Kroah-Hartman
2018-12-19 17:23 ` Guenter Roeck
2018-12-19 18:37   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181218163932.444455036@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=damien.lemoal@wdc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).