All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-23 14:33 Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (24 more replies)
  0 siblings, 25 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
supported by the block layer, and switches existing implementations
of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
removes incorrect discard_zeroes_data, and also switches WRITE SAME
based zeroing in SCSI to this new method.

I've done testing with ATA, SCSI and NVMe setups, but there are
a few things that will need more attention:

 - what is dm-kcopyd doing with the current WRITE SAME usage
   that gets multiple biovecs?  Can we fix it up in any way.
 - what are we going to do with discards through parity raid?
   I suspect we should either just disable it for now entirely
   or modify raid5.c to issue REQ_OP_WRITE_ZEROES to the underlying
   devices.  But I need someone with such a setup to test it.
 - The DRBD code in this area was very odd, and will need an
   audit from the maintainers.

Note that this series needs to be applied on top of the
"support ranges TRIM for libata" series.

A git tree is also avaiable at:

    git://git.infradead.org/users/hch/block.git discard-rework

Gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/discard-rework

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-28 16:12     ` Bart Van Assche
  2017-03-23 14:33 ` [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios Christoph Hellwig
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Make life easy for implementations that needs to send a data buffer
to the device (e.g. SCSI) by numbering it as a data out command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blk_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb55d0f..6393c13a6498 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -160,7 +160,7 @@ enum req_opf {
 	/* write the same sector many times */
 	REQ_OP_WRITE_SAME	= 7,
 	/* write the zero filled sector many times */
-	REQ_OP_WRITE_ZEROES	= 8,
+	REQ_OP_WRITE_ZEROES	= 9,
 
 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Copy and past the REQ_OP_WRITE_SAME code to prepare to implementations
that limit the write zeroes size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index c62a6f0325e0..f9e5a3e0944b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -63,6 +63,20 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
 	return bio_split(bio, split_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
+		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
+{
+	*nsegs = 1;
+
+	if (!q->limits.max_write_zeroes_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
+		return NULL;
+
+	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
+}
+
 static struct bio *blk_bio_write_same_split(struct request_queue *q,
 					    struct bio *bio,
 					    struct bio_set *bs,
@@ -209,8 +223,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
 		split = blk_bio_discard_split(q, *bio, bs, &nsegs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = NULL;
-		nsegs = (*bio)->bi_phys_segments;
+		split = blk_bio_write_zeroes_split(q, *bio, bs, &nsegs);
 		break;
 	case REQ_OP_WRITE_SAME:
 		split = blk_bio_write_same_split(q, *bio, bs, &nsegs);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-28 18:50     ` Bart Van Assche
  2017-03-23 14:33 ` [PATCH 04/23] md: support REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c     | 45 ++++++++++++++++++++++++++++++++++++++++-----
 drivers/scsi/sd_zbc.c |  1 +
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index af632e350ab4..b6f70a09a301 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 	return scsi_init_io(cmd);
 }
 
-static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
+static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
 {
 	struct scsi_device *sdp = cmd->device;
 	struct request *rq = cmd->request;
@@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
 
 	cmd->cmd_len = 16;
 	cmd->cmnd[0] = WRITE_SAME_16;
-	cmd->cmnd[1] = 0x8; /* UNMAP */
+	if (unmap)
+		cmd->cmnd[1] = 0x8; /* UNMAP */
 	put_unaligned_be64(sector, &cmd->cmnd[2]);
 	put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
 
 	cmd->allowed = SD_MAX_RETRIES;
 	cmd->transfersize = data_len;
-	rq->timeout = SD_TIMEOUT;
+	rq->timeout = unmap ? SD_TIMEOUT : SD_WRITE_SAME_TIMEOUT;
 	scsi_req(rq)->resid_len = data_len;
 
 	return scsi_init_io(cmd);
@@ -801,7 +802,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd, bool unmap)
 
 	cmd->allowed = SD_MAX_RETRIES;
 	cmd->transfersize = data_len;
-	rq->timeout = SD_TIMEOUT;
+	rq->timeout = unmap ? SD_TIMEOUT : SD_WRITE_SAME_TIMEOUT;
 	scsi_req(rq)->resid_len = data_len;
 
 	return scsi_init_io(cmd);
@@ -857,11 +858,39 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
 	return scsi_init_io(cmd);
 }
 
+static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_device *sdp = cmd->device;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+
+	if (sdp->ata_trim) {
+		if (!sdp->ata_trim_zeroes_data)
+			return BLKPREP_INVALID;
+		return sd_setup_ata_trim_cmnd(cmd);
+	}
+	if (sdp->no_write_same)
+		return BLKPREP_INVALID;
+	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
+		return sd_setup_write_same16_cmnd(cmd, false);
+	return sd_setup_write_same10_cmnd(cmd, false);
+}
+
 static void sd_config_write_same(struct scsi_disk *sdkp)
 {
 	struct request_queue *q = sdkp->disk->queue;
 	unsigned int logical_block_size = sdkp->device->sector_size;
 
+	if (sdkp->device->ata_trim) {
+		if (sdkp->device->ata_trim_zeroes_data)
+			sdkp->max_ws_blocks = 65535 * (512 / sizeof(__le64));
+		else
+			sdkp->max_ws_blocks = 0;
+		goto config_write_zeroes;
+	}
+
 	if (sdkp->device->no_write_same) {
 		sdkp->max_ws_blocks = 0;
 		goto out;
@@ -886,6 +915,9 @@ static void sd_config_write_same(struct scsi_disk *sdkp)
 out:
 	blk_queue_max_write_same_sectors(q, sdkp->max_ws_blocks *
 					 (logical_block_size >> 9));
+config_write_zeroes:
+	blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks *
+					 (logical_block_size >> 9));
 }
 
 /**
@@ -1226,7 +1258,7 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 		case SD_LBP_UNMAP:
 			return sd_setup_unmap_cmnd(cmd);
 		case SD_LBP_WS16:
-			return sd_setup_write_same16_cmnd(cmd);
+			return sd_setup_write_same16_cmnd(cmd, true);
 		case SD_LBP_WS10:
 			return sd_setup_write_same10_cmnd(cmd, true);
 		case SD_LBP_ZERO:
@@ -1236,6 +1268,8 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 		default:
 			return BLKPREP_INVALID;
 		}
+	case REQ_OP_WRITE_ZEROES:
+		return sd_setup_write_zeroes_cmnd(cmd);
 	case REQ_OP_WRITE_SAME:
 		return sd_setup_write_same_cmnd(cmd);
 	case REQ_OP_FLUSH:
@@ -1876,6 +1910,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 
 	switch (req_op(req)) {
 	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_ZEROES:
 	case REQ_OP_WRITE_SAME:
 	case REQ_OP_ZONE_RESET:
 		if (!result) {
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 92620c8ea8ad..1994f7799fce 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -329,6 +329,7 @@ void sd_zbc_complete(struct scsi_cmnd *cmd,
 
 	switch (req_op(rq)) {
 	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_ZEROES:
 	case REQ_OP_WRITE_SAME:
 	case REQ_OP_ZONE_RESET:
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 04/23] md: support REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 05/23] dm: " Christoph Hellwig
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Copy & paste from the REQ_OP_WRITE_SAME code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/linear.c    | 1 +
 drivers/md/md.h        | 7 +++++++
 drivers/md/multipath.c | 1 +
 drivers/md/raid0.c     | 2 ++
 drivers/md/raid1.c     | 4 +++-
 drivers/md/raid10.c    | 1 +
 drivers/md/raid5.c     | 1 +
 7 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 3e38e0207a3e..377a8a3672e3 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -293,6 +293,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 						      split, disk_devt(mddev->gendisk),
 						      bio_sector);
 			mddev_check_writesame(mddev, split);
+			mddev_check_write_zeroes(mddev, split);
 			generic_make_request(split);
 		}
 	} while (split != bio);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index dde8ecb760c8..1e76d64ce180 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -709,4 +709,11 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
 	    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
 		mddev->queue->limits.max_write_same_sectors = 0;
 }
+
+static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio)
+{
+	if (bio_op(bio) == REQ_OP_WRITE_ZEROES &&
+	    !bdev_get_queue(bio->bi_bdev)->limits.max_write_zeroes_sectors)
+		mddev->queue->limits.max_write_zeroes_sectors = 0;
+}
 #endif /* _MD_MD_H */
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 79a12b59250b..e95d521d93e9 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -139,6 +139,7 @@ static void multipath_make_request(struct mddev *mddev, struct bio * bio)
 	mp_bh->bio.bi_end_io = multipath_end_request;
 	mp_bh->bio.bi_private = mp_bh;
 	mddev_check_writesame(mddev, &mp_bh->bio);
+	mddev_check_write_zeroes(mddev, &mp_bh->bio);
 	generic_make_request(&mp_bh->bio);
 	return;
 }
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 93347ca7c7a6..ce7a6a56cf73 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -383,6 +383,7 @@ static int raid0_run(struct mddev *mddev)
 
 		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
 		blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
+		blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors);
 		blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 
 		blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
@@ -504,6 +505,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 						      split, disk_devt(mddev->gendisk),
 						      bio_sector);
 			mddev_check_writesame(mddev, split);
+			mddev_check_write_zeroes(mddev, split);
 			generic_make_request(split);
 		}
 	} while (split != bio);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a34f58772022..b59cc100320a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3177,8 +3177,10 @@ static int raid1_run(struct mddev *mddev)
 	if (IS_ERR(conf))
 		return PTR_ERR(conf);
 
-	if (mddev->queue)
+	if (mddev->queue) {
 		blk_queue_max_write_same_sectors(mddev->queue, 0);
+		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
+	}
 
 	rdev_for_each(rdev, mddev) {
 		if (!mddev->gendisk)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e89a8d78a9ed..28ec3a93acee 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3749,6 +3749,7 @@ static int raid10_run(struct mddev *mddev)
 		blk_queue_max_discard_sectors(mddev->queue,
 					      mddev->chunk_sectors);
 		blk_queue_max_write_same_sectors(mddev->queue, 0);
+		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
 		blk_queue_io_min(mddev->queue, chunk_size);
 		if (conf->geo.raid_disks % conf->geo.near_copies)
 			blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ed5cd705b985..8cf1f86dcd05 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7272,6 +7272,7 @@ static int raid5_run(struct mddev *mddev)
 		mddev->queue->limits.discard_zeroes_data = 0;
 
 		blk_queue_max_write_same_sectors(mddev->queue, 0);
+		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
 
 		rdev_for_each(rdev, mddev) {
 			disk_stack_limits(mddev->gendisk, rdev->bdev,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 05/23] dm: support REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (3 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 04/23] md: support REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Copy & paste from the REQ_OP_WRITE_SAME code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-core.h          |  1 +
 drivers/md/dm-io.c            | 10 +++++++---
 drivers/md/dm-linear.c        |  1 +
 drivers/md/dm-mpath.c         |  1 +
 drivers/md/dm-rq.c            | 11 ++++++++---
 drivers/md/dm-stripe.c        |  2 ++
 drivers/md/dm-table.c         | 30 ++++++++++++++++++++++++++++++
 drivers/md/dm.c               | 31 ++++++++++++++++++++++++++++---
 include/linux/device-mapper.h |  6 ++++++
 9 files changed, 84 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 136fda3ff9e5..fea5bd52ada8 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -132,6 +132,7 @@ void dm_init_md_queue(struct mapped_device *md);
 void dm_init_normal_md_queue(struct mapped_device *md);
 int md_in_flight(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
+void disable_write_zeroes(struct mapped_device *md);
 
 static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
 {
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 03940bf36f6c..fad2d5d1888e 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -312,9 +312,12 @@ static void do_region(int op, int op_flags, unsigned region,
 	 */
 	if (op == REQ_OP_DISCARD)
 		special_cmd_max_sectors = q->limits.max_discard_sectors;
+	else if (op == REQ_OP_WRITE_ZEROES)
+		special_cmd_max_sectors = q->limits.max_write_zeroes_sectors;
 	else if (op == REQ_OP_WRITE_SAME)
 		special_cmd_max_sectors = q->limits.max_write_same_sectors;
-	if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_SAME) &&
+	if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
+	     op == REQ_OP_WRITE_SAME)  &&
 	    special_cmd_max_sectors == 0) {
 		dec_count(io, region, -EOPNOTSUPP);
 		return;
@@ -328,7 +331,8 @@ static void do_region(int op, int op_flags, unsigned region,
 		/*
 		 * Allocate a suitably sized-bio.
 		 */
-		if ((op == REQ_OP_DISCARD) || (op == REQ_OP_WRITE_SAME))
+		if ((op == REQ_OP_DISCARD) || (op == REQ_OP_WRITE_ZEROES) ||
+		    (op == REQ_OP_WRITE_SAME))
 			num_bvecs = 1;
 		else
 			num_bvecs = min_t(int, BIO_MAX_PAGES,
@@ -341,7 +345,7 @@ static void do_region(int op, int op_flags, unsigned region,
 		bio_set_op_attrs(bio, op, op_flags);
 		store_io_and_region_in_bio(bio, io, region);
 
-		if (op == REQ_OP_DISCARD) {
+		if (op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES) {
 			num_sectors = min_t(sector_t, special_cmd_max_sectors, remaining);
 			bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
 			remaining -= num_sectors;
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 4788b0b989a9..e17fd44ceef5 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -59,6 +59,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->num_write_same_bios = 1;
+	ti->num_write_zeroes_bios = 1;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7f223dbed49f..ab55955ed704 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1103,6 +1103,7 @@ static int multipath_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->num_write_same_bios = 1;
+	ti->num_write_zeroes_bios = 1;
 	if (m->queue_mode == DM_TYPE_BIO_BASED)
 		ti->per_io_data_size = multipath_per_bio_data_size();
 	else
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 28955b94d2b2..6006d5d4b06e 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -298,9 +298,14 @@ static void dm_done(struct request *clone, int error, bool mapped)
 			r = rq_end_io(tio->ti, clone, error, &tio->info);
 	}
 
-	if (unlikely(r == -EREMOTEIO && (req_op(clone) == REQ_OP_WRITE_SAME) &&
-		     !clone->q->limits.max_write_same_sectors))
-		disable_write_same(tio->md);
+	if (unlikely(r == -EREMOTEIO)) {
+		if (req_op(clone) == REQ_OP_WRITE_SAME &&
+		    !clone->q->limits.max_write_same_sectors)
+			disable_write_same(tio->md);
+		if (req_op(clone) == REQ_OP_WRITE_ZEROES &&
+		    !clone->q->limits.max_write_zeroes_sectors)
+			disable_write_zeroes(tio->md);
+	}
 
 	if (r <= 0)
 		/* The target wants to complete the I/O */
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 28193a57bf47..5ef49c121d99 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -169,6 +169,7 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_flush_bios = stripes;
 	ti->num_discard_bios = stripes;
 	ti->num_write_same_bios = stripes;
+	ti->num_write_zeroes_bios = stripes;
 
 	sc->chunk_size = chunk_size;
 	if (chunk_size & (chunk_size - 1))
@@ -293,6 +294,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_REMAPPED;
 	}
 	if (unlikely(bio_op(bio) == REQ_OP_DISCARD) ||
+	    unlikely(bio_op(bio) == REQ_OP_WRITE_ZEROES) ||
 	    unlikely(bio_op(bio) == REQ_OP_WRITE_SAME)) {
 		target_bio_nr = dm_bio_get_target_bio_nr(bio);
 		BUG_ON(target_bio_nr >= sc->stripes);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9c9d5a..5cd665c91ead 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1533,6 +1533,34 @@ static bool dm_table_supports_write_same(struct dm_table *t)
 	return true;
 }
 
+static int device_not_write_zeroes_capable(struct dm_target *ti, struct dm_dev *dev,
+					   sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && !q->limits.max_write_zeroes_sectors;
+}
+
+static bool dm_table_supports_write_zeroes(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned i = 0;
+
+	while (i < dm_table_get_num_targets(t)) {
+		ti = dm_table_get_target(t, i++);
+
+		if (!ti->num_write_zeroes_bios)
+			return false;
+
+		if (!ti->type->iterate_devices ||
+		    ti->type->iterate_devices(ti, device_not_write_zeroes_capable, NULL))
+			return false;
+	}
+
+	return true;
+}
+
+
 static int device_discard_capable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -1603,6 +1631,8 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 
 	if (!dm_table_supports_write_same(t))
 		q->limits.max_write_same_sectors = 0;
+	if (!dm_table_supports_write_zeroes(t))
+		q->limits.max_write_zeroes_sectors = 0;
 
 	if (dm_table_all_devices_attribute(t, queue_supports_sg_merge))
 		queue_flag_clear_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfb75979e455..e8226359c8f7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -825,6 +825,14 @@ void disable_write_same(struct mapped_device *md)
 	limits->max_write_same_sectors = 0;
 }
 
+void disable_write_zeroes(struct mapped_device *md)
+{
+	struct queue_limits *limits = dm_get_queue_limits(md);
+
+	/* device doesn't really support WRITE ZEROES, disable it */
+	limits->max_write_zeroes_sectors = 0;
+}
+
 static void clone_endio(struct bio *bio)
 {
 	int error = bio->bi_error;
@@ -851,9 +859,14 @@ static void clone_endio(struct bio *bio)
 		}
 	}
 
-	if (unlikely(r == -EREMOTEIO && (bio_op(bio) == REQ_OP_WRITE_SAME) &&
-		     !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors))
-		disable_write_same(md);
+	if (unlikely(r == -EREMOTEIO)) {
+		if (bio_op(bio) == REQ_OP_WRITE_SAME &&
+		    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
+			disable_write_same(md);
+		if (bio_op(bio) == REQ_OP_WRITE_ZEROES &&
+		    !bdev_get_queue(bio->bi_bdev)->limits.max_write_zeroes_sectors)
+			disable_write_zeroes(md);
+	}
 
 	free_tio(tio);
 	dec_pending(io, error);
@@ -1202,6 +1215,11 @@ static unsigned get_num_write_same_bios(struct dm_target *ti)
 	return ti->num_write_same_bios;
 }
 
+static unsigned get_num_write_zeroes_bios(struct dm_target *ti)
+{
+	return ti->num_write_zeroes_bios;
+}
+
 typedef bool (*is_split_required_fn)(struct dm_target *ti);
 
 static bool is_split_required_for_discard(struct dm_target *ti)
@@ -1256,6 +1274,11 @@ static int __send_write_same(struct clone_info *ci)
 	return __send_changing_extent_only(ci, get_num_write_same_bios, NULL);
 }
 
+static int __send_write_zeroes(struct clone_info *ci)
+{
+	return __send_changing_extent_only(ci, get_num_write_zeroes_bios, NULL);
+}
+
 /*
  * Select the correct strategy for processing a non-flush bio.
  */
@@ -1270,6 +1293,8 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 		return __send_discard(ci);
 	else if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME))
 		return __send_write_same(ci);
+	else if (unlikely(bio_op(bio) == REQ_OP_WRITE_ZEROES))
+		return __send_write_zeroes(ci);
 
 	ti = dm_table_find_target(ci->map, ci->sector);
 	if (!dm_target_is_valid(ti))
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a7e6903866fd..3829bee2302a 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -255,6 +255,12 @@ struct dm_target {
 	unsigned num_write_same_bios;
 
 	/*
+	 * The number of WRITE ZEROES bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned num_write_zeroes_bios;
+
+	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
 	 */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (4 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 05/23] dm: " Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:55   ` Mike Snitzer
  2017-03-23 14:33 ` [PATCH 07/23] block: stop using blkdev_issue_write_same for zeroing Christoph Hellwig
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

It seems like the code currently passes whatever it was using for writes
to WRITE SAME.  Just switch it to WRITE ZEROES, although that doesn't
need any payload.

Untested, and confused by the code, maybe someone who understands it
better than me can help..

Not-yet-signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-kcopyd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 9e9d04cb7d51..f85846741d50 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -733,11 +733,11 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		job->pages = &zero_page_list;
 
 		/*
-		 * Use WRITE SAME to optimize zeroing if all dests support it.
+		 * Use WRITE ZEROES to optimize zeroing if all dests support it.
 		 */
-		job->rw = REQ_OP_WRITE_SAME;
+		job->rw = REQ_OP_WRITE_ZEROES;
 		for (i = 0; i < job->num_dests; i++)
-			if (!bdev_write_same(job->dests[i].bdev)) {
+			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
 				job->rw = WRITE;
 				break;
 			}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 07/23] block: stop using blkdev_issue_write_same for zeroing
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (5 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 08/23] block: add a flags argument to (__)blkdev_issue_zeroout Christoph Hellwig
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

We'll always use the WRITE ZEROES code for zeroing now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index ed1e78e24db0..a93115ccf461 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -364,10 +364,6 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			return 0;
 	}
 
-	if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
-			ZERO_PAGE(0)))
-		return 0;
-
 	blk_start_plug(&plug);
 	ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
 			&bio, discard);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 08/23] block: add a flags argument to (__)blkdev_issue_zeroout
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (6 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 07/23] block: stop using blkdev_issue_write_same for zeroing Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 09/23] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Turn the existin discard flag into a new BLKDEV_ZERO_UNMAP flag with
similar semantics, but without referring to diѕcard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c              | 31 ++++++++++++++-----------------
 block/ioctl.c                |  3 +--
 drivers/nvme/target/io-cmd.c |  2 +-
 fs/block_dev.c               |  2 +-
 fs/dax.c                     |  2 +-
 fs/xfs/xfs_bmap_util.c       |  2 +-
 include/linux/blkdev.h       | 14 +++++++++-----
 7 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index a93115ccf461..1e849f904d4c 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -282,14 +282,18 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
  * @nr_sects:	number of sectors to write
  * @gfp_mask:	memory allocation flags (for bio_alloc)
  * @biop:	pointer to anchor bio
- * @discard:	discard flag
+ * @flags:	controls detailed behavior
  *
  * Description:
- *  Generate and issue number of bios with zerofiled pages.
+ *  Zero-fill a block range, either using hardware offload or by explicitly
+ *  writing zeroes to the device.
+ *
+ *  If a device is using logical block provisioning, the underlying space will
+ *  only be release if %flags contains BLKDEV_ZERO_UNMAP.
  */
 int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
-		bool discard)
+		unsigned flags)
 {
 	int ret;
 	int bi_size = 0;
@@ -337,28 +341,21 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
  * @sector:	start sector
  * @nr_sects:	number of sectors to write
  * @gfp_mask:	memory allocation flags (for bio_alloc)
- * @discard:	whether to discard the block range
+ * @flags:	controls detailed behavior
  *
  * Description:
- *  Zero-fill a block range.  If the discard flag is set and the block
- *  device guarantees that subsequent READ operations to the block range
- *  in question will return zeroes, the blocks will be discarded. Should
- *  the discard request fail, if the discard flag is not set, or if
- *  discard_zeroes_data is not supported, this function will resort to
- *  zeroing the blocks manually, thus provisioning (allocating,
- *  anchoring) them. If the block device supports WRITE ZEROES or WRITE SAME
- *  command(s), blkdev_issue_zeroout() will use it to optimize the process of
- *  clearing the block range. Otherwise the zeroing will be performed
- *  using regular WRITE calls.
+ *  Zero-fill a block range, either using hardware offload or by explicitly
+ *  writing zeroes to the device.  See __blkdev_issue_zeroout() for the
+ *  valid values for %flags.
  */
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
-			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
+		sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
 {
 	int ret;
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 
-	if (discard) {
+	if (flags & BLKDEV_ZERO_UNMAP) {
 		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
 				BLKDEV_DISCARD_ZERO))
 			return 0;
@@ -366,7 +363,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 
 	blk_start_plug(&plug);
 	ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
-			&bio, discard);
+			&bio, flags);
 	if (ret == 0 && bio) {
 		ret = submit_bio_wait(bio);
 		bio_put(bio);
diff --git a/block/ioctl.c b/block/ioctl.c
index 7b88820b93d9..304685022d9f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -254,8 +254,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 	mapping = bdev->bd_inode->i_mapping;
 	truncate_inode_pages_range(mapping, start, end);
 
-	return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
-				    false);
+	return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL, 0);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 4195115c7e54..1366babaee50 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -184,7 +184,7 @@ static void nvmet_execute_write_zeroes(struct nvmet_req *req)
 		(req->ns->blksize_shift - 9)) + 1;
 
 	if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
-				GFP_KERNEL, &bio, true))
+				GFP_KERNEL, &bio, BLKDEV_ZERO_UNMAP))
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
 
 	if (bio) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2eca00ec4370..6128283b7830 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2110,7 +2110,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
-					    GFP_KERNEL, false);
+					    GFP_KERNEL, 0);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
 		/* Only punch if the device can do zeroing discard. */
diff --git a/fs/dax.c b/fs/dax.c
index de622d4282a6..7a107355d33f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -982,7 +982,7 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		sector_t start_sector = dax.sector + (offset >> 9);
 
 		return blkdev_issue_zeroout(bdev, start_sector,
-				length >> 9, GFP_NOFS, true);
+				length >> 9, GFP_NOFS, BLKDEV_ZERO_UNMAP);
 	} else {
 		if (dax_map_atomic(bdev, &dax) < 0)
 			return PTR_ERR(dax.addr);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8b75dcea5966..6b87c4296787 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -81,7 +81,7 @@ xfs_zero_extent(
 	return blkdev_issue_zeroout(xfs_find_bdev_for_inode(VFS_I(ip)),
 		block << (mp->m_super->s_blocksize_bits - 9),
 		count_fsb << (mp->m_super->s_blocksize_bits - 9),
-		GFP_NOFS, true);
+		GFP_NOFS, BLKDEV_ZERO_UNMAP);
 }
 
 int
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b3bd646f580..19209416225d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1333,23 +1333,27 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 	return bqt->tag_index[tag];
 }
 
+extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
+extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 
 #define BLKDEV_DISCARD_SECURE	(1 << 0)	/* issue a secure erase */
 #define BLKDEV_DISCARD_ZERO	(1 << 1)	/* must reliably zero data */
 
-extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
 extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, int flags,
 		struct bio **biop);
-extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
+
+#define BLKDEV_ZERO_UNMAP	(1 << 0)  /* free unused blocks */
+
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
-		bool discard);
+		unsigned flags);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, bool discard);
+		sector_t nr_sects, gfp_t gfp_mask, unsigned flags);
+
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 09/23] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (7 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 08/23] block: add a flags argument to (__)blkdev_issue_zeroout Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 10/23] block: add a new BLKDEV_ZERO_NOFALLBACK flag Christoph Hellwig
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

If this flag is set logical provisioning capable device should
release space for the zeroed blocks if possible, if it is not set
devices should keep the blocks anchored.

Also remove an out of sync kerneldoc comment for a static function
that would have become even more out of data with this change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c           | 19 +++++--------------
 include/linux/blk_types.h |  5 +++++
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1e849f904d4c..a59bb54ac7c7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -226,20 +226,9 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
-/**
- * __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES
- * @bdev:	blockdev to issue
- * @sector:	start sector
- * @nr_sects:	number of sectors to write
- * @gfp_mask:	memory allocation flags (for bio_alloc)
- * @biop:	pointer to anchor bio
- *
- * Description:
- *  Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled pages.
- */
 static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
-		struct bio **biop)
+		struct bio **biop, unsigned flags)
 {
 	struct bio *bio = *biop;
 	unsigned int max_write_zeroes_sectors;
@@ -258,7 +247,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		bio = next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_bdev = bdev;
-		bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
+		bio->bi_opf = REQ_OP_WRITE_ZEROES;
+		if (flags & BLKDEV_ZERO_UNMAP)
+			bio->bi_opf |= REQ_UNMAP;
 
 		if (nr_sects > max_write_zeroes_sectors) {
 			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@@ -306,7 +297,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		return -EINVAL;
 
 	ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
-			biop);
+			biop, flags);
 	if (ret == 0 || (ret && ret != -EOPNOTSUPP))
 		goto out;
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6393c13a6498..4da61fd12c94 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -187,6 +187,10 @@ enum req_flag_bits {
 	__REQ_PREFLUSH,		/* request for cache flush */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
+
+	/* command specific flags for REQ_OP_WRITE_ZEROES: */
+	__REQ_UNMAP,		/* explicitly release space when zeroing */
+
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -203,6 +207,7 @@ enum req_flag_bits {
 #define REQ_PREFLUSH		(1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD		(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
+#define REQ_UNMAP		(1ULL << __REQ_UNMAP)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 10/23] block: add a new BLKDEV_ZERO_NOFALLBACK flag
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (8 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 09/23] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches Christoph Hellwig
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.

Also clean up the convoluted check for the return condition that this
new flag is added to.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c        | 5 ++++-
 include/linux/blkdev.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index a59bb54ac7c7..33c5bf373b7f 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -281,6 +281,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
  *
  *  If a device is using logical block provisioning, the underlying space will
  *  only be release if %flags contains BLKDEV_ZERO_UNMAP.
+ *
+ *  If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
+ *  -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
  */
 int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
@@ -298,7 +301,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 
 	ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
 			biop, flags);
-	if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+	if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
 		goto out;
 
 	ret = 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 19209416225d..1bddcfc631a4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1347,6 +1347,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		struct bio **biop);
 
 #define BLKDEV_ZERO_UNMAP	(1 << 0)  /* free unused blocks */
+#define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (9 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 10/23] block: add a new BLKDEV_ZERO_NOFALLBACK flag Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-28 16:50     ` Bart Van Assche
  2017-03-23 14:33 ` [PATCH 12/23] sd: handle REQ_UNMAP Christoph Hellwig
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
and preparse for removing the discard_zeroes_data flag.

Also remove a pointless discard support check, which is done in
blkdev_issue_discard already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/block_dev.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6128283b7830..5c701c16e8ff 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2074,10 +2074,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 			     loff_t len)
 {
 	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
-	struct request_queue *q = bdev_get_queue(bdev);
 	struct address_space *mapping;
 	loff_t end = start + len - 1;
 	loff_t isize;
+	unsigned flags = 0;
 	int error;
 
 	/* Fail if we don't recognize the flags. */
@@ -2107,21 +2107,15 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	truncate_inode_pages_range(mapping, start, end);
 
 	switch (mode) {
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		flags = BLKDEV_ZERO_NOFALLBACK;
+		/*FALLTHRU*/
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
-					    GFP_KERNEL, 0);
-		break;
-	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
-		/* Only punch if the device can do zeroing discard. */
-		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
-			return -EOPNOTSUPP;
-		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
-					     GFP_KERNEL, 0);
+					     GFP_KERNEL, flags);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
-		if (!blk_queue_discard(q))
-			return -EOPNOTSUPP;
 		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, 0);
 		break;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (10 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
       [not found]   ` <20170323143341.31549-13-hch-jcswGhMUV9g@public.gmane.org>
  2017-03-23 14:33 ` [PATCH 13/23] nvme: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Try to use a write same with unmap bit variant if the device supports it
and the caller asks for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b6f70a09a301..ca96bb33471b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -871,6 +871,16 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 			return BLKPREP_INVALID;
 		return sd_setup_ata_trim_cmnd(cmd);
 	}
+
+	if (rq->cmd_flags & REQ_UNMAP) {
+		switch (sdkp->provisioning_mode) {
+		case SD_LBP_WS16:
+			return sd_setup_write_same16_cmnd(cmd, true);
+		case SD_LBP_WS10:
+			return sd_setup_write_same10_cmnd(cmd, true);
+		}
+	}
+
 	if (sdp->no_write_same)
 		return BLKPREP_INVALID;
 	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 13/23] nvme: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (11 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 12/23] sd: handle REQ_UNMAP Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 14/23] zram: " Christoph Hellwig
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

But now for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing.  Also rename the quirk flag to be a bit
more self-explanatory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 10 +++++-----
 drivers/nvme/host/nvme.h |  6 +++---
 drivers/nvme/host/pci.c  |  6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9b3b57fef446..bfa0595dc767 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -335,6 +335,8 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 	case REQ_OP_FLUSH:
 		nvme_setup_flush(ns, cmd);
 		break;
+	case REQ_OP_WRITE_ZEROES:
+		/* currently only aliased to deallocate for a few ctrls: */
 	case REQ_OP_DISCARD:
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
@@ -900,16 +902,14 @@ static void nvme_config_discard(struct nvme_ns *ns)
 	BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
 			NVME_DSM_MAX_RANGES);
 
-	if (ctrl->quirks & NVME_QUIRK_DISCARD_ZEROES)
-		ns->queue->limits.discard_zeroes_data = 1;
-	else
-		ns->queue->limits.discard_zeroes_data = 0;
-
 	ns->queue->limits.discard_alignment = logical_block_size;
 	ns->queue->limits.discard_granularity = logical_block_size;
 	blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
 	blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+
+	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+		blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
 }
 
 static int nvme_revalidate_ns(struct nvme_ns *ns, struct nvme_id_ns **id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2aa20e3e5675..07ebc4a1c8fc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -68,10 +68,10 @@ enum nvme_quirks {
 	NVME_QUIRK_IDENTIFY_CNS			= (1 << 1),
 
 	/*
-	 * The controller deterministically returns O's on reads to discarded
-	 * logical blocks.
+	 * The controller deterministically returns O's on reads to
+	 * logical blocks that deallocate was called on.
 	 */
-	NVME_QUIRK_DISCARD_ZEROES		= (1 << 2),
+	NVME_QUIRK_DEALLOCATE_ZEROES		= (1 << 2),
 
 	/*
 	 * The controller needs a delay before starts checking the device
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 26a5fd05fe88..0a28787267f0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2135,13 +2135,13 @@ static const struct pci_error_handlers nvme_err_handler = {
 static const struct pci_device_id nvme_id_table[] = {
 	{ PCI_VDEVICE(INTEL, 0x0953),
 		.driver_data = NVME_QUIRK_STRIPE_SIZE |
-				NVME_QUIRK_DISCARD_ZEROES, },
+				NVME_QUIRK_DEALLOCATE_ZEROES, },
 	{ PCI_VDEVICE(INTEL, 0x0a53),
 		.driver_data = NVME_QUIRK_STRIPE_SIZE |
-				NVME_QUIRK_DISCARD_ZEROES, },
+				NVME_QUIRK_DEALLOCATE_ZEROES, },
 	{ PCI_VDEVICE(INTEL, 0x0a54),
 		.driver_data = NVME_QUIRK_STRIPE_SIZE |
-				NVME_QUIRK_DISCARD_ZEROES, },
+				NVME_QUIRK_DEALLOCATE_ZEROES, },
 	{ PCI_VDEVICE(INTEL, 0x5845),	/* Qemu emulated controller */
 		.driver_data = NVME_QUIRK_IDENTIFY_CNS, },
 	{ PCI_DEVICE(0x1c58, 0x0003),	/* HGST adapter */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 14/23] zram: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (12 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 13/23] nvme: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 15/23] loop: " Christoph Hellwig
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Just the same as discard if the block size equals the system page size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/zram/zram_drv.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index dceb5edd1e54..1710b06f04a7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -829,10 +829,14 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 	offset = (bio->bi_iter.bi_sector &
 		  (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
 
-	if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
+	switch (bio_op(bio)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_ZEROES:
 		zram_bio_discard(zram, index, offset, bio);
 		bio_endio(bio);
 		return;
+	default:
+		break;
 	}
 
 	bio_for_each_segment(bvec, bio, iter) {
@@ -1192,6 +1196,8 @@ static int zram_add(void)
 	zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE;
 	zram->disk->queue->limits.chunk_sectors = 0;
 	blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
+
 	/*
 	 * zram_bio_discard() will clear all logical blocks if logical block
 	 * size is identical with physical block size(PAGE_SIZE). But if it is
@@ -1201,10 +1207,7 @@ static int zram_add(void)
 	 * zeroed.
 	 */
 	if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
-		zram->disk->queue->limits.discard_zeroes_data = 1;
-	else
-		zram->disk->queue->limits.discard_zeroes_data = 0;
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
+		blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
 	add_disk(zram->disk);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 15/23] loop: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (13 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 14/23] zram: " Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 16/23] brd: remove discard support Christoph Hellwig
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

It's identical to discard as hole punches will always leave us with
zeroes on reads.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..265cd2e33ff0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -528,6 +528,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	case REQ_OP_FLUSH:
 		return lo_req_flush(lo, rq);
 	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_ZEROES:
 		return lo_discard(lo, rq, pos);
 	case REQ_OP_WRITE:
 		if (lo->transfer)
@@ -826,6 +827,7 @@ static void loop_config_discard(struct loop_device *lo)
 		q->limits.discard_granularity = 0;
 		q->limits.discard_alignment = 0;
 		blk_queue_max_discard_sectors(q, 0);
+		blk_queue_max_write_zeroes_sectors(q, 0);
 		q->limits.discard_zeroes_data = 0;
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 		return;
@@ -834,6 +836,7 @@ static void loop_config_discard(struct loop_device *lo)
 	q->limits.discard_granularity = inode->i_sb->s_blocksize;
 	q->limits.discard_alignment = 0;
 	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
+	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
 	q->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
@@ -1660,6 +1663,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	switch (req_op(cmd->rq)) {
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_ZEROES:
 		cmd->use_aio = false;
 		break;
 	default:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 16/23] brd: remove discard support
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (14 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 15/23] loop: " Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 17/23] rbd: remove the discard_zeroes_data flag Christoph Hellwig
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

It's just a in-driver reimplementation of writing zeroes to the pages,
which fails if the discards aren't page aligned.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/brd.c | 56 -----------------------------------------------------
 1 file changed, 56 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..c15d0581531f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -134,28 +134,6 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 	return page;
 }
 
-static void brd_free_page(struct brd_device *brd, sector_t sector)
-{
-	struct page *page;
-	pgoff_t idx;
-
-	spin_lock(&brd->brd_lock);
-	idx = sector >> PAGE_SECTORS_SHIFT;
-	page = radix_tree_delete(&brd->brd_pages, idx);
-	spin_unlock(&brd->brd_lock);
-	if (page)
-		__free_page(page);
-}
-
-static void brd_zero_page(struct brd_device *brd, sector_t sector)
-{
-	struct page *page;
-
-	page = brd_lookup_page(brd, sector);
-	if (page)
-		clear_highpage(page);
-}
-
 /*
  * Free all backing store pages and radix tree. This must only be called when
  * there are no other users of the device.
@@ -212,24 +190,6 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
 	return 0;
 }
 
-static void discard_from_brd(struct brd_device *brd,
-			sector_t sector, size_t n)
-{
-	while (n >= PAGE_SIZE) {
-		/*
-		 * Don't want to actually discard pages here because
-		 * re-allocating the pages can result in writeback
-		 * deadlocks under heavy load.
-		 */
-		if (0)
-			brd_free_page(brd, sector);
-		else
-			brd_zero_page(brd, sector);
-		sector += PAGE_SIZE >> SECTOR_SHIFT;
-		n -= PAGE_SIZE;
-	}
-}
-
 /*
  * Copy n bytes from src to the brd starting at sector. Does not sleep.
  */
@@ -338,14 +298,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
 	if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
 		goto io_error;
 
-	if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
-		if (sector & ((PAGE_SIZE >> SECTOR_SHIFT) - 1) ||
-		    bio->bi_iter.bi_size & ~PAGE_MASK)
-			goto io_error;
-		discard_from_brd(brd, sector, bio->bi_iter.bi_size);
-		goto out;
-	}
-
 	bio_for_each_segment(bvec, bio, iter) {
 		unsigned int len = bvec.bv_len;
 		int err;
@@ -357,9 +309,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
 		sector += len >> SECTOR_SHIFT;
 	}
 
-out:
-	bio_endio(bio);
-	return BLK_QC_T_NONE;
 io_error:
 	bio_io_error(bio);
 	return BLK_QC_T_NONE;
@@ -464,11 +413,6 @@ static struct brd_device *brd_alloc(int i)
 	 *  is harmless)
 	 */
 	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
-
-	brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
-	blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
-	brd->brd_queue->limits.discard_zeroes_data = 1;
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 17/23] rbd: remove the discard_zeroes_data flag
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (15 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 16/23] brd: remove discard support Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 18/23] rsxx: " Christoph Hellwig
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/rbd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 517838b65964..0ec3b430e81d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4380,7 +4380,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 	q->limits.discard_granularity = segment_size;
 	q->limits.discard_alignment = segment_size;
 	blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
-	q->limits.discard_zeroes_data = 1;
 
 	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
 		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 18/23] rsxx: remove the discard_zeroes_data flag
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (16 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 17/23] rbd: remove the discard_zeroes_data flag Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 19/23] mmc: " Christoph Hellwig
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

rsxx only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/rsxx/dev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index f81d70b39d10..9c566364ac9c 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -300,7 +300,6 @@ int rsxx_setup_dev(struct rsxx_cardinfo *card)
 						RSXX_HW_BLK_SIZE >> 9);
 		card->queue->limits.discard_granularity = RSXX_HW_BLK_SIZE;
 		card->queue->limits.discard_alignment   = RSXX_HW_BLK_SIZE;
-		card->queue->limits.discard_zeroes_data = 1;
 	}
 
 	card->queue->queuedata = card;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 19/23] mmc: remove the discard_zeroes_data flag
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (17 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 18/23] rsxx: " Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 20/23] block: stop using discards for zeroing Christoph Hellwig
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

mmc only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/mmc/core/queue.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 493eb10ce580..4c54ad34e17a 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -167,8 +167,6 @@ static void mmc_queue_setup_discard(struct request_queue *q,
 
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 	blk_queue_max_discard_sectors(q, max_discard);
-	if (card->erased_byte == 0 && !mmc_can_discard(card))
-		q->limits.discard_zeroes_data = 1;
 	q->limits.discard_granularity = card->pref_erase << 9;
 	/* granularity must not be greater than max. discard */
 	if (card->pref_erase > max_discard)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 20/23] block: stop using discards for zeroing
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (18 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 19/23] mmc: " Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 21/23] drbd: make intelligent use of blkdev_issue_zeroout Christoph Hellwig
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that
support efficient zeroing of devices we can remove the call to
blkdev_issue_discard.  This means we only have two ways of zeroing left
and can simply the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 33c5bf373b7f..153ca59393a7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -279,6 +279,11 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
+ *  Note that this function may fail with -EOPNOTSUPP if the driver supports
+ *  efficient zeroing operation, but the device capabilities can only be
+ *  discovered by trial and error.  In this case the caller should call the
+ *  function again, and it will use the fallback path.
+ *
  *  If a device is using logical block provisioning, the underlying space will
  *  only be release if %flags contains BLKDEV_ZERO_UNMAP.
  *
@@ -349,12 +354,6 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 
-	if (flags & BLKDEV_ZERO_UNMAP) {
-		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
-				BLKDEV_DISCARD_ZERO))
-			return 0;
-	}
-
 	blk_start_plug(&plug);
 	ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
 			&bio, flags);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 21/23] drbd: make intelligent use of blkdev_issue_zeroout
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (19 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 20/23] block: stop using discards for zeroing Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-23 14:33 ` [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/drbd/drbd_debugfs.c  |  3 --
 drivers/block/drbd/drbd_int.h      |  6 ---
 drivers/block/drbd/drbd_receiver.c | 99 ++------------------------------------
 drivers/block/drbd/drbd_req.c      |  6 +--
 4 files changed, 7 insertions(+), 107 deletions(-)

diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index de5c3ee8a790..494837e59f23 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -236,9 +236,6 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
 	seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
 	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
 	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
-
-	if (f & EE_IS_TRIM)
-		__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
 	seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
 	seq_putc(m, '\n');
 }
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 724d1c50fc52..d5da45bb03a6 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -437,9 +437,6 @@ enum {
 
 	/* is this a TRIM aka REQ_DISCARD? */
 	__EE_IS_TRIM,
-	/* our lower level cannot handle trim,
-	 * and we want to fall back to zeroout instead */
-	__EE_IS_TRIM_USE_ZEROOUT,
 
 	/* In case a barrier failed,
 	 * we need to resubmit without the barrier flag. */
@@ -482,7 +479,6 @@ enum {
 #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
 #define EE_MAY_SET_IN_SYNC     (1<<__EE_MAY_SET_IN_SYNC)
 #define EE_IS_TRIM             (1<<__EE_IS_TRIM)
-#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
 #define EE_RESUBMITTED         (1<<__EE_RESUBMITTED)
 #define EE_WAS_ERROR           (1<<__EE_WAS_ERROR)
 #define EE_HAS_DIGEST          (1<<__EE_HAS_DIGEST)
@@ -1561,8 +1557,6 @@ extern void start_resync_timer_fn(unsigned long data);
 extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
 
 /* drbd_receiver.c */
-extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
-		sector_t start, unsigned int nr_sectors, bool discard);
 extern int drbd_receiver(struct drbd_thread *thi);
 extern int drbd_ack_receiver(struct drbd_thread *thi);
 extern void drbd_send_ping_wf(struct work_struct *ws);
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index aa6bf9692eff..8641776f043a 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1448,105 +1448,14 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
 		drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
 }
 
-/*
- * We *may* ignore the discard-zeroes-data setting, if so configured.
- *
- * Assumption is that it "discard_zeroes_data=0" is only because the backend
- * may ignore partial unaligned discards.
- *
- * LVM/DM thin as of at least
- *   LVM version:     2.02.115(2)-RHEL7 (2015-01-28)
- *   Library version: 1.02.93-RHEL7 (2015-01-28)
- *   Driver version:  4.29.0
- * still behaves this way.
- *
- * For unaligned (wrt. alignment and granularity) or too small discards,
- * we zero-out the initial (and/or) trailing unaligned partial chunks,
- * but discard all the aligned full chunks.
- *
- * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
- */
-int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
-{
-	struct block_device *bdev = device->ldev->backing_bdev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	sector_t tmp, nr;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	int err = 0;
-
-	if (!discard)
-		goto zero_out;
-
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors))
-		goto zero_out;
-
-	if (nr_sectors < granularity)
-		goto zero_out;
-
-	tmp = start;
-	if (sector_div(tmp, granularity) != alignment) {
-		if (nr_sectors < 2*granularity)
-			goto zero_out;
-		/* start + gran - (start + gran - align) % gran */
-		tmp = start + granularity - alignment;
-		tmp = start + granularity - sector_div(tmp, granularity);
-
-		nr = tmp - start;
-		err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO, 0);
-		nr_sectors -= nr;
-		start = tmp;
-	}
-	while (nr_sectors >= granularity) {
-		nr = min_t(sector_t, nr_sectors, max_discard_sectors);
-		err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO, 0);
-		nr_sectors -= nr;
-		start += nr;
-	}
- zero_out:
-	if (nr_sectors) {
-		err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO, 0);
-	}
-	return err != 0;
-}
-
-static bool can_do_reliable_discards(struct drbd_device *device)
-{
-	struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
-	struct disk_conf *dc;
-	bool can_do;
-
-	if (!blk_queue_discard(q))
-		return false;
-
-	if (q->limits.discard_zeroes_data)
-		return true;
-
-	rcu_read_lock();
-	dc = rcu_dereference(device->ldev->disk_conf);
-	can_do = dc->discard_zeroes_if_aligned;
-	rcu_read_unlock();
-	return can_do;
-}
-
 static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
 {
-	/* If the backend cannot discard, or does not guarantee
-	 * read-back zeroes in discarded ranges, we fall back to
-	 * zero-out.  Unless configuration specifically requested
-	 * otherwise. */
-	if (!can_do_reliable_discards(device))
-		peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
+	struct block_device *bdev = device->ldev->backing_bdev;
 
-	if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
-	    peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
+	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
+			GFP_NOIO, BLKDEV_ZERO_UNMAP))
 		peer_req->flags |= EE_WAS_ERROR;
+
 	drbd_endio_write_sec_final(peer_req);
 }
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 652114ae1a8a..bdd42b8360cc 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1148,10 +1148,10 @@ static int drbd_process_write_request(struct drbd_request *req)
 
 static void drbd_process_discard_req(struct drbd_request *req)
 {
-	int err = drbd_issue_discard_or_zero_out(req->device,
-				req->i.sector, req->i.size >> 9, true);
+	struct block_device *bdev = req->device->ldev->backing_bdev;
 
-	if (err)
+	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
+			GFP_NOIO, BLKDEV_ZERO_UNMAP))
 		req->private_bio->bi_error = -EIO;
 	bio_endio(req->private_bio);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (20 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 21/23] drbd: make intelligent use of blkdev_issue_zeroout Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
       [not found]   ` <20170323143341.31549-23-hch-jcswGhMUV9g@public.gmane.org>
  2017-03-23 14:33 ` [PATCH 23/23] block: remove the discard_zeroes_data flag Christoph Hellwig
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

It seems like DRBD assumes its on the wire TRIM request always zeroes data.
Use that fact to implement REQ_OP_WRITE_ZEROES.

XXX: will need a careful audit from the drbd team!

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/drbd/drbd_main.c     | 3 ++-
 drivers/block/drbd/drbd_nl.c       | 2 ++
 drivers/block/drbd/drbd_receiver.c | 6 +++---
 drivers/block/drbd/drbd_req.c      | 7 +++++--
 drivers/block/drbd/drbd_worker.c   | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 92c60cbd04ee..8e62d9f65510 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1668,7 +1668,8 @@ static u32 bio_flags_to_wire(struct drbd_connection *connection,
 			(bio->bi_opf & REQ_FUA ? DP_FUA : 0) |
 			(bio->bi_opf & REQ_PREFLUSH ? DP_FLUSH : 0) |
 			(bio_op(bio) == REQ_OP_WRITE_SAME ? DP_WSAME : 0) |
-			(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0);
+			(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0) |
+			(bio_op(bio) == REQ_OP_WRITE_ZEROES ? DP_DISCARD : 0);
 	else
 		return bio->bi_opf & REQ_SYNC ? DP_RW_SYNC : 0;
 }
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 908c704e20aa..e4516d3b971d 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1217,10 +1217,12 @@ static void decide_on_discard_support(struct drbd_device *device,
 		blk_queue_discard_granularity(q, 512);
 		q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+		q->limits.max_write_zeroes_sectors = drbd_max_discard_sectors(connection);
 	} else {
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 		blk_queue_discard_granularity(q, 0);
 		q->limits.max_discard_sectors = 0;
+		q->limits.max_write_zeroes_sectors = 0;
 	}
 }
 
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 8641776f043a..e64e01ca4d1a 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -2285,7 +2285,7 @@ static unsigned long wire_flags_to_bio_flags(u32 dpf)
 static unsigned long wire_flags_to_bio_op(u32 dpf)
 {
 	if (dpf & DP_DISCARD)
-		return REQ_OP_DISCARD;
+		return REQ_OP_WRITE_ZEROES;
 	else
 		return REQ_OP_WRITE;
 }
@@ -2476,7 +2476,7 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
 	op_flags = wire_flags_to_bio_flags(dp_flags);
 	if (pi->cmd == P_TRIM) {
 		D_ASSERT(peer_device, peer_req->i.size > 0);
-		D_ASSERT(peer_device, op == REQ_OP_DISCARD);
+		D_ASSERT(peer_device, op == REQ_OP_WRITE_ZEROES);
 		D_ASSERT(peer_device, peer_req->pages == NULL);
 	} else if (peer_req->pages == NULL) {
 		D_ASSERT(device, peer_req->i.size == 0);
@@ -4789,7 +4789,7 @@ static int receive_rs_deallocated(struct drbd_connection *connection, struct pac
 
 	if (get_ldev(device)) {
 		struct drbd_peer_request *peer_req;
-		const int op = REQ_OP_DISCARD;
+		const int op = REQ_OP_WRITE_ZEROES;
 
 		peer_req = drbd_alloc_peer_req(peer_device, ID_SYNCER, sector,
 					       size, 0, GFP_NOIO);
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index bdd42b8360cc..d76ee53ce528 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -59,6 +59,7 @@ static struct drbd_request *drbd_req_new(struct drbd_device *device, struct bio
 	drbd_req_make_private_bio(req, bio_src);
 	req->rq_state = (bio_data_dir(bio_src) == WRITE ? RQ_WRITE : 0)
 		      | (bio_op(bio_src) == REQ_OP_WRITE_SAME ? RQ_WSAME : 0)
+		      | (bio_op(bio_src) == REQ_OP_WRITE_ZEROES ? RQ_UNMAP : 0)
 		      | (bio_op(bio_src) == REQ_OP_DISCARD ? RQ_UNMAP : 0);
 	req->device = device;
 	req->master_bio = bio_src;
@@ -1180,7 +1181,8 @@ drbd_submit_req_private_bio(struct drbd_request *req)
 	if (get_ldev(device)) {
 		if (drbd_insert_fault(device, type))
 			bio_io_error(bio);
-		else if (bio_op(bio) == REQ_OP_DISCARD)
+		else if (bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+			 bio_op(bio) == REQ_OP_DISCARD)
 			drbd_process_discard_req(req);
 		else
 			generic_make_request(bio);
@@ -1234,7 +1236,8 @@ drbd_request_prepare(struct drbd_device *device, struct bio *bio, unsigned long
 	_drbd_start_io_acct(device, req);
 
 	/* process discards always from our submitter thread */
-	if (bio_op(bio) & REQ_OP_DISCARD)
+	if ((bio_op(bio) & REQ_OP_WRITE_ZEROES) ||
+	    (bio_op(bio) & REQ_OP_DISCARD))
 		goto queue_for_submitter_thread;
 
 	if (rw == WRITE && req->private_bio && req->i.size
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 3bff33f21435..1afcb4e02d8d 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -174,7 +174,8 @@ void drbd_peer_request_endio(struct bio *bio)
 	struct drbd_peer_request *peer_req = bio->bi_private;
 	struct drbd_device *device = peer_req->peer_device->device;
 	bool is_write = bio_data_dir(bio) == WRITE;
-	bool is_discard = !!(bio_op(bio) == REQ_OP_DISCARD);
+	bool is_discard = bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+			  bio_op(bio) == REQ_OP_DISCARD;
 
 	if (bio->bi_error && __ratelimit(&drbd_ratelimit_state))
 		drbd_warn(device, "%s: error=%d s=%llus\n",
@@ -249,6 +250,7 @@ void drbd_request_endio(struct bio *bio)
 	/* to avoid recursion in __req_mod */
 	if (unlikely(bio->bi_error)) {
 		switch (bio_op(bio)) {
+		case REQ_OP_WRITE_ZEROES:
 		case REQ_OP_DISCARD:
 			if (bio->bi_error == -EOPNOTSUPP)
 				what = DISCARD_COMPLETED_NOTSUPP;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (21 preceding siblings ...)
  2017-03-23 14:33 ` [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:33 ` Christoph Hellwig
  2017-03-28 17:00     ` Bart Van Assche
       [not found] ` <20170323143341.31549-1-hch-jcswGhMUV9g@public.gmane.org>
  2017-03-30 15:12 ` Mike Snitzer
  24 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
  To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg
  Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/ABI/testing/sysfs-block | 10 ++--------
 Documentation/block/queue-sysfs.txt   |  5 -----
 block/blk-lib.c                       |  7 +------
 block/blk-settings.c                  |  3 ---
 block/blk-sysfs.c                     |  2 +-
 block/compat_ioctl.c                  |  2 +-
 block/ioctl.c                         |  2 +-
 drivers/ata/libata-scsi.c             |  2 +-
 drivers/block/drbd/drbd_main.c        |  2 --
 drivers/block/drbd/drbd_nl.c          |  7 +------
 drivers/block/loop.c                  |  2 --
 drivers/block/mtip32xx/mtip32xx.c     |  1 -
 drivers/block/nbd.c                   |  1 -
 drivers/md/dm-cache-target.c          |  1 -
 drivers/md/dm-crypt.c                 |  1 -
 drivers/md/dm-raid.c                  |  6 +++---
 drivers/md/dm-raid1.c                 |  1 -
 drivers/md/dm-table.c                 | 19 -------------------
 drivers/md/dm-thin.c                  |  2 --
 drivers/md/raid5.c                    | 12 +-----------
 drivers/scsi/sd.c                     |  7 -------
 drivers/target/target_core_device.c   |  2 +-
 include/linux/blkdev.h                | 15 ---------------
 include/linux/device-mapper.h         |  5 -----
 24 files changed, 13 insertions(+), 104 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 2da04ce6aeef..dea212db9df3 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -213,14 +213,8 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
 Date:		May 2011
 Contact:	Martin K. Petersen <martin.petersen@oracle.com>
 Description:
-		Devices that support discard functionality may return
-		stale or random data when a previously discarded block
-		is read back. This can cause problems if the filesystem
-		expects discarded blocks to be explicitly cleared. If a
-		device reports that it deterministically returns zeroes
-		when a discarded area is read the discard_zeroes_data
-		parameter will be set to one. Otherwise it will be 0 and
-		the result of reading a discarded area is undefined.
+		Will always return 0.  Don't rely on any specific behavior
+		for discards, and don't read this file.
 
 What:		/sys/block/<disk>/queue/write_same_max_bytes
 Date:		January 2012
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index c0a3bb5a6e4e..009150ed7db8 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -43,11 +43,6 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.
 
-discard_zeroes_data (RO)
-------------------------
-When read, this file will show if the discarded block are zeroed by the
-device or not. If its value is '1' the blocks are zeroed otherwise not.
-
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 153ca59393a7..be44d2725ede 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -37,17 +37,12 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		return -ENXIO;
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
-		if (flags & BLKDEV_DISCARD_ZERO)
-			return -EOPNOTSUPP;
 		if (!blk_queue_secure_erase(q))
 			return -EOPNOTSUPP;
 		op = REQ_OP_SECURE_ERASE;
 	} else {
 		if (!blk_queue_discard(q))
 			return -EOPNOTSUPP;
-		if ((flags & BLKDEV_DISCARD_ZERO) &&
-		    !q->limits.discard_zeroes_data)
-			return -EOPNOTSUPP;
 		op = REQ_OP_DISCARD;
 	}
 
@@ -126,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(bio);
-		if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
+		if (ret == -EOPNOTSUPP)
 			ret = 0;
 		bio_put(bio);
 	}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9d515ae3a405..fe2794986eb9 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -104,7 +104,6 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->discard_granularity = 0;
 	lim->discard_alignment = 0;
 	lim->discard_misaligned = 0;
-	lim->discard_zeroes_data = 0;
 	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
 	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
 	lim->alignment_offset = 0;
@@ -128,7 +127,6 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	blk_set_default_limits(lim);
 
 	/* Inherit limits from component devices */
-	lim->discard_zeroes_data = 1;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = 1;
 	lim->max_hw_sectors = UINT_MAX;
@@ -623,7 +621,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
 
 	t->cluster &= b->cluster;
-	t->discard_zeroes_data &= b->discard_zeroes_data;
 
 	/* Physical block size a multiple of the logical block size? */
 	if (t->physical_block_size & (t->logical_block_size - 1)) {
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c44b321335f3..ca88a533b735 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 
 static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(queue_discard_zeroes_data(q), page);
+	return 0;
 }
 
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index 570021a0dc1c..04325b81c2b4 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -685,7 +685,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	case BLKALIGNOFF:
 		return compat_put_int(arg, bdev_alignment_offset(bdev));
 	case BLKDISCARDZEROES:
-		return compat_put_uint(arg, bdev_discard_zeroes_data(bdev));
+		return compat_put_uint(arg, 0);
 	case BLKFLSBUF:
 	case BLKROSET:
 	case BLKDISCARD:
diff --git a/block/ioctl.c b/block/ioctl.c
index 304685022d9f..1a1586f3baa1 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -546,7 +546,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKALIGNOFF:
 		return put_int(arg, bdev_alignment_offset(bdev));
 	case BLKDISCARDZEROES:
-		return put_uint(arg, bdev_discard_zeroes_data(bdev));
+		return put_uint(arg, 0);
 	case BLKSECTGET:
 		max_sectors = min_t(unsigned int, USHRT_MAX,
 				    queue_max_sectors(bdev_get_queue(bdev)));
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 965b9e7dbb7d..6fb870cac932 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1327,7 +1327,7 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
 		sdev->ata_trim = 1;
 		if (ata_id_has_zero_after_trim(dev->id) &&
 		    (dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM)) {
-			ata_dev_info(dev, "Enabling discard_zeroes_data\n");
+			ata_dev_info(dev, "Enabling zeroing using TRIM\n");
 			sdev->ata_trim_zeroes_data = 1;
 		}
 	}
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 8e62d9f65510..84455c365f57 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -931,7 +931,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
 		p->qlim->io_min = cpu_to_be32(queue_io_min(q));
 		p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
 		p->qlim->discard_enabled = blk_queue_discard(q);
-		p->qlim->discard_zeroes_data = queue_discard_zeroes_data(q);
 		p->qlim->write_same_capable = !!q->limits.max_write_same_sectors;
 	} else {
 		q = device->rq_queue;
@@ -941,7 +940,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
 		p->qlim->io_min = cpu_to_be32(queue_io_min(q));
 		p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
 		p->qlim->discard_enabled = 0;
-		p->qlim->discard_zeroes_data = 0;
 		p->qlim->write_same_capable = 0;
 	}
 }
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index e4516d3b971d..02255a0d68b9 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1199,10 +1199,6 @@ static void decide_on_discard_support(struct drbd_device *device,
 	struct drbd_connection *connection = first_peer_device(device)->connection;
 	bool can_do = b ? blk_queue_discard(b) : true;
 
-	if (can_do && b && !b->limits.discard_zeroes_data && !discard_zeroes_if_aligned) {
-		can_do = false;
-		drbd_info(device, "discard_zeroes_data=0 and discard_zeroes_if_aligned=no: disabling discards\n");
-	}
 	if (can_do && connection->cstate >= C_CONNECTED && !(connection->agreed_features & DRBD_FF_TRIM)) {
 		can_do = false;
 		drbd_info(connection, "peer DRBD too old, does not support TRIM: disabling discards\n");
@@ -1484,8 +1480,7 @@ static void sanitize_disk_conf(struct drbd_device *device, struct disk_conf *dis
 	if (disk_conf->al_extents > drbd_al_extents_max(nbc))
 		disk_conf->al_extents = drbd_al_extents_max(nbc);
 
-	if (!blk_queue_discard(q)
-	    || (!q->limits.discard_zeroes_data && !disk_conf->discard_zeroes_if_aligned)) {
+	if (!blk_queue_discard(q)) {
 		if (disk_conf->rs_discard_granularity) {
 			disk_conf->rs_discard_granularity = 0; /* disable feature */
 			drbd_info(device, "rs_discard_granularity feature disabled\n");
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 265cd2e33ff0..57f68f4ee886 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -828,7 +828,6 @@ static void loop_config_discard(struct loop_device *lo)
 		q->limits.discard_alignment = 0;
 		blk_queue_max_discard_sectors(q, 0);
 		blk_queue_max_write_zeroes_sectors(q, 0);
-		q->limits.discard_zeroes_data = 0;
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 		return;
 	}
@@ -837,7 +836,6 @@ static void loop_config_discard(struct loop_device *lo)
 	q->limits.discard_alignment = 0;
 	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
 	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
-	q->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index f96ab717534c..7efac4ee26b2 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4025,7 +4025,6 @@ static int mtip_block_initialize(struct driver_data *dd)
 		dd->queue->limits.discard_granularity = 4096;
 		blk_queue_max_discard_sectors(dd->queue,
 			MTIP_MAX_TRIM_ENTRY_LEN * MTIP_MAX_TRIM_ENTRIES);
-		dd->queue->limits.discard_zeroes_data = 0;
 	}
 
 	/* Set the capacity of the device in 512 byte sectors. */
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 7e4287bc19e5..616e5c6d3ebd 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1040,7 +1040,6 @@ static int nbd_dev_add(int index)
 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, disk->queue);
 	disk->queue->limits.discard_granularity = 512;
 	blk_queue_max_discard_sectors(disk->queue, UINT_MAX);
-	disk->queue->limits.discard_zeroes_data = 0;
 	blk_queue_max_hw_sectors(disk->queue, 65536);
 	disk->queue->limits.max_sectors = 256;
 
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 9c689b34e6e7..975922c8f231 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2773,7 +2773,6 @@ static int cache_create(struct cache_args *ca, struct cache **result)
 
 	ti->num_discard_bios = 1;
 	ti->discards_supported = true;
-	ti->discard_zeroes_data_unsupported = true;
 	ti->split_discard_bios = false;
 
 	cache->features = ca->features;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 389a3637ffcc..ef1d836bd81b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2030,7 +2030,6 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	wake_up_process(cc->write_thread);
 
 	ti->num_flush_bios = 1;
-	ti->discard_zeroes_data_unsupported = true;
 
 	return 0;
 
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index f8564d63982f..468f1380de1d 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -2813,7 +2813,9 @@ static void configure_discard_support(struct raid_set *rs)
 	/* Assume discards not supported until after checks below. */
 	ti->discards_supported = false;
 
-	/* RAID level 4,5,6 require discard_zeroes_data for data integrity! */
+	/*
+	 * XXX: RAID level 4,5,6 require zeroing for safety.
+	 */
 	raid456 = (rs->md.level == 4 || rs->md.level == 5 || rs->md.level == 6);
 
 	for (i = 0; i < rs->raid_disks; i++) {
@@ -2827,8 +2829,6 @@ static void configure_discard_support(struct raid_set *rs)
 			return;
 
 		if (raid456) {
-			if (!q->limits.discard_zeroes_data)
-				return;
 			if (!devices_handle_discard_safely) {
 				DMERR("raid456 discard support disabled due to discard_zeroes_data uncertainty.");
 				DMERR("Set dm-raid.devices_handle_discard_safely=Y to override.");
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 2ddc2d20e62d..a95cbb80fb34 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1124,7 +1124,6 @@ static int mirror_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->per_io_data_size = sizeof(struct dm_raid1_bio_record);
-	ti->discard_zeroes_data_unsupported = true;
 
 	ms->kmirrord_wq = alloc_workqueue("kmirrord", WQ_MEM_RECLAIM, 0);
 	if (!ms->kmirrord_wq) {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5cd665c91ead..958275aca008 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1449,22 +1449,6 @@ static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush)
 	return false;
 }
 
-static bool dm_table_discard_zeroes_data(struct dm_table *t)
-{
-	struct dm_target *ti;
-	unsigned i = 0;
-
-	/* Ensure that all targets supports discard_zeroes_data. */
-	while (i < dm_table_get_num_targets(t)) {
-		ti = dm_table_get_target(t, i++);
-
-		if (ti->discard_zeroes_data_unsupported)
-			return false;
-	}
-
-	return true;
-}
-
 static int device_is_nonrot(struct dm_target *ti, struct dm_dev *dev,
 			    sector_t start, sector_t len, void *data)
 {
@@ -1620,9 +1604,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	}
 	blk_queue_write_cache(q, wc, fua);
 
-	if (!dm_table_discard_zeroes_data(t))
-		q->limits.discard_zeroes_data = 0;
-
 	/* Ensure that all underlying devices are non-rotational. */
 	if (dm_table_all_devices_attribute(t, device_is_nonrot))
 		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 2b266a2b5035..a5f1916f621a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -3263,7 +3263,6 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	 * them down to the data device.  The thin device's discard
 	 * processing will cause mappings to be removed from the btree.
 	 */
-	ti->discard_zeroes_data_unsupported = true;
 	if (pf.discard_enabled && pf.discard_passdown) {
 		ti->num_discard_bios = 1;
 
@@ -4119,7 +4118,6 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	ti->per_io_data_size = sizeof(struct dm_thin_endio_hook);
 
 	/* In case the pool supports discards, pass them on. */
-	ti->discard_zeroes_data_unsupported = true;
 	if (tc->pool->pf.discard_enabled) {
 		ti->discards_supported = true;
 		ti->num_discard_bios = 1;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8cf1f86dcd05..a4b04d0dc829 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7265,12 +7265,6 @@ static int raid5_run(struct mddev *mddev)
 		blk_queue_max_discard_sectors(mddev->queue,
 					      0xfffe * STRIPE_SECTORS);
 
-		/*
-		 * unaligned part of discard request will be ignored, so can't
-		 * guarantee discard_zeroes_data
-		 */
-		mddev->queue->limits.discard_zeroes_data = 0;
-
 		blk_queue_max_write_same_sectors(mddev->queue, 0);
 		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
 
@@ -7280,7 +7274,7 @@ static int raid5_run(struct mddev *mddev)
 			disk_stack_limits(mddev->gendisk, rdev->bdev,
 					  rdev->new_data_offset << 9);
 			/*
-			 * discard_zeroes_data is required, otherwise data
+			 * zeroing is required, otherwise data
 			 * could be lost. Consider a scenario: discard a stripe
 			 * (the stripe could be inconsistent if
 			 * discard_zeroes_data is 0); write one disk of the
@@ -7289,10 +7283,6 @@ static int raid5_run(struct mddev *mddev)
 			 * parity); the disk is broken; The stripe data of this
 			 * disk is lost.
 			 */
-			if (!blk_queue_discard(bdev_get_queue(rdev->bdev)) ||
-			    !bdev_get_queue(rdev->bdev)->
-						limits.discard_zeroes_data)
-				discard_supported = false;
 			/* Unfortunately, discard_zeroes_data is not currently
 			 * a guarantee - just a hint.  So we only allow DISCARD
 			 * if the sysadmin has confirmed that only safe devices
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ca96bb33471b..65ed02438565 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -645,8 +645,6 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 	unsigned int logical_block_size = sdkp->device->sector_size;
 	unsigned int max_blocks = 0, max_ranges = 0, max_range_size = 0;
 
-	q->limits.discard_zeroes_data = 0;
-
 	/*
 	 * When LBPRZ is reported, discard alignment and granularity
 	 * must be fixed to the logical block size. Otherwise the block
@@ -682,27 +680,22 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 	case SD_LBP_WS16:
 		max_blocks = min_not_zero(sdkp->max_ws_blocks,
 					  (u32)SD_MAX_WS16_BLOCKS);
-		q->limits.discard_zeroes_data = sdkp->lbprz;
 		break;
 
 	case SD_LBP_WS10:
 		max_blocks = min_not_zero(sdkp->max_ws_blocks,
 					  (u32)SD_MAX_WS10_BLOCKS);
-		q->limits.discard_zeroes_data = sdkp->lbprz;
 		break;
 
 	case SD_LBP_ZERO:
 		max_blocks = min_not_zero(sdkp->max_ws_blocks,
 					  (u32)SD_MAX_WS10_BLOCKS);
-		q->limits.discard_zeroes_data = 1;
 		break;
 
 	case SD_LBP_ATA_TRIM:
 		max_ranges = 512 / sizeof(__le64);
 		max_range_size = USHRT_MAX;
 		max_blocks = max_ranges * max_range_size;
-		if (sdkp->device->ata_trim_zeroes_data)
-			q->limits.discard_zeroes_data = 1;
 		break;
 	}
 
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index c754ae33bf7b..d2f089cfa9ae 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -851,7 +851,7 @@ bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib,
 	attrib->unmap_granularity = q->limits.discard_granularity / block_size;
 	attrib->unmap_granularity_alignment = q->limits.discard_alignment /
 								block_size;
-	attrib->unmap_zeroes_data = q->limits.discard_zeroes_data;
+	attrib->unmap_zeroes_data = 0;
 	return true;
 }
 EXPORT_SYMBOL(target_configure_unmap_from_queue);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1bddcfc631a4..aa1ac2c0f393 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -338,7 +338,6 @@ struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		cluster;
-	unsigned char		discard_zeroes_data;
 	unsigned char		raid_partial_stripes_expensive;
 	enum blk_zoned_model	zoned;
 };
@@ -1338,7 +1337,6 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 
 #define BLKDEV_DISCARD_SECURE	(1 << 0)	/* issue a secure erase */
-#define BLKDEV_DISCARD_ZERO	(1 << 1)	/* must reliably zero data */
 
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
@@ -1543,19 +1541,6 @@ static inline int bdev_discard_alignment(struct block_device *bdev)
 	return q->limits.discard_alignment;
 }
 
-static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
-{
-	if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
-		return 1;
-
-	return 0;
-}
-
-static inline unsigned int bdev_discard_zeroes_data(struct block_device *bdev)
-{
-	return queue_discard_zeroes_data(bdev_get_queue(bdev));
-}
-
 static inline unsigned int bdev_write_same(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 3829bee2302a..c7ea33e38fb9 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -296,11 +296,6 @@ struct dm_target {
 	 * on max_io_len boundary.
 	 */
 	bool split_discard_bios:1;
-
-	/*
-	 * Set if this target does not return zeroes on discarded blocks.
-	 */
-	bool discard_zeroes_data_unsupported:1;
 };
 
 /* Each target can link one of these into the table */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 ` [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-23 14:55   ` Mike Snitzer
  2017-03-23 14:56     ` Christoph Hellwig
  0 siblings, 1 reply; 101+ messages in thread
From: Mike Snitzer @ 2017-03-23 14:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 23 2017 at 10:33am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> It seems like the code currently passes whatever it was using for writes
> to WRITE SAME.  Just switch it to WRITE ZEROES, although that doesn't
> need any payload.
> 
> Untested, and confused by the code, maybe someone who understands it
> better than me can help..
> 
> Not-yet-signed-off-by: Christoph Hellwig <hch@lst.de>

See commit 70d6c400a ("dm kcopyd: add WRITE SAME support to dm_kcopyd_zero")
drivers/md/dm-io.c:do_region() adjusts the WRITE SAME payload to be a
single page.

So you'd want to tweak dm-io.c accordingly for WRITE ZEROES (presummably
no payload?)

Mike

> ---
>  drivers/md/dm-kcopyd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
> index 9e9d04cb7d51..f85846741d50 100644
> --- a/drivers/md/dm-kcopyd.c
> +++ b/drivers/md/dm-kcopyd.c
> @@ -733,11 +733,11 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
>  		job->pages = &zero_page_list;
>  
>  		/*
> -		 * Use WRITE SAME to optimize zeroing if all dests support it.
> +		 * Use WRITE ZEROES to optimize zeroing if all dests support it.
>  		 */
> -		job->rw = REQ_OP_WRITE_SAME;
> +		job->rw = REQ_OP_WRITE_ZEROES;
>  		for (i = 0; i < job->num_dests; i++)
> -			if (!bdev_write_same(job->dests[i].bdev)) {
> +			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
>  				job->rw = WRITE;
>  				break;
>  			}
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
  2017-03-23 14:55   ` Mike Snitzer
@ 2017-03-23 14:56     ` Christoph Hellwig
  2017-03-23 15:10       ` Mike Snitzer
  0 siblings, 1 reply; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-23 14:56 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, lars.ellenberg, linux-block, linux-scsi,
	drbd-dev, dm-devel, linux-raid

On Thu, Mar 23, 2017 at 10:55:22AM -0400, Mike Snitzer wrote:
> See commit 70d6c400a ("dm kcopyd: add WRITE SAME support to dm_kcopyd_zero")
> drivers/md/dm-io.c:do_region() adjusts the WRITE SAME payload to be a
> single page.
> 
> So you'd want to tweak dm-io.c accordingly for WRITE ZEROES (presummably
> no payload?)

Yeah, I'll look into it.  Any good test case for verifying this code
while I've got your attention?

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
  2017-03-23 14:56     ` Christoph Hellwig
@ 2017-03-23 15:10       ` Mike Snitzer
  2017-03-27  9:12           ` Christoph Hellwig
  0 siblings, 1 reply; 101+ messages in thread
From: Mike Snitzer @ 2017-03-23 15:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 23 2017 at 10:56am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Mar 23, 2017 at 10:55:22AM -0400, Mike Snitzer wrote:
> > See commit 70d6c400a ("dm kcopyd: add WRITE SAME support to dm_kcopyd_zero")
> > drivers/md/dm-io.c:do_region() adjusts the WRITE SAME payload to be a
> > single page.
> > 
> > So you'd want to tweak dm-io.c accordingly for WRITE ZEROES (presummably
> > no payload?)
> 
> Yeah, I'll look into it.

Ah, your previous 05/23 patch re-uses the discard branch in
dm-io.c:do_region.  Looks correct.  So you should be good.

Not sure why you've split out the dm-kcopyd patch, likely best to just
fold it into the previous dm support patch.

I'll try your code out on my testbed, probably tomorrow, but in general
the DM code looks solid (thanks for doing it!).  But I'll take a closer
look and will report back with my Reviewed-by if all looks (and tests
out) good.

> Any good test case for verifying this code while I've got your
> attention?

DM thinp is the only consumer of dm_kcopyd_zero().  DM thinp
conservatively defaults to zeroing (but we recommend
'skip_block_zeroing' for most setups).  So anyway, just using a DM thin
device with partial thin blocksize IO (without 'skip_block_zeroing')
should get you coverage.

The device-mapper-test-suite thin tests have thinp w/ zeroing coverage
so I'll be sure to run those.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
@ 2017-03-23 15:54     ` Lars Ellenberg
  2017-03-23 14:33 ` [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios Christoph Hellwig
                       ` (23 subsequent siblings)
  24 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-23 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> supported by the block layer, and switches existing implementations
> of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> removes incorrect discard_zeroes_data, and also switches WRITE SAME
> based zeroing in SCSI to this new method.
> 
> I've done testing with ATA, SCSI and NVMe setups, but there are
> a few things that will need more attention:
> 

>  - The DRBD code in this area was very odd,

DRBD wants all replicas to give back identical data.
If what comes back after a discard is "undefined",
we cannot really use that.

We used to "stack" discard only if our local backend claimed
"discard_zeroes_data". We replicate that IO request to the peer
as discard, and if the peer cannot do discards itself, or has
discard_zeroes_data == 0, the peer will use zeroout instead.

One use-case for this is the device mapper "thin provisioning".
At the time I wrote those "odd" hacks, dm thin targets
would set discard_zeroes_data=0, NOT change discard granularity,
but only actually discard (drop from the tree) whole "chunks",
leaving partial start/end chunks in the mapping tree unchanged.

The logic of "only stack discard, if backend discard_zeroes_data"
would mean that we would not be able to accept and pass down discards
to dm-thin targets. But with data on dm-thin, you would really like
to do the occasional fstrim.

Also, IO backends on the peers do not have to have the same
characteristics.  You could have the DRBD Primary on some SSD,
and the Secondary on some thin-pool LV,
scheduling thin snapthots in intervals or on demand.

With the logic of "use zero-out instead", fstrim would cause it to
fully allocate what was supposed to be thinly provisioned :-(

So what I did there was optionally tell DRBD that
"discard_zeroes_data == 0" on that peer would actually mean
"discard_zeroes_data == 1,
 IF you zero-out the partial chunks of this granularity yourself".

And implemented this "discard aligned chunks of that granularity,
and zero-out partial start/end chunks, if any".

And then claim to upper layers that, yes, discard_zeroes_data=1,
in that case, if so configured, even if our backend (dm-thin)
would say discard_zeroes_data=0.

Does that make sense?  Can we still do that?  Has something like that
been done in block core or device mapper meanwhile?


> and will need an audit from the maintainers.

Will need to make some time for review and testing.

Thanks,

    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-23 15:54     ` Lars Ellenberg
  0 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-23 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> supported by the block layer, and switches existing implementations
> of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> removes incorrect discard_zeroes_data, and also switches WRITE SAME
> based zeroing in SCSI to this new method.
> 
> I've done testing with ATA, SCSI and NVMe setups, but there are
> a few things that will need more attention:
> 

>  - The DRBD code in this area was very odd,

DRBD wants all replicas to give back identical data.
If what comes back after a discard is "undefined",
we cannot really use that.

We used to "stack" discard only if our local backend claimed
"discard_zeroes_data". We replicate that IO request to the peer
as discard, and if the peer cannot do discards itself, or has
discard_zeroes_data == 0, the peer will use zeroout instead.

One use-case for this is the device mapper "thin provisioning".
At the time I wrote those "odd" hacks, dm thin targets
would set discard_zeroes_data=0, NOT change discard granularity,
but only actually discard (drop from the tree) whole "chunks",
leaving partial start/end chunks in the mapping tree unchanged.

The logic of "only stack discard, if backend discard_zeroes_data"
would mean that we would not be able to accept and pass down discards
to dm-thin targets. But with data on dm-thin, you would really like
to do the occasional fstrim.

Also, IO backends on the peers do not have to have the same
characteristics.  You could have the DRBD Primary on some SSD,
and the Secondary on some thin-pool LV,
scheduling thin snapthots in intervals or on demand.

With the logic of "use zero-out instead", fstrim would cause it to
fully allocate what was supposed to be thinly provisioned :-(

So what I did there was optionally tell DRBD that
"discard_zeroes_data == 0" on that peer would actually mean
"discard_zeroes_data == 1,
 IF you zero-out the partial chunks of this granularity yourself".

And implemented this "discard aligned chunks of that granularity,
and zero-out partial start/end chunks, if any".

And then claim to upper layers that, yes, discard_zeroes_data=1,
in that case, if so configured, even if our backend (dm-thin)
would say discard_zeroes_data=0.

Does that make sense?  Can we still do that?  Has something like that
been done in block core or device mapper meanwhile?


> and will need an audit from the maintainers.

Will need to make some time for review and testing.

Thanks,

    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 15:54     ` Lars Ellenberg
@ 2017-03-23 17:02       ` Mike Snitzer
  -1 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-23 17:02 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: axboe, linux-raid, linux-scsi, martin.petersen, philipp.reisner,
	linux-block, dm-devel, shli, Christoph Hellwig, agk, drbd-dev

On Thu, Mar 23 2017 at 11:54am -0400,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > supported by the block layer, and switches existing implementations
> > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > based zeroing in SCSI to this new method.
> > 
> > I've done testing with ATA, SCSI and NVMe setups, but there are
> > a few things that will need more attention:
> > 
> 
> >  - The DRBD code in this area was very odd,
> 
> DRBD wants all replicas to give back identical data.
> If what comes back after a discard is "undefined",
> we cannot really use that.
> 
> We used to "stack" discard only if our local backend claimed
> "discard_zeroes_data". We replicate that IO request to the peer
> as discard, and if the peer cannot do discards itself, or has
> discard_zeroes_data == 0, the peer will use zeroout instead.
> 
> One use-case for this is the device mapper "thin provisioning".
> At the time I wrote those "odd" hacks, dm thin targets
> would set discard_zeroes_data=0, NOT change discard granularity,
> but only actually discard (drop from the tree) whole "chunks",
> leaving partial start/end chunks in the mapping tree unchanged.
> 
> The logic of "only stack discard, if backend discard_zeroes_data"
> would mean that we would not be able to accept and pass down discards
> to dm-thin targets. But with data on dm-thin, you would really like
> to do the occasional fstrim.

Are you sure you aren't thinking of MD raid?  E.g. raid5's
"devices_handle_discard_safely":
parm:           devices_handle_discard_safely:Set to Y if all devices in each array reliably return zeroes on reads from discarded regions (bool)

I don't recall DM thinp's discard support ever having a requirement for
discard_zeroes_data.

In fact, see header from commit b60ab990ccdf3 ("dm thin: do not expose
non-zero discard limits if discards disabled"):

    Also, always set discard_zeroes_data_unsupported in targets because they
    should never advertise the 'discard_zeroes_data' capability (even if the
    pool's data device supports it).

To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-23 17:02       ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-23 17:02 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 23 2017 at 11:54am -0400,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > supported by the block layer, and switches existing implementations
> > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > based zeroing in SCSI to this new method.
> > 
> > I've done testing with ATA, SCSI and NVMe setups, but there are
> > a few things that will need more attention:
> > 
> 
> >  - The DRBD code in this area was very odd,
> 
> DRBD wants all replicas to give back identical data.
> If what comes back after a discard is "undefined",
> we cannot really use that.
> 
> We used to "stack" discard only if our local backend claimed
> "discard_zeroes_data". We replicate that IO request to the peer
> as discard, and if the peer cannot do discards itself, or has
> discard_zeroes_data == 0, the peer will use zeroout instead.
> 
> One use-case for this is the device mapper "thin provisioning".
> At the time I wrote those "odd" hacks, dm thin targets
> would set discard_zeroes_data=0, NOT change discard granularity,
> but only actually discard (drop from the tree) whole "chunks",
> leaving partial start/end chunks in the mapping tree unchanged.
> 
> The logic of "only stack discard, if backend discard_zeroes_data"
> would mean that we would not be able to accept and pass down discards
> to dm-thin targets. But with data on dm-thin, you would really like
> to do the occasional fstrim.

Are you sure you aren't thinking of MD raid?  E.g. raid5's
"devices_handle_discard_safely":
parm:           devices_handle_discard_safely:Set to Y if all devices in each array reliably return zeroes on reads from discarded regions (bool)

I don't recall DM thinp's discard support ever having a requirement for
discard_zeroes_data.

In fact, see header from commit b60ab990ccdf3 ("dm thin: do not expose
non-zero discard limits if discards disabled"):

    Also, always set discard_zeroes_data_unsupported in targets because they
    should never advertise the 'discard_zeroes_data' capability (even if the
    pool's data device supports it).

To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 17:02       ` Mike Snitzer
@ 2017-03-23 22:53           ` Lars Ellenberg
  -1 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-23 22:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	Christoph Hellwig, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 23, 2017 at 01:02:22PM -0400, Mike Snitzer wrote:
> On Thu, Mar 23 2017 at 11:54am -0400,
> Lars Ellenberg <lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > > supported by the block layer, and switches existing implementations
> > > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > > based zeroing in SCSI to this new method.
> > > 
> > > I've done testing with ATA, SCSI and NVMe setups, but there are
> > > a few things that will need more attention:
> > > 
> > 
> > >  - The DRBD code in this area was very odd,
> > 
> > DRBD wants all replicas to give back identical data.
> > If what comes back after a discard is "undefined",
> > we cannot really use that.
> > 
> > We used to "stack" discard only if our local backend claimed
> > "discard_zeroes_data". We replicate that IO request to the peer
> > as discard, and if the peer cannot do discards itself, or has
> > discard_zeroes_data == 0, the peer will use zeroout instead.
> > 
> > One use-case for this is the device mapper "thin provisioning".
> > At the time I wrote those "odd" hacks, dm thin targets
> > would set discard_zeroes_data=0, NOT change discard granularity,
> > but only actually discard (drop from the tree) whole "chunks",
> > leaving partial start/end chunks in the mapping tree unchanged.
> > 
> > The logic of "only stack discard, if backend discard_zeroes_data"

That is DRBDs logic I just explained above.
And the "backend" (to DRBD) in that sentence would be thin, and not
the "real" hardware below thin, which may not even support discard.

> > would mean that we would not be able to accept and pass down discards
> > to dm-thin targets. But with data on dm-thin, you would really like
> > to do the occasional fstrim.
> 
> Are you sure you aren't thinking of MD raid?

Yes.

> To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

That is exactly what I was saying.

Thin does not claim to zero data on discard.  which is ok, and correct,
because it only punches holes on full chunks (or whatever you call
them), and leaves the rest in the mapping tree as is.

And that behaviour would prevent DRBD from exposing discards if
configured on top of thin. (see above)

But thin *could* easily guarantee zeroing, by simply punching holes
where it can, and zeroing out the not fully-aligned partial start and
end of the range.

Which is what I added as an option between DRBD and whatever is below,
with the use-case of dm-thin in mind.

And that made it possible for DRBD to
 a) expose "discard" to upper layers, even if we would usually only do
    if the DRBD Primary sits on top of a device that guarantees discard
    zeros data,
 b) still use discards on a secondary, without falling back to zero-out,
    which would unexpectedly fully allocate, instead of trim, a thinly
    provisioned device-mapper target.


Thanks,

    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-23 22:53           ` Lars Ellenberg
  0 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-23 22:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 23, 2017 at 01:02:22PM -0400, Mike Snitzer wrote:
> On Thu, Mar 23 2017 at 11:54am -0400,
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> 
> > On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > > supported by the block layer, and switches existing implementations
> > > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > > based zeroing in SCSI to this new method.
> > > 
> > > I've done testing with ATA, SCSI and NVMe setups, but there are
> > > a few things that will need more attention:
> > > 
> > 
> > >  - The DRBD code in this area was very odd,
> > 
> > DRBD wants all replicas to give back identical data.
> > If what comes back after a discard is "undefined",
> > we cannot really use that.
> > 
> > We used to "stack" discard only if our local backend claimed
> > "discard_zeroes_data". We replicate that IO request to the peer
> > as discard, and if the peer cannot do discards itself, or has
> > discard_zeroes_data == 0, the peer will use zeroout instead.
> > 
> > One use-case for this is the device mapper "thin provisioning".
> > At the time I wrote those "odd" hacks, dm thin targets
> > would set discard_zeroes_data=0, NOT change discard granularity,
> > but only actually discard (drop from the tree) whole "chunks",
> > leaving partial start/end chunks in the mapping tree unchanged.
> > 
> > The logic of "only stack discard, if backend discard_zeroes_data"

That is DRBDs logic I just explained above.
And the "backend" (to DRBD) in that sentence would be thin, and not
the "real" hardware below thin, which may not even support discard.

> > would mean that we would not be able to accept and pass down discards
> > to dm-thin targets. But with data on dm-thin, you would really like
> > to do the occasional fstrim.
> 
> Are you sure you aren't thinking of MD raid?

Yes.

> To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

That is exactly what I was saying.

Thin does not claim to zero data on discard.  which is ok, and correct,
because it only punches holes on full chunks (or whatever you call
them), and leaves the rest in the mapping tree as is.

And that behaviour would prevent DRBD from exposing discards if
configured on top of thin. (see above)

But thin *could* easily guarantee zeroing, by simply punching holes
where it can, and zeroing out the not fully-aligned partial start and
end of the range.

Which is what I added as an option between DRBD and whatever is below,
with the use-case of dm-thin in mind.

And that made it possible for DRBD to
 a) expose "discard" to upper layers, even if we would usually only do
    if the DRBD Primary sits on top of a device that guarantees discard
    zeros data,
 b) still use discards on a secondary, without falling back to zero-out,
    which would unexpectedly fully allocate, instead of trim, a thinly
    provisioned device-mapper target.


Thanks,

    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 15:54     ` Lars Ellenberg
@ 2017-03-27  9:10         ` Christoph Hellwig
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Christoph Hellwig, axboe-tSWWG44O7X1aa/9Udqfwiw,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	agk-H+wXaHxf7aLQT0dZR+AlfA, snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA

It sounds like you don't want to support traditional discard at all,
but only WRITE ZEROES.  So in many ways this series is the right way
forward.  It would be nice if we could do a full blown
REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
similar to what hardware that implements WRITE SAME of zeroes
or WRITE ZEROES would do.  I'll see if I could include that in my
series.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-27  9:10         ` Christoph Hellwig
  0 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Christoph Hellwig, axboe, martin.petersen, agk, snitzer, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

It sounds like you don't want to support traditional discard at all,
but only WRITE ZEROES.  So in many ways this series is the right way
forward.  It would be nice if we could do a full blown
REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
similar to what hardware that implements WRITE SAME of zeroes
or WRITE ZEROES would do.  I'll see if I could include that in my
series.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
  2017-03-23 15:10       ` Mike Snitzer
@ 2017-03-27  9:12           ` Christoph Hellwig
  0 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, linux-raid, linux-scsi, martin.petersen, philipp.reisner,
	linux-block, dm-devel, lars.ellenberg, shli, Christoph Hellwig,
	agk, drbd-dev

On Thu, Mar 23, 2017 at 11:10:38AM -0400, Mike Snitzer wrote:
> Not sure why you've split out the dm-kcopyd patch, likely best to just
> fold it into the previous dm support patch.

The dm-kcopyd patch switches to using WRITE_ZEROES instead of write
same for dm as user of these interfaces.  The one before just adds
WRITE_ZEROES support to dm.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
@ 2017-03-27  9:12           ` Christoph Hellwig
  0 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, lars.ellenberg, linux-block, linux-scsi,
	drbd-dev, dm-devel, linux-raid

On Thu, Mar 23, 2017 at 11:10:38AM -0400, Mike Snitzer wrote:
> Not sure why you've split out the dm-kcopyd patch, likely best to just
> fold it into the previous dm support patch.

The dm-kcopyd patch switches to using WRITE_ZEROES instead of write
same for dm as user of these interfaces.  The one before just adds
WRITE_ZEROES support to dm.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-27  9:10         ` Christoph Hellwig
@ 2017-03-27 14:03             ` Mike Snitzer
  -1 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-27 14:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	Christoph Hellwig, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Mon, Mar 27 2017 at  5:10am -0400,
Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:

> It sounds like you don't want to support traditional discard at all,
> but only WRITE ZEROES.  So in many ways this series is the right way
> forward.  It would be nice if we could do a full blown
> REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
> similar to what hardware that implements WRITE SAME of zeroes
> or WRITE ZEROES would do.  I'll see if I could include that in my
> series.

By "you" I assume you're referring to Lars?  Lars' approach for discard,
when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
the way discard and zeroing got conflated certainly lends itself to
these ugly hacks.  SO I do appreciate that for anything to leverage
discard_zeroes_data it needs to be reliable.  Which runs counter to how
discard was implemented (discard may get silently dropped!)  But that is
why dm-thinp doesn't advertise dzd.  Anyway...

As for the blkdev_issue_zeroout() resorting to manually zeroing the
range, if the discard fails or dzd not supported, that certainly
requires DM thinp to implement manual zeroing of the head and tail of
the range if partial blocks are being zeroed.  So I welcome any advances
there.  It is probably something that is best left to Joe or myself to
tackle.  But I'll gladly accept patches ;)

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-27 14:03             ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-27 14:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Mon, Mar 27 2017 at  5:10am -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> It sounds like you don't want to support traditional discard at all,
> but only WRITE ZEROES.  So in many ways this series is the right way
> forward.  It would be nice if we could do a full blown
> REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
> similar to what hardware that implements WRITE SAME of zeroes
> or WRITE ZEROES would do.  I'll see if I could include that in my
> series.

By "you" I assume you're referring to Lars?  Lars' approach for discard,
when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
the way discard and zeroing got conflated certainly lends itself to
these ugly hacks.  SO I do appreciate that for anything to leverage
discard_zeroes_data it needs to be reliable.  Which runs counter to how
discard was implemented (discard may get silently dropped!)  But that is
why dm-thinp doesn't advertise dzd.  Anyway...

As for the blkdev_issue_zeroout() resorting to manually zeroing the
range, if the discard fails or dzd not supported, that certainly
requires DM thinp to implement manual zeroing of the head and tail of
the range if partial blocks are being zeroed.  So I welcome any advances
there.  It is probably something that is best left to Joe or myself to
tackle.  But I'll gladly accept patches ;)

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-27 14:03             ` Mike Snitzer
@ 2017-03-27 14:57                 ` Christoph Hellwig
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27 14:57 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA, Christoph Hellwig,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	Christoph Hellwig, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Mon, Mar 27, 2017 at 10:03:07AM -0400, Mike Snitzer wrote:
> By "you" I assume you're referring to Lars?

Yes.

> Lars' approach for discard,
> when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
> the way discard and zeroing got conflated certainly lends itself to
> these ugly hacks.  SO I do appreciate that for anything to leverage
> discard_zeroes_data it needs to be reliable.  Which runs counter to how
> discard was implemented (discard may get silently dropped!)  But that is
> why dm-thinp doesn't advertise dzd.  Anyway...

That's exactly what this series does - remove discard_zeroes_data and
use the new REQ_OP_WRITE_ZEROES for anything that wants zeroing offload.

> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Ok, I'll happily leave this to the two of you..

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-27 14:57                 ` Christoph Hellwig
  0 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-27 14:57 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Christoph Hellwig, axboe, martin.petersen,
	agk, shli, philipp.reisner, linux-block, linux-scsi, drbd-dev,
	dm-devel, linux-raid

On Mon, Mar 27, 2017 at 10:03:07AM -0400, Mike Snitzer wrote:
> By "you" I assume you're referring to Lars?

Yes.

> Lars' approach for discard,
> when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
> the way discard and zeroing got conflated certainly lends itself to
> these ugly hacks.  SO I do appreciate that for anything to leverage
> discard_zeroes_data it needs to be reliable.  Which runs counter to how
> discard was implemented (discard may get silently dropped!)  But that is
> why dm-thinp doesn't advertise dzd.  Anyway...

That's exactly what this series does - remove discard_zeroes_data and
use the new REQ_OP_WRITE_ZEROES for anything that wants zeroing offload.

> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Ok, I'll happily leave this to the two of you..

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [Drbd-dev] RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-27 14:03             ` Mike Snitzer
@ 2017-03-27 15:08               ` Bart Van Assche
  -1 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-27 15:08 UTC (permalink / raw)
  To: hch, snitzer
  Cc: linux-block, agk, linux-raid, hch, martin.petersen,
	philipp.reisner, axboe, linux-scsi, drbd-dev, shli, dm-devel

On Mon, 2017-03-27 at 10:03 -0400, Mike Snitzer wrote:
> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Some time ago I posted a patch series for the block layer that zeroes
start and tail for nonaligned discard requests if dzd has been set. See
also "[PATCH v3 0/5] Make blkdev_issue_discard() submit aligned discard
requests" (https://www.spinics.net/lists/linux-block/msg02360.html).

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [Drbd-dev] RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-27 15:08               ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-27 15:08 UTC (permalink / raw)
  To: hch, snitzer
  Cc: linux-block, agk, linux-raid, hch, martin.petersen,
	philipp.reisner, axboe, linux-scsi, drbd-dev, shli, dm-devel

On Mon, 2017-03-27 at 10:03 -0400, Mike Snitzer wrote:
> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Some time ago I posted a patch series for the block layer that zeroes
start and tail for nonaligned discard requests if dzd has been set. See
also "[PATCH v3 0/5] Make blkdev_issue_discard() submit aligned discard
requests" (https://www.spinics.net/lists/linux-block/msg02360.html).

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 ` [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-28 16:12     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:12 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Make life easy for implementations that needs to send a data buffer
> to the device (e.g. SCSI) by numbering it as a data out command.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/blk_types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index d703acb55d0f..6393c13a6498 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -160,7 +160,7 @@ enum req_opf {
>  	/* write the same sector many times */
>  	REQ_OP_WRITE_SAME	= 7,
>  	/* write the zero filled sector many times */
> -	REQ_OP_WRITE_ZEROES	= 8,
> +	REQ_OP_WRITE_ZEROES	= 9,
>  
>  	/* SCSI passthrough using struct scsi_request */
>  	REQ_OP_SCSI_IN		= 32,

Hello Christoph,

Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
"Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
@ 2017-03-28 16:12     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:12 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Make life easy for implementations that needs to send a data buffer
> to the device (e.g. SCSI) by numbering it as a data out command.
>=20
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/blk_types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>=20
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index d703acb55d0f..6393c13a6498 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -160,7 +160,7 @@ enum req_opf {
>  	/* write the same sector many times */
>  	REQ_OP_WRITE_SAME	=3D 7,
>  	/* write the zero filled sector many times */
> -	REQ_OP_WRITE_ZEROES	=3D 8,
> +	REQ_OP_WRITE_ZEROES	=3D 9,
> =20
>  	/* SCSI passthrough using struct scsi_request */
>  	REQ_OP_SCSI_IN		=3D 32,

Hello Christoph,

Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
"Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-23 14:33 ` [PATCH 12/23] sd: handle REQ_UNMAP Christoph Hellwig
@ 2017-03-28 16:48       ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:48 UTC (permalink / raw)
  To: agk-H+wXaHxf7aLQT0dZR+AlfA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, hch-jcswGhMUV9g,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, shli-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Try to use a write same with unmap bit variant if the device supports it
> and the caller asks for it.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  drivers/scsi/sd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index b6f70a09a301..ca96bb33471b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -871,6 +871,16 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
>  			return BLKPREP_INVALID;
>  		return sd_setup_ata_trim_cmnd(cmd);
>  	}
> +
> +	if (rq->cmd_flags & REQ_UNMAP) {
> +		switch (sdkp->provisioning_mode) {
> +		case SD_LBP_WS16:
> +			return sd_setup_write_same16_cmnd(cmd, true);
> +		case SD_LBP_WS10:
> +			return sd_setup_write_same10_cmnd(cmd, true);
> +		}
> +	}
> +
>  	if (sdp->no_write_same)
>  		return BLKPREP_INVALID;
>  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)

Users can change the provisioning mode from user space from SD_LBP_WS16 into
SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-28 16:48       ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:48 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Try to use a write same with unmap bit variant if the device supports it
> and the caller asks for it.
>=20
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/sd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>=20
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index b6f70a09a301..ca96bb33471b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -871,6 +871,16 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cm=
nd *cmd)
>  			return BLKPREP_INVALID;
>  		return sd_setup_ata_trim_cmnd(cmd);
>  	}
> +
> +	if (rq->cmd_flags & REQ_UNMAP) {
> +		switch (sdkp->provisioning_mode) {
> +		case SD_LBP_WS16:
> +			return sd_setup_write_same16_cmnd(cmd, true);
> +		case SD_LBP_WS10:
> +			return sd_setup_write_same10_cmnd(cmd, true);
> +		}
> +	}
> +
>  	if (sdp->no_write_same)
>  		return BLKPREP_INVALID;
>  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)

Users can change the provisioning mode from user space from=A0SD_LBP_WS16 i=
nto
SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
  2017-03-23 14:33 ` [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches Christoph Hellwig
@ 2017-03-28 16:50     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:50 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> and preparse for removing the discard_zeroes_data flag.

Hello Christoph,

"preparse" probably should have been "prepare"?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
@ 2017-03-28 16:50     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 16:50 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> This gets us support for non-discard efficient write of zeroes (e.g. NVMe=
)
> and preparse for removing the discard_zeroes_data flag.

Hello Christoph,

"preparse" probably should have been "prepare"?

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-23 14:33 ` [PATCH 23/23] block: remove the discard_zeroes_data flag Christoph Hellwig
@ 2017-03-28 17:00     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 17:00 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
> kill this hack.
> 
> [ ... ]
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 2da04ce6aeef..dea212db9df3 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -213,14 +213,8 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
>  Date:		May 2011
>  Contact:	Martin K. Petersen <martin.petersen@oracle.com>
>  Description:
> -		Devices that support discard functionality may return
> -		stale or random data when a previously discarded block
> -		is read back. This can cause problems if the filesystem
> -		expects discarded blocks to be explicitly cleared. If a
> -		device reports that it deterministically returns zeroes
> -		when a discarded area is read the discard_zeroes_data
> -		parameter will be set to one. Otherwise it will be 0 and
> -		the result of reading a discarded area is undefined.
> +		Will always return 0.  Don't rely on any specific behavior
> +		for discards, and don't read this file.
>  
>  What:		/sys/block/<disk>/queue/write_same_max_bytes
>  Date:		January 2012
>
> [ ... ]
>
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
>  
>  static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
>  {
> -	return queue_var_show(queue_discard_zeroes_data(q), page);
> +	return 0;
>  }

Hello Christoph,

It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
and the above code are not in sync. I think the above code will cause reading
from the discard_zeroes_data attribute to return an empty string ("") instead
of "0\n".

BTW, my personal preference is to remove this attribute entirely because keeping
it will cause confusion, no matter how well we document the behavior of this
attribute.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
@ 2017-03-28 17:00     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 17:00 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we ca=
n
> kill this hack.
>=20
> [ ... ]
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/te=
sting/sysfs-block
> index 2da04ce6aeef..dea212db9df3 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -213,14 +213,8 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
>  Date:		May 2011
>  Contact:	Martin K. Petersen <martin.petersen@oracle.com>
>  Description:
> -		Devices that support discard functionality may return
> -		stale or random data when a previously discarded block
> -		is read back. This can cause problems if the filesystem
> -		expects discarded blocks to be explicitly cleared. If a
> -		device reports that it deterministically returns zeroes
> -		when a discarded area is read the discard_zeroes_data
> -		parameter will be set to one. Otherwise it will be 0 and
> -		the result of reading a discarded area is undefined.
> +		Will always return 0.  Don't rely on any specific behavior
> +		for discards, and don't read this file.
> =20
>  What:		/sys/block/<disk>/queue/write_same_max_bytes
>  Date:		January 2012
>
> [ ... ]
>
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request=
_queue *q,
> =20
>  static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, c=
har *page)
>  {
> -	return queue_var_show(queue_discard_zeroes_data(q), page);
> +	return 0;
>  }

Hello Christoph,

It seems to me like the documentation in Documentation/ABI/testing/sysfs-bl=
ock
and the above code are not in sync. I think the above code will cause readi=
ng
from the discard_zeroes_data attribute to return an empty string ("") inste=
ad
of "0\n".

BTW, my personal preference is to remove this attribute entirely because ke=
eping
it will cause confusion, no matter how well we document the behavior of thi=
s
attribute.

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 ` [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-28 18:50     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 18:50 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index af632e350ab4..b6f70a09a301 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
>  	return scsi_init_io(cmd);
>  }
>  
> -static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> +static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
>  {
>  	struct scsi_device *sdp = cmd->device;
>  	struct request *rq = cmd->request;
> @@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
>  
>  	cmd->cmd_len = 16;
>  	cmd->cmnd[0] = WRITE_SAME_16;
> -	cmd->cmnd[1] = 0x8; /* UNMAP */
> +	if (unmap)
> +		cmd->cmnd[1] = 0x8; /* UNMAP */
>  	put_unaligned_be64(sector, &cmd->cmnd[2]);
>  	put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);

Hello Christoph,

A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a non-zero value
indicates the optimal granularity in logical blocks for unmap requests (e.g.,
an UNMAP command or a WRITE SAME (16) command with the UNMAP bit set to one).
An unmap request with a number of logical blocks that is not a multiple of
this value may result in unmap operations on fewer LBAs than requested."

This means that just like the start and end of a discard must be aligned on a
discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
also respect that granularity. I think this means that either
__blkdev_issue_zeroout() has to be modified such that it rejects unaligned
REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-28 18:50     ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-28 18:50 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index af632e350ab4..b6f70a09a301 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
>  	return scsi_init_io(cmd);
>  }
> =20
> -static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> +static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
>  {
>  	struct scsi_device *sdp =3D cmd->device;
>  	struct request *rq =3D cmd->request;
> @@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_c=
mnd *cmd)
> =20
>  	cmd->cmd_len =3D 16;
>  	cmd->cmnd[0] =3D WRITE_SAME_16;
> -	cmd->cmnd[1] =3D 0x8; /* UNMAP */
> +	if (unmap)
> +		cmd->cmnd[1] =3D 0x8; /* UNMAP */
>  	put_unaligned_be64(sector, &cmd->cmnd[2]);
>  	put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);

Hello Christoph,

A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a non-zero val=
ue
indicates the optimal granularity in logical blocks for unmap requests (e.g=
.,
an UNMAP command or a WRITE SAME (16) command with the UNMAP bit set to one=
).
An unmap request with a number of logical blocks that is not a multiple of
this value may result in unmap operations on fewer LBAs than requested."

This means that just like the start and end of a discard must be aligned on=
 a
discard_granularity boundary, WRITE SAME commands with the UNMAP bit set mu=
st
also respect that granularity. I think this means that either
__blkdev_issue_zeroout() has to be modified such that it rejects unaligned
REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
modified such that it generates REQ_OP_WRITEs for the unaligned start and t=
ail.

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-28 18:50     ` Bart Van Assche
@ 2017-03-28 19:33         ` Mike Snitzer
  -1 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-28 19:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Tue, Mar 28 2017 at  2:50pm -0400,
Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:

> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index af632e350ab4..b6f70a09a301 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
> >  	return scsi_init_io(cmd);
> >  }
> >  
> > -static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> > +static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
> >  {
> >  	struct scsi_device *sdp = cmd->device;
> >  	struct request *rq = cmd->request;
> > @@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> >  
> >  	cmd->cmd_len = 16;
> >  	cmd->cmnd[0] = WRITE_SAME_16;
> > -	cmd->cmnd[1] = 0x8; /* UNMAP */
> > +	if (unmap)
> > +		cmd->cmnd[1] = 0x8; /* UNMAP */
> >  	put_unaligned_be64(sector, &cmd->cmnd[2]);
> >  	put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
> 
> Hello Christoph,
> 
> A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a non-zero value
> indicates the optimal granularity in logical blocks for unmap requests (e.g.,
> an UNMAP command or a WRITE SAME (16) command with the UNMAP bit set to one).
> An unmap request with a number of logical blocks that is not a multiple of
> this value may result in unmap operations on fewer LBAs than requested."
> 
> This means that just like the start and end of a discard must be aligned on a
> discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
> also respect that granularity. I think this means that either
> __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
> REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
> modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.

That'd get DM thinp off the hook from having to zero the unaligned start
and tail...

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-28 19:33         ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-28 19:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, hch, martin.petersen, philipp.reisner,
	axboe, shli, linux-scsi, dm-devel, drbd-dev, linux-block,
	linux-raid

On Tue, Mar 28 2017 at  2:50pm -0400,
Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:

> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index af632e350ab4..b6f70a09a301 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
> >  	return scsi_init_io(cmd);
> >  }
> >  
> > -static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> > +static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
> >  {
> >  	struct scsi_device *sdp = cmd->device;
> >  	struct request *rq = cmd->request;
> > @@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
> >  
> >  	cmd->cmd_len = 16;
> >  	cmd->cmnd[0] = WRITE_SAME_16;
> > -	cmd->cmnd[1] = 0x8; /* UNMAP */
> > +	if (unmap)
> > +		cmd->cmnd[1] = 0x8; /* UNMAP */
> >  	put_unaligned_be64(sector, &cmd->cmnd[2]);
> >  	put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
> 
> Hello Christoph,
> 
> A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a non-zero value
> indicates the optimal granularity in logical blocks for unmap requests (e.g.,
> an UNMAP command or a WRITE SAME (16) command with the UNMAP bit set to one).
> An unmap request with a number of logical blocks that is not a multiple of
> this value may result in unmap operations on fewer LBAs than requested."
> 
> This means that just like the start and end of a discard must be aligned on a
> discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
> also respect that granularity. I think this means that either
> __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
> REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
> modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.

That'd get DM thinp off the hook from having to zero the unaligned start
and tail...

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-28 18:50     ` Bart Van Assche
  (?)
  (?)
@ 2017-03-29 14:51     ` Paolo Bonzini
  2017-03-29 16:28         ` Bart Van Assche
  -1 siblings, 1 reply; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 14:51 UTC (permalink / raw)
  To: Bart Van Assche, agk, lars.ellenberg, snitzer, hch,
	martin.petersen, philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid



On 28/03/2017 20:50, Bart Van Assche wrote:
> 
> This means that just like the start and end of a discard must be aligned on a
> discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
> also respect that granularity. I think this means that either
> __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
> REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
> modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.

I don't think this is the case.

Rather, Linux should try to align the WRITE SAME commands to the optimal
unmap granularity if the zeroed area requires performing more than one
WRITE SAME command (i.e. > maximum write same length or too large to fit
in the CDB).  However, even in that case it can use WRITE SAME with
UNMAP for the unaligned start and tail.  Unlike the UNMAP command, the
SCSI standard does guarantee that zeroes are written in the unaligned parts.

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-28 17:00     ` Bart Van Assche
  (?)
@ 2017-03-29 14:52     ` Paolo Bonzini
  -1 siblings, 0 replies; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 14:52 UTC (permalink / raw)
  To: Bart Van Assche, agk, lars.ellenberg, snitzer, hch,
	martin.petersen, philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid



On 28/03/2017 19:00, Bart Van Assche wrote:
> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
>> Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
>> kill this hack.
>>
>> [ ... ]
>>
>> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
>> index 2da04ce6aeef..dea212db9df3 100644
>> --- a/Documentation/ABI/testing/sysfs-block
>> +++ b/Documentation/ABI/testing/sysfs-block
>> @@ -213,14 +213,8 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
>>  Date:		May 2011
>>  Contact:	Martin K. Petersen <martin.petersen@oracle.com>
>>  Description:
>> -		Devices that support discard functionality may return
>> -		stale or random data when a previously discarded block
>> -		is read back. This can cause problems if the filesystem
>> -		expects discarded blocks to be explicitly cleared. If a
>> -		device reports that it deterministically returns zeroes
>> -		when a discarded area is read the discard_zeroes_data
>> -		parameter will be set to one. Otherwise it will be 0 and
>> -		the result of reading a discarded area is undefined.
>> +		Will always return 0.  Don't rely on any specific behavior
>> +		for discards, and don't read this file.
>>  
>>  What:		/sys/block/<disk>/queue/write_same_max_bytes
>>  Date:		January 2012
>>
>> [ ... ]
>>
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
>>  
>>  static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
>>  {
>> -	return queue_var_show(queue_discard_zeroes_data(q), page);
>> +	return 0;
>>  }
> 
> Hello Christoph,
> 
> It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
> and the above code are not in sync. I think the above code will cause reading
> from the discard_zeroes_data attribute to return an empty string ("") instead
> of "0\n".
> 
> BTW, my personal preference is to remove this attribute entirely because keeping
> it will cause confusion, no matter how well we document the behavior of this
> attribute.

If you remove it, you should probably remove the BLKDISCARDZEROES ioctl too.

That said, the issue with discard_zeroes_data is that it is badly
defined; it was defined as "if I unmap X, will it read as zeroes?" but
this is not how the SCSI standard defines e.g. the UNMAP command with
LBPRZ=1.  But knowing something like LBPRZ ("if something is unmapped,
will it read as zeroes?") _would_ actually be useful for userspace.
This will be especially true once sd maps lseek(SEEK_HOLE/SEEK_DATA) to
the SCSI GET LBA STATUS command, or once dm-thin supports them.

Secondarily, if the former returns 1, userspace is also interested in
knowing "can REQ_OP_WRITE_ZEROES+REQ_UNMAP ever unmap anything?", i.e.
whether BLKDEV_ZERO_NOFALLBACK will ever return anything but
-EOPNOTSUPP.  For SCSI, this should intuitively mean whether LBPWS or
LBPWS10 are set, but the details depend on how the sd driver implements
REQ_OP_WRITE_ZEROES with REQ_UNMAP.

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 22:53           ` Lars Ellenberg
  (?)
@ 2017-03-29 14:57           ` Paolo Bonzini
  -1 siblings, 0 replies; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 14:57 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Alasdair G Kergon, shli, philipp.reisner, linux-block,
	Linux SCSI List, drbd-dev, dm, linux-raid



On 23/03/2017 23:53, Lars Ellenberg wrote:
> Thin does not claim to zero data on discard.  which is ok, and correct,
> because it only punches holes on full chunks (or whatever you call
> them), and leaves the rest in the mapping tree as is.
> 
> And that behaviour would prevent DRBD from exposing discards if
> configured on top of thin. (see above)
> 
> But thin *could* easily guarantee zeroing, by simply punching holes
> where it can, and zeroing out the not fully-aligned partial start and
> end of the range.

That's the difference between REQ_OP_DISCARD (only punches holes on full
chunks) and REQ_OP_WRITE_ZEROES with the REQ_UNMAP flag (punches holes +
zeroes incomplete chunks).

dm-thinp's REQ_OP_DISCARD should not do anything for unaligned parts.
Instead, layers above should use REQ_OP_WRITE_ZEROES (with or without
REQ_UNMAP, as required) if they need zeroes.  dm-thinp would have to
split off the partial chunks, and zero them in the lower-level device
with REQ_OP_WRITE_ZEROES.

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-28 16:48       ` Bart Van Assche
  (?)
@ 2017-03-29 14:57       ` Paolo Bonzini
  -1 siblings, 0 replies; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 14:57 UTC (permalink / raw)
  To: Bart Van Assche, Alasdair G Kergon, lars.ellenberg, Mike Snitzer,
	Christoph Hellwig, Martin K. Petersen, philipp.reisner,
	Jens Axboe, shli
  Cc: linux-block, linux-raid, dm, Linux SCSI List, drbd-dev



On 28/03/2017 18:48, Bart Van Assche wrote:
>> +	if (rq->cmd_flags & REQ_UNMAP) {
>> +		switch (sdkp->provisioning_mode) {
>> +		case SD_LBP_WS16:
>> +			return sd_setup_write_same16_cmnd(cmd, true);
>> +		case SD_LBP_WS10:
>> +			return sd_setup_write_same10_cmnd(cmd, true);
>> +		}
>> +	}
>> +
>>  	if (sdp->no_write_same)
>>  		return BLKPREP_INVALID;
>>  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> Users can change the provisioning mode from user space from SD_LBP_WS16 into
> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

Yeah, if REQ_UNMAP is set you should probably check sdkp->provisioning_mode
instead of sdkp->ws16, but apart from this it should still go through the
checks below.

Plus, if the provisioning mode is not ws10 or ws16, should
sd_setup_write_zeroes_cmnd:

1) do a WRITE SAME without UNMAP (what Christoph's code does)

2) return BLKPREP_INVALID

3) ignore provisioning mode and do a WRITE SAME with UNMAP

4) do a WRITE SAME without UNMAP for SD_LBP_{ZERO,FULL,DISABLE},
do a WRITE SAME with UNMAP for SD_LBP_{WS10,WS16,UNMAP}.

I'm in favor of (4).  The distinction between SD_LBP_UNMAP, SD_LBP_WS10
and SD_LBP_WS16 is as problematic as discard_zeroes_data in my opinion.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-29 14:51     ` Paolo Bonzini
@ 2017-03-29 16:28         ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-29 16:28 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, pbonzini, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Wed, 2017-03-29 at 16:51 +0200, Paolo Bonzini wrote:
> On 28/03/2017 20:50, Bart Van Assche wrote:
> > This means that just like the start and end of a discard must be aligned on a
> > discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
> > also respect that granularity. I think this means that either
> > __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
> > REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
> > modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.
> 
> I don't think this is the case.

Hello Paolo,

Can you cite the section(s) from the SCSI specs that support your view? I
reread the "5.49 WRITE SAME (10) command" and "4.7.3.4.4 WRITE SAME command
and unmap operations" sections but I have not found any explicit statement
that specifies the behavior for unaligned WRITE SAME commands with the UNMAP
bit set. It seems to me like the OPTIMAL UNMAP GRANULARITY parameter was
overlooked when both sections were written. Should we ask the T10 committee
for a clarification?

Another question is, if the specification of WRITE SAME + UNMAP would be
made unambiguous in the SBC document, whether or not we should take the risk
to trigger behavior that is not what we expect by sending unaligned WRITE
SAME + UNMAP commands to SCSI devices?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-29 16:28         ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2017-03-29 16:28 UTC (permalink / raw)
  To: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, pbonzini, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid

On Wed, 2017-03-29 at 16:51 +0200, Paolo Bonzini wrote:
> On 28/03/2017 20:50, Bart Van Assche wrote:
> > This means that just like the start and end of a discard must be aligne=
d on a
> > discard_granularity boundary, WRITE SAME commands with the UNMAP bit se=
t must
> > also respect that granularity. I think this means that either
> > __blkdev_issue_zeroout() has to be modified such that it rejects unalig=
ned
> > REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has t=
o be
> > modified such that it generates REQ_OP_WRITEs for the unaligned start a=
nd tail.
>=20
> I don't think this is the case.

Hello Paolo,

Can you cite the section(s) from the SCSI specs that support your view? I
reread the "5.49 WRITE SAME (10) command" and "4.7.3.4.4 WRITE SAME command
and unmap operations" sections but I have not found any explicit statement
that specifies the behavior for unaligned WRITE SAME commands with the UNMA=
P
bit set. It seems to me like=A0the OPTIMAL UNMAP GRANULARITY parameter was
overlooked when both sections were written. Should we ask the T10 committee
for a clarification?

Another question is, if the specification of WRITE SAME + UNMAP would be
made unambiguous in the SBC document, whether or not we should take the ris=
k
to trigger behavior that is not what we expect by sending unaligned WRITE
SAME + UNMAP commands to SCSI devices?

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-29 16:28         ` Bart Van Assche
@ 2017-03-29 16:53           ` Paolo Bonzini
  -1 siblings, 0 replies; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 16:53 UTC (permalink / raw)
  To: Bart Van Assche, agk, lars.ellenberg, snitzer, hch,
	martin.petersen, philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid



On 29/03/2017 18:28, Bart Van Assche wrote:
> On Wed, 2017-03-29 at 16:51 +0200, Paolo Bonzini wrote:
>> On 28/03/2017 20:50, Bart Van Assche wrote:
>>> This means that just like the start and end of a discard must be aligned on a
>>> discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
>>> also respect that granularity. I think this means that either
>>> __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
>>> REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
>>> modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.
>>
>> I don't think this is the case.
> 
> Hello Paolo,
> 
> Can you cite the section(s) from the SCSI specs that support your view? I
> reread the "5.49 WRITE SAME (10) command" and "4.7.3.4.4 WRITE SAME command
> and unmap operations" sections but I have not found any explicit statement
> that specifies the behavior for unaligned WRITE SAME commands with the UNMAP
> bit set. It seems to me like the OPTIMAL UNMAP GRANULARITY parameter was
> overlooked when both sections were written. Should we ask the T10 committee
> for a clarification?

From 4.7.3.4.4:

------
If unmap operations are requested in a WRITE SAME command,
then for each specified LBA:

if the Data-Out Buffer of the WRITE SAME command is the same as the
logical block data returned by a read operation from that LBA while in
the unmapped state (see 4.7.4.5), then:

1) the device server performs the actions described in table 6; and

2) if an unmap operation is not performed in step 1), then the device
server shall perform the specified write operation to that LBA;
------

and from the description of WRITE SAME (10): "subsequent read operations
behave as if the device server wrote the single block of user data
received from the Data-Out Buffer to each logical block without
modification" (I have a slightly older copy though, it's 5.45 here).

It's pretty unambiguous that if the device cannot unmap (including the
case where the request is misaligned with respect to the granularity) it
does a write.

> Another question is, if the specification of WRITE SAME + UNMAP would be
> made unambiguous in the SBC document, whether or not we should take the risk
> to trigger behavior that is not what we expect by sending unaligned WRITE
> SAME + UNMAP commands to SCSI devices?

Yes, I think we should.

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-29 16:53           ` Paolo Bonzini
  0 siblings, 0 replies; 101+ messages in thread
From: Paolo Bonzini @ 2017-03-29 16:53 UTC (permalink / raw)
  To: Bart Van Assche, agk, lars.ellenberg, snitzer, hch,
	martin.petersen, philipp.reisner, axboe, shli
  Cc: linux-scsi, dm-devel, drbd-dev, linux-block, linux-raid



On 29/03/2017 18:28, Bart Van Assche wrote:
> On Wed, 2017-03-29 at 16:51 +0200, Paolo Bonzini wrote:
>> On 28/03/2017 20:50, Bart Van Assche wrote:
>>> This means that just like the start and end of a discard must be aligned on a
>>> discard_granularity boundary, WRITE SAME commands with the UNMAP bit set must
>>> also respect that granularity. I think this means that either
>>> __blkdev_issue_zeroout() has to be modified such that it rejects unaligned
>>> REQ_OP_WRITE_ZEROES operations or that blk_bio_write_same_split() has to be
>>> modified such that it generates REQ_OP_WRITEs for the unaligned start and tail.
>>
>> I don't think this is the case.
> 
> Hello Paolo,
> 
> Can you cite the section(s) from the SCSI specs that support your view? I
> reread the "5.49 WRITE SAME (10) command" and "4.7.3.4.4 WRITE SAME command
> and unmap operations" sections but I have not found any explicit statement
> that specifies the behavior for unaligned WRITE SAME commands with the UNMAP
> bit set. It seems to me like the OPTIMAL UNMAP GRANULARITY parameter was
> overlooked when both sections were written. Should we ask the T10 committee
> for a clarification?

>From 4.7.3.4.4:

------
If unmap operations are requested in a WRITE SAME command,
then for each specified LBA:

if the Data-Out Buffer of the WRITE SAME command is the same as the
logical block data returned by a read operation from that LBA while in
the unmapped state (see 4.7.4.5), then:

1) the device server performs the actions described in table 6; and

2) if an unmap operation is not performed in step 1), then the device
server shall perform the specified write operation to that LBA;
------

and from the description of WRITE SAME (10): "subsequent read operations
behave as if the device server wrote the single block of user data
received from the Data-Out Buffer to each logical block without
modification" (I have a slightly older copy though, it's 5.45 here).

It's pretty unambiguous that if the device cannot unmap (including the
case where the request is misaligned with respect to the granularity) it
does a write.

> Another question is, if the specification of WRITE SAME + UNMAP would be
> made unambiguous in the SBC document, whether or not we should take the risk
> to trigger behavior that is not what we expect by sending unaligned WRITE
> SAME + UNMAP commands to SCSI devices?

Yes, I think we should.

Paolo

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
  2017-03-28 18:50     ` Bart Van Assche
@ 2017-03-30  2:25         ` Martin K. Petersen
  -1 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30  2:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> writes:

Hi Bart,

> A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a
> non-zero value indicates the optimal granularity in logical blocks for
> unmap requests (e.g., an UNMAP command or a WRITE SAME (16) command
> with the UNMAP bit set to one).  An unmap request with a number of
> logical blocks that is not a multiple of this value may result in
> unmap operations on fewer LBAs than requested."

Indeed. Fewer LBAs than requested may be *unmapped*. That does not imply
that they are not *written*.

> This means that just like the start and end of a discard must be
> aligned on a discard_granularity boundary, WRITE SAME commands with
> the UNMAP bit set must also respect that granularity. I think this
> means that either __blkdev_issue_zeroout() has to be modified such
> that it rejects unaligned REQ_OP_WRITE_ZEROES operations or that
> blk_bio_write_same_split() has to be modified such that it generates
> REQ_OP_WRITEs for the unaligned start and tail.

No, that's not correct. SBC states:

"a) if the Data-Out Buffer of the WRITE SAME command is the same as the
   logical block data returned by a read operation from that LBA while
   in the unmapped state, then:

   1) the device server performs the actions described in table 6
      [unmap]; and

   2) if an unmap operation is not performed in step 1), then the device
      server shall perform the specified write operation to that LBA;"

I.e. With WRITE SAME it is the responsibility of the device server to
write any LBAs described by the command that were not successfully
unmapped.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-30  2:25         ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30  2:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

Bart Van Assche <Bart.VanAssche@sandisk.com> writes:

Hi Bart,

> A quote from SBC: "An OPTIMAL UNMAP GRANULARITY field set to a
> non-zero value indicates the optimal granularity in logical blocks for
> unmap requests (e.g., an UNMAP command or a WRITE SAME (16) command
> with the UNMAP bit set to one).  An unmap request with a number of
> logical blocks that is not a multiple of this value may result in
> unmap operations on fewer LBAs than requested."

Indeed. Fewer LBAs than requested may be *unmapped*. That does not imply
that they are not *written*.

> This means that just like the start and end of a discard must be
> aligned on a discard_granularity boundary, WRITE SAME commands with
> the UNMAP bit set must also respect that granularity. I think this
> means that either __blkdev_issue_zeroout() has to be modified such
> that it rejects unaligned REQ_OP_WRITE_ZEROES operations or that
> blk_bio_write_same_split() has to be modified such that it generates
> REQ_OP_WRITEs for the unaligned start and tail.

No, that's not correct. SBC states:

"a) if the Data-Out Buffer of the WRITE SAME command is the same as the
   logical block data returned by a read operation from that LBA while
   in the unmapped state, then:

   1) the device server performs the actions described in table 6
      [unmap]; and

   2) if an unmap operation is not performed in step 1), then the device
      server shall perform the specified write operation to that LBA;"

I.e. With WRITE SAME it is the responsibility of the device server to
write any LBAs described by the command that were not successfully
unmapped.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
  2017-03-28 16:12     ` Bart Van Assche
@ 2017-03-30  8:53         ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch-jcswGhMUV9g @ 2017-03-30  8:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Tue, Mar 28, 2017 at 04:12:46PM +0000, Bart Van Assche wrote:
> Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
> "Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

No.  This just works around the way scsi_setup_cmnd sets up the data
direction.  Before this series it's not an issue because no one used
the req_op data direction for setting up the dma direction.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
@ 2017-03-30  8:53         ` hch
  0 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30  8:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Tue, Mar 28, 2017 at 04:12:46PM +0000, Bart Van Assche wrote:
> Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
> "Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

No.  This just works around the way scsi_setup_cmnd sets up the data
direction.  Before this series it's not an issue because no one used
the req_op data direction for setting up the dma direction.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
  2017-03-28 16:50     ` Bart Van Assche
@ 2017-03-30  8:59         ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch-jcswGhMUV9g @ 2017-03-30  8:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Tue, Mar 28, 2017 at 04:50:47PM +0000, Bart Van Assche wrote:
> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> > and preparse for removing the discard_zeroes_data flag.
> 
> Hello Christoph,
> 
> "preparse" probably should have been "prepare"?

Yes, fixed.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
@ 2017-03-30  8:59         ` hch
  0 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30  8:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Tue, Mar 28, 2017 at 04:50:47PM +0000, Bart Van Assche wrote:
> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> > and preparse for removing the discard_zeroes_data flag.
> 
> Hello Christoph,
> 
> "preparse" probably should have been "prepare"?

Yes, fixed.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-28 16:48       ` Bart Van Assche
@ 2017-03-30  9:02           ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch-jcswGhMUV9g @ 2017-03-30  9:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
> >  	if (sdp->no_write_same)
> >  		return BLKPREP_INVALID;
> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> 
> Users can change the provisioning mode from user space from SD_LBP_WS16 into
> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

They can, and if the device has too many sectors that will already cause
discard to fail, and in this case it will cause write zeroes to fail as
well.  The intent behind this patch is to keep the behavior the same
as the old path that uses discards for zeroing.  The logic looks a bit
clumsy, but I'd rather keep it as-is.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-30  9:02           ` hch
  0 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30  9:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
> >  	if (sdp->no_write_same)
> >  		return BLKPREP_INVALID;
> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> 
> Users can change the provisioning mode from user space from�SD_LBP_WS16 into
> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

They can, and if the device has too many sectors that will already cause
discard to fail, and in this case it will cause write zeroes to fail as
well.  The intent behind this patch is to keep the behavior the same
as the old path that uses discards for zeroing.  The logic looks a bit
clumsy, but I'd rather keep it as-is.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 15:54     ` Lars Ellenberg
@ 2017-03-30  9:04         ` Christoph Hellwig
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-30  9:04 UTC (permalink / raw)
  To: Christoph Hellwig, axboe-tSWWG44O7X1aa/9Udqfwiw,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	agk-H+wXaHxf7aLQT0dZR+AlfA, snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA

Lars, can you please take a look a patch 22 and check if it's safe?

That's the big thing I want to know before posting the next version
of the series.  If it's not safe I'd like to drop that patch.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-30  9:04         ` Christoph Hellwig
  0 siblings, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-30  9:04 UTC (permalink / raw)
  To: Christoph Hellwig, axboe, martin.petersen, agk, snitzer, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

Lars, can you please take a look a patch 22 and check if it's safe?

That's the big thing I want to know before posting the next version
of the series.  If it's not safe I'd like to drop that patch.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-28 17:00     ` Bart Van Assche
  (?)
  (?)
@ 2017-03-30  9:06     ` hch
       [not found]       ` <20170330090655.GF12015-jcswGhMUV9g@public.gmane.org>
  -1 siblings, 1 reply; 101+ messages in thread
From: hch @ 2017-03-30  9:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk, lars.ellenberg, snitzer, hch, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Tue, Mar 28, 2017 at 05:00:48PM +0000, Bart Van Assche wrote:
> 
> It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
> and the above code are not in sync. I think the above code will cause reading
> from the discard_zeroes_data attribute to return an empty string ("") instead
> of "0\n".

Thanks, fine with me.

> 
> BTW, my personal preference is to remove this attribute entirely because keeping
> it will cause confusion, no matter how well we document the behavior of this
> attribute.

Jens, any opinion?  I'd like to remove it too, but I fear it might
break things.  We could deprecate it first with a warning when read
and then remove it a few releases down the road.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-23 14:33 ` [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
@ 2017-03-30 10:06       ` Lars Ellenberg
  0 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-30 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> Use that fact to implement REQ_OP_WRITE_ZEROES.
> 
> XXX: will need a careful audit from the drbd team!

Thanks, this one looks ok to me.

The real question for me is, will the previous one (21/23)
return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
what we have now?  Will it make an fstrim cause thinly provisioned
devices to suddenly be fully allocated?
Or does it unmap "the same" as what we have now?
Especially on top of dm-thin, but also on top of any other device.
That's something that is not really "obvious" to me yet.

Cheers,
    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-30 10:06       ` Lars Ellenberg
  0 siblings, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-30 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> Use that fact to implement REQ_OP_WRITE_ZEROES.
> 
> XXX: will need a careful audit from the drbd team!

Thanks, this one looks ok to me.

The real question for me is, will the previous one (21/23)
return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
what we have now?  Will it make an fstrim cause thinly provisioned
devices to suddenly be fully allocated?
Or does it unmap "the same" as what we have now?
Especially on top of dm-thin, but also on top of any other device.
That's something that is not really "obvious" to me yet.

Cheers,
    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 10:06       ` Lars Ellenberg
  (?)
@ 2017-03-30 11:44       ` Christoph Hellwig
  2017-03-30 12:50         ` [Drbd-dev] " Lars Ellenberg
       [not found]         ` <20170330114408.GA15777-jcswGhMUV9g@public.gmane.org>
  -1 siblings, 2 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-30 11:44 UTC (permalink / raw)
  To: Christoph Hellwig, axboe, martin.petersen, agk, snitzer, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> > It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> > Use that fact to implement REQ_OP_WRITE_ZEROES.
> > 
> > XXX: will need a careful audit from the drbd team!
> 
> Thanks, this one looks ok to me.

So the DRBD protocol requires the TRIM operation to always zero?

> The real question for me is, will the previous one (21/23)
> return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
> what we have now?

No, blkdev_issue_zeroout should never return -EOPNOTSUPP.

> Will it make an fstrim cause thinly provisioned
> devices to suddenly be fully allocated?

Not for SCSI devices.  Yes for dm-thinp until it implements
REQ_OP_WRITE_ZEROES, which will hopefully be soon.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [Drbd-dev] [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 11:44       ` Christoph Hellwig
@ 2017-03-30 12:50         ` Lars Ellenberg
       [not found]         ` <20170330114408.GA15777-jcswGhMUV9g@public.gmane.org>
  1 sibling, 0 replies; 101+ messages in thread
From: Lars Ellenberg @ 2017-03-30 12:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

On Thu, Mar 30, 2017 at 01:44:09PM +0200, Christoph Hellwig wrote:
> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> > On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> > > It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> > > Use that fact to implement REQ_OP_WRITE_ZEROES.
> > > 
> > > XXX: will need a careful audit from the drbd team!
> > 
> > Thanks, this one looks ok to me.
> 
> So the DRBD protocol requires the TRIM operation to always zero?

"users" (both as in submitting entities, and people using DRBD)
expect that DRBD guarantees replicas to be identical after whatever
operations have been completed by all replicas.

Which means that for trim/discard/unmap, we can only expose that to
upper layers (or use it for internal purposes) if the operation has
a well defined, and on all backends identical, result.

Short answer: Yes.

> > The real question for me is, will the previous one (21/23)
> > return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
> > what we have now?
> 
> No, blkdev_issue_zeroout should never return -EOPNOTSUPP.
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

"good enough for me" ;-)

Thanks,

    Lars

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 11:44       ` Christoph Hellwig
@ 2017-03-30 13:49             ` Mike Snitzer
       [not found]         ` <20170330114408.GA15777-jcswGhMUV9g@public.gmane.org>
  1 sibling, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 30 2017 at  7:44am -0400,
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:

> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

I can work on this now.  Only question I have is: should DM thinp take
care to zero any misaligned head and tail?  (I assume so but with all
the back and forth between Bart, Paolo and Martin I figured I'd ask
explicitly).

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-30 13:49             ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner, linux-block,
	linux-scsi, drbd-dev, dm-devel, linux-raid

On Thu, Mar 30 2017 at  7:44am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

I can work on this now.  Only question I have is: should DM thinp take
care to zero any misaligned head and tail?  (I assume so but with all
the back and forth between Bart, Paolo and Martin I figured I'd ask
explicitly).

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
                   ` (23 preceding siblings ...)
       [not found] ` <20170323143341.31549-1-hch-jcswGhMUV9g@public.gmane.org>
@ 2017-03-30 15:12 ` Mike Snitzer
  2017-03-30 15:22   ` Martin K. Petersen
  24 siblings, 1 reply; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 15:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-raid, dm-devel, linux-scsi,
	drbd-dev

Would be very useful, particularly for testing, if
drivers/scsi/scsi_debug.c were updated to support WRITE ZEROES.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 13:49             ` Mike Snitzer
@ 2017-03-30 15:20                 ` Martin K. Petersen
  -1 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:20 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	Christoph Hellwig, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> I can work on this now.  Only question I have is: should DM thinp take
> care to zero any misaligned head and tail?  (I assume so but with all
> the back and forth between Bart, Paolo and Martin I figured I'd ask
> explicitly).

Yep, let's make sure our semantics match the hardware ditto.

 - So write zeroes should behave deterministically and explicitly handle
   any blocks that can't be cleared via deprovisioning.

 - And discard can work at the discard granularity in a
   non-deterministic fashion.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-30 15:20                 ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:20 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

Mike Snitzer <snitzer@redhat.com> writes:

> I can work on this now.  Only question I have is: should DM thinp take
> care to zero any misaligned head and tail?  (I assume so but with all
> the back and forth between Bart, Paolo and Martin I figured I'd ask
> explicitly).

Yep, let's make sure our semantics match the hardware ditto.

 - So write zeroes should behave deterministically and explicitly handle
   any blocks that can't be cleared via deprovisioning.

 - And discard can work at the discard granularity in a
   non-deterministic fashion.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-30 15:12 ` Mike Snitzer
@ 2017-03-30 15:22   ` Martin K. Petersen
       [not found]     ` <yq1lgrm3i36.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:22 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, martin.petersen, agk, shli,
	philipp.reisner, lars.ellenberg, linux-block, linux-raid,
	dm-devel, linux-scsi, drbd-dev

Mike Snitzer <snitzer@redhat.com> writes:

> Would be very useful, particularly for testing, if
> drivers/scsi/scsi_debug.c were updated to support WRITE ZEROES.

There is no WRITE ZEROES in SCSI. You should be able to get the right
behavior with lbpws=1 lbprz=1.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-30  9:02           ` hch
@ 2017-03-30 15:28               ` Martin K. Petersen
  -1 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:28 UTC (permalink / raw)
  To: hch@lst.de
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

"hch@lst.de" <hch@lst.de> writes:

Christoph,

> On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
>> >  	if (sdp->no_write_same)
>> >  		return BLKPREP_INVALID;
>> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
>> 
>> Users can change the provisioning mode from user space from SD_LBP_WS16 into
>> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
>> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.
>
> They can, and if the device has too many sectors that will already cause
> discard to fail,

I'm not sure I understand what you mean by that?

-- 
Martin K. Petersen	Oracle Linux Engineering
_______________________________________________
drbd-dev mailing list
drbd-dev@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-dev

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-30 15:28               ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:28 UTC (permalink / raw)
  To: hch
  Cc: Bart Van Assche, agk, lars.ellenberg, snitzer, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

"hch@lst.de" <hch@lst.de> writes:

Christoph,

> On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
>> >  	if (sdp->no_write_same)
>> >  		return BLKPREP_INVALID;
>> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
>>=20
>> Users can change the provisioning mode from user space from=C2=A0SD_LBP_=
WS16 into
>> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
>> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.
>
> They can, and if the device has too many sectors that will already cause
> discard to fail,

I'm not sure I understand what you mean by that?

--=20
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-30  9:06     ` hch
@ 2017-03-30 15:29           ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:29 UTC (permalink / raw)
  To: hch@lst.de
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

"hch-jcswGhMUV9g@public.gmane.org" <hch-jcswGhMUV9g@public.gmane.org> writes:

> Jens, any opinion?  I'd like to remove it too, but I fear it might
> break things.  We could deprecate it first with a warning when read
> and then remove it a few releases down the road.

I know of several apps that check this variable (as opposed to the
ioctl).

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
@ 2017-03-30 15:29           ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-30 15:29 UTC (permalink / raw)
  To: hch
  Cc: Bart Van Assche, agk, lars.ellenberg, snitzer, martin.petersen,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

"hch@lst.de" <hch@lst.de> writes:

> Jens, any opinion?  I'd like to remove it too, but I fear it might
> break things.  We could deprecate it first with a warning when read
> and then remove it a few releases down the road.

I know of several apps that check this variable (as opposed to the
ioctl).

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
  2017-03-30 15:22   ` Martin K. Petersen
@ 2017-03-30 15:38         ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 15:38 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, Christoph Hellwig,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 30 2017 at 11:22am -0400,
Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:

> Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Would be very useful, particularly for testing, if
> > drivers/scsi/scsi_debug.c were updated to support WRITE ZEROES.
> 
> There is no WRITE ZEROES in SCSI. You should be able to get the right
> behavior with lbpws=1 lbprz=1.

OK, thanks.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
@ 2017-03-30 15:38         ` Mike Snitzer
  0 siblings, 0 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 15:38 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, axboe, agk, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-raid, dm-devel, linux-scsi,
	drbd-dev

On Thu, Mar 30 2017 at 11:22am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> Mike Snitzer <snitzer@redhat.com> writes:
> 
> > Would be very useful, particularly for testing, if
> > drivers/scsi/scsi_debug.c were updated to support WRITE ZEROES.
> 
> There is no WRITE ZEROES in SCSI. You should be able to get the right
> behavior with lbpws=1 lbprz=1.

OK, thanks.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
  2017-03-30 15:29           ` Martin K. Petersen
  (?)
@ 2017-03-30 17:29           ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30 17:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: hch, Bart Van Assche, agk, lars.ellenberg, snitzer,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Thu, Mar 30, 2017 at 11:29:29AM -0400, Martin K. Petersen wrote:
> "hch@lst.de" <hch@lst.de> writes:
> 
> > Jens, any opinion?  I'd like to remove it too, but I fear it might
> > break things.  We could deprecate it first with a warning when read
> > and then remove it a few releases down the road.
> 
> I know of several apps that check this variable (as opposed to the
> ioctl).

The above was in reference to both methods..

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-30 15:28               ` Martin K. Petersen
@ 2017-03-30 17:30                 ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30 17:30 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: hch, Bart Van Assche, agk, lars.ellenberg, snitzer,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Thu, Mar 30, 2017 at 11:28:32AM -0400, Martin K. Petersen wrote:
> "hch@lst.de" <hch@lst.de> writes:
> 
> Christoph,
> 
> > On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
> >> >  	if (sdp->no_write_same)
> >> >  		return BLKPREP_INVALID;
> >> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> >> 
> >> Users can change the provisioning mode from user space from SD_LBP_WS16 into
> >> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> >> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.
> >
> > They can, and if the device has too many sectors that will already cause
> > discard to fail,
> 
> I'm not sure I understand what you mean by that?

If you manually change the provisioning mode to WS10 on a device that
must use WRITE SAME (16) to be able to address all blocks you're already
screwed right now, and with this patch you can screw yourself through
the WRITE_ZEROES path in addition to the DISCARD path.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-30 17:30                 ` hch
  0 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-30 17:30 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: hch, Bart Van Assche, agk, lars.ellenberg, snitzer,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Thu, Mar 30, 2017 at 11:28:32AM -0400, Martin K. Petersen wrote:
> "hch@lst.de" <hch@lst.de> writes:
> 
> Christoph,
> 
> > On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
> >> >  	if (sdp->no_write_same)
> >> >  		return BLKPREP_INVALID;
> >> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> >> 
> >> Users can change the provisioning mode from user space from�SD_LBP_WS16 into
> >> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> >> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.
> >
> > They can, and if the device has too many sectors that will already cause
> > discard to fail,
> 
> I'm not sure I understand what you mean by that?

If you manually change the provisioning mode to WS10 on a device that
must use WRITE SAME (16) to be able to address all blocks you're already
screwed right now, and with this patch you can screw yourself through
the WRITE_ZEROES path in addition to the DISCARD path.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 15:20                 ` Martin K. Petersen
  (?)
@ 2017-03-30 23:15                 ` Mike Snitzer
       [not found]                   ` <20170330231550.GA3102-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-03-31  7:17                   ` Christoph Hellwig
  -1 siblings, 2 replies; 101+ messages in thread
From: Mike Snitzer @ 2017-03-30 23:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, axboe, agk, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid

On Thu, Mar 30 2017 at 11:20am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> Mike Snitzer <snitzer@redhat.com> writes:
> 
> > I can work on this now.  Only question I have is: should DM thinp take
> > care to zero any misaligned head and tail?  (I assume so but with all
> > the back and forth between Bart, Paolo and Martin I figured I'd ask
> > explicitly).
> 
> Yep, let's make sure our semantics match the hardware ditto.
> 
>  - So write zeroes should behave deterministically and explicitly handle
>    any blocks that can't be cleared via deprovisioning.
> 
>  - And discard can work at the discard granularity in a
>    non-deterministic fashion.

I got pretty far along with implementing the DM thinp support for
WRITE_ZEROES in terms of thinp's DISCARD support (more of an
implementation detail.. or so I thought).

But while discussing this effort with Jeff Moyer he asked: shouldn't the
zeroed blocks be provisioned?  This is a fairly embarassing question not
to be able to answer in the moment.  So I clearly need to learn what the
overall intent of WRITE_ZEROES actually is.

If it is meant as a replacement for WRITE_SAME (as hch switched dm-io
over from WRITE_SAME with a payload of 0 to WRITE_ZEROES) and for the
backing mechanism for blkdev_issue_zeroout() then I guess I have my
answer.  Unless DM thinp can guarantee that the discarded blocks will
always return zeroes (by zeroing before all partial block writes) my
discard based dm-thinp implementation of WRITE_ZEROES is a complete
throw-away (unless block zeroing is enabled.. which it never is because
performance sucks with it).  So if an upper-level of the IO stack
(e.g. ext4) were to assume that a block will _definitely_ have zeroes
then DM thinp would fall short.

This is all to say: I don't see a quick way forward on implementing
performant WRITE_ZEROES support for DM thinp.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-30 17:30                 ` hch
@ 2017-03-31  2:19                     ` Martin K. Petersen
  -1 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-31  2:19 UTC (permalink / raw)
  To: hch@lst.de
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	Martin K. Petersen, snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Bart Van Assche,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

"hch-jcswGhMUV9g@public.gmane.org" <hch-jcswGhMUV9g@public.gmane.org> writes:

> If you manually change the provisioning mode to WS10 on a device that
> must use WRITE SAME (16) to be able to address all blocks you're already
> screwed right now, and with this patch you can screw yourself through
> the WRITE_ZEROES path in addition to the DISCARD path.

Oh, I see. We only had the LBA sanity check in place for write same, not
for discard.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-31  2:19                     ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-31  2:19 UTC (permalink / raw)
  To: hch
  Cc: Martin K. Petersen, Bart Van Assche, agk, lars.ellenberg,
	snitzer, philipp.reisner, axboe, shli, linux-scsi, dm-devel,
	drbd-dev, linux-block, linux-raid

"hch@lst.de" <hch@lst.de> writes:

> If you manually change the provisioning mode to WS10 on a device that
> must use WRITE SAME (16) to be able to address all blocks you're already
> screwed right now, and with this patch you can screw yourself through
> the WRITE_ZEROES path in addition to the DISCARD path.

Oh, I see. We only had the LBA sanity check in place for write same, not
for discard.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 23:15                 ` Mike Snitzer
@ 2017-03-31  2:34                       ` Martin K. Petersen
  2017-03-31  7:17                   ` Christoph Hellwig
  1 sibling, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-31  2:34 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Martin K. Petersen,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, shli-DgEjT+Ai2ygdnm+yROfE0A,
	Christoph Hellwig, agk-H+wXaHxf7aLQT0dZR+AlfA,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ

Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

Mike,

> But while discussing this effort with Jeff Moyer he asked: shouldn't the
> zeroed blocks be provisioned?  This is a fairly embarassing question not
> to be able to answer in the moment.  So I clearly need to learn what the
> overall intent of WRITE_ZEROES actually is.

The intent is to guarantee all future reads to these blocks will return
zeroes. Whether to allocate or deallocate or do a combination to achieve
that goal is up to the device.

> If it is meant as a replacement for WRITE_SAME (as hch switched dm-io
> over from WRITE_SAME with a payload of 0 to WRITE_ZEROES) and for the
> backing mechanism for blkdev_issue_zeroout() then I guess I have my
> answer.

Yes. I was hoping MD would use WRITE SAME to initialize data and parity
drives. That's why I went with SAME nomenclature rather than ZEROES
(which had just appeared in NVMe at that time).

Christoph's renaming is mostly to emphasize the purpose and the
semantics. People keep getting confused because both REQ_DISCARD and
REQ_WRITE_SAME use the WRITE SAME SCSI command. But the two uses are
completely orthogonal and governed by different heuristics in sd.c.

> Unless DM thinp can guarantee that the discarded blocks will
> always return zeroes (by zeroing before all partial block writes) my
> discard based dm-thinp implementation of WRITE_ZEROES is a complete
> throw-away (unless block zeroing is enabled.. which it never is because
> performance sucks with it).  So if an upper-level of the IO stack
> (e.g. ext4) were to assume that a block will _definitely_ have zeroes
> then DM thinp would fall short.

You don't have a way to mark those blocks as being full of zeroes
without actually writing them?

Note that the fallback to a zeroout command is to do a regular write. So
if DM doesn't zero the blocks, the block layer is going to it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
@ 2017-03-31  2:34                       ` Martin K. Petersen
  0 siblings, 0 replies; 101+ messages in thread
From: Martin K. Petersen @ 2017-03-31  2:34 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Christoph Hellwig, axboe, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

Mike Snitzer <snitzer@redhat.com> writes:

Mike,

> But while discussing this effort with Jeff Moyer he asked: shouldn't the
> zeroed blocks be provisioned?  This is a fairly embarassing question not
> to be able to answer in the moment.  So I clearly need to learn what the
> overall intent of WRITE_ZEROES actually is.

The intent is to guarantee all future reads to these blocks will return
zeroes. Whether to allocate or deallocate or do a combination to achieve
that goal is up to the device.

> If it is meant as a replacement for WRITE_SAME (as hch switched dm-io
> over from WRITE_SAME with a payload of 0 to WRITE_ZEROES) and for the
> backing mechanism for blkdev_issue_zeroout() then I guess I have my
> answer.

Yes. I was hoping MD would use WRITE SAME to initialize data and parity
drives. That's why I went with SAME nomenclature rather than ZEROES
(which had just appeared in NVMe at that time).

Christoph's renaming is mostly to emphasize the purpose and the
semantics. People keep getting confused because both REQ_DISCARD and
REQ_WRITE_SAME use the WRITE SAME SCSI command. But the two uses are
completely orthogonal and governed by different heuristics in sd.c.

> Unless DM thinp can guarantee that the discarded blocks will
> always return zeroes (by zeroing before all partial block writes) my
> discard based dm-thinp implementation of WRITE_ZEROES is a complete
> throw-away (unless block zeroing is enabled.. which it never is because
> performance sucks with it).  So if an upper-level of the IO stack
> (e.g. ext4) were to assume that a block will _definitely_ have zeroes
> then DM thinp would fall short.

You don't have a way to mark those blocks as being full of zeroes
without actually writing them?

Note that the fallback to a zeroout command is to do a regular write. So
if DM doesn't zero the blocks, the block layer is going to it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
  2017-03-30 23:15                 ` Mike Snitzer
       [not found]                   ` <20170330231550.GA3102-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-03-31  7:17                   ` Christoph Hellwig
  1 sibling, 0 replies; 101+ messages in thread
From: Christoph Hellwig @ 2017-03-31  7:17 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Christoph Hellwig, axboe, agk, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid

On Thu, Mar 30, 2017 at 07:15:50PM -0400, Mike Snitzer wrote:
> I got pretty far along with implementing the DM thinp support for
> WRITE_ZEROES in terms of thinp's DISCARD support (more of an
> implementation detail.. or so I thought).
> 
> But while discussing this effort with Jeff Moyer he asked: shouldn't the
> zeroed blocks be provisioned?  This is a fairly embarassing question not
> to be able to answer in the moment.  So I clearly need to learn what the
> overall intent of WRITE_ZEROES actually is.

It is to ensure the that the block range it's called on returns zeroes
on future reads.  Currently if REQ_UNMAP is set you may free the space
allocation, else not.

After looking at the callers I think this is the wrong way around, so
I'm going to invert the flag, so that the two callers that care that
blocks are allocated will set it instead.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
  2017-03-31  2:19                     ` Martin K. Petersen
@ 2017-03-31  7:18                         ` hch
  -1 siblings, 0 replies; 101+ messages in thread
From: hch-jcswGhMUV9g @ 2017-03-31  7:18 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Bart Van Assche,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA,
	shli-DgEjT+Ai2ygdnm+yROfE0A, hch-jcswGhMUV9g,
	agk-H+wXaHxf7aLQT0dZR+AlfA, drbd-dev-cunTk1MwBs8qoQakbn7OcQ

On Thu, Mar 30, 2017 at 10:19:55PM -0400, Martin K. Petersen wrote:
> > If you manually change the provisioning mode to WS10 on a device that
> > must use WRITE SAME (16) to be able to address all blocks you're already
> > screwed right now, and with this patch you can screw yourself through
> > the WRITE_ZEROES path in addition to the DISCARD path.
> 
> Oh, I see. We only had the LBA sanity check in place for write same, not
> for discard.

And btw, I'd be happy to add such a check, I'd just rather keep it out
of this patch.  It might be a good idea if you give it a turn after
this series given that you have all the devices that have weird
provisioning modes while I don't..

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
@ 2017-03-31  7:18                         ` hch
  0 siblings, 0 replies; 101+ messages in thread
From: hch @ 2017-03-31  7:18 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: hch, Bart Van Assche, agk, lars.ellenberg, snitzer,
	philipp.reisner, axboe, shli, linux-scsi, dm-devel, drbd-dev,
	linux-block, linux-raid

On Thu, Mar 30, 2017 at 10:19:55PM -0400, Martin K. Petersen wrote:
> > If you manually change the provisioning mode to WS10 on a device that
> > must use WRITE SAME (16) to be able to address all blocks you're already
> > screwed right now, and with this patch you can screw yourself through
> > the WRITE_ZEROES path in addition to the DISCARD path.
> 
> Oh, I see. We only had the LBA sanity check in place for write same, not
> for discard.

And btw, I'd be happy to add such a check, I'd just rather keep it out
of this patch.  It might be a good idea if you give it a turn after
this series given that you have all the devices that have weird
provisioning modes while I don't..

^ permalink raw reply	[flat|nested] 101+ messages in thread

end of thread, other threads:[~2017-03-31  7:18 UTC | newest]

Thread overview: 101+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-23 14:33 RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Christoph Hellwig
2017-03-23 14:33 ` [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-28 16:12   ` Bart Van Assche
2017-03-28 16:12     ` Bart Van Assche
     [not found]     ` <1490717553.2573.4.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-03-30  8:53       ` hch-jcswGhMUV9g
2017-03-30  8:53         ` hch
2017-03-23 14:33 ` [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios Christoph Hellwig
2017-03-23 14:33 ` [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-28 18:50   ` Bart Van Assche
2017-03-28 18:50     ` Bart Van Assche
     [not found]     ` <1490726988.2573.16.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-03-28 19:33       ` Mike Snitzer
2017-03-28 19:33         ` Mike Snitzer
2017-03-30  2:25       ` Martin K. Petersen
2017-03-30  2:25         ` Martin K. Petersen
2017-03-29 14:51     ` Paolo Bonzini
2017-03-29 16:28       ` Bart Van Assche
2017-03-29 16:28         ` Bart Van Assche
2017-03-29 16:53         ` Paolo Bonzini
2017-03-29 16:53           ` Paolo Bonzini
2017-03-23 14:33 ` [PATCH 04/23] md: support REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-23 14:33 ` [PATCH 05/23] dm: " Christoph Hellwig
2017-03-23 14:33 ` [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-23 14:55   ` Mike Snitzer
2017-03-23 14:56     ` Christoph Hellwig
2017-03-23 15:10       ` Mike Snitzer
2017-03-27  9:12         ` Christoph Hellwig
2017-03-27  9:12           ` Christoph Hellwig
2017-03-23 14:33 ` [PATCH 07/23] block: stop using blkdev_issue_write_same for zeroing Christoph Hellwig
2017-03-23 14:33 ` [PATCH 08/23] block: add a flags argument to (__)blkdev_issue_zeroout Christoph Hellwig
2017-03-23 14:33 ` [PATCH 09/23] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-23 14:33 ` [PATCH 10/23] block: add a new BLKDEV_ZERO_NOFALLBACK flag Christoph Hellwig
2017-03-23 14:33 ` [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches Christoph Hellwig
2017-03-28 16:50   ` Bart Van Assche
2017-03-28 16:50     ` Bart Van Assche
     [not found]     ` <1490719834.2573.9.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-03-30  8:59       ` hch-jcswGhMUV9g
2017-03-30  8:59         ` hch
2017-03-23 14:33 ` [PATCH 12/23] sd: handle REQ_UNMAP Christoph Hellwig
     [not found]   ` <20170323143341.31549-13-hch-jcswGhMUV9g@public.gmane.org>
2017-03-28 16:48     ` Bart Van Assche
2017-03-28 16:48       ` Bart Van Assche
2017-03-29 14:57       ` Paolo Bonzini
     [not found]       ` <1490719722.2573.8.camel-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2017-03-30  9:02         ` hch-jcswGhMUV9g
2017-03-30  9:02           ` hch
     [not found]           ` <20170330090201.GD12015-jcswGhMUV9g@public.gmane.org>
2017-03-30 15:28             ` Martin K. Petersen
2017-03-30 15:28               ` Martin K. Petersen
2017-03-30 17:30               ` hch
2017-03-30 17:30                 ` hch
     [not found]                 ` <20170330173020.GB24229-jcswGhMUV9g@public.gmane.org>
2017-03-31  2:19                   ` Martin K. Petersen
2017-03-31  2:19                     ` Martin K. Petersen
     [not found]                     ` <yq17f36yypg.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-03-31  7:18                       ` hch-jcswGhMUV9g
2017-03-31  7:18                         ` hch
2017-03-23 14:33 ` [PATCH 13/23] nvme: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
2017-03-23 14:33 ` [PATCH 14/23] zram: " Christoph Hellwig
2017-03-23 14:33 ` [PATCH 15/23] loop: " Christoph Hellwig
2017-03-23 14:33 ` [PATCH 16/23] brd: remove discard support Christoph Hellwig
2017-03-23 14:33 ` [PATCH 17/23] rbd: remove the discard_zeroes_data flag Christoph Hellwig
2017-03-23 14:33 ` [PATCH 18/23] rsxx: " Christoph Hellwig
2017-03-23 14:33 ` [PATCH 19/23] mmc: " Christoph Hellwig
2017-03-23 14:33 ` [PATCH 20/23] block: stop using discards for zeroing Christoph Hellwig
2017-03-23 14:33 ` [PATCH 21/23] drbd: make intelligent use of blkdev_issue_zeroout Christoph Hellwig
2017-03-23 14:33 ` [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES Christoph Hellwig
     [not found]   ` <20170323143341.31549-23-hch-jcswGhMUV9g@public.gmane.org>
2017-03-30 10:06     ` Lars Ellenberg
2017-03-30 10:06       ` Lars Ellenberg
2017-03-30 11:44       ` Christoph Hellwig
2017-03-30 12:50         ` [Drbd-dev] " Lars Ellenberg
     [not found]         ` <20170330114408.GA15777-jcswGhMUV9g@public.gmane.org>
2017-03-30 13:49           ` Mike Snitzer
2017-03-30 13:49             ` Mike Snitzer
     [not found]             ` <20170330134957.GA508-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-30 15:20               ` Martin K. Petersen
2017-03-30 15:20                 ` Martin K. Petersen
2017-03-30 23:15                 ` Mike Snitzer
     [not found]                   ` <20170330231550.GA3102-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-31  2:34                     ` Martin K. Petersen
2017-03-31  2:34                       ` Martin K. Petersen
2017-03-31  7:17                   ` Christoph Hellwig
2017-03-23 14:33 ` [PATCH 23/23] block: remove the discard_zeroes_data flag Christoph Hellwig
2017-03-28 17:00   ` Bart Van Assche
2017-03-28 17:00     ` Bart Van Assche
2017-03-29 14:52     ` Paolo Bonzini
2017-03-30  9:06     ` hch
     [not found]       ` <20170330090655.GF12015-jcswGhMUV9g@public.gmane.org>
2017-03-30 15:29         ` Martin K. Petersen
2017-03-30 15:29           ` Martin K. Petersen
2017-03-30 17:29           ` hch
     [not found] ` <20170323143341.31549-1-hch-jcswGhMUV9g@public.gmane.org>
2017-03-23 15:54   ` RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload Lars Ellenberg
2017-03-23 15:54     ` Lars Ellenberg
2017-03-23 17:02     ` Mike Snitzer
2017-03-23 17:02       ` Mike Snitzer
     [not found]       ` <20170323170221.GA20854-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-23 22:53         ` Lars Ellenberg
2017-03-23 22:53           ` Lars Ellenberg
2017-03-29 14:57           ` Paolo Bonzini
     [not found]     ` <20170323155410.GD1138-w1SgEEioFePxa46PmUWvFg@public.gmane.org>
2017-03-27  9:10       ` Christoph Hellwig
2017-03-27  9:10         ` Christoph Hellwig
     [not found]         ` <20170327091056.GB6879-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2017-03-27 14:03           ` Mike Snitzer
2017-03-27 14:03             ` Mike Snitzer
     [not found]             ` <20170327140307.GA13020-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-27 14:57               ` Christoph Hellwig
2017-03-27 14:57                 ` Christoph Hellwig
2017-03-27 15:08             ` [Drbd-dev] " Bart Van Assche
2017-03-27 15:08               ` Bart Van Assche
2017-03-30  9:04       ` Christoph Hellwig
2017-03-30  9:04         ` Christoph Hellwig
2017-03-30 15:12 ` Mike Snitzer
2017-03-30 15:22   ` Martin K. Petersen
     [not found]     ` <yq1lgrm3i36.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-03-30 15:38       ` Mike Snitzer
2017-03-30 15:38         ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.