linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH block v2 0/3] block: Introduce REQ_NOZERO flag for REQ_OP_WRITE_ZEROES operation
@ 2020-01-16 12:35 Kirill Tkhai
  2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-16 12:35 UTC (permalink / raw)
  To: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, chaitanya.kulkarni,
	bvanassche, dhowells, asml.silence, ktkhai

(was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
 to reflect extents allocation in block device internals")

v2: Introduce new flag for REQ_OP_WRITE_ZEROES instead of
    introduction a new operation as suggested by Martin K. Petersen.
    Removed ext4-related patch to focus on block changes
    for now.

Information about continuous extent placement may be useful
for some block devices. Say, distributed network filesystems,
which provide block device interface, may use this information
for better blocks placement over the nodes in their cluster,
and for better performance. Block devices, which map a file
on another filesystem (loop), may request the same length extent
on underlining filesystem for less fragmentation and for batching
allocation requests. Also, hypervisors like QEMU may use this
information for optimization of cluster allocations.

This patchset introduces REQ_NOZERO flag for REQ_OP_WRITE_ZEROES,
which makes a block device to allocate blocks instead of actual
blocks zeroing. This may be used for forwarding user's fallocate(0)
requests into block device internals. E.g., in loop driver this
will result in allocation extents in backing-file, so subsequent
write won't fail by the reason of no available space. Distributed
network filesystems will be able to assign specific servers for
specific extents, so subsequent write will be more efficient.

Patch [1/3] is preparation, patch [2/3] introduces REQ_NOZERO flag
and implements all the logic, patch [3/3] adds loop as the first
user of the flag.

Note, that here is only block-related patches, example of usage
for ext4 with a performance numbers may be seen in [1].

[1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0

---

Kirill Tkhai (3):
      block: Add @flags argument to bdev_write_zeroes_sectors()
      block: Add support for REQ_NOZERO flag
      loop: Add support for REQ_NOZERO


 block/blk-core.c                    |    6 +++---
 block/blk-lib.c                     |   17 ++++++++++-------
 block/blk-merge.c                   |    9 ++++++---
 block/blk-settings.c                |   17 +++++++++++++++++
 drivers/block/loop.c                |   15 ++++++++++++---
 drivers/md/dm-kcopyd.c              |    2 +-
 drivers/target/target_core_iblock.c |    4 ++--
 fs/block_dev.c                      |    4 ++++
 include/linux/blk_types.h           |    5 ++++-
 include/linux/blkdev.h              |   34 ++++++++++++++++++++++++++--------
 10 files changed, 85 insertions(+), 28 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors()
  2020-01-16 12:35 [PATCH block v2 0/3] block: Introduce REQ_NOZERO flag for REQ_OP_WRITE_ZEROES operation Kirill Tkhai
@ 2020-01-16 12:35 ` Kirill Tkhai
  2020-01-19  1:46   ` Bob Liu
  2020-01-21  5:45   ` Martin K. Petersen
  2020-01-16 12:36 ` [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag Kirill Tkhai
  2020-01-16 12:36 ` [PATCH block v2 3/3] loop: Add support for REQ_NOZERO Kirill Tkhai
  2 siblings, 2 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-16 12:35 UTC (permalink / raw)
  To: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, chaitanya.kulkarni,
	bvanassche, dhowells, asml.silence, ktkhai

This is a preparation for next patch, which introduces
a new flag BLKDEV_ZERO_ALLOCATE for calls, which need
only allocation of blocks and don't need actual blocks
zeroing.

CC: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 block/blk-lib.c                     |    6 +++---
 drivers/md/dm-kcopyd.c              |    2 +-
 drivers/target/target_core_iblock.c |    4 ++--
 include/linux/blkdev.h              |    3 ++-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..3e38c93cfc53 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -224,7 +224,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		return -EPERM;
 
 	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
 
 	if (max_write_zeroes_sectors == 0)
 		return -EOPNOTSUPP;
@@ -362,7 +362,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	sector_t bs_mask;
 	struct bio *bio;
 	struct blk_plug plug;
-	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
+	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
 
 	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
 	if ((sector | nr_sects) & bs_mask)
@@ -391,7 +391,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			try_write_zeroes = false;
 			goto retry;
 		}
-		if (!bdev_write_zeroes_sectors(bdev)) {
+		if (!bdev_write_zeroes_sectors(bdev, 0)) {
 			/*
 			 * Zeroing offload support was indicated, but the
 			 * device reported ILLEGAL REQUEST (for some devices
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 1bbe4a34ef4c..f1b8e7926dd4 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -831,7 +831,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		 */
 		job->rw = REQ_OP_WRITE_ZEROES;
 		for (i = 0; i < job->num_dests; i++)
-			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
+			if (!bdev_write_zeroes_sectors(job->dests[i].bdev, 0)) {
 				job->rw = WRITE;
 				break;
 			}
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 51ffd5c002de..73a63e197bf5 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -117,7 +117,7 @@ static int iblock_configure_device(struct se_device *dev)
 	 * Enable write same emulation for IBLOCK and use 0xFFFF as
 	 * the smaller WRITE_SAME(10) only has a two-byte block count.
 	 */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd, 0);
 	if (max_write_zeroes_sectors)
 		dev->dev_attrib.max_write_same_len = max_write_zeroes_sectors;
 	else
@@ -468,7 +468,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
 		return TCM_INVALID_CDB_FIELD;
 	}
 
-	if (bdev_write_zeroes_sectors(bdev)) {
+	if (bdev_write_zeroes_sectors(bdev, 0)) {
 		if (!iblock_execute_zero_out(bdev, cmd))
 			return 0;
 	}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c45779f00cbd..4cd69552df9a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1418,7 +1418,8 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
 	return 0;
 }
 
-static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
+static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
+						     unsigned int flags)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-16 12:35 [PATCH block v2 0/3] block: Introduce REQ_NOZERO flag for REQ_OP_WRITE_ZEROES operation Kirill Tkhai
  2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
@ 2020-01-16 12:36 ` Kirill Tkhai
  2020-01-19  1:50   ` Bob Liu
  2020-01-21  6:14   ` Martin K. Petersen
  2020-01-16 12:36 ` [PATCH block v2 3/3] loop: Add support for REQ_NOZERO Kirill Tkhai
  2 siblings, 2 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-16 12:36 UTC (permalink / raw)
  To: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, chaitanya.kulkarni,
	bvanassche, dhowells, asml.silence, ktkhai

This adds support for REQ_NOZERO extension of REQ_OP_WRITE_ZEROES
operation, which encourages a block device driver to just allocate
blocks (or mark them allocated) instead of actual blocks zeroing.
REQ_NOZERO is aimed to be used for network filesystems providing
a block device interface. Also, block devices, which map a file
on other filesystem (like loop), may use this for less fragmentation
and batching fallocate() requests. Hypervisors like QEMU may
introduce optimizations of clusters allocations based on this.

BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
blkdev_issue_zeroout().

CC: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 block/blk-core.c          |    6 +++---
 block/blk-lib.c           |   17 ++++++++++-------
 block/blk-merge.c         |    9 ++++++---
 block/blk-settings.c      |   17 +++++++++++++++++
 fs/block_dev.c            |    4 ++++
 include/linux/blk_types.h |    5 ++++-
 include/linux/blkdev.h    |   31 ++++++++++++++++++++++++-------
 7 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 50a5de025d5e..2edcd55624f1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -978,7 +978,7 @@ generic_make_request_checks(struct bio *bio)
 			goto not_supported;
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		if (!q->limits.max_write_zeroes_sectors)
+		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
 			goto not_supported;
 		break;
 	default:
@@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
 static int blk_cloned_rq_check_limits(struct request_queue *q,
 				      struct request *rq)
 {
-	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
+	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
 		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
 			__func__, blk_rq_sectors(rq),
-			blk_queue_get_max_sectors(q, req_op(rq)));
+			blk_queue_get_max_sectors(q, rq->cmd_flags));
 		return -EIO;
 	}
 
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3e38c93cfc53..3e80279eb029 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		struct bio **biop, unsigned flags)
 {
 	struct bio *bio = *biop;
-	unsigned int max_write_zeroes_sectors;
+	unsigned int max_write_zeroes_sectors, req_flags = 0;
 	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (!q)
@@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		return -EPERM;
 
 	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
 
 	if (max_write_zeroes_sectors == 0)
 		return -EOPNOTSUPP;
 
+	if (flags & BLKDEV_ZERO_NOUNMAP)
+		req_flags |= REQ_NOUNMAP;
+	if (flags & BLKDEV_ZERO_ALLOCATE)
+		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
+
 	while (nr_sects) {
 		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
-		bio->bi_opf = REQ_OP_WRITE_ZEROES;
-		if (flags & BLKDEV_ZERO_NOUNMAP)
-			bio->bi_opf |= REQ_NOUNMAP;
+		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
 
 		if (nr_sects > max_write_zeroes_sectors) {
 			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	sector_t bs_mask;
 	struct bio *bio;
 	struct blk_plug plug;
-	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
+	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
 
 	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
 	if ((sector | nr_sects) & bs_mask)
@@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			try_write_zeroes = false;
 			goto retry;
 		}
-		if (!bdev_write_zeroes_sectors(bdev, 0)) {
+		if (!bdev_write_zeroes_sectors(bdev, flags)) {
 			/*
 			 * Zeroing offload support was indicated, but the
 			 * device reported ILLEGAL REQUEST (for some devices
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 347782a24a35..e3ce4b87bbaa 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
 static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
 		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
 {
+	unsigned int max_sectors;
+
+	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
 	*nsegs = 0;
 
-	if (!q->limits.max_write_zeroes_sectors)
+	if (!max_sectors)
 		return NULL;
 
-	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
+	if (bio_sectors(bio) <= max_sectors)
 		return NULL;
 
-	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
+	return bio_split(bio, max_sectors, GFP_NOIO, bs);
 }
 
 static struct bio *blk_bio_write_same_split(struct request_queue *q,
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5f6dcc7a47bd..f682374c5106 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->chunk_sectors = 0;
 	lim->max_write_same_sectors = 0;
 	lim->max_write_zeroes_sectors = 0;
+	lim->max_allocate_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
 	lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_same_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_allocate_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -257,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_allocate_sectors - set max sectors for a single
+ *                                  allocate request
+ * @q:  the request queue for the device
+ * @max_allocate_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_allocate_sectors(struct request_queue *q,
+		unsigned int max_allocate_sectors)
+{
+	q->limits.max_allocate_sectors = max_allocate_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
+
 /**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
@@ -506,6 +521,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					b->max_write_same_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_allocate_sectors = min(t->max_allocate_sectors,
+					b->max_allocate_sectors);
 	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 69bf2fb6f7cd..1ffef894b3bd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
 		break;
+	case FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
+		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
 		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, 0);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..9ed166860099 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -335,7 +335,9 @@ enum req_flag_bits {
 
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
-
+	__REQ_NOZERO,		/* only notify about allocated blocks,
+				 * and do not actual zero them
+				 */
 	__REQ_HIPRI,
 
 	/* for driver use */
@@ -362,6 +364,7 @@ enum req_flag_bits {
 #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
+#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4cd69552df9a..f4ec5db64432 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -336,6 +336,7 @@ struct queue_limits {
 	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_write_same_sectors;
 	unsigned int		max_write_zeroes_sectors;
+	unsigned int		max_allocate_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -988,9 +989,19 @@ static inline struct bio_vec req_bvec(struct request *rq)
 	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
 }
 
+static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
+		struct request_queue *q, unsigned int op_flags)
+{
+	if (op_flags & REQ_NOZERO)
+		return q->limits.max_allocate_sectors;
+	return q->limits.max_write_zeroes_sectors;
+}
+
 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
-						     int op)
+						     unsigned int op_flags)
 {
+	int op = op_flags & REQ_OP_MASK;
+
 	if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
 		return min(q->limits.max_discard_sectors,
 			   UINT_MAX >> SECTOR_SHIFT);
@@ -999,7 +1010,7 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 		return q->limits.max_write_same_sectors;
 
 	if (unlikely(op == REQ_OP_WRITE_ZEROES))
-		return q->limits.max_write_zeroes_sectors;
+		return blk_queue_get_max_write_zeroes_sectors(q, op_flags);
 
 	return q->limits.max_sectors;
 }
@@ -1029,10 +1040,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
 	if (!q->limits.chunk_sectors ||
 	    req_op(rq) == REQ_OP_DISCARD ||
 	    req_op(rq) == REQ_OP_SECURE_ERASE)
-		return blk_queue_get_max_sectors(q, req_op(rq));
+		return blk_queue_get_max_sectors(q, rq->cmd_flags);
 
 	return min(blk_max_size_offset(q, offset),
-			blk_queue_get_max_sectors(q, req_op(rq)));
+			blk_queue_get_max_sectors(q, rq->cmd_flags));
 }
 
 static inline unsigned int blk_rq_count_bios(struct request *rq)
@@ -1078,6 +1089,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_allocate_sectors(struct request_queue *q,
+		unsigned int max_allocate_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
@@ -1219,6 +1232,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
+#define BLKDEV_ZERO_ALLOCATE	(1 << 2)  /* allocate range of blocks */
 
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
@@ -1423,10 +1437,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 
-	if (q)
-		return q->limits.max_write_zeroes_sectors;
+	if (!q)
+		return 0;
 
-	return 0;
+	if (flags & BLKDEV_ZERO_ALLOCATE)
+		return q->limits.max_allocate_sectors;
+	else
+		return q->limits.max_write_zeroes_sectors;
 }
 
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH block v2 3/3] loop: Add support for REQ_NOZERO
  2020-01-16 12:35 [PATCH block v2 0/3] block: Introduce REQ_NOZERO flag for REQ_OP_WRITE_ZEROES operation Kirill Tkhai
  2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
  2020-01-16 12:36 ` [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag Kirill Tkhai
@ 2020-01-16 12:36 ` Kirill Tkhai
  2 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-16 12:36 UTC (permalink / raw)
  To: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, chaitanya.kulkarni,
	bvanassche, dhowells, asml.silence, ktkhai

Support for new modifier of REQ_OP_WRITE_ZEROES command.
This results in allocation extents in backing file instead
of actual blocks zeroing.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/block/loop.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..f9e5af7b2ee0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -581,6 +581,15 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	return 0;
 }
 
+static unsigned int write_zeroes_to_fallocate_mode(unsigned int flags)
+{
+	if (flags & REQ_NOZERO)
+		return 0;
+	if (flags & REQ_NOUNMAP)
+		return FALLOC_FL_ZERO_RANGE;
+	return FALLOC_FL_PUNCH_HOLE;
+}
+
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 {
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
@@ -604,9 +613,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 		 * write zeroes the range.  Otherwise, punch them out.
 		 */
 		return lo_fallocate(lo, rq, pos,
-			(rq->cmd_flags & REQ_NOUNMAP) ?
-				FALLOC_FL_ZERO_RANGE :
-				FALLOC_FL_PUNCH_HOLE);
+			write_zeroes_to_fallocate_mode(rq->cmd_flags));
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_WRITE:
@@ -877,6 +884,7 @@ static void loop_config_discard(struct loop_device *lo)
 		q->limits.discard_alignment = 0;
 		blk_queue_max_discard_sectors(q, 0);
 		blk_queue_max_write_zeroes_sectors(q, 0);
+		blk_queue_max_allocate_sectors(q, 0);
 		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
 		return;
 	}
@@ -886,6 +894,7 @@ static void loop_config_discard(struct loop_device *lo)
 
 	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
 	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
+	blk_queue_max_allocate_sectors(q, UINT_MAX >> 9);
 	blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
 }
 



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors()
  2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
@ 2020-01-19  1:46   ` Bob Liu
  2020-01-20 11:11     ` Kirill Tkhai
  2020-01-21  5:45   ` Martin K. Petersen
  1 sibling, 1 reply; 15+ messages in thread
From: Bob Liu @ 2020-01-19  1:46 UTC (permalink / raw)
  To: Kirill Tkhai, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 1/16/20 8:35 PM, Kirill Tkhai wrote:
> This is a preparation for next patch, which introduces
> a new flag BLKDEV_ZERO_ALLOCATE for calls, which need
> only allocation of blocks and don't need actual blocks
> zeroing.
> 
> CC: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  block/blk-lib.c                     |    6 +++---
>  drivers/md/dm-kcopyd.c              |    2 +-
>  drivers/target/target_core_iblock.c |    4 ++--
>  include/linux/blkdev.h              |    3 ++-
>  4 files changed, 8 insertions(+), 7 deletions(-)
> 

Looks good to me.
Reviewed-by: Bob Liu <bob.liu@oracle.com>

> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 5f2c429d4378..3e38c93cfc53 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -224,7 +224,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		return -EPERM;
>  
>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>  
>  	if (max_write_zeroes_sectors == 0)
>  		return -EOPNOTSUPP;
> @@ -362,7 +362,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  	sector_t bs_mask;
>  	struct bio *bio;
>  	struct blk_plug plug;
> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>  
>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>  	if ((sector | nr_sects) & bs_mask)
> @@ -391,7 +391,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  			try_write_zeroes = false;
>  			goto retry;
>  		}
> -		if (!bdev_write_zeroes_sectors(bdev)) {
> +		if (!bdev_write_zeroes_sectors(bdev, 0)) {
>  			/*
>  			 * Zeroing offload support was indicated, but the
>  			 * device reported ILLEGAL REQUEST (for some devices
> diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
> index 1bbe4a34ef4c..f1b8e7926dd4 100644
> --- a/drivers/md/dm-kcopyd.c
> +++ b/drivers/md/dm-kcopyd.c
> @@ -831,7 +831,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
>  		 */
>  		job->rw = REQ_OP_WRITE_ZEROES;
>  		for (i = 0; i < job->num_dests; i++)
> -			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
> +			if (!bdev_write_zeroes_sectors(job->dests[i].bdev, 0)) {
>  				job->rw = WRITE;
>  				break;
>  			}
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index 51ffd5c002de..73a63e197bf5 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -117,7 +117,7 @@ static int iblock_configure_device(struct se_device *dev)
>  	 * Enable write same emulation for IBLOCK and use 0xFFFF as
>  	 * the smaller WRITE_SAME(10) only has a two-byte block count.
>  	 */
> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd);
> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd, 0);
>  	if (max_write_zeroes_sectors)
>  		dev->dev_attrib.max_write_same_len = max_write_zeroes_sectors;
>  	else
> @@ -468,7 +468,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
>  		return TCM_INVALID_CDB_FIELD;
>  	}
>  
> -	if (bdev_write_zeroes_sectors(bdev)) {
> +	if (bdev_write_zeroes_sectors(bdev, 0)) {
>  		if (!iblock_execute_zero_out(bdev, cmd))
>  			return 0;
>  	}
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c45779f00cbd..4cd69552df9a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1418,7 +1418,8 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
>  	return 0;
>  }
>  
> -static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
> +static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
> +						     unsigned int flags)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
>  
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-16 12:36 ` [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag Kirill Tkhai
@ 2020-01-19  1:50   ` Bob Liu
  2020-01-20 10:02     ` Kirill Tkhai
  2020-01-21  6:14   ` Martin K. Petersen
  1 sibling, 1 reply; 15+ messages in thread
From: Bob Liu @ 2020-01-19  1:50 UTC (permalink / raw)
  To: Kirill Tkhai, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 1/16/20 8:36 PM, Kirill Tkhai wrote:
> This adds support for REQ_NOZERO extension of REQ_OP_WRITE_ZEROES
> operation, which encourages a block device driver to just allocate
> blocks (or mark them allocated) instead of actual blocks zeroing.
> REQ_NOZERO is aimed to be used for network filesystems providing
> a block device interface. Also, block devices, which map a file
> on other filesystem (like loop), may use this for less fragmentation
> and batching fallocate() requests. Hypervisors like QEMU may
> introduce optimizations of clusters allocations based on this.
> 
> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
> blkdev_issue_zeroout().
>> CC: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  block/blk-core.c          |    6 +++---
>  block/blk-lib.c           |   17 ++++++++++-------
>  block/blk-merge.c         |    9 ++++++---
>  block/blk-settings.c      |   17 +++++++++++++++++
>  fs/block_dev.c            |    4 ++++
>  include/linux/blk_types.h |    5 ++++-
>  include/linux/blkdev.h    |   31 ++++++++++++++++++++++++-------
>  7 files changed, 68 insertions(+), 21 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 50a5de025d5e..2edcd55624f1 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -978,7 +978,7 @@ generic_make_request_checks(struct bio *bio)
>  			goto not_supported;
>  		break;
>  	case REQ_OP_WRITE_ZEROES:
> -		if (!q->limits.max_write_zeroes_sectors)
> +		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
>  			goto not_supported;
>  		break;
>  	default:
> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
>  static int blk_cloned_rq_check_limits(struct request_queue *q,
>  				      struct request *rq)
>  {
> -	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
> +	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
>  		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
>  			__func__, blk_rq_sectors(rq),
> -			blk_queue_get_max_sectors(q, req_op(rq)));
> +			blk_queue_get_max_sectors(q, rq->cmd_flags));
>  		return -EIO;
>  	}
>  
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 3e38c93cfc53..3e80279eb029 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		struct bio **biop, unsigned flags)
>  {
>  	struct bio *bio = *biop;
> -	unsigned int max_write_zeroes_sectors;
> +	unsigned int max_write_zeroes_sectors, req_flags = 0;
>  	struct request_queue *q = bdev_get_queue(bdev);
>  
>  	if (!q)
> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		return -EPERM;
>  
>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>  
>  	if (max_write_zeroes_sectors == 0)
>  		return -EOPNOTSUPP;
>  
> +	if (flags & BLKDEV_ZERO_NOUNMAP)
> +		req_flags |= REQ_NOUNMAP;
> +	if (flags & BLKDEV_ZERO_ALLOCATE)
> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
> +
>  	while (nr_sects) {
>  		bio = blk_next_bio(bio, 0, gfp_mask);
>  		bio->bi_iter.bi_sector = sector;
>  		bio_set_dev(bio, bdev);
> -		bio->bi_opf = REQ_OP_WRITE_ZEROES;
> -		if (flags & BLKDEV_ZERO_NOUNMAP)
> -			bio->bi_opf |= REQ_NOUNMAP;
> +		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>  
>  		if (nr_sects > max_write_zeroes_sectors) {
>  			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  	sector_t bs_mask;
>  	struct bio *bio;
>  	struct blk_plug plug;
> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>  
>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>  	if ((sector | nr_sects) & bs_mask)
> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  			try_write_zeroes = false;
>  			goto retry;
>  		}
> -		if (!bdev_write_zeroes_sectors(bdev, 0)) {
> +		if (!bdev_write_zeroes_sectors(bdev, flags)) {
>  			/*
>  			 * Zeroing offload support was indicated, but the
>  			 * device reported ILLEGAL REQUEST (for some devices
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 347782a24a35..e3ce4b87bbaa 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
>  static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
>  		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
>  {
> +	unsigned int max_sectors;
> +
> +	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
>  	*nsegs = 0;
>  
> -	if (!q->limits.max_write_zeroes_sectors)
> +	if (!max_sectors)
>  		return NULL;
>  
> -	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
> +	if (bio_sectors(bio) <= max_sectors)
>  		return NULL;
>  
> -	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
> +	return bio_split(bio, max_sectors, GFP_NOIO, bs);
>  }
>  
>  static struct bio *blk_bio_write_same_split(struct request_queue *q,
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 5f6dcc7a47bd..f682374c5106 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->chunk_sectors = 0;
>  	lim->max_write_same_sectors = 0;
>  	lim->max_write_zeroes_sectors = 0;
> +	lim->max_allocate_sectors = 0;
>  	lim->max_discard_sectors = 0;
>  	lim->max_hw_discard_sectors = 0;
>  	lim->discard_granularity = 0;
> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->max_dev_sectors = UINT_MAX;
>  	lim->max_write_same_sectors = UINT_MAX;
>  	lim->max_write_zeroes_sectors = UINT_MAX;
> +	lim->max_allocate_sectors = UINT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>  
> @@ -257,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>  
> +/**
> + * blk_queue_max_allocate_sectors - set max sectors for a single
> + *                                  allocate request
> + * @q:  the request queue for the device
> + * @max_allocate_sectors: maximum number of sectors to write per command
> + **/
> +void blk_queue_max_allocate_sectors(struct request_queue *q,
> +		unsigned int max_allocate_sectors)
> +{
> +	q->limits.max_allocate_sectors = max_allocate_sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
> +

I'd suggest split this to a separated patch.

>  /**
>   * blk_queue_max_segments - set max hw segments for a request for this queue
>   * @q:  the request queue for the device
> @@ -506,6 +521,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  					b->max_write_same_sectors);
>  	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>  					b->max_write_zeroes_sectors);
> +	t->max_allocate_sectors = min(t->max_allocate_sectors,
> +					b->max_allocate_sectors);
>  	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>  
>  	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 69bf2fb6f7cd..1ffef894b3bd 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>  					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
>  		break;
> +	case FALLOC_FL_KEEP_SIZE:
> +		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> +			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
> +		break;
>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>  		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>  					     GFP_KERNEL, 0);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 70254ae11769..9ed166860099 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -335,7 +335,9 @@ enum req_flag_bits {
>  
>  	/* command specific flags for REQ_OP_WRITE_ZEROES: */
>  	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
> -
> +	__REQ_NOZERO,		/* only notify about allocated blocks,
> +				 * and do not actual zero them
> +				 */
>  	__REQ_HIPRI,
>  
>  	/* for driver use */
> @@ -362,6 +364,7 @@ enum req_flag_bits {
>  #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
>  
>  #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
> +#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
>  #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
>  
>  #define REQ_DRV			(1ULL << __REQ_DRV)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4cd69552df9a..f4ec5db64432 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -336,6 +336,7 @@ struct queue_limits {
>  	unsigned int		max_hw_discard_sectors;
>  	unsigned int		max_write_same_sectors;
>  	unsigned int		max_write_zeroes_sectors;
> +	unsigned int		max_allocate_sectors;
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  
> @@ -988,9 +989,19 @@ static inline struct bio_vec req_bvec(struct request *rq)
>  	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
>  }
>  
> +static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
> +		struct request_queue *q, unsigned int op_flags)
> +{
> +	if (op_flags & REQ_NOZERO)
> +		return q->limits.max_allocate_sectors;
> +	return q->limits.max_write_zeroes_sectors;
> +}
> +

And this one.
Also, should we consider other code path used q->limits.max_write_zeroes_sectors?

Regards,
Bob

>  static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
> -						     int op)
> +						     unsigned int op_flags)
>  {
> +	int op = op_flags & REQ_OP_MASK;
> +
>  	if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
>  		return min(q->limits.max_discard_sectors,
>  			   UINT_MAX >> SECTOR_SHIFT);
> @@ -999,7 +1010,7 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
>  		return q->limits.max_write_same_sectors;
>  
>  	if (unlikely(op == REQ_OP_WRITE_ZEROES))
> -		return q->limits.max_write_zeroes_sectors;
> +		return blk_queue_get_max_write_zeroes_sectors(q, op_flags);
>  
>  	return q->limits.max_sectors;
>  }
> @@ -1029,10 +1040,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
>  	if (!q->limits.chunk_sectors ||
>  	    req_op(rq) == REQ_OP_DISCARD ||
>  	    req_op(rq) == REQ_OP_SECURE_ERASE)
> -		return blk_queue_get_max_sectors(q, req_op(rq));
> +		return blk_queue_get_max_sectors(q, rq->cmd_flags);
>  
>  	return min(blk_max_size_offset(q, offset),
> -			blk_queue_get_max_sectors(q, req_op(rq)));
> +			blk_queue_get_max_sectors(q, rq->cmd_flags));
>  }
>  
>  static inline unsigned int blk_rq_count_bios(struct request *rq)
> @@ -1078,6 +1089,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
>  		unsigned int max_write_same_sectors);
>  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  		unsigned int max_write_same_sectors);
> +extern void blk_queue_max_allocate_sectors(struct request_queue *q,
> +		unsigned int max_allocate_sectors);
>  extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
>  extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
>  extern void blk_queue_alignment_offset(struct request_queue *q,
> @@ -1219,6 +1232,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> +#define BLKDEV_ZERO_ALLOCATE	(1 << 2)  /* allocate range of blocks */
>  
>  extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
> @@ -1423,10 +1437,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
>  
> -	if (q)
> -		return q->limits.max_write_zeroes_sectors;
> +	if (!q)
> +		return 0;
>  
> -	return 0;
> +	if (flags & BLKDEV_ZERO_ALLOCATE)
> +		return q->limits.max_allocate_sectors;
> +	else
> +		return q->limits.max_write_zeroes_sectors;
>  }
>  
>  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-19  1:50   ` Bob Liu
@ 2020-01-20 10:02     ` Kirill Tkhai
  2020-01-21  6:13       ` Bob Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-20 10:02 UTC (permalink / raw)
  To: Bob Liu, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 19.01.2020 04:50, Bob Liu wrote:
> On 1/16/20 8:36 PM, Kirill Tkhai wrote:
>> This adds support for REQ_NOZERO extension of REQ_OP_WRITE_ZEROES
>> operation, which encourages a block device driver to just allocate
>> blocks (or mark them allocated) instead of actual blocks zeroing.
>> REQ_NOZERO is aimed to be used for network filesystems providing
>> a block device interface. Also, block devices, which map a file
>> on other filesystem (like loop), may use this for less fragmentation
>> and batching fallocate() requests. Hypervisors like QEMU may
>> introduce optimizations of clusters allocations based on this.
>>
>> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
>> blkdev_issue_zeroout().
>>> CC: Martin K. Petersen <martin.petersen@oracle.com>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  block/blk-core.c          |    6 +++---
>>  block/blk-lib.c           |   17 ++++++++++-------
>>  block/blk-merge.c         |    9 ++++++---
>>  block/blk-settings.c      |   17 +++++++++++++++++
>>  fs/block_dev.c            |    4 ++++
>>  include/linux/blk_types.h |    5 ++++-
>>  include/linux/blkdev.h    |   31 ++++++++++++++++++++++++-------
>>  7 files changed, 68 insertions(+), 21 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 50a5de025d5e..2edcd55624f1 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -978,7 +978,7 @@ generic_make_request_checks(struct bio *bio)
>>  			goto not_supported;
>>  		break;
>>  	case REQ_OP_WRITE_ZEROES:
>> -		if (!q->limits.max_write_zeroes_sectors)
>> +		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
>>  			goto not_supported;
>>  		break;
>>  	default:
>> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
>>  static int blk_cloned_rq_check_limits(struct request_queue *q,
>>  				      struct request *rq)
>>  {
>> -	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
>> +	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
>>  		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
>>  			__func__, blk_rq_sectors(rq),
>> -			blk_queue_get_max_sectors(q, req_op(rq)));
>> +			blk_queue_get_max_sectors(q, rq->cmd_flags));
>>  		return -EIO;
>>  	}
>>  
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 3e38c93cfc53..3e80279eb029 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>  		struct bio **biop, unsigned flags)
>>  {
>>  	struct bio *bio = *biop;
>> -	unsigned int max_write_zeroes_sectors;
>> +	unsigned int max_write_zeroes_sectors, req_flags = 0;
>>  	struct request_queue *q = bdev_get_queue(bdev);
>>  
>>  	if (!q)
>> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>  		return -EPERM;
>>  
>>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
>> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>>  
>>  	if (max_write_zeroes_sectors == 0)
>>  		return -EOPNOTSUPP;
>>  
>> +	if (flags & BLKDEV_ZERO_NOUNMAP)
>> +		req_flags |= REQ_NOUNMAP;
>> +	if (flags & BLKDEV_ZERO_ALLOCATE)
>> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
>> +
>>  	while (nr_sects) {
>>  		bio = blk_next_bio(bio, 0, gfp_mask);
>>  		bio->bi_iter.bi_sector = sector;
>>  		bio_set_dev(bio, bdev);
>> -		bio->bi_opf = REQ_OP_WRITE_ZEROES;
>> -		if (flags & BLKDEV_ZERO_NOUNMAP)
>> -			bio->bi_opf |= REQ_NOUNMAP;
>> +		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>>  
>>  		if (nr_sects > max_write_zeroes_sectors) {
>>  			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
>> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>  	sector_t bs_mask;
>>  	struct bio *bio;
>>  	struct blk_plug plug;
>> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>>  
>>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>>  	if ((sector | nr_sects) & bs_mask)
>> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>  			try_write_zeroes = false;
>>  			goto retry;
>>  		}
>> -		if (!bdev_write_zeroes_sectors(bdev, 0)) {
>> +		if (!bdev_write_zeroes_sectors(bdev, flags)) {
>>  			/*
>>  			 * Zeroing offload support was indicated, but the
>>  			 * device reported ILLEGAL REQUEST (for some devices
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 347782a24a35..e3ce4b87bbaa 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
>>  static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
>>  		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
>>  {
>> +	unsigned int max_sectors;
>> +
>> +	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
>>  	*nsegs = 0;
>>  
>> -	if (!q->limits.max_write_zeroes_sectors)
>> +	if (!max_sectors)
>>  		return NULL;
>>  
>> -	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
>> +	if (bio_sectors(bio) <= max_sectors)
>>  		return NULL;
>>  
>> -	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
>> +	return bio_split(bio, max_sectors, GFP_NOIO, bs);
>>  }
>>  
>>  static struct bio *blk_bio_write_same_split(struct request_queue *q,
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 5f6dcc7a47bd..f682374c5106 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>  	lim->chunk_sectors = 0;
>>  	lim->max_write_same_sectors = 0;
>>  	lim->max_write_zeroes_sectors = 0;
>> +	lim->max_allocate_sectors = 0;
>>  	lim->max_discard_sectors = 0;
>>  	lim->max_hw_discard_sectors = 0;
>>  	lim->discard_granularity = 0;
>> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>  	lim->max_dev_sectors = UINT_MAX;
>>  	lim->max_write_same_sectors = UINT_MAX;
>>  	lim->max_write_zeroes_sectors = UINT_MAX;
>> +	lim->max_allocate_sectors = UINT_MAX;
>>  }
>>  EXPORT_SYMBOL(blk_set_stacking_limits);
>>  
>> @@ -257,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>>  }
>>  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>>  
>> +/**
>> + * blk_queue_max_allocate_sectors - set max sectors for a single
>> + *                                  allocate request
>> + * @q:  the request queue for the device
>> + * @max_allocate_sectors: maximum number of sectors to write per command
>> + **/
>> +void blk_queue_max_allocate_sectors(struct request_queue *q,
>> +		unsigned int max_allocate_sectors)
>> +{
>> +	q->limits.max_allocate_sectors = max_allocate_sectors;
>> +}
>> +EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
>> +
> 
> I'd suggest split this to a separated patch.

Yeah, this function is used in [3/3] only, so in may go after this patch [2/3] as a separate patch.

>>  /**
>>   * blk_queue_max_segments - set max hw segments for a request for this queue
>>   * @q:  the request queue for the device
>> @@ -506,6 +521,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>  					b->max_write_same_sectors);
>>  	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>>  					b->max_write_zeroes_sectors);
>> +	t->max_allocate_sectors = min(t->max_allocate_sectors,
>> +					b->max_allocate_sectors);
>>  	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>>  
>>  	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 69bf2fb6f7cd..1ffef894b3bd 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>>  		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>>  					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
>>  		break;
>> +	case FALLOC_FL_KEEP_SIZE:
>> +		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>> +			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
>> +		break;
>>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>>  		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>>  					     GFP_KERNEL, 0);
>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index 70254ae11769..9ed166860099 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -335,7 +335,9 @@ enum req_flag_bits {
>>  
>>  	/* command specific flags for REQ_OP_WRITE_ZEROES: */
>>  	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
>> -
>> +	__REQ_NOZERO,		/* only notify about allocated blocks,
>> +				 * and do not actual zero them
>> +				 */
>>  	__REQ_HIPRI,
>>  
>>  	/* for driver use */
>> @@ -362,6 +364,7 @@ enum req_flag_bits {
>>  #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
>>  
>>  #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
>> +#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
>>  #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
>>  
>>  #define REQ_DRV			(1ULL << __REQ_DRV)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 4cd69552df9a..f4ec5db64432 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -336,6 +336,7 @@ struct queue_limits {
>>  	unsigned int		max_hw_discard_sectors;
>>  	unsigned int		max_write_same_sectors;
>>  	unsigned int		max_write_zeroes_sectors;
>> +	unsigned int		max_allocate_sectors;
>>  	unsigned int		discard_granularity;
>>  	unsigned int		discard_alignment;
>>  
>> @@ -988,9 +989,19 @@ static inline struct bio_vec req_bvec(struct request *rq)
>>  	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
>>  }
>>  
>> +static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
>> +		struct request_queue *q, unsigned int op_flags)
>> +{
>> +	if (op_flags & REQ_NOZERO)
>> +		return q->limits.max_allocate_sectors;
>> +	return q->limits.max_write_zeroes_sectors;
>> +}
>> +
> 
> And this one.

It looks it won't be good, since this will require to declare REQ_NOZERO
in a separate patch. This will tear off the flag declaration from the logic.

> Also, should we consider other code path used q->limits.max_write_zeroes_sectors?

Other code paths should not dereference q->limits.max_allocate_sectors, unless
it is directly set in not-zero value. In case of max_allocate_sectors is zero,
high-level primitives (generic_make_request_checks(), __blkdev_issue_write_zeroes(), ..)
complete such the bios immediately. Other drivers may need additional work
to support this, and really only subset of drivers need support of this, so this is
not a subject of this patchset.

Hm, it looks like there is an exception, which may inherit stack limits from children.
Device-mapper will pick all the limits we enable for children.
We may disable REQ_WRITE_ZEROES|REQ_NOZERO directly there, since it's not supported
in this driver yet.

Are you hinting at this here?

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 0a2cc197f62b..b8aa5f6f9ce1 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -489,6 +489,7 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
 		       (unsigned long long) start << SECTOR_SHIFT);
 
 	limits->zoned = blk_queue_zoned_model(q);
+	limits->max_allocate_sectors = 0;
 
 	return 0;
 }
@@ -1548,6 +1549,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
 			       dm_device_name(table->md),
 			       (unsigned long long) ti->begin,
 			       (unsigned long long) ti->len);
+		limits->max_allocate_sectors = 0;
 
 		/*
 		 * FIXME: this should likely be moved to blk_stack_limits(), would

Thanks,
Kirill

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors()
  2020-01-19  1:46   ` Bob Liu
@ 2020-01-20 11:11     ` Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-20 11:11 UTC (permalink / raw)
  To: Bob Liu, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 19.01.2020 04:46, Bob Liu wrote:
> On 1/16/20 8:35 PM, Kirill Tkhai wrote:
>> This is a preparation for next patch, which introduces
>> a new flag BLKDEV_ZERO_ALLOCATE for calls, which need
>> only allocation of blocks and don't need actual blocks
>> zeroing.
>>
>> CC: Martin K. Petersen <martin.petersen@oracle.com>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  block/blk-lib.c                     |    6 +++---
>>  drivers/md/dm-kcopyd.c              |    2 +-
>>  drivers/target/target_core_iblock.c |    4 ++--
>>  include/linux/blkdev.h              |    3 ++-
>>  4 files changed, 8 insertions(+), 7 deletions(-)
>>
> 
> Looks good to me.
> Reviewed-by: Bob Liu <bob.liu@oracle.com>

Thanks, Bob.
 
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 5f2c429d4378..3e38c93cfc53 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -224,7 +224,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>  		return -EPERM;
>>  
>>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
>> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
>> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>>  
>>  	if (max_write_zeroes_sectors == 0)
>>  		return -EOPNOTSUPP;
>> @@ -362,7 +362,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>  	sector_t bs_mask;
>>  	struct bio *bio;
>>  	struct blk_plug plug;
>> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
>> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>>  
>>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>>  	if ((sector | nr_sects) & bs_mask)
>> @@ -391,7 +391,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>  			try_write_zeroes = false;
>>  			goto retry;
>>  		}
>> -		if (!bdev_write_zeroes_sectors(bdev)) {
>> +		if (!bdev_write_zeroes_sectors(bdev, 0)) {
>>  			/*
>>  			 * Zeroing offload support was indicated, but the
>>  			 * device reported ILLEGAL REQUEST (for some devices
>> diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
>> index 1bbe4a34ef4c..f1b8e7926dd4 100644
>> --- a/drivers/md/dm-kcopyd.c
>> +++ b/drivers/md/dm-kcopyd.c
>> @@ -831,7 +831,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
>>  		 */
>>  		job->rw = REQ_OP_WRITE_ZEROES;
>>  		for (i = 0; i < job->num_dests; i++)
>> -			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
>> +			if (!bdev_write_zeroes_sectors(job->dests[i].bdev, 0)) {
>>  				job->rw = WRITE;
>>  				break;
>>  			}
>> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
>> index 51ffd5c002de..73a63e197bf5 100644
>> --- a/drivers/target/target_core_iblock.c
>> +++ b/drivers/target/target_core_iblock.c
>> @@ -117,7 +117,7 @@ static int iblock_configure_device(struct se_device *dev)
>>  	 * Enable write same emulation for IBLOCK and use 0xFFFF as
>>  	 * the smaller WRITE_SAME(10) only has a two-byte block count.
>>  	 */
>> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd);
>> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd, 0);
>>  	if (max_write_zeroes_sectors)
>>  		dev->dev_attrib.max_write_same_len = max_write_zeroes_sectors;
>>  	else
>> @@ -468,7 +468,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
>>  		return TCM_INVALID_CDB_FIELD;
>>  	}
>>  
>> -	if (bdev_write_zeroes_sectors(bdev)) {
>> +	if (bdev_write_zeroes_sectors(bdev, 0)) {
>>  		if (!iblock_execute_zero_out(bdev, cmd))
>>  			return 0;
>>  	}
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index c45779f00cbd..4cd69552df9a 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1418,7 +1418,8 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
>>  	return 0;
>>  }
>>  
>> -static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
>> +static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
>> +						     unsigned int flags)
>>  {
>>  	struct request_queue *q = bdev_get_queue(bdev);
>>  
>>
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors()
  2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
  2020-01-19  1:46   ` Bob Liu
@ 2020-01-21  5:45   ` Martin K. Petersen
  1 sibling, 0 replies; 15+ messages in thread
From: Martin K. Petersen @ 2020-01-21  5:45 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence


Kirill,

> This is a preparation for next patch, which introduces a new flag
> BLKDEV_ZERO_ALLOCATE for calls, which need only allocation of blocks
> and don't need actual blocks zeroing.

Path of least resistance, I guess.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-20 10:02     ` Kirill Tkhai
@ 2020-01-21  6:13       ` Bob Liu
  2020-01-21  9:45         ` Kirill Tkhai
  0 siblings, 1 reply; 15+ messages in thread
From: Bob Liu @ 2020-01-21  6:13 UTC (permalink / raw)
  To: Kirill Tkhai, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 1/20/20 6:02 PM, Kirill Tkhai wrote:
> On 19.01.2020 04:50, Bob Liu wrote:
>> On 1/16/20 8:36 PM, Kirill Tkhai wrote:
>>> This adds support for REQ_NOZERO extension of REQ_OP_WRITE_ZEROES
>>> operation, which encourages a block device driver to just allocate
>>> blocks (or mark them allocated) instead of actual blocks zeroing.
>>> REQ_NOZERO is aimed to be used for network filesystems providing
>>> a block device interface. Also, block devices, which map a file
>>> on other filesystem (like loop), may use this for less fragmentation
>>> and batching fallocate() requests. Hypervisors like QEMU may
>>> introduce optimizations of clusters allocations based on this.
>>>
>>> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
>>> blkdev_issue_zeroout().
>>>> CC: Martin K. Petersen <martin.petersen@oracle.com>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> ---
>>>  block/blk-core.c          |    6 +++---
>>>  block/blk-lib.c           |   17 ++++++++++-------
>>>  block/blk-merge.c         |    9 ++++++---
>>>  block/blk-settings.c      |   17 +++++++++++++++++
>>>  fs/block_dev.c            |    4 ++++
>>>  include/linux/blk_types.h |    5 ++++-
>>>  include/linux/blkdev.h    |   31 ++++++++++++++++++++++++-------
>>>  7 files changed, 68 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 50a5de025d5e..2edcd55624f1 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -978,7 +978,7 @@ generic_make_request_checks(struct bio *bio)
>>>  			goto not_supported;
>>>  		break;
>>>  	case REQ_OP_WRITE_ZEROES:
>>> -		if (!q->limits.max_write_zeroes_sectors)
>>> +		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
>>>  			goto not_supported;
>>>  		break;
>>>  	default:
>>> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
>>>  static int blk_cloned_rq_check_limits(struct request_queue *q,
>>>  				      struct request *rq)
>>>  {
>>> -	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
>>> +	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
>>>  		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
>>>  			__func__, blk_rq_sectors(rq),
>>> -			blk_queue_get_max_sectors(q, req_op(rq)));
>>> +			blk_queue_get_max_sectors(q, rq->cmd_flags));
>>>  		return -EIO;
>>>  	}
>>>  
>>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>>> index 3e38c93cfc53..3e80279eb029 100644
>>> --- a/block/blk-lib.c
>>> +++ b/block/blk-lib.c
>>> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>>  		struct bio **biop, unsigned flags)
>>>  {
>>>  	struct bio *bio = *biop;
>>> -	unsigned int max_write_zeroes_sectors;
>>> +	unsigned int max_write_zeroes_sectors, req_flags = 0;
>>>  	struct request_queue *q = bdev_get_queue(bdev);
>>>  
>>>  	if (!q)
>>> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>>  		return -EPERM;
>>>  
>>>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
>>> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>>> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>>>  
>>>  	if (max_write_zeroes_sectors == 0)
>>>  		return -EOPNOTSUPP;
>>>  
>>> +	if (flags & BLKDEV_ZERO_NOUNMAP)
>>> +		req_flags |= REQ_NOUNMAP;
>>> +	if (flags & BLKDEV_ZERO_ALLOCATE)
>>> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
>>> +
>>>  	while (nr_sects) {
>>>  		bio = blk_next_bio(bio, 0, gfp_mask);
>>>  		bio->bi_iter.bi_sector = sector;
>>>  		bio_set_dev(bio, bdev);
>>> -		bio->bi_opf = REQ_OP_WRITE_ZEROES;
>>> -		if (flags & BLKDEV_ZERO_NOUNMAP)
>>> -			bio->bi_opf |= REQ_NOUNMAP;
>>> +		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>>>  
>>>  		if (nr_sects > max_write_zeroes_sectors) {
>>>  			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
>>> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>>  	sector_t bs_mask;
>>>  	struct bio *bio;
>>>  	struct blk_plug plug;
>>> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>>> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>>>  
>>>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>>>  	if ((sector | nr_sects) & bs_mask)
>>> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>>  			try_write_zeroes = false;
>>>  			goto retry;
>>>  		}
>>> -		if (!bdev_write_zeroes_sectors(bdev, 0)) {
>>> +		if (!bdev_write_zeroes_sectors(bdev, flags)) {
>>>  			/*
>>>  			 * Zeroing offload support was indicated, but the
>>>  			 * device reported ILLEGAL REQUEST (for some devices
>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>> index 347782a24a35..e3ce4b87bbaa 100644
>>> --- a/block/blk-merge.c
>>> +++ b/block/blk-merge.c
>>> @@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
>>>  static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
>>>  		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
>>>  {
>>> +	unsigned int max_sectors;
>>> +
>>> +	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
>>>  	*nsegs = 0;
>>>  
>>> -	if (!q->limits.max_write_zeroes_sectors)
>>> +	if (!max_sectors)
>>>  		return NULL;
>>>  
>>> -	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
>>> +	if (bio_sectors(bio) <= max_sectors)
>>>  		return NULL;
>>>  
>>> -	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
>>> +	return bio_split(bio, max_sectors, GFP_NOIO, bs);
>>>  }
>>>  
>>>  static struct bio *blk_bio_write_same_split(struct request_queue *q,
>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>> index 5f6dcc7a47bd..f682374c5106 100644
>>> --- a/block/blk-settings.c
>>> +++ b/block/blk-settings.c
>>> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>>  	lim->chunk_sectors = 0;
>>>  	lim->max_write_same_sectors = 0;
>>>  	lim->max_write_zeroes_sectors = 0;
>>> +	lim->max_allocate_sectors = 0;
>>>  	lim->max_discard_sectors = 0;
>>>  	lim->max_hw_discard_sectors = 0;
>>>  	lim->discard_granularity = 0;
>>> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>>  	lim->max_dev_sectors = UINT_MAX;
>>>  	lim->max_write_same_sectors = UINT_MAX;
>>>  	lim->max_write_zeroes_sectors = UINT_MAX;
>>> +	lim->max_allocate_sectors = UINT_MAX;
>>>  }
>>>  EXPORT_SYMBOL(blk_set_stacking_limits);
>>>  
>>> @@ -257,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>>>  }
>>>  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>>>  
>>> +/**
>>> + * blk_queue_max_allocate_sectors - set max sectors for a single
>>> + *                                  allocate request
>>> + * @q:  the request queue for the device
>>> + * @max_allocate_sectors: maximum number of sectors to write per command
>>> + **/
>>> +void blk_queue_max_allocate_sectors(struct request_queue *q,
>>> +		unsigned int max_allocate_sectors)
>>> +{
>>> +	q->limits.max_allocate_sectors = max_allocate_sectors;
>>> +}
>>> +EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
>>> +
>>
>> I'd suggest split this to a separated patch.
> 
> Yeah, this function is used in [3/3] only, so in may go after this patch [2/3] as a separate patch.
> 
>>>  /**
>>>   * blk_queue_max_segments - set max hw segments for a request for this queue
>>>   * @q:  the request queue for the device
>>> @@ -506,6 +521,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>>  					b->max_write_same_sectors);
>>>  	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>>>  					b->max_write_zeroes_sectors);
>>> +	t->max_allocate_sectors = min(t->max_allocate_sectors,
>>> +					b->max_allocate_sectors);
>>>  	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>>>  
>>>  	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
>>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>>> index 69bf2fb6f7cd..1ffef894b3bd 100644
>>> --- a/fs/block_dev.c
>>> +++ b/fs/block_dev.c
>>> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>>>  		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>>>  					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
>>>  		break;
>>> +	case FALLOC_FL_KEEP_SIZE:
>>> +		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>>> +			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
>>> +		break;
>>>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>>>  		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>>>  					     GFP_KERNEL, 0);
>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>> index 70254ae11769..9ed166860099 100644
>>> --- a/include/linux/blk_types.h
>>> +++ b/include/linux/blk_types.h
>>> @@ -335,7 +335,9 @@ enum req_flag_bits {
>>>  
>>>  	/* command specific flags for REQ_OP_WRITE_ZEROES: */
>>>  	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
>>> -
>>> +	__REQ_NOZERO,		/* only notify about allocated blocks,
>>> +				 * and do not actual zero them
>>> +				 */
>>>  	__REQ_HIPRI,
>>>  
>>>  	/* for driver use */
>>> @@ -362,6 +364,7 @@ enum req_flag_bits {
>>>  #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
>>>  
>>>  #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
>>> +#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
>>>  #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
>>>  
>>>  #define REQ_DRV			(1ULL << __REQ_DRV)
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index 4cd69552df9a..f4ec5db64432 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -336,6 +336,7 @@ struct queue_limits {
>>>  	unsigned int		max_hw_discard_sectors;
>>>  	unsigned int		max_write_same_sectors;
>>>  	unsigned int		max_write_zeroes_sectors;
>>> +	unsigned int		max_allocate_sectors;
>>>  	unsigned int		discard_granularity;
>>>  	unsigned int		discard_alignment;
>>>  
>>> @@ -988,9 +989,19 @@ static inline struct bio_vec req_bvec(struct request *rq)
>>>  	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
>>>  }
>>>  
>>> +static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
>>> +		struct request_queue *q, unsigned int op_flags)
>>> +{
>>> +	if (op_flags & REQ_NOZERO)
>>> +		return q->limits.max_allocate_sectors;
>>> +	return q->limits.max_write_zeroes_sectors;
>>> +}
>>> +
>>
>> And this one.
> 
> It looks it won't be good, since this will require to declare REQ_NOZERO
> in a separate patch. This will tear off the flag declaration from the logic.
> 
>> Also, should we consider other code path used q->limits.max_write_zeroes_sectors?
> 
> Other code paths should not dereference q->limits.max_allocate_sectors, unless
> it is directly set in not-zero value. In case of max_allocate_sectors is zero,
> high-level primitives (generic_make_request_checks(), __blkdev_issue_write_zeroes(), ..)
> complete such the bios immediately. Other drivers may need additional work
> to support this, and really only subset of drivers need support of this, so this is
> not a subject of this patchset.
> 
> Hm, it looks like there is an exception, which may inherit stack limits from children.
> Device-mapper will pick all the limits we enable for children.
> We may disable REQ_WRITE_ZEROES|REQ_NOZERO directly there, since it's not supported
> in this driver yet.
> 
> Are you hinting at this here?
> 

Oh, I mean other place references q->limits.max_write_zeroes_sectors.
Like in drivers/md/dm-io.c:
		special_cmd_max_sectors = q->limits.max_write_zeroes_sectors;

Now there should use blk_queue_get_max_write_zeroes_sectors() instead of use
q->limits.max_write_zeroes_sectors directly?

If yes, then I think codes related with function blk_queue_get_max_write_zeroes_sectors() should
also be split to separated patch.

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 0a2cc197f62b..b8aa5f6f9ce1 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -489,6 +489,7 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>  		       (unsigned long long) start << SECTOR_SHIFT);
>  
>  	limits->zoned = blk_queue_zoned_model(q);
> +	limits->max_allocate_sectors = 0;
>  
>  	return 0;
>  }
> @@ -1548,6 +1549,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
>  			       dm_device_name(table->md),
>  			       (unsigned long long) ti->begin,
>  			       (unsigned long long) ti->len);
> +		limits->max_allocate_sectors = 0;
>  
>  		/*
>  		 * FIXME: this should likely be moved to blk_stack_limits(), would
> 
> Thanks,
> Kirill
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-16 12:36 ` [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag Kirill Tkhai
  2020-01-19  1:50   ` Bob Liu
@ 2020-01-21  6:14   ` Martin K. Petersen
  2020-01-21  9:47     ` Kirill Tkhai
  2020-01-31  6:23     ` Christoph Hellwig
  1 sibling, 2 replies; 15+ messages in thread
From: Martin K. Petersen @ 2020-01-21  6:14 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-block, linux-kernel, martin.petersen, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence


Kirill,

> +	if (flags & BLKDEV_ZERO_NOUNMAP)
> +		req_flags |= REQ_NOUNMAP;
> +	if (flags & BLKDEV_ZERO_ALLOCATE)
> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;

I find there is some dissonance between using BLKDEV_ZERO_ALLOCATE to
describe this operation in one case and REQ_NOZERO in the other.

I understand why not zeroing is important in your case. However, I think
the allocation aspect is semantically more important. Also, in the case
of SCSI, the allocated blocks will typically appear zeroed. So from that
perspective REQ_NOZERO doesn't really make sense. I would really prefer
to use REQ_ALLOCATE to describe this operation. I agree that "do not
write every block" is important too. I just don't have a good suggestion
for how to express that as an additional qualifier to REQ_ALLOCATE_?.

Also, adding to the confusion: In the context of SCSI, ANCHOR requires
UNMAP. So my head hurts a bit when I read REQ_NOZERO|REQ_NOUNMAP and
have to translate that into ANCHOR|UNMAP.

Longer term, I think we should consider introducing REQ_OP_SINGLE_RANGE
or something like that as an umbrella operation that can be used to
describe zeroing, allocating, and other things that operate on a single
LBA range with no payload. Thus removing both the writiness and the
zeroness from the existing REQ_OP_WRITE_ZEROES conduit.

Naming issues aside, your patch looks fine. I'll try to rebase my SCSI
patches on top of your series to see how things fit.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-21  6:13       ` Bob Liu
@ 2020-01-21  9:45         ` Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-21  9:45 UTC (permalink / raw)
  To: Bob Liu, linux-block, linux-kernel, martin.petersen, axboe,
	tytso, adilger.kernel, Chaitanya.Kulkarni, darrick.wong,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence

On 21.01.2020 09:13, Bob Liu wrote:
> On 1/20/20 6:02 PM, Kirill Tkhai wrote:
>> On 19.01.2020 04:50, Bob Liu wrote:
>>> On 1/16/20 8:36 PM, Kirill Tkhai wrote:
>>>> This adds support for REQ_NOZERO extension of REQ_OP_WRITE_ZEROES
>>>> operation, which encourages a block device driver to just allocate
>>>> blocks (or mark them allocated) instead of actual blocks zeroing.
>>>> REQ_NOZERO is aimed to be used for network filesystems providing
>>>> a block device interface. Also, block devices, which map a file
>>>> on other filesystem (like loop), may use this for less fragmentation
>>>> and batching fallocate() requests. Hypervisors like QEMU may
>>>> introduce optimizations of clusters allocations based on this.
>>>>
>>>> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
>>>> blkdev_issue_zeroout().
>>>>> CC: Martin K. Petersen <martin.petersen@oracle.com>
>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>> ---
>>>>  block/blk-core.c          |    6 +++---
>>>>  block/blk-lib.c           |   17 ++++++++++-------
>>>>  block/blk-merge.c         |    9 ++++++---
>>>>  block/blk-settings.c      |   17 +++++++++++++++++
>>>>  fs/block_dev.c            |    4 ++++
>>>>  include/linux/blk_types.h |    5 ++++-
>>>>  include/linux/blkdev.h    |   31 ++++++++++++++++++++++++-------
>>>>  7 files changed, 68 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 50a5de025d5e..2edcd55624f1 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -978,7 +978,7 @@ generic_make_request_checks(struct bio *bio)
>>>>  			goto not_supported;
>>>>  		break;
>>>>  	case REQ_OP_WRITE_ZEROES:
>>>> -		if (!q->limits.max_write_zeroes_sectors)
>>>> +		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
>>>>  			goto not_supported;
>>>>  		break;
>>>>  	default:
>>>> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
>>>>  static int blk_cloned_rq_check_limits(struct request_queue *q,
>>>>  				      struct request *rq)
>>>>  {
>>>> -	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
>>>> +	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
>>>>  		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
>>>>  			__func__, blk_rq_sectors(rq),
>>>> -			blk_queue_get_max_sectors(q, req_op(rq)));
>>>> +			blk_queue_get_max_sectors(q, rq->cmd_flags));
>>>>  		return -EIO;
>>>>  	}
>>>>  
>>>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>>>> index 3e38c93cfc53..3e80279eb029 100644
>>>> --- a/block/blk-lib.c
>>>> +++ b/block/blk-lib.c
>>>> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>>>  		struct bio **biop, unsigned flags)
>>>>  {
>>>>  	struct bio *bio = *biop;
>>>> -	unsigned int max_write_zeroes_sectors;
>>>> +	unsigned int max_write_zeroes_sectors, req_flags = 0;
>>>>  	struct request_queue *q = bdev_get_queue(bdev);
>>>>  
>>>>  	if (!q)
>>>> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>>>>  		return -EPERM;
>>>>  
>>>>  	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
>>>> -	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>>>> +	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>>>>  
>>>>  	if (max_write_zeroes_sectors == 0)
>>>>  		return -EOPNOTSUPP;
>>>>  
>>>> +	if (flags & BLKDEV_ZERO_NOUNMAP)
>>>> +		req_flags |= REQ_NOUNMAP;
>>>> +	if (flags & BLKDEV_ZERO_ALLOCATE)
>>>> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
>>>> +
>>>>  	while (nr_sects) {
>>>>  		bio = blk_next_bio(bio, 0, gfp_mask);
>>>>  		bio->bi_iter.bi_sector = sector;
>>>>  		bio_set_dev(bio, bdev);
>>>> -		bio->bi_opf = REQ_OP_WRITE_ZEROES;
>>>> -		if (flags & BLKDEV_ZERO_NOUNMAP)
>>>> -			bio->bi_opf |= REQ_NOUNMAP;
>>>> +		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>>>>  
>>>>  		if (nr_sects > max_write_zeroes_sectors) {
>>>>  			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
>>>> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>>>  	sector_t bs_mask;
>>>>  	struct bio *bio;
>>>>  	struct blk_plug plug;
>>>> -	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>>>> +	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>>>>  
>>>>  	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>>>>  	if ((sector | nr_sects) & bs_mask)
>>>> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>>>  			try_write_zeroes = false;
>>>>  			goto retry;
>>>>  		}
>>>> -		if (!bdev_write_zeroes_sectors(bdev, 0)) {
>>>> +		if (!bdev_write_zeroes_sectors(bdev, flags)) {
>>>>  			/*
>>>>  			 * Zeroing offload support was indicated, but the
>>>>  			 * device reported ILLEGAL REQUEST (for some devices
>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>> index 347782a24a35..e3ce4b87bbaa 100644
>>>> --- a/block/blk-merge.c
>>>> +++ b/block/blk-merge.c
>>>> @@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
>>>>  static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
>>>>  		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
>>>>  {
>>>> +	unsigned int max_sectors;
>>>> +
>>>> +	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
>>>>  	*nsegs = 0;
>>>>  
>>>> -	if (!q->limits.max_write_zeroes_sectors)
>>>> +	if (!max_sectors)
>>>>  		return NULL;
>>>>  
>>>> -	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
>>>> +	if (bio_sectors(bio) <= max_sectors)
>>>>  		return NULL;
>>>>  
>>>> -	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
>>>> +	return bio_split(bio, max_sectors, GFP_NOIO, bs);
>>>>  }
>>>>  
>>>>  static struct bio *blk_bio_write_same_split(struct request_queue *q,
>>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>>> index 5f6dcc7a47bd..f682374c5106 100644
>>>> --- a/block/blk-settings.c
>>>> +++ b/block/blk-settings.c
>>>> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>>>  	lim->chunk_sectors = 0;
>>>>  	lim->max_write_same_sectors = 0;
>>>>  	lim->max_write_zeroes_sectors = 0;
>>>> +	lim->max_allocate_sectors = 0;
>>>>  	lim->max_discard_sectors = 0;
>>>>  	lim->max_hw_discard_sectors = 0;
>>>>  	lim->discard_granularity = 0;
>>>> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>>>>  	lim->max_dev_sectors = UINT_MAX;
>>>>  	lim->max_write_same_sectors = UINT_MAX;
>>>>  	lim->max_write_zeroes_sectors = UINT_MAX;
>>>> +	lim->max_allocate_sectors = UINT_MAX;
>>>>  }
>>>>  EXPORT_SYMBOL(blk_set_stacking_limits);
>>>>  
>>>> @@ -257,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>>>>  }
>>>>  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>>>>  
>>>> +/**
>>>> + * blk_queue_max_allocate_sectors - set max sectors for a single
>>>> + *                                  allocate request
>>>> + * @q:  the request queue for the device
>>>> + * @max_allocate_sectors: maximum number of sectors to write per command
>>>> + **/
>>>> +void blk_queue_max_allocate_sectors(struct request_queue *q,
>>>> +		unsigned int max_allocate_sectors)
>>>> +{
>>>> +	q->limits.max_allocate_sectors = max_allocate_sectors;
>>>> +}
>>>> +EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
>>>> +
>>>
>>> I'd suggest split this to a separated patch.
>>
>> Yeah, this function is used in [3/3] only, so in may go after this patch [2/3] as a separate patch.
>>
>>>>  /**
>>>>   * blk_queue_max_segments - set max hw segments for a request for this queue
>>>>   * @q:  the request queue for the device
>>>> @@ -506,6 +521,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>>>  					b->max_write_same_sectors);
>>>>  	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>>>>  					b->max_write_zeroes_sectors);
>>>> +	t->max_allocate_sectors = min(t->max_allocate_sectors,
>>>> +					b->max_allocate_sectors);
>>>>  	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>>>>  
>>>>  	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
>>>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>>>> index 69bf2fb6f7cd..1ffef894b3bd 100644
>>>> --- a/fs/block_dev.c
>>>> +++ b/fs/block_dev.c
>>>> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>>>>  		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>>>>  					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
>>>>  		break;
>>>> +	case FALLOC_FL_KEEP_SIZE:
>>>> +		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>>>> +			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
>>>> +		break;
>>>>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>>>>  		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>>>>  					     GFP_KERNEL, 0);
>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>> index 70254ae11769..9ed166860099 100644
>>>> --- a/include/linux/blk_types.h
>>>> +++ b/include/linux/blk_types.h
>>>> @@ -335,7 +335,9 @@ enum req_flag_bits {
>>>>  
>>>>  	/* command specific flags for REQ_OP_WRITE_ZEROES: */
>>>>  	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
>>>> -
>>>> +	__REQ_NOZERO,		/* only notify about allocated blocks,
>>>> +				 * and do not actual zero them
>>>> +				 */
>>>>  	__REQ_HIPRI,
>>>>  
>>>>  	/* for driver use */
>>>> @@ -362,6 +364,7 @@ enum req_flag_bits {
>>>>  #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
>>>>  
>>>>  #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
>>>> +#define REQ_NOZERO		(1ULL << __REQ_NOZERO)
>>>>  #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
>>>>  
>>>>  #define REQ_DRV			(1ULL << __REQ_DRV)
>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>> index 4cd69552df9a..f4ec5db64432 100644
>>>> --- a/include/linux/blkdev.h
>>>> +++ b/include/linux/blkdev.h
>>>> @@ -336,6 +336,7 @@ struct queue_limits {
>>>>  	unsigned int		max_hw_discard_sectors;
>>>>  	unsigned int		max_write_same_sectors;
>>>>  	unsigned int		max_write_zeroes_sectors;
>>>> +	unsigned int		max_allocate_sectors;
>>>>  	unsigned int		discard_granularity;
>>>>  	unsigned int		discard_alignment;
>>>>  
>>>> @@ -988,9 +989,19 @@ static inline struct bio_vec req_bvec(struct request *rq)
>>>>  	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
>>>>  }
>>>>  
>>>> +static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
>>>> +		struct request_queue *q, unsigned int op_flags)
>>>> +{
>>>> +	if (op_flags & REQ_NOZERO)
>>>> +		return q->limits.max_allocate_sectors;
>>>> +	return q->limits.max_write_zeroes_sectors;
>>>> +}
>>>> +
>>>
>>> And this one.
>>
>> It looks it won't be good, since this will require to declare REQ_NOZERO
>> in a separate patch. This will tear off the flag declaration from the logic.
>>
>>> Also, should we consider other code path used q->limits.max_write_zeroes_sectors?
>>
>> Other code paths should not dereference q->limits.max_allocate_sectors, unless
>> it is directly set in not-zero value. In case of max_allocate_sectors is zero,
>> high-level primitives (generic_make_request_checks(), __blkdev_issue_write_zeroes(), ..)
>> complete such the bios immediately. Other drivers may need additional work
>> to support this, and really only subset of drivers need support of this, so this is
>> not a subject of this patchset.
>>
>> Hm, it looks like there is an exception, which may inherit stack limits from children.
>> Device-mapper will pick all the limits we enable for children.
>> We may disable REQ_WRITE_ZEROES|REQ_NOZERO directly there, since it's not supported
>> in this driver yet.
>>
>> Are you hinting at this here?
>>
> 
> Oh, I mean other place references q->limits.max_write_zeroes_sectors.
> Like in drivers/md/dm-io.c:
> 		special_cmd_max_sectors = q->limits.max_write_zeroes_sectors;
> 
> Now there should use blk_queue_get_max_write_zeroes_sectors() instead of use
> q->limits.max_write_zeroes_sectors directly?

For dm I don't think so. This primitive is for code simplifying, while
there it won't solve any readability problems. It will be only byte swapping
before real introduction of allocation support for dm. We will use the primitive
for places, which support both plain "write zeroes" and "write zeroes allocates",
while it is not need for other places.

> If yes, then I think codes related with function blk_queue_get_max_write_zeroes_sectors() should
> also be split to separated patch.
> 
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index 0a2cc197f62b..b8aa5f6f9ce1 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -489,6 +489,7 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>>  		       (unsigned long long) start << SECTOR_SHIFT);
>>  
>>  	limits->zoned = blk_queue_zoned_model(q);
>> +	limits->max_allocate_sectors = 0;
>>  
>>  	return 0;
>>  }
>> @@ -1548,6 +1549,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
>>  			       dm_device_name(table->md),
>>  			       (unsigned long long) ti->begin,
>>  			       (unsigned long long) ti->len);
>> +		limits->max_allocate_sectors = 0;
>>  
>>  		/*
>>  		 * FIXME: this should likely be moved to blk_stack_limits(), would

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-21  6:14   ` Martin K. Petersen
@ 2020-01-21  9:47     ` Kirill Tkhai
  2020-01-31  6:23     ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-21  9:47 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-block, linux-kernel, axboe, tytso, adilger.kernel,
	Chaitanya.Kulkarni, darrick.wong, ming.lei, osandov, jthumshirn,
	minwoo.im.dev, damien.lemoal, andrea.parri, hare, tj, ajay.joshi,
	sagi, dsterba, bvanassche, dhowells, asml.silence

On 21.01.2020 09:14, Martin K. Petersen wrote:
> 
> Kirill,
> 
>> +	if (flags & BLKDEV_ZERO_NOUNMAP)
>> +		req_flags |= REQ_NOUNMAP;
>> +	if (flags & BLKDEV_ZERO_ALLOCATE)
>> +		req_flags |= REQ_NOZERO|REQ_NOUNMAP;
> 
> I find there is some dissonance between using BLKDEV_ZERO_ALLOCATE to
> describe this operation in one case and REQ_NOZERO in the other.
> 
> I understand why not zeroing is important in your case. However, I think
> the allocation aspect is semantically more important. Also, in the case
> of SCSI, the allocated blocks will typically appear zeroed. So from that
> perspective REQ_NOZERO doesn't really make sense. I would really prefer
> to use REQ_ALLOCATE to describe this operation. I agree that "do not
> write every block" is important too. I just don't have a good suggestion
> for how to express that as an additional qualifier to REQ_ALLOCATE_?.

No problem, I'll rename the modifier.

> Also, adding to the confusion: In the context of SCSI, ANCHOR requires
> UNMAP. So my head hurts a bit when I read REQ_NOZERO|REQ_NOUNMAP and
> have to translate that into ANCHOR|UNMAP.
> 
> Longer term, I think we should consider introducing REQ_OP_SINGLE_RANGE
> or something like that as an umbrella operation that can be used to
> describe zeroing, allocating, and other things that operate on a single
> LBA range with no payload. Thus removing both the writiness and the
> zeroness from the existing REQ_OP_WRITE_ZEROES conduit.
> 
> Naming issues aside, your patch looks fine. I'll try to rebase my SCSI
> patches on top of your series to see how things fit.

Ok, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-21  6:14   ` Martin K. Petersen
  2020-01-21  9:47     ` Kirill Tkhai
@ 2020-01-31  6:23     ` Christoph Hellwig
  2020-01-31  9:15       ` Kirill Tkhai
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2020-01-31  6:23 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Kirill Tkhai, linux-block, linux-kernel, axboe, tytso,
	adilger.kernel, Chaitanya.Kulkarni, darrick.wong, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence

On Tue, Jan 21, 2020 at 01:14:05AM -0500, Martin K. Petersen wrote:
> I find there is some dissonance between using BLKDEV_ZERO_ALLOCATE to
> describe this operation in one case and REQ_NOZERO in the other.
> 
> I understand why not zeroing is important in your case. However, I think
> the allocation aspect is semantically more important. Also, in the case
> of SCSI, the allocated blocks will typically appear zeroed. So from that
> perspective REQ_NOZERO doesn't really make sense. I would really prefer
> to use REQ_ALLOCATE to describe this operation. I agree that "do not
> write every block" is important too. I just don't have a good suggestion
> for how to express that as an additional qualifier to REQ_ALLOCATE_?.

Agreed.  Nevermind the problem of a REQ_OP_WRITE_ZEROES operations with
a NOZERO flag causing a massive confusion to the reader.

> Also, adding to the confusion: In the context of SCSI, ANCHOR requires
> UNMAP. So my head hurts a bit when I read REQ_NOZERO|REQ_NOUNMAP and
> have to translate that into ANCHOR|UNMAP.
> 
> Longer term, I think we should consider introducing REQ_OP_SINGLE_RANGE
> or something like that as an umbrella operation that can be used to
> describe zeroing, allocating, and other things that operate on a single
> LBA range with no payload. Thus removing both the writiness and the
> zeroness from the existing REQ_OP_WRITE_ZEROES conduit.

What is the benefit of a multipler there?  Given all this flags
confusion I'm almost tempted to just split up REQ_OP_WRITE_ZEROES into
REQ_OP_ALLOCATE ("cheap") and REQ_OP_WRITE_ZEROES ("potentially
expensive") and just let the caller handle the difference.  Everytime
we try to encode semantic differences into flags we're eventually
running into trouble.  Sais the person that added REQ_UNMAP..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag
  2020-01-31  6:23     ` Christoph Hellwig
@ 2020-01-31  9:15       ` Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2020-01-31  9:15 UTC (permalink / raw)
  To: Christoph Hellwig, Martin K. Petersen
  Cc: linux-block, linux-kernel, axboe, tytso, adilger.kernel,
	Chaitanya.Kulkarni, darrick.wong, ming.lei, osandov, jthumshirn,
	minwoo.im.dev, damien.lemoal, andrea.parri, hare, tj, ajay.joshi,
	sagi, dsterba, bvanassche, dhowells, asml.silence

Hi, Christoph,

On 31.01.2020 09:23, Christoph Hellwig wrote:
> On Tue, Jan 21, 2020 at 01:14:05AM -0500, Martin K. Petersen wrote:
>> I find there is some dissonance between using BLKDEV_ZERO_ALLOCATE to
>> describe this operation in one case and REQ_NOZERO in the other.
>>
>> I understand why not zeroing is important in your case. However, I think
>> the allocation aspect is semantically more important. Also, in the case
>> of SCSI, the allocated blocks will typically appear zeroed. So from that
>> perspective REQ_NOZERO doesn't really make sense. I would really prefer
>> to use REQ_ALLOCATE to describe this operation. I agree that "do not
>> write every block" is important too. I just don't have a good suggestion
>> for how to express that as an additional qualifier to REQ_ALLOCATE_?.
> 
> Agreed.  Nevermind the problem of a REQ_OP_WRITE_ZEROES operations with
> a NOZERO flag causing a massive confusion to the reader.
> 
>> Also, adding to the confusion: In the context of SCSI, ANCHOR requires
>> UNMAP. So my head hurts a bit when I read REQ_NOZERO|REQ_NOUNMAP and
>> have to translate that into ANCHOR|UNMAP.
>>
>> Longer term, I think we should consider introducing REQ_OP_SINGLE_RANGE
>> or something like that as an umbrella operation that can be used to
>> describe zeroing, allocating, and other things that operate on a single
>> LBA range with no payload. Thus removing both the writiness and the
>> zeroness from the existing REQ_OP_WRITE_ZEROES conduit.
> 
> What is the benefit of a multipler there?  Given all this flags
> confusion I'm almost tempted to just split up REQ_OP_WRITE_ZEROES into
> REQ_OP_ALLOCATE ("cheap") and REQ_OP_WRITE_ZEROES ("potentially
> expensive") and just let the caller handle the difference.  Everytime
> we try to encode semantic differences into flags we're eventually
> running into trouble.  Sais the person that added REQ_UNMAP..

We started from separated REQ_OP_ASSIGN_RANGE in v1, but then we decided
to use a modifier because this looks better and scatters less over
I/O stack. See "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
to reflect extents allocation in block device internals" series for the details.
(https://lkml.org/lkml/2020/1/7/1616 and neighbouring messages).

Last version of the patchset is v5 and it's here: https://lkml.org/lkml/2020/1/22/643

Kirill

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-01-31  9:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16 12:35 [PATCH block v2 0/3] block: Introduce REQ_NOZERO flag for REQ_OP_WRITE_ZEROES operation Kirill Tkhai
2020-01-16 12:35 ` [PATCH block v2 1/3] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
2020-01-19  1:46   ` Bob Liu
2020-01-20 11:11     ` Kirill Tkhai
2020-01-21  5:45   ` Martin K. Petersen
2020-01-16 12:36 ` [PATCH block v2 2/3] block: Add support for REQ_NOZERO flag Kirill Tkhai
2020-01-19  1:50   ` Bob Liu
2020-01-20 10:02     ` Kirill Tkhai
2020-01-21  6:13       ` Bob Liu
2020-01-21  9:45         ` Kirill Tkhai
2020-01-21  6:14   ` Martin K. Petersen
2020-01-21  9:47     ` Kirill Tkhai
2020-01-31  6:23     ` Christoph Hellwig
2020-01-31  9:15       ` Kirill Tkhai
2020-01-16 12:36 ` [PATCH block v2 3/3] loop: Add support for REQ_NOZERO Kirill Tkhai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).