Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
@ 2020-02-13  7:39 Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 1/6] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

(was "[PATCH block v2 0/3] block: Introduce REQ_NOZERO flag
      for REQ_OP_WRITE_ZEROES operation";
 was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
      to reflect extents allocation in block device internals")

v7: Two comments changed.

v6: req_op() cosmetic change.

v5: Kill dm|md patch, which disables REQ_ALLOCATE for these devices.
    Disable REQ_ALLOCATE for all stacking devices instead of this.

v4: Correct argument for mddev_check_write_zeroes().

v3: Rename REQ_NOZERO to REQ_ALLOCATE.
    Split helpers to separate patches.
    Add a patch, disabling max_allocate_sectors inheritance for dm.

v2: Introduce new flag for REQ_OP_WRITE_ZEROES instead of
    introduction a new operation as suggested by Martin K. Petersen.
    Removed ext4-related patch to focus on block changes
    for now.

Information about continuous extent placement may be useful
for some block devices. Say, distributed network filesystems,
which provide block device interface, may use this information
for better blocks placement over the nodes in their cluster,
and for better performance. Block devices, which map a file
on another filesystem (loop), may request the same length extent
on underlining filesystem for less fragmentation and for batching
allocation requests. Also, hypervisors like QEMU may use this
information for optimization of cluster allocations.

This patchset introduces REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES,
which makes a block device to allocate blocks instead of actual
blocks zeroing. This may be used for forwarding user's fallocate(0)
requests into block device internals. E.g., in loop driver this
will result in allocation extents in backing-file, so subsequent
write won't fail by the reason of no available space. Distributed
network filesystems will be able to assign specific servers for
specific extents, so subsequent write will be more efficient.

Patches [1-3/6] are preparation on helper functions, patch [4/6]
introduces REQ_ALLOCATE flag and implements all the logic,
patch [5/6] adds one more helper, patch [6/6] adds loop
as the first user of the flag.

Note, that here is only block-related patches, example of usage
for ext4 with a performance numbers may be seen in [1].

[1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0
---

Kirill Tkhai (6):
      block: Add @flags argument to bdev_write_zeroes_sectors()
      block: Pass op_flags into blk_queue_get_max_sectors()
      block: Introduce blk_queue_get_max_write_zeroes_sectors()
      block: Add support for REQ_ALLOCATE flag
      block: Add blk_queue_max_allocate_sectors()
      loop: Add support for REQ_ALLOCATE


 block/blk-core.c                    |    6 +++---
 block/blk-lib.c                     |   17 ++++++++++-------
 block/blk-merge.c                   |    9 ++++++---
 block/blk-settings.c                |   17 +++++++++++++++++
 drivers/block/loop.c                |   20 +++++++++++++-------
 drivers/md/dm-kcopyd.c              |    2 +-
 drivers/target/target_core_iblock.c |    4 ++--
 fs/block_dev.c                      |    4 ++++
 include/linux/blk_types.h           |    6 ++++++
 include/linux/blkdev.h              |   34 ++++++++++++++++++++++++++--------
 10 files changed, 88 insertions(+), 31 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 1/6] block: Add @flags argument to bdev_write_zeroes_sectors()
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 2/6] block: Pass op_flags into blk_queue_get_max_sectors() Kirill Tkhai
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

This is a preparation for next patch, which introduces
a new flag BLKDEV_ZERO_ALLOCATE for calls, which need
only allocation of blocks and don't need actual blocks
zeroing.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c                     |    6 +++---
 drivers/md/dm-kcopyd.c              |    2 +-
 drivers/target/target_core_iblock.c |    4 ++--
 include/linux/blkdev.h              |    3 ++-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..3e38c93cfc53 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -224,7 +224,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		return -EPERM;
 
 	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
 
 	if (max_write_zeroes_sectors == 0)
 		return -EOPNOTSUPP;
@@ -362,7 +362,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	sector_t bs_mask;
 	struct bio *bio;
 	struct blk_plug plug;
-	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
+	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
 
 	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
 	if ((sector | nr_sects) & bs_mask)
@@ -391,7 +391,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			try_write_zeroes = false;
 			goto retry;
 		}
-		if (!bdev_write_zeroes_sectors(bdev)) {
+		if (!bdev_write_zeroes_sectors(bdev, 0)) {
 			/*
 			 * Zeroing offload support was indicated, but the
 			 * device reported ILLEGAL REQUEST (for some devices
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 1bbe4a34ef4c..f1b8e7926dd4 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -831,7 +831,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		 */
 		job->rw = REQ_OP_WRITE_ZEROES;
 		for (i = 0; i < job->num_dests; i++)
-			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
+			if (!bdev_write_zeroes_sectors(job->dests[i].bdev, 0)) {
 				job->rw = WRITE;
 				break;
 			}
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 51ffd5c002de..73a63e197bf5 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -117,7 +117,7 @@ static int iblock_configure_device(struct se_device *dev)
 	 * Enable write same emulation for IBLOCK and use 0xFFFF as
 	 * the smaller WRITE_SAME(10) only has a two-byte block count.
 	 */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd, 0);
 	if (max_write_zeroes_sectors)
 		dev->dev_attrib.max_write_same_len = max_write_zeroes_sectors;
 	else
@@ -468,7 +468,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
 		return TCM_INVALID_CDB_FIELD;
 	}
 
-	if (bdev_write_zeroes_sectors(bdev)) {
+	if (bdev_write_zeroes_sectors(bdev, 0)) {
 		if (!iblock_execute_zero_out(bdev, cmd))
 			return 0;
 	}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 053ea4b51988..1e3c08854b09 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1417,7 +1417,8 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
 	return 0;
 }
 
-static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
+static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
+						     unsigned int flags)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 2/6] block: Pass op_flags into blk_queue_get_max_sectors()
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 1/6] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 3/6] block: Introduce blk_queue_get_max_write_zeroes_sectors() Kirill Tkhai
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

This preparation patch changes argument type, and now
the function takes full op_flags instead of just op code.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
---
 block/blk-core.c       |    4 ++--
 include/linux/blkdev.h |    8 +++++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 089e890ab208..28a6d46eb982 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1221,10 +1221,10 @@ EXPORT_SYMBOL(submit_bio);
 static int blk_cloned_rq_check_limits(struct request_queue *q,
 				      struct request *rq)
 {
-	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
+	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
 		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
 			__func__, blk_rq_sectors(rq),
-			blk_queue_get_max_sectors(q, req_op(rq)));
+			blk_queue_get_max_sectors(q, rq->cmd_flags));
 		return -EIO;
 	}
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1e3c08854b09..f4ec7ae214ab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -988,8 +988,10 @@ static inline struct bio_vec req_bvec(struct request *rq)
 }
 
 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
-						     int op)
+						     unsigned int op_flags)
 {
+	int op = op_flags & REQ_OP_MASK;
+
 	if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
 		return min(q->limits.max_discard_sectors,
 			   UINT_MAX >> SECTOR_SHIFT);
@@ -1028,10 +1030,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
 	if (!q->limits.chunk_sectors ||
 	    req_op(rq) == REQ_OP_DISCARD ||
 	    req_op(rq) == REQ_OP_SECURE_ERASE)
-		return blk_queue_get_max_sectors(q, req_op(rq));
+		return blk_queue_get_max_sectors(q, rq->cmd_flags);
 
 	return min(blk_max_size_offset(q, offset),
-			blk_queue_get_max_sectors(q, req_op(rq)));
+			blk_queue_get_max_sectors(q, rq->cmd_flags));
 }
 
 static inline unsigned int blk_rq_count_bios(struct request *rq)



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 3/6] block: Introduce blk_queue_get_max_write_zeroes_sectors()
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 1/6] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 2/6] block: Pass op_flags into blk_queue_get_max_sectors() Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 4/6] block: Add support for REQ_ALLOCATE flag Kirill Tkhai
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

This introduces a new primitive, which returns max sectors
for REQ_OP_WRITE_ZEROES operation.
@op_flags is unused now, and it will be enabled in next patch.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
---
 block/blk-core.c       |    2 +-
 block/blk-merge.c      |    9 ++++++---
 include/linux/blkdev.h |    8 +++++++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 28a6d46eb982..c7387b0d69e5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -949,7 +949,7 @@ generic_make_request_checks(struct bio *bio)
 			goto not_supported;
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		if (!q->limits.max_write_zeroes_sectors)
+		if (!blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf))
 			goto not_supported;
 		break;
 	default:
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1534ed736363..467b292bc6e8 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -105,15 +105,18 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
 static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
 		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
 {
+	unsigned int max_sectors;
+
+	max_sectors = blk_queue_get_max_write_zeroes_sectors(q, bio->bi_opf);
 	*nsegs = 0;
 
-	if (!q->limits.max_write_zeroes_sectors)
+	if (!max_sectors)
 		return NULL;
 
-	if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
+	if (bio_sectors(bio) <= max_sectors)
 		return NULL;
 
-	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
+	return bio_split(bio, max_sectors, GFP_NOIO, bs);
 }
 
 static struct bio *blk_bio_write_same_split(struct request_queue *q,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f4ec7ae214ab..55a714161684 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -987,6 +987,12 @@ static inline struct bio_vec req_bvec(struct request *rq)
 	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
 }
 
+static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
+		struct request_queue *q, unsigned int op_flags)
+{
+	return q->limits.max_write_zeroes_sectors;
+}
+
 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 						     unsigned int op_flags)
 {
@@ -1000,7 +1006,7 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 		return q->limits.max_write_same_sectors;
 
 	if (unlikely(op == REQ_OP_WRITE_ZEROES))
-		return q->limits.max_write_zeroes_sectors;
+		return blk_queue_get_max_write_zeroes_sectors(q, op_flags);
 
 	return q->limits.max_sectors;
 }



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 4/6] block: Add support for REQ_ALLOCATE flag
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
                   ` (2 preceding siblings ...)
  2020-02-13  7:39 ` [PATCH v7 3/6] block: Introduce blk_queue_get_max_write_zeroes_sectors() Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 5/6] block: Add blk_queue_max_allocate_sectors() Kirill Tkhai
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

This adds support for REQ_ALLOCATE extension of REQ_OP_WRITE_ZEROES
operation, which encourages a block device driver to just allocate
blocks (or mark them allocated) instead of actual blocks zeroing.
REQ_ALLOCATE is aimed to be used for network filesystems providing
a block device interface. Also, block devices, which map a file
on other filesystem (like loop), may use this for less fragmentation
and batching fallocate() requests. Hypervisors like QEMU may
introduce optimizations of clusters allocations based on this.

BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
blkdev_issue_zeroout().

Stacking devices start from zero max_allocate_sectors limit for now,
and the support is going to be implemented separate for each device
in the future.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
---
 block/blk-lib.c           |   17 ++++++++++-------
 block/blk-settings.c      |    4 ++++
 fs/block_dev.c            |    4 ++++
 include/linux/blk_types.h |    6 ++++++
 include/linux/blkdev.h    |   13 ++++++++++---
 5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3e38c93cfc53..9cd6f86523ba 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		struct bio **biop, unsigned flags)
 {
 	struct bio *bio = *biop;
-	unsigned int max_write_zeroes_sectors;
+	unsigned int max_write_zeroes_sectors, req_flags = 0;
 	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (!q)
@@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		return -EPERM;
 
 	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
-	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
+	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
 
 	if (max_write_zeroes_sectors == 0)
 		return -EOPNOTSUPP;
 
+	if (flags & BLKDEV_ZERO_NOUNMAP)
+		req_flags |= REQ_NOUNMAP;
+	if (flags & BLKDEV_ZERO_ALLOCATE)
+		req_flags |= REQ_ALLOCATE|REQ_NOUNMAP;
+
 	while (nr_sects) {
 		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
-		bio->bi_opf = REQ_OP_WRITE_ZEROES;
-		if (flags & BLKDEV_ZERO_NOUNMAP)
-			bio->bi_opf |= REQ_NOUNMAP;
+		bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
 
 		if (nr_sects > max_write_zeroes_sectors) {
 			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	sector_t bs_mask;
 	struct bio *bio;
 	struct blk_plug plug;
-	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
+	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
 
 	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
 	if ((sector | nr_sects) & bs_mask)
@@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			try_write_zeroes = false;
 			goto retry;
 		}
-		if (!bdev_write_zeroes_sectors(bdev, 0)) {
+		if (!bdev_write_zeroes_sectors(bdev, flags)) {
 			/*
 			 * Zeroing offload support was indicated, but the
 			 * device reported ILLEGAL REQUEST (for some devices
diff --git a/block/blk-settings.c b/block/blk-settings.c
index c8eda2e7b91e..8d5df9d37239 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->chunk_sectors = 0;
 	lim->max_write_same_sectors = 0;
 	lim->max_write_zeroes_sectors = 0;
+	lim->max_allocate_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
 	lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_same_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_allocate_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -506,6 +508,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					b->max_write_same_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_allocate_sectors = min(t->max_allocate_sectors,
+					b->max_allocate_sectors);
 	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 69bf2fb6f7cd..1ffef894b3bd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
 		break;
+	case FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+			GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
+		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
 		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
 					     GFP_KERNEL, 0);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..6f34f5d6edf1 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -335,6 +335,11 @@ enum req_flag_bits {
 
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
+	/*
+	 * Ensure the LBA range is backed by physical storage
+	 * without writing zeroes to the blocks.
+	 */
+	__REQ_ALLOCATE,
 
 	__REQ_HIPRI,
 
@@ -362,6 +367,7 @@ enum req_flag_bits {
 #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
+#define REQ_ALLOCATE		(1ULL << __REQ_ALLOCATE)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 55a714161684..40707f980a2e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -336,6 +336,7 @@ struct queue_limits {
 	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_write_same_sectors;
 	unsigned int		max_write_zeroes_sectors;
+	unsigned int		max_allocate_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -990,6 +991,8 @@ static inline struct bio_vec req_bvec(struct request *rq)
 static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
 		struct request_queue *q, unsigned int op_flags)
 {
+	if (op_flags & REQ_ALLOCATE)
+		return q->limits.max_allocate_sectors;
 	return q->limits.max_write_zeroes_sectors;
 }
 
@@ -1226,6 +1229,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
+#define BLKDEV_ZERO_ALLOCATE	(1 << 2)  /* allocate range of blocks */
 
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
@@ -1430,10 +1434,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 
-	if (q)
-		return q->limits.max_write_zeroes_sectors;
+	if (!q)
+		return 0;
 
-	return 0;
+	if (flags & BLKDEV_ZERO_ALLOCATE)
+		return q->limits.max_allocate_sectors;
+	else
+		return q->limits.max_write_zeroes_sectors;
 }
 
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 5/6] block: Add blk_queue_max_allocate_sectors()
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
                   ` (3 preceding siblings ...)
  2020-02-13  7:39 ` [PATCH v7 4/6] block: Add support for REQ_ALLOCATE flag Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13  7:39 ` [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE Kirill Tkhai
  2020-02-13  7:55 ` [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
  6 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

This is a new helper to assign max_allocate_sectors
limit of block device queue.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
---
 block/blk-settings.c   |   13 +++++++++++++
 include/linux/blkdev.h |    2 ++
 2 files changed, 15 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8d5df9d37239..24cf8fbbd125 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -259,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_allocate_sectors - set max sectors for a single
+ *                                  allocate request
+ * @q:  the request queue for the device
+ * @max_allocate_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_allocate_sectors(struct request_queue *q,
+		unsigned int max_allocate_sectors)
+{
+	q->limits.max_allocate_sectors = max_allocate_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
+
 /**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 40707f980a2e..f5edbfea7b84 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1088,6 +1088,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_allocate_sectors(struct request_queue *q,
+		unsigned int max_allocate_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
                   ` (4 preceding siblings ...)
  2020-02-13  7:39 ` [PATCH v7 5/6] block: Add blk_queue_max_allocate_sectors() Kirill Tkhai
@ 2020-02-13  7:39 ` Kirill Tkhai
  2020-02-13 18:11   ` Darrick J. Wong
  2020-02-13  7:55 ` [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
  6 siblings, 1 reply; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:39 UTC (permalink / raw)
  To: martin.petersen, bob.liu, axboe, darrick.wong
  Cc: agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	chaitanya.kulkarni, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel, ktkhai

Support for new modifier of REQ_OP_WRITE_ZEROES command.
This results in allocation extents in backing file instead
of actual blocks zeroing.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 drivers/block/loop.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..0704167a5aaa 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -581,6 +581,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	return 0;
 }
 
+/* Convert REQ_OP_WRITE_ZEROES modifiers into fallocate mode */
+static unsigned int write_zeroes_to_fallocate_mode(unsigned int flags)
+{
+	if (flags & REQ_ALLOCATE)
+		return 0;
+	if (flags & REQ_NOUNMAP)
+		return FALLOC_FL_ZERO_RANGE;
+	return FALLOC_FL_PUNCH_HOLE;
+}
+
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 {
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
@@ -599,14 +609,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	case REQ_OP_FLUSH:
 		return lo_req_flush(lo, rq);
 	case REQ_OP_WRITE_ZEROES:
-		/*
-		 * If the caller doesn't want deallocation, call zeroout to
-		 * write zeroes the range.  Otherwise, punch them out.
-		 */
 		return lo_fallocate(lo, rq, pos,
-			(rq->cmd_flags & REQ_NOUNMAP) ?
-				FALLOC_FL_ZERO_RANGE :
-				FALLOC_FL_PUNCH_HOLE);
+			write_zeroes_to_fallocate_mode(rq->cmd_flags));
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_WRITE:
@@ -877,6 +881,7 @@ static void loop_config_discard(struct loop_device *lo)
 		q->limits.discard_alignment = 0;
 		blk_queue_max_discard_sectors(q, 0);
 		blk_queue_max_write_zeroes_sectors(q, 0);
+		blk_queue_max_allocate_sectors(q, 0);
 		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
 		return;
 	}
@@ -886,6 +891,7 @@ static void loop_config_discard(struct loop_device *lo)
 
 	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
 	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
+	blk_queue_max_allocate_sectors(q, UINT_MAX >> 9);
 	blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
 }
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
                   ` (5 preceding siblings ...)
  2020-02-13  7:39 ` [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE Kirill Tkhai
@ 2020-02-13  7:55 ` Kirill Tkhai
  2020-03-06  9:11   ` Kirill Tkhai
  6 siblings, 1 reply; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13  7:55 UTC (permalink / raw)
  To: axboe
  Cc: martin.petersen, bob.liu, darrick.wong, agk, snitzer, dm-devel,
	song, tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence, linux-block, linux-kernel

Hi, Jens,

could you please provide some comments on this? I sent v1 two months ago,
and it would be great to know your vision of the functionality and
the approach and whether it is going to go to block tree.

Thanks,
Kirill

On 13.02.2020 10:39, Kirill Tkhai wrote:
> (was "[PATCH block v2 0/3] block: Introduce REQ_NOZERO flag
>       for REQ_OP_WRITE_ZEROES operation";
>  was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
>       to reflect extents allocation in block device internals")
> 
> v7: Two comments changed.
> 
> v6: req_op() cosmetic change.
> 
> v5: Kill dm|md patch, which disables REQ_ALLOCATE for these devices.
>     Disable REQ_ALLOCATE for all stacking devices instead of this.
> 
> v4: Correct argument for mddev_check_write_zeroes().
> 
> v3: Rename REQ_NOZERO to REQ_ALLOCATE.
>     Split helpers to separate patches.
>     Add a patch, disabling max_allocate_sectors inheritance for dm.
> 
> v2: Introduce new flag for REQ_OP_WRITE_ZEROES instead of
>     introduction a new operation as suggested by Martin K. Petersen.
>     Removed ext4-related patch to focus on block changes
>     for now.
> 
> Information about continuous extent placement may be useful
> for some block devices. Say, distributed network filesystems,
> which provide block device interface, may use this information
> for better blocks placement over the nodes in their cluster,
> and for better performance. Block devices, which map a file
> on another filesystem (loop), may request the same length extent
> on underlining filesystem for less fragmentation and for batching
> allocation requests. Also, hypervisors like QEMU may use this
> information for optimization of cluster allocations.
> 
> This patchset introduces REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES,
> which makes a block device to allocate blocks instead of actual
> blocks zeroing. This may be used for forwarding user's fallocate(0)
> requests into block device internals. E.g., in loop driver this
> will result in allocation extents in backing-file, so subsequent
> write won't fail by the reason of no available space. Distributed
> network filesystems will be able to assign specific servers for
> specific extents, so subsequent write will be more efficient.
> 
> Patches [1-3/6] are preparation on helper functions, patch [4/6]
> introduces REQ_ALLOCATE flag and implements all the logic,
> patch [5/6] adds one more helper, patch [6/6] adds loop
> as the first user of the flag.
> 
> Note, that here is only block-related patches, example of usage
> for ext4 with a performance numbers may be seen in [1].
> 
> [1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0
> ---
> 
> Kirill Tkhai (6):
>       block: Add @flags argument to bdev_write_zeroes_sectors()
>       block: Pass op_flags into blk_queue_get_max_sectors()
>       block: Introduce blk_queue_get_max_write_zeroes_sectors()
>       block: Add support for REQ_ALLOCATE flag
>       block: Add blk_queue_max_allocate_sectors()
>       loop: Add support for REQ_ALLOCATE
> 
> 
>  block/blk-core.c                    |    6 +++---
>  block/blk-lib.c                     |   17 ++++++++++-------
>  block/blk-merge.c                   |    9 ++++++---
>  block/blk-settings.c                |   17 +++++++++++++++++
>  drivers/block/loop.c                |   20 +++++++++++++-------
>  drivers/md/dm-kcopyd.c              |    2 +-
>  drivers/target/target_core_iblock.c |    4 ++--
>  fs/block_dev.c                      |    4 ++++
>  include/linux/blk_types.h           |    6 ++++++
>  include/linux/blkdev.h              |   34 ++++++++++++++++++++++++++--------
>  10 files changed, 88 insertions(+), 31 deletions(-)
> 
> --
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE
  2020-02-13  7:39 ` [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE Kirill Tkhai
@ 2020-02-13 18:11   ` Darrick J. Wong
  2020-02-13 20:07     ` Kirill Tkhai
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2020-02-13 18:11 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: martin.petersen, bob.liu, axboe, agk, snitzer, dm-devel, song,
	tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei, osandov,
	jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri, hare, tj,
	ajay.joshi, sagi, dsterba, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel

On Thu, Feb 13, 2020 at 10:39:35AM +0300, Kirill Tkhai wrote:
> Support for new modifier of REQ_OP_WRITE_ZEROES command.
> This results in allocation extents in backing file instead
> of actual blocks zeroing.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  drivers/block/loop.c |   20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 739b372a5112..0704167a5aaa 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -581,6 +581,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>  	return 0;
>  }
>  

Urgh, I meant "copy the comment directly to here", e.g.

/*
 * Convert REQ_OP_WRITE_ZEROES modifiers into fallocate mode
 *
 * If the caller requires allocation, select that mode.  If the caller
 * doesn't want deallocation, call zeroout to write zeroes the range.
 * Otherwise, punch out the blocks.
 */

The goal here is to reinforce the notion that FL_ZERO_RANGE is how we
achieve nounmap zeroing...

--D

> +static unsigned int write_zeroes_to_fallocate_mode(unsigned int flags)
> +{
> +	if (flags & REQ_ALLOCATE)
> +		return 0;
> +	if (flags & REQ_NOUNMAP)
> +		return FALLOC_FL_ZERO_RANGE;
> +	return FALLOC_FL_PUNCH_HOLE;
> +}
> +
>  static int do_req_filebacked(struct loop_device *lo, struct request *rq)
>  {
>  	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
> @@ -599,14 +609,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
>  	case REQ_OP_FLUSH:
>  		return lo_req_flush(lo, rq);
>  	case REQ_OP_WRITE_ZEROES:
> -		/*
> -		 * If the caller doesn't want deallocation, call zeroout to
> -		 * write zeroes the range.  Otherwise, punch them out.
> -		 */
>  		return lo_fallocate(lo, rq, pos,
> -			(rq->cmd_flags & REQ_NOUNMAP) ?
> -				FALLOC_FL_ZERO_RANGE :
> -				FALLOC_FL_PUNCH_HOLE);
> +			write_zeroes_to_fallocate_mode(rq->cmd_flags));
>  	case REQ_OP_DISCARD:
>  		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
>  	case REQ_OP_WRITE:
> @@ -877,6 +881,7 @@ static void loop_config_discard(struct loop_device *lo)
>  		q->limits.discard_alignment = 0;
>  		blk_queue_max_discard_sectors(q, 0);
>  		blk_queue_max_write_zeroes_sectors(q, 0);
> +		blk_queue_max_allocate_sectors(q, 0);
>  		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
>  		return;
>  	}
> @@ -886,6 +891,7 @@ static void loop_config_discard(struct loop_device *lo)
>  
>  	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
>  	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
> +	blk_queue_max_allocate_sectors(q, UINT_MAX >> 9);
>  	blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
>  }
>  
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE
  2020-02-13 18:11   ` Darrick J. Wong
@ 2020-02-13 20:07     ` Kirill Tkhai
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-02-13 20:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: martin.petersen, bob.liu, axboe, agk, snitzer, dm-devel, song,
	tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei, osandov,
	jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri, hare, tj,
	ajay.joshi, sagi, dsterba, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel

On 13.02.2020 21:11, Darrick J. Wong wrote:
> On Thu, Feb 13, 2020 at 10:39:35AM +0300, Kirill Tkhai wrote:
>> Support for new modifier of REQ_OP_WRITE_ZEROES command.
>> This results in allocation extents in backing file instead
>> of actual blocks zeroing.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Reviewed-by: Bob Liu <bob.liu@oracle.com>
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>> ---
>>  drivers/block/loop.c |   20 +++++++++++++-------
>>  1 file changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index 739b372a5112..0704167a5aaa 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -581,6 +581,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>>  	return 0;
>>  }
>>  
> 
> Urgh, I meant "copy the comment directly to here", e.g.
> 
> /*
>  * Convert REQ_OP_WRITE_ZEROES modifiers into fallocate mode
>  *
>  * If the caller requires allocation, select that mode.  If the caller
>  * doesn't want deallocation, call zeroout to write zeroes the range.
>  * Otherwise, punch out the blocks.
>  */
> 
> The goal here is to reinforce the notion that FL_ZERO_RANGE is how we
> achieve nounmap zeroing...

The function is pretty small, and its meaning is easily seen from the name
and description.

I don't think we have to retell every function code line in a separate
sentence, since this distract the attention from really difficult places,
which are really need it.

Kirill

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-02-13  7:55 ` [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
@ 2020-03-06  9:11   ` Kirill Tkhai
  2020-03-13 13:08     ` Kirill Tkhai
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill Tkhai @ 2020-03-06  9:11 UTC (permalink / raw)
  To: axboe
  Cc: martin.petersen, bob.liu, darrick.wong, agk, snitzer, dm-devel,
	song, tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence, linux-block, linux-kernel

ping

On 13.02.2020 10:55, Kirill Tkhai wrote:
> Hi, Jens,
> 
> could you please provide some comments on this? I sent v1 two months ago,
> and it would be great to know your vision of the functionality and
> the approach and whether it is going to go to block tree.
> 
> Thanks,
> Kirill
> 
> On 13.02.2020 10:39, Kirill Tkhai wrote:
>> (was "[PATCH block v2 0/3] block: Introduce REQ_NOZERO flag
>>       for REQ_OP_WRITE_ZEROES operation";
>>  was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
>>       to reflect extents allocation in block device internals")
>>
>> v7: Two comments changed.
>>
>> v6: req_op() cosmetic change.
>>
>> v5: Kill dm|md patch, which disables REQ_ALLOCATE for these devices.
>>     Disable REQ_ALLOCATE for all stacking devices instead of this.
>>
>> v4: Correct argument for mddev_check_write_zeroes().
>>
>> v3: Rename REQ_NOZERO to REQ_ALLOCATE.
>>     Split helpers to separate patches.
>>     Add a patch, disabling max_allocate_sectors inheritance for dm.
>>
>> v2: Introduce new flag for REQ_OP_WRITE_ZEROES instead of
>>     introduction a new operation as suggested by Martin K. Petersen.
>>     Removed ext4-related patch to focus on block changes
>>     for now.
>>
>> Information about continuous extent placement may be useful
>> for some block devices. Say, distributed network filesystems,
>> which provide block device interface, may use this information
>> for better blocks placement over the nodes in their cluster,
>> and for better performance. Block devices, which map a file
>> on another filesystem (loop), may request the same length extent
>> on underlining filesystem for less fragmentation and for batching
>> allocation requests. Also, hypervisors like QEMU may use this
>> information for optimization of cluster allocations.
>>
>> This patchset introduces REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES,
>> which makes a block device to allocate blocks instead of actual
>> blocks zeroing. This may be used for forwarding user's fallocate(0)
>> requests into block device internals. E.g., in loop driver this
>> will result in allocation extents in backing-file, so subsequent
>> write won't fail by the reason of no available space. Distributed
>> network filesystems will be able to assign specific servers for
>> specific extents, so subsequent write will be more efficient.
>>
>> Patches [1-3/6] are preparation on helper functions, patch [4/6]
>> introduces REQ_ALLOCATE flag and implements all the logic,
>> patch [5/6] adds one more helper, patch [6/6] adds loop
>> as the first user of the flag.
>>
>> Note, that here is only block-related patches, example of usage
>> for ext4 with a performance numbers may be seen in [1].
>>
>> [1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0
>> ---
>>
>> Kirill Tkhai (6):
>>       block: Add @flags argument to bdev_write_zeroes_sectors()
>>       block: Pass op_flags into blk_queue_get_max_sectors()
>>       block: Introduce blk_queue_get_max_write_zeroes_sectors()
>>       block: Add support for REQ_ALLOCATE flag
>>       block: Add blk_queue_max_allocate_sectors()
>>       loop: Add support for REQ_ALLOCATE
>>
>>
>>  block/blk-core.c                    |    6 +++---
>>  block/blk-lib.c                     |   17 ++++++++++-------
>>  block/blk-merge.c                   |    9 ++++++---
>>  block/blk-settings.c                |   17 +++++++++++++++++
>>  drivers/block/loop.c                |   20 +++++++++++++-------
>>  drivers/md/dm-kcopyd.c              |    2 +-
>>  drivers/target/target_core_iblock.c |    4 ++--
>>  fs/block_dev.c                      |    4 ++++
>>  include/linux/blk_types.h           |    6 ++++++
>>  include/linux/blkdev.h              |   34 ++++++++++++++++++++++++++--------
>>  10 files changed, 88 insertions(+), 31 deletions(-)
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Reviewed-by: Bob Liu <bob.liu@oracle.com>
>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-06  9:11   ` Kirill Tkhai
@ 2020-03-13 13:08     ` Kirill Tkhai
  2020-03-19 10:28       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill Tkhai @ 2020-03-13 13:08 UTC (permalink / raw)
  To: axboe
  Cc: martin.petersen, bob.liu, darrick.wong, agk, snitzer, dm-devel,
	song, tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei,
	osandov, jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri,
	hare, tj, ajay.joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence, linux-block, linux-kernel

I just don't understand the reason nothing happens :(
I see newly-sent patches comes fast into block tree.
But there is only silence... I grepped over Documentation,
and there is no special rules about block tree. So,
it looks like standard rules should be applyable.

Some comments? Some requests for reworking? Some personal reasons to ignore my patches?

On 06.03.2020 12:11, Kirill Tkhai wrote:
> ping
> 
> On 13.02.2020 10:55, Kirill Tkhai wrote:
>> Hi, Jens,
>>
>> could you please provide some comments on this? I sent v1 two months ago,
>> and it would be great to know your vision of the functionality and
>> the approach and whether it is going to go to block tree.
>>
>> Thanks,
>> Kirill
>>
>> On 13.02.2020 10:39, Kirill Tkhai wrote:
>>> (was "[PATCH block v2 0/3] block: Introduce REQ_NOZERO flag
>>>       for REQ_OP_WRITE_ZEROES operation";
>>>  was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
>>>       to reflect extents allocation in block device internals")
>>>
>>> v7: Two comments changed.
>>>
>>> v6: req_op() cosmetic change.
>>>
>>> v5: Kill dm|md patch, which disables REQ_ALLOCATE for these devices.
>>>     Disable REQ_ALLOCATE for all stacking devices instead of this.
>>>
>>> v4: Correct argument for mddev_check_write_zeroes().
>>>
>>> v3: Rename REQ_NOZERO to REQ_ALLOCATE.
>>>     Split helpers to separate patches.
>>>     Add a patch, disabling max_allocate_sectors inheritance for dm.
>>>
>>> v2: Introduce new flag for REQ_OP_WRITE_ZEROES instead of
>>>     introduction a new operation as suggested by Martin K. Petersen.
>>>     Removed ext4-related patch to focus on block changes
>>>     for now.
>>>
>>> Information about continuous extent placement may be useful
>>> for some block devices. Say, distributed network filesystems,
>>> which provide block device interface, may use this information
>>> for better blocks placement over the nodes in their cluster,
>>> and for better performance. Block devices, which map a file
>>> on another filesystem (loop), may request the same length extent
>>> on underlining filesystem for less fragmentation and for batching
>>> allocation requests. Also, hypervisors like QEMU may use this
>>> information for optimization of cluster allocations.
>>>
>>> This patchset introduces REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES,
>>> which makes a block device to allocate blocks instead of actual
>>> blocks zeroing. This may be used for forwarding user's fallocate(0)
>>> requests into block device internals. E.g., in loop driver this
>>> will result in allocation extents in backing-file, so subsequent
>>> write won't fail by the reason of no available space. Distributed
>>> network filesystems will be able to assign specific servers for
>>> specific extents, so subsequent write will be more efficient.
>>>
>>> Patches [1-3/6] are preparation on helper functions, patch [4/6]
>>> introduces REQ_ALLOCATE flag and implements all the logic,
>>> patch [5/6] adds one more helper, patch [6/6] adds loop
>>> as the first user of the flag.
>>>
>>> Note, that here is only block-related patches, example of usage
>>> for ext4 with a performance numbers may be seen in [1].
>>>
>>> [1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0
>>> ---
>>>
>>> Kirill Tkhai (6):
>>>       block: Add @flags argument to bdev_write_zeroes_sectors()
>>>       block: Pass op_flags into blk_queue_get_max_sectors()
>>>       block: Introduce blk_queue_get_max_write_zeroes_sectors()
>>>       block: Add support for REQ_ALLOCATE flag
>>>       block: Add blk_queue_max_allocate_sectors()
>>>       loop: Add support for REQ_ALLOCATE
>>>
>>>
>>>  block/blk-core.c                    |    6 +++---
>>>  block/blk-lib.c                     |   17 ++++++++++-------
>>>  block/blk-merge.c                   |    9 ++++++---
>>>  block/blk-settings.c                |   17 +++++++++++++++++
>>>  drivers/block/loop.c                |   20 +++++++++++++-------
>>>  drivers/md/dm-kcopyd.c              |    2 +-
>>>  drivers/target/target_core_iblock.c |    4 ++--
>>>  fs/block_dev.c                      |    4 ++++
>>>  include/linux/blk_types.h           |    6 ++++++
>>>  include/linux/blkdev.h              |   34 ++++++++++++++++++++++++++--------
>>>  10 files changed, 88 insertions(+), 31 deletions(-)
>>>
>>> --
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> Reviewed-by: Bob Liu <bob.liu@oracle.com>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-13 13:08     ` Kirill Tkhai
@ 2020-03-19 10:28       ` Christoph Hellwig
  2020-03-19 10:42         ` Kirill Tkhai
  2020-03-19 13:03         ` Martin K. Petersen
  0 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-03-19 10:28 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: axboe, martin.petersen, bob.liu, darrick.wong, agk, snitzer,
	dm-devel, song, tytso, adilger.kernel, Chaitanya.Kulkarni,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence, linux-block, linux-kernel

On Fri, Mar 13, 2020 at 04:08:58PM +0300, Kirill Tkhai wrote:
> I just don't understand the reason nothing happens :(
> I see newly-sent patches comes fast into block tree.
> But there is only silence... I grepped over Documentation,
> and there is no special rules about block tree. So,
> it looks like standard rules should be applyable.
> 
> Some comments? Some requests for reworking? Some personal reasons to ignore my patches?

I'm still completely opposed to the magic overloading using a flag.
That is just a bad design waiting for trouble to happen.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-19 10:28       ` Christoph Hellwig
@ 2020-03-19 10:42         ` Kirill Tkhai
  2020-03-19 13:03         ` Martin K. Petersen
  1 sibling, 0 replies; 22+ messages in thread
From: Kirill Tkhai @ 2020-03-19 10:42 UTC (permalink / raw)
  To: Christoph Hellwig, martin.petersen
  Cc: axboe, bob.liu, darrick.wong, agk, snitzer, dm-devel, song,
	tytso, adilger.kernel, Chaitanya.Kulkarni, ming.lei, osandov,
	jthumshirn, minwoo.im.dev, damien.lemoal, andrea.parri, hare, tj,
	ajay.joshi, sagi, dsterba, bvanassche, dhowells, asml.silence,
	linux-block, linux-kernel

On 19.03.2020 13:28, Christoph Hellwig wrote:
> On Fri, Mar 13, 2020 at 04:08:58PM +0300, Kirill Tkhai wrote:
>> I just don't understand the reason nothing happens :(
>> I see newly-sent patches comes fast into block tree.
>> But there is only silence... I grepped over Documentation,
>> and there is no special rules about block tree. So,
>> it looks like standard rules should be applyable.
>>
>> Some comments? Some requests for reworking? Some personal reasons to ignore my patches?
> 
> I'm still completely opposed to the magic overloading using a flag.
> That is just a bad design waiting for trouble to happen.

This flag is suggested by Martin Petersen, while the first version of the patchset was based
on a separate operation.

Since I see only Jens in MAINTAINERS:

BLOCK LAYER
M:      Jens Axboe <axboe@kernel.dk>
L:      linux-block@vger.kernel.org
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
S:      Maintained
F:      block/
F:      drivers/block/
F:      kernel/trace/blktrace.c
F:      lib/sbitmap.c

I expect his comments about final design of this, because both you and Martin are maintainers
of another subsystems. I don't want rework this many times until Jens says he wants some third
design.

I think I'm pretty polite and patient in my waiting, while Jens completely ignores me by some
reasons, which are completely unclear for me. I don't think this is completely polite in relation
to me.

Kirill

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-19 10:28       ` Christoph Hellwig
  2020-03-19 10:42         ` Kirill Tkhai
@ 2020-03-19 13:03         ` Martin K. Petersen
  2020-03-25 16:26           ` Darrick J. Wong
  1 sibling, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2020-03-19 13:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill Tkhai, axboe, martin.petersen, bob.liu, darrick.wong, agk,
	snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel


Christoph,

>> Some comments? Some requests for reworking? Some personal reasons to
>> ignore my patches?
>
> I'm still completely opposed to the magic overloading using a flag.
> That is just a bad design waiting for trouble to happen.

The observation was that Kirill's original patch set was a line-for-line
carbon copy of the write zeroes handling throughout the stack. The only
difference between the two was at the bottom. Instead of duplicating all
that code it seemed cleaner to use shared plumbing since these
operations need to be split and merged exactly the same way in the block
layer.

Also, we already have REQ_NOUNMAP, not sure why an additional handling
flag would lead to trouble? Note that I suggested renaming
REQ_OP_WRITE_ZEROES to something else to separate the semantics from the
plumbing.

We need to be able to express:

 - zero & allocate block range (REQ_OP_WRITE_ZEROES, REQ_NOUNMAP)
 - zero & deallocate block range (REQ_OP_WRITE_ZEROES, !REQ_NOUNMAP)
 - allocate block range (?, don't care about zeroing)
 - deallocate block range (REQ_OP_DISCARD, don't care about zeroing)

It just seems like a full-fledged REQ_OP_ALLOCATE is an overkill to fill
that gap.

That said, I do think that we have traditionally put emphasis on the
wrong part of these operations. All we ever talk about wrt. discard and
friends is the zeroing aspect. But I actually think that, semantically,
the act of allocating and deallocating blocks is more important. And
that zeroing is an optional second order effect of those operations. So
if we could go back in time and nuke multi-range DSM TRIM/UNMAP, I would
like to have REQ_OP_ALLOCATE/REQ_OP_DEALLOCATE with an optional REQ_ZERO
flag. I think that would be cleaner. I have a much easier time wrapping
my head around "allocate this block and zero it if you can" than "zero
this block and do not deallocate it". But maybe that's just me.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-19 13:03         ` Martin K. Petersen
@ 2020-03-25 16:26           ` Darrick J. Wong
  2020-03-25 16:32             ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2020-03-25 16:26 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Kirill Tkhai, axboe, bob.liu, agk, snitzer,
	dm-devel, song, tytso, adilger.kernel, Chaitanya.Kulkarni,
	ming.lei, osandov, jthumshirn, minwoo.im.dev, damien.lemoal,
	andrea.parri, hare, tj, ajay.joshi, sagi, dsterba, bvanassche,
	dhowells, asml.silence, linux-block, linux-kernel

On Thu, Mar 19, 2020 at 09:03:40AM -0400, Martin K. Petersen wrote:
> 
> Christoph,
> 
> >> Some comments? Some requests for reworking? Some personal reasons to
> >> ignore my patches?
> >
> > I'm still completely opposed to the magic overloading using a flag.
> > That is just a bad design waiting for trouble to happen.
> 
> The observation was that Kirill's original patch set was a line-for-line
> carbon copy of the write zeroes handling throughout the stack. The only
> difference between the two was at the bottom. Instead of duplicating all
> that code it seemed cleaner to use shared plumbing since these
> operations need to be split and merged exactly the same way in the block
> layer.
> 
> Also, we already have REQ_NOUNMAP, not sure why an additional handling
> flag would lead to trouble? Note that I suggested renaming
> REQ_OP_WRITE_ZEROES to something else to separate the semantics from the
> plumbing.
> 
> We need to be able to express:
> 
>  - zero & allocate block range (REQ_OP_WRITE_ZEROES, REQ_NOUNMAP)
>  - zero & deallocate block range (REQ_OP_WRITE_ZEROES, !REQ_NOUNMAP)
>  - allocate block range (?, don't care about zeroing)
>  - deallocate block range (REQ_OP_DISCARD, don't care about zeroing)
> 
> It just seems like a full-fledged REQ_OP_ALLOCATE is an overkill to fill
> that gap.
> 
> That said, I do think that we have traditionally put emphasis on the
> wrong part of these operations. All we ever talk about wrt. discard and
> friends is the zeroing aspect. But I actually think that, semantically,
> the act of allocating and deallocating blocks is more important. And
> that zeroing is an optional second order effect of those operations. So
> if we could go back in time and nuke multi-range DSM TRIM/UNMAP, I would
> like to have REQ_OP_ALLOCATE/REQ_OP_DEALLOCATE with an optional REQ_ZERO
> flag. I think that would be cleaner. I have a much easier time wrapping
> my head around "allocate this block and zero it if you can" than "zero
> this block and do not deallocate it". But maybe that's just me.

I'd love to transition to that.  My brain is not good at following all
the inverse logic that NOUNMAP spread everywhere.  I have a difficult
time following what the blockdev fallocate code does, which is sad since
hch and I are the primary stuckees^Wmeddlers^Wauthors of that function. :/

--D

> -- 
> Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-25 16:26           ` Darrick J. Wong
@ 2020-03-25 16:32             ` Christoph Hellwig
  2020-03-25 17:23               ` Martin K. Petersen
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-03-25 16:32 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Martin K. Petersen, Christoph Hellwig, Kirill Tkhai, axboe,
	bob.liu, agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel

On Wed, Mar 25, 2020 at 09:26:56AM -0700, Darrick J. Wong wrote:
> > That said, I do think that we have traditionally put emphasis on the
> > wrong part of these operations. All we ever talk about wrt. discard and
> > friends is the zeroing aspect. But I actually think that, semantically,
> > the act of allocating and deallocating blocks is more important. And
> > that zeroing is an optional second order effect of those operations. So
> > if we could go back in time and nuke multi-range DSM TRIM/UNMAP, I would
> > like to have REQ_OP_ALLOCATE/REQ_OP_DEALLOCATE with an optional REQ_ZERO
> > flag. I think that would be cleaner. I have a much easier time wrapping
> > my head around "allocate this block and zero it if you can" than "zero
> > this block and do not deallocate it". But maybe that's just me.
> 
> I'd love to transition to that.  My brain is not good at following all
> the inverse logic that NOUNMAP spread everywhere.  I have a difficult
> time following what the blockdev fallocate code does, which is sad since
> hch and I are the primary stuckees^Wmeddlers^Wauthors of that function. :/

I am very much against that for the following reason:

 - the current REQ_OP_DISCARD is purely a hint, and implementations can
   (and do) choose to ignore it
 - REQ_OP_WRITE_ZEROES is an actual data integrity operation with
   everything that entails

Going back to mixing these two will lead to a disaster sooner or later.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-25 16:32             ` Christoph Hellwig
@ 2020-03-25 17:23               ` Martin K. Petersen
  2020-03-26  9:29                 ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2020-03-25 17:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Martin K. Petersen, Kirill Tkhai, axboe,
	bob.liu, agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel


Christoph,

> I am very much against that for the following reason:
>
>  - the current REQ_OP_DISCARD is purely a hint, and implementations can
>    (and do) choose to ignore it
>
>  - REQ_OP_WRITE_ZEROES is an actual data integrity operation with
>    everything that entails

If you want to keep emphasis on the "integrity operation" instead of the
provisioning aspect, would you expect REQ_ALLOCATE (which may or may not
zero blocks) to be considered a deterministic operation or a
non-deterministic one? Should this depend on whether the device
guarantees zeroing when provisioning blocks or not?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-25 17:23               ` Martin K. Petersen
@ 2020-03-26  9:29                 ` Christoph Hellwig
  2020-03-26 14:34                   ` Martin K. Petersen
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-03-26  9:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Darrick J. Wong, Kirill Tkhai, axboe, bob.liu,
	agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel

On Wed, Mar 25, 2020 at 01:23:33PM -0400, Martin K. Petersen wrote:
> 
> Christoph,
> 
> > I am very much against that for the following reason:
> >
> >  - the current REQ_OP_DISCARD is purely a hint, and implementations can
> >    (and do) choose to ignore it
> >
> >  - REQ_OP_WRITE_ZEROES is an actual data integrity operation with
> >    everything that entails
> 
> If you want to keep emphasis on the "integrity operation" instead of the
> provisioning aspect, would you expect REQ_ALLOCATE (which may or may not
> zero blocks) to be considered a deterministic operation or a
> non-deterministic one? Should this depend on whether the device
> guarantees zeroing when provisioning blocks or not?

That's why I don't like the whole flags game very much.  I'd rather
have REQ_OP_WRITE_ZEROES as the integrity operation that gurantees
zeroing, and a REQ_ALLOCATE that doesn't guarantee zeroing, just some
deterministic state of the blocks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-26  9:29                 ` Christoph Hellwig
@ 2020-03-26 14:34                   ` Martin K. Petersen
  2020-03-26 14:45                     ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2020-03-26 14:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Darrick J. Wong, Kirill Tkhai, axboe,
	bob.liu, agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel


Christoph,

> That's why I don't like the whole flags game very much.  I'd rather
> have REQ_OP_WRITE_ZEROES as the integrity operation that gurantees
> zeroing, and a REQ_ALLOCATE that doesn't guarantee zeroing, just some
> deterministic state of the blocks.

I just worry about the proliferation of identical merging and splitting
code throughout the block stack as we add additional single-range, no
payload operations (Verify, etc.). I prefer to enforce the semantics in
the LLD and not in the plumbing. But I won't object to a separate
REQ_OP_ALLOCATE if you find the resulting code duplication acceptable.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-26 14:34                   ` Martin K. Petersen
@ 2020-03-26 14:45                     ` Christoph Hellwig
  2020-03-26 16:48                       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-03-26 14:45 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Darrick J. Wong, Kirill Tkhai, axboe, bob.liu,
	agk, snitzer, dm-devel, song, tytso, adilger.kernel,
	Chaitanya.Kulkarni, ming.lei, osandov, jthumshirn, minwoo.im.dev,
	damien.lemoal, andrea.parri, hare, tj, ajay.joshi, sagi, dsterba,
	bvanassche, dhowells, asml.silence, linux-block, linux-kernel

On Thu, Mar 26, 2020 at 10:34:42AM -0400, Martin K. Petersen wrote:
> I just worry about the proliferation of identical merging and splitting
> code throughout the block stack as we add additional single-range, no
> payload operations (Verify, etc.). I prefer to enforce the semantics in
> the LLD and not in the plumbing. But I won't object to a separate
> REQ_OP_ALLOCATE if you find the resulting code duplication acceptable.

I find it acceptable for now.  And I think we should find some way
(e.g. by being table driven) to share code between differnet opcodes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES
  2020-03-26 14:45                     ` Christoph Hellwig
@ 2020-03-26 16:48                       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 22+ messages in thread
From: Chaitanya Kulkarni @ 2020-03-26 16:48 UTC (permalink / raw)
  To: hch, Martin K. Petersen
  Cc: Darrick J. Wong, Kirill Tkhai, axboe, bob.liu, agk, snitzer,
	dm-devel, song, tytso, adilger.kernel, ming.lei, osandov,
	jthumshirn, minwoo.im.dev, Damien Le Moal, andrea.parri, hare,
	tj, Ajay Joshi, sagi, dsterba, bvanassche, dhowells,
	asml.silence, linux-block, linux-kernel

On 03/26/2020 07:46 AM, Christoph Hellwig wrote:
> On Thu, Mar 26, 2020 at 10:34:42AM -0400, Martin K. Petersen wrote:
>>> I just worry about the proliferation of identical merging and
>>> splitting code throughout the block stack as we add additional
>>> single-range, no payload operations (Verify, etc.). I prefer to
>>> enforce the semantics in the LLD and not in the plumbing. But I
>>> won't object to a separate REQ_OP_ALLOCATE if you find the
>>> resulting code duplication acceptable.
> I find it acceptable for now.  And I think we should find some way
> (e.g. by being table driven) to share code between differnet
> opcodes.
>

With reference to Martin's comment (verify etc) there is a significant
advantage when using payloadless bio to offload the functionality
to the directly attached device and over the fabrics when dealing
with larger disks.

How about we create a helper which is independent of the operations
can accept req_op and issues the payloadless bios. Something like
following totally untested :-

diff --git a/block/blk-lib.c b/block/blk-lib.c
index cf9e75a730b4..d3fccd3211cc 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -209,6 +209,33 @@ int blkdev_issue_write_same(struct block_device
*bdev, sector_t sector,
  }
  EXPORT_SYMBOL(blkdev_issue_write_same);

+static void __blkdev_issue_payloadless(struct block_device *bdev,
unsigned op,
+               sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+               struct bio **biop, unsigned bio_opf, unsigned int
max_sectors)
+{
+       struct bio *bio = *biop;
+
+       while (nr_sects) {
+               bio = blk_next_bio(bio, 0, gfp_mask);
+               bio->bi_iter.bi_sector = sector;
+               bio_set_dev(bio, bdev);
+               bio->bi_opf = op;
+               bio->bi_opf |= bio_opf;
+
+               if (nr_sects > max_sectors) {
+                       bio->bi_iter.bi_size = max_sectors << 9;
+                       nr_sects -= max_sectors;
+                       sector += max_sectors;
+               } else {
+                       bio->bi_iter.bi_size = nr_sects << 9;
+                       nr_sects = 0;
+               }
+               cond_resched();
+       }
+
+       *biop = bio;
+}
+
  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
                 sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
                 struct bio **biop, unsigned flags)
@@ -216,6 +243,7 @@ static int __blkdev_issue_write_zeroes(struct
block_device *bdev,
         struct bio *bio = *biop;
         unsigned int max_write_zeroes_sectors;
         struct request_queue *q = bdev_get_queue(bdev);
+       unsigned int unmap = (flags & BLKDEV_ZERO_NOUNMAP) ? REQ_NOUNMAP
: 0;

         if (!q)
                 return -ENXIO;
@@ -229,24 +257,8 @@ static int __blkdev_issue_write_zeroes(struct
block_device *bdev,
         if (max_write_zeroes_sectors == 0)
                 return -EOPNOTSUPP;

-       while (nr_sects) {
-               bio = blk_next_bio(bio, 0, gfp_mask);
-               bio->bi_iter.bi_sector = sector;
-               bio_set_dev(bio, bdev);
-               bio->bi_opf = REQ_OP_WRITE_ZEROES;
-               if (flags & BLKDEV_ZERO_NOUNMAP)
-                       bio->bi_opf |= REQ_NOUNMAP;
-
-               if (nr_sects > max_write_zeroes_sectors) {
-                       bio->bi_iter.bi_size = max_write_zeroes_sectors
<< 9;
-                       nr_sects -= max_write_zeroes_sectors;
-                       sector += max_write_zeroes_sectors;
-               } else {
-                       bio->bi_iter.bi_size = nr_sects << 9;
-                       nr_sects = 0;
-               }
-               cond_resched();
-       }
+       __blkdev_issue_payloadless(bdev, REQ_OP_WRITE_ZEROES, sector,
nr_sects,
+                       gfp_mask, biop, unmap, max_write_zeroes_sectors);

         *biop = bio;
         return 0;

I'll be happy to send out a well tested patch based on the above
suggestion or any feedback I get and re-spin this series or OP can
re-spin this series whatever works.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, back to index

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13  7:39 [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 1/6] block: Add @flags argument to bdev_write_zeroes_sectors() Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 2/6] block: Pass op_flags into blk_queue_get_max_sectors() Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 3/6] block: Introduce blk_queue_get_max_write_zeroes_sectors() Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 4/6] block: Add support for REQ_ALLOCATE flag Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 5/6] block: Add blk_queue_max_allocate_sectors() Kirill Tkhai
2020-02-13  7:39 ` [PATCH v7 6/6] loop: Add support for REQ_ALLOCATE Kirill Tkhai
2020-02-13 18:11   ` Darrick J. Wong
2020-02-13 20:07     ` Kirill Tkhai
2020-02-13  7:55 ` [PATCH v7 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES Kirill Tkhai
2020-03-06  9:11   ` Kirill Tkhai
2020-03-13 13:08     ` Kirill Tkhai
2020-03-19 10:28       ` Christoph Hellwig
2020-03-19 10:42         ` Kirill Tkhai
2020-03-19 13:03         ` Martin K. Petersen
2020-03-25 16:26           ` Darrick J. Wong
2020-03-25 16:32             ` Christoph Hellwig
2020-03-25 17:23               ` Martin K. Petersen
2020-03-26  9:29                 ` Christoph Hellwig
2020-03-26 14:34                   ` Martin K. Petersen
2020-03-26 14:45                     ` Christoph Hellwig
2020-03-26 16:48                       ` Chaitanya Kulkarni

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git