* [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage
       [not found] <20221229071647.437095-1-sarthakkukreti@chromium.org>
@ 2023-04-14  0:02 ` Sarthak Kukreti
  2023-04-14  0:02   ` [PATCH v3 1/3] block: Introduce provisioning primitives Sarthak Kukreti
                     ` (4 more replies)
  0 siblings, 5 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-14  0:02 UTC (permalink / raw)
  To: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Hi,

This patch series adds a mechanism to pass through provision requests on
stacked thinly provisioned block devices.

The Linux kernel provides several mechanisms to set up thinly provisioned
block storage abstractions (e.g. dm-thin, loop devices over sparse files),
either directly as block devices or as backing storage for filesystems.
Currently, short of writing data to either the device or the filesystem,
there is no way for users to pre-allocate space for use in such storage
setups. Consider the following use-cases:

1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that
   the underlying thinpool metadata is not modified during the suspend
   operation, the dm-thin device needs to be fully provisioned.
2) If a filesystem uses a loop device over a sparse file, fallocate() on the
   filesystem will allocate blocks for files but the underlying sparse file
   will remain intact.
3) Another example is a virtual machine using a sparse file/dm-thin as a storage
   device; by default, allocations within the VM boundaries will not affect
   the host.
4) Several storage standards support mechanisms for thin provisioning on
   real hardware devices. For example:
   a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning:
      "When the THINP bit in the NSFEAT field of the Identify Namespace data
       structure is set to ‘1’, the controller ... shall track the number of
       allocated blocks in the Namespace Utilization field"
   b. The SCSI Block Commands (SBC-4) reference references "Thin
      provisioned logical units",
   c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

In all of the above situations, the only way to pre-allocate space today is
to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does not
scale well with larger pre-allocation sizes.

This patchset introduces primitives to support block-level provisioning
requests across filesystems and block devices (the term 'provisioning' is
used to avoid overloading the terms 'allocation'/'pre-allocation').
This allows fallocate() and file creation requests to reserve space across
stacked layers of block devices and filesystems. Currently, the patchset
covers a prototype on the device-mapper targets, the loop device and ext4,
but the same mechanism can be extended to other filesystems/block devices,
as well as extended for use with the devices in 4 a-c.
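
For example, with this series applied, userspace can provision a range on a
thinly provisioned block device with a plain mode-0 fallocate(2) call. An
illustrative snippet (the device path is just an example):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/dm-0", O_RDWR);

		if (fd < 0)
			return 1;
		/* mode == 0: ask the device to provision the first 1GiB */
		if (fallocate(fd, 0, 0, 1ULL << 30) < 0)
			perror("fallocate");
		close(fd);
		return 0;
	}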

Patch 1 introduces REQ_OP_PROVISION as a new request type.
The provision request acts like the inverse of a discard request; instead
of notifying lower layers that the block range will no longer be used,
provision acts as a request to lower layers to provision disk space for
the given block range. Real hardware storage devices will currently disable
the provisioning capability, but for the standards listed in 4a.-c.,
REQ_OP_PROVISION can be overloaded for use as the provisioning primitive
for future devices.
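
For illustration, a driver opting in would advertise a non-zero limit during
queue setup; queues that leave max_provision_sectors at zero have
REQ_OP_PROVISION rejected in submit_bio_noacct(). A sketch (the "foo" driver
is hypothetical):

	#include <linux/blkdev.h>

	static void foo_configure_provision(struct request_queue *q)
	{
		/* accept provision bios of up to 1GiB (2^21 sectors) */
		blk_queue_max_provision_sectors(q, 1 << 21);
	}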

Patch 2 implements REQ_OP_PROVISION handling for some of the device-mapper
targets. Device-mapper targets will usually mirror the support of underlying
devices. This patch also enables the use of fallocate in mode == 0 for block
devices.
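
Targets opt in via two new dm_target fields, set in their constructors; a
sketch for a hypothetical "foo" target:

	#include <linux/device-mapper.h>

	static int foo_ctr(struct dm_target *ti, unsigned int argc, char **argv)
	{
		/* submit one REQ_OP_PROVISION bio per target */
		ti->num_provision_bios = 1;
		/* take provision bios even if underlying devices lack support */
		ti->provision_supported = true;
		return 0;
	}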

Patch 3 wires up the loop device handling of REQ_OP_PROVISION and calls
fallocate() with mode 0 on the underlying file/block device.

Testing:
--------
- Tested on a VM running a 6.2 kernel.

- Preallocation of dm-thin devices:
As expected, avoiding the need to zero out thinly-provisioned block devices to
preallocate space speeds up the provisioning operation significantly:

The following was tested on a dm-thin device set up on top of a dm-thin pool
with skip_block_zeroing=true.
A) Zeroout was measured using `fallocate -z ...`.
B) Provision was measured using `fallocate -p ...`.

Size	Time	A (s)	B (s)
512M	real	1.093	0.034
	user	0	0
	sys	0.022	0.01
1G	real	2.182	0.048
	user	0	0.01
	sys	0.022	0
2G	real	4.344	0.082
	user	0	0.01
	sys	0.036	0
4G	real	8.679	0.153
	user	0	0.01
	sys	0.073	0
8G	real	17.777	0.318
	user	0	0.01
	sys	0.144	0

Changelog:

V3:
- Drop FALLOC_FL_PROVISION and use mode == 0 for provision requests.
- Drop fs-specific patches; will be sent out in a follow up series.
- Fix missing shared block handling for thin snapshots.

V2:
- Fix stacked limit handling.
- Enable provision request handling in dm-snapshot.
- Don't call truncate_bdev_range if blkdev_fallocate() is called with
  FALLOC_FL_PROVISION.
- Clarify semantics of FALLOC_FL_PROVISION and why it needs to be a separate flag
  (as opposed to overloading mode == 0).

Sarthak Kukreti (3):
  block: Introduce provisioning primitives
  dm: Add support for block provisioning
  loop: Add support for provision requests

 block/blk-core.c              |   5 ++
 block/blk-lib.c               |  53 ++++++++++++++++
 block/blk-merge.c             |  18 ++++++
 block/blk-settings.c          |  19 ++++++
 block/blk-sysfs.c             |   8 +++
 block/bounce.c                |   1 +
 block/fops.c                  |  14 +++--
 drivers/block/loop.c          |  42 +++++++++++++
 drivers/md/dm-crypt.c         |   4 +-
 drivers/md/dm-linear.c        |   1 +
 drivers/md/dm-snap.c          |   7 +++
 drivers/md/dm-table.c         |  25 ++++++++
 drivers/md/dm-thin.c          | 110 +++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |   4 ++
 include/linux/bio.h           |   6 +-
 include/linux/blk_types.h     |   5 +-
 include/linux/blkdev.h        |  16 +++++
 include/linux/device-mapper.h |  11 ++++
 18 files changed, 333 insertions(+), 16 deletions(-)

-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v3 1/3] block: Introduce provisioning primitives
  2023-04-14  0:02 ` [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
@ 2023-04-14  0:02   ` Sarthak Kukreti
  2023-04-17 17:35     ` Brian Foster
  2023-04-14  0:02   ` [PATCH v3 2/3] dm: Add support for block provisioning Sarthak Kukreti
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-14  0:02 UTC (permalink / raw)
  To: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Introduce block request REQ_OP_PROVISION. The intent of this request
is to request that the underlying storage preallocate disk space for the given
block range. Block devices that support this capability will export
a provision limit within their request queues.

This patch also adds the capability to call fallocate() in mode 0
on block devices, which will send REQ_OP_PROVISION to the block
device for the specified range.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/blk-core.c          |  5 ++++
 block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c         | 18 +++++++++++++
 block/blk-settings.c      | 19 ++++++++++++++
 block/blk-sysfs.c         |  8 ++++++
 block/bounce.c            |  1 +
 block/fops.c              | 14 ++++++++---
 include/linux/bio.h       |  6 +++--
 include/linux/blk_types.h |  5 +++-
 include/linux/blkdev.h    | 16 ++++++++++++
 10 files changed, 138 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 42926e6cb83c..4a2342ba3a8b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
+	REQ_OP_NAME(PROVISION),
 };
 #undef REQ_OP_NAME
 
@@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_PROVISION:
+		if (!q->limits.max_provision_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..647b6451660b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * blkdev_issue_provision - provision a block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to provision
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Issues a provision request to the block device for the range of sectors.
+ *  For thinly provisioned block devices, this acts as a signal for the
+ *  underlying storage pool to allocate space for this block range.
+ */
+int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp)
+{
+	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret = 0;
+
+	if (max_sectors == 0)
+		return -EOPNOTSUPP;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	blk_start_plug(&plug);
+	for (;;) {
+		unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
+
+		bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
+
+		sector += req_sects;
+		nr_sects -= req_sects;
+		if (!nr_sects) {
+			ret = submit_bio_wait(bio);
+			if (ret == -EOPNOTSUPP)
+				ret = 0;
+			bio_put(bio);
+			break;
+		}
+		cond_resched();
+	}
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_provision);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6460abdb2426..a3ffebb97a1d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
 	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *bio_split_provision(struct bio *bio,
+					const struct queue_limits *lim,
+					unsigned int *nsegs, struct bio_set *bs)
+{
+	*nsegs = 0;
+
+	if (!lim->max_provision_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= lim->max_provision_sectors)
+		return NULL;
+
+	return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
+}
+
 /*
  * Return the maximum number of sectors from the start of a bio that may be
  * submitted as a single request to a block device. If enough sectors remain,
@@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
 	case REQ_OP_WRITE_ZEROES:
 		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
 		break;
+	case REQ_OP_PROVISION:
+		split = bio_split_provision(bio, lim, nr_segs, bs);
+		break;
 	default:
 		split = bio_split_rw(bio, lim, nr_segs, bs,
 				get_max_io_size(bio, lim) << SECTOR_SHIFT);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 896b4654ab00..d303e6614c36 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
+	lim->max_provision_sectors = 0;
 }
 
 /**
@@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_provision_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_provision_sectors - set max sectors for a single provision
+ *
+ * @q:  the request queue for the device
+ * @max_provision_sectors: maximum number of sectors to provision per command
+ **/
+
+void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors)
+{
+	q->limits.max_provision_sectors = max_provision_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_provision_sectors);
+
 /**
  * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
  * @q:  the request queue for the device
@@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
+						b->max_provision_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f1fce1c7fa44..202aa78f933e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -132,6 +132,12 @@ static ssize_t queue_max_discard_segments_show(struct request_queue *q,
 	return queue_var_show(queue_max_discard_segments(q), page);
 }
 
+static ssize_t queue_max_provision_sectors_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(queue_max_provision_sectors(q), page);
+}
+
 static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(q->limits.max_integrity_segments, page);
@@ -599,6 +605,7 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
 QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
 
 QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
+QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
 QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
 QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
 QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
@@ -648,6 +655,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_sectors_entry.attr,
 	&queue_max_segments_entry.attr,
 	&queue_max_discard_segments_entry.attr,
+	&queue_max_provision_sectors_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
 	&elv_iosched_entry.attr,
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ab9d8723ae64 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		break;
 	default:
 		bio_for_each_segment(bv, bio_src, iter)
diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..f82da2fb8af0 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -625,7 +625,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	int error;
 
 	/* Fail if we don't recognize the flags. */
-	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+	if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
 
 	/* Don't go off the end of the device. */
@@ -649,11 +649,17 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	filemap_invalidate_lock(inode->i_mapping);
 
 	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
+	if (mode != 0) {
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+	}
 
 	switch (mode) {
+	case 0:
+		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
+					       len >> SECTOR_SHIFT, GFP_KERNEL);
+		break;
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152e1..9820b3b039f2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
 	    bio->bi_iter.bi_size &&
 	    bio_op(bio) != REQ_OP_DISCARD &&
 	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
-	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
+	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+	    bio_op(bio) != REQ_OP_PROVISION)
 		return true;
 
 	return false;
@@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
-	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
+	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+	       bio_op(bio) == REQ_OP_PROVISION;
 }
 
 static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..27bdf88f541c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -385,7 +385,10 @@ enum req_op {
 	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
 	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
 
-	REQ_OP_LAST		= (__force blk_opf_t)36,
+	/* request device to provision block */
+	REQ_OP_PROVISION        = (__force blk_opf_t)37,
+
+	REQ_OP_LAST		= (__force blk_opf_t)38,
 };
 
 enum req_flag_bits {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f17492..239e2f418b6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -303,6 +303,7 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int		max_provision_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
@@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
 
+extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
@@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
 	return q->limits.max_discard_segments;
 }
 
+static inline unsigned int queue_max_provision_sectors(const struct request_queue *q)
+{
+	return q->limits.max_provision_sectors;
+}
+
 static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 {
 	return q->limits.max_segment_size;
@@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
 	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
 }
 
+static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
+{
+	return bdev_get_queue(bdev)->limits.max_provision_sectors;
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
 	return blk_queue_zoned_model(bdev_get_queue(bdev));
-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v3 2/3] dm: Add support for block provisioning
  2023-04-14  0:02 ` [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
  2023-04-14  0:02   ` [PATCH v3 1/3] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-04-14  0:02   ` Sarthak Kukreti
       [not found]     ` <CAJ0trDbyqoKEDN4kzcdn+vWhx+hk6pTM4ndf-E02f3uT9YZ3Uw@mail.gmail.com>
  2023-04-14 21:58     ` Mike Snitzer
  2023-04-14  0:02   ` [PATCH v3 3/3] loop: Add support for provision requests Sarthak Kukreti
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-14  0:02 UTC (permalink / raw)
  To: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add support to dm devices for REQ_OP_PROVISION. The default mode
is to pass the request through to the underlying device, if it
supports it. dm-thinpool uses the provision request to provision
blocks for a dm-thin device. dm-thinpool currently does not
pass through REQ_OP_PROVISION to underlying devices.

For shared blocks, provision requests will break sharing and copy the
contents of the entire block.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-crypt.c         |   4 +-
 drivers/md/dm-linear.c        |   1 +
 drivers/md/dm-snap.c          |   7 +++
 drivers/md/dm-table.c         |  25 ++++++++
 drivers/md/dm-thin.c          | 110 +++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |   4 ++
 include/linux/device-mapper.h |  11 ++++
 7 files changed, 153 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3ba53dc3cc3f..5c655bfd4686 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3087,6 +3087,8 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 	if (ret)
 		return ret;
 
+	ti->num_provision_bios = 1;
+
 	while (opt_params--) {
 		opt_string = dm_shift_arg(&as);
 		if (!opt_string) {
@@ -3390,7 +3392,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * - for REQ_OP_DISCARD caller must use flush if IO ordering matters
 	 */
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
-	    bio_op(bio) == REQ_OP_DISCARD)) {
+	    bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
 		bio_set_dev(bio, cc->dev->bdev);
 		if (bio_sectors(bio))
 			bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 3e622dcc9dbd..7843e548e850 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->num_provision_bios = 1;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index f766c21408f1..f6b224a12000 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1358,6 +1358,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (s->discard_zeroes_cow)
 		ti->num_discard_bios = (s->discard_passdown_origin ? 2 : 1);
 	ti->per_io_data_size = sizeof(struct dm_snap_tracked_chunk);
+	ti->num_provision_bios = 1;
 
 	/* Add snapshot to the list of snapshots for this origin */
 	/* Exceptions aren't triggered till snapshot_resume() is called */
@@ -2003,6 +2004,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	/* If the block is already remapped - use that, else remap it */
 	e = dm_lookup_exception(&s->complete, chunk);
 	if (e) {
+		if (unlikely(bio_op(bio) == REQ_OP_PROVISION)) {
+			bio_endio(bio);
+			r = DM_MAPIO_SUBMITTED;
+			goto out_unlock;
+		}
 		remap_exception(s, e, bio, chunk);
 		if (unlikely(bio_op(bio) == REQ_OP_DISCARD) &&
 		    io_overlaps_chunk(s, bio)) {
@@ -2413,6 +2419,7 @@ static void snapshot_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		/* All discards are split on chunk_size boundary */
 		limits->discard_granularity = snap->store->chunk_size;
 		limits->max_discard_sectors = snap->store->chunk_size;
+		limits->max_provision_sectors = snap->store->chunk_size;
 
 		up_read(&_origins_lock);
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 2055a758541d..5985343384a7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1850,6 +1850,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
 	return true;
 }
 
+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+				    sector_t start, sector_t len, void *data)
+{
+	return bdev_max_provision_sectors(dev->bdev) != 0;
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->provision_supported ||
+		    (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
 {
@@ -1983,6 +2003,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_zeroes(t))
 		q->limits.max_write_zeroes_sectors = 0;
 
+	if (dm_table_supports_provision(t))
+		blk_queue_max_provision_sectors(q, UINT_MAX >> 9);
+	else
+		q->limits.max_provision_sectors = 0;
+
 	dm_table_verify_integrity(t);
 
 	/*
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 13d4677baafd..b08b7ae617be 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -909,7 +909,8 @@ static void __inc_remap_and_issue_cell(void *context,
 	struct bio *bio;
 
 	while ((bio = bio_list_pop(&cell->bios))) {
-		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
+		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+		    bio_op(bio) == REQ_OP_PROVISION)
 			bio_list_add(&info->defer_bios, bio);
 		else {
 			inc_all_io_entry(info->tc->pool, bio);
@@ -1013,6 +1014,15 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
 		goto out;
 	}
 
+	/*
+	 * For provision requests, once the prepared block has been inserted
+	 * into the mapping btree, return.
+	 */
+	if (bio && bio_op(bio) == REQ_OP_PROVISION) {
+		bio_endio(bio);
+		return;
+	}
+
 	/*
 	 * Release any bios held while the block was being provisioned.
 	 * If we are processing a write bio that completely covers the block,
@@ -1241,7 +1251,7 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
 
 static int io_overwrites_block(struct pool *pool, struct bio *bio)
 {
-	return (bio_data_dir(bio) == WRITE) &&
+	return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
 		io_overlaps_block(pool, bio);
 }
 
@@ -1334,10 +1344,11 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	/*
 	 * IO to pool_dev remaps to the pool target's data_dev.
 	 *
-	 * If the whole block of data is being overwritten, we can issue the
-	 * bio immediately. Otherwise we use kcopyd to clone the data first.
+	 * If the whole block of data is being overwritten and if this is not a
+	 * provision request, we can issue the bio immediately.
+	 * Otherwise we use kcopyd to clone the data first.
 	 */
-	if (io_overwrites_block(pool, bio))
+	if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
 		remap_and_issue_overwrite(tc, bio, data_dest, m);
 	else {
 		struct dm_io_region from, to;
@@ -1356,7 +1367,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 		/*
 		 * Do we need to zero a tail region?
 		 */
-		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
+		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
+		    bio_op(bio) != REQ_OP_PROVISION) {
 			atomic_inc(&m->prepare_actions);
 			ll_zero(tc, m,
 				data_dest * pool->sectors_per_block + len,
@@ -1390,6 +1402,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	m->data_block = data_block;
 	m->cell = cell;
 
+	/* Provision requests are chained on the original bio. */
+	if (bio && bio_op(bio) == REQ_OP_PROVISION)
+		m->bio = bio;
+
 	/*
 	 * If the whole block of data is being overwritten or we are not
 	 * zeroing pre-existing data, we can issue the bio immediately.
@@ -1865,7 +1881,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
 
 	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
 		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
-		cell_defer_no_holder(tc, virt_cell);
+		if (bio_op(bio) != REQ_OP_PROVISION)
+			cell_defer_no_holder(tc, virt_cell);
 	} else {
 		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
 
@@ -1982,6 +1999,73 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 	}
 }
 
+static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	struct bio *bio = cell->holder;
+	dm_block_t begin, end;
+	struct dm_thin_lookup_result lookup_result;
+
+	if (tc->requeue_mode) {
+		cell_requeue(pool, cell);
+		return;
+	}
+
+	get_bio_block_range(tc, bio, &begin, &end);
+
+	while (begin != end) {
+		r = ensure_next_mapping(pool);
+		if (r)
+			/* we did our best */
+			return;
+
+		r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
+		switch (r) {
+		case 0:
+			if (lookup_result.shared)
+				process_shared_bio(tc, bio, begin,
+						   &lookup_result, cell);
+			begin++;
+			break;
+		case -ENODATA:
+			bio_inc_remaining(bio);
+			provision_block(tc, bio, begin, cell);
+			begin++;
+			break;
+		default:
+			DMERR_LIMIT(
+				"%s: dm_thin_find_block() failed: error = %d",
+				__func__, r);
+			cell_defer_no_holder(tc, cell);
+			bio_io_error(bio);
+			begin++;
+			break;
+		}
+	}
+	bio_endio(bio);
+	cell_defer_no_holder(tc, cell);
+}
+
+static void process_provision_bio(struct thin_c *tc, struct bio *bio)
+{
+	dm_block_t begin, end;
+	struct dm_cell_key virt_key;
+	struct dm_bio_prison_cell *virt_cell;
+
+	get_bio_block_range(tc, bio, &begin, &end);
+	if (begin == end) {
+		bio_endio(bio);
+		return;
+	}
+
+	build_key(tc->td, VIRTUAL, begin, end, &virt_key);
+	if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
+		return;
+
+	process_provision_cell(tc, virt_cell);
+}
+
 static void process_bio(struct thin_c *tc, struct bio *bio)
 {
 	struct pool *pool = tc->pool;
@@ -2202,6 +2286,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
 
 		if (bio_op(bio) == REQ_OP_DISCARD)
 			pool->process_discard(tc, bio);
+		else if (bio_op(bio) == REQ_OP_PROVISION)
+			process_provision_bio(tc, bio);
 		else
 			pool->process_bio(tc, bio);
 
@@ -2723,7 +2809,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+	    bio_op(bio) == REQ_OP_PROVISION) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -3370,6 +3457,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	pt->adjusted_pf = pt->requested_pf = pf;
 	ti->num_flush_bios = 1;
 	ti->limit_swap_bios = true;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 
 	/*
 	 * Only need to enable discards if the pool should pass
@@ -4068,6 +4157,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
 	}
 
+
 	/*
 	 * pt->adjusted_pf is a staging area for the actual features to use.
 	 * They get transferred to the live pool in bind_control_target()
@@ -4261,6 +4351,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->num_discard_bios = 1;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	mutex_unlock(&dm_thin_pool_table.mutex);
 
 	spin_lock_irq(&tc->pool->lock);
@@ -4475,6 +4568,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
+	limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */
 }
 
 static struct target_type thin_target = {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfde0088147a..d8f1803062b7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1593,6 +1593,7 @@ static bool is_abnormal_io(struct bio *bio)
 		case REQ_OP_DISCARD:
 		case REQ_OP_SECURE_ERASE:
 		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_PROVISION:
 			return true;
 		default:
 			break;
@@ -1617,6 +1618,9 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
 	case REQ_OP_WRITE_ZEROES:
 		num_bios = ti->num_write_zeroes_bios;
 		break;
+	case REQ_OP_PROVISION:
+		num_bios = ti->num_provision_bios;
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 7975483816e4..e9f687521ae6 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -334,6 +334,12 @@ struct dm_target {
 	 */
 	unsigned int num_write_zeroes_bios;
 
+	/*
+	 * The number of PROVISION bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned int num_provision_bios;
+
 	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
@@ -358,6 +364,11 @@ struct dm_target {
 	 */
 	bool discards_supported:1;
 
+	/* Set if this target needs to receive provision requests regardless of
+	 * whether or not its underlying devices have support.
+	 */
+	bool provision_supported:1;
+
 	/*
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */
-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v3 3/3] loop: Add support for provision requests
  2023-04-14  0:02 ` [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
  2023-04-14  0:02   ` [PATCH v3 1/3] block: Introduce provisioning primitives Sarthak Kukreti
  2023-04-14  0:02   ` [PATCH v3 2/3] dm: Add support for block provisioning Sarthak Kukreti
@ 2023-04-14  0:02   ` Sarthak Kukreti
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
  4 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-14  0:02 UTC (permalink / raw)
  To: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support
the provision request and, upon receiving a provision bio,
will map it to the backing device/storage. For loop devices
backed by files, a REQ_OP_PROVISION request will translate to
a mode-0 fallocate() call on the backing file.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2..13c4b4f8b9c1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 	return ret;
 }
 
+static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
+{
+	struct file *file = lo->lo_backing_file;
+	struct request_queue *q = lo->lo_queue;
+	int ret;
+
+	if (!q->limits.max_provision_sectors) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
+	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
+		ret = -EIO;
+ out:
+	return ret;
+}
+
 static int lo_req_flush(struct loop_device *lo, struct request *rq)
 {
 	int ret = vfs_fsync(lo->lo_backing_file, 0);
@@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 				FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+	case REQ_OP_PROVISION:
+		return lo_req_provision(lo, rq, pos);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static void loop_config_provision(struct loop_device *lo)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+
+	/*
+	 * If the backing device is a block device, mirror its provisioning
+	 * capability.
+	 */
+	if (S_ISBLK(inode->i_mode)) {
+		blk_queue_max_provision_sectors(lo->lo_queue,
+			bdev_max_provision_sectors(I_BDEV(inode)));
+	} else if (file->f_op->fallocate) {
+		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+	} else {
+		blk_queue_max_provision_sectors(lo->lo_queue, 0);
+	}
+}
+
 static void loop_config_discard(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	blk_queue_io_min(lo->lo_queue, bsize);
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	}
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 
 	/* update dio if lo_offset or transfer is changed */
 	__loop_update_dio(lo, lo->use_dio);
@@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		cmd->use_aio = false;
 		break;
 	default:
-- 
2.40.0.634.g4ca3ef3211-goog



* Re: [PATCH v3 2/3] dm: Add support for block provisioning
       [not found]     ` <CAJ0trDbyqoKEDN4kzcdn+vWhx+hk6pTM4ndf-E02f3uT9YZ3Uw@mail.gmail.com>
@ 2023-04-14 18:14       ` Mike Snitzer
  0 siblings, 0 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-04-14 18:14 UTC (permalink / raw)
  To: Sarthak Kukreti, Joe Thornber
  Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
	Michael S. Tsirkin, sarthakkukreti, Darrick J. Wong, Jason Wang,
	Bart Van Assche, linux-kernel, linux-block, dm-devel,
	Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, linux-fsdevel,
	linux-ext4, Brian Foster, Alasdair Kergon

On Fri, Apr 14 2023 at  9:32P -0400,
Joe Thornber <thornber@redhat.com> wrote:

> On Fri, Apr 14, 2023 at 7:52 AM Sarthak Kukreti <sarthakkukreti@chromium.org>
> wrote:
> 
> > Add support to dm devices for REQ_OP_PROVISION. The default mode
> > is to pass the request through to the underlying device, if it
> > supports it. dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block.
> >
> 
> I see two issues with this patch:
> 
> i) You use get_bio_block_range() to see which blocks the provision bio
> covers.  But this function only returns
> complete blocks that are covered (it was designed for discard).  Unlike
> discard, provision is not a hint so those
> partial blocks will need to be provisioned too.
> 
> ii) You are setting off multiple dm_thin_new_mapping operations in flight
> at once.  Each of these receives
> the same virt_cell and frees it when it completes.  So I think we have
> multiple frees occurring?  In addition, you also
> release the cell yourself in process_provision_cell().  Fixing this is not
> trivial: you'll need to reference count the cells
> and aggregate the mapping operation results.
> 
> I think it would be far easier to restrict the size of the provision bio to
> be no bigger than one thinp block (as we do for normal io).  This way dm
> core can split the bios, chain the child bios rather than having to
> reference count mapping ops, and aggregate the results.

I happened to be looking at implementing WRITE_ZEROES support for DM
thinp yesterday and reached the same conclusion relative to it (both
of Joe's points above; for me, "ii)" was: the dm-bio-prison-v1 range
locking we do for discards needs work for other types of IO).

We can work to make REQ_OP_PROVISION spanning multiple thinp blocks
possible as a follow-on optimization; but in the near-term DM thinp
needs REQ_OP_PROVISION to be split on a thinp block boundary.

This splitting can be assisted by block core in terms of a new
'provision_granularity' (which for thinp, it'd be set to the thinp
blocksize).  But I don't know that we need to go that far (I'm
thinking its fine to have DM do the splitting it needs and only
elevate related code to block core if/when needed in the future).

DM core can take on conditionally imposing its max_io_len() to handle
splitting REQ_OP_PROVISION as needed on a per-target basis. This DM
core commit I've staged for 6.4 makes this quite a simple change:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-6.4&id=13f6facf3faeed34ca381aef4c9b153c7aed3972

So please rebase on linux-dm.git's dm-6.4 branch, and for
REQ_OP_PROVISION dm.c:__process_abnormal_io() you'd add this:

	case REQ_OP_PROVISION:
                num_bios = ti->num_provision_bios;
                if (ti->max_provision_granularity)
                        max_granularity = limits->max_provision_sectors;
                break;

I'll reply again later today (to patch 2's actual code changes),
because I caught at least one other thing worth mentioning.

Thanks,
Mike


* Re: [PATCH v3 2/3] dm: Add support for block provisioning
  2023-04-14  0:02   ` [PATCH v3 2/3] dm: Add support for block provisioning Sarthak Kukreti
       [not found]     ` <CAJ0trDbyqoKEDN4kzcdn+vWhx+hk6pTM4ndf-E02f3uT9YZ3Uw@mail.gmail.com>
@ 2023-04-14 21:58     ` Mike Snitzer
  2023-04-18 22:13       ` Sarthak Kukreti
  1 sibling, 1 reply; 57+ messages in thread
From: Mike Snitzer @ 2023-04-14 21:58 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Stefan Hajnoczi, Alasdair Kergon, Christoph Hellwig,
	Brian Foster, Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Daniil Lunev, Darrick J. Wong

On Thu, Apr 13 2023 at  8:02P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Add support to dm devices for REQ_OP_PROVISION. The default mode
> is to pass the request through to the underlying device, if it
> supports it. dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
> 
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/md/dm-crypt.c         |   4 +-
>  drivers/md/dm-linear.c        |   1 +
>  drivers/md/dm-snap.c          |   7 +++

Have you tested REQ_OP_PROVISION with these targets?  Just want to
make sure you have an explicit need (and vested interest) for them
passing through REQ_OP_PROVISION.

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 2055a758541d..5985343384a7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1850,6 +1850,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
>  	return true;
>  }
>  
> +static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
> +				    sector_t start, sector_t len, void *data)
> +{
> +	return bdev_max_provision_sectors(dev->bdev) != 0;
> +}
> +
> +static bool dm_table_supports_provision(struct dm_table *t)
> +{
> +	for (unsigned int i = 0; i < t->num_targets; i++) {
> +		struct dm_target *ti = dm_table_get_target(t, i);
> +
> +		if (ti->provision_supported ||
> +		    (ti->type->iterate_devices &&
> +		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
>  				     sector_t start, sector_t len, void *data)
>  {
> @@ -1983,6 +2003,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
>  	if (!dm_table_supports_write_zeroes(t))
>  		q->limits.max_write_zeroes_sectors = 0;
>  
> +	if (dm_table_supports_provision(t))
> +		blk_queue_max_provision_sectors(q, UINT_MAX >> 9);

This doesn't seem correct in that it'll override whatever
max_provision_sectors was set by a target (like thinp).

I think you only need the if (!dm_table_supports_provision(t)) case:

> +	else
> +		q->limits.max_provision_sectors = 0;
> +
>  	dm_table_verify_integrity(t);
>  
>  	/*
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 13d4677baafd..b08b7ae617be 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c

I think it'll make the most sense to split out the dm-thin.c changes
in a separate patch.

> @@ -909,7 +909,8 @@ static void __inc_remap_and_issue_cell(void *context,
>  	struct bio *bio;
>  
>  	while ((bio = bio_list_pop(&cell->bios))) {
> -		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> +		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +		    bio_op(bio) == REQ_OP_PROVISION)
>  			bio_list_add(&info->defer_bios, bio);
>  		else {
>  			inc_all_io_entry(info->tc->pool, bio);
> @@ -1013,6 +1014,15 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
>  		goto out;
>  	}
>  
> +	/*
> +	 * For provision requests, once the prepared block has been inserted
> +	 * into the mapping btree, return.
> +	 */
> +	if (bio && bio_op(bio) == REQ_OP_PROVISION) {
> +		bio_endio(bio);
> +		return;
> +	}
> +
>  	/*
>  	 * Release any bios held while the block was being provisioned.
>  	 * If we are processing a write bio that completely covers the block,
> @@ -1241,7 +1251,7 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
>  
>  static int io_overwrites_block(struct pool *pool, struct bio *bio)
>  {
> -	return (bio_data_dir(bio) == WRITE) &&
> +	return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
>  		io_overlaps_block(pool, bio);
>  }
>  
> @@ -1334,10 +1344,11 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
>  	/*
>  	 * IO to pool_dev remaps to the pool target's data_dev.
>  	 *
> -	 * If the whole block of data is being overwritten, we can issue the
> -	 * bio immediately. Otherwise we use kcopyd to clone the data first.
> +	 * If the whole block of data is being overwritten and if this is not a
> +	 * provision request, we can issue the bio immediately.
> +	 * Otherwise we use kcopyd to clone the data first.
>  	 */
> -	if (io_overwrites_block(pool, bio))
> +	if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
>  		remap_and_issue_overwrite(tc, bio, data_dest, m);
>  	else {
>  		struct dm_io_region from, to;
> @@ -1356,7 +1367,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
>  		/*
>  		 * Do we need to zero a tail region?
>  		 */
> -		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
> +		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
> +		    bio_op(bio) != REQ_OP_PROVISION) {
>  			atomic_inc(&m->prepare_actions);
>  			ll_zero(tc, m,
>  				data_dest * pool->sectors_per_block + len,
> @@ -1390,6 +1402,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
>  	m->data_block = data_block;
>  	m->cell = cell;
>  
> +	/* Provision requests are chained on the original bio. */
> +	if (bio && bio_op(bio) == REQ_OP_PROVISION)
> +		m->bio = bio;
> +
>  	/*
>  	 * If the whole block of data is being overwritten or we are not
>  	 * zeroing pre-existing data, we can issue the bio immediately.
> @@ -1865,7 +1881,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
>  
>  	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
>  		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> -		cell_defer_no_holder(tc, virt_cell);
> +		if (bio_op(bio) != REQ_OP_PROVISION)
> +			cell_defer_no_holder(tc, virt_cell);
>  	} else {
>  		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
>  

Not confident in the above changes given the request that we only
handle REQ_OP_PROVISION one thinp block at a time.  So I'll gloss over
them for now.

> @@ -1982,6 +1999,73 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
>  	}
>  }
>  
> +static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> +{
> +	int r;
> +	struct pool *pool = tc->pool;
> +	struct bio *bio = cell->holder;
> +	dm_block_t begin, end;
> +	struct dm_thin_lookup_result lookup_result;
> +
> +	if (tc->requeue_mode) {
> +		cell_requeue(pool, cell);
> +		return;
> +	}
> +
> +	get_bio_block_range(tc, bio, &begin, &end);
> +
> +	while (begin != end) {
> +		r = ensure_next_mapping(pool);
> +		if (r)
> +			/* we did our best */
> +			return;
> +
> +		r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
> +		switch (r) {
> +		case 0:
> +			if (lookup_result.shared)
> +				process_shared_bio(tc, bio, begin,
> +						   &lookup_result, cell);
> +			begin++;
> +			break;
> +		case -ENODATA:
> +			bio_inc_remaining(bio);
> +			provision_block(tc, bio, begin, cell);
> +			begin++;
> +			break;
> +		default:
> +			DMERR_LIMIT(
> +				"%s: dm_thin_find_block() failed: error = %d",
> +				__func__, r);
> +			cell_defer_no_holder(tc, cell);
> +			bio_io_error(bio);
> +			begin++;
> +			break;
> +		}
> +	}
> +	bio_endio(bio);
> +	cell_defer_no_holder(tc, cell);
> +}
> +
> +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> +{
> +	dm_block_t begin, end;
> +	struct dm_cell_key virt_key;
> +	struct dm_bio_prison_cell *virt_cell;
> +
> +	get_bio_block_range(tc, bio, &begin, &end);
> +	if (begin == end) {
> +		bio_endio(bio);
> +		return;
> +	}

Like Joe mentioned, this pattern was fine for discards because they
are advisory/optional.  But we need to make sure we don't truncate
REQ_OP_PROVISION -- so we need to round up if we partially bleed into
the blocks to the left or right.

BUT ranged REQ_OP_PROVISION support is for later; this can be dealt
with more simply in that each REQ_OP_PROVISION will be handled a block
at a time initially.  So you'll want to honor _all_ REQ_OP_PROVISION,
never returning early.
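
For when ranged support is revisited: unlike the discard path, the block
range calculation will need to round outward rather than inward.  Untested
sketch, using dm-thin.c's existing helpers:

static void get_bio_block_range_roundup(struct thin_c *tc, struct bio *bio,
					dm_block_t *begin, dm_block_t *end)
{
	struct pool *pool = tc->pool;
	sector_t b = bio->bi_iter.bi_sector;
	sector_t e = b + (bio->bi_iter.bi_size >> SECTOR_SHIFT);

	if (block_size_is_power_of_two(pool)) {
		/* round begin down and end up so partial blocks are covered */
		*begin = b >> pool->sectors_per_block_shift;
		*end = (e + pool->sectors_per_block - 1) >>
		       pool->sectors_per_block_shift;
	} else {
		(void) sector_div(b, pool->sectors_per_block);
		*begin = b;
		*end = dm_sector_div_up(e, pool->sectors_per_block);
	}
}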

> +
> +	build_key(tc->td, VIRTUAL, begin, end, &virt_key);
> +	if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
> +		return;
> +
> +	process_provision_cell(tc, virt_cell);
> +}
> +
>  static void process_bio(struct thin_c *tc, struct bio *bio)
>  {
>  	struct pool *pool = tc->pool;
> @@ -2202,6 +2286,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
>  
>  		if (bio_op(bio) == REQ_OP_DISCARD)
>  			pool->process_discard(tc, bio);
> +		else if (bio_op(bio) == REQ_OP_PROVISION)
> +			process_provision_bio(tc, bio);

This should be pool->process_provision() (or ->process_provision_bio
if you like).  Point is, you need to be switching these methods
if/when the pool_mode transitions in set_pool_mode().

>  		else
>  			pool->process_bio(tc, bio);
>  
> @@ -2723,7 +2809,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
>  		return DM_MAPIO_SUBMITTED;
>  	}
>  
> -	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> +	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +	    bio_op(bio) == REQ_OP_PROVISION) {
>  		thin_defer_bio_with_throttle(tc, bio);
>  		return DM_MAPIO_SUBMITTED;
>  	}
> @@ -3370,6 +3457,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	pt->adjusted_pf = pt->requested_pf = pf;
>  	ti->num_flush_bios = 1;
>  	ti->limit_swap_bios = true;
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
>  
>  	/*
>  	 * Only need to enable discards if the pool should pass
> @@ -4068,6 +4157,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  		blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
>  	}
>  
> +

Please fix this extra whitespace damage.

>  	/*
>  	 * pt->adjusted_pf is a staging area for the actual features to use.
>  	 * They get transferred to the live pool in bind_control_target()
> @@ -4261,6 +4351,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		ti->num_discard_bios = 1;
>  	}
>  
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
> +
>  	mutex_unlock(&dm_thin_pool_table.mutex);
>  
>  	spin_lock_irq(&tc->pool->lock);
> @@ -4475,6 +4568,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  
>  	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
>  	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> +	limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */

Building on my previous reply, with suggested update to
dm.c:__process_abnormal_io(), once you rebase on dm-6.4's dm-thin.c
you'll want to instead:

limits->max_provision_sectors = pool->sectors_per_block;

And you'll want to drop any of the above code that deals with handling
bio-prison range locking and processing of REQ_OP_PROVISION for
multiple thinp blocks at once.

Simple REQ_OP_PROVISION processing, one thinp block at a time, comes
first; then we can worry about handling REQ_OP_PROVISION requests that
span blocks later.
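
Concretely, something like this (untested) in the thinp targets:

	/* pool_ctr()/thin_ctr(): */
	ti->num_provision_bios = 1;
	ti->provision_supported = true;
	ti->max_provision_granularity = true;

	/* thin_io_hints()/pool_io_hints(): */
	limits->max_provision_sectors = pool->sectors_per_block;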

>  static struct target_type thin_target = {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index dfde0088147a..d8f1803062b7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1593,6 +1593,7 @@ static bool is_abnormal_io(struct bio *bio)
>  		case REQ_OP_DISCARD:
>  		case REQ_OP_SECURE_ERASE:
>  		case REQ_OP_WRITE_ZEROES:
> +		case REQ_OP_PROVISION:
>  			return true;
>  		default:
>  			break;
> @@ -1617,6 +1618,9 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
>  	case REQ_OP_WRITE_ZEROES:
>  		num_bios = ti->num_write_zeroes_bios;
>  		break;
> +	case REQ_OP_PROVISION:
> +		num_bios = ti->num_provision_bios;
> +		break;
>  	default:
>  		break;
>  	}

Please be sure to include my suggested __process_abnormal_io change
from my previous reply.

> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 7975483816e4..e9f687521ae6 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -334,6 +334,12 @@ struct dm_target {
>  	 */
>  	unsigned int num_write_zeroes_bios;
>  
> +	/*
> +	 * The number of PROVISION bios that will be submitted to the target.
> +	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
> +	 */
> +	unsigned int num_provision_bios;
> +
>  	/*
>  	 * The minimum number of extra bytes allocated in each io for the
>  	 * target to use.
> @@ -358,6 +364,11 @@ struct dm_target {
>  	 */
>  	bool discards_supported:1;
>  
> +	/* Set if this target needs to receive provision requests regardless of
> +	 * whether or not its underlying devices have support.
> +	 */
> +	bool provision_supported:1;
> +
>  	/*
>  	 * Set if we need to limit the number of in-flight bios when swapping.
>  	 */

You'll need to add max_provision_granularity bool too (as implied by
the dm.c:__process_abnormal_io() change suggested in my first reply to
this patch).

I'm happy to wait for you to consume the v3 feedback we've provided so
you can create a v4.  I'm thinking I can base dm-thin.c's WRITE_ZEROES
support on top of your REQ_OP_PROVISION v4 changes -- they should be
complementary.

Thanks,
Mike


* Re: [PATCH v3 1/3] block: Introduce provisioning primitives
  2023-04-14  0:02   ` [PATCH v3 1/3] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-04-17 17:35     ` Brian Foster
  2023-04-18 22:13       ` Sarthak Kukreti
  0 siblings, 1 reply; 57+ messages in thread
From: Brian Foster @ 2023-04-17 17:35 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Stefan Hajnoczi, Alasdair Kergon, Mike Snitzer,
	Christoph Hellwig, Theodore Ts'o, Andreas Dilger,
	Bart Van Assche, Daniil Lunev, Darrick J. Wong

On Thu, Apr 13, 2023 at 05:02:17PM -0700, Sarthak Kukreti wrote:
> Introduce block request REQ_OP_PROVISION. The intent of this request
> is to request underlying storage to preallocate disk space for the given
> block range. Block devices that support this capability will export
> a provision limit within their request queues.
> 
> This patch also adds the capability to call fallocate() in mode 0
> on block devices, which will send REQ_OP_PROVISION to the block
> device for the specified range.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/blk-core.c          |  5 ++++
>  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
>  block/blk-merge.c         | 18 +++++++++++++
>  block/blk-settings.c      | 19 ++++++++++++++
>  block/blk-sysfs.c         |  8 ++++++
>  block/bounce.c            |  1 +
>  block/fops.c              | 14 ++++++++---
>  include/linux/bio.h       |  6 +++--
>  include/linux/blk_types.h |  5 +++-
>  include/linux/blkdev.h    | 16 ++++++++++++
>  10 files changed, 138 insertions(+), 7 deletions(-)
> 
...
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..f82da2fb8af0 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -625,7 +625,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	int error;
>  
>  	/* Fail if we don't recognize the flags. */
> -	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> +	if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
>  		return -EOPNOTSUPP;
>  
>  	/* Don't go off the end of the device. */
> @@ -649,11 +649,17 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	filemap_invalidate_lock(inode->i_mapping);
>  
>  	/* Invalidate the page cache, including dirty pages. */
> -	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -	if (error)
> -		goto fail;
> +	if (mode != 0) {
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (error)
> +			goto fail;
> +	}
>  
>  	switch (mode) {
> +	case 0:
> +		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> +					       len >> SECTOR_SHIFT, GFP_KERNEL);
> +		break;

I would think we'd want to support any combination of
FALLOC_FL_KEEP_SIZE and FALLOC_FL_UNSHARE_RANGE...? All of the other
commands support the former modifier, for one. It also looks like, if
somebody invokes this with mode == FALLOC_FL_KEEP_SIZE, even the
current upstream code would perform the bdev truncate before returning
-EOPNOTSUPP. That seems like an unfortunate side effect to me.

WRT unshare, if the PROVISION request is always going to imply an
unshare (which seems reasonable to me), there's probably no reason to
return -EOPNOTSUPP if a caller explicitly passes UNSHARE_RANGE.
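
E.g., with those semantics, something like the following (hypothetical
userspace sketch; device path made up) should provision the first 1MiB
of a thin device no matter which of the modifier flags is set:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/sdX", O_RDWR);

	if (fd < 0)
		return 1;
	/*
	 * mode 0, FALLOC_FL_KEEP_SIZE and FALLOC_FL_UNSHARE_RANGE
	 * (alone or combined) should all reach REQ_OP_PROVISION.
	 */
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 20))
		perror("fallocate");
	close(fd);
	return 0;
}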

Brian

>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
>  		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d766be7152e1..9820b3b039f2 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
>  	    bio->bi_iter.bi_size &&
>  	    bio_op(bio) != REQ_OP_DISCARD &&
>  	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
> -	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
> +	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> +	    bio_op(bio) != REQ_OP_PROVISION)
>  		return true;
>  
>  	return false;
> @@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
>  {
>  	return bio_op(bio) == REQ_OP_DISCARD ||
>  	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
> -	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
> +	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
> +	       bio_op(bio) == REQ_OP_PROVISION;
>  }
>  
>  static inline void *bio_data(struct bio *bio)
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 99be590f952f..27bdf88f541c 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -385,7 +385,10 @@ enum req_op {
>  	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
>  	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
>  
> -	REQ_OP_LAST		= (__force blk_opf_t)36,
> +	/* request device to provision block */
> +	REQ_OP_PROVISION        = (__force blk_opf_t)37,
> +
> +	REQ_OP_LAST		= (__force blk_opf_t)38,
>  };
>  
>  enum req_flag_bits {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 941304f17492..239e2f418b6e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -303,6 +303,7 @@ struct queue_limits {
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> +	unsigned int		max_provision_sectors;
>  
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
> @@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
>  		unsigned int max_discard_sectors);
>  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  		unsigned int max_write_same_sectors);
> +extern void blk_queue_max_provision_sectors(struct request_queue *q,
> +		unsigned int max_provision_sectors);
>  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
>  extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
>  		unsigned int max_zone_append_sectors);
> @@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp);
>  
> +extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp_mask);
> +
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
>  
> @@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
>  	return q->limits.max_discard_segments;
>  }
>  
> +static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
> +{
> +	return q->limits.max_provision_sectors;
> +}
> +
>  static inline unsigned int queue_max_segment_size(const struct request_queue *q)
>  {
>  	return q->limits.max_segment_size;
> @@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
>  	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
>  }
>  
> +static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
> +{
> +	return bdev_get_queue(bdev)->limits.max_provision_sectors;
> +}
> +
>  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>  {
>  	return blk_queue_zoned_model(bdev_get_queue(bdev));
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage
  2023-04-14  0:02 ` [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
                     ` (2 preceding siblings ...)
  2023-04-14  0:02   ` [PATCH v3 3/3] loop: Add support for provision requests Sarthak Kukreti
@ 2023-04-18 22:12   ` Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
                       ` (3 more replies)
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
  4 siblings, 4 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:12 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Hi,

This patch series is revision 4 of a mechanism to pass through
provision requests on stacked thinly provisioned storage devices. See
[1] for the original cover letter.

[1] https://lore.kernel.org/lkml/ZDnMl8A1B1+Tfn5S@redhat.com/T/#md4f20113c2242755747ae069f84be720a6751012

Changelog:

V4:
- Fix UNSHARE and KEEP_SIZE handling in blkdev_fallocate.
- Split dm-thin support into a separate patch.
- Remove ranged provision request handling and adjust io hints to
  handle provisioning one block at a time.
- Add missing provision_supported for dm targets.

V3:
- Drop FALLOC_FL_PROVISION and use mode == 0 for provision requests.
- Drop fs-specific patches; will be sent out in a follow-up series.
- Fix missing shared block handling for thin snapshots.

V2:
- Fix stacked limit handling.
- Enable provision request handling in dm-snapshot
- Don't call truncate_bdev_range if blkdev_fallocate() is called with
  FALLOC_FL_PROVISION.
- Clarify semantics of FALLOC_FL_PROVISION and why it needs to be a separate flag
  (as opposed to overloading mode == 0).


Sarthak Kukreti (4):
  block: Introduce provisioning primitives
  dm: Add block provisioning support
  dm-thin: Add REQ_OP_PROVISION support
  loop: Add support for provision requests

 block/blk-core.c              |  5 +++
 block/blk-lib.c               | 53 +++++++++++++++++++++++++
 block/blk-merge.c             | 18 +++++++++
 block/blk-settings.c          | 19 +++++++++
 block/blk-sysfs.c             |  8 ++++
 block/bounce.c                |  1 +
 block/fops.c                  | 25 +++++++++---
 drivers/block/loop.c          | 42 ++++++++++++++++++++
 drivers/md/dm-crypt.c         |  5 ++-
 drivers/md/dm-linear.c        |  2 +
 drivers/md/dm-snap.c          |  8 ++++
 drivers/md/dm-table.c         | 23 +++++++++++
 drivers/md/dm-thin.c          | 73 ++++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |  6 +++
 include/linux/bio.h           |  6 ++-
 include/linux/blk_types.h     |  5 ++-
 include/linux/blkdev.h        | 16 ++++++++
 include/linux/device-mapper.h | 17 ++++++++
 18 files changed, 317 insertions(+), 15 deletions(-)

-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
@ 2023-04-18 22:12     ` Sarthak Kukreti
  2023-04-18 22:43       ` Bart Van Assche
  2023-04-19 15:36       ` [dm-devel] " Darrick J. Wong
  2023-04-18 22:12     ` [PATCH v4 2/4] dm: Add block provisioning support Sarthak Kukreti
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:12 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Introduce block request REQ_OP_PROVISION. The intent of this request
is to request underlying storage to preallocate disk space for the given
block range. Block devices that support this capability will export
a provision limit within their request queues.

This patch also adds the capability to call fallocate() in mode 0
on block devices, which will send REQ_OP_PROVISION to the block
device for the specified range.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/blk-core.c          |  5 ++++
 block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c         | 18 +++++++++++++
 block/blk-settings.c      | 19 ++++++++++++++
 block/blk-sysfs.c         |  8 ++++++
 block/bounce.c            |  1 +
 block/fops.c              | 25 +++++++++++++-----
 include/linux/bio.h       |  6 +++--
 include/linux/blk_types.h |  5 +++-
 include/linux/blkdev.h    | 16 ++++++++++++
 10 files changed, 147 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 42926e6cb83c..4a2342ba3a8b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
+	REQ_OP_NAME(PROVISION),
 };
 #undef REQ_OP_NAME
 
@@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_PROVISION:
+		if (!q->limits.max_provision_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..647b6451660b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * blkdev_issue_provision - provision a block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to provision
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Issues a provision request to the block device for the range of sectors.
+ *  For thinly provisioned block devices, this acts as a signal for the
+ *  underlying storage pool to allocate space for this block range.
+ */
+int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp)
+{
+	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret = 0;
+
+	if (max_sectors == 0)
+		return -EOPNOTSUPP;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	blk_start_plug(&plug);
+	for (;;) {
+		unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
+
+		bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
+
+		sector += req_sects;
+		nr_sects -= req_sects;
+		if (!nr_sects) {
+			ret = submit_bio_wait(bio);
+			if (ret == -EOPNOTSUPP)
+				ret = 0;
+			bio_put(bio);
+			break;
+		}
+		cond_resched();
+	}
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_provision);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6460abdb2426..a3ffebb97a1d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
 	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *bio_split_provision(struct bio *bio,
+					const struct queue_limits *lim,
+					unsigned int *nsegs, struct bio_set *bs)
+{
+	*nsegs = 0;
+
+	if (!lim->max_provision_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= lim->max_provision_sectors)
+		return NULL;
+
+	return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
+}
+
 /*
  * Return the maximum number of sectors from the start of a bio that may be
  * submitted as a single request to a block device. If enough sectors remain,
@@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
 	case REQ_OP_WRITE_ZEROES:
 		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
 		break;
+	case REQ_OP_PROVISION:
+		split = bio_split_provision(bio, lim, nr_segs, bs);
+		break;
 	default:
 		split = bio_split_rw(bio, lim, nr_segs, bs,
 				get_max_io_size(bio, lim) << SECTOR_SHIFT);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 896b4654ab00..d303e6614c36 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
+	lim->max_provision_sectors = 0;
 }
 
 /**
@@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_provision_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_provision_sectors - set max sectors for a single provision
+ *
+ * @q:  the request queue for the device
+ * @max_provision_sectors: maximum number of sectors to provision per command
+ **/
+
+void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors)
+{
+	q->limits.max_provision_sectors = max_provision_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_provision_sectors);
+
 /**
  * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
  * @q:  the request queue for the device
@@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
+						b->max_provision_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f1fce1c7fa44..202aa78f933e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -132,6 +132,12 @@ static ssize_t queue_max_discard_segments_show(struct request_queue *q,
 	return queue_var_show(queue_max_discard_segments(q), page);
 }
 
+static ssize_t queue_max_provision_sectors_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(queue_max_provision_sectors(q), page);
+}
+
 static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(q->limits.max_integrity_segments, page);
@@ -599,6 +605,7 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
 QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
 
 QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
+QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
 QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
 QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
 QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
@@ -648,6 +655,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_sectors_entry.attr,
 	&queue_max_segments_entry.attr,
 	&queue_max_discard_segments_entry.attr,
+	&queue_max_provision_sectors_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
 	&elv_iosched_entry.attr,
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ab9d8723ae64 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		break;
 	default:
 		bio_for_each_segment(bv, bio_src, iter)
diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..e1775269654a 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return ret;
 }
 
+#define	BLKDEV_FALLOC_FL_TRUNCATE				\
+		(FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_ZERO_RANGE |	\
+		 FALLOC_FL_NO_HIDE_STALE)
+
 #define	BLKDEV_FALLOC_FL_SUPPORTED					\
-		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
-		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+		(BLKDEV_FALLOC_FL_TRUNCATE | FALLOC_FL_KEEP_SIZE |	\
+		 FALLOC_FL_UNSHARE_RANGE)
 
 static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 			     loff_t len)
@@ -625,7 +629,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	int error;
 
 	/* Fail if we don't recognize the flags. */
-	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+	if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
 
 	/* Don't go off the end of the device. */
@@ -649,11 +653,20 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	filemap_invalidate_lock(inode->i_mapping);
 
 	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
+	if (mode & BLKDEV_FALLOC_FL_TRUNCATE) {
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+	}
 
 	switch (mode) {
+	case 0:
+	case FALLOC_FL_UNSHARE_RANGE:
+	case FALLOC_FL_KEEP_SIZE:
+	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
+					       len >> SECTOR_SHIFT, GFP_KERNEL);
+		break;
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152e1..9820b3b039f2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
 	    bio->bi_iter.bi_size &&
 	    bio_op(bio) != REQ_OP_DISCARD &&
 	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
-	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
+	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+	    bio_op(bio) != REQ_OP_PROVISION)
 		return true;
 
 	return false;
@@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
-	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
+	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+	       bio_op(bio) == REQ_OP_PROVISION;
 }
 
 static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..27bdf88f541c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -385,7 +385,10 @@ enum req_op {
 	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
 	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
 
-	REQ_OP_LAST		= (__force blk_opf_t)36,
+	/* request device to provision block */
+	REQ_OP_PROVISION        = (__force blk_opf_t)37,
+
+	REQ_OP_LAST		= (__force blk_opf_t)38,
 };
 
 enum req_flag_bits {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f17492..239e2f418b6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -303,6 +303,7 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int		max_provision_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
@@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
 
+extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
@@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
 	return q->limits.max_discard_segments;
 }
 
+static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
+{
+	return q->limits.max_provision_sectors;
+}
+
 static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 {
 	return q->limits.max_segment_size;
@@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
 	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
 }
 
+static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
+{
+	return bdev_get_queue(bdev)->limits.max_provision_sectors;
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
 	return blk_queue_zoned_model(bdev_get_queue(bdev));
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v4 2/4] dm: Add block provisioning support
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-04-18 22:12     ` Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 3/4] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 4/4] loop: Add support for provision requests Sarthak Kukreti
  3 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:12 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add block provisioning support for device-mapper targets.
dm-crypt, dm-snap and dm-linear will, by default, passthrough
REQ_OP_PROVISION requests to the underlying device, if
supported.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-crypt.c         |  5 ++++-
 drivers/md/dm-linear.c        |  2 ++
 drivers/md/dm-snap.c          |  8 ++++++++
 drivers/md/dm-table.c         | 23 +++++++++++++++++++++++
 drivers/md/dm.c               |  6 ++++++
 include/linux/device-mapper.h | 17 +++++++++++++++++
 6 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..aa8072d6d7bf 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3336,6 +3336,9 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		cc->tag_pool_max_sectors <<= cc->sector_shift;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	ret = -ENOMEM;
 	cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname);
 	if (!cc->io_queue) {
@@ -3390,7 +3393,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * - for REQ_OP_DISCARD caller must use flush if IO ordering matters
 	 */
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
-	    bio_op(bio) == REQ_OP_DISCARD)) {
+	    bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
 		bio_set_dev(bio, cc->dev->bdev);
 		if (bio_sectors(bio))
 			bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index f4448d520ee9..66e50f5b0665 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,8 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 9c49f53760d0..07927bb1e711 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1358,6 +1358,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (s->discard_zeroes_cow)
 		ti->num_discard_bios = (s->discard_passdown_origin ? 2 : 1);
 	ti->per_io_data_size = sizeof(struct dm_snap_tracked_chunk);
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 
 	/* Add snapshot to the list of snapshots for this origin */
 	/* Exceptions aren't triggered till snapshot_resume() is called */
@@ -2003,6 +2005,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	/* If the block is already remapped - use that, else remap it */
 	e = dm_lookup_exception(&s->complete, chunk);
 	if (e) {
+		if (unlikely(bio_op(bio) == REQ_OP_PROVISION)) {
+			bio_endio(bio);
+			r = DM_MAPIO_SUBMITTED;
+			goto out_unlock;
+		}
 		remap_exception(s, e, bio, chunk);
 		if (unlikely(bio_op(bio) == REQ_OP_DISCARD) &&
 		    io_overlaps_chunk(s, bio)) {
@@ -2413,6 +2420,7 @@ static void snapshot_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		/* All discards are split on chunk_size boundary */
 		limits->discard_granularity = snap->store->chunk_size;
 		limits->max_discard_sectors = snap->store->chunk_size;
+		limits->max_provision_sectors = snap->store->chunk_size;
 
 		up_read(&_origins_lock);
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 119db5e01080..9301f050529f 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1854,6 +1854,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
 	return true;
 }
 
+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+				    sector_t start, sector_t len, void *data)
+{
+	return !bdev_max_provision_sectors(dev->bdev);
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->provision_supported ||
+		    (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
 {
@@ -1987,6 +2007,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_zeroes(t))
 		q->limits.max_write_zeroes_sectors = 0;
 
+	if (!dm_table_supports_provision(t))
+		q->limits.max_provision_sectors = 0;
+
 	dm_table_verify_integrity(t);
 
 	/*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106..9b94121b8d38 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1609,6 +1609,7 @@ static bool is_abnormal_io(struct bio *bio)
 		case REQ_OP_DISCARD:
 		case REQ_OP_SECURE_ERASE:
 		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_PROVISION:
 			return true;
 		default:
 			break;
@@ -1641,6 +1642,11 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
 		if (ti->max_write_zeroes_granularity)
 			max_granularity = limits->max_write_zeroes_sectors;
 		break;
+	case REQ_OP_PROVISION:
+		num_bios = ti->num_provision_bios;
+		if (ti->max_provision_granularity)
+			max_granularity = limits->max_provision_sectors;
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a52d2b9a6846..9981378457d2 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -334,6 +334,12 @@ struct dm_target {
 	 */
 	unsigned int num_write_zeroes_bios;
 
+	/*
+	 * The number of PROVISION bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned int num_provision_bios;
+
 	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
@@ -358,6 +364,11 @@ struct dm_target {
 	 */
 	bool discards_supported:1;
 
+	/* Set if this target needs to receive provision requests regardless of
+	 * whether or not its underlying devices have support.
+	 */
+	bool provision_supported:1;
+
 	/*
 	 * Set if this target requires that discards be split on
 	 * 'max_discard_sectors' boundaries.
@@ -376,6 +387,12 @@ struct dm_target {
 	 */
 	bool max_write_zeroes_granularity:1;
 
+	/*
+	 * Set if this target requires that provisions be split on
+	 * 'max_provision_sectors' boundaries.
+	 */
+	bool max_provision_granularity:1;
+
 	/*
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v4 3/4] dm-thin: Add REQ_OP_PROVISION support
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 2/4] dm: Add block provisioning support Sarthak Kukreti
@ 2023-04-18 22:12     ` Sarthak Kukreti
  2023-04-18 22:12     ` [PATCH v4 4/4] loop: Add support for provision requests Sarthak Kukreti
  3 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:12 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

dm-thinpool uses the provision request to provision
blocks for a dm-thin device. dm-thinpool currently does not
pass through REQ_OP_PROVISION to underlying devices.

For shared blocks, provision requests will break sharing and copy the
contents of the entire block. Additionally, if 'skip_block_zeroing'
is not set, dm-thin will opt to zero out the entire range as a part
of provisioning.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-thin.c | 73 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 2b13c949bd72..58d633f5c928 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -274,6 +274,7 @@ struct pool {
 
 	process_bio_fn process_bio;
 	process_bio_fn process_discard;
+	process_bio_fn process_provision;
 
 	process_cell_fn process_cell;
 	process_cell_fn process_discard_cell;
@@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context,
 	struct bio *bio;
 
 	while ((bio = bio_list_pop(&cell->bios))) {
-		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
+		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+		    bio_op(bio) == REQ_OP_PROVISION)
 			bio_list_add(&info->defer_bios, bio);
 		else {
 			inc_all_io_entry(info->tc->pool, bio);
@@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
 
 static int io_overwrites_block(struct pool *pool, struct bio *bio)
 {
-	return (bio_data_dir(bio) == WRITE) &&
-		io_overlaps_block(pool, bio);
+	return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) &&
+	       bio_op(bio) != REQ_OP_PROVISION;
 }
 
 static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
@@ -1891,7 +1893,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
 
 	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
 		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
-		cell_defer_no_holder(tc, virt_cell);
+		if (bio_op(bio) != REQ_OP_PROVISION)
+			cell_defer_no_holder(tc, virt_cell);
 	} else {
 		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
 
@@ -1953,6 +1956,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	}
 }
 
+static void process_provision_bio(struct thin_c *tc, struct bio *bio)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	dm_block_t block = get_bio_block(tc, bio);
+	struct dm_bio_prison_cell *cell;
+	struct dm_cell_key key;
+	struct dm_thin_lookup_result lookup_result;
+
+	/*
+	 * If cell is already occupied, then the block is already
+	 * being provisioned so we have nothing further to do here.
+	 */
+	build_virtual_key(tc->td, block, &key);
+	if (bio_detain(pool, &key, bio, &cell))
+		return;
+
+	if (tc->requeue_mode) {
+		cell_requeue(pool, cell);
+		return;
+	}
+
+	r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
+	switch (r) {
+	case 0:
+		if (lookup_result.shared) {
+			process_shared_bio(tc, bio, block, &lookup_result, cell);
+		} else {
+			bio_endio(bio);
+			cell_defer_no_holder(tc, cell);
+		}
+		break;
+	case -ENODATA:
+		provision_block(tc, bio, block, cell);
+		break;
+
+	default:
+		DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d",
+			    __func__, r);
+		cell_defer_no_holder(tc, cell);
+		bio_io_error(bio);
+		break;
+	}
+}
+
 static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 {
 	int r;
@@ -2228,6 +2276,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
 
 		if (bio_op(bio) == REQ_OP_DISCARD)
 			pool->process_discard(tc, bio);
+		else if (bio_op(bio) == REQ_OP_PROVISION)
+			pool->process_provision(tc, bio);
 		else
 			pool->process_bio(tc, bio);
 
@@ -2579,6 +2629,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_fail;
 		pool->process_discard = process_bio_fail;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_fail;
 		pool->process_discard_cell = process_cell_fail;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2592,6 +2643,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_bio_success;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_discard_cell = process_cell_success;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2612,6 +2664,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		pool->out_of_data_space = true;
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2628,6 +2681,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_write(pool->pmd);
 		pool->process_bio = process_bio;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_provision_bio;
 		pool->process_cell = process_cell;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2749,7 +2803,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+	    bio_op(bio) == REQ_OP_PROVISION) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -3396,6 +3451,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	pt->adjusted_pf = pt->requested_pf = pf;
 	ti->num_flush_bios = 1;
 	ti->limit_swap_bios = true;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+	ti->max_provision_granularity = true;
 
 	/*
 	 * Only need to enable discards if the pool should pass
@@ -4288,6 +4346,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->max_discard_granularity = true;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	mutex_unlock(&dm_thin_pool_table.mutex);
 
 	spin_lock_irq(&tc->pool->lock);
@@ -4502,6 +4563,8 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->max_discard_sectors = pool->sectors_per_block * BIO_PRISON_MAX_RANGE;
+
+	limits->max_provision_sectors = pool->sectors_per_block;
 }
 
 static struct target_type thin_target = {
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v4 4/4] loop: Add support for provision requests
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
                       ` (2 preceding siblings ...)
  2023-04-18 22:12     ` [PATCH v4 3/4] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
@ 2023-04-18 22:12     ` Sarthak Kukreti
  3 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:12 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support
the provision request and, upon receiving a provision bio,
will map it to the backing device/storage. For loop devices
over files, a REQ_OP_PROVISION request will translate to
an fallocate mode 0 call on the backing file.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2..13c4b4f8b9c1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 	return ret;
 }
 
+static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
+{
+	struct file *file = lo->lo_backing_file;
+	struct request_queue *q = lo->lo_queue;
+	int ret;
+
+	if (!q->limits.max_provision_sectors) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
+	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
+		ret = -EIO;
+ out:
+	return ret;
+}
+
 static int lo_req_flush(struct loop_device *lo, struct request *rq)
 {
 	int ret = vfs_fsync(lo->lo_backing_file, 0);
@@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 				FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+	case REQ_OP_PROVISION:
+		return lo_req_provision(lo, rq, pos);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static void loop_config_provision(struct loop_device *lo)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+
+	/*
+	 * If the backing device is a block device, mirror its provisioning
+	 * capability.
+	 */
+	if (S_ISBLK(inode->i_mode)) {
+		blk_queue_max_provision_sectors(lo->lo_queue,
+			bdev_max_provision_sectors(I_BDEV(inode)));
+	} else if (file->f_op->fallocate) {
+		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+	} else {
+		blk_queue_max_provision_sectors(lo->lo_queue, 0);
+	}
+}
+
 static void loop_config_discard(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	blk_queue_io_min(lo->lo_queue, bsize);
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	}
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 
 	/* update dio if lo_offset or transfer is changed */
 	__loop_update_dio(lo, lo->use_dio);
@@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		cmd->use_aio = false;
 		break;
 	default:
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/3] block: Introduce provisioning primitives
  2023-04-17 17:35     ` Brian Foster
@ 2023-04-18 22:13       ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:13 UTC (permalink / raw)
  To: Brian Foster
  Cc: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Stefan Hajnoczi, Alasdair Kergon, Mike Snitzer,
	Christoph Hellwig, Theodore Ts'o, Andreas Dilger,
	Bart Van Assche, Daniil Lunev, Darrick J. Wong

On Mon, Apr 17, 2023 at 10:33 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Thu, Apr 13, 2023 at 05:02:17PM -0700, Sarthak Kukreti wrote:
> > Introduce block request REQ_OP_PROVISION. The intent of this request
> > is to request underlying storage to preallocate disk space for the given
> > block range. Block devices that support this capability will export
> > a provision limit within their request queues.
> >
> > This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  block/blk-core.c          |  5 ++++
> >  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
> >  block/blk-merge.c         | 18 +++++++++++++
> >  block/blk-settings.c      | 19 ++++++++++++++
> >  block/blk-sysfs.c         |  8 ++++++
> >  block/bounce.c            |  1 +
> >  block/fops.c              | 14 ++++++++---
> >  include/linux/bio.h       |  6 +++--
> >  include/linux/blk_types.h |  5 +++-
> >  include/linux/blkdev.h    | 16 ++++++++++++
> >  10 files changed, 138 insertions(+), 7 deletions(-)
> >
> ...
> > diff --git a/block/fops.c b/block/fops.c
> > index d2e6be4e3d1c..f82da2fb8af0 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -625,7 +625,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >       int error;
> >
> >       /* Fail if we don't recognize the flags. */
> > -     if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> > +     if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> >               return -EOPNOTSUPP;
> >
> >       /* Don't go off the end of the device. */
> > @@ -649,11 +649,17 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >       filemap_invalidate_lock(inode->i_mapping);
> >
> >       /* Invalidate the page cache, including dirty pages. */
> > -     error = truncate_bdev_range(bdev, file->f_mode, start, end);
> > -     if (error)
> > -             goto fail;
> > +     if (mode != 0) {
> > +             error = truncate_bdev_range(bdev, file->f_mode, start, end);
> > +             if (error)
> > +                     goto fail;
> > +     }
> >
> >       switch (mode) {
> > +     case 0:
> > +             error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > +                                            len >> SECTOR_SHIFT, GFP_KERNEL);
> > +             break;
>
> I would think we'd want to support any combination of
> FALLOC_FL_KEEP_SIZE and FALLOC_FL_UNSHARE_RANGE...? All of the other
> commands support the former modifier, for one. It also looks like, if
> somebody invokes this with mode == FALLOC_FL_KEEP_SIZE, even the
> current upstream code would perform the bdev truncate before returning
> -EOPNOTSUPP. That seems like an unfortunate side effect to me.
>
Added a separate flag set to decide whether we should truncate or not.

> WRT unshare, if the PROVISION request is always going to imply an
> unshare (which seems reasonable to me), there's probably no reason to
> return -EOPNOTSUPP if a caller explicitly passes UNSHARE_RANGE.
>
Added handling in v4.

Thanks!

Sarthak

> Brian
>
> >       case FALLOC_FL_ZERO_RANGE:
> >       case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> >               error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index d766be7152e1..9820b3b039f2 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
> >           bio->bi_iter.bi_size &&
> >           bio_op(bio) != REQ_OP_DISCARD &&
> >           bio_op(bio) != REQ_OP_SECURE_ERASE &&
> > -         bio_op(bio) != REQ_OP_WRITE_ZEROES)
> > +         bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> > +         bio_op(bio) != REQ_OP_PROVISION)
> >               return true;
> >
> >       return false;
> > @@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
> >  {
> >       return bio_op(bio) == REQ_OP_DISCARD ||
> >              bio_op(bio) == REQ_OP_SECURE_ERASE ||
> > -            bio_op(bio) == REQ_OP_WRITE_ZEROES;
> > +            bio_op(bio) == REQ_OP_WRITE_ZEROES ||
> > +            bio_op(bio) == REQ_OP_PROVISION;
> >  }
> >
> >  static inline void *bio_data(struct bio *bio)
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 99be590f952f..27bdf88f541c 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -385,7 +385,10 @@ enum req_op {
> >       REQ_OP_DRV_IN           = (__force blk_opf_t)34,
> >       REQ_OP_DRV_OUT          = (__force blk_opf_t)35,
> >
> > -     REQ_OP_LAST             = (__force blk_opf_t)36,
> > +     /* request device to provision block */
> > +     REQ_OP_PROVISION        = (__force blk_opf_t)37,
> > +
> > +     REQ_OP_LAST             = (__force blk_opf_t)38,
> >  };
> >
> >  enum req_flag_bits {
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 941304f17492..239e2f418b6e 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -303,6 +303,7 @@ struct queue_limits {
> >       unsigned int            discard_granularity;
> >       unsigned int            discard_alignment;
> >       unsigned int            zone_write_granularity;
> > +     unsigned int            max_provision_sectors;
> >
> >       unsigned short          max_segments;
> >       unsigned short          max_integrity_segments;
> > @@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
> >               unsigned int max_discard_sectors);
> >  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
> >               unsigned int max_write_same_sectors);
> > +extern void blk_queue_max_provision_sectors(struct request_queue *q,
> > +             unsigned int max_provision_sectors);
> >  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
> >  extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
> >               unsigned int max_zone_append_sectors);
> > @@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
> >               sector_t nr_sects, gfp_t gfp);
> >
> > +extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> > +             sector_t nr_sects, gfp_t gfp_mask);
> > +
> >  #define BLKDEV_ZERO_NOUNMAP  (1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK       (1 << 1)  /* don't write explicit zeroes */
> >
> > @@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
> >       return q->limits.max_discard_segments;
> >  }
> >
> > +static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
> > +{
> > +     return q->limits.max_provision_sectors;
> > +}
> > +
> >  static inline unsigned int queue_max_segment_size(const struct request_queue *q)
> >  {
> >       return q->limits.max_segment_size;
> > @@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
> >       return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
> >  }
> >
> > +static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
> > +{
> > +     return bdev_get_queue(bdev)->limits.max_provision_sectors;
> > +}
> > +
> >  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> >  {
> >       return blk_queue_zoned_model(bdev_get_queue(bdev));
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/3] dm: Add support for block provisioning
  2023-04-14 21:58     ` Mike Snitzer
@ 2023-04-18 22:13       ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-18 22:13 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: sarthakkukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Stefan Hajnoczi, Alasdair Kergon, Christoph Hellwig,
	Brian Foster, Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Daniil Lunev, Darrick J. Wong

On Fri, Apr 14, 2023 at 2:58 PM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Thu, Apr 13 2023 at  8:02P -0400,
> Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
>
> > Add support to dm devices for REQ_OP_PROVISION. The default mode
> > is to passthrough the request to the underlying device, if it
> > supports it. dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/md/dm-crypt.c         |   4 +-
> >  drivers/md/dm-linear.c        |   1 +
> >  drivers/md/dm-snap.c          |   7 +++
>
> Have you tested REQ_OP_PROVISION with these targets?  Just want to
> make sure you have an explicit need (and vested interest) for them
> passing through REQ_OP_PROVISION.
>

Yes. I have a vested interest in dm-linear and dm-crypt; I kept
dm-snap support mostly for consistency with thin snapshots.

> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 2055a758541d..5985343384a7 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -1850,6 +1850,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
> >       return true;
> >  }
> >
> > +static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
> > +                                 sector_t start, sector_t len, void *data)
> > +{
> > +     return !bdev_max_provision_sectors(dev->bdev);
> > +}
> > +
> > +static bool dm_table_supports_provision(struct dm_table *t)
> > +{
> > +     for (unsigned int i = 0; i < t->num_targets; i++) {
> > +             struct dm_target *ti = dm_table_get_target(t, i);
> > +
> > +             if (ti->provision_supported ||
> > +                 (ti->type->iterate_devices &&
> > +                 ti->type->iterate_devices(ti, device_provision_capable, NULL)))
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
> >                                    sector_t start, sector_t len, void *data)
> >  {
> > @@ -1983,6 +2003,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> >       if (!dm_table_supports_write_zeroes(t))
> >               q->limits.max_write_zeroes_sectors = 0;
> >
> > +     if (dm_table_supports_provision(t))
> > +             blk_queue_max_provision_sectors(q, UINT_MAX >> 9);
>
> This doesn't seem correct in that it'll override whatever
> max_provision_sectors was set by a target (like thinp).
>
I think you only need the if (!dm_table_supports_provision) case:
>
Done

> > +     else
> > +             q->limits.max_provision_sectors = 0;
> > +
> >       dm_table_verify_integrity(t);
> >
> >       /*
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 13d4677baafd..b08b7ae617be 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
>
> I think it'll make the most sense to split out the dm-thin.c changes
> in a separate patch.
>
Separated the dm-thin changes into a separate patch that follows this one in v4.

> > @@ -909,7 +909,8 @@ static void __inc_remap_and_issue_cell(void *context,
> >       struct bio *bio;
> >
> >       while ((bio = bio_list_pop(&cell->bios))) {
> > -             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> > +             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +                 bio_op(bio) == REQ_OP_PROVISION)
> >                       bio_list_add(&info->defer_bios, bio);
> >               else {
> >                       inc_all_io_entry(info->tc->pool, bio);
> > @@ -1013,6 +1014,15 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
> >               goto out;
> >       }
> >
> > +     /*
> > +      * For provision requests, once the prepared block has been inserted
> > +      * into the mapping btree, return.
> > +      */
> > +     if (bio && bio_op(bio) == REQ_OP_PROVISION) {
> > +             bio_endio(bio);
> > +             return;
> > +     }
> > +
> >       /*
> >        * Release any bios held while the block was being provisioned.
> >        * If we are processing a write bio that completely covers the block,
> > @@ -1241,7 +1251,7 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
> >
> >  static int io_overwrites_block(struct pool *pool, struct bio *bio)
> >  {
> > -     return (bio_data_dir(bio) == WRITE) &&
> > +     return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
> >               io_overlaps_block(pool, bio);
> >  }
> >
> > @@ -1334,10 +1344,11 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
> >       /*
> >        * IO to pool_dev remaps to the pool target's data_dev.
> >        *
> > -      * If the whole block of data is being overwritten, we can issue the
> > -      * bio immediately. Otherwise we use kcopyd to clone the data first.
> > +      * If the whole block of data is being overwritten and if this is not a
> > +      * provision request, we can issue the bio immediately.
> > +      * Otherwise we use kcopyd to clone the data first.
> >        */
> > -     if (io_overwrites_block(pool, bio))
> > +     if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
> >               remap_and_issue_overwrite(tc, bio, data_dest, m);
> >       else {
> >               struct dm_io_region from, to;
> > @@ -1356,7 +1367,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
> >               /*
> >                * Do we need to zero a tail region?
> >                */
> > -             if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
> > +             if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
> > +                 bio_op(bio) != REQ_OP_PROVISION) {
> >                       atomic_inc(&m->prepare_actions);
> >                       ll_zero(tc, m,
> >                               data_dest * pool->sectors_per_block + len,
> > @@ -1390,6 +1402,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
> >       m->data_block = data_block;
> >       m->cell = cell;
> >
> > +     /* Provision requests are chained on the original bio. */
> > +     if (bio && bio_op(bio) == REQ_OP_PROVISION)
> > +             m->bio = bio;
> > +
> >       /*
> >        * If the whole block of data is being overwritten or we are not
> >        * zeroing pre-existing data, we can issue the bio immediately.
> > @@ -1865,7 +1881,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
> >
> >       if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
> >               break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> > -             cell_defer_no_holder(tc, virt_cell);
> > +             if (bio_op(bio) != REQ_OP_PROVISION)
> > +                     cell_defer_no_holder(tc, virt_cell);
> >       } else {
> >               struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
> >
>
> Not confident in the above changes given the request that we only
> handle REQ_OP_PROVISION one thinp block at a time.  So I'll gloss over
> them for now.
>
Yeah, the majority of this got removed in v4. I added a check in
io_overwrites_block() to return false for all provision requests.
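
Concretely, that amounts to folding the condition into the helper itself,
roughly as in the v3 hunk above (note REQ_OP_PROVISION is an odd-numbered
op, so bio_data_dir() alone would classify it as a WRITE):

static int io_overwrites_block(struct pool *pool, struct bio *bio)
{
	/* Provision bios carry no payload, so they never overwrite a block. */
	return (bio_data_dir(bio) == WRITE) &&
	       bio_op(bio) != REQ_OP_PROVISION &&
	       io_overlaps_block(pool, bio);
}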

> > @@ -1982,6 +1999,73 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> >       }
> >  }
> >
> > +static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> > +{
> > +     int r;
> > +     struct pool *pool = tc->pool;
> > +     struct bio *bio = cell->holder;
> > +     dm_block_t begin, end;
> > +     struct dm_thin_lookup_result lookup_result;
> > +
> > +     if (tc->requeue_mode) {
> > +             cell_requeue(pool, cell);
> > +             return;
> > +     }
> > +
> > +     get_bio_block_range(tc, bio, &begin, &end);
> > +
> > +     while (begin != end) {
> > +             r = ensure_next_mapping(pool);
> > +             if (r)
> > +                     /* we did our best */
> > +                     return;
> > +
> > +             r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
> > +             switch (r) {
> > +             case 0:
> > +                     if (lookup_result.shared)
> > +                             process_shared_bio(tc, bio, begin,
> > +                                                &lookup_result, cell);
> > +                     begin++;
> > +                     break;
> > +             case -ENODATA:
> > +                     bio_inc_remaining(bio);
> > +                     provision_block(tc, bio, begin, cell);
> > +                     begin++;
> > +                     break;
> > +             default:
> > +                     DMERR_LIMIT(
> > +                             "%s: dm_thin_find_block() failed: error = %d",
> > +                             __func__, r);
> > +                     cell_defer_no_holder(tc, cell);
> > +                     bio_io_error(bio);
> > +                     begin++;
> > +                     break;
> > +             }
> > +     }
> > +     bio_endio(bio);
> > +     cell_defer_no_holder(tc, cell);
> > +}
> > +
> > +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> > +{
> > +     dm_block_t begin, end;
> > +     struct dm_cell_key virt_key;
> > +     struct dm_bio_prison_cell *virt_cell;
> > +
> > +     get_bio_block_range(tc, bio, &begin, &end);
> > +     if (begin == end) {
> > +             bio_endio(bio);
> > +             return;
> > +     }
>
> Like Joe mentioned, this pattern was fine for discards because they
> are advisory/optional.  But we need to make sure we don't truncate
> REQ_OP_PROVISION -- so we need to round up if we partially bleed into
> the blocks to the left or right.
>
> BUT ranged REQ_OP_PROVISION support is for later, this can be dealt
> with more simply in that each REQ_OP_PROVISION will be handled a block
> at a time initially.  So you'll want to honor _all_ REQ_OP_PROVISION,
> never returning early.
>
Thanks. The next patch in the series has the simplified version. It
had a lot in common with process_bio() so there was a possibility for
merging the two code paths, but I opted to keep it like this to make
ranged handling and passdown support easier to implement.

> > +
> > +     build_key(tc->td, VIRTUAL, begin, end, &virt_key);
> > +     if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
> > +             return;
> > +
> > +     process_provision_cell(tc, virt_cell);
> > +}
> > +
> >  static void process_bio(struct thin_c *tc, struct bio *bio)
> >  {
> >       struct pool *pool = tc->pool;
> > @@ -2202,6 +2286,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
> >
> >               if (bio_op(bio) == REQ_OP_DISCARD)
> >                       pool->process_discard(tc, bio);
> > +             else if (bio_op(bio) == REQ_OP_PROVISION)
> > +                     process_provision_bio(tc, bio);
>
> This should be pool->process_provision() (or ->process_provision_bio
> if you like).  Point is, you need to be switching these methods
> if/when the pool_mode transitions in set_pool_mode().
>
Done
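
As a sketch, the switching Mike describes mirrors how process_bio and
process_discard are already swapped in set_pool_mode(); only the
process_provision member is new here, and process_bio_fail is the
existing failure handler in dm-thin.c:

	/* in set_pool_mode(), per pool_mode */
	case PM_WRITE:
		pool->process_bio = process_bio;
		pool->process_discard = process_discard_bio;
		pool->process_provision = process_provision_bio;
		break;
	case PM_FAIL:
		pool->process_bio = process_bio_fail;
		pool->process_discard = process_bio_fail;
		pool->process_provision = process_bio_fail;
		break;

with process_thin_deferred_bios() then calling
pool->process_provision(tc, bio) instead of the hard-coded function.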

> >               else
> >                       pool->process_bio(tc, bio);
> >
> > @@ -2723,7 +2809,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
> >               return DM_MAPIO_SUBMITTED;
> >       }
> >
> > -     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> > +     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +         bio_op(bio) == REQ_OP_PROVISION) {
> >               thin_defer_bio_with_throttle(tc, bio);
> >               return DM_MAPIO_SUBMITTED;
> >       }
> > @@ -3370,6 +3457,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >       pt->adjusted_pf = pt->requested_pf = pf;
> >       ti->num_flush_bios = 1;
> >       ti->limit_swap_bios = true;
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> >
> >       /*
> >        * Only need to enable discards if the pool should pass
> > @@ -4068,6 +4157,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >               blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
> >       }
> >
> > +
>
> Please fix this extra whitespace damage.
>
Done

> >       /*
> >        * pt->adjusted_pf is a staging area for the actual features to use.
> >        * They get transferred to the live pool in bind_control_target()
> > @@ -4261,6 +4351,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >               ti->num_discard_bios = 1;
> >       }
> >
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> > +
> >       mutex_unlock(&dm_thin_pool_table.mutex);
> >
> >       spin_lock_irq(&tc->pool->lock);
> > @@ -4475,6 +4568,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >
> >       limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> >       limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > +     limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */
>
> Building on my previous reply, with suggested update to
> dm.c:__process_abnormal_io(), once you rebase on dm-6.4's dm-thin.c
> you'll want to instead:
>
> limits->max_provision_sectors = pool->sectors_per_block << SECTOR_SHIFT;
>
> And you'll want to drop any of the above code that deals with handling
> bio-prison range locking and processing of REQ_OP_PROVISION for
> multiple thinp blocks at once.
>
> Simple REQ_OP_PROVISION processing one thinp block at a time first and
> then we can worry about handling REQ_OP_PROVISION that span blocks
> later.
>
Thanks, done in v4.
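
One unit detail worth double-checking when applying that suggestion:
queue_limits.max_provision_sectors is kept in sectors (the sysfs show
routine later in this thread shifts it left by 9 to report bytes),
whereas discard_granularity above is a byte-valued field. So the
per-block cap presumably wants to be simply:

	limits->max_provision_sectors = pool->sectors_per_block;

without the << SECTOR_SHIFT conversion.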

> >  static struct target_type thin_target = {
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index dfde0088147a..d8f1803062b7 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1593,6 +1593,7 @@ static bool is_abnormal_io(struct bio *bio)
> >               case REQ_OP_DISCARD:
> >               case REQ_OP_SECURE_ERASE:
> >               case REQ_OP_WRITE_ZEROES:
> > +             case REQ_OP_PROVISION:
> >                       return true;
> >               default:
> >                       break;
> > @@ -1617,6 +1618,9 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
> >       case REQ_OP_WRITE_ZEROES:
> >               num_bios = ti->num_write_zeroes_bios;
> >               break;
> > +     case REQ_OP_PROVISION:
> > +             num_bios = ti->num_provision_bios;
> > +             break;
> >       default:
> >               break;
> >       }
>
> Please be sure to include my suggested __process_abnormal_io change
> from my previous reply.
>
Done.

> > diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> > index 7975483816e4..e9f687521ae6 100644
> > --- a/include/linux/device-mapper.h
> > +++ b/include/linux/device-mapper.h
> > @@ -334,6 +334,12 @@ struct dm_target {
> >        */
> >       unsigned int num_write_zeroes_bios;
> >
> > +     /*
> > +      * The number of PROVISION bios that will be submitted to the target.
> > +      * The bio number can be accessed with dm_bio_get_target_bio_nr.
> > +      */
> > +     unsigned int num_provision_bios;
> > +
> >       /*
> >        * The minimum number of extra bytes allocated in each io for the
> >        * target to use.
> > @@ -358,6 +364,11 @@ struct dm_target {
> >        */
> >       bool discards_supported:1;
> >
> > +     /* Set if this target needs to receive provision requests regardless of
> > +      * whether or not its underlying devices have support.
> > +      */
> > +     bool provision_supported:1;
> > +
> >       /*
> >        * Set if we need to limit the number of in-flight bios when swapping.
> >        */
>
> You'll need to add max_provision_granularity bool too (as implied by
> the dm.c:__process_abnormal_io() change suggested in my first reply to
> this patch).
>
> I'm happy to wait for you to consume the v3 feedback we've provided so
> you can create a v4.  I'm thinking I can base dm-thin.c's WRITE_ZEROES
> support ontop of your REQ_OP_PROVISION v4 changes -- they should be
> complementary.
>
Done. Thanks for the review and guidance!

Best
Sarthak

> Thanks,
> Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-18 22:12     ` [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-04-18 22:43       ` Bart Van Assche
  2023-04-20 17:41         ` Sarthak Kukreti
  2023-04-19 15:36       ` [dm-devel] " Darrick J. Wong
  1 sibling, 1 reply; 57+ messages in thread
From: Bart Van Assche @ 2023-04-18 22:43 UTC (permalink / raw)
  To: Sarthak Kukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

On 4/18/23 15:12, Sarthak Kukreti wrote:
>   	/* Fail if we don't recognize the flags. */
> -	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> +	if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
>   		return -EOPNOTSUPP;

Is this change necessary? Doesn't (mode & ~BLKDEV_FALLOC_FL_SUPPORTED) 
!= 0 imply that mode != 0?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [dm-devel] [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-18 22:12     ` [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
  2023-04-18 22:43       ` Bart Van Assche
@ 2023-04-19 15:36       ` Darrick J. Wong
  2023-04-19 16:17         ` Mike Snitzer
  1 sibling, 1 reply; 57+ messages in thread
From: Darrick J. Wong @ 2023-04-19 15:36 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin, Jason Wang,
	Bart Van Assche, Mike Snitzer, Christoph Hellwig, Andreas Dilger,
	Daniil Lunev, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Tue, Apr 18, 2023 at 03:12:04PM -0700, Sarthak Kukreti wrote:
> Introduce block request REQ_OP_PROVISION. The intent of this request
> is to request underlying storage to preallocate disk space for the given
> block range. Block devices that support this capability will export
> a provision limit within their request queues.
> 
> This patch also adds the capability to call fallocate() in mode 0
> on block devices, which will send REQ_OP_PROVISION to the block
> device for the specified range.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/blk-core.c          |  5 ++++
>  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
>  block/blk-merge.c         | 18 +++++++++++++
>  block/blk-settings.c      | 19 ++++++++++++++
>  block/blk-sysfs.c         |  8 ++++++
>  block/bounce.c            |  1 +
>  block/fops.c              | 25 +++++++++++++-----
>  include/linux/bio.h       |  6 +++--
>  include/linux/blk_types.h |  5 +++-
>  include/linux/blkdev.h    | 16 ++++++++++++
>  10 files changed, 147 insertions(+), 9 deletions(-)
> 

<cut to the fallocate part; the block/ changes look fine to /me/ at
first glance, but what do I know... ;)>

> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..e1775269654a 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	return ret;
>  }
>  
> +#define	BLKDEV_FALLOC_FL_TRUNCATE				\

At first I thought from this name that you were defining a new truncate
mode for fallocate, then I realized that this is mask for deciding if we
/want/ to truncate the pagecache.

#define		BLKDEV_FALLOC_TRUNCATE_MASK ?

> +		(FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_ZERO_RANGE |	\

Ok, so discarding and writing zeroes truncate the page cache, which makes
sense since we're "writing" directly to the block device.

> +		 FALLOC_FL_NO_HIDE_STALE)

Here things get tricky -- some of the FALLOC_FL mode bits are really an
opcode and cannot be specified together, whereas others select optional
behavior for certain opcodes.

IIRC, the mutually exclusive opcodes are:

	PUNCH_HOLE
	ZERO_RANGE
	COLLAPSE_RANGE
	INSERT_RANGE
	(none of the above, for allocation)

and the "variants on a theme are":

	KEEP_SIZE
	NO_HIDE_STALE
	UNSHARE_RANGE

not all of which are supported by all the opcodes.

Does it make sense to truncate the page cache if userspace passes in
mode == NO_HIDE_STALE?  There's currently no defined meaning for this
combination, but I think this means we'll truncate the pagecache before
deciding if we're actually going to issue any commands.

I think that's just a bug in the existing code -- it should be
validating that @mode is any of the supported combinations *before*
truncating the pagecache.

Otherwise you could have a mkfs program that starts writing new fs
metadata, decides to provision the storage (say for a logging region),
and doesn't realize it's running on an old kernel. Then, oops, the
provision attempt fails; but have we now shredded the pagecache and lost
all the writes?
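
The shape of the fix, then, is to validate @mode and only invalidate the
pagecache inside the recognized cases, something like the following (the
v5 series later in this thread folds the truncate into each supported
opcode in just this way):

	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		error = truncate_bdev_range(bdev, file->f_mode, start, end);
		if (!error)
			error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
						     len >> SECTOR_SHIFT,
						     GFP_KERNEL, BLKDEV_ZERO_NOUNMAP);
		break;
	/* ... remaining supported combinations ... */
	default:
		error = -EOPNOTSUPP;
	}

so an unsupported mode returns before any pagecache state is thrown away.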

--D

> +
>  #define	BLKDEV_FALLOC_FL_SUPPORTED					\
> -		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
> -		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
> +		(BLKDEV_FALLOC_FL_TRUNCATE | FALLOC_FL_KEEP_SIZE |	\
> +		 FALLOC_FL_UNSHARE_RANGE)
>  
>  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  			     loff_t len)
> @@ -625,7 +629,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	int error;
>  
>  	/* Fail if we don't recognize the flags. */
> -	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> +	if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
>  		return -EOPNOTSUPP;
>  
>  	/* Don't go off the end of the device. */
> @@ -649,11 +653,20 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	filemap_invalidate_lock(inode->i_mapping);
>  
>  	/* Invalidate the page cache, including dirty pages. */
> -	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -	if (error)
> -		goto fail;
> +	if (mode & BLKDEV_FALLOC_FL_TRUNCATE) {
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (error)
> +			goto fail;
> +	}
>  
>  	switch (mode) {
> +	case 0:
> +	case FALLOC_FL_UNSHARE_RANGE:
> +	case FALLOC_FL_KEEP_SIZE:
> +	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
> +		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> +					       len >> SECTOR_SHIFT, GFP_KERNEL);
> +		break;
>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
>  		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d766be7152e1..9820b3b039f2 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
>  	    bio->bi_iter.bi_size &&
>  	    bio_op(bio) != REQ_OP_DISCARD &&
>  	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
> -	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
> +	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> +	    bio_op(bio) != REQ_OP_PROVISION)
>  		return true;
>  
>  	return false;
> @@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
>  {
>  	return bio_op(bio) == REQ_OP_DISCARD ||
>  	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
> -	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
> +	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
> +	       bio_op(bio) == REQ_OP_PROVISION;
>  }
>  
>  static inline void *bio_data(struct bio *bio)
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 99be590f952f..27bdf88f541c 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -385,7 +385,10 @@ enum req_op {
>  	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
>  	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
>  
> -	REQ_OP_LAST		= (__force blk_opf_t)36,
> +	/* request device to provision block */
> +	REQ_OP_PROVISION        = (__force blk_opf_t)37,
> +
> +	REQ_OP_LAST		= (__force blk_opf_t)38,
>  };
>  
>  enum req_flag_bits {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 941304f17492..239e2f418b6e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -303,6 +303,7 @@ struct queue_limits {
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> +	unsigned int		max_provision_sectors;
>  
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
> @@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
>  		unsigned int max_discard_sectors);
>  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  		unsigned int max_write_same_sectors);
> +extern void blk_queue_max_provision_sectors(struct request_queue *q,
> +		unsigned int max_provision_sectors);
>  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
>  extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
>  		unsigned int max_zone_append_sectors);
> @@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp);
>  
> +extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp_mask);
> +
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
>  
> @@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
>  	return q->limits.max_discard_segments;
>  }
>  
> +static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
> +{
> +	return q->limits.max_provision_sectors;
> +}
> +
>  static inline unsigned int queue_max_segment_size(const struct request_queue *q)
>  {
>  	return q->limits.max_segment_size;
> @@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
>  	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
>  }
>  
> +static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
> +{
> +	return bdev_get_queue(bdev)->limits.max_provision_sectors;
> +}
> +
>  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>  {
>  	return blk_queue_zoned_model(bdev_get_queue(bdev));
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-19 15:36       ` [dm-devel] " Darrick J. Wong
@ 2023-04-19 16:17         ` Mike Snitzer
  2023-04-19 17:26           ` Darrick J. Wong
  0 siblings, 1 reply; 57+ messages in thread
From: Mike Snitzer @ 2023-04-19 16:17 UTC (permalink / raw)
  To: Darrick J. Wong, Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin, Jason Wang,
	Bart Van Assche, Christoph Hellwig, Andreas Dilger, Daniil Lunev,
	Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Wed, Apr 19 2023 at 11:36P -0400,
Darrick J. Wong <djwong@kernel.org> wrote:

> On Tue, Apr 18, 2023 at 03:12:04PM -0700, Sarthak Kukreti wrote:
> > Introduce block request REQ_OP_PROVISION. The intent of this request
> > is to request underlying storage to preallocate disk space for the given
> > block range. Block devices that support this capability will export
> > a provision limit within their request queues.
> > 
> > This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range.
> > 
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  block/blk-core.c          |  5 ++++
> >  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
> >  block/blk-merge.c         | 18 +++++++++++++
> >  block/blk-settings.c      | 19 ++++++++++++++
> >  block/blk-sysfs.c         |  8 ++++++
> >  block/bounce.c            |  1 +
> >  block/fops.c              | 25 +++++++++++++-----
> >  include/linux/bio.h       |  6 +++--
> >  include/linux/blk_types.h |  5 +++-
> >  include/linux/blkdev.h    | 16 ++++++++++++
> >  10 files changed, 147 insertions(+), 9 deletions(-)
> > 
> 
> <cut to the fallocate part; the block/ changes look fine to /me/ at
> first glance, but what do I know... ;)>
> 
> > diff --git a/block/fops.c b/block/fops.c
> > index d2e6be4e3d1c..e1775269654a 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  	return ret;
> >  }
> >  
> > +#define	BLKDEV_FALLOC_FL_TRUNCATE				\
> 
> At first I thought from this name that you were defining a new truncate
> mode for fallocate, then I realized that this is mask for deciding if we
> /want/ to truncate the pagecache.
> 
> #define		BLKDEV_FALLOC_TRUNCATE_MASK ?
> 
> > +		(FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_ZERO_RANGE |	\
> 
> Ok, so discarding and writing zeroes truncate the page cache, which makes
> sense since we're "writing" directly to the block device.
> 
> > +		 FALLOC_FL_NO_HIDE_STALE)
> 
> Here things get tricky -- some of the FALLOC_FL mode bits are really an
> opcode and cannot be specified together, whereas others select optional
> behavior for certain opcodes.
> 
> IIRC, the mutually exclusive opcodes are:
> 
> 	PUNCH_HOLE
> 	ZERO_RANGE
> 	COLLAPSE_RANGE
> 	INSERT_RANGE
> 	(none of the above, for allocation)
> 
> and the "variants on a theme are":
> 
> 	KEEP_SIZE
> 	NO_HIDE_STALE
> 	UNSHARE_RANGE
> 
> not all of which are supported by all the opcodes.
> 
> Does it make sense to truncate the page cache if userspace passes in
> mode == NO_HIDE_STALE?  There's currently no defined meaning for this
> combination, but I think this means we'll truncate the pagecache before
> deciding if we're actually going to issue any commands.
> 
> I think that's just a bug in the existing code -- it should be
> validating that @mode is any of the supported combinations *before*
> truncating the pagecache.
> 
> Otherwise you could have a mkfs program that starts writing new fs
> metadata, decides to provision the storage (say for a logging region),
> and doesn't realize it's running on an old kernel. Then, oops, the
> provision attempt fails; but have we now shredded the pagecache and lost
> all the writes?

While that just caused me to have an "oh shit, that's crazy" (in a
scary way) belly laugh...
(And obviously needs fixing independent of this patchset)

Shouldn't mkfs first check that the underlying storage supports
REQ_OP_PROVISION by verifying
/sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
(Just saying, we need to add new features more defensively... you just
made the case based on this scenario's implications alone)
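
For what it's worth, that check on the tool side is only a few lines of C
(hypothetical helper, illustrative only):

#include <stdio.h>

/* Returns nonzero if block device @dev (e.g. "sda") advertises a
 * non-zero provision limit via sysfs. */
static int bdev_supports_provision(const char *dev)
{
	char path[128];
	unsigned long long max_bytes = 0;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/provision_max_bytes", dev);
	f = fopen(path, "r");
	if (!f)
		return 0;	/* attribute absent: kernel predates REQ_OP_PROVISION */
	if (fscanf(f, "%llu", &max_bytes) != 1)
		max_bytes = 0;
	fclose(f);
	return max_bytes != 0;
}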

Sarthak, please note I said "provision_max_bytes": all other ops
(e.g. DISCARD, WRITE_ZEROES, etc) have <op>_max_bytes exported through
sysfs, not <op>_max_sectors.  Please export provision_max_bytes, e.g.:

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 202aa78f933e..2e5ac7b1ffbd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -605,12 +605,12 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
 QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
 
 QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
-QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
 QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
 QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
 QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
 QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 
+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-19 16:17         ` Mike Snitzer
@ 2023-04-19 17:26           ` Darrick J. Wong
  2023-04-19 23:21             ` Dave Chinner
  0 siblings, 1 reply; 57+ messages in thread
From: Darrick J. Wong @ 2023-04-19 17:26 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Sarthak Kukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel, Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Jason Wang, Bart Van Assche, Christoph Hellwig, Andreas Dilger,
	Daniil Lunev, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Wed, Apr 19, 2023 at 12:17:34PM -0400, Mike Snitzer wrote:
> On Wed, Apr 19 2023 at 11:36P -0400,
> Darrick J. Wong <djwong@kernel.org> wrote:
> 
> > On Tue, Apr 18, 2023 at 03:12:04PM -0700, Sarthak Kukreti wrote:
> > > Introduce block request REQ_OP_PROVISION. The intent of this request
> > > is to request underlying storage to preallocate disk space for the given
> > > block range. Block devices that support this capability will export
> > > a provision limit within their request queues.
> > > 
> > > This patch also adds the capability to call fallocate() in mode 0
> > > on block devices, which will send REQ_OP_PROVISION to the block
> > > device for the specified range.
> > > 
> > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > ---
> > >  block/blk-core.c          |  5 ++++
> > >  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
> > >  block/blk-merge.c         | 18 +++++++++++++
> > >  block/blk-settings.c      | 19 ++++++++++++++
> > >  block/blk-sysfs.c         |  8 ++++++
> > >  block/bounce.c            |  1 +
> > >  block/fops.c              | 25 +++++++++++++-----
> > >  include/linux/bio.h       |  6 +++--
> > >  include/linux/blk_types.h |  5 +++-
> > >  include/linux/blkdev.h    | 16 ++++++++++++
> > >  10 files changed, 147 insertions(+), 9 deletions(-)
> > > 
> > 
> > <cut to the fallocate part; the block/ changes look fine to /me/ at
> > first glance, but what do I know... ;)>
> > 
> > > diff --git a/block/fops.c b/block/fops.c
> > > index d2e6be4e3d1c..e1775269654a 100644
> > > --- a/block/fops.c
> > > +++ b/block/fops.c
> > > @@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >  	return ret;
> > >  }
> > >  
> > > +#define	BLKDEV_FALLOC_FL_TRUNCATE				\
> > 
> > At first I thought from this name that you were defining a new truncate
> > mode for fallocate, then I realized that this is mask for deciding if we
> > /want/ to truncate the pagecache.
> > 
> > #define		BLKDEV_FALLOC_TRUNCATE_MASK ?
> > 
> > > +		(FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_ZERO_RANGE |	\
> > 
> > Ok, so discarding and writing zeroes truncate the page cache, which makes
> > sense since we're "writing" directly to the block device.
> > 
> > > +		 FALLOC_FL_NO_HIDE_STALE)
> > 
> > Here things get tricky -- some of the FALLOC_FL mode bits are really an
> > opcode and cannot be specified together, whereas others select optional
> > behavior for certain opcodes.
> > 
> > IIRC, the mutually exclusive opcodes are:
> > 
> > 	PUNCH_HOLE
> > 	ZERO_RANGE
> > 	COLLAPSE_RANGE
> > 	INSERT_RANGE
> > 	(none of the above, for allocation)
> > 
> > and the "variants on a theme are":
> > 
> > 	KEEP_SIZE
> > 	NO_HIDE_STALE
> > 	UNSHARE_RANGE
> > 
> > not all of which are supported by all the opcodes.
> > 
> > Does it make sense to truncate the page cache if userspace passes in
> > mode == NO_HIDE_STALE?  There's currently no defined meaning for this
> > combination, but I think this means we'll truncate the pagecache before
> > deciding if we're actually going to issue any commands.
> > 
> > I think that's just a bug in the existing code -- it should be
> > validating that @mode is any of the supported combinations *before*
> > truncating the pagecache.
> > 
> > Otherwise you could have a mkfs program that starts writing new fs
> > metadata, decides to provision the storage (say for a logging region),
> > and doesn't realize it's running on an old kernel. Then, oops, the
> > provision attempt fails; but have we now shredded the pagecache and lost
> > all the writes?
> 
> While that just caused me to have an "oh shit, that's crazy" (in a
> scary way) belly laugh...

I just tried this and:

# xfs_io -c 'pwrite -S 0x58 1m 1m' -c fsync -c 'pwrite -S 0x59 1m 4096' -c 'pread -v 1m 64' -c 'falloc 1m 4096' -c 'pread -v 1m 64' /dev/sda
wrote 1048576/1048576 bytes at offset 1048576
1 MiB, 256 ops; 0.0013 sec (723.589 MiB/sec and 185238.7844 ops/sec)
wrote 4096/4096 bytes at offset 1048576
4 KiB, 1 ops; 0.0000 sec (355.114 MiB/sec and 90909.0909 ops/sec)
00100000:  59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59  YYYYYYYYYYYYYYYY
00100010:  59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59  YYYYYYYYYYYYYYYY
00100020:  59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59  YYYYYYYYYYYYYYYY
00100030:  59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59  YYYYYYYYYYYYYYYY
read 64/64 bytes at offset 1048576
64.000000 bytes, 1 ops; 0.0000 sec (1.565 MiB/sec and 25641.0256 ops/sec)
fallocate: Operation not supported
00100000:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
00100010:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
00100020:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
00100030:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
read 64/64 bytes at offset 1048576
64.000000 bytes, 1 ops; 0.0003 sec (176.554 KiB/sec and 2824.8588 ops/sec)

(Write 1MB of Xs, flush it to disk, write 4k of Ys, confirm the Ys are
in the page cache, fail to fallocate, reread and spot the Xs that we
supposedly overwrote.)

oops.

> (And obviously needs fixing independent of this patchset)
> 
> Shouldn't mkfs first check that the underlying storage supports
> REQ_OP_PROVISION by verifying
> /sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
> (Just saying, we need to add new features more defensively... you just
> made the case based on this scenario's implications alone)

Not for fallocate -- for regular files, there's no way to check if the
filesystem actually supports the operation requested other than to try
it and see if it succeeds.  We probably should've defined a DRY_RUN flag
for that purpose back when it was introduced.

For fallocate calls to block devices, yes, the program can check the
queue limits in sysfs if fstat says the supplied path is a block device,
but I don't know that most programs are that thorough.  The fallocate(1)
CLI program does not check.

Then I moved on to fs utilities:

ext4: For discard, mke2fs calls BLKDISCARD if it detects a block device
via fstat, and falloc(PUNCH|KEEP_SIZE) for anything else.  For zeroing,
it only uses falloc(ZERO) or falloc(PUNCH|KEEP_SIZE) and does not try to
use BLKZEROOUT.  It does not check sysfs queue limits at all.

XFS: mkfs.xfs issues BLKDISCARD before writing anything to the device,
so that's fine.  It uses falloc(ZERO) to erase the log, but since
xfsprogs provides its own buffer cache and uses O_DIRECT, pagecache
coherency problems aren't an issue.

btrfs: mkfs.btrfs only issues BLKDISCARD, and only before it starts
writing the new fs, so no problems there.

--D

> Sarthak, please note I said "provision_max_bytes": all other ops
> (e.g. DISCARD, WRITE_ZEROES, etc) have <op>_max_bytes exported through
> sysfs, not <op>_max_sectors.  Please export provision_max_bytes, e.g.:
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 202aa78f933e..2e5ac7b1ffbd 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -605,12 +605,12 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
>  QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
>  
>  QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
> -QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
>  QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
>  QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
>  QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
>  QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
>  
> +QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
>  QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
>  QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
>  QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-19 17:26           ` Darrick J. Wong
@ 2023-04-19 23:21             ` Dave Chinner
  2023-04-20  0:53               ` Sarthak Kukreti
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Chinner @ 2023-04-19 23:21 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Mike Snitzer, Sarthak Kukreti, dm-devel, linux-block, linux-ext4,
	linux-kernel, linux-fsdevel, Jens Axboe, Theodore Ts'o,
	Michael S. Tsirkin, Jason Wang, Bart Van Assche,
	Christoph Hellwig, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi,
	Brian Foster, Alasdair Kergon

On Wed, Apr 19, 2023 at 10:26:02AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 19, 2023 at 12:17:34PM -0400, Mike Snitzer wrote:
> > (And obviously needs fixing independent of this patchset)
> > 
> > Shouldn't mkfs first check that the underlying storage supports
> > REQ_OP_PROVISION by verifying
> > /sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
> > (Just saying, we need to add new features more defensively... you just
> > made the case based on this scenario's implications alone)
> 
> Not for fallocate -- for regular files, there's no way to check if the
> filesystem actually supports the operation requested other than to try
> it and see if it succeeds.  We probably should've defined a DRY_RUN flag
> for that purpose back when it was introduced.

That ignores the fact that fallocate() was never intended to
guarantee it will work in all contexts - it's an advisory interface
at its most basic level. If the call succeeds, then great, it does
what it says on the box, but if it fails then it should have no
visible effect on user data at all and the application still needs
to perform whatever modification it needed done in some other way.

IOWs, calling it on a block device without first checking if the
block device supports that fallocate operation is exactly how it is
supposed to be used. If the kernel can't handle this gracefully
without corrupting data, then that's a kernel bug and not an
application problem.

> For fallocate calls to block devices, yes, the program can check the
> queue limits in sysfs if fstat says the supplied path is a block device,
> but I don't know that most programs are that thorough.  The fallocate(1)
> CLI program does not check.

Right. fallocate() was intended to just do the right thing without
the application having to jump through an unknown number of hoops to
determine if fallocate() can be used or not in the context it is
executing in.  The kernel implementation is supposed to abstract all that
context-dependent behaviour away from the application; all the
application has to do is implement the fallocate() fast path and a
single generic "do the right thing the slow way" fallback when the
fallocate() method it called is not supported...
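
Expressed as code, that pattern is something like the following
(hypothetical helper; writing out the range is just one example of a
generic slow-path fallback):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to preallocate [off, off + len); fall back to plain writes. */
static int reserve_range(int fd, off_t off, off_t len)
{
	char buf[4096];

	if (fallocate(fd, 0, off, len) == 0)
		return 0;			/* fast path */
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;			/* real failure, not lack of support */

	/* Slow path: force the blocks to exist by writing them. */
	memset(buf, 0, sizeof(buf));
	while (len > 0) {
		size_t chunk = len > (off_t)sizeof(buf) ? sizeof(buf) : (size_t)len;
		ssize_t n = pwrite(fd, buf, chunk, off);

		if (n < 0)
			return -1;
		off += n;
		len -= n;
	}
	return 0;
}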

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v5 0/5] Introduce block provisioning primitives
  2023-04-14  0:02 ` [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
                     ` (3 preceding siblings ...)
  2023-04-18 22:12   ` [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
@ 2023-04-20  0:48   ` Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
                       ` (5 more replies)
  4 siblings, 6 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Next revision of adding support for block provisioning requests.

Changes from v4:
- Add fix for block devices invalidating pagecache if blkdev_fallocate()
  is called with an invalid mode.
- s/max_provision_sectors/provision_max_bytes in sysfs.

Sarthak Kukreti (5):
  block: Don't invalidate pagecache for invalid falloc modes
  block: Introduce provisioning primitives
  dm: Add block provisioning support
  dm-thin: Add REQ_OP_PROVISION support
  loop: Add support for provision requests

 block/blk-core.c              |  5 +++
 block/blk-lib.c               | 53 +++++++++++++++++++++++++
 block/blk-merge.c             | 18 +++++++++
 block/blk-settings.c          | 19 +++++++++
 block/blk-sysfs.c             |  9 +++++
 block/bounce.c                |  1 +
 block/fops.c                  | 28 +++++++++-----
 drivers/block/loop.c          | 42 ++++++++++++++++++++
 drivers/md/dm-crypt.c         |  5 ++-
 drivers/md/dm-linear.c        |  2 +
 drivers/md/dm-snap.c          |  8 ++++
 drivers/md/dm-table.c         | 23 +++++++++++
 drivers/md/dm-thin.c          | 73 ++++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |  6 +++
 include/linux/bio.h           |  6 ++-
 include/linux/blk_types.h     |  5 ++-
 include/linux/blkdev.h        | 16 ++++++++
 include/linux/device-mapper.h | 17 ++++++++
 18 files changed, 318 insertions(+), 18 deletions(-)

-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
@ 2023-04-20  0:48     ` Sarthak Kukreti
  2023-04-20  1:22       ` Darrick J. Wong
                         ` (2 more replies)
  2023-04-20  0:48     ` [PATCH v5 2/5] block: Introduce provisioning primitives Sarthak Kukreti
                       ` (4 subsequent siblings)
  5 siblings, 3 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Only call truncate_bdev_range() if the fallocate mode is
supported. This fixes a bug where data in the pagecache
could be invalidated if fallocate() was called on the
block device with an invalid mode.

Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/fops.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..2fd7e8b9ab48 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -648,25 +648,27 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
-
+	/*
+	 * Invalidate the page cache, including dirty pages, for valid
+	 * de-allocate mode calls to fallocate().
+	 */
 	switch (mode) {
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
+			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOUNMAP);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
+			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOFALLBACK);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
-		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
+		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
+			blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL);
 		break;
 	default:
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 2/5] block: Introduce provisioning primitives
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
@ 2023-04-20  0:48     ` Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 3/5] dm: Add block provisioning support Sarthak Kukreti
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Introduce block request REQ_OP_PROVISION. The intent of this request
is to request underlying storage to preallocate disk space for the given
block range. Block devices that support this capability will export
a provision limit within their request queues.

This patch also adds the capability to call fallocate() in mode 0
on block devices, which will send REQ_OP_PROVISION to the block
device for the specified range.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/blk-core.c          |  5 ++++
 block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c         | 18 +++++++++++++
 block/blk-settings.c      | 19 ++++++++++++++
 block/blk-sysfs.c         |  9 +++++++
 block/bounce.c            |  1 +
 block/fops.c              | 10 +++++++-
 include/linux/bio.h       |  6 +++--
 include/linux/blk_types.h |  5 +++-
 include/linux/blkdev.h    | 16 ++++++++++++
 10 files changed, 138 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 42926e6cb83c..4a2342ba3a8b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
+	REQ_OP_NAME(PROVISION)
 };
 #undef REQ_OP_NAME
 
@@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_PROVISION:
+		if (!q->limits.max_provision_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..647b6451660b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * blkdev_issue_provision - provision a block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to provision
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Issues a provision request to the block device for the range of sectors.
+ *  For thinly provisioned block devices, this acts as a signal for the
+ *  underlying storage pool to allocate space for this block range.
+ */
+int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp)
+{
+	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret = 0;
+
+	if (max_sectors == 0)
+		return -EOPNOTSUPP;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	blk_start_plug(&plug);
+	for (;;) {
+		unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
+
+		bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
+
+		sector += req_sects;
+		nr_sects -= req_sects;
+		if (!nr_sects) {
+			ret = submit_bio_wait(bio);
+			if (ret == -EOPNOTSUPP)
+				ret = 0;
+			bio_put(bio);
+			break;
+		}
+		cond_resched();
+	}
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_provision);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6460abdb2426..a3ffebb97a1d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
 	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *bio_split_provision(struct bio *bio,
+					const struct queue_limits *lim,
+					unsigned int *nsegs, struct bio_set *bs)
+{
+	*nsegs = 0;
+
+	if (!lim->max_provision_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= lim->max_provision_sectors)
+		return NULL;
+
+	return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
+}
+
 /*
  * Return the maximum number of sectors from the start of a bio that may be
  * submitted as a single request to a block device. If enough sectors remain,
@@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
 	case REQ_OP_WRITE_ZEROES:
 		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
 		break;
+	case REQ_OP_PROVISION:
+		split = bio_split_provision(bio, lim, nr_segs, bs);
+		break;
 	default:
 		split = bio_split_rw(bio, lim, nr_segs, bs,
 				get_max_io_size(bio, lim) << SECTOR_SHIFT);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 896b4654ab00..d303e6614c36 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
+	lim->max_provision_sectors = 0;
 }
 
 /**
@@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_provision_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_provision_sectors - set max sectors for a single provision
+ *
+ * @q:  the request queue for the device
+ * @max_provision_sectors: maximum number of sectors to provision per command
+ **/
+
+void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors)
+{
+	q->limits.max_provision_sectors = max_provision_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_provision_sectors);
+
 /**
  * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
  * @q:  the request queue for the device
@@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
+						b->max_provision_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f1fce1c7fa44..0a3165211c66 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -213,6 +213,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_provision_max_show(struct request_queue *q,
+		char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_provision_sectors << 9);
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -604,6 +611,7 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
 QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
 QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 
+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
@@ -661,6 +669,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_provision_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ab9d8723ae64 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		break;
 	default:
 		bio_for_each_segment(bv, bio_src, iter)
diff --git a/block/fops.c b/block/fops.c
index 2fd7e8b9ab48..16420c5fe4a2 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -613,7 +613,8 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 #define	BLKDEV_FALLOC_FL_SUPPORTED					\
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
-		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE |	\
+		 FALLOC_FL_UNSHARE_RANGE)
 
 static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 			     loff_t len)
@@ -653,6 +654,13 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	 * de-allocate mode calls to fallocate().
 	 */
 	switch (mode) {
+	case 0:
+	case FALLOC_FL_UNSHARE_RANGE:
+	case FALLOC_FL_KEEP_SIZE:
+	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
+					       len >> SECTOR_SHIFT, GFP_KERNEL);
+		break;
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152e1..9820b3b039f2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
 	    bio->bi_iter.bi_size &&
 	    bio_op(bio) != REQ_OP_DISCARD &&
 	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
-	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
+	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+	    bio_op(bio) != REQ_OP_PROVISION)
 		return true;
 
 	return false;
@@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
-	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
+	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+	       bio_op(bio) == REQ_OP_PROVISION;
 }
 
 static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..27bdf88f541c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -385,7 +385,10 @@ enum req_op {
 	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
 	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
 
-	REQ_OP_LAST		= (__force blk_opf_t)36,
+	/* request device to provision block */
+	REQ_OP_PROVISION        = (__force blk_opf_t)37,
+
+	REQ_OP_LAST		= (__force blk_opf_t)38,
 };
 
 enum req_flag_bits {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f17492..239e2f418b6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -303,6 +303,7 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int		max_provision_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
@@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
 
+extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
@@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
 	return q->limits.max_discard_segments;
 }
 
+static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
+{
+	return q->limits.max_provision_sectors;
+}
+
 static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 {
 	return q->limits.max_segment_size;
@@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
 	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
 }
 
+static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
+{
+	return bdev_get_queue(bdev)->limits.max_provision_sectors;
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
 	return blk_queue_zoned_model(bdev_get_queue(bdev));
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread
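
A minimal userspace sketch of the interface wired up above: a mode-0
fallocate() on an open block device maps to REQ_OP_PROVISION per the
blkdev_fallocate() hunk. The device path and length are illustrative;
a real caller would also probe provision_max_bytes first.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        /* Device path is illustrative; pass a real block device. */
        const char *dev = argc > 1 ? argv[1] : "/dev/sdX";
        int fd = open(dev, O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* mode 0 maps to REQ_OP_PROVISION per blkdev_fallocate() above;
         * provision the first 1 MiB of the device. */
        if (fallocate(fd, 0, 0, 1 << 20) < 0) {
                perror("fallocate");
                close(fd);
                return 1;
        }

        close(fd);
        return 0;
}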

* [PATCH v5 3/5] dm: Add block provisioning support
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 2/5] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-04-20  0:48     ` Sarthak Kukreti
  2023-04-20  0:48     ` [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add block provisioning support for device-mapper targets.
dm-crypt, dm-snap and dm-linear will, by default, pass through
REQ_OP_PROVISION requests to the underlying device, if
supported.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-crypt.c         |  5 ++++-
 drivers/md/dm-linear.c        |  2 ++
 drivers/md/dm-snap.c          |  8 ++++++++
 drivers/md/dm-table.c         | 23 +++++++++++++++++++++++
 drivers/md/dm.c               |  6 ++++++
 include/linux/device-mapper.h | 17 +++++++++++++++++
 6 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..aa8072d6d7bf 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3336,6 +3336,9 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		cc->tag_pool_max_sectors <<= cc->sector_shift;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	ret = -ENOMEM;
 	cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname);
 	if (!cc->io_queue) {
@@ -3390,7 +3393,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * - for REQ_OP_DISCARD caller must use flush if IO ordering matters
 	 */
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
-	    bio_op(bio) == REQ_OP_DISCARD)) {
+	    bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
 		bio_set_dev(bio, cc->dev->bdev);
 		if (bio_sectors(bio))
 			bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index f4448d520ee9..66e50f5b0665 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,8 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 9c49f53760d0..07927bb1e711 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1358,6 +1358,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (s->discard_zeroes_cow)
 		ti->num_discard_bios = (s->discard_passdown_origin ? 2 : 1);
 	ti->per_io_data_size = sizeof(struct dm_snap_tracked_chunk);
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 
 	/* Add snapshot to the list of snapshots for this origin */
 	/* Exceptions aren't triggered till snapshot_resume() is called */
@@ -2003,6 +2005,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	/* If the block is already remapped - use that, else remap it */
 	e = dm_lookup_exception(&s->complete, chunk);
 	if (e) {
+		if (unlikely(bio_op(bio) == REQ_OP_PROVISION)) {
+			bio_endio(bio);
+			r = DM_MAPIO_SUBMITTED;
+			goto out_unlock;
+		}
 		remap_exception(s, e, bio, chunk);
 		if (unlikely(bio_op(bio) == REQ_OP_DISCARD) &&
 		    io_overlaps_chunk(s, bio)) {
@@ -2413,6 +2420,7 @@ static void snapshot_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		/* All discards are split on chunk_size boundary */
 		limits->discard_granularity = snap->store->chunk_size;
 		limits->max_discard_sectors = snap->store->chunk_size;
+		limits->max_provision_sectors = snap->store->chunk_size;
 
 		up_read(&_origins_lock);
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 119db5e01080..9301f050529f 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1854,6 +1854,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
 	return true;
 }
 
+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+				    sector_t start, sector_t len, void *data)
+{
+	return !bdev_max_provision_sectors(dev->bdev);
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->provision_supported ||
+		    (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
 {
@@ -1987,6 +2007,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_zeroes(t))
 		q->limits.max_write_zeroes_sectors = 0;
 
+	if (!dm_table_supports_provision(t))
+		q->limits.max_provision_sectors = 0;
+
 	dm_table_verify_integrity(t);
 
 	/*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106..9b94121b8d38 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1609,6 +1609,7 @@ static bool is_abnormal_io(struct bio *bio)
 		case REQ_OP_DISCARD:
 		case REQ_OP_SECURE_ERASE:
 		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_PROVISION:
 			return true;
 		default:
 			break;
@@ -1641,6 +1642,11 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
 		if (ti->max_write_zeroes_granularity)
 			max_granularity = limits->max_write_zeroes_sectors;
 		break;
+	case REQ_OP_PROVISION:
+		num_bios = ti->num_provision_bios;
+		if (ti->max_provision_granularity)
+			max_granularity = limits->max_provision_sectors;
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a52d2b9a6846..9981378457d2 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -334,6 +334,12 @@ struct dm_target {
 	 */
 	unsigned int num_write_zeroes_bios;
 
+	/*
+	 * The number of PROVISION bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned int num_provision_bios;
+
 	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
@@ -358,6 +364,11 @@ struct dm_target {
 	 */
 	bool discards_supported:1;
 
+	/* Set if this target needs to receive provision requests regardless of
+	 * whether or not its underlying devices have support.
+	 */
+	bool provision_supported:1;
+
 	/*
 	 * Set if this target requires that discards be split on
 	 * 'max_discard_sectors' boundaries.
@@ -376,6 +387,12 @@ struct dm_target {
 	 */
 	bool max_write_zeroes_granularity:1;
 
+	/*
+	 * Set if this target requires that provisions be split on
+	 * 'max_provision_sectors' boundaries.
+	 */
+	bool max_provision_granularity:1;
+
 	/*
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread
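
The opt-in pattern used by the hunks above reduces to two fields set in
a target's constructor. A sketch, with the target name and surrounding
setup hypothetical; the two fields are the ones this patch adds:

#include <linux/device-mapper.h>

static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
        /* ... target-specific parsing and setup elided ... */

        /* Submit one REQ_OP_PROVISION bio per target and pass it
         * through even if the underlying device reports no provision
         * support. */
        ti->num_provision_bios = 1;
        ti->provision_supported = true;

        return 0;
}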

* [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
                       ` (2 preceding siblings ...)
  2023-04-20  0:48     ` [PATCH v5 3/5] dm: Add block provisioning support Sarthak Kukreti
@ 2023-04-20  0:48     ` Sarthak Kukreti
  2023-05-01 19:15       ` Mike Snitzer
  2023-04-20  0:48     ` [PATCH v5 5/5] loop: Add support for provision requests Sarthak Kukreti
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
  5 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

dm-thinpool uses the provision request to provision
blocks for a dm-thin device. dm-thinpool currently does not
pass through REQ_OP_PROVISION to underlying devices.

For shared blocks, provision requests will break sharing and copy the
contents of the entire block. Additionally, if 'skip_block_zeroing'
is not set, dm-thin will opt to zero out the entire range as a part
of provisioning.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-thin.c | 73 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 2b13c949bd72..58d633f5c928 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -274,6 +274,7 @@ struct pool {
 
 	process_bio_fn process_bio;
 	process_bio_fn process_discard;
+	process_bio_fn process_provision;
 
 	process_cell_fn process_cell;
 	process_cell_fn process_discard_cell;
@@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context,
 	struct bio *bio;
 
 	while ((bio = bio_list_pop(&cell->bios))) {
-		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
+		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+		    bio_op(bio) == REQ_OP_PROVISION)
 			bio_list_add(&info->defer_bios, bio);
 		else {
 			inc_all_io_entry(info->tc->pool, bio);
@@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
 
 static int io_overwrites_block(struct pool *pool, struct bio *bio)
 {
-	return (bio_data_dir(bio) == WRITE) &&
-		io_overlaps_block(pool, bio);
+	return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) &&
+	       bio_op(bio) != REQ_OP_PROVISION;
 }
 
 static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
@@ -1891,7 +1893,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
 
 	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
 		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
-		cell_defer_no_holder(tc, virt_cell);
+		if (bio_op(bio) != REQ_OP_PROVISION)
+			cell_defer_no_holder(tc, virt_cell);
 	} else {
 		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
 
@@ -1953,6 +1956,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	}
 }
 
+static void process_provision_bio(struct thin_c *tc, struct bio *bio)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	dm_block_t block = get_bio_block(tc, bio);
+	struct dm_bio_prison_cell *cell;
+	struct dm_cell_key key;
+	struct dm_thin_lookup_result lookup_result;
+
+	/*
+	 * If cell is already occupied, then the block is already
+	 * being provisioned so we have nothing further to do here.
+	 */
+	build_virtual_key(tc->td, block, &key);
+	if (bio_detain(pool, &key, bio, &cell))
+		return;
+
+	if (tc->requeue_mode) {
+		cell_requeue(pool, cell);
+		return;
+	}
+
+	r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
+	switch (r) {
+	case 0:
+		if (lookup_result.shared) {
+			process_shared_bio(tc, bio, block, &lookup_result, cell);
+		} else {
+			bio_endio(bio);
+			cell_defer_no_holder(tc, cell);
+		}
+		break;
+	case -ENODATA:
+		provision_block(tc, bio, block, cell);
+		break;
+
+	default:
+		DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d",
+			    __func__, r);
+		cell_defer_no_holder(tc, cell);
+		bio_io_error(bio);
+		break;
+	}
+}
+
 static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 {
 	int r;
@@ -2228,6 +2276,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
 
 		if (bio_op(bio) == REQ_OP_DISCARD)
 			pool->process_discard(tc, bio);
+		else if (bio_op(bio) == REQ_OP_PROVISION)
+			pool->process_provision(tc, bio);
 		else
 			pool->process_bio(tc, bio);
 
@@ -2579,6 +2629,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_fail;
 		pool->process_discard = process_bio_fail;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_fail;
 		pool->process_discard_cell = process_cell_fail;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2592,6 +2643,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_bio_success;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_discard_cell = process_cell_success;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2612,6 +2664,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		pool->out_of_data_space = true;
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2628,6 +2681,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_write(pool->pmd);
 		pool->process_bio = process_bio;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_provision_bio;
 		pool->process_cell = process_cell;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2749,7 +2803,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+	    bio_op(bio) == REQ_OP_PROVISION) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -3396,6 +3451,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	pt->adjusted_pf = pt->requested_pf = pf;
 	ti->num_flush_bios = 1;
 	ti->limit_swap_bios = true;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+	ti->max_provision_granularity = true;
 
 	/*
 	 * Only need to enable discards if the pool should pass
@@ -4288,6 +4346,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->max_discard_granularity = true;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	mutex_unlock(&dm_thin_pool_table.mutex);
 
 	spin_lock_irq(&tc->pool->lock);
@@ -4502,6 +4563,8 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->max_discard_sectors = pool->sectors_per_block * BIO_PRISON_MAX_RANGE;
+
+	limits->max_provision_sectors = pool->sectors_per_block;
 }
 
 static struct target_type thin_target = {
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread
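
Since thin_io_hints() above advertises pool->sectors_per_block as the
provision limit, userspace can probe support before issuing requests by
reading the queue attribute added earlier in the series. A sketch; the
disk name is illustrative, and a missing attribute or a value of 0
means no support:

#include <stdio.h>

/* Returns the largest provision request in bytes, or 0 if the kernel
 * or device does not support REQ_OP_PROVISION. */
static unsigned long long provision_max_bytes(const char *disk)
{
        char path[256];
        unsigned long long max = 0;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/provision_max_bytes", disk);
        f = fopen(path, "r");
        if (!f)
                return 0;       /* attribute absent: no kernel support */
        if (fscanf(f, "%llu", &max) != 1)
                max = 0;
        fclose(f);
        return max;
}

int main(void)
{
        printf("dm-0: %llu bytes per provision request\n",
               provision_max_bytes("dm-0"));
        return 0;
}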

* [PATCH v5 5/5] loop: Add support for provision requests
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
                       ` (3 preceding siblings ...)
  2023-04-20  0:48     ` [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
@ 2023-04-20  0:48     ` Sarthak Kukreti
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
  5 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:48 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support the
provision request; upon receiving a provision bio, the loop
device will map it to the backing device/storage. For loop
devices over files, a REQ_OP_PROVISION request will translate
to a fallocate mode 0 call on the backing file.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2..13c4b4f8b9c1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 	return ret;
 }
 
+static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
+{
+	struct file *file = lo->lo_backing_file;
+	struct request_queue *q = lo->lo_queue;
+	int ret;
+
+	if (!q->limits.max_provision_sectors) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
+	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
+		ret = -EIO;
+ out:
+	return ret;
+}
+
 static int lo_req_flush(struct loop_device *lo, struct request *rq)
 {
 	int ret = vfs_fsync(lo->lo_backing_file, 0);
@@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 				FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+	case REQ_OP_PROVISION:
+		return lo_req_provision(lo, rq, pos);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static void loop_config_provision(struct loop_device *lo)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+
+	/*
+	 * If the backing device is a block device, mirror its provisioning
+	 * capability.
+	 */
+	if (S_ISBLK(inode->i_mode)) {
+		blk_queue_max_provision_sectors(lo->lo_queue,
+			bdev_max_provision_sectors(I_BDEV(inode)));
+	} else if (file->f_op->fallocate) {
+		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+	} else {
+		blk_queue_max_provision_sectors(lo->lo_queue, 0);
+	}
+}
+
 static void loop_config_discard(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	blk_queue_io_min(lo->lo_queue, bsize);
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	}
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 
 	/* update dio if lo_offset or transfer is changed */
 	__loop_update_dio(lo, lo->use_dio);
@@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		cmd->use_aio = false;
 		break;
 	default:
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread
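
To observe the translation end-to-end, one can attach a loop device to
a sparse file and issue a mode-0 fallocate() against the loop device;
st_blocks of the backing file should grow once the request is
forwarded. A sketch with paths and sizes illustrative, error handling
trimmed, and root privileges assumed:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        /* Create a 1 GiB sparse backing file; path is illustrative. */
        int bfd = open("backing.img", O_RDWR | O_CREAT, 0600);
        ftruncate(bfd, 1LL << 30);

        /* Grab a free loop device and attach the backing file. */
        int ctrl = open("/dev/loop-control", O_RDWR);
        int nr = ioctl(ctrl, LOOP_CTL_GET_FREE);
        char dev[32];
        snprintf(dev, sizeof(dev), "/dev/loop%d", nr);
        int lfd = open(dev, O_RDWR);
        ioctl(lfd, LOOP_SET_FD, bfd);

        /* mode 0 -> REQ_OP_PROVISION -> fallocate(backing_file, 0, ...) */
        if (fallocate(lfd, 0, 0, 1 << 20) < 0)
                perror("fallocate");

        struct stat st;
        fstat(bfd, &st);
        printf("backing file blocks: %lld\n", (long long)st.st_blocks);

        ioctl(lfd, LOOP_CLR_FD, 0);
        return 0;
}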

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-19 23:21             ` Dave Chinner
@ 2023-04-20  0:53               ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  0:53 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Mike Snitzer, dm-devel, linux-block, linux-ext4,
	linux-kernel, linux-fsdevel, Jens Axboe, Theodore Ts'o,
	Michael S. Tsirkin, Jason Wang, Bart Van Assche,
	Christoph Hellwig, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi,
	Brian Foster, Alasdair Kergon

On Wed, Apr 19, 2023 at 4:21 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Apr 19, 2023 at 10:26:02AM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 19, 2023 at 12:17:34PM -0400, Mike Snitzer wrote:
> > > (And obviously needs fixing independent of this patchset)
> > >
> > > Shouldn't mkfs first check that the underlying storage supports
> > > REQ_OP_PROVISION by verifying
> > > /sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
> > > (Just saying, we need to add new features more defensively.. you just
> > > made the case based on this scenario's implications alone)
> >
> > Not for fallocate -- for regular files, there's no way to check if the
> > filesystem actually supports the operation requested other than to try
> > it and see if it succeeds.  We probably should've defined a DRY_RUN flag
> > for that purpose back when it was introduced.
>
> That ignores the fact that fallocate() was never intended to
> guarantee it will work in all contexts - it's an advisory interface
> at its most basic level. If the call succeeds, then great, it does
> what it says on the box, but if it fails then it should have no
> visible effect on user data at all and the application still needs
> to perform whatever modification it needed done in some other way.
>
> IOWs, calling it on a block device without first checking if the
> block device supports that fallocate operation is exactly how it is
> supposed to be used. If the kernel can't handle this gracefully
> without corrupting data, then that's a kernel bug and not an
> application problem.
>
> > For fallocate calls to block devices, yes, the program can check the
> > queue limits in sysfs if fstat says the supplied path is a block device,
> > but I don't know that most programs are that thorough.  The fallocate(1)
> > CLI program does not check.
>
> Right. fallocate() was intended to just do the right thing without
> the application having to jump thrown an unknown number of hoops to
> determine if fallocate() can be used or not in the context it is
> executing in.  The kernel implementation is supposed to abstract all that
> context-dependent behaviour away from the application; all the
> application has to do is implement the fallocate() fast path and a
> single generic "do the right thing the slow way" fallback when the
> fallocate() method it called is not supported...
>
I added a separate commit[1] to fix this so that we only call
truncate_bdev_range() if we are in a supported de-allocate mode call.
As a result, the REQ_OP_PROVISION patch is a bit simpler when rebased
on top.

[1] https://www.spinics.net/lists/kernel/msg4765688.html

Best
Sarthak

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 57+ messages in thread
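
The contract described here is the usual shape of userspace fallocate()
handling: try the fast path, and fall back to doing the work another
way if the kernel reports the operation is unsupported. A sketch of
that pattern; the zero-filling fallback is illustrative and only safe
on ranges known to hold no data:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Fast path via fallocate(); generic slow path via explicit writes. */
static int preallocate(int fd, off_t off, off_t len)
{
        char buf[65536];

        if (fallocate(fd, 0, off, len) == 0)
                return 0;               /* fast path worked */
        if (errno != EOPNOTSUPP && errno != ENOSYS)
                return -errno;          /* a real failure */

        /* Slow path: force allocation by writing explicit zeroes.
         * This clobbers existing contents, so only use it on ranges
         * the application knows are unwritten. */
        memset(buf, 0, sizeof(buf));
        while (len > 0) {
                size_t n = len > (off_t)sizeof(buf) ?
                           sizeof(buf) : (size_t)len;
                ssize_t w = pwrite(fd, buf, n, off);

                if (w < 0)
                        return -errno;
                off += w;
                len -= w;
        }
        return 0;
}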

* Re: [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
@ 2023-04-20  1:22       ` Darrick J. Wong
  2023-04-20  1:48         ` Sarthak Kukreti
  2023-04-20  1:47       ` [PATCH v5-fix " Sarthak Kukreti
  2023-04-24 15:54       ` [PATCH v5 " kernel test robot
  2 siblings, 1 reply; 57+ messages in thread
From: Darrick J. Wong @ 2023-04-20  1:22 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev

On Wed, Apr 19, 2023 at 05:48:46PM -0700, Sarthak Kukreti wrote:
> Only call truncate_bdev_range() if the fallocate mode is
> supported. This fixes a bug where data in the pagecache
> could be invalidated if fallocate() was called on the
> block device with an invalid mode.
> 
> Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/fops.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..2fd7e8b9ab48 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -648,25 +648,27 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> -	/* Invalidate the page cache, including dirty pages. */
> -	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -	if (error)
> -		goto fail;
> -
> +	/*
> +	 * Invalidate the page cache, including dirty pages, for valid
> +	 * de-allocate mode calls to fallocate().
> +	 */
>  	switch (mode) {
>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> -		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> +			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,

I'm pretty sure we're supposed to preserve the error codes returned by
both functions.

	error = truncate_bdev_range(...);
	if (!error)
		error = blkdev_issue_zeroout(...);

--D

>  					     len >> SECTOR_SHIFT, GFP_KERNEL,
>  					     BLKDEV_ZERO_NOUNMAP);
>  		break;
>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> -		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> +			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL,
>  					     BLKDEV_ZERO_NOFALLBACK);
>  		break;
>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> -		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> +			blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL);
>  		break;
>  	default:
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread
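
The point generalizes beyond style: in C, x() || y() evaluates to the
int 0 or 1, so assigning the result to error replaces any negative
errno with 1. A small illustration with hypothetical stand-ins for the
two helpers:

#include <stdio.h>

/* Stand-ins for truncate_bdev_range() and blkdev_issue_zeroout():
 * both report failure as a negative errno. */
static int step_a(void) { return -16; }  /* -EBUSY */
static int step_b(void) { return 0; }

int main(void)
{
        int error;

        error = step_a() || step_b();  /* || collapses -EBUSY to 1 */
        printf("with ||:        error = %d\n", error);

        error = step_a();              /* preserves -EBUSY */
        if (!error)
                error = step_b();
        printf("with if (!...): error = %d\n", error);

        return 0;
}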

* [PATCH v5-fix 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
  2023-04-20  1:22       ` Darrick J. Wong
@ 2023-04-20  1:47       ` Sarthak Kukreti
  2023-04-20 16:20         ` Mike Snitzer
  2023-04-24 15:54       ` [PATCH v5 " kernel test robot
  2 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  1:47 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Only call truncate_bdev_range() if the fallocate mode is
supported. This fixes a bug where data in the pagecache
could be invalidated if fallocate() was called on the
block device with an invalid mode.

Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/fops.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..d359254c645d 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -648,26 +648,37 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
-
+	/*
+	 * Invalidate the page cache, including dirty pages, for valid
+	 * de-allocate mode calls to fallocate().
+	 */
 	switch (mode) {
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
-					     len >> SECTOR_SHIFT, GFP_KERNEL,
-					     BLKDEV_ZERO_NOUNMAP);
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (!error)
+			error = blkdev_issue_zeroout(bdev,
+						     start >> SECTOR_SHIFT,
+						     len >> SECTOR_SHIFT,
+						     GFP_KERNEL,
+						     BLKDEV_ZERO_NOUNMAP);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
-					     len >> SECTOR_SHIFT, GFP_KERNEL,
-					     BLKDEV_ZERO_NOFALLBACK);
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (!error)
+			error = blkdev_issue_zeroout(bdev,
+						     start >> SECTOR_SHIFT,
+						     len >> SECTOR_SHIFT,
+						     GFP_KERNEL,
+						     BLKDEV_ZERO_NOFALLBACK);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
-		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					     len >> SECTOR_SHIFT, GFP_KERNEL);
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (!error)
+			error = blkdev_issue_discard(bdev,
+						     start >> SECTOR_SHIFT,
+						     len >> SECTOR_SHIFT,
+						     GFP_KERNEL);
 		break;
 	default:
 		error = -EOPNOTSUPP;
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  1:22       ` Darrick J. Wong
@ 2023-04-20  1:48         ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20  1:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev

Sorry, I tried to be too concise :) Updated with a fixed-up patch!

Best
Sarthak

On Wed, Apr 19, 2023 at 6:22 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Apr 19, 2023 at 05:48:46PM -0700, Sarthak Kukreti wrote:
> > Only call truncate_bdev_range() if the fallocate mode is
> > supported. This fixes a bug where data in the pagecache
> > could be invalidated if fallocate() was called on the
> > block device with an invalid mode.
> >
> > Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  block/fops.c | 18 ++++++++++--------
> >  1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/block/fops.c b/block/fops.c
> > index d2e6be4e3d1c..2fd7e8b9ab48 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -648,25 +648,27 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >
> >       filemap_invalidate_lock(inode->i_mapping);
> >
> > -     /* Invalidate the page cache, including dirty pages. */
> > -     error = truncate_bdev_range(bdev, file->f_mode, start, end);
> > -     if (error)
> > -             goto fail;
> > -
> > +     /*
> > +      * Invalidate the page cache, including dirty pages, for valid
> > +      * de-allocate mode calls to fallocate().
> > +      */
> >       switch (mode) {
> >       case FALLOC_FL_ZERO_RANGE:
> >       case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> > -             error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> > +             error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> > +                     blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>
> I'm pretty sure we're supposed to preserve the error codes returned by
> both functions.
>
>         error = truncate_bdev_range(...);
>         if (!error)
>                 error = blkdev_issue_zeroout(...);
>
> --D
>
> >                                            len >> SECTOR_SHIFT, GFP_KERNEL,
> >                                            BLKDEV_ZERO_NOUNMAP);
> >               break;
> >       case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> > -             error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> > +             error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> > +                     blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> >                                            len >> SECTOR_SHIFT, GFP_KERNEL,
> >                                            BLKDEV_ZERO_NOFALLBACK);
> >               break;
> >       case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> > -             error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> > +             error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
> > +                     blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> >                                            len >> SECTOR_SHIFT, GFP_KERNEL);
> >               break;
> >       default:
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5-fix 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  1:47       ` [PATCH v5-fix " Sarthak Kukreti
@ 2023-04-20 16:20         ` Mike Snitzer
  2023-04-20 17:28           ` Sarthak Kukreti
  2023-04-20 18:15           ` Sarthak Kukreti
  0 siblings, 2 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-04-20 16:20 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Brian Foster,
	Alasdair Kergon

On Wed, Apr 19 2023 at  9:47P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Only call truncate_bdev_range() if the fallocate mode is
> supported. This fixes a bug where data in the pagecache
> could be invalidated if fallocate() was called on the
> block device with an invalid mode.
> 
> Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")

You should add:

Cc: stable@vger.kernel.org
Reported-by: Darrick J. Wong <djwong@kernel.org>

> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/fops.c | 37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..d359254c645d 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -648,26 +648,37 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> -	/* Invalidate the page cache, including dirty pages. */
> -	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -	if (error)
> -		goto fail;
> -

You remove the only user of the 'fail' label.  But I think it'd be
cleaner to keep using it below (reduces indentation churn too).

> +	/*
> +	 * Invalidate the page cache, including dirty pages, for valid
> +	 * de-allocate mode calls to fallocate().
> +	 */
>  	switch (mode) {
>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> -		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> -					     len >> SECTOR_SHIFT, GFP_KERNEL,
> -					     BLKDEV_ZERO_NOUNMAP);
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (!error)
> +			error = blkdev_issue_zeroout(bdev,
> +						     start >> SECTOR_SHIFT,
> +						     len >> SECTOR_SHIFT,
> +						     GFP_KERNEL,
> +						     BLKDEV_ZERO_NOUNMAP);
>  		break;


So:

		error = truncate_bdev_range(bdev, file->f_mode, start, end);
		if (error)
		        goto fail;
		...


>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> -		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
> -					     len >> SECTOR_SHIFT, GFP_KERNEL,
> -					     BLKDEV_ZERO_NOFALLBACK);
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (!error)
> +			error = blkdev_issue_zeroout(bdev,
> +						     start >> SECTOR_SHIFT,
> +						     len >> SECTOR_SHIFT,
> +						     GFP_KERNEL,
> +						     BLKDEV_ZERO_NOFALLBACK);
>  		break;

Same.

>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> -		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> -					     len >> SECTOR_SHIFT, GFP_KERNEL);
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (!error)
> +			error = blkdev_issue_discard(bdev,
> +						     start >> SECTOR_SHIFT,
> +						     len >> SECTOR_SHIFT,
> +						     GFP_KERNEL);
>  		break;

Same.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v5-fix 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20 16:20         ` Mike Snitzer
@ 2023-04-20 17:28           ` Sarthak Kukreti
  2023-04-20 18:17             ` Sarthak Kukreti
  2023-04-20 18:15           ` Sarthak Kukreti
  1 sibling, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20 17:28 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong, stable

Only call truncate_bdev_range() if the fallocate mode is
supported. This fixes a bug where data in the pagecache
could be invalidated if fallocate() was called on the
block device with an invalid mode.

Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
Cc: stable@vger.kernel.org
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/fops.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..20b1eddcbe25 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -648,24 +648,35 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
-
+	/*
+	 * Invalidate the page cache, including dirty pages, for valid
+	 * de-allocate mode calls to fallocate().
+	 */
 	switch (mode) {
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOUNMAP);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOFALLBACK);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (!error)
+			goto fail;
+
 		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL);
 		break;
-- 
2.40.0.396.gfff15efe05-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/4] block: Introduce provisioning primitives
  2023-04-18 22:43       ` Bart Van Assche
@ 2023-04-20 17:41         ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20 17:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong

Dropped in v5.

Thanks!
Sarthak

On Tue, Apr 18, 2023 at 3:43 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 4/18/23 15:12, Sarthak Kukreti wrote:
> >       /* Fail if we don't recognize the flags. */
> > -     if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> > +     if (mode != 0 && mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> >               return -EOPNOTSUPP;
>
> Is this change necessary? Doesn't (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> != 0 imply that mode != 0?
>
> Thanks,
>
> Bart.
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v5-fix 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20 16:20         ` Mike Snitzer
  2023-04-20 17:28           ` Sarthak Kukreti
@ 2023-04-20 18:15           ` Sarthak Kukreti
  1 sibling, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20 18:15 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong, stable

Only call truncate_bdev_range() if the fallocate mode is
supported. This fixes a bug where data in the pagecache
could be invalidated if fallocate() was called on the
block device with an invalid mode.

Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
Cc: stable@vger.kernel.org
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/fops.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..4c70fdc546e7 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -648,24 +648,35 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
-
+	/*
+	 * Invalidate the page cache, including dirty pages, for valid
+	 * de-allocate mode calls to fallocate().
+	 */
 	switch (mode) {
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOUNMAP);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOFALLBACK);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL);
 		break;
-- 
2.40.0.396.gfff15efe05-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5-fix 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20 17:28           ` Sarthak Kukreti
@ 2023-04-20 18:17             ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-04-20 18:17 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, Daniil Lunev,
	Darrick J. Wong, stable

This patch had a slight typo; fixed in the most recent patch.

- Sarthak

On Thu, Apr 20, 2023 at 10:28 AM Sarthak Kukreti
<sarthakkukreti@chromium.org> wrote:
>
> Only call truncate_bdev_range() if the fallocate mode is
> supported. This fixes a bug where data in the pagecache
> could be invalidated if fallocate() was called on the
> block device with an invalid mode.
>
> Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
> Cc: stable@vger.kernel.org
> Reported-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/fops.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..20b1eddcbe25 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -648,24 +648,35 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>
>         filemap_invalidate_lock(inode->i_mapping);
>
> -       /* Invalidate the page cache, including dirty pages. */
> -       error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -       if (error)
> -               goto fail;
> -
> +       /*
> +        * Invalidate the page cache, including dirty pages, for valid
> +        * de-allocate mode calls to fallocate().
> +        */
>         switch (mode) {
>         case FALLOC_FL_ZERO_RANGE:
>         case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> +               error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +               if (error)
> +                       goto fail;
> +
>                 error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>                                              len >> SECTOR_SHIFT, GFP_KERNEL,
>                                              BLKDEV_ZERO_NOUNMAP);
>                 break;
>         case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> +               error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +               if (error)
> +                       goto fail;
> +
>                 error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>                                              len >> SECTOR_SHIFT, GFP_KERNEL,
>                                              BLKDEV_ZERO_NOFALLBACK);
>                 break;
>         case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> +               error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +               if (!error)
> +                       goto fail;
> +
>                 error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>                                              len >> SECTOR_SHIFT, GFP_KERNEL);
>                 break;
> --
> 2.40.0.396.gfff15efe05-goog
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-04-20  0:48     ` [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
  2023-04-20  1:22       ` Darrick J. Wong
  2023-04-20  1:47       ` [PATCH v5-fix " Sarthak Kukreti
@ 2023-04-24 15:54       ` kernel test robot
  2 siblings, 0 replies; 57+ messages in thread
From: kernel test robot @ 2023-04-24 15:54 UTC (permalink / raw)
  To: Sarthak Kukreti, dm-devel, linux-block, linux-ext4, linux-kernel,
	linux-fsdevel
  Cc: llvm, oe-kbuild-all, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Stefan Hajnoczi, Alasdair Kergon, Mike Snitzer,
	Christoph Hellwig, Brian Foster, Theodore Ts'o,
	Andreas Dilger, Bart Van Assche, Daniil Lunev, Darrick J. Wong

Hi Sarthak,

kernel test robot noticed the following build warnings:

[auto build test WARNING on device-mapper-dm/for-next]
[also build test WARNING on linus/master v6.3 next-20230421]
[cannot apply to axboe-block/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Sarthak-Kukreti/block-Introduce-provisioning-primitives/20230420-095025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git for-next
patch link:    https://lore.kernel.org/r/20230420004850.297045-2-sarthakkukreti%40chromium.org
patch subject: [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes
config: hexagon-randconfig-r006-20230424 (https://download.01.org/0day-ci/archive/20230424/202304242302.5zYRfUub-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 437b7602e4a998220871de78afcb020b9c14a661)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/8bd0744b438be1722c5f8c1fe077e9dcef0e81b7
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Sarthak-Kukreti/block-Introduce-provisioning-primitives/20230420-095025
        git checkout 8bd0744b438be1722c5f8c1fe077e9dcef0e81b7
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304242302.5zYRfUub-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from block/fops.c:9:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from block/fops.c:9:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from block/fops.c:9:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> block/fops.c:678:2: warning: unused label 'fail' [-Wunused-label]
    fail:
    ^~~~~
   7 warnings generated.


vim +/fail +678 block/fops.c

cd82cca7ebfe9c Christoph Hellwig 2021-09-07  613  
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  614  #define	BLKDEV_FALLOC_FL_SUPPORTED					\
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  615  		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  616  		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  617  
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  618  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  619  			     loff_t len)
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  620  {
f278eb3d8178f9 Ming Lei          2021-09-23  621  	struct inode *inode = bdev_file_inode(file);
f278eb3d8178f9 Ming Lei          2021-09-23  622  	struct block_device *bdev = I_BDEV(inode);
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  623  	loff_t end = start + len - 1;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  624  	loff_t isize;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  625  	int error;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  626  
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  627  	/* Fail if we don't recognize the flags. */
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  628  	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  629  		return -EOPNOTSUPP;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  630  
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  631  	/* Don't go off the end of the device. */
2a93ad8fcb377b Christoph Hellwig 2021-10-18  632  	isize = bdev_nr_bytes(bdev);
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  633  	if (start >= isize)
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  634  		return -EINVAL;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  635  	if (end >= isize) {
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  636  		if (mode & FALLOC_FL_KEEP_SIZE) {
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  637  			len = isize - start;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  638  			end = start + len - 1;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  639  		} else
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  640  			return -EINVAL;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  641  	}
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  642  
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  643  	/*
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  644  	 * Don't allow IO that isn't aligned to logical block size.
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  645  	 */
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  646  	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  647  		return -EINVAL;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  648  
f278eb3d8178f9 Ming Lei          2021-09-23  649  	filemap_invalidate_lock(inode->i_mapping);
f278eb3d8178f9 Ming Lei          2021-09-23  650  
8bd0744b438be1 Sarthak Kukreti   2023-04-19  651  	/*
8bd0744b438be1 Sarthak Kukreti   2023-04-19  652  	 * Invalidate the page cache, including dirty pages, for valid
8bd0744b438be1 Sarthak Kukreti   2023-04-19  653  	 * de-allocate mode calls to fallocate().
8bd0744b438be1 Sarthak Kukreti   2023-04-19  654  	 */
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  655  	switch (mode) {
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  656  	case FALLOC_FL_ZERO_RANGE:
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  657  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
8bd0744b438be1 Sarthak Kukreti   2023-04-19  658  		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
8bd0744b438be1 Sarthak Kukreti   2023-04-19  659  			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
6549a874fb65e7 Pavel Begunkov    2021-10-20  660  					     len >> SECTOR_SHIFT, GFP_KERNEL,
6549a874fb65e7 Pavel Begunkov    2021-10-20  661  					     BLKDEV_ZERO_NOUNMAP);
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  662  		break;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  663  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
8bd0744b438be1 Sarthak Kukreti   2023-04-19  664  		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
8bd0744b438be1 Sarthak Kukreti   2023-04-19  665  			blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
6549a874fb65e7 Pavel Begunkov    2021-10-20  666  					     len >> SECTOR_SHIFT, GFP_KERNEL,
6549a874fb65e7 Pavel Begunkov    2021-10-20  667  					     BLKDEV_ZERO_NOFALLBACK);
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  668  		break;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  669  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
8bd0744b438be1 Sarthak Kukreti   2023-04-19  670  		error = truncate_bdev_range(bdev, file->f_mode, start, end) ||
8bd0744b438be1 Sarthak Kukreti   2023-04-19  671  			blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
44abff2c0b970a Christoph Hellwig 2022-04-15  672  					     len >> SECTOR_SHIFT, GFP_KERNEL);
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  673  		break;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  674  	default:
f278eb3d8178f9 Ming Lei          2021-09-23  675  		error = -EOPNOTSUPP;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  676  	}
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  677  
f278eb3d8178f9 Ming Lei          2021-09-23 @678   fail:
f278eb3d8178f9 Ming Lei          2021-09-23  679  	filemap_invalidate_unlock(inode->i_mapping);
f278eb3d8178f9 Ming Lei          2021-09-23  680  	return error;
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  681  }
cd82cca7ebfe9c Christoph Hellwig 2021-09-07  682  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-04-20  0:48     ` [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
@ 2023-05-01 19:15       ` Mike Snitzer
  2023-05-06  6:32         ` Sarthak Kukreti
  0 siblings, 1 reply; 57+ messages in thread
From: Mike Snitzer @ 2023-05-01 19:15 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Brian Foster,
	Alasdair Kergon, ejt

On Wed, Apr 19 2023 at  8:48P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
> 
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block. Additionally, if 'skip_block_zeroing'
> is not set, dm-thin will opt to zero out the entire range as a part
> of provisioning.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/md/dm-thin.c | 73 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 2b13c949bd72..58d633f5c928 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1891,7 +1893,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
>  
>  	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
>  		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> -		cell_defer_no_holder(tc, virt_cell);
> +		if (bio_op(bio) != REQ_OP_PROVISION)
> +			cell_defer_no_holder(tc, virt_cell);

Can you please explain why cell_defer_no_holder() is skipped for REQ_OP_PROVISION here?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v6 0/5] Introduce block provisioning primitives
  2023-04-20  0:48   ` [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
                       ` (4 preceding siblings ...)
  2023-04-20  0:48     ` [PATCH v5 5/5] loop: Add support for provision requests Sarthak Kukreti
@ 2023-05-06  6:29     ` Sarthak Kukreti
  2023-05-06  6:29       ` [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
                         ` (5 more replies)
  5 siblings, 6 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

Hi,

This patch series covers iteration 6 of adding support for block
provisioning requests.

Changes from v5:
- Remove explicit supports_provision from dm devices.
- Move the provision sectors I/O hint to pool_io_hints(). Other
  devices will derive the provisioning limits from the stack.
- Remove the v4 leftover that skipped cell_defer_no_holder() for
  REQ_OP_PROVISION.
- Fix blkdev_fallocate() to propagate errors correctly when called
  with invalid fallocate modes.

Sarthak Kukreti (5):
  block: Don't invalidate pagecache for invalid falloc modes
  block: Introduce provisioning primitives
  dm: Add block provisioning support
  dm-thin: Add REQ_OP_PROVISION support
  loop: Add support for provision requests

 block/blk-core.c              |  5 +++
 block/blk-lib.c               | 53 ++++++++++++++++++++++++++
 block/blk-merge.c             | 18 +++++++++
 block/blk-settings.c          | 19 ++++++++++
 block/blk-sysfs.c             |  9 +++++
 block/bounce.c                |  1 +
 block/fops.c                  | 31 +++++++++++++---
 drivers/block/loop.c          | 42 +++++++++++++++++++++
 drivers/md/dm-crypt.c         |  4 +-
 drivers/md/dm-linear.c        |  1 +
 drivers/md/dm-snap.c          |  7 ++++
 drivers/md/dm-table.c         | 23 ++++++++++++
 drivers/md/dm-thin.c          | 70 +++++++++++++++++++++++++++++++++--
 drivers/md/dm.c               |  6 +++
 include/linux/bio.h           |  6 ++-
 include/linux/blk_types.h     |  5 ++-
 include/linux/blkdev.h        | 16 ++++++++
 include/linux/device-mapper.h | 17 +++++++++
 18 files changed, 319 insertions(+), 14 deletions(-)

-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
@ 2023-05-06  6:29       ` Sarthak Kukreti
  2023-05-09 16:51         ` Mike Snitzer
  2023-05-12 18:31         ` Darrick J. Wong
  2023-05-06  6:29       ` [PATCH v6 2/5] block: Introduce provisioning primitives Sarthak Kukreti
                         ` (4 subsequent siblings)
  5 siblings, 2 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong, stable

Only call truncate_bdev_range() if the fallocate mode is
supported. This fixes a bug where data in the pagecache
could be invalidated if fallocate() was called on the
block device with an invalid mode.
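
A minimal sketch of the pre-fix behavior (the device path is
hypothetical; FALLOC_FL_KEEP_SIZE alone passes the supported-flags
check but falls through to the default -EOPNOTSUPP case):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/sdX", O_RDWR); /* hypothetical */

		if (fd < 0)
			return 1;
		/*
		 * Before this fix, the unhandled mode below still dropped
		 * the device's page cache before returning -EOPNOTSUPP.
		 */
		if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 20))
			perror("fallocate");
		close(fd);
		return 0;
	}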

Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
Cc: stable@vger.kernel.org
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/fops.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c..4c70fdc546e7 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -648,24 +648,35 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-	/* Invalidate the page cache, including dirty pages. */
-	error = truncate_bdev_range(bdev, file->f_mode, start, end);
-	if (error)
-		goto fail;
-
+	/*
+	 * Invalidate the page cache, including dirty pages, for valid
+	 * de-allocate mode calls to fallocate().
+	 */
 	switch (mode) {
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOUNMAP);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL,
 					     BLKDEV_ZERO_NOFALLBACK);
 		break;
 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = truncate_bdev_range(bdev, file->f_mode, start, end);
+		if (error)
+			goto fail;
+
 		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					     len >> SECTOR_SHIFT, GFP_KERNEL);
 		break;
-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v6 2/5] block: Introduce provisioning primitives
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
  2023-05-06  6:29       ` [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
@ 2023-05-06  6:29       ` Sarthak Kukreti
  2023-05-09 16:52         ` Mike Snitzer
  2023-05-12 18:37         ` Darrick J. Wong
  2023-05-06  6:29       ` [PATCH v6 3/5] dm: Add block provisioning support Sarthak Kukreti
                         ` (3 subsequent siblings)
  5 siblings, 2 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

Introduce block request REQ_OP_PROVISION. The intent of this request
is to request underlying storage to preallocate disk space for the given
block range. Block devices that support this capability will export
a provision limit within their request queues.

This patch also adds the capability to call fallocate() in mode 0
on block devices, which will send REQ_OP_PROVISION to the block
device for the specified range.
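
For illustration, a minimal user-space sketch (device path and size are
hypothetical; offset and length must be aligned to the device's logical
block size):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/dm-0", O_RDWR); /* hypothetical thin dev */

		if (fd < 0)
			return 1;
		/* mode 0: forwarded to the device as REQ_OP_PROVISION */
		if (fallocate(fd, 0, 0, 1ULL << 30))
			perror("fallocate");
		close(fd);
		return 0;
	}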

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/blk-core.c          |  5 ++++
 block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c         | 18 +++++++++++++
 block/blk-settings.c      | 19 ++++++++++++++
 block/blk-sysfs.c         |  9 +++++++
 block/bounce.c            |  1 +
 block/fops.c              | 10 +++++++-
 include/linux/bio.h       |  6 +++--
 include/linux/blk_types.h |  5 +++-
 include/linux/blkdev.h    | 16 ++++++++++++
 10 files changed, 138 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 42926e6cb83c..4a2342ba3a8b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
+	REQ_OP_NAME(PROVISION),
 };
 #undef REQ_OP_NAME
 
@@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_PROVISION:
+		if (!q->limits.max_provision_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..647b6451660b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * blkdev_issue_provision - provision a block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to provision
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Issues a provision request to the block device for the range of sectors.
+ *  For thinly provisioned block devices, this acts as a signal for the
+ *  underlying storage pool to allocate space for this block range.
+ */
+int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp)
+{
+	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret = 0;
+
+	if (max_sectors == 0)
+		return -EOPNOTSUPP;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	blk_start_plug(&plug);
+	for (;;) {
+		unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
+
+		bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
+
+		sector += req_sects;
+		nr_sects -= req_sects;
+		if (!nr_sects) {
+			ret = submit_bio_wait(bio);
+			if (ret == -EOPNOTSUPP)
+				ret = 0;
+			bio_put(bio);
+			break;
+		}
+		cond_resched();
+	}
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_provision);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6460abdb2426..a3ffebb97a1d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
 	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *bio_split_provision(struct bio *bio,
+					const struct queue_limits *lim,
+					unsigned int *nsegs, struct bio_set *bs)
+{
+	*nsegs = 0;
+
+	if (!lim->max_provision_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= lim->max_provision_sectors)
+		return NULL;
+
+	return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
+}
+
 /*
  * Return the maximum number of sectors from the start of a bio that may be
  * submitted as a single request to a block device. If enough sectors remain,
@@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
 	case REQ_OP_WRITE_ZEROES:
 		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
 		break;
+	case REQ_OP_PROVISION:
+		split = bio_split_provision(bio, lim, nr_segs, bs);
+		break;
 	default:
 		split = bio_split_rw(bio, lim, nr_segs, bs,
 				get_max_io_size(bio, lim) << SECTOR_SHIFT);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 896b4654ab00..d303e6614c36 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
+	lim->max_provision_sectors = 0;
 }
 
 /**
@@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_provision_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_provision_sectors - set max sectors for a single provision
+ *
+ * @q:  the request queue for the device
+ * @max_provision_sectors: maximum number of sectors to provision per command
+ **/
+
+void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors)
+{
+	q->limits.max_provision_sectors = max_provision_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_provision_sectors);
+
 /**
  * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
  * @q:  the request queue for the device
@@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
+						b->max_provision_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f1fce1c7fa44..0a3165211c66 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -213,6 +213,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_provision_max_show(struct request_queue *q,
+		char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_provision_sectors << 9);
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -604,6 +611,7 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
 QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
 QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 
+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
@@ -661,6 +669,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_provision_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ab9d8723ae64 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		break;
 	default:
 		bio_for_each_segment(bv, bio_src, iter)
diff --git a/block/fops.c b/block/fops.c
index 4c70fdc546e7..be2e41f160bf 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -613,7 +613,8 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 #define	BLKDEV_FALLOC_FL_SUPPORTED					\
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
-		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE |	\
+		 FALLOC_FL_UNSHARE_RANGE)
 
 static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 			     loff_t len)
@@ -653,6 +654,13 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	 * de-allocate mode calls to fallocate().
 	 */
 	switch (mode) {
+	case 0:
+	case FALLOC_FL_UNSHARE_RANGE:
+	case FALLOC_FL_KEEP_SIZE:
+	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
+					       len >> SECTOR_SHIFT, GFP_KERNEL);
+		break;
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
 		error = truncate_bdev_range(bdev, file->f_mode, start, end);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152e1..9820b3b039f2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
 	    bio->bi_iter.bi_size &&
 	    bio_op(bio) != REQ_OP_DISCARD &&
 	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
-	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
+	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+	    bio_op(bio) != REQ_OP_PROVISION)
 		return true;
 
 	return false;
@@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
-	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
+	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+	       bio_op(bio) == REQ_OP_PROVISION;
 }
 
 static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..27bdf88f541c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -385,7 +385,10 @@ enum req_op {
 	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
 	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
 
-	REQ_OP_LAST		= (__force blk_opf_t)36,
+	/* request device to provision block */
+	REQ_OP_PROVISION        = (__force blk_opf_t)37,
+
+	REQ_OP_LAST		= (__force blk_opf_t)38,
 };
 
 enum req_flag_bits {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f17492..239e2f418b6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -303,6 +303,7 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int		max_provision_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_provision_sectors(struct request_queue *q,
+		unsigned int max_provision_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
@@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
 
+extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
@@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
 	return q->limits.max_discard_segments;
 }
 
+static inline unsigned int queue_max_provision_sectors(const struct request_queue *q)
+{
+	return q->limits.max_provision_sectors;
+}
+
 static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 {
 	return q->limits.max_segment_size;
@@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
 	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
 }
 
+static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
+{
+	return bdev_get_queue(bdev)->limits.max_provision_sectors;
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
 	return blk_queue_zoned_model(bdev_get_queue(bdev));
-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v6 3/5] dm: Add block provisioning support
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
  2023-05-06  6:29       ` [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
  2023-05-06  6:29       ` [PATCH v6 2/5] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-05-06  6:29       ` Sarthak Kukreti
  2023-05-09 16:52         ` Mike Snitzer
  2023-05-06  6:29       ` [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

Add block provisioning support for device-mapper targets.
dm-crypt, dm-snap and dm-linear will, by default, pass through
REQ_OP_PROVISION requests to the underlying device, if
supported.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-crypt.c         |  4 +++-
 drivers/md/dm-linear.c        |  1 +
 drivers/md/dm-snap.c          |  7 +++++++
 drivers/md/dm-table.c         | 23 +++++++++++++++++++++++
 drivers/md/dm.c               |  6 ++++++
 include/linux/device-mapper.h | 17 +++++++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 8b47b913ee83..5a7c475ce6fc 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3336,6 +3336,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		cc->tag_pool_max_sectors <<= cc->sector_shift;
 	}
 
+	ti->num_provision_bios = 1;
+
 	ret = -ENOMEM;
 	cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname);
 	if (!cc->io_queue) {
@@ -3390,7 +3392,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * - for REQ_OP_DISCARD caller must use flush if IO ordering matters
 	 */
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
-	    bio_op(bio) == REQ_OP_DISCARD)) {
+	    bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
 		bio_set_dev(bio, cc->dev->bdev);
 		if (bio_sectors(bio))
 			bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index f4448d520ee9..74ee27ca551a 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->num_provision_bios = 1;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 9c49f53760d0..0dfda50ac4e0 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1358,6 +1358,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (s->discard_zeroes_cow)
 		ti->num_discard_bios = (s->discard_passdown_origin ? 2 : 1);
 	ti->per_io_data_size = sizeof(struct dm_snap_tracked_chunk);
+	ti->num_provision_bios = 1;
 
 	/* Add snapshot to the list of snapshots for this origin */
 	/* Exceptions aren't triggered till snapshot_resume() is called */
@@ -2003,6 +2004,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	/* If the block is already remapped - use that, else remap it */
 	e = dm_lookup_exception(&s->complete, chunk);
 	if (e) {
+		if (unlikely(bio_op(bio) == REQ_OP_PROVISION)) {
+			bio_endio(bio);
+			r = DM_MAPIO_SUBMITTED;
+			goto out_unlock;
+		}
 		remap_exception(s, e, bio, chunk);
 		if (unlikely(bio_op(bio) == REQ_OP_DISCARD) &&
 		    io_overlaps_chunk(s, bio)) {
@@ -2413,6 +2419,7 @@ static void snapshot_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		/* All discards are split on chunk_size boundary */
 		limits->discard_granularity = snap->store->chunk_size;
 		limits->max_discard_sectors = snap->store->chunk_size;
+		limits->max_provision_sectors = snap->store->chunk_size;
 
 		up_read(&_origins_lock);
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 119db5e01080..282c530b0685 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1854,6 +1854,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
 	return true;
 }
 
+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+				    sector_t start, sector_t len, void *data)
+{
+	return bdev_max_provision_sectors(dev->bdev);
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->provision_supported ||
+		    (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
 {
@@ -1987,6 +2007,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_zeroes(t))
 		q->limits.max_write_zeroes_sectors = 0;
 
+	if (!dm_table_supports_provision(t))
+		q->limits.max_provision_sectors = 0;
+
 	dm_table_verify_integrity(t);
 
 	/*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106..9b94121b8d38 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1609,6 +1609,7 @@ static bool is_abnormal_io(struct bio *bio)
 		case REQ_OP_DISCARD:
 		case REQ_OP_SECURE_ERASE:
 		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_PROVISION:
 			return true;
 		default:
 			break;
@@ -1641,6 +1642,11 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
 		if (ti->max_write_zeroes_granularity)
 			max_granularity = limits->max_write_zeroes_sectors;
 		break;
+	case REQ_OP_PROVISION:
+		num_bios = ti->num_provision_bios;
+		if (ti->max_provision_granularity)
+			max_granularity = limits->max_provision_sectors;
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a52d2b9a6846..9981378457d2 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -334,6 +334,12 @@ struct dm_target {
 	 */
 	unsigned int num_write_zeroes_bios;
 
+	/*
+	 * The number of PROVISION bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned int num_provision_bios;
+
 	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
@@ -358,6 +364,11 @@ struct dm_target {
 	 */
 	bool discards_supported:1;
 
+	/* Set if this target needs to receive provision requests regardless of
+	 * whether or not its underlying devices have support.
+	 */
+	bool provision_supported:1;
+
 	/*
 	 * Set if this target requires that discards be split on
 	 * 'max_discard_sectors' boundaries.
@@ -376,6 +387,12 @@ struct dm_target {
 	 */
 	bool max_write_zeroes_granularity:1;
 
+	/*
+	 * Set if this target requires that provisions be split on
+	 * 'max_provision_sectors' boundaries.
+	 */
+	bool max_provision_granularity:1;
+
 	/*
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */
-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
                         ` (2 preceding siblings ...)
  2023-05-06  6:29       ` [PATCH v6 3/5] dm: Add block provisioning support Sarthak Kukreti
@ 2023-05-06  6:29       ` Sarthak Kukreti
  2023-05-09 16:58         ` Mike Snitzer
  2023-05-12 17:32         ` Mike Snitzer
  2023-05-06  6:29       ` [PATCH v6 5/5] loop: Add support for provision requests Sarthak Kukreti
  2023-05-12 18:28       ` [PATCH v6 0/5] Introduce block provisioning primitives Darrick J. Wong
  5 siblings, 2 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

dm-thinpool uses the provision request to provision
blocks for a dm-thin device. dm-thinpool currently does not
pass through REQ_OP_PROVISION to underlying devices.

For shared blocks, provision requests will break sharing and copy the
contents of the entire block. Additionally, if 'skip_block_zeroing'
is not set, dm-thin will opt to zero out the entire range as a part
of provisioning.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 2b13c949bd72..3f94f53ac956 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -274,6 +274,7 @@ struct pool {
 
 	process_bio_fn process_bio;
 	process_bio_fn process_discard;
+	process_bio_fn process_provision;
 
 	process_cell_fn process_cell;
 	process_cell_fn process_discard_cell;
@@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context,
 	struct bio *bio;
 
 	while ((bio = bio_list_pop(&cell->bios))) {
-		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
+		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+		    bio_op(bio) == REQ_OP_PROVISION)
 			bio_list_add(&info->defer_bios, bio);
 		else {
 			inc_all_io_entry(info->tc->pool, bio);
@@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
 
 static int io_overwrites_block(struct pool *pool, struct bio *bio)
 {
-	return (bio_data_dir(bio) == WRITE) &&
-		io_overlaps_block(pool, bio);
+	return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) &&
+	       bio_op(bio) != REQ_OP_PROVISION;
 }
 
 static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
@@ -1953,6 +1955,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	}
 }
 
+static void process_provision_bio(struct thin_c *tc, struct bio *bio)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	dm_block_t block = get_bio_block(tc, bio);
+	struct dm_bio_prison_cell *cell;
+	struct dm_cell_key key;
+	struct dm_thin_lookup_result lookup_result;
+
+	/*
+	 * If cell is already occupied, then the block is already
+	 * being provisioned so we have nothing further to do here.
+	 */
+	build_virtual_key(tc->td, block, &key);
+	if (bio_detain(pool, &key, bio, &cell))
+		return;
+
+	if (tc->requeue_mode) {
+		cell_requeue(pool, cell);
+		return;
+	}
+
+	r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
+	switch (r) {
+	case 0:
+		if (lookup_result.shared) {
+			process_shared_bio(tc, bio, block, &lookup_result, cell);
+		} else {
+			bio_endio(bio);
+			cell_defer_no_holder(tc, cell);
+		}
+		break;
+	case -ENODATA:
+		provision_block(tc, bio, block, cell);
+		break;
+
+	default:
+		DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d",
+			    __func__, r);
+		cell_defer_no_holder(tc, cell);
+		bio_io_error(bio);
+		break;
+	}
+}
+
 static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 {
 	int r;
@@ -2228,6 +2275,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
 
 		if (bio_op(bio) == REQ_OP_DISCARD)
 			pool->process_discard(tc, bio);
+		else if (bio_op(bio) == REQ_OP_PROVISION)
+			pool->process_provision(tc, bio);
 		else
 			pool->process_bio(tc, bio);
 
@@ -2579,6 +2628,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_fail;
 		pool->process_discard = process_bio_fail;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_fail;
 		pool->process_discard_cell = process_cell_fail;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2592,6 +2642,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_bio_success;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_discard_cell = process_cell_success;
 		pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -2612,6 +2663,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		pool->out_of_data_space = true;
 		pool->process_bio = process_bio_read_only;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_bio_fail;
 		pool->process_cell = process_cell_read_only;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2628,6 +2680,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		dm_pool_metadata_read_write(pool->pmd);
 		pool->process_bio = process_bio;
 		pool->process_discard = process_discard_bio;
+		pool->process_provision = process_provision_bio;
 		pool->process_cell = process_cell;
 		pool->process_prepared_mapping = process_prepared_mapping;
 		set_discard_callbacks(pool);
@@ -2749,7 +2802,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+	    bio_op(bio) == REQ_OP_PROVISION) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -3396,6 +3450,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	pt->adjusted_pf = pt->requested_pf = pf;
 	ti->num_flush_bios = 1;
 	ti->limit_swap_bios = true;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+	ti->max_provision_granularity = true;
 
 	/*
 	 * Only need to enable discards if the pool should pass
@@ -4114,6 +4171,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 * The pool uses the same discard limits as the underlying data
 	 * device.  DM core has already set this up.
 	 */
+
+	limits->max_provision_sectors = pool->sectors_per_block;
 }
 
 static struct target_type pool_target = {
@@ -4288,6 +4347,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->max_discard_granularity = true;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	mutex_unlock(&dm_thin_pool_table.mutex);
 
 	spin_lock_irq(&tc->pool->lock);
-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v6 5/5] loop: Add support for provision requests
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
                         ` (3 preceding siblings ...)
  2023-05-06  6:29       ` [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
@ 2023-05-06  6:29       ` Sarthak Kukreti
  2023-05-15 12:40         ` Brian Foster
  2023-05-12 18:28       ` [PATCH v6 0/5] Introduce block provisioning primitives Darrick J. Wong
  5 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:29 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support
the provision request and upon receiving a provision bio,
will map it to the backing device/storage. For loop devices
over files, a REQ_OP_PROVISION request will translate to
a mode-0 fallocate() call on the backing file.
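
As a usage sketch (paths are hypothetical), provisioning through a loop
device attached to a sparse file should allocate blocks in the backing
file as well:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		struct stat st;
		int fd = open("/dev/loop0", O_RDWR); /* hypothetical */

		if (fd < 0)
			return 1;
		/* mode 0 on the loop device -> mode-0 fallocate() on the file */
		if (fallocate(fd, 0, 0, 1 << 20))
			perror("fallocate");
		close(fd);

		/* hypothetical backing file attached to /dev/loop0 above */
		if (stat("backing.img", &st) == 0)
			printf("allocated blocks: %lld\n",
			       (long long)st.st_blocks);
		return 0;
	}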

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2..13c4b4f8b9c1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 	return ret;
 }
 
+static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
+{
+	struct file *file = lo->lo_backing_file;
+	struct request_queue *q = lo->lo_queue;
+	int ret;
+
+	if (!q->limits.max_provision_sectors) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
+	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
+		ret = -EIO;
+ out:
+	return ret;
+}
+
 static int lo_req_flush(struct loop_device *lo, struct request *rq)
 {
 	int ret = vfs_fsync(lo->lo_backing_file, 0);
@@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 				FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+	case REQ_OP_PROVISION:
+		return lo_req_provision(lo, rq, pos);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static void loop_config_provision(struct loop_device *lo)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+
+	/*
+	 * If the backing device is a block device, mirror its provisioning
+	 * capability.
+	 */
+	if (S_ISBLK(inode->i_mode)) {
+		blk_queue_max_provision_sectors(lo->lo_queue,
+			bdev_max_provision_sectors(I_BDEV(inode)));
+	} else if (file->f_op->fallocate) {
+		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+	} else {
+		blk_queue_max_provision_sectors(lo->lo_queue, 0);
+	}
+}
+
 static void loop_config_discard(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	blk_queue_io_min(lo->lo_queue, bsize);
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	}
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 
 	/* update dio if lo_offset or transfer is changed */
 	__loop_update_dio(lo, lo->use_dio);
@@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		cmd->use_aio = false;
 		break;
 	default:
-- 
2.40.1.521.gf1e218fcd8-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-01 19:15       ` Mike Snitzer
@ 2023-05-06  6:32         ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-06  6:32 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Brian Foster,
	Alasdair Kergon, ejt

On Mon, May 1, 2023 at 12:15 PM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Wed, Apr 19 2023 at  8:48P -0400,
> Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
>
> > dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block. Additionally, if 'skip_block_zeroing'
> > is not set, dm-thin will opt to zero out the entire range as a part
> > of provisioning.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/md/dm-thin.c | 73 +++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 68 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 2b13c949bd72..58d633f5c928 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -1891,7 +1893,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
> >
> >       if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
> >               break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> > -             cell_defer_no_holder(tc, virt_cell);
> > +             if (bio_op(bio) != REQ_OP_PROVISION)
> > +                     cell_defer_no_holder(tc, virt_cell);
>
> Can you please explain why cell_defer_no_holder() is skipped for REQ_OP_PROVISION here?
>
I recalled seeing a BUG in dm-bio-prison-v1 if I allowed
cell_defer_no_holder() for REQ_OP_PROVISION, but from additional
testing, it looks like that workaround was left behind from a cleanup
in v4. Dropped in v6.

Thanks
Sarthak

> Thanks,
> Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-05-06  6:29       ` [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
@ 2023-05-09 16:51         ` Mike Snitzer
  2023-05-12 18:31         ` Darrick J. Wong
  1 sibling, 0 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-05-09 16:51 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, stable,
	Christoph Hellwig, Andreas Dilger, Stefan Hajnoczi, Brian Foster,
	Alasdair Kergon

On Sat, May 06 2023 at  2:29P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Only call truncate_bdev_range() if the fallocate mode is
> supported. This fixes a bug where data in the pagecache
> could be invalidated if fallocate() was called on the
> block device with an invalid mode.
> 
> Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
> Cc: stable@vger.kernel.org
> Reported-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 2/5] block: Introduce provisioning primitives
  2023-05-06  6:29       ` [PATCH v6 2/5] block: Introduce provisioning primitives Sarthak Kukreti
@ 2023-05-09 16:52         ` Mike Snitzer
  2023-05-12 18:37         ` Darrick J. Wong
  1 sibling, 0 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-05-09 16:52 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Sat, May 06 2023 at  2:29P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Introduce block request REQ_OP_PROVISION. The intent of this request
> is to request underlying storage to preallocate disk space for the given
> block range. Block devices that support this capability will export
> a provision limit within their request queues.
> 
> This patch also adds the capability to call fallocate() in mode 0
> on block devices, which will send REQ_OP_PROVISION to the block
> device for the specified range.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 3/5] dm: Add block provisioning support
  2023-05-06  6:29       ` [PATCH v6 3/5] dm: Add block provisioning support Sarthak Kukreti
@ 2023-05-09 16:52         ` Mike Snitzer
  0 siblings, 0 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-05-09 16:52 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Sat, May 06 2023 at  2:29P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Add block provisioning support for device-mapper targets.
> dm-crypt, dm-snap and dm-linear will, by default, pass through
> REQ_OP_PROVISION requests to the underlying device, if
> supported.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-06  6:29       ` [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
@ 2023-05-09 16:58         ` Mike Snitzer
  2023-05-11 20:03           ` Sarthak Kukreti
  2023-05-12 17:32         ` Mike Snitzer
  1 sibling, 1 reply; 57+ messages in thread
From: Mike Snitzer @ 2023-05-09 16:58 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Sat, May 06 2023 at  2:29P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
> 
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block. Additionally, if 'skip_block_zeroing'
> is not set, dm-thin will opt to zero out the entire range as a part
> of provisioning.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 2b13c949bd72..3f94f53ac956 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -274,6 +274,7 @@ struct pool {
>  
>  	process_bio_fn process_bio;
>  	process_bio_fn process_discard;
> +	process_bio_fn process_provision;
>  
>  	process_cell_fn process_cell;
>  	process_cell_fn process_discard_cell;
> @@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context,
>  	struct bio *bio;
>  
>  	while ((bio = bio_list_pop(&cell->bios))) {
> -		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> +		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +		    bio_op(bio) == REQ_OP_PROVISION)
>  			bio_list_add(&info->defer_bios, bio);
>  		else {
>  			inc_all_io_entry(info->tc->pool, bio);
> @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
>  
>  static int io_overwrites_block(struct pool *pool, struct bio *bio)
>  {
> -	return (bio_data_dir(bio) == WRITE) &&
> -		io_overlaps_block(pool, bio);
> +	return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) &&
> +	       bio_op(bio) != REQ_OP_PROVISION;
>  }
>  
>  static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
> @@ -1953,6 +1955,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
>  	}
>  }
>  
> +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> +{
> +	int r;
> +	struct pool *pool = tc->pool;
> +	dm_block_t block = get_bio_block(tc, bio);
> +	struct dm_bio_prison_cell *cell;
> +	struct dm_cell_key key;
> +	struct dm_thin_lookup_result lookup_result;
> +
> +	/*
> +	 * If cell is already occupied, then the block is already
> +	 * being provisioned so we have nothing further to do here.
> +	 */
> +	build_virtual_key(tc->td, block, &key);
> +	if (bio_detain(pool, &key, bio, &cell))
> +		return;
> +
> +	if (tc->requeue_mode) {
> +		cell_requeue(pool, cell);
> +		return;
> +	}
> +
> +	r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
> +	switch (r) {
> +	case 0:
> +		if (lookup_result.shared) {
> +			process_shared_bio(tc, bio, block, &lookup_result, cell);
> +		} else {
> +			bio_endio(bio);
> +			cell_defer_no_holder(tc, cell);
> +		}
> +		break;
> +	case -ENODATA:
> +		provision_block(tc, bio, block, cell);
> +		break;
> +
> +	default:
> +		DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d",
> +			    __func__, r);
> +		cell_defer_no_holder(tc, cell);
> +		bio_io_error(bio);
> +		break;
> +	}
> +}
> +
>  static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
>  {
>  	int r;
> @@ -2228,6 +2275,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
>  
>  		if (bio_op(bio) == REQ_OP_DISCARD)
>  			pool->process_discard(tc, bio);
> +		else if (bio_op(bio) == REQ_OP_PROVISION)
> +			pool->process_provision(tc, bio);
>  		else
>  			pool->process_bio(tc, bio);
>  
> @@ -2579,6 +2628,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
>  		dm_pool_metadata_read_only(pool->pmd);
>  		pool->process_bio = process_bio_fail;
>  		pool->process_discard = process_bio_fail;
> +		pool->process_provision = process_bio_fail;
>  		pool->process_cell = process_cell_fail;
>  		pool->process_discard_cell = process_cell_fail;
>  		pool->process_prepared_mapping = process_prepared_mapping_fail;
> @@ -2592,6 +2642,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
>  		dm_pool_metadata_read_only(pool->pmd);
>  		pool->process_bio = process_bio_read_only;
>  		pool->process_discard = process_bio_success;
> +		pool->process_provision = process_bio_fail;
>  		pool->process_cell = process_cell_read_only;
>  		pool->process_discard_cell = process_cell_success;
>  		pool->process_prepared_mapping = process_prepared_mapping_fail;
> @@ -2612,6 +2663,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
>  		pool->out_of_data_space = true;
>  		pool->process_bio = process_bio_read_only;
>  		pool->process_discard = process_discard_bio;
> +		pool->process_provision = process_bio_fail;
>  		pool->process_cell = process_cell_read_only;
>  		pool->process_prepared_mapping = process_prepared_mapping;
>  		set_discard_callbacks(pool);
> @@ -2628,6 +2680,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
>  		dm_pool_metadata_read_write(pool->pmd);
>  		pool->process_bio = process_bio;
>  		pool->process_discard = process_discard_bio;
> +		pool->process_provision = process_provision_bio;
>  		pool->process_cell = process_cell;
>  		pool->process_prepared_mapping = process_prepared_mapping;
>  		set_discard_callbacks(pool);
> @@ -2749,7 +2802,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
>  		return DM_MAPIO_SUBMITTED;
>  	}
>  
> -	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> +	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +	    bio_op(bio) == REQ_OP_PROVISION) {
>  		thin_defer_bio_with_throttle(tc, bio);
>  		return DM_MAPIO_SUBMITTED;
>  	}
> @@ -3396,6 +3450,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	pt->adjusted_pf = pt->requested_pf = pf;
>  	ti->num_flush_bios = 1;
>  	ti->limit_swap_bios = true;
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
> +	ti->max_provision_granularity = true;
>  
>  	/*
>  	 * Only need to enable discards if the pool should pass
> @@ -4114,6 +4171,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  	 * The pool uses the same discard limits as the underlying data
>  	 * device.  DM core has already set this up.
>  	 */
> +
> +	limits->max_provision_sectors = pool->sectors_per_block;
>  }
>  
>  static struct target_type pool_target = {
> @@ -4288,6 +4347,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		ti->max_discard_granularity = true;
>  	}
>  
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
> +

We need this in thin_ctr: ti->max_provision_granularity = true;

It's needed more in the thin target than in thin-pool; otherwise provision
bios issued to thin devices won't be split appropriately.  But I do think
it's fine to set it in both thin_ctr and pool_ctr.
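
For clarity, a minimal sketch of the end result in thin_ctr:

	ti->num_provision_bios = 1;
	ti->provision_supported = true;
	/* without this, provision bios won't be split to the pool's block size */
	ti->max_provision_granularity = true;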

Otherwise, looks good.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-09 16:58         ` Mike Snitzer
@ 2023-05-11 20:03           ` Sarthak Kukreti
  2023-05-12 14:34             ` Mike Snitzer
  0 siblings, 1 reply; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-11 20:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Tue, May 9, 2023 at 9:58 AM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Sat, May 06 2023 at  2:29P -0400,
> Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
>
> > dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block. Additionally, if 'skip_block_zeroing'
> > is not set, dm-thin will opt to zero out the entire range as a part
> > of provisioning.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 66 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 2b13c949bd72..3f94f53ac956 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -274,6 +274,7 @@ struct pool {
> >
> >       process_bio_fn process_bio;
> >       process_bio_fn process_discard;
> > +     process_bio_fn process_provision;
> >
> >       process_cell_fn process_cell;
> >       process_cell_fn process_discard_cell;
> > @@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context,
> >       struct bio *bio;
> >
> >       while ((bio = bio_list_pop(&cell->bios))) {
> > -             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> > +             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +                 bio_op(bio) == REQ_OP_PROVISION)
> >                       bio_list_add(&info->defer_bios, bio);
> >               else {
> >                       inc_all_io_entry(info->tc->pool, bio);
> > @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
> >
> >  static int io_overwrites_block(struct pool *pool, struct bio *bio)
> >  {
> > -     return (bio_data_dir(bio) == WRITE) &&
> > -             io_overlaps_block(pool, bio);
> > +     return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) &&
> > +            bio_op(bio) != REQ_OP_PROVISION;
> >  }
> >
> >  static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
> > @@ -1953,6 +1955,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
> >       }
> >  }
> >
> > +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> > +{
> > +     int r;
> > +     struct pool *pool = tc->pool;
> > +     dm_block_t block = get_bio_block(tc, bio);
> > +     struct dm_bio_prison_cell *cell;
> > +     struct dm_cell_key key;
> > +     struct dm_thin_lookup_result lookup_result;
> > +
> > +     /*
> > +      * If cell is already occupied, then the block is already
> > +      * being provisioned so we have nothing further to do here.
> > +      */
> > +     build_virtual_key(tc->td, block, &key);
> > +     if (bio_detain(pool, &key, bio, &cell))
> > +             return;
> > +
> > +     if (tc->requeue_mode) {
> > +             cell_requeue(pool, cell);
> > +             return;
> > +     }
> > +
> > +     r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
> > +     switch (r) {
> > +     case 0:
> > +             if (lookup_result.shared) {
> > +                     process_shared_bio(tc, bio, block, &lookup_result, cell);
> > +             } else {
> > +                     bio_endio(bio);
> > +                     cell_defer_no_holder(tc, cell);
> > +             }
> > +             break;
> > +     case -ENODATA:
> > +             provision_block(tc, bio, block, cell);
> > +             break;
> > +
> > +     default:
> > +             DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d",
> > +                         __func__, r);
> > +             cell_defer_no_holder(tc, cell);
> > +             bio_io_error(bio);
> > +             break;
> > +     }
> > +}
> > +
> >  static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> >  {
> >       int r;
> > @@ -2228,6 +2275,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
> >
> >               if (bio_op(bio) == REQ_OP_DISCARD)
> >                       pool->process_discard(tc, bio);
> > +             else if (bio_op(bio) == REQ_OP_PROVISION)
> > +                     pool->process_provision(tc, bio);
> >               else
> >                       pool->process_bio(tc, bio);
> >
> > @@ -2579,6 +2628,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
> >               dm_pool_metadata_read_only(pool->pmd);
> >               pool->process_bio = process_bio_fail;
> >               pool->process_discard = process_bio_fail;
> > +             pool->process_provision = process_bio_fail;
> >               pool->process_cell = process_cell_fail;
> >               pool->process_discard_cell = process_cell_fail;
> >               pool->process_prepared_mapping = process_prepared_mapping_fail;
> > @@ -2592,6 +2642,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
> >               dm_pool_metadata_read_only(pool->pmd);
> >               pool->process_bio = process_bio_read_only;
> >               pool->process_discard = process_bio_success;
> > +             pool->process_provision = process_bio_fail;
> >               pool->process_cell = process_cell_read_only;
> >               pool->process_discard_cell = process_cell_success;
> >               pool->process_prepared_mapping = process_prepared_mapping_fail;
> > @@ -2612,6 +2663,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
> >               pool->out_of_data_space = true;
> >               pool->process_bio = process_bio_read_only;
> >               pool->process_discard = process_discard_bio;
> > +             pool->process_provision = process_bio_fail;
> >               pool->process_cell = process_cell_read_only;
> >               pool->process_prepared_mapping = process_prepared_mapping;
> >               set_discard_callbacks(pool);
> > @@ -2628,6 +2680,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
> >               dm_pool_metadata_read_write(pool->pmd);
> >               pool->process_bio = process_bio;
> >               pool->process_discard = process_discard_bio;
> > +             pool->process_provision = process_provision_bio;
> >               pool->process_cell = process_cell;
> >               pool->process_prepared_mapping = process_prepared_mapping;
> >               set_discard_callbacks(pool);
> > @@ -2749,7 +2802,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
> >               return DM_MAPIO_SUBMITTED;
> >       }
> >
> > -     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> > +     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +         bio_op(bio) == REQ_OP_PROVISION) {
> >               thin_defer_bio_with_throttle(tc, bio);
> >               return DM_MAPIO_SUBMITTED;
> >       }
> > @@ -3396,6 +3450,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >       pt->adjusted_pf = pt->requested_pf = pf;
> >       ti->num_flush_bios = 1;
> >       ti->limit_swap_bios = true;
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> > +     ti->max_provision_granularity = true;
> >
> >       /*
> >        * Only need to enable discards if the pool should pass
> > @@ -4114,6 +4171,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >        * The pool uses the same discard limits as the underlying data
> >        * device.  DM core has already set this up.
> >        */
> > +
> > +     limits->max_provision_sectors = pool->sectors_per_block;
> >  }
> >
> >  static struct target_type pool_target = {
> > @@ -4288,6 +4347,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >               ti->max_discard_granularity = true;
> >       }
> >
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> > +
>
> We need this in thin_ctr: ti->max_provision_granularity = true;
>
> It's needed more in the thin target than in thin-pool; otherwise provision
> bios issued to thin devices won't be split appropriately.  But I do think
> it's fine to set it in both thin_ctr and pool_ctr.
>
> Otherwise, looks good.
>
Thanks! I'll add it in the next iteration (in addition to any other
feedback on v6).

Given that this series covers multiple subsystems, would there be a
preferred way of queueing this for merge?

Best
Sarthak

> Thanks,
> Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-11 20:03           ` Sarthak Kukreti
@ 2023-05-12 14:34             ` Mike Snitzer
  0 siblings, 0 replies; 57+ messages in thread
From: Mike Snitzer @ 2023-05-12 14:34 UTC (permalink / raw)
  To: Sarthak Kukreti, Jens Axboe, Brian Foster, Darrick J. Wong
  Cc: Christoph Hellwig, Theodore Ts'o, Michael S. Tsirkin,
	Jason Wang, Bart Van Assche, linux-kernel, linux-block, dm-devel,
	Andreas Dilger, Stefan Hajnoczi, linux-fsdevel, linux-ext4,
	Alasdair Kergon

On Thu, May 11 2023 at  4:03P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> On Tue, May 9, 2023 at 9:58 AM Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Sat, May 06 2023 at  2:29P -0400,
> > Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
> >
> > > dm-thinpool uses the provision request to provision
> > > blocks for a dm-thin device. dm-thinpool currently does not
> > > pass through REQ_OP_PROVISION to underlying devices.
> > >
> > > For shared blocks, provision requests will break sharing and copy the
> > > contents of the entire block. Additionally, if 'skip_block_zeroing'
> > > is not set, dm-thin will opt to zero out the entire range as a part
> > > of provisioning.
> > >
> > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > ---
> > >  drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 66 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > index 2b13c949bd72..3f94f53ac956 100644
> > > --- a/drivers/md/dm-thin.c
> > > +++ b/drivers/md/dm-thin.c
> > > @@ -4288,6 +4347,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> > >               ti->max_discard_granularity = true;
> > >       }
> > >
> > > +     ti->num_provision_bios = 1;
> > > +     ti->provision_supported = true;
> > > +
> >
> > We need this in thin_ctr: ti->max_provision_granularity = true;
> >
> > It's needed more in the thin target than in thin-pool; otherwise provision
> > bios issued to thin devices won't be split appropriately.  But I do think
> > it's fine to set it in both thin_ctr and pool_ctr.
> >
> > Otherwise, looks good.
> >
> Thanks! I'll add it in the next iteration (in addition to any other
> feedback on v6).

OK. I'll begin basing dm-thinp's WRITE_ZEROES support on top of this
series.
 
> Given that this series covers multiple subsystems, would there be a
> preferred way of queueing this for merge?

I think it'd be OK for Jens to pick this series up, and I'll rebase
my corresponding DM tree once he does.

In addition to Jens: Brian, Darrick, and/or others, any chance you
could review the block core changes in this series to ensure you're
cool with them?

Would be nice to get Sarthak the review feedback so that hopefully his v7
can be the final revision.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-06  6:29       ` [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
  2023-05-09 16:58         ` Mike Snitzer
@ 2023-05-12 17:32         ` Mike Snitzer
  2023-05-15 21:19           ` Sarthak Kukreti
  1 sibling, 1 reply; 57+ messages in thread
From: Mike Snitzer @ 2023-05-12 17:32 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Sat, May 06 2023 at  2:29P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
> 
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block. Additionally, if 'skip_block_zeroing'
> is not set, dm-thin will opt to zero out the entire range as a part
> of provisioning.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 2b13c949bd72..3f94f53ac956 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
...
> @@ -4114,6 +4171,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  	 * The pool uses the same discard limits as the underlying data
>  	 * device.  DM core has already set this up.
>  	 */
> +
> +	limits->max_provision_sectors = pool->sectors_per_block;

Just noticed that setting limits->max_provision_sectors needs to move
above the pool_io_hints code that sets up discards -- otherwise the early
return from if (!pt->adjusted_pf.discard_enabled) will cause the
max_provision_sectors setting to be skipped.

Here is a roll up of the fixes that need to be folded into this patch:

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 3f94f53ac956..90c8e36cb327 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -4151,6 +4151,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
 	}
 
+	limits->max_provision_sectors = pool->sectors_per_block;
+
 	/*
 	 * pt->adjusted_pf is a staging area for the actual features to use.
 	 * They get transferred to the live pool in bind_control_target()
@@ -4171,8 +4173,6 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 * The pool uses the same discard limits as the underlying data
 	 * device.  DM core has already set this up.
 	 */
-
-	limits->max_provision_sectors = pool->sectors_per_block;
 }
 
 static struct target_type pool_target = {
@@ -4349,6 +4349,7 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	ti->num_provision_bios = 1;
 	ti->provision_supported = true;
+	ti->max_provision_granularity = true;
 
 	mutex_unlock(&dm_thin_pool_table.mutex);
 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 0/5] Introduce block provisioning primitives
  2023-05-06  6:29     ` [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
                         ` (4 preceding siblings ...)
  2023-05-06  6:29       ` [PATCH v6 5/5] loop: Add support for provision requests Sarthak Kukreti
@ 2023-05-12 18:28       ` Darrick J. Wong
  5 siblings, 0 replies; 57+ messages in thread
From: Darrick J. Wong @ 2023-05-12 18:28 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche

On Fri, May 05, 2023 at 11:29:04PM -0700, Sarthak Kukreti wrote:
> Hi,
> 
> This patch series covers iteration 6 of adding support for block
> provisioning requests.

I didn't even notice there was a v6.  Could you start a fresh thread
when you bump the revision count, please?

--D

> Changes from v5:
> - Remove explicit supports_provision from dm devices.
> - Move provision sectors io hint to pool_io_hint. Other devices
>   will derive the provisioning limits from the stack.
> - Remove artifact from v4 to omit cell_defer_no_holder for
>   REQ_OP_PROVISION.
> - Fix blkdev_fallocate() called with invalid fallocate
>   modes to propagate errors correctly.
> 
> Sarthak Kukreti (5):
>   block: Don't invalidate pagecache for invalid falloc modes
>   block: Introduce provisioning primitives
>   dm: Add block provisioning support
>   dm-thin: Add REQ_OP_PROVISION support
>   loop: Add support for provision requests
> 
>  block/blk-core.c              |  5 +++
>  block/blk-lib.c               | 53 ++++++++++++++++++++++++++
>  block/blk-merge.c             | 18 +++++++++
>  block/blk-settings.c          | 19 ++++++++++
>  block/blk-sysfs.c             |  9 +++++
>  block/bounce.c                |  1 +
>  block/fops.c                  | 31 +++++++++++++---
>  drivers/block/loop.c          | 42 +++++++++++++++++++++
>  drivers/md/dm-crypt.c         |  4 +-
>  drivers/md/dm-linear.c        |  1 +
>  drivers/md/dm-snap.c          |  7 ++++
>  drivers/md/dm-table.c         | 23 ++++++++++++
>  drivers/md/dm-thin.c          | 70 +++++++++++++++++++++++++++++++++--
>  drivers/md/dm.c               |  6 +++
>  include/linux/bio.h           |  6 ++-
>  include/linux/blk_types.h     |  5 ++-
>  include/linux/blkdev.h        | 16 ++++++++
>  include/linux/device-mapper.h | 17 +++++++++
>  18 files changed, 319 insertions(+), 14 deletions(-)
> 
> -- 
> 2.40.1.521.gf1e218fcd8-goog
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes
  2023-05-06  6:29       ` [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
  2023-05-09 16:51         ` Mike Snitzer
@ 2023-05-12 18:31         ` Darrick J. Wong
  1 sibling, 0 replies; 57+ messages in thread
From: Darrick J. Wong @ 2023-05-12 18:31 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche, stable

On Fri, May 05, 2023 at 11:29:05PM -0700, Sarthak Kukreti wrote:
> Only call truncate_bdev_range() if the fallocate mode is
> supported. This fixes a bug where data in the pagecache
> could be invalidated if fallocate() was called on the
> block device with an invalid mode.
> 
> Fixes: 25f4c41415e5 ("block: implement (some of) fallocate for block devices")
> Cc: stable@vger.kernel.org
> Reported-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>

Ideally you'd only take filemap_invalidate_lock for valid modes, but eh
who cares about efficiency for the EOPNOTSUPP case, let's move on. :)
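
For the record, a sketch of that alternative (not the posted patch):
validate the mode first, then take the lock -- using the same mode list
the patch already handles:

	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
		break;
	default:
		return -EOPNOTSUPP;
	}

	filemap_invalidate_lock(inode->i_mapping);
	/* ... then truncate_bdev_range() + blkdev_issue_*() as before ... */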

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  block/fops.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c..4c70fdc546e7 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -648,24 +648,35 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> -	/* Invalidate the page cache, including dirty pages. */
> -	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> -	if (error)
> -		goto fail;
> -
> +	/*
> +	 * Invalidate the page cache, including dirty pages, for valid
> +	 * de-allocate mode calls to fallocate().
> +	 */
>  	switch (mode) {
>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (error)
> +			goto fail;
> +
>  		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL,
>  					     BLKDEV_ZERO_NOUNMAP);
>  		break;
>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (error)
> +			goto fail;
> +
>  		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL,
>  					     BLKDEV_ZERO_NOFALLBACK);
>  		break;
>  	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> +		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> +		if (error)
> +			goto fail;
> +
>  		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL);
>  		break;
> -- 
> 2.40.1.521.gf1e218fcd8-goog
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 2/5] block: Introduce provisioning primitives
  2023-05-06  6:29       ` [PATCH v6 2/5] block: Introduce provisioning primitives Sarthak Kukreti
  2023-05-09 16:52         ` Mike Snitzer
@ 2023-05-12 18:37         ` Darrick J. Wong
  2023-05-15 21:55           ` Sarthak Kukreti
  1 sibling, 1 reply; 57+ messages in thread
From: Darrick J. Wong @ 2023-05-12 18:37 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche

On Fri, May 05, 2023 at 11:29:06PM -0700, Sarthak Kukreti wrote:
> Introduce block request REQ_OP_PROVISION. The intent of this request
> is to request underlying storage to preallocate disk space for the given
> block range. Block devices that support this capability will export
> a provision limit within their request queues.
> 
> This patch also adds the capability to call fallocate() in mode 0
> on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range.
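
[ For illustration, a minimal userspace sketch of exercising this
  (hypothetical device path; mode 0 is what maps to REQ_OP_PROVISION):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/sdX", O_RDWR);	/* hypothetical */
		if (fd < 0)
			return 1;
		/* mode 0: ask the device to preallocate the first 1MiB */
		int ret = fallocate(fd, 0, 0, 1 << 20);
		close(fd);
		return ret ? 1 : 0;
	}
]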
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/blk-core.c          |  5 ++++
>  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
>  block/blk-merge.c         | 18 +++++++++++++
>  block/blk-settings.c      | 19 ++++++++++++++
>  block/blk-sysfs.c         |  9 +++++++
>  block/bounce.c            |  1 +
>  block/fops.c              | 10 +++++++-
>  include/linux/bio.h       |  6 +++--
>  include/linux/blk_types.h |  5 +++-
>  include/linux/blkdev.h    | 16 ++++++++++++
>  10 files changed, 138 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 42926e6cb83c..4a2342ba3a8b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
>  	REQ_OP_NAME(WRITE_ZEROES),
>  	REQ_OP_NAME(DRV_IN),
>  	REQ_OP_NAME(DRV_OUT),
> +	REQ_OP_NAME(PROVISION)
>  };
>  #undef REQ_OP_NAME
>  
> @@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
>  		if (!q->limits.max_write_zeroes_sectors)
>  			goto not_supported;
>  		break;
> +	case REQ_OP_PROVISION:
> +		if (!q->limits.max_provision_sectors)
> +			goto not_supported;
> +		break;
>  	default:
>  		break;
>  	}
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index e59c3069e835..647b6451660b 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  	return ret;
>  }
>  EXPORT_SYMBOL(blkdev_issue_secure_erase);
> +
> +/**
> + * blkdev_issue_provision - provision a block range
> + * @bdev:	blockdev to write
> + * @sector:	start sector
> + * @nr_sects:	number of sectors to provision
> + * @gfp_mask:	memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *  Issues a provision request to the block device for the range of sectors.
> + *  For thinly provisioned block devices, this acts as a signal for the
> + *  underlying storage pool to allocate space for this block range.
> + */
> +int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp)
> +{
> +	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> +	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
> +	struct bio *bio = NULL;
> +	struct blk_plug plug;
> +	int ret = 0;
> +
> +	if (max_sectors == 0)
> +		return -EOPNOTSUPP;
> +	if ((sector | nr_sects) & bs_mask)
> +		return -EINVAL;
> +	if (bdev_read_only(bdev))
> +		return -EPERM;
> +
> +	blk_start_plug(&plug);
> +	for (;;) {
> +		unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
> +
> +		bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
> +		bio->bi_iter.bi_sector = sector;
> +		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
> +
> +		sector += req_sects;
> +		nr_sects -= req_sects;
> +		if (!nr_sects) {
> +			ret = submit_bio_wait(bio);
> +			if (ret == -EOPNOTSUPP)
> +				ret = 0;

Why do we convert EOPNOTSUPP to success here?  If the device suddenly
forgets how to provision space, wouldn't we want to pass that up to the
caller?

(I'm not sure when this would happen -- perhaps the bdev has the general
provisioning capability but not for the specific range requested?)

The rest of the patch looks ok to me.

--D

> +			bio_put(bio);
> +			break;
> +		}
> +		cond_resched();
> +	}
> +	blk_finish_plug(&plug);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(blkdev_issue_provision);
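
[ For illustration, a hypothetical in-kernel caller of the new helper:

	/* Preallocate 1MiB (2048 512-byte sectors) at the device start. */
	int err = blkdev_issue_provision(bdev, 0, 2048, GFP_KERNEL);
]
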
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 6460abdb2426..a3ffebb97a1d 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
>  	return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
>  }
>  
> +static struct bio *bio_split_provision(struct bio *bio,
> +					const struct queue_limits *lim,
> +					unsigned int *nsegs, struct bio_set *bs)
> +{
> +	*nsegs = 0;
> +
> +	if (!lim->max_provision_sectors)
> +		return NULL;
> +
> +	if (bio_sectors(bio) <= lim->max_provision_sectors)
> +		return NULL;
> +
> +	return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
> +}
> +
>  /*
>   * Return the maximum number of sectors from the start of a bio that may be
>   * submitted as a single request to a block device. If enough sectors remain,
> @@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
>  	case REQ_OP_WRITE_ZEROES:
>  		split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
>  		break;
> +	case REQ_OP_PROVISION:
> +		split = bio_split_provision(bio, lim, nr_segs, bs);
> +		break;
>  	default:
>  		split = bio_split_rw(bio, lim, nr_segs, bs,
>  				get_max_io_size(bio, lim) << SECTOR_SHIFT);
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 896b4654ab00..d303e6614c36 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> +	lim->max_provision_sectors = 0;
>  }
>  
>  /**
> @@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->max_dev_sectors = UINT_MAX;
>  	lim->max_write_zeroes_sectors = UINT_MAX;
>  	lim->max_zone_append_sectors = UINT_MAX;
> +	lim->max_provision_sectors = UINT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>  
> @@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>  
> +/**
> + * blk_queue_max_provision_sectors - set max sectors for a single provision
> + *
> + * @q:  the request queue for the device
> + * @max_provision_sectors: maximum number of sectors to provision per command
> + **/
> +
> +void blk_queue_max_provision_sectors(struct request_queue *q,
> +		unsigned int max_provision_sectors)
> +{
> +	q->limits.max_provision_sectors = max_provision_sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_max_provision_sectors);
> +
>  /**
>   * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
>   * @q:  the request queue for the device
> @@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
>  
> +	t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
> +						b->max_provision_sectors);
> +
>  	t->misaligned |= b->misaligned;
>  
>  	alignment = queue_limit_alignment_offset(b, start);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index f1fce1c7fa44..0a3165211c66 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -213,6 +213,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
>  	return queue_var_show(0, page);
>  }
>  
> +static ssize_t queue_provision_max_show(struct request_queue *q,
> +		char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_provision_sectors << 9);
> +}
> +
>  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(0, page);
> @@ -604,6 +611,7 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
>  QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
>  QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
>  
> +QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
>  QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
>  QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
>  QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
> @@ -661,6 +669,7 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_provision_max_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/block/bounce.c b/block/bounce.c
> index 7cfcb242f9a1..ab9d8723ae64 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
>  	case REQ_OP_DISCARD:
>  	case REQ_OP_SECURE_ERASE:
>  	case REQ_OP_WRITE_ZEROES:
> +	case REQ_OP_PROVISION:
>  		break;
>  	default:
>  		bio_for_each_segment(bv, bio_src, iter)
> diff --git a/block/fops.c b/block/fops.c
> index 4c70fdc546e7..be2e41f160bf 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -613,7 +613,8 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  
>  #define	BLKDEV_FALLOC_FL_SUPPORTED					\
>  		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
> -		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
> +		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE |	\
> +		 FALLOC_FL_UNSHARE_RANGE)
>  
>  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  			     loff_t len)
> @@ -653,6 +654,13 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	 * de-allocate mode calls to fallocate().
>  	 */
>  	switch (mode) {
> +	case 0:
> +	case FALLOC_FL_UNSHARE_RANGE:
> +	case FALLOC_FL_KEEP_SIZE:
> +	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
> +		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> +					       len >> SECTOR_SHIFT, GFP_KERNEL);
> +		break;
>  	case FALLOC_FL_ZERO_RANGE:
>  	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
>  		error = truncate_bdev_range(bdev, file->f_mode, start, end);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d766be7152e1..9820b3b039f2 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
>  	    bio->bi_iter.bi_size &&
>  	    bio_op(bio) != REQ_OP_DISCARD &&
>  	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
> -	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
> +	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> +	    bio_op(bio) != REQ_OP_PROVISION)
>  		return true;
>  
>  	return false;
> @@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
>  {
>  	return bio_op(bio) == REQ_OP_DISCARD ||
>  	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
> -	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
> +	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
> +	       bio_op(bio) == REQ_OP_PROVISION;
>  }
>  
>  static inline void *bio_data(struct bio *bio)
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 99be590f952f..27bdf88f541c 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -385,7 +385,10 @@ enum req_op {
>  	REQ_OP_DRV_IN		= (__force blk_opf_t)34,
>  	REQ_OP_DRV_OUT		= (__force blk_opf_t)35,
>  
> -	REQ_OP_LAST		= (__force blk_opf_t)36,
> +	/* request device to provision block */
> +	REQ_OP_PROVISION        = (__force blk_opf_t)37,
> +
> +	REQ_OP_LAST		= (__force blk_opf_t)38,
>  };
>  
>  enum req_flag_bits {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 941304f17492..239e2f418b6e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -303,6 +303,7 @@ struct queue_limits {
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> +	unsigned int		max_provision_sectors;
>  
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
> @@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
>  		unsigned int max_discard_sectors);
>  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
>  		unsigned int max_write_same_sectors);
> +extern void blk_queue_max_provision_sectors(struct request_queue *q,
> +		unsigned int max_provision_sectors);
>  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
>  extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
>  		unsigned int max_zone_append_sectors);
> @@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp);
>  
> +extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp_mask);
> +
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
>  
> @@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
>  	return q->limits.max_discard_segments;
>  }
>  
> +static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
> +{
> +	return q->limits.max_provision_sectors;
> +}
> +
>  static inline unsigned int queue_max_segment_size(const struct request_queue *q)
>  {
>  	return q->limits.max_segment_size;
> @@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
>  	return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
>  }
>  
> +static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
> +{
> +	return bdev_get_queue(bdev)->limits.max_provision_sectors;
> +}
> +
>  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>  {
>  	return blk_queue_zoned_model(bdev_get_queue(bdev));
> -- 
> 2.40.1.521.gf1e218fcd8-goog
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 5/5] loop: Add support for provision requests
  2023-05-06  6:29       ` [PATCH v6 5/5] loop: Add support for provision requests Sarthak Kukreti
@ 2023-05-15 12:40         ` Brian Foster
  2023-05-15 21:31           ` Sarthak Kukreti
  0 siblings, 1 reply; 57+ messages in thread
From: Brian Foster @ 2023-05-15 12:40 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

On Fri, May 05, 2023 at 11:29:09PM -0700, Sarthak Kukreti wrote:
> Add support for provision requests to loopback devices.
> Loop devices will configure provision support based on
> whether the underlying block device/file can support
> the provision request and upon receiving a provision bio,
> will map it to the backing device/storage. For loop devices
> over files, a REQ_OP_PROVISION request will translate to
> an fallocate mode 0 call on the backing file.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bc31bb7072a2..13c4b4f8b9c1 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
>  	return ret;
>  }
>  
> +static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
> +{
> +	struct file *file = lo->lo_backing_file;
> +	struct request_queue *q = lo->lo_queue;
> +	int ret;
> +
> +	if (!q->limits.max_provision_sectors) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
> +	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
> +		ret = -EIO;
> + out:
> +	return ret;
> +}
> +
>  static int lo_req_flush(struct loop_device *lo, struct request *rq)
>  {
>  	int ret = vfs_fsync(lo->lo_backing_file, 0);
> @@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
>  				FALLOC_FL_PUNCH_HOLE);
>  	case REQ_OP_DISCARD:
>  		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
> +	case REQ_OP_PROVISION:
> +		return lo_req_provision(lo, rq, pos);

Hi Sarthak,

The only thing that stands out to me is the separate lo_req_provision()
helper here. It seems it might be a little cleaner to extend and reuse
lo_fallocate()...? But that's not something I feel strongly about, so
this all looks pretty good to me either way, FWIW.

Brian

>  	case REQ_OP_WRITE:
>  		if (cmd->use_aio)
>  			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
> @@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
>  				   &loop_attribute_group);
>  }
>  
> +static void loop_config_provision(struct loop_device *lo)
> +{
> +	struct file *file = lo->lo_backing_file;
> +	struct inode *inode = file->f_mapping->host;
> +
> +	/*
> +	 * If the backing device is a block device, mirror its provisioning
> +	 * capability.
> +	 */
> +	if (S_ISBLK(inode->i_mode)) {
> +		blk_queue_max_provision_sectors(lo->lo_queue,
> +			bdev_max_provision_sectors(I_BDEV(inode)));
> +	} else if (file->f_op->fallocate) {
> +		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
> +	} else {
> +		blk_queue_max_provision_sectors(lo->lo_queue, 0);
> +	}
> +}
> +
>  static void loop_config_discard(struct loop_device *lo)
>  {
>  	struct file *file = lo->lo_backing_file;
> @@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  	blk_queue_io_min(lo->lo_queue, bsize);
>  
>  	loop_config_discard(lo);
> +	loop_config_provision(lo);
>  	loop_update_rotational(lo);
>  	loop_update_dio(lo);
>  	loop_sysfs_init(lo);
> @@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
>  	}
>  
>  	loop_config_discard(lo);
> +	loop_config_provision(lo);
>  
>  	/* update dio if lo_offset or transfer is changed */
>  	__loop_update_dio(lo, lo->use_dio);
> @@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	case REQ_OP_FLUSH:
>  	case REQ_OP_DISCARD:
>  	case REQ_OP_WRITE_ZEROES:
> +	case REQ_OP_PROVISION:
>  		cmd->use_aio = false;
>  		break;
>  	default:
> -- 
> 2.40.1.521.gf1e218fcd8-goog
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support
  2023-05-12 17:32         ` Mike Snitzer
@ 2023-05-15 21:19           ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-15 21:19 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Jason Wang, Bart Van Assche, Christoph Hellwig,
	Andreas Dilger, Stefan Hajnoczi, Brian Foster, Alasdair Kergon

On Fri, May 12, 2023 at 10:32 AM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Sat, May 06 2023 at  2:29P -0400,
> Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
>
> > dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block. Additionally, if 'skip_block_zeroing'
> > is not set, dm-thin will opt to zero out the entire range as a part
> > of provisioning.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/md/dm-thin.c | 70 +++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 66 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 2b13c949bd72..3f94f53ac956 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> ...
> > @@ -4114,6 +4171,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >        * The pool uses the same discard limits as the underlying data
> >        * device.  DM core has already set this up.
> >        */
> > +
> > +     limits->max_provision_sectors = pool->sectors_per_block;
>
> Just noticed that setting limits->max_provision_sectors needs to move
> above the pool_io_hints code that sets up discards -- otherwise the early
> return from if (!pt->adjusted_pf.discard_enabled) will cause the
> max_provision_sectors setting to be skipped.
>
> Here is a roll up of the fixes that need to be folded into this patch:
>
Ah right, thanks for pointing that out! I'll fold this into v7.

Best
Sarthak

> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 3f94f53ac956..90c8e36cb327 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -4151,6 +4151,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>                 blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
>         }
>
> +       limits->max_provision_sectors = pool->sectors_per_block;
> +
>         /*
>          * pt->adjusted_pf is a staging area for the actual features to use.
>          * They get transferred to the live pool in bind_control_target()
> @@ -4171,8 +4173,6 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>          * The pool uses the same discard limits as the underlying data
>          * device.  DM core has already set this up.
>          */
> -
> -       limits->max_provision_sectors = pool->sectors_per_block;
>  }
>
>  static struct target_type pool_target = {
> @@ -4349,6 +4349,7 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>
>         ti->num_provision_bios = 1;
>         ti->provision_supported = true;
> +       ti->max_provision_granularity = true;
>
>         mutex_unlock(&dm_thin_pool_table.mutex);
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 5/5] loop: Add support for provision requests
  2023-05-15 12:40         ` Brian Foster
@ 2023-05-15 21:31           ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-15 21:31 UTC (permalink / raw)
  To: Brian Foster
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche,
	Darrick J. Wong

On Mon, May 15, 2023 at 5:37 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Fri, May 05, 2023 at 11:29:09PM -0700, Sarthak Kukreti wrote:
> > Add support for provision requests to loopback devices.
> > Loop devices will configure provision support based on
> > whether the underlying block device/file can support
> > the provision request and upon receiving a provision bio,
> > will map it to the backing device/storage. For loop devices
> > over files, a REQ_OP_PROVISION request will translate to
> > an fallocate mode 0 call on the backing file.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/block/loop.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> >
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index bc31bb7072a2..13c4b4f8b9c1 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -327,6 +327,24 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
> >       return ret;
> >  }
> >
> > +static int lo_req_provision(struct loop_device *lo, struct request *rq, loff_t pos)
> > +{
> > +     struct file *file = lo->lo_backing_file;
> > +     struct request_queue *q = lo->lo_queue;
> > +     int ret;
> > +
> > +     if (!q->limits.max_provision_sectors) {
> > +             ret = -EOPNOTSUPP;
> > +             goto out;
> > +     }
> > +
> > +     ret = file->f_op->fallocate(file, 0, pos, blk_rq_bytes(rq));
> > +     if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
> > +             ret = -EIO;
> > + out:
> > +     return ret;
> > +}
> > +
> >  static int lo_req_flush(struct loop_device *lo, struct request *rq)
> >  {
> >       int ret = vfs_fsync(lo->lo_backing_file, 0);
> > @@ -488,6 +506,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
> >                               FALLOC_FL_PUNCH_HOLE);
> >       case REQ_OP_DISCARD:
> >               return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
> > +     case REQ_OP_PROVISION:
> > +             return lo_req_provision(lo, rq, pos);
>
> Hi Sarthak,
>
> The only thing that stands out to me is the separate lo_req_provision()
> helper here. It seems it might be a little cleaner to extend and reuse
> > lo_fallocate()...? But that's not something I feel strongly about, so
> this all looks pretty good to me either way, FWIW.
>
Fair point; I think that should shorten the patch (and for
correctness, we'd want to add FALLOC_FL_KEEP_SIZE for REQ_OP_PROVISION
too). I'll fix this up in v7.
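
Roughly, a sketch of that direction (assuming lo_fallocate()'s internal
capability checks are adjusted to cover provision as well):

	case REQ_OP_PROVISION:
		return lo_fallocate(lo, rq, pos, FALLOC_FL_KEEP_SIZE);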

Best
Sarthak

> Brian
>
> >       case REQ_OP_WRITE:
> >               if (cmd->use_aio)
> >                       return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
> > @@ -754,6 +774,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
> >                                  &loop_attribute_group);
> >  }
> >
> > +static void loop_config_provision(struct loop_device *lo)
> > +{
> > +     struct file *file = lo->lo_backing_file;
> > +     struct inode *inode = file->f_mapping->host;
> > +
> > +     /*
> > +      * If the backing device is a block device, mirror its provisioning
> > +      * capability.
> > +      */
> > +     if (S_ISBLK(inode->i_mode)) {
> > +             blk_queue_max_provision_sectors(lo->lo_queue,
> > +                     bdev_max_provision_sectors(I_BDEV(inode)));
> > +     } else if (file->f_op->fallocate) {
> > +             blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
> > +     } else {
> > +             blk_queue_max_provision_sectors(lo->lo_queue, 0);
> > +     }
> > +}
> > +
> >  static void loop_config_discard(struct loop_device *lo)
> >  {
> >       struct file *file = lo->lo_backing_file;
> > @@ -1092,6 +1131,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
> >       blk_queue_io_min(lo->lo_queue, bsize);
> >
> >       loop_config_discard(lo);
> > +     loop_config_provision(lo);
> >       loop_update_rotational(lo);
> >       loop_update_dio(lo);
> >       loop_sysfs_init(lo);
> > @@ -1304,6 +1344,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
> >       }
> >
> >       loop_config_discard(lo);
> > +     loop_config_provision(lo);
> >
> >       /* update dio if lo_offset or transfer is changed */
> >       __loop_update_dio(lo, lo->use_dio);
> > @@ -1830,6 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
> >       case REQ_OP_FLUSH:
> >       case REQ_OP_DISCARD:
> >       case REQ_OP_WRITE_ZEROES:
> > +     case REQ_OP_PROVISION:
> >               cmd->use_aio = false;
> >               break;
> >       default:
> > --
> > 2.40.1.521.gf1e218fcd8-goog
> >
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v6 2/5] block: Introduce provisioning primitives
  2023-05-12 18:37         ` Darrick J. Wong
@ 2023-05-15 21:55           ` Sarthak Kukreti
  0 siblings, 0 replies; 57+ messages in thread
From: Sarthak Kukreti @ 2023-05-15 21:55 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dm-devel, linux-block, linux-ext4, linux-kernel, linux-fsdevel,
	Jens Axboe, Michael S. Tsirkin, Jason Wang, Stefan Hajnoczi,
	Alasdair Kergon, Mike Snitzer, Christoph Hellwig, Brian Foster,
	Theodore Ts'o, Andreas Dilger, Bart Van Assche

On Fri, May 12, 2023 at 11:37 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Fri, May 05, 2023 at 11:29:06PM -0700, Sarthak Kukreti wrote:
> > Introduce block request REQ_OP_PROVISION. The intent of this request
> > is to request underlying storage to preallocate disk space for the given
> > block range. Block devices that support this capability will export
> > a provision limit within their request queues.
> >
> > This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > > device for the specified range.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  block/blk-core.c          |  5 ++++
> >  block/blk-lib.c           | 53 +++++++++++++++++++++++++++++++++++++++
> >  block/blk-merge.c         | 18 +++++++++++++
> >  block/blk-settings.c      | 19 ++++++++++++++
> >  block/blk-sysfs.c         |  9 +++++++
> >  block/bounce.c            |  1 +
> >  block/fops.c              | 10 +++++++-
> >  include/linux/bio.h       |  6 +++--
> >  include/linux/blk_types.h |  5 +++-
> >  include/linux/blkdev.h    | 16 ++++++++++++
> >  10 files changed, 138 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 42926e6cb83c..4a2342ba3a8b 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
> >       REQ_OP_NAME(WRITE_ZEROES),
> >       REQ_OP_NAME(DRV_IN),
> >       REQ_OP_NAME(DRV_OUT),
> > +     REQ_OP_NAME(PROVISION)
> >  };
> >  #undef REQ_OP_NAME
> >
> > @@ -798,6 +799,10 @@ void submit_bio_noacct(struct bio *bio)
> >               if (!q->limits.max_write_zeroes_sectors)
> >                       goto not_supported;
> >               break;
> > +     case REQ_OP_PROVISION:
> > +             if (!q->limits.max_provision_sectors)
> > +                     goto not_supported;
> > +             break;
> >       default:
> >               break;
> >       }
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index e59c3069e835..647b6451660b 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -343,3 +343,56 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
> >       return ret;
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_secure_erase);
> > +
> > +/**
> > + * blkdev_issue_provision - provision a block range
> > + * @bdev:    blockdev to write
> > + * @sector:  start sector
> > + * @nr_sects:        number of sectors to provision
> > + * @gfp_mask:        memory allocation flags (for bio_alloc)
> > + *
> > + * Description:
> > + *  Issues a provision request to the block device for the range of sectors.
> > + *  For thinly provisioned block devices, this acts as a signal for the
> > + *  underlying storage pool to allocate space for this block range.
> > + */
> > +int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> > +             sector_t nr_sects, gfp_t gfp)
> > +{
> > +     sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> > +     unsigned int max_sectors = bdev_max_provision_sectors(bdev);
> > +     struct bio *bio = NULL;
> > +     struct blk_plug plug;
> > +     int ret = 0;
> > +
> > +     if (max_sectors == 0)
> > +             return -EOPNOTSUPP;
> > +     if ((sector | nr_sects) & bs_mask)
> > +             return -EINVAL;
> > +     if (bdev_read_only(bdev))
> > +             return -EPERM;
> > +
> > +     blk_start_plug(&plug);
> > +     for (;;) {
> > +             unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
> > +
> > +             bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
> > +             bio->bi_iter.bi_sector = sector;
> > +             bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
> > +
> > +             sector += req_sects;
> > +             nr_sects -= req_sects;
> > +             if (!nr_sects) {
> > +                     ret = submit_bio_wait(bio);
> > +                     if (ret == -EOPNOTSUPP)
> > +                             ret = 0;
>
> Why do we convert EOPNOTSUPP to success here?  If the device suddenly
> forgets how to provision space, wouldn't we want to pass that up to the
> caller?
>
> (I'm not sure when this would happen -- perhaps the bdev has the general
> provisioning capability but not for the specific range requested?)
>
Ah, good catch. I initially wired it up that way to be less noisy in
the kernel logs but left it in accidentally. The error should definitely
be passed through: one case where this can happen is if the device-mapper
table comprises several underlying targets but only a few of them
support provisioning. I'll fix this in v7.
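
i.e., roughly, the tail of the submit loop would become:

	ret = submit_bio_wait(bio);
	/* no -EOPNOTSUPP masking; callers see the real error */
	bio_put(bio);
	break;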

Best
Sarthak

> The rest of the patch looks ok to me.
>
> --D
>
> > +                     bio_put(bio);
> > +                     break;
> > +             }
> > +             cond_resched();
> > +     }
> > +     blk_finish_plug(&plug);
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(blkdev_issue_provision);
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 6460abdb2426..a3ffebb97a1d 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
> >       return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
> >  }
> >
> > +static struct bio *bio_split_provision(struct bio *bio,
> > +                                     const struct queue_limits *lim,
> > +                                     unsigned int *nsegs, struct bio_set *bs)
> > +{
> > +     *nsegs = 0;
> > +
> > +     if (!lim->max_provision_sectors)
> > +             return NULL;
> > +
> > +     if (bio_sectors(bio) <= lim->max_provision_sectors)
> > +             return NULL;
> > +
> > +     return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
> > +}
> > +
> >  /*
> >   * Return the maximum number of sectors from the start of a bio that may be
> >   * submitted as a single request to a block device. If enough sectors remain,
> > @@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
> >       case REQ_OP_WRITE_ZEROES:
> >               split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
> >               break;
> > +     case REQ_OP_PROVISION:
> > +             split = bio_split_provision(bio, lim, nr_segs, bs);
> > +             break;
> >       default:
> >               split = bio_split_rw(bio, lim, nr_segs, bs,
> >                               get_max_io_size(bio, lim) << SECTOR_SHIFT);
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 896b4654ab00..d303e6614c36 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> >       lim->zoned = BLK_ZONED_NONE;
> >       lim->zone_write_granularity = 0;
> >       lim->dma_alignment = 511;
> > +     lim->max_provision_sectors = 0;
> >  }
> >
> >  /**
> > @@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> >       lim->max_dev_sectors = UINT_MAX;
> >       lim->max_write_zeroes_sectors = UINT_MAX;
> >       lim->max_zone_append_sectors = UINT_MAX;
> > +     lim->max_provision_sectors = UINT_MAX;
> >  }
> >  EXPORT_SYMBOL(blk_set_stacking_limits);
> >
> > @@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
> >  }
> >  EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
> >
> > +/**
> > + * blk_queue_max_provision_sectors - set max sectors for a single provision
> > + *
> > + * @q:  the request queue for the device
> > + * @max_provision_sectors: maximum number of sectors to provision per command
> > + */
> > +
> > +void blk_queue_max_provision_sectors(struct request_queue *q,
> > +             unsigned int max_provision_sectors)
> > +{
> > +     q->limits.max_provision_sectors = max_provision_sectors;
> > +}
> > +EXPORT_SYMBOL(blk_queue_max_provision_sectors);
> > +
> >  /**
> >   * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
> >   * @q:  the request queue for the device
> > @@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >       t->max_segment_size = min_not_zero(t->max_segment_size,
> >                                          b->max_segment_size);
> >
> > +     t->max_provision_sectors = min_not_zero(t->max_provision_sectors,
> > +                                             b->max_provision_sectors);
> > +
> >       t->misaligned |= b->misaligned;
> >
> >       alignment = queue_limit_alignment_offset(b, start);
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index f1fce1c7fa44..0a3165211c66 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -213,6 +213,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
> >       return queue_var_show(0, page);
> >  }
> >
> > +static ssize_t queue_provision_max_show(struct request_queue *q,
> > +             char *page)
> > +{
> > +     return sprintf(page, "%llu\n",
> > +             (unsigned long long)q->limits.max_provision_sectors << 9);
> > +}
> > +
> >  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
> >  {
> >       return queue_var_show(0, page);
> > @@ -604,6 +611,7 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
> >  QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
> >  QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
> >
> > +QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
> >  QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
> >  QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
> >  QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
> > @@ -661,6 +669,7 @@ static struct attribute *queue_attrs[] = {
> >       &queue_discard_max_entry.attr,
> >       &queue_discard_max_hw_entry.attr,
> >       &queue_discard_zeroes_data_entry.attr,
> > +     &queue_provision_max_entry.attr,
> >       &queue_write_same_max_entry.attr,
> >       &queue_write_zeroes_max_entry.attr,
> >       &queue_zone_append_max_entry.attr,
> > diff --git a/block/bounce.c b/block/bounce.c
> > index 7cfcb242f9a1..ab9d8723ae64 100644
> > --- a/block/bounce.c
> > +++ b/block/bounce.c
> > @@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
> >       case REQ_OP_DISCARD:
> >       case REQ_OP_SECURE_ERASE:
> >       case REQ_OP_WRITE_ZEROES:
> > +     case REQ_OP_PROVISION:
> >               break;
> >       default:
> >               bio_for_each_segment(bv, bio_src, iter)
> > diff --git a/block/fops.c b/block/fops.c
> > index 4c70fdc546e7..be2e41f160bf 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -613,7 +613,8 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >
> >  #define      BLKDEV_FALLOC_FL_SUPPORTED                                      \
> >               (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |           \
> > -              FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
> > +              FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE |       \
> > +              FALLOC_FL_UNSHARE_RANGE)
> >
> >  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >                            loff_t len)
> > @@ -653,6 +654,13 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >        * de-allocate mode calls to fallocate().
> >        */
> >       switch (mode) {
> > +     case 0:
> > +     case FALLOC_FL_UNSHARE_RANGE:
> > +     case FALLOC_FL_KEEP_SIZE:
> > +     case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
> > +             error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > +                                            len >> SECTOR_SHIFT, GFP_KERNEL);
> > +             break;
> >       case FALLOC_FL_ZERO_RANGE:
> >       case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> >               error = truncate_bdev_range(bdev, file->f_mode, start, end);
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index d766be7152e1..9820b3b039f2 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
> >           bio->bi_iter.bi_size &&
> >           bio_op(bio) != REQ_OP_DISCARD &&
> >           bio_op(bio) != REQ_OP_SECURE_ERASE &&
> > -         bio_op(bio) != REQ_OP_WRITE_ZEROES)
> > +         bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> > +         bio_op(bio) != REQ_OP_PROVISION)
> >               return true;
> >
> >       return false;
> > @@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
> >  {
> >       return bio_op(bio) == REQ_OP_DISCARD ||
> >              bio_op(bio) == REQ_OP_SECURE_ERASE ||
> > -            bio_op(bio) == REQ_OP_WRITE_ZEROES;
> > +            bio_op(bio) == REQ_OP_WRITE_ZEROES ||
> > +            bio_op(bio) == REQ_OP_PROVISION;
> >  }
> >
> >  static inline void *bio_data(struct bio *bio)
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 99be590f952f..27bdf88f541c 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -385,7 +385,10 @@ enum req_op {
> >       REQ_OP_DRV_IN           = (__force blk_opf_t)34,
> >       REQ_OP_DRV_OUT          = (__force blk_opf_t)35,
> >
> > -     REQ_OP_LAST             = (__force blk_opf_t)36,
> > +     /* request device to provision block */
> > +     REQ_OP_PROVISION        = (__force blk_opf_t)37,
> > +
> > +     REQ_OP_LAST             = (__force blk_opf_t)38,
> >  };
> >
> >  enum req_flag_bits {
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 941304f17492..239e2f418b6e 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -303,6 +303,7 @@ struct queue_limits {
> >       unsigned int            discard_granularity;
> >       unsigned int            discard_alignment;
> >       unsigned int            zone_write_granularity;
> > +     unsigned int            max_provision_sectors;
> >
> >       unsigned short          max_segments;
> >       unsigned short          max_integrity_segments;
> > @@ -921,6 +922,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
> >               unsigned int max_discard_sectors);
> >  extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
> >               unsigned int max_write_same_sectors);
> > +extern void blk_queue_max_provision_sectors(struct request_queue *q,
> > +             unsigned int max_provision_sectors);
> >  extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
> >  extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
> >               unsigned int max_zone_append_sectors);
> > @@ -1060,6 +1063,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
> >               sector_t nr_sects, gfp_t gfp);
> >
> > +extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
> > +             sector_t nr_sects, gfp_t gfp_mask);
> > +
> >  #define BLKDEV_ZERO_NOUNMAP  (1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK       (1 << 1)  /* don't write explicit zeroes */
> >
> > @@ -1139,6 +1145,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
> >       return q->limits.max_discard_segments;
> >  }
> >
> > +static inline unsigned int queue_max_provision_sectors(const struct request_queue *q)
> > +{
> > +     return q->limits.max_provision_sectors;
> > +}
> > +
> >  static inline unsigned int queue_max_segment_size(const struct request_queue *q)
> >  {
> >       return q->limits.max_segment_size;
> > @@ -1281,6 +1292,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
> >       return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
> >  }
> >
> > +static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
> > +{
> > +     return bdev_get_queue(bdev)->limits.max_provision_sectors;
> > +}
> > +
> >  static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> >  {
> >       return blk_queue_zoned_model(bdev_get_queue(bdev));
> > --
> > 2.40.1.521.gf1e218fcd8-goog
> >
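
P.S. For anyone who wants to exercise the new path from userspace: with
this patch, fallocate(2) with mode 0 (or FALLOC_FL_UNSHARE_RANGE) on a
block device maps to REQ_OP_PROVISION. A minimal, untested sketch (the
device path below is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical dm-thin device; substitute your own. */
	int fd = open("/dev/mapper/thin0", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* mode 0: ask the stack to provision the first 1 GiB. */
	if (fallocate(fd, 0, 0, 1ULL << 30) < 0) {
		perror("fallocate");
		close(fd);
		return 1;
	}

	close(fd);
	return 0;
}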
