dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [dm-devel] [PATCH 0/7] add simple copy support
       [not found] <CGME20210817101741epcas5p174ca0a539587da6a67b9f58cd13f2bad@epcas5p1.samsung.com>
@ 2021-08-17 10:14 ` SelvaKumar S
       [not found]   ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com>
                     ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

This started out as an attempt to support NVMe Simple Copy Command (SCC),
and evolved during the RFC review process.

The patchset, at this point, contains -
1. SCC support in NVMe driver
2. Block-layer infra for copy-offload operation
3. ioctl interface to user-space
4. copy-emulation infra in the block-layer
5. copy-offload plumbing to dm-kcopyd (thus creating couple of in-kernel
	users such as dm-clone)


The SCC specification, i.e. TP4065a can be found in following link

https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip

Simple copy is a copy offloading feature and can be used to copy multiple
contiguous ranges (source_ranges) of LBA's to a single destination LBA
within the device, reducing traffic between host and device.

We define a block ioctl for copy and copy payload format similar to
discard. For device supporting native simple copy, we attach the control
information as payload to the bio and submit to the device. Copy emulation
is implemented incase underlaying device does not support copy offload or
based on sysfs choice. Copy emulation is done by reading each source range
into buffer and writing it to the destination.

At present this implementation does not support copy offload for stacked/dm
devices, rather copy operation is completed through emulation.

One of the in-kernel use case for copy-offload is implemented by this
patchset. dm-kcopyd infra has been changed to leverage the copy-offload if
it is natively available. Other future use-cases could be F2FS GC, BTRFS
relocation/balance and copy_file_range.

Following limits are added to queue limits and are exposed via sysfs to
userspace, which user can use to form a payload.

 - copy_offload:
	configurable, can be used set emulation or copy offload
		0 to disable copy offload,
		1 to enable copy offloading support. Offload can be only
			enabled, if underlaying device supports offload

 - max_copy_sectors:
	total copy length supported by copy offload feature in device.
	0 indicates copy offload is not supported.

 - max_copy_nr_ranges:
	maximum number of source range entries supported by copy offload
			feature in device

 - max_copy_range_sectors:
	maximum copy length per source range entry

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and queue parameter
copy_offload is 1, then copy is done through native copy offloading.
Copy emulation is used in otherwise.

Changes from  RFC v5

1. Handle copy larger than maximum copy limits
2. Create copy context and submit copy offload asynchronously
3. Remove BLKDEV_COPY_NOEMULATION opt-in option of copy offload and
check for copy support before submission from dm and other layers
4. Allocate maximum possible allocatable buffer for copy emulation
rather failing very large copy offload.
5. Fix copy_offload sysfs to be either have 0 or 1

Changes from RFC v4

1. Extend dm-kcopyd to leverage copy-offload, while copying within the
same device. The other approach was to have copy-emulation by moving
dm-kcopyd to block layer. But it also required moving core dm-io infra,
causing a massive churn across multiple dm-targets.
2. Remove export in bio_map_kern()
3. Change copy_offload sysfs to accept 0 or else
4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
5. Rename payload entries, add source bdev field to be used while
partition remapping, remove copy_size
6. Change the blkdev_issue_copy() interface to accept destination and
source values in sector rather in bytes
7. Add payload to bio using bio_map_kern() for copy_offload case
8. Add check to return error if one of the source range length is 0
9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy
emulation incase of copy offload is not supported. Caller can his use
his existing copying logic to complete the io.
10. Bug fix copy checks and reduce size of rcu_lock()

Changes from RFC v3

1. gfp_flag fixes.
2. Export bio_map_kern() and use it to allocate and add pages to bio.
3. Move copy offload, reading to buf, writing from buf to separate functions
4. Send read bio of copy offload by chaining them and submit asynchronously
5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
6. Move single source range limit check to blk_check_copy()
7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
8. Change blkdev_issue_copy() interface generic to accepts destination bdev
        to support XCOPY as well.
9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
10. Fix buf allocoation logic to allocate buffer for the total size of copy.
11. Reword patch commit description.

Changes from RFC v2

1. Add emulation support for devices not supporting copy.
2. Add *copy_offload* sysfs entry to enable and disable copy_offload
        in devices supporting simple copy.
3. Remove simple copy support for stacked devices.

Changes from RFC v1:

1. Fix memory leak in __blkdev_issue_copy
2. Unmark blk_check_copy inline
3. Fix line break in blk_check_copy_eod
4. Remove p checks and made code more readable
5. Don't use bio_set_op_attrs and remove op and set
   bi_opf directly
6. Use struct_size to calculate total_size
7. Fix partition remap of copy destination
8. Remove mcl,mssrl,msrc from nvme_ns
9. Initialize copy queue limits to 0 in nvme_config_copy
10. Remove return in QUEUE_FLAG_COPY check
11. Remove unused OCFS


Nitesh Shetty (4):
  block: Introduce queue limits for copy-offload support
  block: copy offload support infrastructure
  block: Introduce a new ioctl for simple copy
  block: add emulation for simple copy

SelvaKumar S (3):
  block: make bio_map_kern() non static
  nvme: add simple copy support
  dm kcopyd: add simple copy offload support

 block/blk-core.c          |  84 ++++++++-
 block/blk-lib.c           | 352 ++++++++++++++++++++++++++++++++++++++
 block/blk-map.c           |   2 +-
 block/blk-settings.c      |   4 +
 block/blk-sysfs.c         |  51 ++++++
 block/blk-zoned.c         |   1 +
 block/bounce.c            |   1 +
 block/ioctl.c             |  33 ++++
 drivers/md/dm-kcopyd.c    |  56 +++++-
 drivers/nvme/host/core.c  |  83 +++++++++
 drivers/nvme/host/trace.c |  19 ++
 include/linux/bio.h       |   1 +
 include/linux/blk_types.h |  20 +++
 include/linux/blkdev.h    |  21 +++
 include/linux/nvme.h      |  43 ++++-
 include/uapi/linux/fs.h   |  20 +++
 16 files changed, 775 insertions(+), 16 deletions(-)

-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 1/7] block: make bio_map_kern() non static
       [not found]   ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  0 siblings, 0 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

Make bio_map_kern() non static, so that copy offload/emulation can use
it to add vmalloced memory to bio.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 block/blk-map.c        | 2 +-
 include/linux/blkdev.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index d1448aaad980..1b97b9b503f4 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -336,7 +336,7 @@ static void bio_map_kern_endio(struct bio *bio)
  *	Map the kernel address into a bio suitable for io to a block
  *	device. Returns an error pointer in case of error.
  */
-static struct bio *bio_map_kern(struct request_queue *q, void *data,
+struct bio *bio_map_kern(struct request_queue *q, void *data,
 		unsigned int len, gfp_t gfp_mask)
 {
 	unsigned long kaddr = (unsigned long)data;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index df404c1fb087..28a193225cf2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -903,6 +903,8 @@ extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
 extern int blk_rq_unmap_user(struct bio *);
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+			gfp_t gfp_mask);
 extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
 extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
 			       struct rq_map_data *, const struct iov_iter *,
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support
       [not found]   ` <CGME20210817101753epcas5p4f4257f8edda27e184ecbb273b700ccbc@epcas5p4.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  2021-08-17 13:08       ` Greg KH
  0 siblings, 1 reply; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

From: Nitesh Shetty <nj.shetty@samsung.com>

Add device limits as sysfs entries,
        - copy_offload (READ_WRITE)
        - max_copy_sectors (READ_ONLY)
        - max_copy_ranges_sectors (READ_ONLY)
        - max_copy_nr_ranges (READ_ONLY)

copy_offload(= 0), is disabled by default. This needs to be enabled if
copy-offload needs to be used.
max_copy_sectors = 0, indicates the device doesn't support native copy.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 block/blk-settings.c   |  4 ++++
 block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  6 +++++
 3 files changed, 61 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 3613d2cc0688..ac59922e0cde 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->copy_offload = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1832587dce3a..aabab65f10ab 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -169,6 +169,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
 	return queue_var_show(q->limits.discard_granularity, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.copy_offload, page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && q->limits.max_copy_sectors == 0)
+		return -EINVAL;
+
+	if (copy_offload)
+		q->limits.copy_offload = BLK_COPY_OFFLOAD_SCC;
+	else
+		q->limits.copy_offload = 0;
+
+	return ret;
+}
+
+static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_copy_sectors, page);
+}
+
+static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_range_sectors, page);
+}
+
+static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
 static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
 {
 
@@ -611,6 +653,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -657,6 +704,10 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_max_copy_sectors_entry.attr,
+	&queue_max_copy_range_sectors_entry.attr,
+	&queue_max_copy_nr_ranges_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 28a193225cf2..fd4cfaadda5b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -321,10 +321,14 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int            copy_offload;
+	unsigned int            max_copy_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
+	unsigned short          max_copy_range_sectors;
+	unsigned short          max_copy_nr_ranges;
 
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
@@ -600,6 +604,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_SIMPLE_COPY	30	/* supports simple copy */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -622,6 +627,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
 #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_SIMPLE_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_secure_erase(q) \
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
       [not found]   ` <CGME20210817101758epcas5p1ec353b3838d64654e69488229256d9eb@epcas5p1.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  2021-08-17 17:14       ` Bart Van Assche
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

From: Nitesh Shetty <nj.shetty@samsung.com>

Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
bio with control information as payload and submit to the device.
Larger copy operation may be divided if necessary by looking at device
limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
submitted to zoned device.
Native copy offload is not supported for stacked devices.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
---
 block/blk-core.c          |  84 ++++++++++++-
 block/blk-lib.c           | 252 ++++++++++++++++++++++++++++++++++++++
 block/blk-zoned.c         |   1 +
 block/bounce.c            |   1 +
 include/linux/bio.h       |   1 +
 include/linux/blk_types.h |  20 +++
 include/linux/blkdev.h    |  13 ++
 include/uapi/linux/fs.h   |  12 ++
 8 files changed, 378 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d2722ecd4d9b..541b1561b4af 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -704,6 +704,17 @@ static noinline int should_fail_bio(struct bio *bio)
 }
 ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO);
 
+static inline int bio_check_copy_eod(struct bio *bio, sector_t start,
+		sector_t nr_sectors, sector_t max_sect)
+{
+	if (nr_sectors && max_sect &&
+	    (nr_sectors > max_sect || start > max_sect - nr_sectors)) {
+		handle_bad_sector(bio, max_sect);
+		return -EIO;
+	}
+	return 0;
+}
+
 /*
  * Check whether this bio extends beyond the end of the device or partition.
  * This may well happen - the kernel calls bread() without checking the size of
@@ -723,6 +734,61 @@ static inline int bio_check_eod(struct bio *bio)
 	return 0;
 }
 
+/*
+ * check for eod limits and remap ranges if needed
+ */
+static int blk_check_copy(struct bio *bio)
+{
+	struct blk_copy_payload *payload = bio_data(bio);
+	sector_t dst_max_sect, dst_start_sect, copy_size = 0;
+	sector_t src_max_sect, src_start_sect;
+	struct block_device *bd_part;
+	int i, ret = -EIO;
+
+	rcu_read_lock();
+
+	bd_part = bio->bi_bdev;
+	if (unlikely(!bd_part))
+		goto err;
+
+	dst_max_sect =  bdev_nr_sectors(bd_part);
+	dst_start_sect = bd_part->bd_start_sect;
+
+	src_max_sect = bdev_nr_sectors(payload->src_bdev);
+	src_start_sect = payload->src_bdev->bd_start_sect;
+
+	if (unlikely(should_fail_request(bd_part, bio->bi_iter.bi_size)))
+		goto err;
+
+	if (unlikely(bio_check_ro(bio)))
+		goto err;
+
+	rcu_read_unlock();
+
+	for (i = 0; i < payload->copy_nr_ranges; i++) {
+		ret = bio_check_copy_eod(bio, payload->range[i].src,
+				payload->range[i].len, src_max_sect);
+		if (unlikely(ret))
+			goto out;
+
+		payload->range[i].src += src_start_sect;
+		copy_size += payload->range[i].len;
+	}
+
+	/* check if copy length crosses eod */
+	ret = bio_check_copy_eod(bio, bio->bi_iter.bi_sector,
+				copy_size, dst_max_sect);
+	if (unlikely(ret))
+		goto out;
+
+	bio->bi_iter.bi_sector += dst_start_sect;
+	return 0;
+err:
+	rcu_read_unlock();
+out:
+	return ret;
+}
+
 /*
  * Remap block n of partition p to block n+start(p) of the disk.
  */
@@ -799,13 +865,15 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 
 	if (should_fail_bio(bio))
 		goto end_io;
-	if (unlikely(bio_check_ro(bio)))
-		goto end_io;
-	if (!bio_flagged(bio, BIO_REMAPPED)) {
-		if (unlikely(bio_check_eod(bio)))
-			goto end_io;
-		if (bdev->bd_partno && unlikely(blk_partition_remap(bio)))
+	if (likely(!op_is_copy(bio->bi_opf))) {
+		if (unlikely(bio_check_ro(bio)))
 			goto end_io;
+		if (!bio_flagged(bio, BIO_REMAPPED)) {
+			if (unlikely(bio_check_eod(bio)))
+				goto end_io;
+			if (bdev->bd_partno && unlikely(blk_partition_remap(bio)))
+				goto end_io;
+		}
 	}
 
 	/*
@@ -829,6 +897,10 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		if (!blk_queue_discard(q))
 			goto not_supported;
 		break;
+	case REQ_OP_COPY:
+		if (unlikely(blk_check_copy(bio)))
+			goto end_io;
+		break;
 	case REQ_OP_SECURE_ERASE:
 		if (!blk_queue_secure_erase(q))
 			goto not_supported;
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9f09beadcbe3..7fee0ae95c44 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -151,6 +151,258 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
 
+/*
+ * Wait on and process all in-flight BIOs.  This must only be called once
+ * all bios have been issued so that the refcount can only decrease.
+ * This just waits for all bios to make it through cio_bio_end_io.  IO
+ * errors are propagated through cio->io_error.
+ */
+static int cio_await_completion(struct cio *cio)
+{
+	int ret = 0;
+
+	while (atomic_read(&cio->refcount)) {
+		cio->waiter = current;
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		blk_io_schedule();
+		/* wake up sets us TASK_RUNNING */
+		cio->waiter = NULL;
+		ret = cio->io_err;
+	}
+	kvfree(cio);
+
+	return ret;
+}
+
+/*
+ * The BIO completion handler simply decrements refcount.
+ * Also wake up process, if this is the last bio to be completed.
+ *
+ * During I/O bi_private points at the cio.
+ */
+static void cio_bio_end_io(struct bio *bio)
+{
+	struct cio *cio = bio->bi_private;
+
+	if (bio->bi_status)
+		cio->io_err = bio->bi_status;
+	kvfree(page_address(bio_first_bvec_all(bio)->bv_page) +
+			bio_first_bvec_all(bio)->bv_offset);
+	bio_put(bio);
+
+	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
+		wake_up_process(cio->waiter);
+}
+
+int blk_copy_offload_submit_bio(struct block_device *bdev,
+		struct blk_copy_payload *payload, int payload_size,
+		struct cio *cio, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio;
+
+	bio = bio_map_kern(q, payload, payload_size, gfp_mask);
+	if (IS_ERR(bio))
+		return PTR_ERR(bio);
+
+	bio_set_dev(bio, bdev);
+	bio->bi_opf = REQ_OP_COPY | REQ_NOMERGE;
+	bio->bi_iter.bi_sector = payload->dest;
+	bio->bi_end_io = cio_bio_end_io;
+	bio->bi_private = cio;
+	atomic_inc(&cio->refcount);
+	submit_bio(bio);
+
+	return 0;
+}
+
+/* Go through all the enrties inside user provided payload, and determine the
+ * maximum number of entries in a payload, based on device's scc-limits.
+ */
+static inline int blk_max_payload_entries(int nr_srcs, struct range_entry *rlist,
+		int max_nr_srcs, sector_t max_copy_range_sectors, sector_t max_copy_len)
+{
+	sector_t range_len, copy_len = 0, remaining = 0;
+	int ri = 0, pi = 1, max_pi = 0;
+
+	for (ri = 0; ri < nr_srcs; ri++) {
+		for (remaining = rlist[ri].len; remaining > 0; remaining -= range_len) {
+			range_len = min3(remaining, max_copy_range_sectors,
+								max_copy_len - copy_len);
+			pi++;
+			copy_len += range_len;
+
+			if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
+				max_pi = max(max_pi, pi);
+				pi = 1;
+				copy_len = 0;
+			}
+		}
+	}
+
+	return max(max_pi, pi);
+}
+
+/*
+ * blk_copy_offload_scc	- Use device's native copy offload feature
+ * Go through user provide payload, prepare new payload based on device's copy offload limits.
+ */
+int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *rlist, struct block_device *dest_bdev,
+		sector_t dest, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(dest_bdev);
+	struct cio *cio = NULL;
+	struct blk_copy_payload *payload;
+	sector_t range_len, copy_len = 0, remaining = 0;
+	sector_t src_blk, cdest = dest;
+	sector_t max_copy_range_sectors, max_copy_len;
+	int ri = 0, pi = 0, ret = 0, payload_size, max_pi, max_nr_srcs;
+
+	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
+	if (!cio)
+		return -ENOMEM;
+	atomic_set(&cio->refcount, 0);
+
+	max_nr_srcs = q->limits.max_copy_nr_ranges;
+	max_copy_range_sectors = q->limits.max_copy_range_sectors;
+	max_copy_len = q->limits.max_copy_sectors;
+
+	max_pi = blk_max_payload_entries(nr_srcs, rlist, max_nr_srcs,
+					max_copy_range_sectors, max_copy_len);
+	payload_size = struct_size(payload, range, max_pi);
+
+	payload = kvmalloc(payload_size, gfp_mask);
+	if (!payload) {
+		ret = -ENOMEM;
+		goto free_cio;
+	}
+	payload->src_bdev = src_bdev;
+
+	for (ri = 0; ri < nr_srcs; ri++) {
+		for (remaining = rlist[ri].len, src_blk = rlist[ri].src; remaining > 0;
+						remaining -= range_len, src_blk += range_len) {
+
+			range_len = min3(remaining, max_copy_range_sectors,
+								max_copy_len - copy_len);
+			payload->range[pi].len = range_len;
+			payload->range[pi].src = src_blk;
+			pi++;
+			copy_len += range_len;
+
+			/* Submit current payload, if crossing device copy limits */
+			if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
+				payload->dest = cdest;
+				payload->copy_nr_ranges = pi;
+				ret = blk_copy_offload_submit_bio(dest_bdev, payload,
+								payload_size, cio, gfp_mask);
+				if (ret)
+					goto free_payload;
+
+				/* reset index, length and allocate new payload */
+				pi = 0;
+				cdest += copy_len;
+				copy_len = 0;
+				payload = kvmalloc(payload_size, gfp_mask);
+				if (!payload) {
+					ret = -ENOMEM;
+					goto free_cio;
+				}
+				payload->src_bdev = src_bdev;
+			}
+		}
+	}
+
+	if (pi) {
+		payload->dest = cdest;
+		payload->copy_nr_ranges = pi;
+		ret = blk_copy_offload_submit_bio(dest_bdev, payload, payload_size, cio, gfp_mask);
+		if (ret)
+			goto free_payload;
+	}
+
+	/* Wait for completion of all IO's*/
+	ret = cio_await_completion(cio);
+
+	return ret;
+
+free_payload:
+	kvfree(payload);
+free_cio:
+	cio_await_completion(cio);
+	return ret;
+}
+
+static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs)
+{
+	int i;
+	sector_t len = 0;
+
+	for (i = 0; i < nr_srcs; i++) {
+		if (rlist[i].len)
+			len += rlist[i].len;
+		else
+			return 0;
+	}
+
+	return len;
+}
+
+static inline bool blk_check_offload_scc(struct request_queue *src_q,
+		struct request_queue *dest_q)
+{
+	if (src_q == dest_q && src_q->limits.copy_offload == BLK_COPY_OFFLOAD_SCC)
+		return true;
+
+	return false;
+}
+
+/*
+ * blkdev_issue_copy - queue a copy
+ * @src_bdev:	source block device
+ * @nr_srcs:	number of source ranges to copy
+ * @src_rlist:	array of source ranges
+ * @dest_bdev:	destination block device
+ * @dest:	destination in sector
+ * @gfp_mask:   memory allocation flags (for bio_alloc)
+ * @flags:	BLKDEV_COPY_* flags to control behaviour
+ *
+ * Description:
+ *	Copy source ranges from source block device to destination block device.
+ *	length of a source range cannot be zero.
+ */
+int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *src_rlist, struct block_device *dest_bdev,
+		sector_t dest, gfp_t gfp_mask, int flags)
+{
+	struct request_queue *src_q = bdev_get_queue(src_bdev);
+	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
+	sector_t copy_len;
+	int ret = -EINVAL;
+
+	if (!src_q || !dest_q)
+		return -ENXIO;
+
+	if (!nr_srcs)
+		return -EINVAL;
+
+	if (nr_srcs >= MAX_COPY_NR_RANGE)
+		return -EINVAL;
+
+	copy_len = blk_copy_len(src_rlist, nr_srcs);
+	if (!copy_len && copy_len >= MAX_COPY_TOTAL_LENGTH)
+		return -EINVAL;
+
+	if (bdev_read_only(dest_bdev))
+		return -EPERM;
+
+	if (blk_check_offload_scc(src_q, dest_q))
+		ret = blk_copy_offload_scc(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_copy);
+
 /**
  * __blkdev_issue_write_same - generate number of bios with same page
  * @bdev:	target blockdev
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 86fce751bb17..7643fc868521 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -67,6 +67,7 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
 	case REQ_OP_WRITE_ZEROES:
 	case REQ_OP_WRITE_SAME:
 	case REQ_OP_WRITE:
+	case REQ_OP_COPY:
 		return blk_rq_zone_is_seq(rq);
 	default:
 		return false;
diff --git a/block/bounce.c b/block/bounce.c
index 05fc7148489d..d9b05aaf6e56 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
 	switch (bio_op(bio)) {
+	case REQ_OP_COPY:
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3d67d0fbc868..068fa2e8896a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -73,6 +73,7 @@ static inline bool bio_has_data(struct bio *bio)
 static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
+	       bio_op(bio) == REQ_OP_COPY ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
 	       bio_op(bio) == REQ_OP_WRITE_SAME ||
 	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 9e392daa1d7f..1ab77176cb46 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -347,6 +347,8 @@ enum req_opf {
 	REQ_OP_ZONE_RESET	= 15,
 	/* reset all the zone present on the device */
 	REQ_OP_ZONE_RESET_ALL	= 17,
+	/* copy ranges within device */
+	REQ_OP_COPY		= 19,
 
 	/* Driver private requests */
 	REQ_OP_DRV_IN		= 34,
@@ -470,6 +472,11 @@ static inline bool op_is_discard(unsigned int op)
 	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
 }
 
+static inline bool op_is_copy(unsigned int op)
+{
+	return (op & REQ_OP_MASK) == REQ_OP_COPY;
+}
+
 /*
  * Check if a bio or request operation is a zone management operation, with
  * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
@@ -529,4 +536,17 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
+struct cio {
+	atomic_t refcount;
+	blk_status_t io_err;
+	struct task_struct *waiter;     /* waiting task (NULL if none) */
+};
+
+struct blk_copy_payload {
+	struct block_device	*src_bdev;
+	sector_t		dest;
+	int			copy_nr_ranges;
+	struct range_entry	range[];
+};
+
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fd4cfaadda5b..38369dff6a36 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -52,6 +52,12 @@ struct blk_keyslot_manager;
 /* Doing classic polling */
 #define BLK_MQ_POLL_CLASSIC -1
 
+/* Define copy offload options */
+enum blk_copy {
+	BLK_COPY_OFFLOAD_EMULATE = 0,
+	BLK_COPY_OFFLOAD_SCC,
+};
+
 /*
  * Maximum number of blkcg policies allowed to be registered concurrently.
  * Defined here to simplify include dependency.
@@ -1051,6 +1057,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 		return min(q->limits.max_discard_sectors,
 			   UINT_MAX >> SECTOR_SHIFT);
 
+	if (unlikely(op == REQ_OP_COPY))
+		return q->limits.max_copy_sectors;
+
 	if (unlikely(op == REQ_OP_WRITE_SAME))
 		return q->limits.max_write_same_sectors;
 
@@ -1326,6 +1335,10 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, int flags,
 		struct bio **biop);
 
+int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *src_rlist, struct block_device *dest_bdev,
+		sector_t dest, gfp_t gfp_mask, int flags);
+
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index bdf7b404b3e7..7a97b588d892 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,18 @@ struct fstrim_range {
 	__u64 minlen;
 };
 
+/* Maximum no of entries supported */
+#define MAX_COPY_NR_RANGE	(1 << 12)
+
+/* maximum total copy length */
+#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
+
+/* Source range entry for copy */
+struct range_entry {
+	__u64 src;
+	__u64 len;
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
       [not found]   ` <CGME20210817101803epcas5p10cda1d52f8a8f1172e34b1f9cf8eef3b@epcas5p1.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  2021-08-17 13:09       ` Greg KH
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

From: Nitesh Shetty <nj.shetty@samsung.com>

Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
to a destination in the device. COPY ioctl accepts a 'copy_range'
structure that contains destination (in sectors), no of sources and
pointer to the array of source ranges. Each source range is represented by
'range_entry' that contains start and length of source ranges (in sectors)

MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.

Example code, to issue BLKCOPY:
/* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
 * [64,24], on the same device */

int main(void)
{
	int ret, fd;
	struct range_entry source_range[] = {{.src = 0, .len = 8},
		{.src = 16, .len = 8}, {.src = 32, .len = 8},};
	struct copy_range cr;

	cr.dest = 64;
	cr.nr_range = 3;
	cr.range_list = (__u64)&source_range;

	fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0) return 1;

	ret = ioctl(fd, BLKCOPY, &cr);
	if (ret < 0) printf("copy failure\n");

	close(fd);

	return ret;
}

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h |  8 ++++++++
 2 files changed, 41 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index eb0491e90b9a..2af56d01e9fe 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 				    GFP_KERNEL, flags);
 }
 
+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	struct copy_range crange;
+	struct range_entry *rlist;
+	int ret;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
+		return -EFAULT;
+
+	rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
+			GFP_KERNEL);
+	if (!rlist)
+		return -ENOMEM;
+
+	if (copy_from_user(rlist, (void __user *)crange.range_list,
+				sizeof(*rlist) * crange.nr_range)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest,
+			GFP_KERNEL, 0);
+out:
+	kfree(rlist);
+	return ret;
+}
+
 static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
@@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 	case BLKSECDISCARD:
 		return blk_ioctl_discard(bdev, mode, arg,
 				BLKDEV_DISCARD_SECURE);
+	case BLKCOPY:
+		return blk_ioctl_copy(bdev, mode, arg);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
 	case BLKGETDISKSEQ:
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 7a97b588d892..4183688ff398 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -76,6 +76,13 @@ struct range_entry {
 	__u64 len;
 };
 
+struct copy_range {
+	__u64 dest;
+	__u64 nr_range;
+	__u64 range_list;
+	__u64 rsvd;
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
@@ -197,6 +204,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 5/7] block: add emulation for simple copy
       [not found]   ` <CGME20210817101809epcas5p39eed3531ed82f5f08127eb3dba1fc50f@epcas5p3.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  2021-08-17 22:10       ` kernel test robot
  0 siblings, 1 reply; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

From: Nitesh Shetty <nj.shetty@samsung.com>

For the devices which does not support simple copy, copy emulation is
added. Also for stacked devices, copy is performed via emulation.

Copy-emulation is implemented by allocating maximum possible memory
less than or equal to total copy size. The source ranges are read
into memory by chaining bio for each source ranges and submitting them
async and the last bio waits for completion. After data is read, it is
written to the destination and the process is repeated till no source
ranges left.

bio_map_kern() is used to allocate bio and add pages of copy buffer to
bio. As bio->bi_private and bio->bi_end_io are needed for chaining the
bio and gets over-written, invalidate_kernel_vmap_range() for read is
called in the caller.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
---
 block/blk-lib.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7fee0ae95c44..d29c52b90dcf 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -333,6 +333,64 @@ int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
 	return ret;
 }
 
+int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
+				sector_t sector, unsigned int op, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio, *parent = NULL;
+	sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
+			queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT));
+	sector_t len, remaining;
+	int ret;
+
+	for (remaining = buf_len; remaining > 0; remaining -= len) {
+		len = min_t(int, max_hw_len, remaining);
+retry:
+		bio = bio_map_kern(q, buf, len << SECTOR_SHIFT, gfp_mask);
+		if (IS_ERR(bio)) {
+			len >>= 1;
+			if (len)
+				goto retry;
+			return PTR_ERR(bio);
+		}
+
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_opf = op;
+		bio_set_dev(bio, bdev);
+		bio->bi_end_io = NULL;
+		bio->bi_private = NULL;
+
+		if (parent) {
+			bio_chain(parent, bio);
+			submit_bio(parent);
+		}
+		parent = bio;
+		sector += len;
+	}
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+
+	return ret;
+}
+
+static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask)
+{
+	int min_size = PAGE_SIZE;
+	void *buf;
+
+	while (req_size >= min_size) {
+		buf = kvmalloc(req_size, gfp_mask);
+		if (buf) {
+			*alloc_size = (req_size >> SECTOR_SHIFT);
+			return buf;
+		}
+		/* retry half the requested size */
+		req_size >>= 1;
+	}
+
+	return NULL;
+}
+
 static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs)
 {
 	int i;
@@ -348,6 +406,46 @@ static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs)
 	return len;
 }
 
+/*
+ * If native copy offload feature is absent, this function tries to emulate,
+ * by copying data from source to a temporary buffer and from buffer to
+ * destination device.
+ */
+static int blk_copy_emulate(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *rlist, struct block_device *dest_bdev,
+		sector_t dest, gfp_t gfp_mask)
+{
+	void *buf = NULL;
+	int ret, nr_i = 0;
+	sector_t src_blk, copy_len, buf_len, read_len, copied_len, remaining = 0;
+
+	copy_len = blk_copy_len(rlist, nr_srcs);
+	buf = blk_alloc_buf(copy_len << SECTOR_SHIFT, &buf_len, gfp_mask);
+	if (!buf)
+		return -ENOMEM;
+
+	for (copied_len = 0; copied_len < copy_len; copied_len += read_len) {
+		if (!remaining) {
+			src_blk = rlist[nr_i].src;
+			remaining = rlist[nr_i++].len;
+		}
+
+		read_len = min_t(sector_t, remaining, buf_len);
+		ret = blk_submit_rw_buf(src_bdev, buf, read_len, src_blk, REQ_OP_READ, gfp_mask);
+		if (ret)
+			goto out;
+		src_blk += read_len;
+		remaining -= read_len;
+		ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dest + copied_len, REQ_OP_WRITE,
+				gfp_mask);
+		if (ret)
+			goto out;
+	}
+out:
+	kvfree(buf);
+	return ret;
+}
+
 static inline bool blk_check_offload_scc(struct request_queue *src_q,
 		struct request_queue *dest_q)
 {
@@ -398,6 +496,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
 
 	if (blk_check_offload_scc(src_q, dest_q))
 		ret = blk_copy_offload_scc(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask);
+	else
+		ret = blk_copy_emulate(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask);
 
 	return ret;
 }
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 6/7] nvme: add simple copy support
       [not found]   ` <CGME20210817101814epcas5p41db3d7269f5139efcaf2ca685cd04a16@epcas5p4.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  0 siblings, 0 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

Add support for TP 4065a ("Simple Copy Command"), v2020.05.04 ("Ratified")

For device supporting native simple copy, this implementation accepts
the payload passed from the block layer and convert payload to form
simple copy command and submit to the device.

Set the device copy limits to queue limits. By default copy_offload
is disabled.

End-to-end protection is done by setting both PRINFOR and PRINFOW
to 0.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
---
 drivers/nvme/host/core.c  | 83 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/trace.c | 19 +++++++++
 include/linux/nvme.h      | 43 ++++++++++++++++++--
 3 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5aabbb4a00f..41f02ee8c0d2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -825,6 +825,59 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
+static inline blk_status_t nvme_setup_copy(struct nvme_ns *ns,
+	       struct request *req, struct nvme_command *cmnd)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_copy_range *range = NULL;
+	struct blk_copy_payload *payload;
+	unsigned short nr_range = 0;
+	u16 control = 0, ssrl;
+	u32 dsmgmt = 0;
+	u64 slba;
+	int i;
+
+	payload = bio_data(req->bio);
+	nr_range = payload->copy_nr_ranges;
+
+	if (req->cmd_flags & REQ_FUA)
+		control |= NVME_RW_FUA;
+
+	if (req->cmd_flags & REQ_FAILFAST_DEV)
+		control |= NVME_RW_LR;
+
+	cmnd->copy.opcode = nvme_cmd_copy;
+	cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->copy.sdlba = cpu_to_le64(blk_rq_pos(req) >> (ns->lba_shift - 9));
+
+	range = kmalloc_array(nr_range, sizeof(*range),
+			GFP_ATOMIC | __GFP_NOWARN);
+	if (!range)
+		return BLK_STS_RESOURCE;
+
+	for (i = 0; i < nr_range; i++) {
+		slba = (payload->range[i].src) >> (ns->lba_shift - 9);
+		ssrl = (payload->range[i].len) >> (ns->lba_shift - 9);
+		range[i].slba = cpu_to_le64(slba);
+		range[i].nlb = cpu_to_le16(ssrl - 1);
+	}
+
+	cmnd->copy.nr_range = nr_range - 1;
+
+	req->special_vec.bv_page = virt_to_page(range);
+	req->special_vec.bv_offset = offset_in_page(range);
+	req->special_vec.bv_len = sizeof(*range) * nr_range;
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	if (ctrl->nr_streams)
+		nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
+
+	cmnd->rw.control = cpu_to_le16(control);
+	cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
+
+	return BLK_STS_OK;
+}
+
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -1012,6 +1065,9 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 	case REQ_OP_DISCARD:
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
+	case REQ_OP_COPY:
+		ret = nvme_setup_copy(ns, req, cmd);
+		break;
 	case REQ_OP_READ:
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
 		break;
@@ -1640,6 +1696,31 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
+static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns,
+				       struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct request_queue *queue = disk->queue;
+
+	if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) {
+		queue->limits.copy_offload = 0;
+		queue->limits.max_copy_sectors = 0;
+		queue->limits.max_copy_range_sectors = 0;
+		queue->limits.max_copy_nr_ranges = 0;
+		blk_queue_flag_clear(QUEUE_FLAG_SIMPLE_COPY, queue);
+		return;
+	}
+
+	/* setting copy limits */
+	blk_queue_flag_test_and_set(QUEUE_FLAG_SIMPLE_COPY, queue);
+	queue->limits.copy_offload = 0;
+	queue->limits.max_copy_sectors = le64_to_cpu(id->mcl) *
+		(1 << (ns->lba_shift - 9));
+	queue->limits.max_copy_range_sectors = le32_to_cpu(id->mssrl) *
+		(1 << (ns->lba_shift - 9));
+	queue->limits.max_copy_nr_ranges = id->msrc + 1;
+}
+
 static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
 {
 	return !uuid_is_null(&ids->uuid) ||
@@ -1811,6 +1892,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	nvme_config_discard(disk, ns);
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
+	nvme_config_copy(disk, ns, id);
 
 	set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) ||
 		test_bit(NVME_NS_FORCE_RO, &ns->flags));
@@ -4564,6 +4646,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64);
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
index 6543015b6121..0e7db7ad5ed3 100644
--- a/drivers/nvme/host/trace.c
+++ b/drivers/nvme/host/trace.c
@@ -136,6 +136,23 @@ static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10)
 	return ret;
 }
 
+static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 slba = get_unaligned_le64(cdw10);
+	u8 nr_range = get_unaligned_le16(cdw10 + 8);
+	u16 control = get_unaligned_le16(cdw10 + 10);
+	u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
+	u32 reftag = get_unaligned_le32(cdw10 +  16);
+
+	trace_seq_printf(p,
+			 "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
+			 slba, nr_range, control, dsmgmt, reftag);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
 static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
@@ -227,6 +244,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
 		return nvme_trace_zone_mgmt_send(p, cdw10);
 	case nvme_cmd_zone_mgmt_recv:
 		return nvme_trace_zone_mgmt_recv(p, cdw10);
+	case nvme_cmd_copy:
+		return nvme_trace_copy(p, cdw10);
 	default:
 		return nvme_trace_common(p, cdw10);
 	}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index b7c4c4130b65..4f79f223c9eb 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -295,7 +295,7 @@ struct nvme_id_ctrl {
 	__u8			nvscc;
 	__u8			nwpc;
 	__le16			acwu;
-	__u8			rsvd534[2];
+	__le16			ocfs;
 	__le32			sgls;
 	__le32			mnan;
 	__u8			rsvd544[224];
@@ -320,6 +320,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_COPY			= 1 << 8,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
@@ -368,7 +369,10 @@ struct nvme_id_ns {
 	__le16			npdg;
 	__le16			npda;
 	__le16			nows;
-	__u8			rsvd74[18];
+	__le16			mssrl;
+	__le32			mcl;
+	__u8			msrc;
+	__u8			rsvd91[11];
 	__le32			anagrpid;
 	__u8			rsvd96[3];
 	__u8			nsattr;
@@ -689,6 +693,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_copy		= 0x19,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -710,7 +715,8 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
-		nvme_opcode_name(nvme_cmd_zone_append))
+		nvme_opcode_name(nvme_cmd_zone_append),		\
+		nvme_opcode_name(nvme_cmd_copy))
 
 
 
@@ -883,6 +889,36 @@ struct nvme_dsm_range {
 	__le64			slba;
 };
 
+struct nvme_copy_command {
+	__u8                    opcode;
+	__u8                    flags;
+	__u16                   command_id;
+	__le32                  nsid;
+	__u64                   rsvd2;
+	__le64                  metadata;
+	union nvme_data_ptr     dptr;
+	__le64                  sdlba;
+	__u8			nr_range;
+	__u8			rsvd12;
+	__le16                  control;
+	__le16                  rsvd13;
+	__le16			dspec;
+	__le32                  ilbrt;
+	__le16                  lbat;
+	__le16                  lbatm;
+};
+
+struct nvme_copy_range {
+	__le64			rsvd0;
+	__le64			slba;
+	__le16			nlb;
+	__le16			rsvd18;
+	__le32			rsvd20;
+	__le32			eilbrt;
+	__le16			elbat;
+	__le16			elbatm;
+};
+
 struct nvme_write_zeroes_cmd {
 	__u8			opcode;
 	__u8			flags;
@@ -1427,6 +1463,7 @@ struct nvme_command {
 		struct nvme_download_firmware dlfw;
 		struct nvme_format_cmd format;
 		struct nvme_dsm_cmd dsm;
+		struct nvme_copy_command copy;
 		struct nvme_write_zeroes_cmd write_zeroes;
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support
       [not found]   ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com>
@ 2021-08-17 10:14     ` SelvaKumar S
  2021-08-17 20:29       ` Mikulas Patocka
  0 siblings, 1 reply; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

Introduce copy_jobs to use copy-offload, if supported by underlying devices
otherwise fall back to existing method.

run_copy_jobs() calls block layer copy offload API, if both source and
destination request queue are same and support copy offload.
On successful completion, destination regions copied count is made zero,
failed regions are processed via existing method.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-kcopyd.c | 56 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 37b03ab7e5c9..d9ee105a6127 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
 	atomic_t nr_jobs;
 
 /*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
  *
- * i)   jobs waiting for pages
- * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i)	jobs waiting to try copy offload
+ * ii)   jobs waiting for pages
+ * iii)  jobs that have pages, and are waiting for the io to be issued.
+ * iv) jobs that don't need to do any IO and just run a callback
+ * v) jobs that have completed.
  *
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
  */
 	spinlock_t job_lock;
 	struct list_head callback_jobs;
 	struct list_head complete_jobs;
+	struct list_head copy_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
 };
@@ -579,6 +581,43 @@ static int run_io_job(struct kcopyd_job *job)
 	return r;
 }
 
+static int run_copy_job(struct kcopyd_job *job)
+{
+	int r, i, count = 0;
+	unsigned long flags = 0;
+	struct range_entry srange;
+
+	struct request_queue *src_q, *dest_q;
+
+	for (i = 0; i < job->num_dests; i++) {
+		srange.src = job->source.sector;
+		srange.len = job->source.count;
+
+		src_q = bdev_get_queue(job->source.bdev);
+		dest_q = bdev_get_queue(job->dests[i].bdev);
+
+		if (src_q != dest_q && !src_q->limits.copy_offload)
+			break;
+
+		r = blkdev_issue_copy(job->source.bdev, 1, &srange,
+			job->dests[i].bdev, job->dests[i].sector, GFP_KERNEL, flags);
+		if (r)
+			break;
+
+		job->dests[i].count = 0;
+		count++;
+	}
+
+	if (count == job->num_dests) {
+		push(&job->kc->complete_jobs, job);
+	} else {
+		push(&job->kc->pages_jobs, job);
+		r = 0;
+	}
+
+	return r;
+}
+
 static int run_pages_job(struct kcopyd_job *job)
 {
 	int r;
@@ -659,6 +698,7 @@ static void do_work(struct work_struct *work)
 	spin_unlock_irq(&kc->job_lock);
 
 	blk_start_plug(&plug);
+	process_jobs(&kc->copy_jobs, kc, run_copy_job);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
 	process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +716,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->callback_jobs, job);
+	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+		push(&kc->copy_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -916,6 +958,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 	spin_lock_init(&kc->job_lock);
 	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
+	INIT_LIST_HEAD(&kc->copy_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
 	kc->throttle = throttle;
@@ -971,6 +1014,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 
 	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
+	WARN_ON(!list_empty(&kc->copy_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
 	destroy_workqueue(kc->kcopyd_wq);
-- 
2.25.1


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support
  2021-08-17 10:14     ` [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support SelvaKumar S
@ 2021-08-17 13:08       ` Greg KH
  2021-08-17 14:42         ` Nitesh Shetty
  0 siblings, 1 reply; 30+ messages in thread
From: Greg KH @ 2021-08-17 13:08 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 03:44:18PM +0530, SelvaKumar S wrote:
> From: Nitesh Shetty <nj.shetty@samsung.com>
> 
> Add device limits as sysfs entries,
>         - copy_offload (READ_WRITE)
>         - max_copy_sectors (READ_ONLY)
>         - max_copy_ranges_sectors (READ_ONLY)
>         - max_copy_nr_ranges (READ_ONLY)

You forgot to add Documentation/ABI/ entries for your new sysfs files,
so we can't properly review them :(

thanks,

greg k-h

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-17 10:14     ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S
@ 2021-08-17 13:09       ` Greg KH
  2021-08-17 13:10       ` Greg KH
  2021-08-17 23:36       ` Darrick J. Wong
  2 siblings, 0 replies; 30+ messages in thread
From: Greg KH @ 2021-08-17 13:09 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> From: Nitesh Shetty <nj.shetty@samsung.com>
> 
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to a destination in the device. COPY ioctl accepts a 'copy_range'
> structure that contains destination (in sectors), no of sources and
> pointer to the array of source ranges. Each source range is represented by
> 'range_entry' that contains start and length of source ranges (in sectors)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
>  * [64,24], on the same device */
> 
> int main(void)
> {
> 	int ret, fd;
> 	struct range_entry source_range[] = {{.src = 0, .len = 8},
> 		{.src = 16, .len = 8}, {.src = 32, .len = 8},};
> 	struct copy_range cr;
> 
> 	cr.dest = 64;
> 	cr.nr_range = 3;
> 	cr.range_list = (__u64)&source_range;
> 
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 
> 	ret = ioctl(fd, BLKCOPY, &cr);
> 	if (ret < 0) printf("copy failure\n");
> 
> 	close(fd);
> 
> 	return ret;
> }
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/fs.h |  8 ++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/block/ioctl.c b/block/ioctl.c
> index eb0491e90b9a..2af56d01e9fe 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
>  				    GFP_KERNEL, flags);
>  }
>  
> +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> +		unsigned long arg)
> +{
> +	struct copy_range crange;
> +	struct range_entry *rlist;
> +	int ret;
> +
> +	if (!(mode & FMODE_WRITE))
> +		return -EBADF;
> +
> +	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> +		return -EFAULT;
> +
> +	rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> +			GFP_KERNEL);
> +	if (!rlist)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(rlist, (void __user *)crange.range_list,
> +				sizeof(*rlist) * crange.nr_range)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest,
> +			GFP_KERNEL, 0);
> +out:
> +	kfree(rlist);
> +	return ret;
> +}
> +
>  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>  		unsigned long arg)
>  {
> @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
>  	case BLKSECDISCARD:
>  		return blk_ioctl_discard(bdev, mode, arg,
>  				BLKDEV_DISCARD_SECURE);
> +	case BLKCOPY:
> +		return blk_ioctl_copy(bdev, mode, arg);
>  	case BLKZEROOUT:
>  		return blk_ioctl_zeroout(bdev, mode, arg);
>  	case BLKGETDISKSEQ:
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 7a97b588d892..4183688ff398 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -76,6 +76,13 @@ struct range_entry {
>  	__u64 len;
>  };
>  
> +struct copy_range {
> +	__u64 dest;
> +	__u64 nr_range;
> +	__u64 range_list;
> +	__u64 rsvd;

If you have a "reserved" field, you HAVE to check that it is 0.  If not,
you can never use it in the future.

Also, you can spell it out, we have lots of vowels :)

thanks,

greg k-h

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-17 10:14     ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S
  2021-08-17 13:09       ` Greg KH
@ 2021-08-17 13:10       ` Greg KH
  2021-08-17 14:48         ` Nitesh Shetty
  2021-08-17 23:36       ` Darrick J. Wong
  2 siblings, 1 reply; 30+ messages in thread
From: Greg KH @ 2021-08-17 13:10 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> From: Nitesh Shetty <nj.shetty@samsung.com>
> 
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to a destination in the device. COPY ioctl accepts a 'copy_range'
> structure that contains destination (in sectors), no of sources and
> pointer to the array of source ranges. Each source range is represented by
> 'range_entry' that contains start and length of source ranges (in sectors)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
>  * [64,24], on the same device */
> 
> int main(void)
> {
> 	int ret, fd;
> 	struct range_entry source_range[] = {{.src = 0, .len = 8},
> 		{.src = 16, .len = 8}, {.src = 32, .len = 8},};
> 	struct copy_range cr;
> 
> 	cr.dest = 64;
> 	cr.nr_range = 3;
> 	cr.range_list = (__u64)&source_range;
> 
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 
> 	ret = ioctl(fd, BLKCOPY, &cr);
> 	if (ret < 0) printf("copy failure\n");
> 
> 	close(fd);
> 
> 	return ret;
> }
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/fs.h |  8 ++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/block/ioctl.c b/block/ioctl.c
> index eb0491e90b9a..2af56d01e9fe 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
>  				    GFP_KERNEL, flags);
>  }
>  
> +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> +		unsigned long arg)
> +{
> +	struct copy_range crange;
> +	struct range_entry *rlist;
> +	int ret;
> +
> +	if (!(mode & FMODE_WRITE))
> +		return -EBADF;
> +
> +	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> +		return -EFAULT;
> +
> +	rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> +			GFP_KERNEL);

No error checking for huge values of nr_range?  Is that wise?  You
really want userspace to be able to allocate "all" of the kernel memory
in the system?

thanks,

greg k-h

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support
  2021-08-17 13:08       ` Greg KH
@ 2021-08-17 14:42         ` Nitesh Shetty
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Shetty @ 2021-08-17 14:42 UTC (permalink / raw)
  To: Greg KH
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 6:38 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Aug 17, 2021 at 03:44:18PM +0530, SelvaKumar S wrote:
> > From: Nitesh Shetty <nj.shetty@samsung.com>
> >
> > Add device limits as sysfs entries,
> >         - copy_offload (READ_WRITE)
> >         - max_copy_sectors (READ_ONLY)
> >         - max_copy_ranges_sectors (READ_ONLY)
> >         - max_copy_nr_ranges (READ_ONLY)
>
> You forgot to add Documentation/ABI/ entries for your new sysfs files,
> so we can't properly review them :(
>

Sorry, we will update the documentation in the next version.

Thanks,
Nitesh Shetty

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-17 13:10       ` Greg KH
@ 2021-08-17 14:48         ` Nitesh Shetty
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Shetty @ 2021-08-17 14:48 UTC (permalink / raw)
  To: Greg KH
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 6:40 PM Greg KH <greg@kroah.com> wrote:
>
> On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> > From: Nitesh Shetty <nj.shetty@samsung.com>
> >
> > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> > to a destination in the device. COPY ioctl accepts a 'copy_range'
> > structure that contains destination (in sectors), no of sources and
> > pointer to the array of source ranges. Each source range is represented by
> > 'range_entry' that contains start and length of source ranges (in sectors)
> >
> > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> >
> > Example code, to issue BLKCOPY:
> > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
> >  * [64,24], on the same device */
> >
> > int main(void)
> > {
> >       int ret, fd;
> >       struct range_entry source_range[] = {{.src = 0, .len = 8},
> >               {.src = 16, .len = 8}, {.src = 32, .len = 8},};
> >       struct copy_range cr;
> >
> >       cr.dest = 64;
> >       cr.nr_range = 3;
> >       cr.range_list = (__u64)&source_range;
> >
> >       fd = open("/dev/nvme0n1", O_RDWR);
> >       if (fd < 0) return 1;
> >
> >       ret = ioctl(fd, BLKCOPY, &cr);
> >       if (ret < 0) printf("copy failure\n");
> >
> >       close(fd);
> >
> >       return ret;
> > }
> >
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > ---
> >  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
> >  include/uapi/linux/fs.h |  8 ++++++++
> >  2 files changed, 41 insertions(+)
> >
> > diff --git a/block/ioctl.c b/block/ioctl.c
> > index eb0491e90b9a..2af56d01e9fe 100644
> > --- a/block/ioctl.c
> > +++ b/block/ioctl.c
> > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
> >                                   GFP_KERNEL, flags);
> >  }
> >
> > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> > +             unsigned long arg)
> > +{
> > +     struct copy_range crange;
> > +     struct range_entry *rlist;
> > +     int ret;
> > +
> > +     if (!(mode & FMODE_WRITE))
> > +             return -EBADF;
> > +
> > +     if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> > +             return -EFAULT;
> > +
> > +     rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> > +                     GFP_KERNEL);
>
> No error checking for huge values of nr_range?  Is that wise?  You
> really want userspace to be able to allocate "all" of the kernel memory
> in the system?
>
> thanks,
>
> greg k-h

We added a kernel imposed limit MAX_COPY_NR_RANGE for that purpose,
but missed adding the check here.
Will have that fixed. Thanks for pointing this out.

Nitesh Shetty

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 10:14     ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S
@ 2021-08-17 17:14       ` Bart Van Assche
  2021-08-17 20:41         ` Mikulas Patocka
  2021-08-20 10:39         ` Kanchan Joshi
  2021-08-17 20:35       ` kernel test robot
  2021-08-18 18:35       ` Martin K. Petersen
  2 siblings, 2 replies; 30+ messages in thread
From: Bart Van Assche @ 2021-08-17 17:14 UTC (permalink / raw)
  To: SelvaKumar S, linux-nvme, linux-block
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, dm-devel, hch, agk,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	Martin K. Petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On 8/17/21 3:14 AM, SelvaKumar S wrote:
> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> bio with control information as payload and submit to the device.
> Larger copy operation may be divided if necessary by looking at device
> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
> submitted to zoned device.
> Native copy offload is not supported for stacked devices.

Using a single operation for copy-offloading instead of separate 
operations for reading and writing is fundamentally incompatible with 
the device mapper. I think we need a copy-offloading implementation that 
is compatible with the device mapper.

Storing the parameters of the copy operation in the bio payload is 
incompatible with the current implementation of bio_split().

In other words, I think there are fundamental problems with this patch 
series.

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support
  2021-08-17 10:14     ` [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support SelvaKumar S
@ 2021-08-17 20:29       ` Mikulas Patocka
  0 siblings, 0 replies; 30+ messages in thread
From: Mikulas Patocka @ 2021-08-17 20:29 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	linux-block, joshiiitr, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, johannes.thumshirn, linux-api, linux-fsdevel,
	javier.gonz, asml.silence



On Tue, 17 Aug 2021, SelvaKumar S wrote:

> Introduce copy_jobs to use copy-offload, if supported by underlying devices
> otherwise fall back to existing method.

dm-kcopyd is usually used on the dm-linear target. And this patchset 
doesn't support passing copy requests through the linear target - so this 
patch doesn't seem useful.

Mikulas

> run_copy_jobs() calls block layer copy offload API, if both source and
> destination request queue are same and support copy offload.
> On successful completion, destination regions copied count is made zero,
> failed regions are processed via existing method.
> 
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> ---
>  drivers/md/dm-kcopyd.c | 56 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 50 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
> index 37b03ab7e5c9..d9ee105a6127 100644
> --- a/drivers/md/dm-kcopyd.c
> +++ b/drivers/md/dm-kcopyd.c
> @@ -74,18 +74,20 @@ struct dm_kcopyd_client {
>  	atomic_t nr_jobs;
>  
>  /*
> - * We maintain four lists of jobs:
> + * We maintain five lists of jobs:
>   *
> - * i)   jobs waiting for pages
> - * ii)  jobs that have pages, and are waiting for the io to be issued.
> - * iii) jobs that don't need to do any IO and just run a callback
> - * iv) jobs that have completed.
> + * i)	jobs waiting to try copy offload
> + * ii)   jobs waiting for pages
> + * iii)  jobs that have pages, and are waiting for the io to be issued.
> + * iv) jobs that don't need to do any IO and just run a callback
> + * v) jobs that have completed.
>   *
> - * All four of these are protected by job_lock.
> + * All five of these are protected by job_lock.
>   */
>  	spinlock_t job_lock;
>  	struct list_head callback_jobs;
>  	struct list_head complete_jobs;
> +	struct list_head copy_jobs;
>  	struct list_head io_jobs;
>  	struct list_head pages_jobs;
>  };
> @@ -579,6 +581,43 @@ static int run_io_job(struct kcopyd_job *job)
>  	return r;
>  }
>  
> +static int run_copy_job(struct kcopyd_job *job)
> +{
> +	int r, i, count = 0;
> +	unsigned long flags = 0;
> +	struct range_entry srange;
> +
> +	struct request_queue *src_q, *dest_q;
> +
> +	for (i = 0; i < job->num_dests; i++) {
> +		srange.src = job->source.sector;
> +		srange.len = job->source.count;
> +
> +		src_q = bdev_get_queue(job->source.bdev);
> +		dest_q = bdev_get_queue(job->dests[i].bdev);
> +
> +		if (src_q != dest_q && !src_q->limits.copy_offload)
> +			break;
> +
> +		r = blkdev_issue_copy(job->source.bdev, 1, &srange,
> +			job->dests[i].bdev, job->dests[i].sector, GFP_KERNEL, flags);
> +		if (r)
> +			break;
> +
> +		job->dests[i].count = 0;
> +		count++;
> +	}
> +
> +	if (count == job->num_dests) {
> +		push(&job->kc->complete_jobs, job);
> +	} else {
> +		push(&job->kc->pages_jobs, job);
> +		r = 0;
> +	}
> +
> +	return r;
> +}
> +
>  static int run_pages_job(struct kcopyd_job *job)
>  {
>  	int r;
> @@ -659,6 +698,7 @@ static void do_work(struct work_struct *work)
>  	spin_unlock_irq(&kc->job_lock);
>  
>  	blk_start_plug(&plug);
> +	process_jobs(&kc->copy_jobs, kc, run_copy_job);
>  	process_jobs(&kc->complete_jobs, kc, run_complete_job);
>  	process_jobs(&kc->pages_jobs, kc, run_pages_job);
>  	process_jobs(&kc->io_jobs, kc, run_io_job);
> @@ -676,6 +716,8 @@ static void dispatch_job(struct kcopyd_job *job)
>  	atomic_inc(&kc->nr_jobs);
>  	if (unlikely(!job->source.count))
>  		push(&kc->callback_jobs, job);
> +	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
> +		push(&kc->copy_jobs, job);
>  	else if (job->pages == &zero_page_list)
>  		push(&kc->io_jobs, job);
>  	else
> @@ -916,6 +958,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
>  	spin_lock_init(&kc->job_lock);
>  	INIT_LIST_HEAD(&kc->callback_jobs);
>  	INIT_LIST_HEAD(&kc->complete_jobs);
> +	INIT_LIST_HEAD(&kc->copy_jobs);
>  	INIT_LIST_HEAD(&kc->io_jobs);
>  	INIT_LIST_HEAD(&kc->pages_jobs);
>  	kc->throttle = throttle;
> @@ -971,6 +1014,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
>  
>  	BUG_ON(!list_empty(&kc->callback_jobs));
>  	BUG_ON(!list_empty(&kc->complete_jobs));
> +	WARN_ON(!list_empty(&kc->copy_jobs));
>  	BUG_ON(!list_empty(&kc->io_jobs));
>  	BUG_ON(!list_empty(&kc->pages_jobs));
>  	destroy_workqueue(kc->kcopyd_wq);
> -- 
> 2.25.1
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 10:14     ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S
  2021-08-17 17:14       ` Bart Van Assche
@ 2021-08-17 20:35       ` kernel test robot
  2021-08-18 18:35       ` Martin K. Petersen
  2 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-08-17 20:35 UTC (permalink / raw)
  To: SelvaKumar S, linux-nvme, linux-block
  Cc: axboe, damien.lemoal, kbuild-all, linux-scsi, linux-api, kbusch,
	clang-built-linux, dm-devel, linux-fsdevel, asml.silence

[-- Attachment #1: Type: text/plain, Size: 7342 bytes --]

Hi SelvaKumar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on block/for-next]
[also build test WARNING on dm/for-next linus/master v5.14-rc6 next-20210817]
[cannot apply to linux-nvme/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: hexagon-randconfig-r013-20210816 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/35fc502a7f20a7cd42432cee2777a621c40a3bd3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111
        git checkout 35fc502a7f20a7cd42432cee2777a621c40a3bd3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:197:5: warning: no previous prototype for function 'blk_copy_offload_submit_bio' [-Wmissing-prototypes]
   int blk_copy_offload_submit_bio(struct block_device *bdev,
       ^
   block/blk-lib.c:197:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_submit_bio(struct block_device *bdev,
   ^
   static 
>> block/blk-lib.c:250:5: warning: no previous prototype for function 'blk_copy_offload_scc' [-Wmissing-prototypes]
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:250:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
   2 warnings generated.


vim +/blk_copy_offload_submit_bio +197 block/blk-lib.c

   196	
 > 197	int blk_copy_offload_submit_bio(struct block_device *bdev,
   198			struct blk_copy_payload *payload, int payload_size,
   199			struct cio *cio, gfp_t gfp_mask)
   200	{
   201		struct request_queue *q = bdev_get_queue(bdev);
   202		struct bio *bio;
   203	
   204		bio = bio_map_kern(q, payload, payload_size, gfp_mask);
   205		if (IS_ERR(bio))
   206			return PTR_ERR(bio);
   207	
   208		bio_set_dev(bio, bdev);
   209		bio->bi_opf = REQ_OP_COPY | REQ_NOMERGE;
   210		bio->bi_iter.bi_sector = payload->dest;
   211		bio->bi_end_io = cio_bio_end_io;
   212		bio->bi_private = cio;
   213		atomic_inc(&cio->refcount);
   214		submit_bio(bio);
   215	
   216		return 0;
   217	}
   218	
   219	/* Go through all the enrties inside user provided payload, and determine the
   220	 * maximum number of entries in a payload, based on device's scc-limits.
   221	 */
   222	static inline int blk_max_payload_entries(int nr_srcs, struct range_entry *rlist,
   223			int max_nr_srcs, sector_t max_copy_range_sectors, sector_t max_copy_len)
   224	{
   225		sector_t range_len, copy_len = 0, remaining = 0;
   226		int ri = 0, pi = 1, max_pi = 0;
   227	
   228		for (ri = 0; ri < nr_srcs; ri++) {
   229			for (remaining = rlist[ri].len; remaining > 0; remaining -= range_len) {
   230				range_len = min3(remaining, max_copy_range_sectors,
   231									max_copy_len - copy_len);
   232				pi++;
   233				copy_len += range_len;
   234	
   235				if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
   236					max_pi = max(max_pi, pi);
   237					pi = 1;
   238					copy_len = 0;
   239				}
   240			}
   241		}
   242	
   243		return max(max_pi, pi);
   244	}
   245	
   246	/*
   247	 * blk_copy_offload_scc	- Use device's native copy offload feature
   248	 * Go through user provide payload, prepare new payload based on device's copy offload limits.
   249	 */
 > 250	int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
   251			struct range_entry *rlist, struct block_device *dest_bdev,
   252			sector_t dest, gfp_t gfp_mask)
   253	{
   254		struct request_queue *q = bdev_get_queue(dest_bdev);
   255		struct cio *cio = NULL;
   256		struct blk_copy_payload *payload;
   257		sector_t range_len, copy_len = 0, remaining = 0;
   258		sector_t src_blk, cdest = dest;
   259		sector_t max_copy_range_sectors, max_copy_len;
   260		int ri = 0, pi = 0, ret = 0, payload_size, max_pi, max_nr_srcs;
   261	
   262		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   263		if (!cio)
   264			return -ENOMEM;
   265		atomic_set(&cio->refcount, 0);
   266	
   267		max_nr_srcs = q->limits.max_copy_nr_ranges;
   268		max_copy_range_sectors = q->limits.max_copy_range_sectors;
   269		max_copy_len = q->limits.max_copy_sectors;
   270	
   271		max_pi = blk_max_payload_entries(nr_srcs, rlist, max_nr_srcs,
   272						max_copy_range_sectors, max_copy_len);
   273		payload_size = struct_size(payload, range, max_pi);
   274	
   275		payload = kvmalloc(payload_size, gfp_mask);
   276		if (!payload) {
   277			ret = -ENOMEM;
   278			goto free_cio;
   279		}
   280		payload->src_bdev = src_bdev;
   281	
   282		for (ri = 0; ri < nr_srcs; ri++) {
   283			for (remaining = rlist[ri].len, src_blk = rlist[ri].src; remaining > 0;
   284							remaining -= range_len, src_blk += range_len) {
   285	
   286				range_len = min3(remaining, max_copy_range_sectors,
   287									max_copy_len - copy_len);
   288				payload->range[pi].len = range_len;
   289				payload->range[pi].src = src_blk;
   290				pi++;
   291				copy_len += range_len;
   292	
   293				/* Submit current payload, if crossing device copy limits */
   294				if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
   295					payload->dest = cdest;
   296					payload->copy_nr_ranges = pi;
   297					ret = blk_copy_offload_submit_bio(dest_bdev, payload,
   298									payload_size, cio, gfp_mask);
   299					if (ret)
   300						goto free_payload;
   301	
   302					/* reset index, length and allocate new payload */
   303					pi = 0;
   304					cdest += copy_len;
   305					copy_len = 0;
   306					payload = kvmalloc(payload_size, gfp_mask);
   307					if (!payload) {
   308						ret = -ENOMEM;
   309						goto free_cio;
   310					}
   311					payload->src_bdev = src_bdev;
   312				}
   313			}
   314		}
   315	
   316		if (pi) {
   317			payload->dest = cdest;
   318			payload->copy_nr_ranges = pi;
   319			ret = blk_copy_offload_submit_bio(dest_bdev, payload, payload_size, cio, gfp_mask);
   320			if (ret)
   321				goto free_payload;
   322		}
   323	
   324		/* Wait for completion of all IO's*/
   325		ret = cio_await_completion(cio);
   326	
   327		return ret;
   328	
   329	free_payload:
   330		kvfree(payload);
   331	free_cio:
   332		cio_await_completion(cio);
   333		return ret;
   334	}
   335	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22159 bytes --]

[-- Attachment #3: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 17:14       ` Bart Van Assche
@ 2021-08-17 20:41         ` Mikulas Patocka
  2021-08-17 21:53           ` Douglas Gilbert
  2021-08-20 10:39         ` Kanchan Joshi
  1 sibling, 1 reply; 30+ messages in thread
From: Mikulas Patocka @ 2021-08-17 20:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel,
	hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch,
	SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe,
	damien.lemoal, joshi.k, Martin K. Petersen, linux-api,
	johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence



On Tue, 17 Aug 2021, Bart Van Assche wrote:

> On 8/17/21 3:14 AM, SelvaKumar S wrote:
> > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> > bio with control information as payload and submit to the device.
> > Larger copy operation may be divided if necessary by looking at device
> > limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
> > submitted to zoned device.
> > Native copy offload is not supported for stacked devices.
> 
> Using a single operation for copy-offloading instead of separate operations
> for reading and writing is fundamentally incompatible with the device mapper.
> I think we need a copy-offloading implementation that is compatible with the
> device mapper.

I once wrote a copy offload implementation that is compatible with device 
mapper. The copy operation creates two bios (one for reading and one for 
writing), passes them independently through device mapper and pairs them 
at the physical device driver.

It's here: http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current

I verified that it works with iSCSI. Would you be interested in continuing 
this work?

Mikulas

> Storing the parameters of the copy operation in the bio payload is
> incompatible with the current implementation of bio_split().
> 
> In other words, I think there are fundamental problems with this patch series.
> 
> Bart.
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 20:41         ` Mikulas Patocka
@ 2021-08-17 21:53           ` Douglas Gilbert
  2021-08-17 22:06             ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Douglas Gilbert @ 2021-08-17 21:53 UTC (permalink / raw)
  To: Mikulas Patocka, Bart Van Assche
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel,
	hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch,
	SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe,
	damien.lemoal, joshi.k, Martin K. Petersen, linux-api,
	johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence

On 2021-08-17 4:41 p.m., Mikulas Patocka wrote:
> 
> 
> On Tue, 17 Aug 2021, Bart Van Assche wrote:
> 
>> On 8/17/21 3:14 AM, SelvaKumar S wrote:
>>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
>>> bio with control information as payload and submit to the device.
>>> Larger copy operation may be divided if necessary by looking at device
>>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
>>> submitted to zoned device.
>>> Native copy offload is not supported for stacked devices.
>>
>> Using a single operation for copy-offloading instead of separate operations
>> for reading and writing is fundamentally incompatible with the device mapper.
>> I think we need a copy-offloading implementation that is compatible with the
>> device mapper.
> 
> I once wrote a copy offload implementation that is compatible with device
> mapper. The copy operation creates two bios (one for reading and one for
> writing), passes them independently through device mapper and pairs them
> at the physical device driver.
> 
> It's here: http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current

In my copy solution the read-side and write-side bio pairs share the same 
storage (i.e. ram) This gets around the need to copy data between the bio_s.
See:
    https://sg.danny.cz/sg/sg_v40.html
in Section 8 on Request sharing. This technique can be efficiently extend to
source --> destination1,destination2,...      copies.

Doug Gilbert

> I verified that it works with iSCSI. Would you be interested in continuing
> this work?
> 
> Mikulas
> 
>> Storing the parameters of the copy operation in the bio payload is
>> incompatible with the current implementation of bio_split().
>>
>> In other words, I think there are fundamental problems with this patch series.
>>
>> Bart.
>>
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 21:53           ` Douglas Gilbert
@ 2021-08-17 22:06             ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2021-08-17 22:06 UTC (permalink / raw)
  To: dgilbert, Mikulas Patocka
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel,
	hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch,
	SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe,
	damien.lemoal, joshi.k, Martin K. Petersen, linux-api,
	johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence

On 8/17/21 2:53 PM, Douglas Gilbert wrote:
> On 2021-08-17 4:41 p.m., Mikulas Patocka wrote:
>> On Tue, 17 Aug 2021, Bart Van Assche wrote:
>>> On 8/17/21 3:14 AM, SelvaKumar S wrote:
>>>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
>>>> bio with control information as payload and submit to the device.
>>>> Larger copy operation may be divided if necessary by looking at device
>>>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
>>>> submitted to zoned device.
>>>> Native copy offload is not supported for stacked devices.
>>>
>>> Using a single operation for copy-offloading instead of separate 
>>> operations
>>> for reading and writing is fundamentally incompatible with the device 
>>> mapper.
>>> I think we need a copy-offloading implementation that is compatible 
>>> with the
>>> device mapper.
>>
>> I once wrote a copy offload implementation that is compatible with device
>> mapper. The copy operation creates two bios (one for reading and one for
>> writing), passes them independently through device mapper and pairs them
>> at the physical device driver.
>>
>> It's here: 
>> http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current
> 
> In my copy solution the read-side and write-side bio pairs share the 
> same storage (i.e. ram) This gets around the need to copy data between 
> the bio_s.
> See:
>     https://sg.danny.cz/sg/sg_v40.html
> in Section 8 on Request sharing. This technique can be efficiently 
> extend to
> source --> destination1,destination2,...      copies.
> 
> Doug Gilbert
> 
>> I verified that it works with iSCSI. Would you be interested in 
>> continuing
>> this work?

Hi Mikulas and Doug,

Yes, I'm interested in continuing Mikulas' work on copy offloading. I 
will take a look at Doug's approach too for sharing buffers between 
read-side and write-side bios. It may take a few months however before I 
can find the time to work on this.

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 5/7] block: add emulation for simple copy
  2021-08-17 10:14     ` [dm-devel] [PATCH 5/7] block: add emulation " SelvaKumar S
@ 2021-08-17 22:10       ` kernel test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-08-17 22:10 UTC (permalink / raw)
  To: SelvaKumar S, linux-nvme, linux-block
  Cc: axboe, damien.lemoal, kbuild-all, linux-scsi, linux-api, kbusch,
	clang-built-linux, dm-devel, linux-fsdevel, asml.silence

[-- Attachment #1: Type: text/plain, Size: 4154 bytes --]

Hi SelvaKumar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on block/for-next]
[also build test WARNING on dm/for-next next-20210817]
[cannot apply to linus/master linux-nvme/for-next v5.14-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: hexagon-randconfig-r013-20210816 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c307f1051a72122d636502c6885df8b2b25ed697
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111
        git checkout c307f1051a72122d636502c6885df8b2b25ed697
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   block/blk-lib.c:197:5: warning: no previous prototype for function 'blk_copy_offload_submit_bio' [-Wmissing-prototypes]
   int blk_copy_offload_submit_bio(struct block_device *bdev,
       ^
   block/blk-lib.c:197:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_submit_bio(struct block_device *bdev,
   ^
   static 
   block/blk-lib.c:250:5: warning: no previous prototype for function 'blk_copy_offload_scc' [-Wmissing-prototypes]
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:250:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
>> block/blk-lib.c:336:5: warning: no previous prototype for function 'blk_submit_rw_buf' [-Wmissing-prototypes]
   int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
       ^
   block/blk-lib.c:336:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
   ^
   static 
   3 warnings generated.


vim +/blk_submit_rw_buf +336 block/blk-lib.c

   335	
 > 336	int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
   337					sector_t sector, unsigned int op, gfp_t gfp_mask)
   338	{
   339		struct request_queue *q = bdev_get_queue(bdev);
   340		struct bio *bio, *parent = NULL;
   341		sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
   342				queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT));
   343		sector_t len, remaining;
   344		int ret;
   345	
   346		for (remaining = buf_len; remaining > 0; remaining -= len) {
   347			len = min_t(int, max_hw_len, remaining);
   348	retry:
   349			bio = bio_map_kern(q, buf, len << SECTOR_SHIFT, gfp_mask);
   350			if (IS_ERR(bio)) {
   351				len >>= 1;
   352				if (len)
   353					goto retry;
   354				return PTR_ERR(bio);
   355			}
   356	
   357			bio->bi_iter.bi_sector = sector;
   358			bio->bi_opf = op;
   359			bio_set_dev(bio, bdev);
   360			bio->bi_end_io = NULL;
   361			bio->bi_private = NULL;
   362	
   363			if (parent) {
   364				bio_chain(parent, bio);
   365				submit_bio(parent);
   366			}
   367			parent = bio;
   368			sector += len;
   369		}
   370		ret = submit_bio_wait(bio);
   371		bio_put(bio);
   372	
   373		return ret;
   374	}
   375	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22159 bytes --]

[-- Attachment #3: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-17 10:14     ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S
  2021-08-17 13:09       ` Greg KH
  2021-08-17 13:10       ` Greg KH
@ 2021-08-17 23:36       ` Darrick J. Wong
  2021-08-18 15:37         ` Nitesh Shetty
  2 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2021-08-17 23:36 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> From: Nitesh Shetty <nj.shetty@samsung.com>
> 
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to a destination in the device. COPY ioctl accepts a 'copy_range'
> structure that contains destination (in sectors), no of sources and
> pointer to the array of source ranges. Each source range is represented by
> 'range_entry' that contains start and length of source ranges (in sectors)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
>  * [64,24], on the same device */
> 
> int main(void)
> {
> 	int ret, fd;
> 	struct range_entry source_range[] = {{.src = 0, .len = 8},
> 		{.src = 16, .len = 8}, {.src = 32, .len = 8},};
> 	struct copy_range cr;
> 
> 	cr.dest = 64;
> 	cr.nr_range = 3;
> 	cr.range_list = (__u64)&source_range;
> 
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 
> 	ret = ioctl(fd, BLKCOPY, &cr);
> 	if (ret < 0) printf("copy failure\n");
> 
> 	close(fd);
> 
> 	return ret;
> }
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/fs.h |  8 ++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/block/ioctl.c b/block/ioctl.c
> index eb0491e90b9a..2af56d01e9fe 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
>  				    GFP_KERNEL, flags);
>  }
>  
> +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> +		unsigned long arg)
> +{
> +	struct copy_range crange;
> +	struct range_entry *rlist;
> +	int ret;
> +
> +	if (!(mode & FMODE_WRITE))
> +		return -EBADF;
> +
> +	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> +		return -EFAULT;
> +
> +	rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> +			GFP_KERNEL);
> +	if (!rlist)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(rlist, (void __user *)crange.range_list,
> +				sizeof(*rlist) * crange.nr_range)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest,
> +			GFP_KERNEL, 0);
> +out:
> +	kfree(rlist);
> +	return ret;
> +}
> +
>  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>  		unsigned long arg)
>  {
> @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
>  	case BLKSECDISCARD:
>  		return blk_ioctl_discard(bdev, mode, arg,
>  				BLKDEV_DISCARD_SECURE);
> +	case BLKCOPY:
> +		return blk_ioctl_copy(bdev, mode, arg);
>  	case BLKZEROOUT:
>  		return blk_ioctl_zeroout(bdev, mode, arg);
>  	case BLKGETDISKSEQ:
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 7a97b588d892..4183688ff398 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -76,6 +76,13 @@ struct range_entry {
>  	__u64 len;
>  };
>  
> +struct copy_range {
> +	__u64 dest;
> +	__u64 nr_range;

If the maximum number of elements in the range list is 1<<12, there's no
need for this to be larger than a u16, right?

> +	__u64 range_list;

Pointers embedded in a structure are /not/ a good idea, because this
will create a lot of compatibility headaches for 32-bit binaries running
on 64-bit kernels.  Please just make the size of this header structure
a multiple of 8 bytes and put the range_entry list immediately after it.

struct copy_range {
	__s64 dest_offset;
	__u32 nr_range_entries;
	__u32 flags;
	__u64 reserved[2];
};

struct __user range_entry *re = ((struct range_entry *)(copyhead + 1));

copy_from_user(&urk, re...);

--D

> +	__u64 rsvd;
> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1
> @@ -197,6 +204,7 @@ struct fsxattr {
>  #define BLKROTATIONAL _IO(0x12,126)
>  #define BLKZEROOUT _IO(0x12,127)
>  #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
> +#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
>  /*
>   * A jump here: 130-136 are reserved for zoned block devices
>   * (see uapi/linux/blkzoned.h)
> -- 
> 2.25.1
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 0/7] add simple copy support
  2021-08-17 10:14 ` [dm-devel] [PATCH 0/7] add simple copy support SelvaKumar S
                     ` (6 preceding siblings ...)
       [not found]   ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com>
@ 2021-08-17 23:37   ` Darrick J. Wong
  2021-08-18 15:40     ` Nitesh Shetty
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2021-08-17 23:37 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi,
	nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On Tue, Aug 17, 2021 at 03:44:16PM +0530, SelvaKumar S wrote:
> This started out as an attempt to support NVMe Simple Copy Command (SCC),
> and evolved during the RFC review process.
> 
> The patchset, at this point, contains -
> 1. SCC support in NVMe driver
> 2. Block-layer infra for copy-offload operation
> 3. ioctl interface to user-space
> 4. copy-emulation infra in the block-layer
> 5. copy-offload plumbing to dm-kcopyd (thus creating couple of in-kernel
> 	users such as dm-clone)
> 
> 
> The SCC specification, i.e. TP4065a can be found in following link
> 
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip
> 
> Simple copy is a copy offloading feature and can be used to copy multiple
> contiguous ranges (source_ranges) of LBA's to a single destination LBA
> within the device, reducing traffic between host and device.
> 
> We define a block ioctl for copy and copy payload format similar to
> discard. For device supporting native simple copy, we attach the control
> information as payload to the bio and submit to the device. Copy emulation
> is implemented incase underlaying device does not support copy offload or
> based on sysfs choice. Copy emulation is done by reading each source range
> into buffer and writing it to the destination.

Seems useful.  Would you mind adapting the loop driver to call
copy_file_range (for files) so that anyone interested in making a
filesystem use this capability (cough) can write fstests?

--D

> At present this implementation does not support copy offload for stacked/dm
> devices, rather copy operation is completed through emulation.
> 
> One of the in-kernel use case for copy-offload is implemented by this
> patchset. dm-kcopyd infra has been changed to leverage the copy-offload if
> it is natively available. Other future use-cases could be F2FS GC, BTRFS
> relocation/balance and copy_file_range.
> 
> Following limits are added to queue limits and are exposed via sysfs to
> userspace, which user can use to form a payload.
> 
>  - copy_offload:
> 	configurable, can be used set emulation or copy offload
> 		0 to disable copy offload,
> 		1 to enable copy offloading support. Offload can be only
> 			enabled, if underlaying device supports offload
> 
>  - max_copy_sectors:
> 	total copy length supported by copy offload feature in device.
> 	0 indicates copy offload is not supported.
> 
>  - max_copy_nr_ranges:
> 	maximum number of source range entries supported by copy offload
> 			feature in device
> 
>  - max_copy_range_sectors:
> 	maximum copy length per source range entry
> 
> *blkdev_issue_copy* takes source bdev, no of sources, array of source
> ranges (in sectors), destination bdev and destination offset(in sectors).
> If both source and destination block devices are same and queue parameter
> copy_offload is 1, then copy is done through native copy offloading.
> Copy emulation is used in otherwise.
> 
> Changes from  RFC v5
> 
> 1. Handle copy larger than maximum copy limits
> 2. Create copy context and submit copy offload asynchronously
> 3. Remove BLKDEV_COPY_NOEMULATION opt-in option of copy offload and
> check for copy support before submission from dm and other layers
> 4. Allocate maximum possible allocatable buffer for copy emulation
> rather failing very large copy offload.
> 5. Fix copy_offload sysfs to be either have 0 or 1
> 
> Changes from RFC v4
> 
> 1. Extend dm-kcopyd to leverage copy-offload, while copying within the
> same device. The other approach was to have copy-emulation by moving
> dm-kcopyd to block layer. But it also required moving core dm-io infra,
> causing a massive churn across multiple dm-targets.
> 2. Remove export in bio_map_kern()
> 3. Change copy_offload sysfs to accept 0 or else
> 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
> 5. Rename payload entries, add source bdev field to be used while
> partition remapping, remove copy_size
> 6. Change the blkdev_issue_copy() interface to accept destination and
> source values in sector rather in bytes
> 7. Add payload to bio using bio_map_kern() for copy_offload case
> 8. Add check to return error if one of the source range length is 0
> 9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy
> emulation incase of copy offload is not supported. Caller can his use
> his existing copying logic to complete the io.
> 10. Bug fix copy checks and reduce size of rcu_lock()
> 
> Changes from RFC v3
> 
> 1. gfp_flag fixes.
> 2. Export bio_map_kern() and use it to allocate and add pages to bio.
> 3. Move copy offload, reading to buf, writing from buf to separate functions
> 4. Send read bio of copy offload by chaining them and submit asynchronously
> 5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
> 6. Move single source range limit check to blk_check_copy()
> 7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
> 8. Change blkdev_issue_copy() interface generic to accepts destination bdev
>         to support XCOPY as well.
> 9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
> 10. Fix buf allocoation logic to allocate buffer for the total size of copy.
> 11. Reword patch commit description.
> 
> Changes from RFC v2
> 
> 1. Add emulation support for devices not supporting copy.
> 2. Add *copy_offload* sysfs entry to enable and disable copy_offload
>         in devices supporting simple copy.
> 3. Remove simple copy support for stacked devices.
> 
> Changes from RFC v1:
> 
> 1. Fix memory leak in __blkdev_issue_copy
> 2. Unmark blk_check_copy inline
> 3. Fix line break in blk_check_copy_eod
> 4. Remove p checks and made code more readable
> 5. Don't use bio_set_op_attrs and remove op and set
>    bi_opf directly
> 6. Use struct_size to calculate total_size
> 7. Fix partition remap of copy destination
> 8. Remove mcl,mssrl,msrc from nvme_ns
> 9. Initialize copy queue limits to 0 in nvme_config_copy
> 10. Remove return in QUEUE_FLAG_COPY check
> 11. Remove unused OCFS
> 
> 
> Nitesh Shetty (4):
>   block: Introduce queue limits for copy-offload support
>   block: copy offload support infrastructure
>   block: Introduce a new ioctl for simple copy
>   block: add emulation for simple copy
> 
> SelvaKumar S (3):
>   block: make bio_map_kern() non static
>   nvme: add simple copy support
>   dm kcopyd: add simple copy offload support
> 
>  block/blk-core.c          |  84 ++++++++-
>  block/blk-lib.c           | 352 ++++++++++++++++++++++++++++++++++++++
>  block/blk-map.c           |   2 +-
>  block/blk-settings.c      |   4 +
>  block/blk-sysfs.c         |  51 ++++++
>  block/blk-zoned.c         |   1 +
>  block/bounce.c            |   1 +
>  block/ioctl.c             |  33 ++++
>  drivers/md/dm-kcopyd.c    |  56 +++++-
>  drivers/nvme/host/core.c  |  83 +++++++++
>  drivers/nvme/host/trace.c |  19 ++
>  include/linux/bio.h       |   1 +
>  include/linux/blk_types.h |  20 +++
>  include/linux/blkdev.h    |  21 +++
>  include/linux/nvme.h      |  43 ++++-
>  include/uapi/linux/fs.h   |  20 +++
>  16 files changed, 775 insertions(+), 16 deletions(-)
> 
> -- 
> 2.25.1
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-17 23:36       ` Darrick J. Wong
@ 2021-08-18 15:37         ` Nitesh Shetty
  2021-08-18 16:17           ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Shetty @ 2021-08-18 15:37 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi,
	willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On Wed, Aug 18, 2021 at 5:06 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> > From: Nitesh Shetty <nj.shetty@samsung.com>
> >
> > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> > to a destination in the device. COPY ioctl accepts a 'copy_range'
> > structure that contains destination (in sectors), no of sources and
> > pointer to the array of source ranges. Each source range is represented by
> > 'range_entry' that contains start and length of source ranges (in sectors)
> >
> > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> >
> > Example code, to issue BLKCOPY:
> > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
> >  * [64,24], on the same device */
> >
> > int main(void)
> > {
> >       int ret, fd;
> >       struct range_entry source_range[] = {{.src = 0, .len = 8},
> >               {.src = 16, .len = 8}, {.src = 32, .len = 8},};
> >       struct copy_range cr;
> >
> >       cr.dest = 64;
> >       cr.nr_range = 3;
> >       cr.range_list = (__u64)&source_range;
> >
> >       fd = open("/dev/nvme0n1", O_RDWR);
> >       if (fd < 0) return 1;
> >
> >       ret = ioctl(fd, BLKCOPY, &cr);
> >       if (ret < 0) printf("copy failure\n");
> >
> >       close(fd);
> >
> >       return ret;
> > }
> >
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > ---
> >  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
> >  include/uapi/linux/fs.h |  8 ++++++++
> >  2 files changed, 41 insertions(+)
> >
> > diff --git a/block/ioctl.c b/block/ioctl.c
> > index eb0491e90b9a..2af56d01e9fe 100644
> > --- a/block/ioctl.c
> > +++ b/block/ioctl.c
> > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
> >                                   GFP_KERNEL, flags);
> >  }
> >
> > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> > +             unsigned long arg)
> > +{
> > +     struct copy_range crange;
> > +     struct range_entry *rlist;
> > +     int ret;
> > +
> > +     if (!(mode & FMODE_WRITE))
> > +             return -EBADF;
> > +
> > +     if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> > +             return -EFAULT;
> > +
> > +     rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> > +                     GFP_KERNEL);
> > +     if (!rlist)
> > +             return -ENOMEM;
> > +
> > +     if (copy_from_user(rlist, (void __user *)crange.range_list,
> > +                             sizeof(*rlist) * crange.nr_range)) {
> > +             ret = -EFAULT;
> > +             goto out;
> > +     }
> > +
> > +     ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest,
> > +                     GFP_KERNEL, 0);
> > +out:
> > +     kfree(rlist);
> > +     return ret;
> > +}
> > +
> >  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> >               unsigned long arg)
> >  {
> > @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
> >       case BLKSECDISCARD:
> >               return blk_ioctl_discard(bdev, mode, arg,
> >                               BLKDEV_DISCARD_SECURE);
> > +     case BLKCOPY:
> > +             return blk_ioctl_copy(bdev, mode, arg);
> >       case BLKZEROOUT:
> >               return blk_ioctl_zeroout(bdev, mode, arg);
> >       case BLKGETDISKSEQ:
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index 7a97b588d892..4183688ff398 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -76,6 +76,13 @@ struct range_entry {
> >       __u64 len;
> >  };
> >
> > +struct copy_range {
> > +     __u64 dest;
> > +     __u64 nr_range;
>
> If the maximum number of elements in the range list is 1<<12, there's no
> need for this to be larger than a u16, right?
>
> > +     __u64 range_list;
>
> Pointers embedded in a structure are /not/ a good idea, because this
> will create a lot of compatibility headaches for 32-bit binaries running
> on 64-bit kernels.  Please just make the size of this header structure
> a multiple of 8 bytes and put the range_entry list immediately after it.
>
> struct copy_range {
>         __s64 dest_offset;
>         __u32 nr_range_entries;
>         __u32 flags;
>         __u64 reserved[2];
> };
>
> struct __user range_entry *re = ((struct range_entry *)(copyhead + 1));
>
> copy_from_user(&urk, re...);
>
> --D
>
Thanks, this is better. 'Reserved' field was there to be used for
future extension of the interface.
Now that you mentioned 'flags', it seems we can do away with
'reserved' fields altogether?

Regards,
Nitesh Shetty

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 0/7] add simple copy support
  2021-08-17 23:37   ` [dm-devel] [PATCH 0/7] add simple copy support Darrick J. Wong
@ 2021-08-18 15:40     ` Nitesh Shetty
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Shetty @ 2021-08-18 15:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi,
	willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On Wed, Aug 18, 2021 at 5:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Aug 17, 2021 at 03:44:16PM +0530, SelvaKumar S wrote:
> > This started out as an attempt to support NVMe Simple Copy Command (SCC),
> > and evolved during the RFC review process.
> >
> > The patchset, at this point, contains -
> > 1. SCC support in NVMe driver
> > 2. Block-layer infra for copy-offload operation
> > 3. ioctl interface to user-space
> > 4. copy-emulation infra in the block-layer
> > 5. copy-offload plumbing to dm-kcopyd (thus creating couple of in-kernel
> >       users such as dm-clone)
> >
> >
> > The SCC specification, i.e. TP4065a can be found in following link
> >
> > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip
> >
> > Simple copy is a copy offloading feature and can be used to copy multiple
> > contiguous ranges (source_ranges) of LBA's to a single destination LBA
> > within the device, reducing traffic between host and device.
> >
> > We define a block ioctl for copy and copy payload format similar to
> > discard. For device supporting native simple copy, we attach the control
> > information as payload to the bio and submit to the device. Copy emulation
> > is implemented incase underlaying device does not support copy offload or
> > based on sysfs choice. Copy emulation is done by reading each source range
> > into buffer and writing it to the destination.
>
> Seems useful.  Would you mind adapting the loop driver to call
> copy_file_range (for files) so that anyone interested in making a
> filesystem use this capability (cough) can write fstests?

We are planning to look into copy_file_range plumbing after settling
on the current series,
 which already became heavyweight for the first drop.

Nitesh Shetty
>
> --D
>
> > At present this implementation does not support copy offload for stacked/dm
> > devices, rather copy operation is completed through emulation.
> >
> > One of the in-kernel use case for copy-offload is implemented by this
> > patchset. dm-kcopyd infra has been changed to leverage the copy-offload if
> > it is natively available. Other future use-cases could be F2FS GC, BTRFS
> > relocation/balance and copy_file_range.
> >
> > Following limits are added to queue limits and are exposed via sysfs to
> > userspace, which user can use to form a payload.
> >
> >  - copy_offload:
> >       configurable, can be used set emulation or copy offload
> >               0 to disable copy offload,
> >               1 to enable copy offloading support. Offload can be only
> >                       enabled, if underlaying device supports offload
> >
> >  - max_copy_sectors:
> >       total copy length supported by copy offload feature in device.
> >       0 indicates copy offload is not supported.
> >
> >  - max_copy_nr_ranges:
> >       maximum number of source range entries supported by copy offload
> >                       feature in device
> >
> >  - max_copy_range_sectors:
> >       maximum copy length per source range entry
> >
> > *blkdev_issue_copy* takes source bdev, no of sources, array of source
> > ranges (in sectors), destination bdev and destination offset(in sectors).
> > If both source and destination block devices are same and queue parameter
> > copy_offload is 1, then copy is done through native copy offloading.
> > Copy emulation is used in otherwise.
> >
> > Changes from  RFC v5
> >
> > 1. Handle copy larger than maximum copy limits
> > 2. Create copy context and submit copy offload asynchronously
> > 3. Remove BLKDEV_COPY_NOEMULATION opt-in option of copy offload and
> > check for copy support before submission from dm and other layers
> > 4. Allocate maximum possible allocatable buffer for copy emulation
> > rather failing very large copy offload.
> > 5. Fix copy_offload sysfs to be either have 0 or 1
> >
> > Changes from RFC v4
> >
> > 1. Extend dm-kcopyd to leverage copy-offload, while copying within the
> > same device. The other approach was to have copy-emulation by moving
> > dm-kcopyd to block layer. But it also required moving core dm-io infra,
> > causing a massive churn across multiple dm-targets.
> > 2. Remove export in bio_map_kern()
> > 3. Change copy_offload sysfs to accept 0 or else
> > 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
> > 5. Rename payload entries, add source bdev field to be used while
> > partition remapping, remove copy_size
> > 6. Change the blkdev_issue_copy() interface to accept destination and
> > source values in sector rather in bytes
> > 7. Add payload to bio using bio_map_kern() for copy_offload case
> > 8. Add check to return error if one of the source range length is 0
> > 9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy
> > emulation incase of copy offload is not supported. Caller can his use
> > his existing copying logic to complete the io.
> > 10. Bug fix copy checks and reduce size of rcu_lock()
> >
> > Changes from RFC v3
> >
> > 1. gfp_flag fixes.
> > 2. Export bio_map_kern() and use it to allocate and add pages to bio.
> > 3. Move copy offload, reading to buf, writing from buf to separate functions
> > 4. Send read bio of copy offload by chaining them and submit asynchronously
> > 5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
> > 6. Move single source range limit check to blk_check_copy()
> > 7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
> > 8. Change blkdev_issue_copy() interface generic to accepts destination bdev
> >         to support XCOPY as well.
> > 9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
> > 10. Fix buf allocoation logic to allocate buffer for the total size of copy.
> > 11. Reword patch commit description.
> >
> > Changes from RFC v2
> >
> > 1. Add emulation support for devices not supporting copy.
> > 2. Add *copy_offload* sysfs entry to enable and disable copy_offload
> >         in devices supporting simple copy.
> > 3. Remove simple copy support for stacked devices.
> >
> > Changes from RFC v1:
> >
> > 1. Fix memory leak in __blkdev_issue_copy
> > 2. Unmark blk_check_copy inline
> > 3. Fix line break in blk_check_copy_eod
> > 4. Remove p checks and made code more readable
> > 5. Don't use bio_set_op_attrs and remove op and set
> >    bi_opf directly
> > 6. Use struct_size to calculate total_size
> > 7. Fix partition remap of copy destination
> > 8. Remove mcl,mssrl,msrc from nvme_ns
> > 9. Initialize copy queue limits to 0 in nvme_config_copy
> > 10. Remove return in QUEUE_FLAG_COPY check
> > 11. Remove unused OCFS
> >
> >
> > Nitesh Shetty (4):
> >   block: Introduce queue limits for copy-offload support
> >   block: copy offload support infrastructure
> >   block: Introduce a new ioctl for simple copy
> >   block: add emulation for simple copy
> >
> > SelvaKumar S (3):
> >   block: make bio_map_kern() non static
> >   nvme: add simple copy support
> >   dm kcopyd: add simple copy offload support
> >
> >  block/blk-core.c          |  84 ++++++++-
> >  block/blk-lib.c           | 352 ++++++++++++++++++++++++++++++++++++++
> >  block/blk-map.c           |   2 +-
> >  block/blk-settings.c      |   4 +
> >  block/blk-sysfs.c         |  51 ++++++
> >  block/blk-zoned.c         |   1 +
> >  block/bounce.c            |   1 +
> >  block/ioctl.c             |  33 ++++
> >  drivers/md/dm-kcopyd.c    |  56 +++++-
> >  drivers/nvme/host/core.c  |  83 +++++++++
> >  drivers/nvme/host/trace.c |  19 ++
> >  include/linux/bio.h       |   1 +
> >  include/linux/blk_types.h |  20 +++
> >  include/linux/blkdev.h    |  21 +++
> >  include/linux/nvme.h      |  43 ++++-
> >  include/uapi/linux/fs.h   |  20 +++
> >  16 files changed, 775 insertions(+), 16 deletions(-)
> >
> > --
> > 2.25.1
> >

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy
  2021-08-18 15:37         ` Nitesh Shetty
@ 2021-08-18 16:17           ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-08-18 16:17 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi,
	willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block,
	mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k,
	martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel,
	joshiiitr, asml.silence

On Wed, Aug 18, 2021 at 09:07:54PM +0530, Nitesh Shetty wrote:
> On Wed, Aug 18, 2021 at 5:06 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote:
> > > From: Nitesh Shetty <nj.shetty@samsung.com>
> > >
> > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> > > to a destination in the device. COPY ioctl accepts a 'copy_range'
> > > structure that contains destination (in sectors), no of sources and
> > > pointer to the array of source ranges. Each source range is represented by
> > > 'range_entry' that contains start and length of source ranges (in sectors)
> > >
> > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> > > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> > >
> > > Example code, to issue BLKCOPY:
> > > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to
> > >  * [64,24], on the same device */
> > >
> > > int main(void)
> > > {
> > >       int ret, fd;
> > >       struct range_entry source_range[] = {{.src = 0, .len = 8},
> > >               {.src = 16, .len = 8}, {.src = 32, .len = 8},};
> > >       struct copy_range cr;
> > >
> > >       cr.dest = 64;
> > >       cr.nr_range = 3;
> > >       cr.range_list = (__u64)&source_range;
> > >
> > >       fd = open("/dev/nvme0n1", O_RDWR);
> > >       if (fd < 0) return 1;
> > >
> > >       ret = ioctl(fd, BLKCOPY, &cr);
> > >       if (ret < 0) printf("copy failure\n");
> > >
> > >       close(fd);
> > >
> > >       return ret;
> > > }
> > >
> > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > > ---
> > >  block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
> > >  include/uapi/linux/fs.h |  8 ++++++++
> > >  2 files changed, 41 insertions(+)
> > >
> > > diff --git a/block/ioctl.c b/block/ioctl.c
> > > index eb0491e90b9a..2af56d01e9fe 100644
> > > --- a/block/ioctl.c
> > > +++ b/block/ioctl.c
> > > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
> > >                                   GFP_KERNEL, flags);
> > >  }
> > >
> > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> > > +             unsigned long arg)
> > > +{
> > > +     struct copy_range crange;
> > > +     struct range_entry *rlist;
> > > +     int ret;
> > > +
> > > +     if (!(mode & FMODE_WRITE))
> > > +             return -EBADF;
> > > +
> > > +     if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> > > +             return -EFAULT;
> > > +
> > > +     rlist = kmalloc_array(crange.nr_range, sizeof(*rlist),
> > > +                     GFP_KERNEL);
> > > +     if (!rlist)
> > > +             return -ENOMEM;
> > > +
> > > +     if (copy_from_user(rlist, (void __user *)crange.range_list,
> > > +                             sizeof(*rlist) * crange.nr_range)) {
> > > +             ret = -EFAULT;
> > > +             goto out;
> > > +     }
> > > +
> > > +     ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest,
> > > +                     GFP_KERNEL, 0);
> > > +out:
> > > +     kfree(rlist);
> > > +     return ret;
> > > +}
> > > +
> > >  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> > >               unsigned long arg)
> > >  {
> > > @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
> > >       case BLKSECDISCARD:
> > >               return blk_ioctl_discard(bdev, mode, arg,
> > >                               BLKDEV_DISCARD_SECURE);
> > > +     case BLKCOPY:
> > > +             return blk_ioctl_copy(bdev, mode, arg);
> > >       case BLKZEROOUT:
> > >               return blk_ioctl_zeroout(bdev, mode, arg);
> > >       case BLKGETDISKSEQ:
> > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > > index 7a97b588d892..4183688ff398 100644
> > > --- a/include/uapi/linux/fs.h
> > > +++ b/include/uapi/linux/fs.h
> > > @@ -76,6 +76,13 @@ struct range_entry {
> > >       __u64 len;
> > >  };
> > >
> > > +struct copy_range {
> > > +     __u64 dest;
> > > +     __u64 nr_range;
> >
> > If the maximum number of elements in the range list is 1<<12, there's no
> > need for this to be larger than a u16, right?
> >
> > > +     __u64 range_list;
> >
> > Pointers embedded in a structure are /not/ a good idea, because this
> > will create a lot of compatibility headaches for 32-bit binaries running
> > on 64-bit kernels.  Please just make the size of this header structure
> > a multiple of 8 bytes and put the range_entry list immediately after it.
> >
> > struct copy_range {
> >         __s64 dest_offset;
> >         __u32 nr_range_entries;
> >         __u32 flags;
> >         __u64 reserved[2];
> > };
> >
> > struct __user range_entry *re = ((struct range_entry *)(copyhead + 1));
> >
> > copy_from_user(&urk, re...);
> >
> > --D
> >
> Thanks, this is better. 'Reserved' field was there to be used for
> future extension of the interface.
> Now that you mentioned 'flags', it seems we can do away with
> 'reserved' fields altogether?

We still want the reserved-must-be-zero fields so that adding the first
field or two doesn't require changes to the pointer arithmetic.

Also, I suppose you could make the relationship between copy_range and
range_entry more explicit:

struct copy_range {
        __s64 dest_offset;
        __u32 nr_range_entries;
        __u32 flags;
        __u64 reserved[2];

	/* must come last */
	struct range_entry entries[];
};

struct __user range_entry *re = &copyhead->entries[0];

--D

> 
> Regards,
> Nitesh Shetty

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 10:14     ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S
  2021-08-17 17:14       ` Bart Van Assche
  2021-08-17 20:35       ` kernel test robot
@ 2021-08-18 18:35       ` Martin K. Petersen
  2021-08-20 11:11         ` Kanchan Joshi
  2 siblings, 1 reply; 30+ messages in thread
From: Martin K. Petersen @ 2021-08-18 18:35 UTC (permalink / raw)
  To: SelvaKumar S
  Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche,
	linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove,
	linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal,
	joshi.k, martin.petersen, linux-api, johannes.thumshirn,
	linux-fsdevel, joshiiitr, asml.silence


> Native copy offload is not supported for stacked devices.

One of the main reasons that the historic attempts at supporting copy
offload did not get merged was that the ubiquitous deployment scenario,
stacked block devices, was not handled well.

Pitfalls surrounding stacking has been brought up several times in
response to your series. It is critically important that both kernel
plumbing and user-facing interfaces are defined in a way that works for
the most common use cases. This includes copying between block devices
and handling block device stacking. Stacking being one of the most
fundamental operating principles of the Linux block layer!

Proposing a brand new interface that out of the gate is incompatible
with both stacking and the copy offload capability widely implemented in
shipping hardware makes little sense. While NVMe currently only supports
copy operations inside a single namespace, it is surely only a matter of
time before that restriction is lifted.

Changing existing interfaces is painful, especially when these are
exposed to userland. We obviously can't predict every field or feature
that may be needed in the future. But we should at the very least build
the infrastructure around what already exists. And that's where the
proposed design falls short...

-- 
Martin K. Petersen	Oracle Linux Engineering

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-17 17:14       ` Bart Van Assche
  2021-08-17 20:41         ` Mikulas Patocka
@ 2021-08-20 10:39         ` Kanchan Joshi
  2021-08-20 21:18           ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Kanchan Joshi @ 2021-08-20 10:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel,
	Christoph Hellwig, agk, linux-scsi, nitheshshetty,
	Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove,
	linux-block, mpatocka, Keith Busch, Jens Axboe, Damien Le Moal,
	Martin K. Petersen, KANCHAN JOSHI, linux-api, Johannes Thumshirn,
	linux-fsdevel, Javier Gonzalez, Pavel Begunkov

Bart, Mikulas

On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 8/17/21 3:14 AM, SelvaKumar S wrote:
> > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> > bio with control information as payload and submit to the device.
> > Larger copy operation may be divided if necessary by looking at device
> > limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
> > submitted to zoned device.
> > Native copy offload is not supported for stacked devices.
>
> Using a single operation for copy-offloading instead of separate
> operations for reading and writing is fundamentally incompatible with
> the device mapper. I think we need a copy-offloading implementation that
> is compatible with the device mapper.
>

While each read/write command is for a single contiguous range of
device, with simple-copy we get to operate on multiple discontiguous
ranges, with a single command.
That seemed like a good opportunity to reduce control-plane traffic
(compared to read/write operations) as well.

With a separate read-and-write bio approach, each source-range will
spawn at least one read, one write and eventually one SCC command. And
it only gets worse as there could be many such discontiguous ranges (for
GC use-case at least) coming from user-space in a single payload.
Overall sequence will be
- Receive a payload from user-space
- Disassemble into many read-write pair bios at block-layer
- Assemble those (somehow) in NVMe to reduce simple-copy commands
- Send commands to device

We thought payload could be a good way to reduce the
disassembly/assembly work and traffic between block-layer to nvme.
How do you see this tradeoff?  What seems necessary for device-mapper
usecase, appears to be a cost when device-mapper isn't used.
Especially for SCC (since copy is within single ns), device-mappers
may not be too compelling anyway.

Must device-mapper support be a requirement for the initial support atop SCC?
Or do you think it will still be a progress if we finalize the
user-space interface to cover all that is foreseeable.And for
device-mapper compatible transport between block-layer and NVMe - we
do it in the later stage when NVMe too comes up with better copy
capabilities?


-- 
Joshi

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-18 18:35       ` Martin K. Petersen
@ 2021-08-20 11:11         ` Kanchan Joshi
  0 siblings, 0 replies; 30+ messages in thread
From: Kanchan Joshi @ 2021-08-20 11:11 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: snitzer, djwong, linux-nvme, dm-devel, Christoph Hellwig, agk,
	bvanassche, linux-scsi, nitheshshetty, Matthew Wilcox,
	Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block,
	mpatocka, Keith Busch, Jens Axboe, Damien Le Moal, KANCHAN JOSHI,
	linux-api, Johannes Thumshirn, linux-fsdevel, Javier Gonzalez,
	Pavel Begunkov

On Thu, Aug 19, 2021 at 12:05 AM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> > Native copy offload is not supported for stacked devices.
>
> One of the main reasons that the historic attempts at supporting copy
> offload did not get merged was that the ubiquitous deployment scenario,
> stacked block devices, was not handled well.
>
> Pitfalls surrounding stacking has been brought up several times in
> response to your series. It is critically important that both kernel
> plumbing and user-facing interfaces are defined in a way that works for
> the most common use cases. This includes copying between block devices
> and handling block device stacking. Stacking being one of the most
> fundamental operating principles of the Linux block layer!
>
> Proposing a brand new interface that out of the gate is incompatible
> with both stacking and the copy offload capability widely implemented in
> shipping hardware makes little sense. While NVMe currently only supports
> copy operations inside a single namespace, it is surely only a matter of
> time before that restriction is lifted.
>
> Changing existing interfaces is painful, especially when these are
> exposed to userland. We obviously can't predict every field or feature
> that may be needed in the future. But we should at the very least build
> the infrastructure around what already exists. And that's where the
> proposed design falls short...
>
Certainly, on user-space interface. We've got few cracks to be filled
there, missing the future viability.
But on stacking, can that be additive. Could you please take a look at
the other response (comment from Bart) for the trade-offs.


-- 
Joshi

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-20 10:39         ` Kanchan Joshi
@ 2021-08-20 21:18           ` Bart Van Assche
  2021-08-26  7:46             ` Nitesh Shetty
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-20 21:18 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel,
	Christoph Hellwig, agk, linux-scsi, nitheshshetty,
	Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove,
	linux-block, mpatocka, Keith Busch, Jens Axboe, Damien Le Moal,
	Martin K. Petersen, KANCHAN JOSHI, linux-api, Johannes Thumshirn,
	linux-fsdevel, Javier Gonzalez, Pavel Begunkov

On 8/20/21 3:39 AM, Kanchan Joshi wrote:
> Bart, Mikulas
> 
> On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> On 8/17/21 3:14 AM, SelvaKumar S wrote:
>>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
>>> bio with control information as payload and submit to the device.
>>> Larger copy operation may be divided if necessary by looking at device
>>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
>>> submitted to zoned device.
>>> Native copy offload is not supported for stacked devices.
>>
>> Using a single operation for copy-offloading instead of separate
>> operations for reading and writing is fundamentally incompatible with
>> the device mapper. I think we need a copy-offloading implementation that
>> is compatible with the device mapper.
>>
> 
> While each read/write command is for a single contiguous range of
> device, with simple-copy we get to operate on multiple discontiguous
> ranges, with a single command.
> That seemed like a good opportunity to reduce control-plane traffic
> (compared to read/write operations) as well.
> 
> With a separate read-and-write bio approach, each source-range will
> spawn at least one read, one write and eventually one SCC command. And
> it only gets worse as there could be many such discontiguous ranges (for
> GC use-case at least) coming from user-space in a single payload.
> Overall sequence will be
> - Receive a payload from user-space
> - Disassemble into many read-write pair bios at block-layer
> - Assemble those (somehow) in NVMe to reduce simple-copy commands
> - Send commands to device
> 
> We thought payload could be a good way to reduce the
> disassembly/assembly work and traffic between block-layer to nvme.
> How do you see this tradeoff?  What seems necessary for device-mapper
> usecase, appears to be a cost when device-mapper isn't used.
> Especially for SCC (since copy is within single ns), device-mappers
> may not be too compelling anyway.
> 
> Must device-mapper support be a requirement for the initial support atop SCC?
> Or do you think it will still be a progress if we finalize the
> user-space interface to cover all that is foreseeable.And for
> device-mapper compatible transport between block-layer and NVMe - we
> do it in the later stage when NVMe too comes up with better copy
> capabilities?

Hi Kanchan,

These days there might be more systems that run the device mapper on top 
of the NVMe driver or a SCSI driver than systems that do use the device 
mapper. It is common practice these days to use dm-crypt on personal 
workstations and laptops. LVM (dm-linear) is popular because it is more 
flexible than a traditional partition table. Android phones use 
dm-verity on top of hardware encryption. In other words, not supporting 
the device mapper means that a very large number of use cases is 
excluded. So I think supporting the device mapper from the start is 
important, even if that means combining individual bios at the bottom of 
the storage stack into simple copy commands.

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure
  2021-08-20 21:18           ` Bart Van Assche
@ 2021-08-26  7:46             ` Nitesh Shetty
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Shetty @ 2021-08-26  7:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Snitzer, Greg Kroah-Hartman, Darrick J. Wong, linux-nvme,
	dm-devel, Christoph Hellwig, agk, linux-scsi, Matthew Wilcox,
	Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block,
	mpatocka, Javier Gonzalez, Keith Busch, Jens Axboe,
	Damien Le Moal, Martin K. Petersen, KANCHAN JOSHI, linux-api,
	Johannes Thumshirn, linux-fsdevel, Kanchan Joshi, Pavel Begunkov

Hi Bart,Mikulas,Martin,Douglas,

We will go through your previous work and use this thread as a medium for
further discussion, if we come across issues to be sorted out.

Thank you,
Nitesh Shetty

On Sat, Aug 21, 2021 at 2:48 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 8/20/21 3:39 AM, Kanchan Joshi wrote:
> > Bart, Mikulas
> >
> > On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >>
> >> On 8/17/21 3:14 AM, SelvaKumar S wrote:
> >>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> >>> bio with control information as payload and submit to the device.
> >>> Larger copy operation may be divided if necessary by looking at device
> >>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when
> >>> submitted to zoned device.
> >>> Native copy offload is not supported for stacked devices.
> >>
> >> Using a single operation for copy-offloading instead of separate
> >> operations for reading and writing is fundamentally incompatible with
> >> the device mapper. I think we need a copy-offloading implementation that
> >> is compatible with the device mapper.
> >>
> >
> > While each read/write command is for a single contiguous range of
> > device, with simple-copy we get to operate on multiple discontiguous
> > ranges, with a single command.
> > That seemed like a good opportunity to reduce control-plane traffic
> > (compared to read/write operations) as well.
> >
> > With a separate read-and-write bio approach, each source-range will
> > spawn at least one read, one write and eventually one SCC command. And
> > it only gets worse as there could be many such discontiguous ranges (for
> > GC use-case at least) coming from user-space in a single payload.
> > Overall sequence will be
> > - Receive a payload from user-space
> > - Disassemble into many read-write pair bios at block-layer
> > - Assemble those (somehow) in NVMe to reduce simple-copy commands
> > - Send commands to device
> >
> > We thought payload could be a good way to reduce the
> > disassembly/assembly work and traffic between block-layer to nvme.
> > How do you see this tradeoff?  What seems necessary for device-mapper
> > usecase, appears to be a cost when device-mapper isn't used.
> > Especially for SCC (since copy is within single ns), device-mappers
> > may not be too compelling anyway.
> >
> > Must device-mapper support be a requirement for the initial support atop SCC?
> > Or do you think it will still be a progress if we finalize the
> > user-space interface to cover all that is foreseeable.And for
> > device-mapper compatible transport between block-layer and NVMe - we
> > do it in the later stage when NVMe too comes up with better copy
> > capabilities?
>
> Hi Kanchan,
>
> These days there might be more systems that run the device mapper on top
> of the NVMe driver or a SCSI driver than systems that do use the device
> mapper. It is common practice these days to use dm-crypt on personal
> workstations and laptops. LVM (dm-linear) is popular because it is more
> flexible than a traditional partition table. Android phones use
> dm-verity on top of hardware encryption. In other words, not supporting
> the device mapper means that a very large number of use cases is
> excluded. So I think supporting the device mapper from the start is
> important, even if that means combining individual bios at the bottom of
> the storage stack into simple copy commands.
>
> Thanks,
>
> Bart.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-08-30  6:53 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20210817101741epcas5p174ca0a539587da6a67b9f58cd13f2bad@epcas5p1.samsung.com>
2021-08-17 10:14 ` [dm-devel] [PATCH 0/7] add simple copy support SelvaKumar S
     [not found]   ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 1/7] block: make bio_map_kern() non static SelvaKumar S
     [not found]   ` <CGME20210817101753epcas5p4f4257f8edda27e184ecbb273b700ccbc@epcas5p4.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support SelvaKumar S
2021-08-17 13:08       ` Greg KH
2021-08-17 14:42         ` Nitesh Shetty
     [not found]   ` <CGME20210817101758epcas5p1ec353b3838d64654e69488229256d9eb@epcas5p1.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S
2021-08-17 17:14       ` Bart Van Assche
2021-08-17 20:41         ` Mikulas Patocka
2021-08-17 21:53           ` Douglas Gilbert
2021-08-17 22:06             ` Bart Van Assche
2021-08-20 10:39         ` Kanchan Joshi
2021-08-20 21:18           ` Bart Van Assche
2021-08-26  7:46             ` Nitesh Shetty
2021-08-17 20:35       ` kernel test robot
2021-08-18 18:35       ` Martin K. Petersen
2021-08-20 11:11         ` Kanchan Joshi
     [not found]   ` <CGME20210817101803epcas5p10cda1d52f8a8f1172e34b1f9cf8eef3b@epcas5p1.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S
2021-08-17 13:09       ` Greg KH
2021-08-17 13:10       ` Greg KH
2021-08-17 14:48         ` Nitesh Shetty
2021-08-17 23:36       ` Darrick J. Wong
2021-08-18 15:37         ` Nitesh Shetty
2021-08-18 16:17           ` Darrick J. Wong
     [not found]   ` <CGME20210817101809epcas5p39eed3531ed82f5f08127eb3dba1fc50f@epcas5p3.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 5/7] block: add emulation " SelvaKumar S
2021-08-17 22:10       ` kernel test robot
     [not found]   ` <CGME20210817101814epcas5p41db3d7269f5139efcaf2ca685cd04a16@epcas5p4.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 6/7] nvme: add simple copy support SelvaKumar S
     [not found]   ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com>
2021-08-17 10:14     ` [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support SelvaKumar S
2021-08-17 20:29       ` Mikulas Patocka
2021-08-17 23:37   ` [dm-devel] [PATCH 0/7] add simple copy support Darrick J. Wong
2021-08-18 15:40     ` Nitesh Shetty

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).