* [dm-devel] [PATCH 0/7] add simple copy support
[not found] <CGME20210817101741epcas5p174ca0a539587da6a67b9f58cd13f2bad@epcas5p1.samsung.com>
@ 2021-08-17 10:14 ` SelvaKumar S
  [not found] ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com>
  ` (7 more replies)
0 siblings, 8 replies; 30+ messages in thread
From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw)
To: linux-nvme, linux-block
Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence

This started out as an attempt to support the NVMe Simple Copy Command (SCC), and evolved during the RFC review process. The patchset, at this point, contains:

1. SCC support in the NVMe driver
2. Block-layer infra for the copy-offload operation
3. An ioctl interface to user-space
4. Copy-emulation infra in the block layer
5. Copy-offload plumbing to dm-kcopyd (thus creating a couple of in-kernel users such as dm-clone)

The SCC specification, i.e. TP4065a, can be found at the following link: https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip

Simple copy is a copy-offloading feature and can be used to copy multiple contiguous ranges (source_ranges) of LBAs to a single destination LBA within the device, reducing traffic between host and device. We define a block ioctl for copy, with a payload format similar to discard. For devices supporting native simple copy, we attach the control information as payload to the bio and submit it to the device. Copy emulation is implemented in case the underlying device does not support copy offload, or based on the sysfs choice. Copy emulation is done by reading each source range into a buffer and writing it to the destination.
At present this implementation does not support copy offload for stacked/dm devices; instead, the copy operation is completed through emulation. One of the in-kernel use cases for copy-offload is implemented by this patchset: the dm-kcopyd infra has been changed to leverage copy-offload when it is natively available. Other future use-cases could be F2FS GC, BTRFS relocation/balance and copy_file_range.

The following limits are added to the queue limits and are exposed via sysfs to userspace, which the user can use to form a payload:

- copy_offload: configurable; can be used to select emulation or copy offload. 0 disables copy offload, 1 enables copy-offloading support. Offload can only be enabled if the underlying device supports it.
- max_copy_sectors: total copy length supported by the copy-offload feature in the device. 0 indicates copy offload is not supported.
- max_copy_nr_ranges: maximum number of source range entries supported by the copy-offload feature in the device.
- max_copy_range_sectors: maximum copy length per source range entry.

*blkdev_issue_copy* takes the source bdev, number of sources, array of source ranges (in sectors), destination bdev and destination offset (in sectors). If both source and destination block devices are the same and the queue parameter copy_offload is 1, the copy is done through native copy offloading; copy emulation is used otherwise.

Changes from RFC v5:
1. Handle copies larger than the maximum copy limits
2. Create a copy context and submit copy offload asynchronously
3. Remove the BLKDEV_COPY_NOEMULATION opt-in option of copy offload and check for copy support before submission from dm and other layers
4. Allocate the maximum possible allocatable buffer for copy emulation rather than failing very large copy offloads
5. Fix copy_offload sysfs to have either 0 or 1

Changes from RFC v4:
1. Extend dm-kcopyd to leverage copy-offload while copying within the same device. The other approach was to have copy-emulation by moving dm-kcopyd to the block layer.
But it also required moving core dm-io infra, causing a massive churn across multiple dm-targets.
2. Remove the export in bio_map_kern()
3. Change the copy_offload sysfs entry to accept 0, or any non-zero value to enable
4. Rename the copy support flag to QUEUE_FLAG_SIMPLE_COPY
5. Rename payload entries, add a source bdev field to be used while partition remapping, remove copy_size
6. Change the blkdev_issue_copy() interface to accept destination and source values in sectors rather than in bytes
7. Add the payload to the bio using bio_map_kern() for the copy_offload case
8. Add a check to return an error if one of the source range lengths is 0
9. Add a BLKDEV_COPY_NOEMULATION flag to allow the user to not try copy emulation in case copy offload is not supported. The caller can then use their existing copying logic to complete the I/O.
10. Fix copy checks and reduce the scope of rcu_lock()

Changes from RFC v3:
1. gfp_flag fixes
2. Export bio_map_kern() and use it to allocate and add pages to the bio
3. Move copy offload, reading to buf, and writing from buf into separate functions
4. Send the read bios of copy offload by chaining them and submit asynchronously
5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy()
6. Move the single source range limit check to blk_check_copy()
7. Rename __blkdev_issue_copy() to blkdev_issue_copy() and remove the old helper
8. Make the blkdev_issue_copy() interface generic, accepting a destination bdev to support XCOPY as well
9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory
10. Fix the buf allocation logic to allocate a buffer for the total size of the copy
11. Reword the patch commit descriptions

Changes from RFC v2:
1. Add emulation support for devices not supporting copy
2. Add a *copy_offload* sysfs entry to enable and disable copy_offload in devices supporting simple copy
3. Remove simple copy support for stacked devices

Changes from RFC v1:
1. Fix a memory leak in __blkdev_issue_copy
2. Unmark blk_check_copy inline
3. Fix a line break in blk_check_copy_eod
4.
Remove p checks and make the code more readable
5. Don't use bio_set_op_attrs; remove op and set bi_opf directly
6. Use struct_size to calculate total_size
7. Fix the partition remap of the copy destination
8. Remove mcl, mssrl, msrc from nvme_ns
9. Initialize copy queue limits to 0 in nvme_config_copy
10. Remove the return in the QUEUE_FLAG_COPY check
11. Remove unused OCFS

Nitesh Shetty (4):
  block: Introduce queue limits for copy-offload support
  block: copy offload support infrastructure
  block: Introduce a new ioctl for simple copy
  block: add emulation for simple copy

SelvaKumar S (3):
  block: make bio_map_kern() non static
  nvme: add simple copy support
  dm kcopyd: add simple copy offload support

 block/blk-core.c          |  84 ++++++++-
 block/blk-lib.c           | 352 ++++++++++++++++++++++++++++++++++++++
 block/blk-map.c           |   2 +-
 block/blk-settings.c      |   4 +
 block/blk-sysfs.c         |  51 ++++++
 block/blk-zoned.c         |   1 +
 block/bounce.c            |   1 +
 block/ioctl.c             |  33 ++++
 drivers/md/dm-kcopyd.c    |  56 +++++-
 drivers/nvme/host/core.c  |  83 +++++++++
 drivers/nvme/host/trace.c |  19 ++
 include/linux/bio.h       |   1 +
 include/linux/blk_types.h |  20 +++
 include/linux/blkdev.h    |  21 +++
 include/linux/nvme.h      |  43 ++++-
 include/uapi/linux/fs.h   |  20 +++
 16 files changed, 775 insertions(+), 16 deletions(-)

-- 
2.25.1

-- 
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply [flat|nested] 30+ messages in thread
* [dm-devel] [PATCH 1/7] block: make bio_map_kern() non static [not found] ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 0 siblings, 0 replies; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence Make bio_map_kern() non static, so that copy offload/emulation can use it to add vmalloced memory to bio. Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- block/blk-map.c | 2 +- include/linux/blkdev.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index d1448aaad980..1b97b9b503f4 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -336,7 +336,7 @@ static void bio_map_kern_endio(struct bio *bio) * Map the kernel address into a bio suitable for io to a block * device. Returns an error pointer in case of error. 
*/ -static struct bio *bio_map_kern(struct request_queue *q, void *data, +struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, gfp_t gfp_mask) { unsigned long kaddr = (unsigned long)data; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index df404c1fb087..28a193225cf2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -903,6 +903,8 @@ extern int blk_rq_map_user(struct request_queue *, struct request *, struct rq_map_data *, void __user *, unsigned long, gfp_t); extern int blk_rq_unmap_user(struct bio *); +struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, + gfp_t gfp_mask); extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t); extern int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support [not found] ` <CGME20210817101753epcas5p4f4257f8edda27e184ecbb273b700ccbc@epcas5p4.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 2021-08-17 13:08 ` Greg KH 0 siblings, 1 reply; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence From: Nitesh Shetty <nj.shetty@samsung.com> Add device limits as sysfs entries, - copy_offload (READ_WRITE) - max_copy_sectors (READ_ONLY) - max_copy_ranges_sectors (READ_ONLY) - max_copy_nr_ranges (READ_ONLY) copy_offload(= 0), is disabled by default. This needs to be enabled if copy-offload needs to be used. max_copy_sectors = 0, indicates the device doesn't support native copy. 
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- block/blk-settings.c | 4 ++++ block/blk-sysfs.c | 51 ++++++++++++++++++++++++++++++++++++++++++ include/linux/blkdev.h | 6 +++++ 3 files changed, 61 insertions(+) diff --git a/block/blk-settings.c b/block/blk-settings.c index 3613d2cc0688..ac59922e0cde 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim) lim->misaligned = 0; lim->zoned = BLK_ZONED_NONE; lim->zone_write_granularity = 0; + lim->copy_offload = 0; + lim->max_copy_sectors = 0; + lim->max_copy_nr_ranges = 0; + lim->max_copy_range_sectors = 0; } EXPORT_SYMBOL(blk_set_default_limits); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 1832587dce3a..aabab65f10ab 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -169,6 +169,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag return queue_var_show(q->limits.discard_granularity, page); } +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.copy_offload, page); +} + +static ssize_t queue_copy_offload_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long copy_offload; + ssize_t ret = queue_var_store(&copy_offload, page, count); + + if (ret < 0) + return ret; + + if (copy_offload && q->limits.max_copy_sectors == 0) + return -EINVAL; + + if (copy_offload) + q->limits.copy_offload = BLK_COPY_OFFLOAD_SCC; + else + q->limits.copy_offload = 0; + + return ret; +} + +static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.max_copy_sectors, page); +} + +static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q, + char *page) +{ + return queue_var_show(q->limits.max_copy_range_sectors, page); +} + +static ssize_t 
queue_max_copy_nr_ranges_show(struct request_queue *q, + char *page) +{ + return queue_var_show(q->limits.max_copy_nr_ranges, page); +} + static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page) { @@ -611,6 +653,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload"); +QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors"); +QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors"); +QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges"); + QUEUE_RW_ENTRY(queue_nomerges, "nomerges"); QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity"); QUEUE_RW_ENTRY(queue_poll, "io_poll"); @@ -657,6 +704,10 @@ static struct attribute *queue_attrs[] = { &queue_discard_max_entry.attr, &queue_discard_max_hw_entry.attr, &queue_discard_zeroes_data_entry.attr, + &queue_copy_offload_entry.attr, + &queue_max_copy_sectors_entry.attr, + &queue_max_copy_range_sectors_entry.attr, + &queue_max_copy_nr_ranges_entry.attr, &queue_write_same_max_entry.attr, &queue_write_zeroes_max_entry.attr, &queue_zone_append_max_entry.attr, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 28a193225cf2..fd4cfaadda5b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -321,10 +321,14 @@ struct queue_limits { unsigned int discard_granularity; unsigned int discard_alignment; unsigned int zone_write_granularity; + unsigned int copy_offload; + unsigned int max_copy_sectors; unsigned short max_segments; unsigned short max_integrity_segments; unsigned short max_discard_segments; + unsigned short max_copy_range_sectors; + unsigned short max_copy_nr_ranges; unsigned char misaligned; unsigned char discard_misaligned; @@ -600,6 +604,7 @@ struct request_queue { #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ #define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx 
is active */ #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ +#define QUEUE_FLAG_SIMPLE_COPY 30 /* supports simple copy */ #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \ @@ -622,6 +627,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags) #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags) #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags) +#define blk_queue_copy(q) test_bit(QUEUE_FLAG_SIMPLE_COPY, &(q)->queue_flags) #define blk_queue_zone_resetall(q) \ test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags) #define blk_queue_secure_erase(q) \ -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support 2021-08-17 10:14 ` [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support SelvaKumar S @ 2021-08-17 13:08 ` Greg KH 2021-08-17 14:42 ` Nitesh Shetty 0 siblings, 1 reply; 30+ messages in thread From: Greg KH @ 2021-08-17 13:08 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 03:44:18PM +0530, SelvaKumar S wrote: > From: Nitesh Shetty <nj.shetty@samsung.com> > > Add device limits as sysfs entries, > - copy_offload (READ_WRITE) > - max_copy_sectors (READ_ONLY) > - max_copy_ranges_sectors (READ_ONLY) > - max_copy_nr_ranges (READ_ONLY) You forgot to add Documentation/ABI/ entries for your new sysfs files, so we can't properly review them :( thanks, greg k-h -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support 2021-08-17 13:08 ` Greg KH @ 2021-08-17 14:42 ` Nitesh Shetty 0 siblings, 0 replies; 30+ messages in thread From: Nitesh Shetty @ 2021-08-17 14:42 UTC (permalink / raw) To: Greg KH Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 6:38 PM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Tue, Aug 17, 2021 at 03:44:18PM +0530, SelvaKumar S wrote: > > From: Nitesh Shetty <nj.shetty@samsung.com> > > > > Add device limits as sysfs entries, > > - copy_offload (READ_WRITE) > > - max_copy_sectors (READ_ONLY) > > - max_copy_ranges_sectors (READ_ONLY) > > - max_copy_nr_ranges (READ_ONLY) > > You forgot to add Documentation/ABI/ entries for your new sysfs files, > so we can't properly review them :( > Sorry, we will update the documentation in the next version. Thanks, Nitesh Shetty -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* [dm-devel] [PATCH 3/7] block: copy offload support infrastructure [not found] ` <CGME20210817101758epcas5p1ec353b3838d64654e69488229256d9eb@epcas5p1.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 2021-08-17 17:14 ` Bart Van Assche ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence From: Nitesh Shetty <nj.shetty@samsung.com> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create a bio with the control information as payload and submit it to the device. Larger copy operations may be divided if necessary by looking at the device limits. REQ_OP_COPY (19) is a write op and takes the zone_write_lock when submitted to a zoned device. Native copy offload is not supported for stacked devices. 
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> --- block/blk-core.c | 84 ++++++++++++- block/blk-lib.c | 252 ++++++++++++++++++++++++++++++++++++++ block/blk-zoned.c | 1 + block/bounce.c | 1 + include/linux/bio.h | 1 + include/linux/blk_types.h | 20 +++ include/linux/blkdev.h | 13 ++ include/uapi/linux/fs.h | 12 ++ 8 files changed, 378 insertions(+), 6 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index d2722ecd4d9b..541b1561b4af 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -704,6 +704,17 @@ static noinline int should_fail_bio(struct bio *bio) } ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO); +static inline int bio_check_copy_eod(struct bio *bio, sector_t start, + sector_t nr_sectors, sector_t max_sect) +{ + if (nr_sectors && max_sect && + (nr_sectors > max_sect || start > max_sect - nr_sectors)) { + handle_bad_sector(bio, max_sect); + return -EIO; + } + return 0; +} + /* * Check whether this bio extends beyond the end of the device or partition. 
* This may well happen - the kernel calls bread() without checking the size of @@ -723,6 +734,61 @@ static inline int bio_check_eod(struct bio *bio) return 0; } +/* + * check for eod limits and remap ranges if needed + */ +static int blk_check_copy(struct bio *bio) +{ + struct blk_copy_payload *payload = bio_data(bio); + sector_t dst_max_sect, dst_start_sect, copy_size = 0; + sector_t src_max_sect, src_start_sect; + struct block_device *bd_part; + int i, ret = -EIO; + + rcu_read_lock(); + + bd_part = bio->bi_bdev; + if (unlikely(!bd_part)) + goto err; + + dst_max_sect = bdev_nr_sectors(bd_part); + dst_start_sect = bd_part->bd_start_sect; + + src_max_sect = bdev_nr_sectors(payload->src_bdev); + src_start_sect = payload->src_bdev->bd_start_sect; + + if (unlikely(should_fail_request(bd_part, bio->bi_iter.bi_size))) + goto err; + + if (unlikely(bio_check_ro(bio))) + goto err; + + rcu_read_unlock(); + + for (i = 0; i < payload->copy_nr_ranges; i++) { + ret = bio_check_copy_eod(bio, payload->range[i].src, + payload->range[i].len, src_max_sect); + if (unlikely(ret)) + goto out; + + payload->range[i].src += src_start_sect; + copy_size += payload->range[i].len; + } + + /* check if copy length crosses eod */ + ret = bio_check_copy_eod(bio, bio->bi_iter.bi_sector, + copy_size, dst_max_sect); + if (unlikely(ret)) + goto out; + + bio->bi_iter.bi_sector += dst_start_sect; + return 0; +err: + rcu_read_unlock(); +out: + return ret; +} + /* * Remap block n of partition p to block n+start(p) of the disk. 
*/ @@ -799,13 +865,15 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) if (should_fail_bio(bio)) goto end_io; - if (unlikely(bio_check_ro(bio))) - goto end_io; - if (!bio_flagged(bio, BIO_REMAPPED)) { - if (unlikely(bio_check_eod(bio))) - goto end_io; - if (bdev->bd_partno && unlikely(blk_partition_remap(bio))) + if (likely(!op_is_copy(bio->bi_opf))) { + if (unlikely(bio_check_ro(bio))) goto end_io; + if (!bio_flagged(bio, BIO_REMAPPED)) { + if (unlikely(bio_check_eod(bio))) + goto end_io; + if (bdev->bd_partno && unlikely(blk_partition_remap(bio))) + goto end_io; + } } /* @@ -829,6 +897,10 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) if (!blk_queue_discard(q)) goto not_supported; break; + case REQ_OP_COPY: + if (unlikely(blk_check_copy(bio))) + goto end_io; + break; case REQ_OP_SECURE_ERASE: if (!blk_queue_secure_erase(q)) goto not_supported; diff --git a/block/blk-lib.c b/block/blk-lib.c index 9f09beadcbe3..7fee0ae95c44 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -151,6 +151,258 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } EXPORT_SYMBOL(blkdev_issue_discard); +/* + * Wait on and process all in-flight BIOs. This must only be called once + * all bios have been issued so that the refcount can only decrease. + * This just waits for all bios to make it through cio_bio_end_io. IO + * errors are propagated through cio->io_error. + */ +static int cio_await_completion(struct cio *cio) +{ + int ret = 0; + + while (atomic_read(&cio->refcount)) { + cio->waiter = current; + __set_current_state(TASK_UNINTERRUPTIBLE); + blk_io_schedule(); + /* wake up sets us TASK_RUNNING */ + cio->waiter = NULL; + ret = cio->io_err; + } + kvfree(cio); + + return ret; +} + +/* + * The BIO completion handler simply decrements refcount. + * Also wake up process, if this is the last bio to be completed. + * + * During I/O bi_private points at the cio. 
+ */ +static void cio_bio_end_io(struct bio *bio) +{ + struct cio *cio = bio->bi_private; + + if (bio->bi_status) + cio->io_err = bio->bi_status; + kvfree(page_address(bio_first_bvec_all(bio)->bv_page) + + bio_first_bvec_all(bio)->bv_offset); + bio_put(bio); + + if (atomic_dec_and_test(&cio->refcount) && cio->waiter) + wake_up_process(cio->waiter); +} + +int blk_copy_offload_submit_bio(struct block_device *bdev, + struct blk_copy_payload *payload, int payload_size, + struct cio *cio, gfp_t gfp_mask) +{ + struct request_queue *q = bdev_get_queue(bdev); + struct bio *bio; + + bio = bio_map_kern(q, payload, payload_size, gfp_mask); + if (IS_ERR(bio)) + return PTR_ERR(bio); + + bio_set_dev(bio, bdev); + bio->bi_opf = REQ_OP_COPY | REQ_NOMERGE; + bio->bi_iter.bi_sector = payload->dest; + bio->bi_end_io = cio_bio_end_io; + bio->bi_private = cio; + atomic_inc(&cio->refcount); + submit_bio(bio); + + return 0; +} + +/* Go through all the entries inside the user-provided payload, and determine the + * maximum number of entries in a payload, based on device's scc-limits. + */ +static inline int blk_max_payload_entries(int nr_srcs, struct range_entry *rlist, + int max_nr_srcs, sector_t max_copy_range_sectors, sector_t max_copy_len) +{ + sector_t range_len, copy_len = 0, remaining = 0; + int ri = 0, pi = 1, max_pi = 0; + + for (ri = 0; ri < nr_srcs; ri++) { + for (remaining = rlist[ri].len; remaining > 0; remaining -= range_len) { + range_len = min3(remaining, max_copy_range_sectors, + max_copy_len - copy_len); + pi++; + copy_len += range_len; + + if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) { + max_pi = max(max_pi, pi); + pi = 1; + copy_len = 0; + } + } + } + + return max(max_pi, pi); +} + +/* + * blk_copy_offload_scc - Use device's native copy offload feature + * Go through the user-provided payload, prepare a new payload based on device's copy offload limits. 
+ */ +int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs, + struct range_entry *rlist, struct block_device *dest_bdev, + sector_t dest, gfp_t gfp_mask) +{ + struct request_queue *q = bdev_get_queue(dest_bdev); + struct cio *cio = NULL; + struct blk_copy_payload *payload; + sector_t range_len, copy_len = 0, remaining = 0; + sector_t src_blk, cdest = dest; + sector_t max_copy_range_sectors, max_copy_len; + int ri = 0, pi = 0, ret = 0, payload_size, max_pi, max_nr_srcs; + + cio = kzalloc(sizeof(struct cio), GFP_KERNEL); + if (!cio) + return -ENOMEM; + atomic_set(&cio->refcount, 0); + + max_nr_srcs = q->limits.max_copy_nr_ranges; + max_copy_range_sectors = q->limits.max_copy_range_sectors; + max_copy_len = q->limits.max_copy_sectors; + + max_pi = blk_max_payload_entries(nr_srcs, rlist, max_nr_srcs, + max_copy_range_sectors, max_copy_len); + payload_size = struct_size(payload, range, max_pi); + + payload = kvmalloc(payload_size, gfp_mask); + if (!payload) { + ret = -ENOMEM; + goto free_cio; + } + payload->src_bdev = src_bdev; + + for (ri = 0; ri < nr_srcs; ri++) { + for (remaining = rlist[ri].len, src_blk = rlist[ri].src; remaining > 0; + remaining -= range_len, src_blk += range_len) { + + range_len = min3(remaining, max_copy_range_sectors, + max_copy_len - copy_len); + payload->range[pi].len = range_len; + payload->range[pi].src = src_blk; + pi++; + copy_len += range_len; + + /* Submit current payload, if crossing device copy limits */ + if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) { + payload->dest = cdest; + payload->copy_nr_ranges = pi; + ret = blk_copy_offload_submit_bio(dest_bdev, payload, + payload_size, cio, gfp_mask); + if (ret) + goto free_payload; + + /* reset index, length and allocate new payload */ + pi = 0; + cdest += copy_len; + copy_len = 0; + payload = kvmalloc(payload_size, gfp_mask); + if (!payload) { + ret = -ENOMEM; + goto free_cio; + } + payload->src_bdev = src_bdev; + } + } + } + + if (pi) { + payload->dest = cdest; + 
payload->copy_nr_ranges = pi; + ret = blk_copy_offload_submit_bio(dest_bdev, payload, payload_size, cio, gfp_mask); + if (ret) + goto free_payload; + } + + /* Wait for completion of all IO's*/ + ret = cio_await_completion(cio); + + return ret; + +free_payload: + kvfree(payload); +free_cio: + cio_await_completion(cio); + return ret; +} + +static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs) +{ + int i; + sector_t len = 0; + + for (i = 0; i < nr_srcs; i++) { + if (rlist[i].len) + len += rlist[i].len; + else + return 0; + } + + return len; +} + +static inline bool blk_check_offload_scc(struct request_queue *src_q, + struct request_queue *dest_q) +{ + if (src_q == dest_q && src_q->limits.copy_offload == BLK_COPY_OFFLOAD_SCC) + return true; + + return false; +} + +/* + * blkdev_issue_copy - queue a copy + * @src_bdev: source block device + * @nr_srcs: number of source ranges to copy + * @src_rlist: array of source ranges + * @dest_bdev: destination block device + * @dest: destination in sector + * @gfp_mask: memory allocation flags (for bio_alloc) + * @flags: BLKDEV_COPY_* flags to control behaviour + * + * Description: + * Copy source ranges from source block device to destination block device. + * length of a source range cannot be zero. 
+ */ +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs, + struct range_entry *src_rlist, struct block_device *dest_bdev, + sector_t dest, gfp_t gfp_mask, int flags) +{ + struct request_queue *src_q = bdev_get_queue(src_bdev); + struct request_queue *dest_q = bdev_get_queue(dest_bdev); + sector_t copy_len; + int ret = -EINVAL; + + if (!src_q || !dest_q) + return -ENXIO; + + if (!nr_srcs) + return -EINVAL; + + if (nr_srcs >= MAX_COPY_NR_RANGE) + return -EINVAL; + + copy_len = blk_copy_len(src_rlist, nr_srcs); + if (!copy_len || copy_len >= MAX_COPY_TOTAL_LENGTH) + return -EINVAL; + + if (bdev_read_only(dest_bdev)) + return -EPERM; + + if (blk_check_offload_scc(src_q, dest_q)) + ret = blk_copy_offload_scc(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask); + + return ret; +} +EXPORT_SYMBOL(blkdev_issue_copy); + /** * __blkdev_issue_write_same - generate number of bios with same page * @bdev: target blockdev diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 86fce751bb17..7643fc868521 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -67,6 +67,7 @@ bool blk_req_needs_zone_write_lock(struct request *rq) case REQ_OP_WRITE_ZEROES: case REQ_OP_WRITE_SAME: case REQ_OP_WRITE: + case REQ_OP_COPY: return blk_rq_zone_is_seq(rq); default: return false; diff --git a/block/bounce.c b/block/bounce.c index 05fc7148489d..d9b05aaf6e56 100644 --- a/block/bounce.c +++ b/block/bounce.c @@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src) bio->bi_iter.bi_size = bio_src->bi_iter.bi_size; switch (bio_op(bio)) { + case REQ_OP_COPY: case REQ_OP_DISCARD: case REQ_OP_SECURE_ERASE: case REQ_OP_WRITE_ZEROES: diff --git a/include/linux/bio.h b/include/linux/bio.h index 3d67d0fbc868..068fa2e8896a 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -73,6 +73,7 @@ static inline bool bio_has_data(struct bio *bio) static inline bool bio_no_advance_iter(const struct bio *bio) { return bio_op(bio) == REQ_OP_DISCARD || + bio_op(bio) == 
REQ_OP_COPY || bio_op(bio) == REQ_OP_SECURE_ERASE || bio_op(bio) == REQ_OP_WRITE_SAME || bio_op(bio) == REQ_OP_WRITE_ZEROES; diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 9e392daa1d7f..1ab77176cb46 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -347,6 +347,8 @@ enum req_opf { REQ_OP_ZONE_RESET = 15, /* reset all the zone present on the device */ REQ_OP_ZONE_RESET_ALL = 17, + /* copy ranges within device */ + REQ_OP_COPY = 19, /* Driver private requests */ REQ_OP_DRV_IN = 34, @@ -470,6 +472,11 @@ static inline bool op_is_discard(unsigned int op) return (op & REQ_OP_MASK) == REQ_OP_DISCARD; } +static inline bool op_is_copy(unsigned int op) +{ + return (op & REQ_OP_MASK) == REQ_OP_COPY; +} + /* * Check if a bio or request operation is a zone management operation, with * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case @@ -529,4 +536,17 @@ struct blk_rq_stat { u64 batch; }; +struct cio { + atomic_t refcount; + blk_status_t io_err; + struct task_struct *waiter; /* waiting task (NULL if none) */ +}; + +struct blk_copy_payload { + struct block_device *src_bdev; + sector_t dest; + int copy_nr_ranges; + struct range_entry range[]; +}; + #endif /* __LINUX_BLK_TYPES_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index fd4cfaadda5b..38369dff6a36 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -52,6 +52,12 @@ struct blk_keyslot_manager; /* Doing classic polling */ #define BLK_MQ_POLL_CLASSIC -1 +/* Define copy offload options */ +enum blk_copy { + BLK_COPY_OFFLOAD_EMULATE = 0, + BLK_COPY_OFFLOAD_SCC, +}; + /* * Maximum number of blkcg policies allowed to be registered concurrently. * Defined here to simplify include dependency. 
@@ -1051,6 +1057,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, return min(q->limits.max_discard_sectors, UINT_MAX >> SECTOR_SHIFT); + if (unlikely(op == REQ_OP_COPY)) + return q->limits.max_copy_sectors; + if (unlikely(op == REQ_OP_WRITE_SAME)) return q->limits.max_write_same_sectors; @@ -1326,6 +1335,10 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, int flags, struct bio **biop); +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs, + struct range_entry *src_rlist, struct block_device *dest_bdev, + sector_t dest, gfp_t gfp_mask, int flags); + #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */ #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */ diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index bdf7b404b3e7..7a97b588d892 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -64,6 +64,18 @@ struct fstrim_range { __u64 minlen; }; +/* Maximum no of entries supported */ +#define MAX_COPY_NR_RANGE (1 << 12) + +/* maximum total copy length */ +#define MAX_COPY_TOTAL_LENGTH (1 << 21) + +/* Source range entry for copy */ +struct range_entry { + __u64 src; + __u64 len; +}; + /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */ #define FILE_DEDUPE_RANGE_SAME 0 #define FILE_DEDUPE_RANGE_DIFFERS 1 -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
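As an aside, the argument validation at the top of blkdev_issue_copy lends itself to a quick user-space model. The following is an illustrative Python sketch, not kernel code; note that the total-length test only makes sense as an OR (reject both an empty copy and one at or beyond MAX_COPY_TOTAL_LENGTH), which is what the sketch implements:

```python
# Model of the sanity checks blkdev_issue_copy is meant to perform.
# Limits are the MAX_COPY_NR_RANGE / MAX_COPY_TOTAL_LENGTH values added
# to include/uapi/linux/fs.h by this patch; the errno-style strings are
# a stand-in for the kernel's negative return codes.

MAX_COPY_NR_RANGE = 1 << 12       # max number of source range entries
MAX_COPY_TOTAL_LENGTH = 1 << 21   # max total copy length, in sectors

def validate_copy(ranges):
    """ranges: list of (src_sector, len_sectors) tuples."""
    if not ranges:
        return "EINVAL"                      # no sources at all
    if len(ranges) >= MAX_COPY_NR_RANGE:
        return "EINVAL"                      # too many range entries
    copy_len = sum(length for _, length in ranges)
    if copy_len == 0 or copy_len >= MAX_COPY_TOTAL_LENGTH:
        return "EINVAL"                      # empty or oversized copy
    return "OK"

print(validate_copy([(0, 8), (16, 8)]))   # OK
print(validate_copy([]))                  # EINVAL
print(validate_copy([(0, 1 << 21)]))      # EINVAL
```

A payload that passes these checks is still subject to the per-device queue limits (max_copy_sectors, max_copy_nr_ranges), which are applied later when the payload is split.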
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 10:14 ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S @ 2021-08-17 17:14 ` Bart Van Assche 2021-08-17 20:41 ` Mikulas Patocka 2021-08-20 10:39 ` Kanchan Joshi 2021-08-17 20:35 ` kernel test robot 2021-08-18 18:35 ` Martin K. Petersen 2 siblings, 2 replies; 30+ messages in thread From: Bart Van Assche @ 2021-08-17 17:14 UTC (permalink / raw) To: SelvaKumar S, linux-nvme, linux-block Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, dm-devel, hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, Martin K. Petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On 8/17/21 3:14 AM, SelvaKumar S wrote: > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create > bio with control information as payload and submit to the device. > Larger copy operation may be divided if necessary by looking at device > limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when > submitted to zoned device. > Native copy offload is not supported for stacked devices. Using a single operation for copy-offloading instead of separate operations for reading and writing is fundamentally incompatible with the device mapper. I think we need a copy-offloading implementation that is compatible with the device mapper. Storing the parameters of the copy operation in the bio payload is incompatible with the current implementation of bio_split(). In other words, I think there are fundamental problems with this patch series. Bart. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 17:14 ` Bart Van Assche @ 2021-08-17 20:41 ` Mikulas Patocka 2021-08-17 21:53 ` Douglas Gilbert 2021-08-20 10:39 ` Kanchan Joshi 1 sibling, 1 reply; 30+ messages in thread From: Mikulas Patocka @ 2021-08-17 20:41 UTC (permalink / raw) To: Bart Van Assche Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel, hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, Martin K. Petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, 17 Aug 2021, Bart Van Assche wrote: > On 8/17/21 3:14 AM, SelvaKumar S wrote: > > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create > > bio with control information as payload and submit to the device. > > Larger copy operation may be divided if necessary by looking at device > > limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when > > submitted to zoned device. > > Native copy offload is not supported for stacked devices. > > Using a single operation for copy-offloading instead of separate operations > for reading and writing is fundamentally incompatible with the device mapper. > I think we need a copy-offloading implementation that is compatible with the > device mapper. I once wrote a copy offload implementation that is compatible with device mapper. The copy operation creates two bios (one for reading and one for writing), passes them independently through device mapper and pairs them at the physical device driver. It's here: http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current I verified that it works with iSCSI. Would you be interested in continuing this work? Mikulas > Storing the parameters of the copy operation in the bio payload is > incompatible with the current implementation of bio_split(). 
> > In other words, I think there are fundamental problems with this patch series. > > Bart. > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
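Mikulas's two-bio design can be sketched as a toy model: the copy becomes a read-side and a write-side "bio", each one is remapped independently by a stacking layer (as dm-linear would do), and the physical driver pairs them back up before issuing the actual copy. This is purely illustrative Python, not kernel code; all names and the token-based pairing are invented for the sketch:

```python
# Toy model of copy offload via paired read/write bios, as described in
# the linked xcopy patches: each half traverses the stack on its own and
# the bottom driver matches the halves by a shared token.

from dataclasses import dataclass

@dataclass
class Bio:
    op: str          # "copy_read" or "copy_write"
    sector: int      # starting sector
    nr_sectors: int
    token: int       # pairs a read side with its write side

def make_copy_bios(src, dest, nr_sectors, token):
    return [Bio("copy_read", src, nr_sectors, token),
            Bio("copy_write", dest, nr_sectors, token)]

def dm_linear_remap(bio, offset):
    """A dm-linear style table entry: shift the bio by a fixed offset."""
    return Bio(bio.op, bio.sector + offset, bio.nr_sectors, bio.token)

def pair_at_driver(bios):
    """Physical driver: match halves by token into (src, dest, len) copies."""
    reads, cmds = {}, []
    for bio in bios:
        if bio.op == "copy_read":
            reads[bio.token] = bio
        else:
            rd = reads.pop(bio.token)
            cmds.append((rd.sector, bio.sector, bio.nr_sectors))
    return cmds

bios = make_copy_bios(src=0, dest=1024, nr_sectors=8, token=42)
remapped = [dm_linear_remap(b, offset=2048) for b in bios]  # one stacked layer
commands = pair_at_driver(remapped)
print(commands)   # [(2048, 3072, 8)]
```

The point of the design is visible in the sketch: because each half is remapped on its own, any stacking layer that can remap a read and a write can also participate in a copy, without knowing about the copy operation at all.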
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 20:41 ` Mikulas Patocka @ 2021-08-17 21:53 ` Douglas Gilbert 2021-08-17 22:06 ` Bart Van Assche 0 siblings, 1 reply; 30+ messages in thread From: Douglas Gilbert @ 2021-08-17 21:53 UTC (permalink / raw) To: Mikulas Patocka, Bart Van Assche Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel, hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, Martin K. Petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On 2021-08-17 4:41 p.m., Mikulas Patocka wrote: > > > On Tue, 17 Aug 2021, Bart Van Assche wrote: > >> On 8/17/21 3:14 AM, SelvaKumar S wrote: >>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create >>> bio with control information as payload and submit to the device. >>> Larger copy operation may be divided if necessary by looking at device >>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when >>> submitted to zoned device. >>> Native copy offload is not supported for stacked devices. >> >> Using a single operation for copy-offloading instead of separate operations >> for reading and writing is fundamentally incompatible with the device mapper. >> I think we need a copy-offloading implementation that is compatible with the >> device mapper. > > I once wrote a copy offload implementation that is compatible with device > mapper. The copy operation creates two bios (one for reading and one for > writing), passes them independently through device mapper and pairs them > at the physical device driver. > > It's here: http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current In my copy solution the read-side and write-side bio pairs share the same storage (i.e. ram) This gets around the need to copy data between the bio_s. See: https://sg.danny.cz/sg/sg_v40.html in Section 8 on Request sharing. 
This technique can be efficiently extended to
source --> destination1,destination2,... copies.

Doug Gilbert

> I verified that it works with iSCSI. Would you be interested in continuing
> this work?
>
> Mikulas
>
>> Storing the parameters of the copy operation in the bio payload is
>> incompatible with the current implementation of bio_split().
>>
>> In other words, I think there are fundamental problems with this patch series.
>>
>> Bart.
>>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 21:53 ` Douglas Gilbert @ 2021-08-17 22:06 ` Bart Van Assche 0 siblings, 0 replies; 30+ messages in thread From: Bart Van Assche @ 2021-08-17 22:06 UTC (permalink / raw) To: dgilbert, Mikulas Patocka Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel, hch, agk, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, Martin K. Petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On 8/17/21 2:53 PM, Douglas Gilbert wrote: > On 2021-08-17 4:41 p.m., Mikulas Patocka wrote: >> On Tue, 17 Aug 2021, Bart Van Assche wrote: >>> On 8/17/21 3:14 AM, SelvaKumar S wrote: >>>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create >>>> bio with control information as payload and submit to the device. >>>> Larger copy operation may be divided if necessary by looking at device >>>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when >>>> submitted to zoned device. >>>> Native copy offload is not supported for stacked devices. >>> >>> Using a single operation for copy-offloading instead of separate >>> operations >>> for reading and writing is fundamentally incompatible with the device >>> mapper. >>> I think we need a copy-offloading implementation that is compatible >>> with the >>> device mapper. >> >> I once wrote a copy offload implementation that is compatible with device >> mapper. The copy operation creates two bios (one for reading and one for >> writing), passes them independently through device mapper and pairs them >> at the physical device driver. >> >> It's here: >> http://people.redhat.com/~mpatocka/patches/kernel/xcopy/current > > In my copy solution the read-side and write-side bio pairs share the > same storage (i.e. ram) This gets around the need to copy data between > the bio_s. 
> See: > https://sg.danny.cz/sg/sg_v40.html > in Section 8 on Request sharing. This technique can be efficiently > extend to > source --> destination1,destination2,... copies. > > Doug Gilbert > >> I verified that it works with iSCSI. Would you be interested in >> continuing >> this work? Hi Mikulas and Doug, Yes, I'm interested in continuing Mikulas' work on copy offloading. I will take a look at Doug's approach too for sharing buffers between read-side and write-side bios. It may take a few months however before I can find the time to work on this. Thanks, Bart. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
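The shared-storage idea above — the read side and the write side handing the same buffer back and forth — is also essentially what the block layer's copy emulation path does. A minimal user-space sketch, assuming an ordinary file standing in for a block device and a 512-byte sector size:

```python
# Copy emulation with a shared buffer: each source range is read into a
# buffer and the very same buffer is handed to the write side, so no
# data is copied between the two halves. Illustrative only; a real
# implementation operates on bios, not file descriptors.

import os, tempfile

SECTOR = 512

def emulate_copy(fd, ranges, dest):
    """ranges: list of (src_sector, len_sectors); dest: destination sector."""
    for src, length in ranges:
        buf = os.pread(fd, length * SECTOR, src * SECTOR)  # read side fills buf
        os.pwrite(fd, buf, dest * SECTOR)                  # write side reuses buf
        dest += length

fd, path = tempfile.mkstemp()
os.pwrite(fd, bytes(128 * SECTOR), 0)            # zero-filled stand-in "device"
os.pwrite(fd, b"A" * (8 * SECTOR), 0)            # source range [0, 8)
os.pwrite(fd, b"B" * (8 * SECTOR), 16 * SECTOR)  # source range [16, 24)

emulate_copy(fd, [(0, 8), (16, 8)], dest=64)

copied = os.pread(fd, 16 * SECTOR, 64 * SECTOR)
os.close(fd)
os.unlink(path)
print(copied == b"A" * (8 * SECTOR) + b"B" * (8 * SECTOR))   # True
```

The two discontiguous source ranges land back-to-back at the destination, which mirrors how the patchset's emulation concatenates source ranges at a single destination offset.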
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 17:14 ` Bart Van Assche 2021-08-17 20:41 ` Mikulas Patocka @ 2021-08-20 10:39 ` Kanchan Joshi 2021-08-20 21:18 ` Bart Van Assche 1 sibling, 1 reply; 30+ messages in thread From: Kanchan Joshi @ 2021-08-20 10:39 UTC (permalink / raw) To: Bart Van Assche Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel, Christoph Hellwig, agk, linux-scsi, nitheshshetty, Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block, mpatocka, Keith Busch, Jens Axboe, Damien Le Moal, Martin K. Petersen, KANCHAN JOSHI, linux-api, Johannes Thumshirn, linux-fsdevel, Javier Gonzalez, Pavel Begunkov Bart, Mikulas On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote: > > On 8/17/21 3:14 AM, SelvaKumar S wrote: > > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create > > bio with control information as payload and submit to the device. > > Larger copy operation may be divided if necessary by looking at device > > limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when > > submitted to zoned device. > > Native copy offload is not supported for stacked devices. > > Using a single operation for copy-offloading instead of separate > operations for reading and writing is fundamentally incompatible with > the device mapper. I think we need a copy-offloading implementation that > is compatible with the device mapper. > While each read/write command is for a single contiguous range of device, with simple-copy we get to operate on multiple discontiguous ranges, with a single command. That seemed like a good opportunity to reduce control-plane traffic (compared to read/write operations) as well. With a separate read-and-write bio approach, each source-range will spawn at least one read, one write and eventually one SCC command. 
And it only gets worse as there could be many such discontiguous ranges
(for the GC use-case at least) coming from user-space in a single payload.

Overall sequence will be:
- Receive a payload from user-space
- Disassemble into many read-write pair bios at the block layer
- Assemble those (somehow) in NVMe to reduce simple-copy commands
- Send commands to device

We thought a payload could be a good way to reduce the
disassembly/assembly work and the traffic between the block layer and
NVMe. How do you see this tradeoff? What seems necessary for the
device-mapper use case appears to be a cost when the device mapper isn't
used. Especially for SCC (since copy is within a single ns),
device-mappers may not be too compelling anyway.

Must device-mapper support be a requirement for the initial support atop
SCC? Or do you think it would still be progress if we finalize the
user-space interface to cover all that is foreseeable? And for a
device-mapper compatible transport between the block layer and NVMe - we
do it in a later stage when NVMe too comes up with better copy
capabilities?

--
Joshi

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread
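The control-plane tradeoff being argued here can be put into rough numbers. The following is a back-of-the-envelope Python model, not a measurement: N discontiguous ranges cost about one read bio plus one write bio each in the paired approach, while a payload-based REQ_OP_COPY batches up to max_copy_nr_ranges ranges per command:

```python
# Illustrative count of block-layer operations for N source ranges:
# read/write pairs vs. a single multi-range copy payload. The limit
# value used below is an example, not a device-mandated number.

import math

def ops_read_write_pairs(nr_ranges):
    # one read bio + one write bio per discontiguous range
    return 2 * nr_ranges

def ops_payload(nr_ranges, max_nr_ranges):
    # one REQ_OP_COPY per batch of up to max_nr_ranges entries
    return math.ceil(nr_ranges / max_nr_ranges)

nr = 128                                   # e.g. a GC pass with 128 ranges
print(ops_read_write_pairs(nr))            # 256 bios
print(ops_payload(nr, max_nr_ranges=64))   # 2 copy commands
```

The counter-argument in the thread is that the paired bios need not stay this expensive: they can be merged back into multi-range commands at the bottom of the stack, paying the assembly cost only where stacking actually happened.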
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-20 10:39 ` Kanchan Joshi @ 2021-08-20 21:18 ` Bart Van Assche 2021-08-26 7:46 ` Nitesh Shetty 0 siblings, 1 reply; 30+ messages in thread From: Bart Van Assche @ 2021-08-20 21:18 UTC (permalink / raw) To: Kanchan Joshi Cc: Mike Snitzer, Greg Kroah-Hartman, djwong, linux-nvme, dm-devel, Christoph Hellwig, agk, linux-scsi, nitheshshetty, Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block, mpatocka, Keith Busch, Jens Axboe, Damien Le Moal, Martin K. Petersen, KANCHAN JOSHI, linux-api, Johannes Thumshirn, linux-fsdevel, Javier Gonzalez, Pavel Begunkov On 8/20/21 3:39 AM, Kanchan Joshi wrote: > Bart, Mikulas > > On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote: >> >> On 8/17/21 3:14 AM, SelvaKumar S wrote: >>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create >>> bio with control information as payload and submit to the device. >>> Larger copy operation may be divided if necessary by looking at device >>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when >>> submitted to zoned device. >>> Native copy offload is not supported for stacked devices. >> >> Using a single operation for copy-offloading instead of separate >> operations for reading and writing is fundamentally incompatible with >> the device mapper. I think we need a copy-offloading implementation that >> is compatible with the device mapper. >> > > While each read/write command is for a single contiguous range of > device, with simple-copy we get to operate on multiple discontiguous > ranges, with a single command. > That seemed like a good opportunity to reduce control-plane traffic > (compared to read/write operations) as well. > > With a separate read-and-write bio approach, each source-range will > spawn at least one read, one write and eventually one SCC command. 
And
> it only gets worse as there could be many such discontiguous ranges (for
> GC use-case at least) coming from user-space in a single payload.
> Overall sequence will be
> - Receive a payload from user-space
> - Disassemble into many read-write pair bios at block-layer
> - Assemble those (somehow) in NVMe to reduce simple-copy commands
> - Send commands to device
>
> We thought payload could be a good way to reduce the
> disassembly/assembly work and traffic between block-layer to nvme.
> How do you see this tradeoff? What seems necessary for device-mapper
> usecase, appears to be a cost when device-mapper isn't used.
> Especially for SCC (since copy is within single ns), device-mappers
> may not be too compelling anyway.
>
> Must device-mapper support be a requirement for the initial support atop SCC?
> Or do you think it will still be a progress if we finalize the
> user-space interface to cover all that is foreseeable? And for
> device-mapper compatible transport between block-layer and NVMe - we
> do it in the later stage when NVMe too comes up with better copy
> capabilities?

Hi Kanchan,

These days there might be more systems that run the device mapper on top
of the NVMe driver or a SCSI driver than systems that do not use the
device mapper. It is common practice these days to use dm-crypt on
personal workstations and laptops. LVM (dm-linear) is popular because it
is more flexible than a traditional partition table. Android phones use
dm-verity on top of hardware encryption. In other words, not supporting
the device mapper means that a very large number of use cases is
excluded. So I think supporting the device mapper from the start is
important, even if that means combining individual bios at the bottom of
the storage stack into simple copy commands.

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-20 21:18 ` Bart Van Assche @ 2021-08-26 7:46 ` Nitesh Shetty 0 siblings, 0 replies; 30+ messages in thread From: Nitesh Shetty @ 2021-08-26 7:46 UTC (permalink / raw) To: Bart Van Assche Cc: Mike Snitzer, Greg Kroah-Hartman, Darrick J. Wong, linux-nvme, dm-devel, Christoph Hellwig, agk, linux-scsi, Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block, mpatocka, Javier Gonzalez, Keith Busch, Jens Axboe, Damien Le Moal, Martin K. Petersen, KANCHAN JOSHI, linux-api, Johannes Thumshirn, linux-fsdevel, Kanchan Joshi, Pavel Begunkov Hi Bart,Mikulas,Martin,Douglas, We will go through your previous work and use this thread as a medium for further discussion, if we come across issues to be sorted out. Thank you, Nitesh Shetty On Sat, Aug 21, 2021 at 2:48 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 8/20/21 3:39 AM, Kanchan Joshi wrote: > > Bart, Mikulas > > > > On Tue, Aug 17, 2021 at 10:44 PM Bart Van Assche <bvanassche@acm.org> wrote: > >> > >> On 8/17/21 3:14 AM, SelvaKumar S wrote: > >>> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create > >>> bio with control information as payload and submit to the device. > >>> Larger copy operation may be divided if necessary by looking at device > >>> limits. REQ_OP_COPY(19) is a write op and takes zone_write_lock when > >>> submitted to zoned device. > >>> Native copy offload is not supported for stacked devices. > >> > >> Using a single operation for copy-offloading instead of separate > >> operations for reading and writing is fundamentally incompatible with > >> the device mapper. I think we need a copy-offloading implementation that > >> is compatible with the device mapper. > >> > > > > While each read/write command is for a single contiguous range of > > device, with simple-copy we get to operate on multiple discontiguous > > ranges, with a single command. 
> > That seemed like a good opportunity to reduce control-plane traffic > > (compared to read/write operations) as well. > > > > With a separate read-and-write bio approach, each source-range will > > spawn at least one read, one write and eventually one SCC command. And > > it only gets worse as there could be many such discontiguous ranges (for > > GC use-case at least) coming from user-space in a single payload. > > Overall sequence will be > > - Receive a payload from user-space > > - Disassemble into many read-write pair bios at block-layer > > - Assemble those (somehow) in NVMe to reduce simple-copy commands > > - Send commands to device > > > > We thought payload could be a good way to reduce the > > disassembly/assembly work and traffic between block-layer to nvme. > > How do you see this tradeoff? What seems necessary for device-mapper > > usecase, appears to be a cost when device-mapper isn't used. > > Especially for SCC (since copy is within single ns), device-mappers > > may not be too compelling anyway. > > > > Must device-mapper support be a requirement for the initial support atop SCC? > > Or do you think it will still be a progress if we finalize the > > user-space interface to cover all that is foreseeable.And for > > device-mapper compatible transport between block-layer and NVMe - we > > do it in the later stage when NVMe too comes up with better copy > > capabilities? > > Hi Kanchan, > > These days there might be more systems that run the device mapper on top > of the NVMe driver or a SCSI driver than systems that do use the device > mapper. It is common practice these days to use dm-crypt on personal > workstations and laptops. LVM (dm-linear) is popular because it is more > flexible than a traditional partition table. Android phones use > dm-verity on top of hardware encryption. In other words, not supporting > the device mapper means that a very large number of use cases is > excluded. 
So I think supporting the device mapper from the start is > important, even if that means combining individual bios at the bottom of > the storage stack into simple copy commands. > > Thanks, > > Bart. > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 10:14 ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S 2021-08-17 17:14 ` Bart Van Assche @ 2021-08-17 20:35 ` kernel test robot 2021-08-18 18:35 ` Martin K. Petersen 2 siblings, 0 replies; 30+ messages in thread From: kernel test robot @ 2021-08-17 20:35 UTC (permalink / raw) To: SelvaKumar S, linux-nvme, linux-block Cc: axboe, damien.lemoal, kbuild-all, linux-scsi, linux-api, kbusch, clang-built-linux, dm-devel, linux-fsdevel, asml.silence [-- Attachment #1: Type: text/plain, Size: 7342 bytes --] Hi SelvaKumar, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on block/for-next] [also build test WARNING on dm/for-next linus/master v5.14-rc6 next-20210817] [cannot apply to linux-nvme/for-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next config: hexagon-randconfig-r013-20210816 (attached as .config) compiler: clang version 12.0.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/35fc502a7f20a7cd42432cee2777a621c40a3bd3 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111 git checkout 35fc502a7f20a7cd42432cee2777a621c40a3bd3 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot 
<lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:197:5: warning: no previous prototype for function 'blk_copy_offload_submit_bio' [-Wmissing-prototypes]
   int blk_copy_offload_submit_bio(struct block_device *bdev,
       ^
   block/blk-lib.c:197:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_submit_bio(struct block_device *bdev,
   ^
   static
>> block/blk-lib.c:250:5: warning: no previous prototype for function 'blk_copy_offload_scc' [-Wmissing-prototypes]
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:250:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
   ^
   static
   2 warnings generated.


vim +/blk_copy_offload_submit_bio +197 block/blk-lib.c

   196	
 > 197	int blk_copy_offload_submit_bio(struct block_device *bdev,
   198			struct blk_copy_payload *payload, int payload_size,
   199			struct cio *cio, gfp_t gfp_mask)
   200	{
   201		struct request_queue *q = bdev_get_queue(bdev);
   202		struct bio *bio;
   203	
   204		bio = bio_map_kern(q, payload, payload_size, gfp_mask);
   205		if (IS_ERR(bio))
   206			return PTR_ERR(bio);
   207	
   208		bio_set_dev(bio, bdev);
   209		bio->bi_opf = REQ_OP_COPY | REQ_NOMERGE;
   210		bio->bi_iter.bi_sector = payload->dest;
   211		bio->bi_end_io = cio_bio_end_io;
   212		bio->bi_private = cio;
   213		atomic_inc(&cio->refcount);
   214		submit_bio(bio);
   215	
   216		return 0;
   217	}
   218	
   219	/* Go through all the entries inside user provided payload, and determine the
   220	 * maximum number of entries in a payload, based on device's scc-limits.
   221	 */
   222	static inline int blk_max_payload_entries(int nr_srcs, struct range_entry *rlist,
   223			int max_nr_srcs, sector_t max_copy_range_sectors, sector_t max_copy_len)
   224	{
   225		sector_t range_len, copy_len = 0, remaining = 0;
   226		int ri = 0, pi = 1, max_pi = 0;
   227	
   228		for (ri = 0; ri < nr_srcs; ri++) {
   229			for (remaining = rlist[ri].len; remaining > 0; remaining -= range_len) {
   230				range_len = min3(remaining, max_copy_range_sectors,
   231						max_copy_len - copy_len);
   232				pi++;
   233				copy_len += range_len;
   234	
   235				if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
   236					max_pi = max(max_pi, pi);
   237					pi = 1;
   238					copy_len = 0;
   239				}
   240			}
   241		}
   242	
   243		return max(max_pi, pi);
   244	}
   245	
   246	/*
   247	 * blk_copy_offload_scc - Use device's native copy offload feature
   248	 * Go through user provided payload, prepare new payload based on device's copy offload limits.
   249	 */
 > 250	int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs,
   251			struct range_entry *rlist, struct block_device *dest_bdev,
   252			sector_t dest, gfp_t gfp_mask)
   253	{
   254		struct request_queue *q = bdev_get_queue(dest_bdev);
   255		struct cio *cio = NULL;
   256		struct blk_copy_payload *payload;
   257		sector_t range_len, copy_len = 0, remaining = 0;
   258		sector_t src_blk, cdest = dest;
   259		sector_t max_copy_range_sectors, max_copy_len;
   260		int ri = 0, pi = 0, ret = 0, payload_size, max_pi, max_nr_srcs;
   261	
   262		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   263		if (!cio)
   264			return -ENOMEM;
   265		atomic_set(&cio->refcount, 0);
   266	
   267		max_nr_srcs = q->limits.max_copy_nr_ranges;
   268		max_copy_range_sectors = q->limits.max_copy_range_sectors;
   269		max_copy_len = q->limits.max_copy_sectors;
   270	
   271		max_pi = blk_max_payload_entries(nr_srcs, rlist, max_nr_srcs,
   272				max_copy_range_sectors, max_copy_len);
   273		payload_size = struct_size(payload, range, max_pi);
   274	
   275		payload = kvmalloc(payload_size, gfp_mask);
   276		if (!payload) {
   277			ret = -ENOMEM;
   278			goto free_cio;
   279		}
   280		payload->src_bdev = src_bdev;
   281	
   282		for (ri = 0; ri < nr_srcs; ri++) {
   283			for (remaining = rlist[ri].len, src_blk = rlist[ri].src; remaining > 0;
   284						remaining -= range_len, src_blk += range_len) {
   285	
   286				range_len = min3(remaining, max_copy_range_sectors,
   287						max_copy_len - copy_len);
   288				payload->range[pi].len = range_len;
   289				payload->range[pi].src = src_blk;
   290				pi++;
   291				copy_len += range_len;
   292	
   293				/* Submit current payload, if crossing device copy limits */
   294				if ((pi == max_nr_srcs) || (copy_len == max_copy_len)) {
   295					payload->dest = cdest;
   296					payload->copy_nr_ranges = pi;
   297					ret = blk_copy_offload_submit_bio(dest_bdev, payload,
   298							payload_size, cio, gfp_mask);
   299					if (ret)
   300						goto free_payload;
   301	
   302					/* reset index, length and allocate new payload */
   303					pi = 0;
   304					cdest += copy_len;
   305					copy_len = 0;
   306					payload = kvmalloc(payload_size, gfp_mask);
   307					if (!payload) {
   308						ret = -ENOMEM;
   309						goto free_cio;
   310					}
   311					payload->src_bdev = src_bdev;
   312				}
   313			}
   314		}
   315	
   316		if (pi) {
   317			payload->dest = cdest;
   318			payload->copy_nr_ranges = pi;
   319			ret = blk_copy_offload_submit_bio(dest_bdev, payload, payload_size, cio, gfp_mask);
   320			if (ret)
   321				goto free_payload;
   322		}
   323	
   324		/* Wait for completion of all IO's */
   325		ret = cio_await_completion(cio);
   326	
   327		return ret;
   328	
   329	free_payload:
   330		kvfree(payload);
   331	free_cio:
   332		cio_await_completion(cio);
   333		return ret;
   334	}

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22159 bytes --]

[-- Attachment #3: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread
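The splitting logic in the excerpt above (blk_max_payload_entries / blk_copy_offload_scc) can be transcribed into a few lines of Python to show what it computes: walk the user ranges, clamp each piece to the per-range and total-copy limits, and start a new payload whenever either limit is hit. This is a simplified model of the kernel loop, not a line-for-line port:

```python
# Split user-supplied (src, len) ranges into payloads, each respecting
# the device's max_copy_nr_ranges, max_copy_range_sectors and
# max_copy_sectors queue limits. Lengths are in sectors.

def split_payloads(ranges, max_nr_srcs, max_range_sectors, max_copy_sectors):
    payloads, current, copy_len = [], [], 0
    for src, length in ranges:
        remaining = length
        while remaining > 0:
            # clamp to per-range limit and to what is left of the total limit
            piece = min(remaining, max_range_sectors,
                        max_copy_sectors - copy_len)
            current.append((src, piece))
            src += piece
            remaining -= piece
            copy_len += piece
            # submit current payload if crossing either device limit
            if len(current) == max_nr_srcs or copy_len == max_copy_sectors:
                payloads.append(current)
                current, copy_len = [], 0
    if current:
        payloads.append(current)
    return payloads

# 3 ranges of 8 sectors each, against example limits of 2 entries per
# payload, 8 sectors per range entry and 16 sectors per copy command:
out = split_payloads([(0, 8), (16, 8), (32, 8)], 2, 8, 16)
print(out)   # [[(0, 8), (16, 8)], [(32, 8)]]
```

One quirk the sketch makes easy to see: the kernel version pre-sizes every payload allocation to the maximum entry count found across all payloads (max_pi), so all payloads share one payload_size even when the last one is short.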
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-17 10:14 ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S 2021-08-17 17:14 ` Bart Van Assche 2021-08-17 20:35 ` kernel test robot @ 2021-08-18 18:35 ` Martin K. Petersen 2021-08-20 11:11 ` Kanchan Joshi 2 siblings, 1 reply; 30+ messages in thread From: Martin K. Petersen @ 2021-08-18 18:35 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence > Native copy offload is not supported for stacked devices. One of the main reasons that the historic attempts at supporting copy offload did not get merged was that the ubiquitous deployment scenario, stacked block devices, was not handled well. Pitfalls surrounding stacking has been brought up several times in response to your series. It is critically important that both kernel plumbing and user-facing interfaces are defined in a way that works for the most common use cases. This includes copying between block devices and handling block device stacking. Stacking being one of the most fundamental operating principles of the Linux block layer! Proposing a brand new interface that out of the gate is incompatible with both stacking and the copy offload capability widely implemented in shipping hardware makes little sense. While NVMe currently only supports copy operations inside a single namespace, it is surely only a matter of time before that restriction is lifted. Changing existing interfaces is painful, especially when these are exposed to userland. We obviously can't predict every field or feature that may be needed in the future. But we should at the very least build the infrastructure around what already exists. 
And that's where the proposed design falls short... -- Martin K. Petersen Oracle Linux Engineering -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 3/7] block: copy offload support infrastructure 2021-08-18 18:35 ` Martin K. Petersen @ 2021-08-20 11:11 ` Kanchan Joshi 0 siblings, 0 replies; 30+ messages in thread From: Kanchan Joshi @ 2021-08-20 11:11 UTC (permalink / raw) To: Martin K. Petersen Cc: snitzer, djwong, linux-nvme, dm-devel, Christoph Hellwig, agk, bvanassche, linux-scsi, nitheshshetty, Matthew Wilcox, Nitesh Shetty, kch, SelvaKumar S, Selva Jove, linux-block, mpatocka, Keith Busch, Jens Axboe, Damien Le Moal, KANCHAN JOSHI, linux-api, Johannes Thumshirn, linux-fsdevel, Javier Gonzalez, Pavel Begunkov On Thu, Aug 19, 2021 at 12:05 AM Martin K. Petersen <martin.petersen@oracle.com> wrote: > > > > Native copy offload is not supported for stacked devices. > > One of the main reasons that the historic attempts at supporting copy > offload did not get merged was that the ubiquitous deployment scenario, > stacked block devices, was not handled well. > > Pitfalls surrounding stacking has been brought up several times in > response to your series. It is critically important that both kernel > plumbing and user-facing interfaces are defined in a way that works for > the most common use cases. This includes copying between block devices > and handling block device stacking. Stacking being one of the most > fundamental operating principles of the Linux block layer! > > Proposing a brand new interface that out of the gate is incompatible > with both stacking and the copy offload capability widely implemented in > shipping hardware makes little sense. While NVMe currently only supports > copy operations inside a single namespace, it is surely only a matter of > time before that restriction is lifted. > > Changing existing interfaces is painful, especially when these are > exposed to userland. We obviously can't predict every field or feature > that may be needed in the future. But we should at the very least build > the infrastructure around what already exists. 
And that's where the > proposed design falls short... > Certainly, on the user-space interface. We have a few cracks to fill there, and it misses future viability. But on stacking, can that be additive? Could you please take a look at the other response (the comment from Bart) for the trade-offs. -- Joshi -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy [not found] ` <CGME20210817101803epcas5p10cda1d52f8a8f1172e34b1f9cf8eef3b@epcas5p1.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 2021-08-17 13:09 ` Greg KH ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence From: Nitesh Shetty <nj.shetty@samsung.com> Add a new BLKCOPY ioctl that offloads copying of one or more source ranges to a destination in the device. The COPY ioctl accepts a 'copy_range' structure that contains the destination (in sectors), the number of sources and a pointer to the array of source ranges. Each source range is represented by a 'range_entry' that contains the start and length of the source range (in sectors). MAX_COPY_NR_RANGE limits the number of entries the IOCTL accepts, and MAX_COPY_TOTAL_LENGTH limits the total copy length it can handle. 
Example code, to issue BLKCOPY: /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to * [64,24], on the same device */ int main(void) { int ret, fd; struct range_entry source_range[] = {{.src = 0, .len = 8}, {.src = 16, .len = 8}, {.src = 32, .len = 8},}; struct copy_range cr; cr.dest = 64; cr.nr_range = 3; cr.range_list = (__u64)&source_range; fd = open("/dev/nvme0n1", O_RDWR); if (fd < 0) return 1; ret = ioctl(fd, BLKCOPY, &cr); if (ret < 0) printf("copy failure\n"); close(fd); return ret; } Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ include/uapi/linux/fs.h | 8 ++++++++ 2 files changed, 41 insertions(+) diff --git a/block/ioctl.c b/block/ioctl.c index eb0491e90b9a..2af56d01e9fe 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, GFP_KERNEL, flags); } +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, + unsigned long arg) +{ + struct copy_range crange; + struct range_entry *rlist; + int ret; + + if (!(mode & FMODE_WRITE)) + return -EBADF; + + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) + return -EFAULT; + + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), + GFP_KERNEL); + if (!rlist) + return -ENOMEM; + + if (copy_from_user(rlist, (void __user *)crange.range_list, + sizeof(*rlist) * crange.nr_range)) { + ret = -EFAULT; + goto out; + } + + ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest, + GFP_KERNEL, 0); +out: + kfree(rlist); + return ret; +} + static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, unsigned long arg) { @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, case BLKSECDISCARD: return blk_ioctl_discard(bdev, mode, arg, BLKDEV_DISCARD_SECURE); + case BLKCOPY: + 
return blk_ioctl_copy(bdev, mode, arg); case BLKZEROOUT: return blk_ioctl_zeroout(bdev, mode, arg); case BLKGETDISKSEQ: diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 7a97b588d892..4183688ff398 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -76,6 +76,13 @@ struct range_entry { __u64 len; }; +struct copy_range { + __u64 dest; + __u64 nr_range; + __u64 range_list; + __u64 rsvd; +}; + /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */ #define FILE_DEDUPE_RANGE_SAME 0 #define FILE_DEDUPE_RANGE_DIFFERS 1 @@ -197,6 +204,7 @@ struct fsxattr { #define BLKROTATIONAL _IO(0x12,126) #define BLKZEROOUT _IO(0x12,127) #define BLKGETDISKSEQ _IOR(0x12,128,__u64) +#define BLKCOPY _IOWR(0x12, 129, struct copy_range) /* * A jump here: 130-136 are reserved for zoned block devices * (see uapi/linux/blkzoned.h) -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-17 10:14 ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S @ 2021-08-17 13:09 ` Greg KH 2021-08-17 13:10 ` Greg KH 2021-08-17 23:36 ` Darrick J. Wong 2 siblings, 0 replies; 30+ messages in thread From: Greg KH @ 2021-08-17 13:09 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > From: Nitesh Shetty <nj.shetty@samsung.com> > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > to a destination in the device. COPY ioctl accepts a 'copy_range' > structure that contains destination (in sectors), no of sources and > pointer to the array of source ranges. Each source range is represented by > 'range_entry' that contains start and length of source ranges (in sectors) > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > Example code, to issue BLKCOPY: > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > * [64,24], on the same device */ > > int main(void) > { > int ret, fd; > struct range_entry source_range[] = {{.src = 0, .len = 8}, > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > struct copy_range cr; > > cr.dest = 64; > cr.nr_range = 3; > cr.range_list = (__u64)&source_range; > > fd = open("/dev/nvme0n1", O_RDWR); > if (fd < 0) return 1; > > ret = ioctl(fd, BLKCOPY, &cr); > if (ret < 0) printf("copy failure\n"); > > close(fd); > > return ret; > } > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > include/uapi/linux/fs.h | 8 ++++++++ > 2 files changed, 41 insertions(+) > > diff --git a/block/ioctl.c b/block/ioctl.c > index eb0491e90b9a..2af56d01e9fe 100644 > --- a/block/ioctl.c > +++ b/block/ioctl.c > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > GFP_KERNEL, flags); > } > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > + unsigned long arg) > +{ > + struct copy_range crange; > + struct range_entry *rlist; > + int ret; > + > + if (!(mode & FMODE_WRITE)) > + return -EBADF; > + > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > + return -EFAULT; > + > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > + GFP_KERNEL); > + if (!rlist) > + return -ENOMEM; > + > + if (copy_from_user(rlist, (void __user *)crange.range_list, > + sizeof(*rlist) * crange.nr_range)) { > + ret = -EFAULT; > + goto out; > + } > + > + ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest, > + GFP_KERNEL, 0); > +out: > + kfree(rlist); > + return ret; > +} > + > static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, > unsigned long arg) > { > @@ -468,6 +499,8 @@ static int 
blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, > case BLKSECDISCARD: > return blk_ioctl_discard(bdev, mode, arg, > BLKDEV_DISCARD_SECURE); > + case BLKCOPY: > + return blk_ioctl_copy(bdev, mode, arg); > case BLKZEROOUT: > return blk_ioctl_zeroout(bdev, mode, arg); > case BLKGETDISKSEQ: > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > index 7a97b588d892..4183688ff398 100644 > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > @@ -76,6 +76,13 @@ struct range_entry { > __u64 len; > }; > > +struct copy_range { > + __u64 dest; > + __u64 nr_range; > + __u64 range_list; > + __u64 rsvd; If you have a "reserved" field, you HAVE to check that it is 0. If not, you can never use it in the future. Also, you can spell it out, we have lots of vowels :) thanks, greg k-h -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-17 10:14 ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S 2021-08-17 13:09 ` Greg KH @ 2021-08-17 13:10 ` Greg KH 2021-08-17 14:48 ` Nitesh Shetty 2021-08-17 23:36 ` Darrick J. Wong 2 siblings, 1 reply; 30+ messages in thread From: Greg KH @ 2021-08-17 13:10 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > From: Nitesh Shetty <nj.shetty@samsung.com> > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > to a destination in the device. COPY ioctl accepts a 'copy_range' > structure that contains destination (in sectors), no of sources and > pointer to the array of source ranges. Each source range is represented by > 'range_entry' that contains start and length of source ranges (in sectors) > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > Example code, to issue BLKCOPY: > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > * [64,24], on the same device */ > > int main(void) > { > int ret, fd; > struct range_entry source_range[] = {{.src = 0, .len = 8}, > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > struct copy_range cr; > > cr.dest = 64; > cr.nr_range = 3; > cr.range_list = (__u64)&source_range; > > fd = open("/dev/nvme0n1", O_RDWR); > if (fd < 0) return 1; > > ret = ioctl(fd, BLKCOPY, &cr); > if (ret < 0) printf("copy failure\n"); > > close(fd); > > return ret; > } > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > include/uapi/linux/fs.h | 8 ++++++++ > 2 files changed, 41 insertions(+) > > diff --git a/block/ioctl.c b/block/ioctl.c > index eb0491e90b9a..2af56d01e9fe 100644 > --- a/block/ioctl.c > +++ b/block/ioctl.c > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > GFP_KERNEL, flags); > } > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > + unsigned long arg) > +{ > + struct copy_range crange; > + struct range_entry *rlist; > + int ret; > + > + if (!(mode & FMODE_WRITE)) > + return -EBADF; > + > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > + return -EFAULT; > + > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > + GFP_KERNEL); No error checking for huge values of nr_range? Is that wise? You really want userspace to be able to allocate "all" of the kernel memory in the system? thanks, greg k-h -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-17 13:10 ` Greg KH @ 2021-08-17 14:48 ` Nitesh Shetty 0 siblings, 0 replies; 30+ messages in thread From: Nitesh Shetty @ 2021-08-17 14:48 UTC (permalink / raw) To: Greg KH Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 6:40 PM Greg KH <greg@kroah.com> wrote: > > On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > > From: Nitesh Shetty <nj.shetty@samsung.com> > > > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > > to a destination in the device. COPY ioctl accepts a 'copy_range' > > structure that contains destination (in sectors), no of sources and > > pointer to the array of source ranges. Each source range is represented by > > 'range_entry' that contains start and length of source ranges (in sectors) > > > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > > > Example code, to issue BLKCOPY: > > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > > * [64,24], on the same device */ > > > > int main(void) > > { > > int ret, fd; > > struct range_entry source_range[] = {{.src = 0, .len = 8}, > > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > > struct copy_range cr; > > > > cr.dest = 64; > > cr.nr_range = 3; > > cr.range_list = (__u64)&source_range; > > > > fd = open("/dev/nvme0n1", O_RDWR); > > if (fd < 0) return 1; > > > > ret = ioctl(fd, BLKCOPY, &cr); > > if (ret < 0) printf("copy failure\n"); > > > > close(fd); > > > > return ret; > > } > > > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > > --- > > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > > include/uapi/linux/fs.h | 8 ++++++++ > > 2 files changed, 41 insertions(+) > > > > diff --git a/block/ioctl.c b/block/ioctl.c > > index eb0491e90b9a..2af56d01e9fe 100644 > > --- a/block/ioctl.c > > +++ b/block/ioctl.c > > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > > GFP_KERNEL, flags); > > } > > > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > > + unsigned long arg) > > +{ > > + struct copy_range crange; > > + struct range_entry *rlist; > > + int ret; > > + > > + if (!(mode & FMODE_WRITE)) > > + return -EBADF; > > + > > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > > + return -EFAULT; > > + > > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > > + GFP_KERNEL); > > No error checking for huge values of nr_range? Is that wise? You > really want userspace to be able to allocate "all" of the kernel memory > in the system? > > thanks, > > greg k-h We added a kernel imposed limit MAX_COPY_NR_RANGE for that purpose, but missed adding the check here. Will have that fixed. Thanks for pointing this out. 
Nitesh Shetty
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-17 10:14 ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S 2021-08-17 13:09 ` Greg KH 2021-08-17 13:10 ` Greg KH @ 2021-08-17 23:36 ` Darrick J. Wong 2021-08-18 15:37 ` Nitesh Shetty 2 siblings, 1 reply; 30+ messages in thread From: Darrick J. Wong @ 2021-08-17 23:36 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > From: Nitesh Shetty <nj.shetty@samsung.com> > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > to a destination in the device. COPY ioctl accepts a 'copy_range' > structure that contains destination (in sectors), no of sources and > pointer to the array of source ranges. Each source range is represented by > 'range_entry' that contains start and length of source ranges (in sectors) > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > Example code, to issue BLKCOPY: > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > * [64,24], on the same device */ > > int main(void) > { > int ret, fd; > struct range_entry source_range[] = {{.src = 0, .len = 8}, > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > struct copy_range cr; > > cr.dest = 64; > cr.nr_range = 3; > cr.range_list = (__u64)&source_range; > > fd = open("/dev/nvme0n1", O_RDWR); > if (fd < 0) return 1; > > ret = ioctl(fd, BLKCOPY, &cr); > if (ret < 0) printf("copy failure\n"); > > close(fd); > > return ret; > } > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > include/uapi/linux/fs.h | 8 ++++++++ > 2 files changed, 41 insertions(+) > > diff --git a/block/ioctl.c b/block/ioctl.c > index eb0491e90b9a..2af56d01e9fe 100644 > --- a/block/ioctl.c > +++ b/block/ioctl.c > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > GFP_KERNEL, flags); > } > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > + unsigned long arg) > +{ > + struct copy_range crange; > + struct range_entry *rlist; > + int ret; > + > + if (!(mode & FMODE_WRITE)) > + return -EBADF; > + > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > + return -EFAULT; > + > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > + GFP_KERNEL); > + if (!rlist) > + return -ENOMEM; > + > + if (copy_from_user(rlist, (void __user *)crange.range_list, > + sizeof(*rlist) * crange.nr_range)) { > + ret = -EFAULT; > + goto out; > + } > + > + ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest, > + GFP_KERNEL, 0); > +out: > + kfree(rlist); > + return ret; > +} > + > static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, > unsigned long arg) > { > @@ -468,6 +499,8 @@ static int 
blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, > case BLKSECDISCARD: > return blk_ioctl_discard(bdev, mode, arg, > BLKDEV_DISCARD_SECURE); > + case BLKCOPY: > + return blk_ioctl_copy(bdev, mode, arg); > case BLKZEROOUT: > return blk_ioctl_zeroout(bdev, mode, arg); > case BLKGETDISKSEQ: > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > index 7a97b588d892..4183688ff398 100644 > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > @@ -76,6 +76,13 @@ struct range_entry { > __u64 len; > }; > > +struct copy_range { > + __u64 dest; > + __u64 nr_range; If the maximum number of elements in the range list is 1<<12, there's no need for this to be larger than a u16, right? > + __u64 range_list; Pointers embedded in a structure are /not/ a good idea, because this will create a lot of compatibility headaches for 32-bit binaries running on 64-bit kernels. Please just make the size of this header structure a multiple of 8 bytes and put the range_entry list immediately after it. struct copy_range { __s64 dest_offset; __u32 nr_range_entries; __u32 flags; __u64 reserved[2]; }; struct __user range_entry *re = ((struct range_entry *)(copyhead + 1)); copy_from_user(&urk, re...); --D > + __u64 rsvd; > +}; > + > /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */ > #define FILE_DEDUPE_RANGE_SAME 0 > #define FILE_DEDUPE_RANGE_DIFFERS 1 > @@ -197,6 +204,7 @@ struct fsxattr { > #define BLKROTATIONAL _IO(0x12,126) > #define BLKZEROOUT _IO(0x12,127) > #define BLKGETDISKSEQ _IOR(0x12,128,__u64) > +#define BLKCOPY _IOWR(0x12, 129, struct copy_range) > /* > * A jump here: 130-136 are reserved for zoned block devices > * (see uapi/linux/blkzoned.h) > -- > 2.25.1 > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-17 23:36 ` Darrick J. Wong @ 2021-08-18 15:37 ` Nitesh Shetty 2021-08-18 16:17 ` Darrick J. Wong 0 siblings, 1 reply; 30+ messages in thread From: Nitesh Shetty @ 2021-08-18 15:37 UTC (permalink / raw) To: Darrick J. Wong Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Wed, Aug 18, 2021 at 5:06 AM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > > From: Nitesh Shetty <nj.shetty@samsung.com> > > > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > > to a destination in the device. COPY ioctl accepts a 'copy_range' > > structure that contains destination (in sectors), no of sources and > > pointer to the array of source ranges. Each source range is represented by > > 'range_entry' that contains start and length of source ranges (in sectors) > > > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > > > Example code, to issue BLKCOPY: > > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > > * [64,24], on the same device */ > > > > int main(void) > > { > > int ret, fd; > > struct range_entry source_range[] = {{.src = 0, .len = 8}, > > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > > struct copy_range cr; > > > > cr.dest = 64; > > cr.nr_range = 3; > > cr.range_list = (__u64)&source_range; > > > > fd = open("/dev/nvme0n1", O_RDWR); > > if (fd < 0) return 1; > > > > ret = ioctl(fd, BLKCOPY, &cr); > > if (ret < 0) printf("copy failure\n"); > > > > close(fd); > > > > return ret; > > } > > > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > > --- > > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > > include/uapi/linux/fs.h | 8 ++++++++ > > 2 files changed, 41 insertions(+) > > > > diff --git a/block/ioctl.c b/block/ioctl.c > > index eb0491e90b9a..2af56d01e9fe 100644 > > --- a/block/ioctl.c > > +++ b/block/ioctl.c > > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > > GFP_KERNEL, flags); > > } > > > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > > + unsigned long arg) > > +{ > > + struct copy_range crange; > > + struct range_entry *rlist; > > + int ret; > > + > > + if (!(mode & FMODE_WRITE)) > > + return -EBADF; > > + > > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > > + return -EFAULT; > > + > > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > > + GFP_KERNEL); > > + if (!rlist) > > + return -ENOMEM; > > + > > + if (copy_from_user(rlist, (void __user *)crange.range_list, > > + sizeof(*rlist) * crange.nr_range)) { > > + ret = -EFAULT; > > + goto out; > > + } > > + > > + ret = blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest, > > + GFP_KERNEL, 0); > > +out: > > + kfree(rlist); > > + return ret; > > 
+} > > + > > static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, > > unsigned long arg) > > { > > @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, > > case BLKSECDISCARD: > > return blk_ioctl_discard(bdev, mode, arg, > > BLKDEV_DISCARD_SECURE); > > + case BLKCOPY: > > + return blk_ioctl_copy(bdev, mode, arg); > > case BLKZEROOUT: > > return blk_ioctl_zeroout(bdev, mode, arg); > > case BLKGETDISKSEQ: > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > > index 7a97b588d892..4183688ff398 100644 > > --- a/include/uapi/linux/fs.h > > +++ b/include/uapi/linux/fs.h > > @@ -76,6 +76,13 @@ struct range_entry { > > __u64 len; > > }; > > > > +struct copy_range { > > + __u64 dest; > > + __u64 nr_range; > > If the maximum number of elements in the range list is 1<<12, there's no > need for this to be larger than a u16, right? > > > + __u64 range_list; > > Pointers embedded in a structure are /not/ a good idea, because this > will create a lot of compatibility headaches for 32-bit binaries running > on 64-bit kernels. Please just make the size of this header structure > a multiple of 8 bytes and put the range_entry list immediately after it. > > struct copy_range { > __s64 dest_offset; > __u32 nr_range_entries; > __u32 flags; > __u64 reserved[2]; > }; > > struct __user range_entry *re = ((struct range_entry *)(copyhead + 1)); > > copy_from_user(&urk, re...); > > --D > Thanks, this is better. 'Reserved' field was there to be used for future extension of the interface. Now that you mentioned 'flags', it seems we can do away with 'reserved' fields altogether? Regards, Nitesh Shetty -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy 2021-08-18 15:37 ` Nitesh Shetty @ 2021-08-18 16:17 ` Darrick J. Wong 0 siblings, 0 replies; 30+ messages in thread From: Darrick J. Wong @ 2021-08-18 16:17 UTC (permalink / raw) To: Nitesh Shetty Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Wed, Aug 18, 2021 at 09:07:54PM +0530, Nitesh Shetty wrote: > On Wed, Aug 18, 2021 at 5:06 AM Darrick J. Wong <djwong@kernel.org> wrote: > > > > On Tue, Aug 17, 2021 at 03:44:20PM +0530, SelvaKumar S wrote: > > > From: Nitesh Shetty <nj.shetty@samsung.com> > > > > > > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges > > > to a destination in the device. COPY ioctl accepts a 'copy_range' > > > structure that contains destination (in sectors), no of sources and > > > pointer to the array of source ranges. Each source range is represented by > > > 'range_entry' that contains start and length of source ranges (in sectors) > > > > > > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and > > > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle. 
> > > > > > Example code, to issue BLKCOPY: > > > /* Sample example to copy three source-ranges [0, 8] [16, 8] [32,8] to > > > * [64,24], on the same device */ > > > > > > int main(void) > > > { > > > int ret, fd; > > > struct range_entry source_range[] = {{.src = 0, .len = 8}, > > > {.src = 16, .len = 8}, {.src = 32, .len = 8},}; > > > struct copy_range cr; > > > > > > cr.dest = 64; > > > cr.nr_range = 3; > > > cr.range_list = (__u64)&source_range; > > > > > > fd = open("/dev/nvme0n1", O_RDWR); > > > if (fd < 0) return 1; > > > > > > ret = ioctl(fd, BLKCOPY, &cr); > > > if (ret < 0) printf("copy failure\n"); > > > > > > close(fd); > > > > > > return ret; > > > } > > > > > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > > > --- > > > block/ioctl.c | 33 +++++++++++++++++++++++++++++++++ > > > include/uapi/linux/fs.h | 8 ++++++++ > > > 2 files changed, 41 insertions(+) > > > > > > diff --git a/block/ioctl.c b/block/ioctl.c > > > index eb0491e90b9a..2af56d01e9fe 100644 > > > --- a/block/ioctl.c > > > +++ b/block/ioctl.c > > > @@ -143,6 +143,37 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode, > > > GFP_KERNEL, flags); > > > } > > > > > > +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, > > > + unsigned long arg) > > > +{ > > > + struct copy_range crange; > > > + struct range_entry *rlist; > > > + int ret; > > > + > > > + if (!(mode & FMODE_WRITE)) > > > + return -EBADF; > > > + > > > + if (copy_from_user(&crange, (void __user *)arg, sizeof(crange))) > > > + return -EFAULT; > > > + > > > + rlist = kmalloc_array(crange.nr_range, sizeof(*rlist), > > > + GFP_KERNEL); > > > + if (!rlist) > > > + return -ENOMEM; > > > + > > > + if (copy_from_user(rlist, (void __user *)crange.range_list, > > > + sizeof(*rlist) * crange.nr_range)) { > > > + ret = -EFAULT; > > > + goto out; > > > + } > > > + > > > + ret = 
blkdev_issue_copy(bdev, crange.nr_range, rlist, bdev, crange.dest, > > > + GFP_KERNEL, 0); > > > +out: > > > + kfree(rlist); > > > + return ret; > > > +} > > > + > > > static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, > > > unsigned long arg) > > > { > > > @@ -468,6 +499,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, > > > case BLKSECDISCARD: > > > return blk_ioctl_discard(bdev, mode, arg, > > > BLKDEV_DISCARD_SECURE); > > > + case BLKCOPY: > > > + return blk_ioctl_copy(bdev, mode, arg); > > > case BLKZEROOUT: > > > return blk_ioctl_zeroout(bdev, mode, arg); > > > case BLKGETDISKSEQ: > > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > > > index 7a97b588d892..4183688ff398 100644 > > > --- a/include/uapi/linux/fs.h > > > +++ b/include/uapi/linux/fs.h > > > @@ -76,6 +76,13 @@ struct range_entry { > > > __u64 len; > > > }; > > > > > > +struct copy_range { > > > + __u64 dest; > > > + __u64 nr_range; > > > > If the maximum number of elements in the range list is 1<<12, there's no > > need for this to be larger than a u16, right? > > > > > + __u64 range_list; > > > > Pointers embedded in a structure are /not/ a good idea, because this > > will create a lot of compatibility headaches for 32-bit binaries running > > on 64-bit kernels. Please just make the size of this header structure > > a multiple of 8 bytes and put the range_entry list immediately after it. > > > > struct copy_range { > > __s64 dest_offset; > > __u32 nr_range_entries; > > __u32 flags; > > __u64 reserved[2]; > > }; > > > > struct __user range_entry *re = ((struct range_entry *)(copyhead + 1)); > > > > copy_from_user(&urk, re...); > > > > --D > > > Thanks, this is better. 'Reserved' field was there to be used for > future extension of the interface. > Now that you mentioned 'flags', it seems we can do away with > 'reserved' fields altogether? 
We still want the reserved-must-be-zero fields so that adding the first field or two doesn't require changes to the pointer arithmetic. Also, I suppose you could make the relationship between copy_range and range_entry more explicit: struct copy_range { __s64 dest_offset; __u32 nr_range_entries; __u32 flags; __u64 reserved[2]; /* must come last */ struct range_entry entries[]; }; struct __user range_entry *re = ©head->entries[0]; --D > > Regards, > Nitesh Shetty -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* [dm-devel] [PATCH 5/7] block: add emulation for simple copy [not found] ` <CGME20210817101809epcas5p39eed3531ed82f5f08127eb3dba1fc50f@epcas5p3.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 2021-08-17 22:10 ` kernel test robot 0 siblings, 1 reply; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence From: Nitesh Shetty <nj.shetty@samsung.com> For devices which do not support simple copy, copy emulation is added. Copy is also performed via emulation for stacked devices. Copy emulation is implemented by allocating the largest possible buffer less than or equal to the total copy size. The source ranges are read into memory by chaining a bio for each source range, submitting them asynchronously, and waiting for completion on the last bio. After the data is read, it is written to the destination, and the process is repeated until no source ranges are left. bio_map_kern() is used to allocate the bio and add the pages of the copy buffer to it. Since bio->bi_private and bio->bi_end_io are needed for chaining the bios and get overwritten, invalidate_kernel_vmap_range() for the read is called by the caller. 
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> --- block/blk-lib.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7fee0ae95c44..d29c52b90dcf 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -333,6 +333,64 @@ int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs, return ret; } +int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len, + sector_t sector, unsigned int op, gfp_t gfp_mask) +{ + struct request_queue *q = bdev_get_queue(bdev); + struct bio *bio, *parent = NULL; + sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q), + queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)); + sector_t len, remaining; + int ret; + + for (remaining = buf_len; remaining > 0; remaining -= len) { + len = min_t(int, max_hw_len, remaining); +retry: + bio = bio_map_kern(q, buf, len << SECTOR_SHIFT, gfp_mask); + if (IS_ERR(bio)) { + len >>= 1; + if (len) + goto retry; + return PTR_ERR(bio); + } + + bio->bi_iter.bi_sector = sector; + bio->bi_opf = op; + bio_set_dev(bio, bdev); + bio->bi_end_io = NULL; + bio->bi_private = NULL; + + if (parent) { + bio_chain(parent, bio); + submit_bio(parent); + } + parent = bio; + sector += len; + } + ret = submit_bio_wait(bio); + bio_put(bio); + + return ret; +} + +static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask) +{ + int min_size = PAGE_SIZE; + void *buf; + + while (req_size >= min_size) { + buf = kvmalloc(req_size, gfp_mask); + if (buf) { + *alloc_size = (req_size >> SECTOR_SHIFT); + return buf; + } + /* retry half the requested size */ + req_size >>= 1; + } + + return NULL; +} + static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs) { int i; @@ -348,6 +406,46 @@ static inline sector_t blk_copy_len(struct range_entry *rlist, int nr_srcs) return len; } +/* + * If native copy offload feature 
is absent, this function tries to emulate, + * by copying data from source to a temporary buffer and from buffer to + * destination device. + */ +static int blk_copy_emulate(struct block_device *src_bdev, int nr_srcs, + struct range_entry *rlist, struct block_device *dest_bdev, + sector_t dest, gfp_t gfp_mask) +{ + void *buf = NULL; + int ret, nr_i = 0; + sector_t src_blk, copy_len, buf_len, read_len, copied_len, remaining = 0; + + copy_len = blk_copy_len(rlist, nr_srcs); + buf = blk_alloc_buf(copy_len << SECTOR_SHIFT, &buf_len, gfp_mask); + if (!buf) + return -ENOMEM; + + for (copied_len = 0; copied_len < copy_len; copied_len += read_len) { + if (!remaining) { + src_blk = rlist[nr_i].src; + remaining = rlist[nr_i++].len; + } + + read_len = min_t(sector_t, remaining, buf_len); + ret = blk_submit_rw_buf(src_bdev, buf, read_len, src_blk, REQ_OP_READ, gfp_mask); + if (ret) + goto out; + src_blk += read_len; + remaining -= read_len; + ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dest + copied_len, REQ_OP_WRITE, + gfp_mask); + if (ret) + goto out; + } +out: + kvfree(buf); + return ret; +} + static inline bool blk_check_offload_scc(struct request_queue *src_q, struct request_queue *dest_q) { @@ -398,6 +496,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs, if (blk_check_offload_scc(src_q, dest_q)) ret = blk_copy_offload_scc(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask); + else + ret = blk_copy_emulate(src_bdev, nr_srcs, src_rlist, dest_bdev, dest, gfp_mask); return ret; } -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 5/7] block: add emulation for simple copy 2021-08-17 10:14 ` [dm-devel] [PATCH 5/7] block: add emulation " SelvaKumar S @ 2021-08-17 22:10 ` kernel test robot 0 siblings, 0 replies; 30+ messages in thread From: kernel test robot @ 2021-08-17 22:10 UTC (permalink / raw) To: SelvaKumar S, linux-nvme, linux-block Cc: axboe, damien.lemoal, kbuild-all, linux-scsi, linux-api, kbusch, clang-built-linux, dm-devel, linux-fsdevel, asml.silence [-- Attachment #1: Type: text/plain, Size: 4154 bytes --] Hi SelvaKumar, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on block/for-next] [also build test WARNING on dm/for-next next-20210817] [cannot apply to linus/master linux-nvme/for-next v5.14-rc6] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next config: hexagon-randconfig-r013-20210816 (attached as .config) compiler: clang version 12.0.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/c307f1051a72122d636502c6885df8b2b25ed697 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review SelvaKumar-S/block-make-bio_map_kern-non-static/20210817-193111 git checkout c307f1051a72122d636502c6885df8b2b25ed697 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): block/blk-lib.c:197:5: warning: no previous prototype 
for function 'blk_copy_offload_submit_bio' [-Wmissing-prototypes] int blk_copy_offload_submit_bio(struct block_device *bdev, ^ block/blk-lib.c:197:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int blk_copy_offload_submit_bio(struct block_device *bdev, ^ static block/blk-lib.c:250:5: warning: no previous prototype for function 'blk_copy_offload_scc' [-Wmissing-prototypes] int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs, ^ block/blk-lib.c:250:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int blk_copy_offload_scc(struct block_device *src_bdev, int nr_srcs, ^ static >> block/blk-lib.c:336:5: warning: no previous prototype for function 'blk_submit_rw_buf' [-Wmissing-prototypes] int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len, ^ block/blk-lib.c:336:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len, ^ static 3 warnings generated. 
vim +/blk_submit_rw_buf +336 block/blk-lib.c 335 > 336 int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len, 337 sector_t sector, unsigned int op, gfp_t gfp_mask) 338 { 339 struct request_queue *q = bdev_get_queue(bdev); 340 struct bio *bio, *parent = NULL; 341 sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q), 342 queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)); 343 sector_t len, remaining; 344 int ret; 345 346 for (remaining = buf_len; remaining > 0; remaining -= len) { 347 len = min_t(int, max_hw_len, remaining); 348 retry: 349 bio = bio_map_kern(q, buf, len << SECTOR_SHIFT, gfp_mask); 350 if (IS_ERR(bio)) { 351 len >>= 1; 352 if (len) 353 goto retry; 354 return PTR_ERR(bio); 355 } 356 357 bio->bi_iter.bi_sector = sector; 358 bio->bi_opf = op; 359 bio_set_dev(bio, bdev); 360 bio->bi_end_io = NULL; 361 bio->bi_private = NULL; 362 363 if (parent) { 364 bio_chain(parent, bio); 365 submit_bio(parent); 366 } 367 parent = bio; 368 sector += len; 369 } 370 ret = submit_bio_wait(bio); 371 bio_put(bio); 372 373 return ret; 374 } 375 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 22159 bytes --] [-- Attachment #3: Type: text/plain, Size: 97 bytes --] -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <CGME20210817101814epcas5p41db3d7269f5139efcaf2ca685cd04a16@epcas5p4.samsung.com>]
* [dm-devel] [PATCH 6/7] nvme: add simple copy support [not found] ` <CGME20210817101814epcas5p41db3d7269f5139efcaf2ca685cd04a16@epcas5p4.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 0 siblings, 0 replies; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence Add support for TP 4065a ("Simple Copy Command"), v2020.05.04 ("Ratified"). For devices supporting native simple copy, this implementation accepts the payload passed from the block layer, converts it into a simple copy command, and submits it to the device. The device copy limits are propagated to the queue limits. By default copy_offload is disabled. End-to-end protection is handled by setting both PRINFOR and PRINFOW to 0. 
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Javier González <javier.gonz@samsung.com> --- drivers/nvme/host/core.c | 83 +++++++++++++++++++++++++++++++++++++++ drivers/nvme/host/trace.c | 19 +++++++++ include/linux/nvme.h | 43 ++++++++++++++++++-- 3 files changed, 142 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f5aabbb4a00f..41f02ee8c0d2 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -825,6 +825,59 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); } +static inline blk_status_t nvme_setup_copy(struct nvme_ns *ns, + struct request *req, struct nvme_command *cmnd) +{ + struct nvme_ctrl *ctrl = ns->ctrl; + struct nvme_copy_range *range = NULL; + struct blk_copy_payload *payload; + unsigned short nr_range = 0; + u16 control = 0, ssrl; + u32 dsmgmt = 0; + u64 slba; + int i; + + payload = bio_data(req->bio); + nr_range = payload->copy_nr_ranges; + + if (req->cmd_flags & REQ_FUA) + control |= NVME_RW_FUA; + + if (req->cmd_flags & REQ_FAILFAST_DEV) + control |= NVME_RW_LR; + + cmnd->copy.opcode = nvme_cmd_copy; + cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id); + cmnd->copy.sdlba = cpu_to_le64(blk_rq_pos(req) >> (ns->lba_shift - 9)); + + range = kmalloc_array(nr_range, sizeof(*range), + GFP_ATOMIC | __GFP_NOWARN); + if (!range) + return BLK_STS_RESOURCE; + + for (i = 0; i < nr_range; i++) { + slba = (payload->range[i].src) >> (ns->lba_shift - 9); + ssrl = (payload->range[i].len) >> (ns->lba_shift - 9); + range[i].slba = cpu_to_le64(slba); + range[i].nlb = cpu_to_le16(ssrl - 1); + } + + cmnd->copy.nr_range = nr_range - 1; + + req->special_vec.bv_page = virt_to_page(range); + req->special_vec.bv_offset = offset_in_page(range); + req->special_vec.bv_len = sizeof(*range) * nr_range; + req->rq_flags |= 
RQF_SPECIAL_PAYLOAD; + + if (ctrl->nr_streams) + nvme_assign_write_stream(ctrl, req, &control, &dsmgmt); + + cmnd->rw.control = cpu_to_le16(control); + cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt); + + return BLK_STS_OK; +} + static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, struct nvme_command *cmnd) { @@ -1012,6 +1065,9 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req) case REQ_OP_DISCARD: ret = nvme_setup_discard(ns, req, cmd); break; + case REQ_OP_COPY: + ret = nvme_setup_copy(ns, req, cmd); + break; case REQ_OP_READ: ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read); break; @@ -1640,6 +1696,31 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns) blk_queue_max_write_zeroes_sectors(queue, UINT_MAX); } +static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns, + struct nvme_id_ns *id) +{ + struct nvme_ctrl *ctrl = ns->ctrl; + struct request_queue *queue = disk->queue; + + if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) { + queue->limits.copy_offload = 0; + queue->limits.max_copy_sectors = 0; + queue->limits.max_copy_range_sectors = 0; + queue->limits.max_copy_nr_ranges = 0; + blk_queue_flag_clear(QUEUE_FLAG_SIMPLE_COPY, queue); + return; + } + + /* setting copy limits */ + blk_queue_flag_test_and_set(QUEUE_FLAG_SIMPLE_COPY, queue); + queue->limits.copy_offload = 0; + queue->limits.max_copy_sectors = le64_to_cpu(id->mcl) * + (1 << (ns->lba_shift - 9)); + queue->limits.max_copy_range_sectors = le32_to_cpu(id->mssrl) * + (1 << (ns->lba_shift - 9)); + queue->limits.max_copy_nr_ranges = id->msrc + 1; +} + static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids) { return !uuid_is_null(&ids->uuid) || @@ -1811,6 +1892,7 @@ static void nvme_update_disk_info(struct gendisk *disk, nvme_config_discard(disk, ns); blk_queue_max_write_zeroes_sectors(disk->queue, ns->ctrl->max_zeroes_sectors); + nvme_config_copy(disk, ns, id); set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) || test_bit(NVME_NS_FORCE_RO, 
&ns->flags)); @@ -4564,6 +4646,7 @@ static inline void _nvme_check_size(void) BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64); BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64); BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64); + BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64); BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64); BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64); BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64); diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c index 6543015b6121..0e7db7ad5ed3 100644 --- a/drivers/nvme/host/trace.c +++ b/drivers/nvme/host/trace.c @@ -136,6 +136,23 @@ static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10) return ret; } +static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10) +{ + const char *ret = trace_seq_buffer_ptr(p); + u64 slba = get_unaligned_le64(cdw10); + u8 nr_range = get_unaligned_le16(cdw10 + 8); + u16 control = get_unaligned_le16(cdw10 + 10); + u32 dsmgmt = get_unaligned_le32(cdw10 + 12); + u32 reftag = get_unaligned_le32(cdw10 + 16); + + trace_seq_printf(p, + "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u", + slba, nr_range, control, dsmgmt, reftag); + trace_seq_putc(p, 0); + + return ret; +} + static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10) { const char *ret = trace_seq_buffer_ptr(p); @@ -227,6 +244,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p, return nvme_trace_zone_mgmt_send(p, cdw10); case nvme_cmd_zone_mgmt_recv: return nvme_trace_zone_mgmt_recv(p, cdw10); + case nvme_cmd_copy: + return nvme_trace_copy(p, cdw10); default: return nvme_trace_common(p, cdw10); } diff --git a/include/linux/nvme.h b/include/linux/nvme.h index b7c4c4130b65..4f79f223c9eb 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -295,7 +295,7 @@ struct nvme_id_ctrl { __u8 nvscc; __u8 nwpc; __le16 acwu; - __u8 rsvd534[2]; + __le16 ocfs; __le32 sgls; __le32 mnan; __u8 rsvd544[224]; @@ -320,6 +320,7 
@@ enum { NVME_CTRL_ONCS_WRITE_ZEROES = 1 << 3, NVME_CTRL_ONCS_RESERVATIONS = 1 << 5, NVME_CTRL_ONCS_TIMESTAMP = 1 << 6, + NVME_CTRL_ONCS_COPY = 1 << 8, NVME_CTRL_VWC_PRESENT = 1 << 0, NVME_CTRL_OACS_SEC_SUPP = 1 << 0, NVME_CTRL_OACS_DIRECTIVES = 1 << 5, @@ -368,7 +369,10 @@ struct nvme_id_ns { __le16 npdg; __le16 npda; __le16 nows; - __u8 rsvd74[18]; + __le16 mssrl; + __le32 mcl; + __u8 msrc; + __u8 rsvd91[11]; __le32 anagrpid; __u8 rsvd96[3]; __u8 nsattr; @@ -689,6 +693,7 @@ enum nvme_opcode { nvme_cmd_resv_report = 0x0e, nvme_cmd_resv_acquire = 0x11, nvme_cmd_resv_release = 0x15, + nvme_cmd_copy = 0x19, nvme_cmd_zone_mgmt_send = 0x79, nvme_cmd_zone_mgmt_recv = 0x7a, nvme_cmd_zone_append = 0x7d, @@ -710,7 +715,8 @@ enum nvme_opcode { nvme_opcode_name(nvme_cmd_resv_release), \ nvme_opcode_name(nvme_cmd_zone_mgmt_send), \ nvme_opcode_name(nvme_cmd_zone_mgmt_recv), \ - nvme_opcode_name(nvme_cmd_zone_append)) + nvme_opcode_name(nvme_cmd_zone_append), \ + nvme_opcode_name(nvme_cmd_copy)) @@ -883,6 +889,36 @@ struct nvme_dsm_range { __le64 slba; }; +struct nvme_copy_command { + __u8 opcode; + __u8 flags; + __u16 command_id; + __le32 nsid; + __u64 rsvd2; + __le64 metadata; + union nvme_data_ptr dptr; + __le64 sdlba; + __u8 nr_range; + __u8 rsvd12; + __le16 control; + __le16 rsvd13; + __le16 dspec; + __le32 ilbrt; + __le16 lbat; + __le16 lbatm; +}; + +struct nvme_copy_range { + __le64 rsvd0; + __le64 slba; + __le16 nlb; + __le16 rsvd18; + __le32 rsvd20; + __le32 eilbrt; + __le16 elbat; + __le16 elbatm; +}; + struct nvme_write_zeroes_cmd { __u8 opcode; __u8 flags; @@ -1427,6 +1463,7 @@ struct nvme_command { struct nvme_download_firmware dlfw; struct nvme_format_cmd format; struct nvme_dsm_cmd dsm; + struct nvme_copy_command copy; struct nvme_write_zeroes_cmd write_zeroes; struct nvme_zone_mgmt_send_cmd zms; struct nvme_zone_mgmt_recv_cmd zmr; -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply 
related [flat|nested] 30+ messages in thread
[parent not found: <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com>]
* [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support [not found] ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com> @ 2021-08-17 10:14 ` SelvaKumar S 2021-08-17 20:29 ` Mikulas Patocka 0 siblings, 1 reply; 30+ messages in thread From: SelvaKumar S @ 2021-08-17 10:14 UTC (permalink / raw) To: linux-nvme, linux-block Cc: snitzer, djwong, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, SelvaKumar S, selvajove, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence Introduce copy_jobs to use copy-offload, if supported by underlying devices otherwise fall back to existing method. run_copy_jobs() calls block layer copy offload API, if both source and destination request queue are same and support copy offload. On successful completion, destination regions copied count is made zero, failed regions are processed via existing method. Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> --- drivers/md/dm-kcopyd.c | 56 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c index 37b03ab7e5c9..d9ee105a6127 100644 --- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -74,18 +74,20 @@ struct dm_kcopyd_client { atomic_t nr_jobs; /* - * We maintain four lists of jobs: + * We maintain five lists of jobs: * - * i) jobs waiting for pages - * ii) jobs that have pages, and are waiting for the io to be issued. - * iii) jobs that don't need to do any IO and just run a callback - * iv) jobs that have completed. + * i) jobs waiting to try copy offload + * ii) jobs waiting for pages + * iii) jobs that have pages, and are waiting for the io to be issued. + * iv) jobs that don't need to do any IO and just run a callback + * v) jobs that have completed. 
* - * All four of these are protected by job_lock. + * All five of these are protected by job_lock. */ spinlock_t job_lock; struct list_head callback_jobs; struct list_head complete_jobs; + struct list_head copy_jobs; struct list_head io_jobs; struct list_head pages_jobs; }; @@ -579,6 +581,43 @@ static int run_io_job(struct kcopyd_job *job) return r; } +static int run_copy_job(struct kcopyd_job *job) +{ + int r, i, count = 0; + unsigned long flags = 0; + struct range_entry srange; + + struct request_queue *src_q, *dest_q; + + for (i = 0; i < job->num_dests; i++) { + srange.src = job->source.sector; + srange.len = job->source.count; + + src_q = bdev_get_queue(job->source.bdev); + dest_q = bdev_get_queue(job->dests[i].bdev); + + if (src_q != dest_q && !src_q->limits.copy_offload) + break; + + r = blkdev_issue_copy(job->source.bdev, 1, &srange, + job->dests[i].bdev, job->dests[i].sector, GFP_KERNEL, flags); + if (r) + break; + + job->dests[i].count = 0; + count++; + } + + if (count == job->num_dests) { + push(&job->kc->complete_jobs, job); + } else { + push(&job->kc->pages_jobs, job); + r = 0; + } + + return r; +} + static int run_pages_job(struct kcopyd_job *job) { int r; @@ -659,6 +698,7 @@ static void do_work(struct work_struct *work) spin_unlock_irq(&kc->job_lock); blk_start_plug(&plug); + process_jobs(&kc->copy_jobs, kc, run_copy_job); process_jobs(&kc->complete_jobs, kc, run_complete_job); process_jobs(&kc->pages_jobs, kc, run_pages_job); process_jobs(&kc->io_jobs, kc, run_io_job); @@ -676,6 +716,8 @@ static void dispatch_job(struct kcopyd_job *job) atomic_inc(&kc->nr_jobs); if (unlikely(!job->source.count)) push(&kc->callback_jobs, job); + else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk) + push(&kc->copy_jobs, job); else if (job->pages == &zero_page_list) push(&kc->io_jobs, job); else @@ -916,6 +958,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro spin_lock_init(&kc->job_lock); 
INIT_LIST_HEAD(&kc->callback_jobs); INIT_LIST_HEAD(&kc->complete_jobs); + INIT_LIST_HEAD(&kc->copy_jobs); INIT_LIST_HEAD(&kc->io_jobs); INIT_LIST_HEAD(&kc->pages_jobs); kc->throttle = throttle; @@ -971,6 +1014,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) BUG_ON(!list_empty(&kc->callback_jobs)); BUG_ON(!list_empty(&kc->complete_jobs)); + WARN_ON(!list_empty(&kc->copy_jobs)); BUG_ON(!list_empty(&kc->io_jobs)); BUG_ON(!list_empty(&kc->pages_jobs)); destroy_workqueue(kc->kcopyd_wq); -- 2.25.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support 2021-08-17 10:14 ` [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support SelvaKumar S @ 2021-08-17 20:29 ` Mikulas Patocka 0 siblings, 0 replies; 30+ messages in thread From: Mikulas Patocka @ 2021-08-17 20:29 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, djwong, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, joshiiitr, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, johannes.thumshirn, linux-api, linux-fsdevel, javier.gonz, asml.silence On Tue, 17 Aug 2021, SelvaKumar S wrote: > Introduce copy_jobs to use copy-offload, if supported by underlying devices > otherwise fall back to existing method. dm-kcopyd is usually used on the dm-linear target. And this patchset doesn't support passing copy requests through the linear target - so this patch doesn't seem useful. Mikulas > run_copy_jobs() calls block layer copy offload API, if both source and > destination request queue are same and support copy offload. > On successful completion, destination regions copied count is made zero, > failed regions are processed via existing method. > > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> > --- > drivers/md/dm-kcopyd.c | 56 +++++++++++++++++++++++++++++++++++++----- > 1 file changed, 50 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c > index 37b03ab7e5c9..d9ee105a6127 100644 > --- a/drivers/md/dm-kcopyd.c > +++ b/drivers/md/dm-kcopyd.c > @@ -74,18 +74,20 @@ struct dm_kcopyd_client { > atomic_t nr_jobs; > > /* > - * We maintain four lists of jobs: > + * We maintain five lists of jobs: > * > - * i) jobs waiting for pages > - * ii) jobs that have pages, and are waiting for the io to be issued. > - * iii) jobs that don't need to do any IO and just run a callback > - * iv) jobs that have completed. 
> + * i) jobs waiting to try copy offload > + * ii) jobs waiting for pages > + * iii) jobs that have pages, and are waiting for the io to be issued. > + * iv) jobs that don't need to do any IO and just run a callback > + * v) jobs that have completed. > * > - * All four of these are protected by job_lock. > + * All five of these are protected by job_lock. > */ > spinlock_t job_lock; > struct list_head callback_jobs; > struct list_head complete_jobs; > + struct list_head copy_jobs; > struct list_head io_jobs; > struct list_head pages_jobs; > }; > @@ -579,6 +581,43 @@ static int run_io_job(struct kcopyd_job *job) > return r; > } > > +static int run_copy_job(struct kcopyd_job *job) > +{ > + int r, i, count = 0; > + unsigned long flags = 0; > + struct range_entry srange; > + > + struct request_queue *src_q, *dest_q; > + > + for (i = 0; i < job->num_dests; i++) { > + srange.src = job->source.sector; > + srange.len = job->source.count; > + > + src_q = bdev_get_queue(job->source.bdev); > + dest_q = bdev_get_queue(job->dests[i].bdev); > + > + if (src_q != dest_q && !src_q->limits.copy_offload) > + break; > + > + r = blkdev_issue_copy(job->source.bdev, 1, &srange, > + job->dests[i].bdev, job->dests[i].sector, GFP_KERNEL, flags); > + if (r) > + break; > + > + job->dests[i].count = 0; > + count++; > + } > + > + if (count == job->num_dests) { > + push(&job->kc->complete_jobs, job); > + } else { > + push(&job->kc->pages_jobs, job); > + r = 0; > + } > + > + return r; > +} > + > static int run_pages_job(struct kcopyd_job *job) > { > int r; > @@ -659,6 +698,7 @@ static void do_work(struct work_struct *work) > spin_unlock_irq(&kc->job_lock); > > blk_start_plug(&plug); > + process_jobs(&kc->copy_jobs, kc, run_copy_job); > process_jobs(&kc->complete_jobs, kc, run_complete_job); > process_jobs(&kc->pages_jobs, kc, run_pages_job); > process_jobs(&kc->io_jobs, kc, run_io_job); > @@ -676,6 +716,8 @@ static void dispatch_job(struct kcopyd_job *job) > atomic_inc(&kc->nr_jobs); > if 
(unlikely(!job->source.count)) > push(&kc->callback_jobs, job); > + else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk) > + push(&kc->copy_jobs, job); > else if (job->pages == &zero_page_list) > push(&kc->io_jobs, job); > else > @@ -916,6 +958,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro > spin_lock_init(&kc->job_lock); > INIT_LIST_HEAD(&kc->callback_jobs); > INIT_LIST_HEAD(&kc->complete_jobs); > + INIT_LIST_HEAD(&kc->copy_jobs); > INIT_LIST_HEAD(&kc->io_jobs); > INIT_LIST_HEAD(&kc->pages_jobs); > kc->throttle = throttle; > @@ -971,6 +1014,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) > > BUG_ON(!list_empty(&kc->callback_jobs)); > BUG_ON(!list_empty(&kc->complete_jobs)); > + WARN_ON(!list_empty(&kc->copy_jobs)); > BUG_ON(!list_empty(&kc->io_jobs)); > BUG_ON(!list_empty(&kc->pages_jobs)); > destroy_workqueue(kc->kcopyd_wq); > -- > 2.25.1 > > > -- > dm-devel mailing list > dm-devel@redhat.com > https://listman.redhat.com/mailman/listinfo/dm-devel > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] [PATCH 0/7] add simple copy support 2021-08-17 10:14 ` [dm-devel] [PATCH 0/7] add simple copy support SelvaKumar S ` (6 preceding siblings ...) [not found] ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com> @ 2021-08-17 23:37 ` Darrick J. Wong 2021-08-18 15:40 ` Nitesh Shetty 7 siblings, 1 reply; 30+ messages in thread From: Darrick J. Wong @ 2021-08-17 23:37 UTC (permalink / raw) To: SelvaKumar S Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, nitheshshetty, willy, nj.shetty, kch, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Tue, Aug 17, 2021 at 03:44:16PM +0530, SelvaKumar S wrote: > This started out as an attempt to support NVMe Simple Copy Command (SCC), > and evolved during the RFC review process. > > The patchset, at this point, contains - > 1. SCC support in NVMe driver > 2. Block-layer infra for copy-offload operation > 3. ioctl interface to user-space > 4. copy-emulation infra in the block-layer > 5. copy-offload plumbing to dm-kcopyd (thus creating couple of in-kernel > users such as dm-clone) > > > The SCC specification, i.e. TP4065a can be found in following link > > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip > > Simple copy is a copy offloading feature and can be used to copy multiple > contiguous ranges (source_ranges) of LBA's to a single destination LBA > within the device, reducing traffic between host and device. > > We define a block ioctl for copy and copy payload format similar to > discard. For device supporting native simple copy, we attach the control > information as payload to the bio and submit to the device. Copy emulation > is implemented incase underlaying device does not support copy offload or > based on sysfs choice. 
Copy emulation is done by reading each source range > into buffer and writing it to the destination. Seems useful. Would you mind adapting the loop driver to call copy_file_range (for files) so that anyone interested in making a filesystem use this capability (cough) can write fstests? --D > At present this implementation does not support copy offload for stacked/dm > devices, rather copy operation is completed through emulation. > > One of the in-kernel use case for copy-offload is implemented by this > patchset. dm-kcopyd infra has been changed to leverage the copy-offload if > it is natively available. Other future use-cases could be F2FS GC, BTRFS > relocation/balance and copy_file_range. > > Following limits are added to queue limits and are exposed via sysfs to > userspace, which user can use to form a payload. > > - copy_offload: > configurable, can be used set emulation or copy offload > 0 to disable copy offload, > 1 to enable copy offloading support. Offload can be only > enabled, if underlaying device supports offload > > - max_copy_sectors: > total copy length supported by copy offload feature in device. > 0 indicates copy offload is not supported. > > - max_copy_nr_ranges: > maximum number of source range entries supported by copy offload > feature in device > > - max_copy_range_sectors: > maximum copy length per source range entry > > *blkdev_issue_copy* takes source bdev, no of sources, array of source > ranges (in sectors), destination bdev and destination offset(in sectors). > If both source and destination block devices are same and queue parameter > copy_offload is 1, then copy is done through native copy offloading. > Copy emulation is used in otherwise. > > Changes from RFC v5 > > 1. Handle copy larger than maximum copy limits > 2. Create copy context and submit copy offload asynchronously > 3. Remove BLKDEV_COPY_NOEMULATION opt-in option of copy offload and > check for copy support before submission from dm and other layers > 4. 
Allocate maximum possible allocatable buffer for copy emulation > rather failing very large copy offload. > 5. Fix copy_offload sysfs to be either have 0 or 1 > > Changes from RFC v4 > > 1. Extend dm-kcopyd to leverage copy-offload, while copying within the > same device. The other approach was to have copy-emulation by moving > dm-kcopyd to block layer. But it also required moving core dm-io infra, > causing a massive churn across multiple dm-targets. > 2. Remove export in bio_map_kern() > 3. Change copy_offload sysfs to accept 0 or else > 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY > 5. Rename payload entries, add source bdev field to be used while > partition remapping, remove copy_size > 6. Change the blkdev_issue_copy() interface to accept destination and > source values in sector rather in bytes > 7. Add payload to bio using bio_map_kern() for copy_offload case > 8. Add check to return error if one of the source range length is 0 > 9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy > emulation incase of copy offload is not supported. Caller can his use > his existing copying logic to complete the io. > 10. Bug fix copy checks and reduce size of rcu_lock() > > Changes from RFC v3 > > 1. gfp_flag fixes. > 2. Export bio_map_kern() and use it to allocate and add pages to bio. > 3. Move copy offload, reading to buf, writing from buf to separate functions > 4. Send read bio of copy offload by chaining them and submit asynchronously > 5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy(). > 6. Move single source range limit check to blk_check_copy() > 7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper. > 8. Change blkdev_issue_copy() interface generic to accepts destination bdev > to support XCOPY as well. > 9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory. > 10. Fix buf allocoation logic to allocate buffer for the total size of copy. > 11. 
Reword patch commit description. > > Changes from RFC v2 > > 1. Add emulation support for devices not supporting copy. > 2. Add *copy_offload* sysfs entry to enable and disable copy_offload > in devices supporting simple copy. > 3. Remove simple copy support for stacked devices. > > Changes from RFC v1: > > 1. Fix memory leak in __blkdev_issue_copy > 2. Unmark blk_check_copy inline > 3. Fix line break in blk_check_copy_eod > 4. Remove p checks and make code more readable > 5. Don't use bio_set_op_attrs and remove op and set > bi_opf directly > 6. Use struct_size to calculate total_size > 7. Fix partition remap of copy destination > 8. Remove mcl,mssrl,msrc from nvme_ns > 9. Initialize copy queue limits to 0 in nvme_config_copy > 10. Remove return in QUEUE_FLAG_COPY check > 11. Remove unused OCFS > > > Nitesh Shetty (4): > block: Introduce queue limits for copy-offload support > block: copy offload support infrastructure > block: Introduce a new ioctl for simple copy > block: add emulation for simple copy > > SelvaKumar S (3): > block: make bio_map_kern() non static > nvme: add simple copy support > dm kcopyd: add simple copy offload support > > block/blk-core.c | 84 ++++++++- > block/blk-lib.c | 352 ++++++++++++++++++++++++++++++++++++++ > block/blk-map.c | 2 +- > block/blk-settings.c | 4 + > block/blk-sysfs.c | 51 ++++++ > block/blk-zoned.c | 1 + > block/bounce.c | 1 + > block/ioctl.c | 33 ++++ > drivers/md/dm-kcopyd.c | 56 +++++- > drivers/nvme/host/core.c | 83 +++++++++ > drivers/nvme/host/trace.c | 19 ++ > include/linux/bio.h | 1 + > include/linux/blk_types.h | 20 +++ > include/linux/blkdev.h | 21 +++ > include/linux/nvme.h | 43 ++++- > include/uapi/linux/fs.h | 20 +++ > 16 files changed, 775 insertions(+), 16 deletions(-) > > -- > 2.25.1 > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
* Re: [dm-devel] [PATCH 0/7] add simple copy support 2021-08-17 23:37 ` [dm-devel] [PATCH 0/7] add simple copy support Darrick J. Wong @ 2021-08-18 15:40 ` Nitesh Shetty 0 siblings, 0 replies; 30+ messages in thread From: Nitesh Shetty @ 2021-08-18 15:40 UTC (permalink / raw) To: Darrick J. Wong Cc: snitzer, linux-nvme, dm-devel, hch, agk, bvanassche, linux-scsi, willy, nj.shetty, kch, SelvaKumar S, selvajove, linux-block, mpatocka, javier.gonz, kbusch, axboe, damien.lemoal, joshi.k, martin.petersen, linux-api, johannes.thumshirn, linux-fsdevel, joshiiitr, asml.silence On Wed, Aug 18, 2021 at 5:07 AM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Aug 17, 2021 at 03:44:16PM +0530, SelvaKumar S wrote: > > This started out as an attempt to support NVMe Simple Copy Command (SCC), > > and evolved during the RFC review process. > > > > The patchset, at this point, contains - > > 1. SCC support in NVMe driver > > 2. Block-layer infra for copy-offload operation > > 3. ioctl interface to user-space > > 4. copy-emulation infra in the block-layer > > 5. copy-offload plumbing to dm-kcopyd (thus creating a couple of in-kernel > > users such as dm-clone) > > > > > > The SCC specification, i.e. TP4065a, can be found at the following link > > > > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip > > > > Simple copy is a copy offloading feature and can be used to copy multiple > > contiguous ranges (source_ranges) of LBAs to a single destination LBA > > within the device, reducing traffic between host and device. > > > > We define a block ioctl for copy and a copy payload format similar to > > discard. For devices supporting native simple copy, we attach the control > > information as payload to the bio and submit it to the device. Copy emulation > > is implemented in case the underlying device does not support copy offload or > > based on sysfs choice. Copy emulation is done by reading each source range > > into a buffer and writing it to the destination. 
> > Seems useful. Would you mind adapting the loop driver to call > copy_file_range (for files) so that anyone interested in making a > filesystem use this capability (cough) can write fstests? We are planning to look into copy_file_range plumbing after settling on the current series, which has already become heavyweight for the first drop. Nitesh Shetty > > --D > > > At present this implementation does not support copy offload for stacked/dm > > devices, rather the copy operation is completed through emulation. > > > > One of the in-kernel use cases for copy-offload is implemented by this > > patchset. dm-kcopyd infra has been changed to leverage the copy-offload if > > it is natively available. Other future use-cases could be F2FS GC, BTRFS > > relocation/balance and copy_file_range. > > > > The following limits are added to queue limits and are exposed via sysfs to > > userspace, which the user can use to form a payload. > > > > - copy_offload: > > configurable, can be used to select emulation or copy offload: > > 0 to disable copy offload, > > 1 to enable copy offloading support. Offload can only be > > enabled if the underlying device supports offload > > > > - max_copy_sectors: > > total copy length supported by the copy offload feature in the device. > > 0 indicates copy offload is not supported. > > > > - max_copy_nr_ranges: > > maximum number of source range entries supported by the copy offload > > feature in the device > > > > - max_copy_range_sectors: > > maximum copy length per source range entry > > > > *blkdev_issue_copy* takes the source bdev, number of sources, array of source > > ranges (in sectors), destination bdev and destination offset (in sectors). > > If both source and destination block devices are the same and the queue parameter > > copy_offload is 1, then the copy is done through native copy offloading. > > Copy emulation is used otherwise. > > > > Changes from RFC v5 > > > > 1. Handle copies larger than the maximum copy limits > > 2. Create copy context and submit copy offload asynchronously > > 3. 
Remove the BLKDEV_COPY_NOEMULATION opt-in option for copy offload and > > check for copy support before submission from dm and other layers > > 4. Allocate the maximum possible allocatable buffer for copy emulation > > rather than failing very large copy offloads. > > 5. Fix copy_offload sysfs to have either 0 or 1 > > > > Changes from RFC v4 > > > > 1. Extend dm-kcopyd to leverage copy-offload while copying within the > > same device. The other approach was to have copy-emulation by moving > > dm-kcopyd to the block layer. But it also required moving core dm-io infra, > > causing a massive churn across multiple dm-targets. > > 2. Remove export in bio_map_kern() > > 3. Change copy_offload sysfs to accept 0 or else > > 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY > > 5. Rename payload entries, add source bdev field to be used while > > partition remapping, remove copy_size > > 6. Change the blkdev_issue_copy() interface to accept destination and > > source values in sectors rather than in bytes > > 7. Add payload to bio using bio_map_kern() for the copy_offload case > > 8. Add check to return error if one of the source range lengths is 0 > > 9. Add BLKDEV_COPY_NOEMULATION flag to allow the user to skip copy > > emulation in case copy offload is not supported. The caller can then use > > its existing copying logic to complete the I/O. > > 10. Fix bugs in copy checks and reduce the size of rcu_lock() > > > > Changes from RFC v3 > > > > 1. gfp_flag fixes. > > 2. Export bio_map_kern() and use it to allocate and add pages to bio. > > 3. Move copy offload, reading to buf, writing from buf to separate functions > > 4. Send read bios of copy offload by chaining them and submit asynchronously > > 5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy(). > > 6. Move single source range limit check to blk_check_copy() > > 7. Rename __blkdev_issue_copy() to blkdev_issue_copy() and remove the old helper. > > 8. 
Make the blkdev_issue_copy() interface generic to accept a destination bdev > > to support XCOPY as well. > > 9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory. > > 10. Fix buf allocation logic to allocate a buffer for the total size of the copy. > > 11. Reword patch commit description. > > > > Changes from RFC v2 > > > > 1. Add emulation support for devices not supporting copy. > > 2. Add *copy_offload* sysfs entry to enable and disable copy_offload > > in devices supporting simple copy. > > 3. Remove simple copy support for stacked devices. > > > > Changes from RFC v1: > > > > 1. Fix memory leak in __blkdev_issue_copy > > 2. Unmark blk_check_copy inline > > 3. Fix line break in blk_check_copy_eod > > 4. Remove p checks and make code more readable > > 5. Don't use bio_set_op_attrs and remove op and set > > bi_opf directly > > 6. Use struct_size to calculate total_size > > 7. Fix partition remap of copy destination > > 8. Remove mcl,mssrl,msrc from nvme_ns > > 9. Initialize copy queue limits to 0 in nvme_config_copy > > 10. Remove return in QUEUE_FLAG_COPY check > > 11. 
Remove unused OCFS > > > > > > Nitesh Shetty (4): > > block: Introduce queue limits for copy-offload support > > block: copy offload support infrastructure > > block: Introduce a new ioctl for simple copy > > block: add emulation for simple copy > > > > SelvaKumar S (3): > > block: make bio_map_kern() non static > > nvme: add simple copy support > > dm kcopyd: add simple copy offload support > > > > block/blk-core.c | 84 ++++++++- > > block/blk-lib.c | 352 ++++++++++++++++++++++++++++++++++++++ > > block/blk-map.c | 2 +- > > block/blk-settings.c | 4 + > > block/blk-sysfs.c | 51 ++++++ > > block/blk-zoned.c | 1 + > > block/bounce.c | 1 + > > block/ioctl.c | 33 ++++ > > drivers/md/dm-kcopyd.c | 56 +++++- > > drivers/nvme/host/core.c | 83 +++++++++ > > drivers/nvme/host/trace.c | 19 ++ > > include/linux/bio.h | 1 + > > include/linux/blk_types.h | 20 +++ > > include/linux/blkdev.h | 21 +++ > > include/linux/nvme.h | 43 ++++- > > include/uapi/linux/fs.h | 20 +++ > > 16 files changed, 775 insertions(+), 16 deletions(-) > > > > -- > > 2.25.1
end of thread, other threads:[~2021-08-30 6:53 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20210817101741epcas5p174ca0a539587da6a67b9f58cd13f2bad@epcas5p1.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 0/7] add simple copy support SelvaKumar S [not found] ` <CGME20210817101747epcas5p1242e63ec29b127b03b6f9f5f1b57f86e@epcas5p1.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 1/7] block: make bio_map_kern() non static SelvaKumar S [not found] ` <CGME20210817101753epcas5p4f4257f8edda27e184ecbb273b700ccbc@epcas5p4.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 2/7] block: Introduce queue limits for copy-offload support SelvaKumar S 2021-08-17 13:08 ` Greg KH 2021-08-17 14:42 ` Nitesh Shetty [not found] ` <CGME20210817101758epcas5p1ec353b3838d64654e69488229256d9eb@epcas5p1.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 3/7] block: copy offload support infrastructure SelvaKumar S 2021-08-17 17:14 ` Bart Van Assche 2021-08-17 20:41 ` Mikulas Patocka 2021-08-17 21:53 ` Douglas Gilbert 2021-08-17 22:06 ` Bart Van Assche 2021-08-20 10:39 ` Kanchan Joshi 2021-08-20 21:18 ` Bart Van Assche 2021-08-26 7:46 ` Nitesh Shetty 2021-08-17 20:35 ` kernel test robot 2021-08-18 18:35 ` Martin K. Petersen 2021-08-20 11:11 ` Kanchan Joshi [not found] ` <CGME20210817101803epcas5p10cda1d52f8a8f1172e34b1f9cf8eef3b@epcas5p1.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 4/7] block: Introduce a new ioctl for simple copy SelvaKumar S 2021-08-17 13:09 ` Greg KH 2021-08-17 13:10 ` Greg KH 2021-08-17 14:48 ` Nitesh Shetty 2021-08-17 23:36 ` Darrick J. Wong 2021-08-18 15:37 ` Nitesh Shetty 2021-08-18 16:17 ` Darrick J. 
Wong [not found] ` <CGME20210817101809epcas5p39eed3531ed82f5f08127eb3dba1fc50f@epcas5p3.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 5/7] block: add emulation " SelvaKumar S 2021-08-17 22:10 ` kernel test robot [not found] ` <CGME20210817101814epcas5p41db3d7269f5139efcaf2ca685cd04a16@epcas5p4.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 6/7] nvme: add simple copy support SelvaKumar S [not found] ` <CGME20210817101822epcas5p470644cf681d5e8db5367dc7998305c65@epcas5p4.samsung.com> 2021-08-17 10:14 ` [dm-devel] [PATCH 7/7] dm kcopyd: add simple copy offload support SelvaKumar S 2021-08-17 20:29 ` Mikulas Patocka 2021-08-17 23:37 ` [dm-devel] [PATCH 0/7] add simple copy support Darrick J. Wong 2021-08-18 15:40 ` Nitesh Shetty