* [PATCH v4 00/10] Add Copy offload support
From: Nitesh Shetty @ 2022-04-26 10:12 UTC
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

The patch series covers the points discussed in the November 2021 virtual call
[LSF/MM/BPF TOPIC] Storage: Copy Offload [0].
We have covered the initially agreed requirements in this patchset.
The patchset borrows Mikulas's token-based approach for the two-bdev
implementation.

Overall, the series supports:

1. Driver
- NVMe Copy command (single NS), including support in nvme-target (for
    block and file backends)

2. Block layer
- Block-generic copy (REQ_COPY flag), with an interface accommodating
    two block devices and multiple source/destination ranges
- Emulation, when native offload is absent
- dm-linear support (for cases not requiring a split)

3. User-interface
- new ioctl
- copy_file_range for zonefs

4. In-kernel user
- dm-kcopyd
- copy_file_range in zonefs

For zonefs copy_file_range, it seems we cannot leverage fstests here. Limited
testing has been done so far, using a custom application for unit testing; a
minimal sketch of such an application is shown below.
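
As an illustration only (this program is not part of the series), such an
application can exercise the zonefs path through the existing
copy_file_range(2) system call; the mount point, file names, open flags and
copy length below are placeholders:

	/* Illustrative zonefs copy test; paths and sizes are placeholders. */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/types.h>
	#include <unistd.h>

	int main(void)
	{
		int in = open("/mnt/zonefs/seq/0", O_RDONLY);
		int out = open("/mnt/zonefs/seq/1", O_WRONLY);
		loff_t off_in = 0, off_out = 0;
		ssize_t ret;

		if (in < 0 || out < 0) {
			perror("open");
			return EXIT_FAILURE;
		}
		/* ask the kernel to copy 1 MiB between the two zone files */
		ret = copy_file_range(in, &off_in, out, &off_out, 1024 * 1024, 0);
		if (ret < 0)
			perror("copy_file_range");
		else
			printf("copied %zd bytes\n", ret);
		close(in);
		close(out);
		return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
	}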

We would appreciate input on the plumbing and on how to test this further.
Perhaps some of it can be discussed during LSF/MM too.

[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/

Changes in v4:
- added copy_file_range support for zonefs
- added documentation about the new sysfs entries
- incorporated review comments on v3
- minor fixes


Arnav Dawn (2):
  nvmet: add copy command support for bdev and file ns
  fs: add support for copy file range in zonefs

Nitesh Shetty (7):
  block: Introduce queue limits for copy-offload support
  block: Add copy offload support infrastructure
  block: Introduce a new ioctl for copy
  block: add emulation for copy
  nvme: add copy offload support
  dm: Add support for copy offload.
  dm: Enable copy offload for dm-linear target

SelvaKumar S (1):
  dm kcopyd: use copy offload support

 Documentation/ABI/stable/sysfs-block |  83 +++++++
 block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
 block/blk-map.c                      |   2 +-
 block/blk-settings.c                 |  59 +++++
 block/blk-sysfs.c                    | 138 +++++++++++
 block/blk.h                          |   2 +
 block/ioctl.c                        |  32 +++
 drivers/md/dm-kcopyd.c               |  55 +++-
 drivers/md/dm-linear.c               |   1 +
 drivers/md/dm-table.c                |  45 ++++
 drivers/md/dm.c                      |   6 +
 drivers/nvme/host/core.c             | 116 ++++++++-
 drivers/nvme/host/fc.c               |   4 +
 drivers/nvme/host/nvme.h             |   7 +
 drivers/nvme/host/pci.c              |  25 ++
 drivers/nvme/host/rdma.c             |   6 +
 drivers/nvme/host/tcp.c              |  14 ++
 drivers/nvme/host/trace.c            |  19 ++
 drivers/nvme/target/admin-cmd.c      |   8 +-
 drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
 drivers/nvme/target/io-cmd-file.c    |  49 ++++
 fs/zonefs/super.c                    | 178 ++++++++++++-
 fs/zonefs/zonefs.h                   |   1 +
 include/linux/blk_types.h            |  21 ++
 include/linux/blkdev.h               |  17 ++
 include/linux/device-mapper.h        |   5 +
 include/linux/nvme.h                 |  43 +++-
 include/uapi/linux/fs.h              |  23 ++
 28 files changed, 1367 insertions(+), 15 deletions(-)


base-commit: e7d6987e09a328d4a949701db40ef63fbb970670
-- 
2.35.1.500.gb896f729e2


* [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
From: Nitesh Shetty @ 2022-04-26 10:12 UTC
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Kanchan Joshi, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

Add the following device limits as sysfs entries:
        - copy_offload (RW)
        - copy_max_bytes (RW)
        - copy_max_hw_bytes (RO)
        - copy_max_range_bytes (RW)
        - copy_max_range_hw_bytes (RO)
        - copy_max_nr_ranges (RW)
        - copy_max_nr_ranges_hw (RO)

The above limits help the block layer split the copy payload.
copy_offload: selects copy offload (1) or emulation (0).
copy_max_bytes: maximum total length of a copy in a single payload.
copy_max_range_bytes: maximum length of a single range entry.
copy_max_nr_ranges: maximum number of range entries in a payload.
copy_max_*_hw_*: reflect the device-supported maximum limits.
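
Below is a minimal, illustrative sketch (not part of this patch) of how a
block driver might advertise these limits with the helpers added here; the
function name and numeric values are made up for illustration:

	/* Hypothetical driver setup; the limit values are placeholders. */
	static void example_setup_copy_limits(struct request_queue *q)
	{
		/* up to 16 MiB (32768 512-byte sectors) per copy payload */
		blk_queue_max_copy_sectors(q, 32768);
		/* each range entry limited to 1 MiB (2048 sectors) */
		blk_queue_max_copy_range_sectors(q, 2048);
		/* at most 128 range entries per copy payload */
		blk_queue_max_copy_nr_ranges(q, 128);
		/* enable offload; users may toggle it later through
		 * /sys/block/<disk>/queue/copy_offload
		 */
		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
	}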

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
 block/blk-settings.c                 |  59 ++++++++++++
 block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
 include/linux/blkdev.h               |  13 +++
 4 files changed, 293 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..65e64b5a0105 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -155,6 +155,89 @@ Description:
 		last zone of the device which may be smaller.
 
 
+What:		/sys/block/<disk>/queue/copy_offload
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file shows whether offloading copy to
+		device is enabled (1) or disabled (0). Writing '0' to this
+		file will disable offloading copies for this device.
+		Writing any '1' value will enable this feature.
+
+
+What:		/sys/block/<disk>/queue/copy_max_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
+		device, 'copy_max_bytes' setting is the software limit.
+		Setting this value lower will make Linux issue smaller size
+		copies.
+
+
+What:		/sys/block/<disk>/queue/copy_max_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of bytes that can be offloaded
+		in a single operation. The `copy_max_hw_bytes`
+		parameter is set by the device driver to the maximum number of
+		bytes that can be copied in a single operation. Copy
+		requests issued to the device must not exceed this limit.
+		A value of 0 means that the device does not
+		support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
+		device, 'copy_max_nr_ranges' setting is the software limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of ranges that can be offloaded
+		in a single copy operation.
+		A range is a tuple of source, destination and length of data
+		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
+		the device driver to the maximum number of ranges that can be
+		copied in a single operation. Copy requests issued to the device
+		must not exceed this limit. A value of 0 means that the device
+		does not support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
+		the device, 'copy_max_range_bytes' setting is the software
+		limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the size of data that can be copied in a
+		single range within a single copy operation.
+		A range is a tuple of source, destination and length of data to be
+		copied. The `copy_max_range_hw_bytes` parameter is set by the
+		device driver to the maximum length in bytes of a range
+		that can be copied in an operation.
+		Copy requests issued to the device must not exceed this limit.
+		Sum of sizes of all ranges in a single operation should not
+		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
+		does not support copy offload.
+
+
 What:		/sys/block/<disk>/queue/crypto/
 Date:		February 2022
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6ccceb421ed2..70167aee3bf7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->max_hw_copy_sectors = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_hw_copy_nr_ranges = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_hw_copy_range_sectors = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_hw_copy_sectors = ULONG_MAX;
+	lim->max_copy_sectors = ULONG_MAX;
+	lim->max_hw_copy_range_sectors = UINT_MAX;
+	lim->max_copy_range_sectors = UINT_MAX;
+	lim->max_hw_copy_nr_ranges = USHRT_MAX;
+	lim->max_copy_nr_ranges = USHRT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
+/**
+ * blk_queue_max_copy_sectors - set max sectors for a single copy payload
+ * @q:  the request queue for the device
+ * @max_copy_sectors: maximum number of sectors to copy
+ **/
+void blk_queue_max_copy_sectors(struct request_queue *q,
+		unsigned int max_copy_sectors)
+{
+	q->limits.max_hw_copy_sectors = max_copy_sectors;
+	q->limits.max_copy_sectors = max_copy_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
+
+/**
+ * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_range_sectors: maximum number of sectors to copy in a single range
+ **/
+void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors)
+{
+	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
+	q->limits.max_copy_range_sectors = max_copy_range_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
+
+/**
+ * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_nr_ranges: maximum number of ranges
+ **/
+void blk_queue_max_copy_nr_ranges(struct request_queue *q,
+		unsigned int max_copy_nr_ranges)
+{
+	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
+	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
+
 /**
  * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
  * @q:  the request queue for the device
@@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
+	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
+	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
+	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
+						b->max_hw_copy_range_sectors);
+	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
+	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..bae987c10f7f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_copy(q), page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && !q->limits.max_hw_copy_sectors)
+		return -EINVAL;
+
+	if (copy_offload)
+		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+	else
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+
+	return ret;
+}
+
+static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > q->limits.max_hw_copy_sectors)
+		max_copy = q->limits.max_hw_copy_sectors;
+
+	q->limits.max_copy_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_show(struct request_queue *q,
+		char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > UINT_MAX)
+		return -EINVAL;
+
+	if (max_copy > q->limits.max_hw_copy_range_sectors)
+		max_copy = q->limits.max_hw_copy_range_sectors;
+
+	q->limits.max_copy_range_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_hw_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_nr;
+	ssize_t ret = queue_var_store(&max_nr, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_nr > USHRT_MAX)
+		return -EINVAL;
+
+	if (max_nr > q->limits.max_hw_copy_nr_ranges)
+		max_nr = q->limits.max_hw_copy_nr_ranges;
+
+	q->limits.max_copy_nr_ranges = max_nr;
+	return ret;
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -596,6 +719,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
+QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes");
+QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw");
+QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -642,6 +773,13 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_copy_max_hw_entry.attr,
+	&queue_copy_max_entry.attr,
+	&queue_copy_range_max_hw_entry.attr,
+	&queue_copy_range_max_entry.attr,
+	&queue_copy_nr_ranges_max_hw_entry.attr,
+	&queue_copy_nr_ranges_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b24c1fb3bb1..3596fd37fae7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -270,6 +270,13 @@ struct queue_limits {
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
 
+	unsigned long		max_hw_copy_sectors;
+	unsigned long		max_copy_sectors;
+	unsigned int		max_hw_copy_range_sectors;
+	unsigned int		max_copy_range_sectors;
+	unsigned short		max_hw_copy_nr_ranges;
+	unsigned short		max_copy_nr_ranges;
+
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
@@ -574,6 +581,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_COPY		30	/* supports copy offload */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -596,6 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
@@ -960,6 +969,10 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_discard_segments(struct request_queue *,
 		unsigned short);
+extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors);
+extern void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors);
+extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges);
 void blk_queue_max_secure_erase_sectors(struct request_queue *q,
 		unsigned int max_sectors);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Kanchan Joshi, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

Add device limits as sysfs entries,
        - copy_offload (RW)
        - copy_max_bytes (RW)
        - copy_max_hw_bytes (RO)
        - copy_max_range_bytes (RW)
        - copy_max_range_hw_bytes (RO)
        - copy_max_nr_ranges (RW)
        - copy_max_nr_ranges_hw (RO)

Above limits help to split the copy payload in block layer.
copy_offload, used for setting copy offload(1) or emulation(0).
copy_max_bytes: maximum total length of copy in single payload.
copy_max_range_bytes: maximum length in a single entry.
copy_max_nr_ranges: maximum number of entries in a payload.
copy_max_*_hw_*: Reflects the device supported maximum limits.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
 block/blk-settings.c                 |  59 ++++++++++++
 block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
 include/linux/blkdev.h               |  13 +++
 4 files changed, 293 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..65e64b5a0105 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -155,6 +155,89 @@ Description:
 		last zone of the device which may be smaller.
 
 
+What:		/sys/block/<disk>/queue/copy_offload
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file shows whether offloading copy to
+		device is enabled (1) or disabled (0). Writing '0' to this
+		file will disable offloading copies for this device.
+		Writing any '1' value will enable this feature.
+
+
+What:		/sys/block/<disk>/queue/copy_max_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
+		device, 'copy_max_bytes' setting is the software limit.
+		Setting this value lower will make Linux issue smaller size
+		copies.
+
+
+What:		/sys/block/<disk>/queue/copy_max_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of bytes that can be offloaded
+		in a single operation. The `copy_max_hw_bytes`
+		parameter is set by the device driver to the maximum number of
+		bytes that can be copied in a single operation. Copy
+		requests issued to the device must not exceed this limit.
+		A value of 0 means that the device does not
+		support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
+		device, 'copy_max_nr_ranges' setting is the software limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of ranges in single copy operation
+		that can be offloaded in a single operation.
+		A range is tuple of source, destination and length of data
+		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
+		the device driver to the maximum number of ranges that can be
+		copied in a single operation. Copy requests issued to the device
+		must not exceed this limit. A value of 0 means that the device
+		does not support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
+		the device, 'copy_max_range_bytes' setting is the software
+		limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the size of data, that can be copied in a
+		single range within a single copy operation.
+		A range is tuple of source, destination and length of data to be
+		copied. The `copy_max_range_hw_bytes` parameter is set by the
+		device driver to set the maximum length in bytes of a range
+		that can be copied in an operation.
+		Copy requests issued to the device must not exceed this limit.
+		Sum of sizes of all ranges in a single opeartion should not
+		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
+		does not support copy offload.
+
+
 What:		/sys/block/<disk>/queue/crypto/
 Date:		February 2022
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6ccceb421ed2..70167aee3bf7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->max_hw_copy_sectors = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_hw_copy_nr_ranges = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_hw_copy_range_sectors = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_hw_copy_sectors = ULONG_MAX;
+	lim->max_copy_sectors = ULONG_MAX;
+	lim->max_hw_copy_range_sectors = UINT_MAX;
+	lim->max_copy_range_sectors = UINT_MAX;
+	lim->max_hw_copy_nr_ranges = USHRT_MAX;
+	lim->max_copy_nr_ranges = USHRT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
+/**
+ * blk_queue_max_copy_sectors - set max sectors for a single copy payload
+ * @q:  the request queue for the device
+ * @max_copy_sectors: maximum number of sectors to copy
+ **/
+void blk_queue_max_copy_sectors(struct request_queue *q,
+		unsigned int max_copy_sectors)
+{
+	q->limits.max_hw_copy_sectors = max_copy_sectors;
+	q->limits.max_copy_sectors = max_copy_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
+
+/**
+ * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_range_sectors: maximum number of sectors to copy in a single range
+ **/
+void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors)
+{
+	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
+	q->limits.max_copy_range_sectors = max_copy_range_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
+
+/**
+ * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_nr_ranges: maximum number of ranges
+ **/
+void blk_queue_max_copy_nr_ranges(struct request_queue *q,
+		unsigned int max_copy_nr_ranges)
+{
+	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
+	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
+
 /**
  * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
  * @q:  the request queue for the device
@@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
+	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
+	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
+	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
+						b->max_hw_copy_range_sectors);
+	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
+	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..bae987c10f7f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_copy(q), page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && !q->limits.max_hw_copy_sectors)
+		return -EINVAL;
+
+	if (copy_offload)
+		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+	else
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+
+	return ret;
+}
+
+static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > q->limits.max_hw_copy_sectors)
+		max_copy = q->limits.max_hw_copy_sectors;
+
+	q->limits.max_copy_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_show(struct request_queue *q,
+		char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > UINT_MAX)
+		return -EINVAL;
+
+	if (max_copy > q->limits.max_hw_copy_range_sectors)
+		max_copy = q->limits.max_hw_copy_range_sectors;
+
+	q->limits.max_copy_range_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_hw_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_nr;
+	ssize_t ret = queue_var_store(&max_nr, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_nr > USHRT_MAX)
+		return -EINVAL;
+
+	if (max_nr > q->limits.max_hw_copy_nr_ranges)
+		max_nr = q->limits.max_hw_copy_nr_ranges;
+
+	q->limits.max_copy_nr_ranges = max_nr;
+	return ret;
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -596,6 +719,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
+QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes");
+QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw");
+QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -642,6 +773,13 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_copy_max_hw_entry.attr,
+	&queue_copy_max_entry.attr,
+	&queue_copy_range_max_hw_entry.attr,
+	&queue_copy_range_max_entry.attr,
+	&queue_copy_nr_ranges_max_hw_entry.attr,
+	&queue_copy_nr_ranges_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b24c1fb3bb1..3596fd37fae7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -270,6 +270,13 @@ struct queue_limits {
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
 
+	unsigned long		max_hw_copy_sectors;
+	unsigned long		max_copy_sectors;
+	unsigned int		max_hw_copy_range_sectors;
+	unsigned int		max_copy_range_sectors;
+	unsigned short		max_hw_copy_nr_ranges;
+	unsigned short		max_copy_nr_ranges;
+
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
@@ -574,6 +581,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_COPY		30	/* supports copy offload */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -596,6 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
@@ -960,6 +969,10 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_discard_segments(struct request_queue *,
 		unsigned short);
+extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors);
+extern void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors);
+extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges);
 void blk_queue_max_secure_erase_sectors(struct request_queue *q,
 		unsigned int max_sectors);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
-- 
2.35.1.500.gb896f729e2



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, Nitesh Shetty, chaitanyak,
	Chaitanya Kulkarni, Mike Snitzer, josef, linux-block, dsterba,
	kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, Kanchan Joshi, martin.petersen,
	linux-kernel, Arnav Dawn, jack, linux-fsdevel, lsf-pc,
	Damien Le Moal, Alexander Viro

Add device limits as sysfs entries,
        - copy_offload (RW)
        - copy_max_bytes (RW)
        - copy_max_hw_bytes (RO)
        - copy_max_range_bytes (RW)
        - copy_max_range_hw_bytes (RO)
        - copy_max_nr_ranges (RW)
        - copy_max_nr_ranges_hw (RO)

Above limits help to split the copy payload in block layer.
copy_offload, used for setting copy offload(1) or emulation(0).
copy_max_bytes: maximum total length of copy in single payload.
copy_max_range_bytes: maximum length in a single entry.
copy_max_nr_ranges: maximum number of entries in a payload.
copy_max_*_hw_*: Reflects the device supported maximum limits.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
 block/blk-settings.c                 |  59 ++++++++++++
 block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
 include/linux/blkdev.h               |  13 +++
 4 files changed, 293 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..65e64b5a0105 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -155,6 +155,89 @@ Description:
 		last zone of the device which may be smaller.
 
 
+What:		/sys/block/<disk>/queue/copy_offload
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file shows whether offloading copy to
+		device is enabled (1) or disabled (0). Writing '0' to this
+		file will disable offloading copies for this device.
+		Writing any '1' value will enable this feature.
+
+
+What:		/sys/block/<disk>/queue/copy_max_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
+		device, 'copy_max_bytes' setting is the software limit.
+		Setting this value lower will make Linux issue smaller size
+		copies.
+
+
+What:		/sys/block/<disk>/queue/copy_max_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of bytes that can be offloaded
+		in a single operation. The `copy_max_hw_bytes`
+		parameter is set by the device driver to the maximum number of
+		bytes that can be copied in a single operation. Copy
+		requests issued to the device must not exceed this limit.
+		A value of 0 means that the device does not
+		support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
+		device, 'copy_max_nr_ranges' setting is the software limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the number of ranges in single copy operation
+		that can be offloaded in a single operation.
+		A range is tuple of source, destination and length of data
+		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
+		the device driver to the maximum number of ranges that can be
+		copied in a single operation. Copy requests issued to the device
+		must not exceed this limit. A value of 0 means that the device
+		does not support copy offload.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
+		the device, 'copy_max_range_bytes' setting is the software
+		limit.
+
+
+What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
+Date:		April 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support offloading copy functionality may have
+		internal limits on the size of data, that can be copied in a
+		single range within a single copy operation.
+		A range is tuple of source, destination and length of data to be
+		copied. The `copy_max_range_hw_bytes` parameter is set by the
+		device driver to set the maximum length in bytes of a range
+		that can be copied in an operation.
+		Copy requests issued to the device must not exceed this limit.
+		Sum of sizes of all ranges in a single opeartion should not
+		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
+		does not support copy offload.
+
+
 What:		/sys/block/<disk>/queue/crypto/
 Date:		February 2022
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6ccceb421ed2..70167aee3bf7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->max_hw_copy_sectors = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_hw_copy_nr_ranges = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_hw_copy_range_sectors = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	lim->max_hw_copy_sectors = ULONG_MAX;
+	lim->max_copy_sectors = ULONG_MAX;
+	lim->max_hw_copy_range_sectors = UINT_MAX;
+	lim->max_copy_range_sectors = UINT_MAX;
+	lim->max_hw_copy_nr_ranges = USHRT_MAX;
+	lim->max_copy_nr_ranges = USHRT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
+/**
+ * blk_queue_max_copy_sectors - set max sectors for a single copy payload
+ * @q:  the request queue for the device
+ * @max_copy_sectors: maximum number of sectors to copy
+ **/
+void blk_queue_max_copy_sectors(struct request_queue *q,
+		unsigned int max_copy_sectors)
+{
+	q->limits.max_hw_copy_sectors = max_copy_sectors;
+	q->limits.max_copy_sectors = max_copy_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
+
+/**
+ * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_range_sectors: maximum number of sectors to copy in a single range
+ **/
+void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors)
+{
+	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
+	q->limits.max_copy_range_sectors = max_copy_range_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
+
+/**
+ * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
+ * @q:  the request queue for the device
+ * @max_copy_nr_ranges: maximum number of ranges
+ **/
+void blk_queue_max_copy_nr_ranges(struct request_queue *q,
+		unsigned int max_copy_nr_ranges)
+{
+	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
+	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
+
 /**
  * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
  * @q:  the request queue for the device
@@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
+	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
+	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
+	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
+						b->max_hw_copy_range_sectors);
+	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
+	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..bae987c10f7f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(0, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_copy(q), page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && !q->limits.max_hw_copy_sectors)
+		return -EINVAL;
+
+	if (copy_offload)
+		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+	else
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+
+	return ret;
+}
+
+static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_sectors << 9);
+}
+
+static ssize_t queue_copy_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > q->limits.max_hw_copy_sectors)
+		max_copy = q->limits.max_hw_copy_sectors;
+
+	q->limits.max_copy_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_show(struct request_queue *q,
+		char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_copy_range_sectors << 9);
+}
+
+static ssize_t queue_copy_range_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_copy;
+	ssize_t ret = queue_var_store(&max_copy, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_copy & (queue_logical_block_size(q) - 1))
+		return -EINVAL;
+
+	max_copy >>= 9;
+	if (max_copy > UINT_MAX)
+		return -EINVAL;
+
+	if (max_copy > q->limits.max_hw_copy_range_sectors)
+		max_copy = q->limits.max_hw_copy_range_sectors;
+
+	q->limits.max_copy_range_sectors = max_copy;
+	return ret;
+}
+
+static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_hw_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
+static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_nr;
+	ssize_t ret = queue_var_store(&max_nr, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_nr > USHRT_MAX)
+		return -EINVAL;
+
+	if (max_nr > q->limits.max_hw_copy_nr_ranges)
+		max_nr = q->limits.max_hw_copy_nr_ranges;
+
+	q->limits.max_copy_nr_ranges = max_nr;
+	return ret;
+}
+
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(0, page);
@@ -596,6 +719,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
+QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes");
+QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes");
+QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw");
+QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -642,6 +773,13 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_copy_max_hw_entry.attr,
+	&queue_copy_max_entry.attr,
+	&queue_copy_range_max_hw_entry.attr,
+	&queue_copy_range_max_entry.attr,
+	&queue_copy_nr_ranges_max_hw_entry.attr,
+	&queue_copy_nr_ranges_max_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b24c1fb3bb1..3596fd37fae7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -270,6 +270,13 @@ struct queue_limits {
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
 
+	unsigned long		max_hw_copy_sectors;
+	unsigned long		max_copy_sectors;
+	unsigned int		max_hw_copy_range_sectors;
+	unsigned int		max_copy_range_sectors;
+	unsigned short		max_hw_copy_nr_ranges;
+	unsigned short		max_copy_nr_ranges;
+
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
@@ -574,6 +581,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_COPY		30	/* supports copy offload */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -596,6 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
@@ -960,6 +969,10 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_discard_segments(struct request_queue *,
 		unsigned short);
+extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors);
+extern void blk_queue_max_copy_range_sectors(struct request_queue *q,
+		unsigned int max_copy_range_sectors);
+extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges);
 void blk_queue_max_secure_erase_sectors(struct request_queue *q,
 		unsigned int max_sectors);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 02/10] block: Add copy offload support infrastructure
       [not found]   ` <CGME20220426101921epcas5p341707619b5e836490284a42c92762083@epcas5p3.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Arnav Dawn, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Alexander Viro, linux-kernel

Introduce blkdev_issue_copy, which supports source and destination bdevs
and an array of (source, destination, copy length) tuples.
Introduce the REQ_COPY copy offload operation flag. A read-write bio pair
is created with a token as payload and submitted to the device in order.
The read request populates the token with source-specific information,
which is then passed along with the write request.
This design follows Mikulas Patocka's token-based copy approach.

Larger copies are split according to the max_copy_sectors and
max_copy_range_sectors limits.
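
A minimal sketch of an in-kernel caller is shown below for reference. The
wrapper copy_one_range() and the fixed offsets/lengths are illustrative
only, and with this patch alone the call succeeds only when both request
queues advertise copy offload (a later patch adds an emulation fallback).

#include <linux/blkdev.h>
#include <linux/fs.h>

/* Illustrative helper, not part of this series. */
static int copy_one_range(struct block_device *src_bdev,
			  struct block_device *dst_bdev)
{
	struct range_entry range = {
		.src = 0,		/* byte offset on the source bdev */
		.dst = 1 << 20,		/* byte offset on the destination bdev */
		.len = 1 << 16,		/* copy length in bytes */
		.comp_len = 0,		/* filled in with the completed length */
	};
	int ret;

	ret = blkdev_issue_copy(src_bdev, 1, &range, dst_bdev, GFP_KERNEL);
	if (ret)
		pr_err("copy failed: %d (copied %llu of %llu bytes)\n",
		       ret, range.comp_len, range.len);
	return ret;
}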

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
 block/blk.h               |   2 +
 include/linux/blk_types.h |  21 ++++
 include/linux/blkdev.h    |   2 +
 include/uapi/linux/fs.h   |  14 +++
 5 files changed, 271 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 09b7e1200c0f..ba9da2d2f429 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
 
+/*
+ * Wait on and process all in-flight BIOs.  This must only be called once
+ * all bios have been issued so that the refcount can only decrease.
+ * This just waits for all bios to make it through bio_copy_end_io. IO
+ * errors are propagated through cio->io_err.
+ */
+static int cio_await_completion(struct cio *cio)
+{
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cio->lock, flags);
+	if (cio->refcount) {
+		cio->waiter = current;
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock_irqrestore(&cio->lock, flags);
+		blk_io_schedule();
+		/* wake up sets us TASK_RUNNING */
+		spin_lock_irqsave(&cio->lock, flags);
+		cio->waiter = NULL;
+		ret = cio->io_err;
+	}
+	spin_unlock_irqrestore(&cio->lock, flags);
+	kvfree(cio);
+
+	return ret;
+}
+
+static void bio_copy_end_io(struct bio *bio)
+{
+	struct copy_ctx *ctx = bio->bi_private;
+	struct cio *cio = ctx->cio;
+	sector_t clen;
+	int ri = ctx->range_idx;
+	unsigned long flags;
+	bool wake = false;
+
+	if (bio->bi_status) {
+		cio->io_err = bio->bi_status;
+		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
+		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
+	}
+	__free_page(bio->bi_io_vec[0].bv_page);
+	kfree(ctx);
+	bio_put(bio);
+
+	spin_lock_irqsave(&cio->lock, flags);
+	if (((--cio->refcount) <= 0) && cio->waiter)
+		wake = true;
+	spin_unlock_irqrestore(&cio->lock, flags);
+	if (wake)
+		wake_up_process(cio->waiter);
+}
+
+/*
+ * blk_copy_offload	- Use device's native copy offload feature
+ * Go through the user-provided payload, prepare new payload based on device's copy offload limits.
+ */
+int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
+{
+	struct request_queue *sq = bdev_get_queue(src_bdev);
+	struct request_queue *dq = bdev_get_queue(dst_bdev);
+	struct bio *read_bio, *write_bio;
+	struct copy_ctx *ctx;
+	struct cio *cio;
+	struct page *token;
+	sector_t src_blk, copy_len, dst_blk;
+	sector_t remaining, max_copy_len = LONG_MAX;
+	unsigned long flags;
+	int ri = 0, ret = 0;
+
+	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
+	if (!cio)
+		return -ENOMEM;
+	cio->rlist = rlist;
+	spin_lock_init(&cio->lock);
+
+	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
+	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
+			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
+
+	for (ri = 0; ri < nr_srcs; ri++) {
+		cio->rlist[ri].comp_len = rlist[ri].len;
+		src_blk = rlist[ri].src;
+		dst_blk = rlist[ri].dst;
+		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
+			copy_len = min(remaining, max_copy_len);
+
+			token = alloc_page(gfp_mask);
+			if (unlikely(!token)) {
+				ret = -ENOMEM;
+				goto err_token;
+			}
+
+			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
+			if (!ctx) {
+				ret = -ENOMEM;
+				goto err_ctx;
+			}
+			ctx->cio = cio;
+			ctx->range_idx = ri;
+			ctx->start_sec = dst_blk;
+
+			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!read_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
+			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
+			/* __bio_add_page increases bi_size by len, so overwrite it with copy len */
+			read_bio->bi_iter.bi_size = copy_len;
+			ret = submit_bio_wait(read_bio);
+			bio_put(read_bio);
+			if (ret)
+				goto err_read_bio;
+
+			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!write_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
+			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
+			/* __bio_add_page increases bi_size by len, so overwrite it with copy len */
+			write_bio->bi_iter.bi_size = copy_len;
+			write_bio->bi_end_io = bio_copy_end_io;
+			write_bio->bi_private = ctx;
+
+			spin_lock_irqsave(&cio->lock, flags);
+			++cio->refcount;
+			spin_unlock_irqrestore(&cio->lock, flags);
+
+			submit_bio(write_bio);
+			src_blk += copy_len;
+			dst_blk += copy_len;
+		}
+	}
+
+	/* Wait for completion of all IOs */
+	return cio_await_completion(cio);
+
+err_read_bio:
+	kfree(ctx);
+err_ctx:
+	__free_page(token);
+err_token:
+	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
+
+	cio->io_err = ret;
+	return cio_await_completion(cio);
+}
+
+static inline int blk_copy_sanity_check(struct block_device *src_bdev,
+		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
+{
+	unsigned int align_mask = max(
+			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
+	sector_t len = 0;
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		if (rlist[i].len)
+			len += rlist[i].len;
+		else
+			return -EINVAL;
+		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
+				(rlist[i].len & align_mask))
+			return -EINVAL;
+		rlist[i].comp_len = 0;
+	}
+
+	if (len && len >= MAX_COPY_TOTAL_LENGTH)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline bool blk_check_copy_offload(struct request_queue *src_q,
+		struct request_queue *dest_q)
+{
+	if (blk_queue_copy(dest_q) && blk_queue_copy(src_q))
+		return true;
+
+	return false;
+}
+
+/*
+ * blkdev_issue_copy - queue a copy
+ * @src_bdev:	source block device
+ * @nr_srcs:	number of source ranges to copy
+ * @rlist:	array of source/dest/len
+ * @dest_bdev:	destination block device
+ * @gfp_mask:   memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *	Copy source ranges from source block device to destination block device.
+ *	length of a source range cannot be zero.
+ */
+int blkdev_issue_copy(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
+{
+	struct request_queue *src_q = bdev_get_queue(src_bdev);
+	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
+	int ret = -EINVAL;
+
+	if (!src_q || !dest_q)
+		return -ENXIO;
+
+	if (!nr)
+		return -EINVAL;
+
+	if (nr >= MAX_COPY_NR_RANGE)
+		return -EINVAL;
+
+	if (bdev_read_only(dest_bdev))
+		return -EPERM;
+
+	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
+	if (ret)
+		return ret;
+
+	if (blk_check_copy_offload(src_q, dest_q))
+		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_issue_copy);
+
 static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
 		struct bio **biop, unsigned flags)
diff --git a/block/blk.h b/block/blk.h
index 434017701403..6010eda58c70 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -291,6 +291,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		break;
 	}
 
+	if (unlikely(op_is_copy(bio->bi_opf)))
+		return false;
 	/*
 	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
 	 * This is a quick and dirty check that relies on the fact that
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index c62274466e72..f5b01f284c43 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -418,6 +418,7 @@ enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_COPY,		/* copy request */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -443,6 +444,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_COPY		(1ULL << __REQ_COPY)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
@@ -459,6 +461,11 @@ enum stat_group {
 	NR_STAT_GROUPS
 };
 
+static inline bool op_is_copy(unsigned int op)
+{
+	return (op & REQ_COPY);
+}
+
 #define bio_op(bio) \
 	((bio)->bi_opf & REQ_OP_MASK)
 
@@ -533,4 +540,18 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
+struct cio {
+	struct range_entry *rlist;
+	struct task_struct *waiter;     /* waiting task (NULL if none) */
+	spinlock_t lock;		/* protects refcount and waiter */
+	int refcount;
+	blk_status_t io_err;
+};
+
+struct copy_ctx {
+	int range_idx;
+	sector_t start_sec;
+	struct cio *cio;
+};
+
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3596fd37fae7..c6cb3fe82ba2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
+int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index bdf7b404b3e7..822c28cebf3a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,20 @@ struct fstrim_range {
 	__u64 minlen;
 };
 
+/* Maximum no of entries supported */
+#define MAX_COPY_NR_RANGE	(1 << 12)
+
+/* maximum total copy length */
+#define MAX_COPY_TOTAL_LENGTH	(1 << 27)
+
+/* Source range entry for copy */
+struct range_entry {
+	__u64 src;
+	__u64 dst;
+	__u64 len;
+	__u64 comp_len;
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 03/10] block: Introduce a new ioctl for copy
       [not found]   ` <CGME20220426101938epcas5p291690dd1f0e931cd9f8139daaf3f9296@epcas5p2.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Javier González, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

Add a new BLKCOPY ioctl that offloads copying of one or more source ranges
to one or more destinations on a device. The COPY ioctl accepts a
'copy_range' structure that contains the number of ranges and a reserved
field, followed by an array of ranges. Each range is represented by a
'range_entry' that contains the source start offset, the destination start
offset and the length of the range (in bytes).

MAX_COPY_NR_RANGE limits the number of entries the IOCTL accepts and
MAX_COPY_TOTAL_LENGTH limits the total copy length the IOCTL can handle.

Example code to issue BLKCOPY:
/* Sample example to copy three entries with [dest,src,len],
 * [32768, 0, 4096] [36864, 4096, 4096] [40960, 8192, 4096] on the same device */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
	int i, ret, fd;
	unsigned long src = 0, dst = 32768, len = 4096;
	struct copy_range *cr;

	cr = malloc(sizeof(*cr) + (sizeof(struct range_entry) * 3));
	if (!cr)
		return 1;
	cr->nr_range = 3;
	cr->reserved = 0;
	for (i = 0; i < cr->nr_range; i++, src += len, dst += len) {
		cr->range_list[i].dst = dst;
		cr->range_list[i].src = src;
		cr->range_list[i].len = len;
		cr->range_list[i].comp_len = 0;
	}
	fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0)
		return 1;
	ret = ioctl(fd, BLKCOPY, cr);
	if (ret != 0)
		printf("copy failed, ret = %d\n", ret);
	for (i = 0; i < cr->nr_range; i++)
		if (cr->range_list[i].len != cr->range_list[i].comp_len)
			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
					i, cr->range_list[i].len,
					cr->range_list[i].comp_len);
	close(fd);
	free(cr);
	return ret;
}

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/ioctl.c           | 32 ++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h |  9 +++++++++
 2 files changed, 41 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 46949f1b0dba..58d93c20ff30 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -120,6 +120,36 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 	return err;
 }
 
+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	struct copy_range crange, *ranges = NULL;
+	size_t payload_size = 0;
+	int ret;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
+		return -EFAULT;
+
+	if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
+		return -EINVAL;
+
+	payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
+
+	ranges = memdup_user((void __user *)arg, payload_size);
+	if (IS_ERR(ranges))
+		return PTR_ERR(ranges);
+
+	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL);
+	if (copy_to_user((void __user *)arg, ranges, payload_size))
+		ret = -EFAULT;
+
+	kfree(ranges);
+	return ret;
+}
+
 static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
 		void __user *argp)
 {
@@ -481,6 +511,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 		return blk_ioctl_discard(bdev, mode, arg);
 	case BLKSECDISCARD:
 		return blk_ioctl_secure_erase(bdev, mode, argp);
+	case BLKCOPY:
+		return blk_ioctl_copy(bdev, mode, arg);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
 	case BLKGETDISKSEQ:
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 822c28cebf3a..a3b13406ffb8 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -78,6 +78,14 @@ struct range_entry {
 	__u64 comp_len;
 };
 
+struct copy_range {
+	__u64 nr_range;
+	__u64 reserved;
+
+	/* Range_list always must be at the end */
+	struct range_entry range_list[];
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
@@ -199,6 +207,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 04/10] block: add emulation for copy
       [not found]   ` <CGME20220426101951epcas5p1f53a2120010607354dc29bf8331f6af8@epcas5p1.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Vincent Fu, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

For devices which do not support copy offload, copy emulation is added.
Copy emulation is implemented by reading from the source ranges into
memory and writing to the corresponding destination synchronously.
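
Besides devices that lack offload, the emulation path can be exercised on
any device by clearing the copy_offload queue flag from user space before
issuing BLKCOPY. A minimal sketch follows; the device name and sysfs path
are assumptions for illustration.

#include <fcntl.h>
#include <unistd.h>

/* Illustrative helper: clear the copy_offload flag so that
 * blkdev_issue_copy() takes the emulation path. */
static int disable_copy_offload(void)
{
	int fd = open("/sys/block/nvme0n1/queue/copy_offload", O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, "0", 1) != 1) {
		close(fd);
		return -1;
	}
	return close(fd);
}

With the flag cleared, the BLKCOPY example from the previous patch is
served by the emulation code here instead of the offload path.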

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/blk-lib.c        | 128 ++++++++++++++++++++++++++++++++++++++++-
 block/blk-map.c        |   2 +-
 include/linux/blkdev.h |   2 +
 3 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index ba9da2d2f429..58c30a42ea44 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -273,6 +273,65 @@ int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
 	return cio_await_completion(cio);
 }
 
+int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
+				sector_t sector, unsigned int op, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio, *parent = NULL;
+	sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
+			queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT;
+	sector_t len, remaining;
+	int ret;
+
+	for (remaining = buf_len; remaining > 0; remaining -= len) {
+		len = min_t(int, max_hw_len, remaining);
+retry:
+		bio = bio_map_kern(q, buf, len, gfp_mask);
+		if (IS_ERR(bio)) {
+			len >>= 1;
+			if (len)
+				goto retry;
+			return PTR_ERR(bio);
+		}
+
+		bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
+		bio->bi_opf = op;
+		bio_set_dev(bio, bdev);
+		bio->bi_end_io = NULL;
+		bio->bi_private = NULL;
+
+		if (parent) {
+			bio_chain(parent, bio);
+			submit_bio(parent);
+		}
+		parent = bio;
+		sector += len;
+		buf = (char *) buf + len;
+	}
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+
+	return ret;
+}
+
+static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask)
+{
+	int min_size = PAGE_SIZE;
+	void *buf;
+
+	while (req_size >= min_size) {
+		buf = kvmalloc(req_size, gfp_mask);
+		if (buf) {
+			*alloc_size = req_size;
+			return buf;
+		}
+		/* retry half the requested size */
+		req_size >>= 1;
+	}
+
+	return NULL;
+}
+
 static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
 {
@@ -298,6 +357,68 @@ static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 	return 0;
 }
 
+/* returns the total copy length that still needs to be copied */
+static inline sector_t blk_copy_max_range(struct range_entry *rlist, int nr, sector_t *max_len)
+{
+	int i;
+	sector_t len = 0;
+
+	*max_len = 0;
+	for (i = 0; i < nr; i++) {
+		*max_len = max(*max_len, rlist[i].len - rlist[i].comp_len);
+		len += (rlist[i].len - rlist[i].comp_len);
+	}
+
+	return len;
+}
+
+/*
+ * If the native copy offload feature is absent, this function emulates it
+ * by copying data from the source to a temporary buffer and from the
+ * buffer to the destination device.
+ */
+static int blk_copy_emulate(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
+{
+	void *buf = NULL;
+	int ret, nr_i = 0;
+	sector_t src, dst, copy_len, buf_len, read_len, copied_len,
+		 max_len = 0, remaining = 0, offset = 0;
+
+	copy_len = blk_copy_max_range(rlist, nr, &max_len);
+	buf = blk_alloc_buf(max_len, &buf_len, gfp_mask);
+	if (!buf)
+		return -ENOMEM;
+
+	for (copied_len = 0; copied_len < copy_len; copied_len += read_len) {
+		if (!remaining) {
+			offset = rlist[nr_i].comp_len;
+			src = rlist[nr_i].src + offset;
+			dst = rlist[nr_i].dst + offset;
+			remaining = rlist[nr_i++].len - offset;
+		}
+
+		read_len = min_t(sector_t, remaining, buf_len);
+		if (!read_len)
+			continue;
+		ret = blk_submit_rw_buf(src_bdev, buf, read_len, src, REQ_OP_READ, gfp_mask);
+		if (ret)
+			goto out;
+		src += read_len;
+		remaining -= read_len;
+		ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dst, REQ_OP_WRITE,
+				gfp_mask);
+		if (ret)
+			goto out;
+		else
+			rlist[nr_i - 1].comp_len += read_len;
+		dst += read_len;
+	}
+out:
+	kvfree(buf);
+	return ret;
+}
+
 static inline bool blk_check_copy_offload(struct request_queue *src_q,
 		struct request_queue *dest_q)
 {
@@ -325,6 +446,7 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
 	struct request_queue *src_q = bdev_get_queue(src_bdev);
 	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
 	int ret = -EINVAL;
+	bool offload = false;
 
 	if (!src_q || !dest_q)
 		return -ENXIO;
@@ -342,9 +464,13 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
 	if (ret)
 		return ret;
 
-	if (blk_check_copy_offload(src_q, dest_q))
+	offload = blk_check_copy_offload(src_q, dest_q);
+	if (offload)
 		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
 
+	if (ret || !offload)
+		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_issue_copy);
diff --git a/block/blk-map.c b/block/blk-map.c
index 7ffde64f9019..ca2ad2c21f42 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -340,7 +340,7 @@ static void bio_map_kern_endio(struct bio *bio)
  *	Map the kernel address into a bio suitable for io to a block
  *	device. Returns an error pointer in case of error.
  */
-static struct bio *bio_map_kern(struct request_queue *q, void *data,
+struct bio *bio_map_kern(struct request_queue *q, void *data,
 		unsigned int len, gfp_t gfp_mask)
 {
 	unsigned long kaddr = (unsigned long)data;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c6cb3fe82ba2..ea1f3c8f8dad 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
 int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+		gfp_t gfp_mask);
 int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
 		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
 
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 05/10] nvme: add copy offload support
       [not found]   ` <CGME20220426102001epcas5p4e321347334971d704cb19ffa25f9d0b4@epcas5p4.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Kanchan Joshi, Javier González, Arnav Dawn,
	Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart,
	Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

For devices supporting native copy, the nvme driver receives read and
write requests with the REQ_COPY flag set.
For a read request, the nvme driver populates the payload with the
source information.
For a write request, the driver converts it to an nvme copy command
using the source information in the payload and submits it to the
device. The current design only supports a single source range.
This design is courtesy of Mikulas Patocka's token-based copy approach.

Trace event support is added for the nvme copy command.
The device-reported copy limits are set on the request queue.
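A rough sketch of this two-phase token handoff is given below for
orientation; it uses simplified stand-in types rather than the kernel
structures and helpers, so treat it as an illustration of the flow only.
The real implementation is in the diff.

#include <stdint.h>
#include <string.h>

struct copy_token {		/* mirrors struct nvme_copy_token */
	char subsys[4];
	void *ns;
	uint64_t src_sector;
	uint64_t sectors;
};

struct copy_src_range {		/* mirrors struct nvme_copy_range */
	uint64_t slba;
	uint16_t nlb;		/* 0's based number of LBAs */
};

/* Phase 1: the REQ_COPY read does no device I/O, it only records the source. */
static void copy_read_phase(struct copy_token *t, void *ns,
			    uint64_t src_sector, uint64_t sectors)
{
	memcpy(t->subsys, "nvme", 4);
	t->ns = ns;
	t->src_sector = src_sector;
	t->sectors = sectors;
}

/* Phase 2: the REQ_COPY write consumes the token and builds the single
 * source range plus destination LBA for the NVMe Copy command.
 */
static int copy_write_phase(const struct copy_token *t, void *ns,
			    unsigned int lba_shift, uint64_t dst_sector,
			    struct copy_src_range *range, uint64_t *sdlba)
{
	if (memcmp(t->subsys, "nvme", 4) || t->ns != ns)
		return -1;	/* token was not set up by this namespace */

	range->slba = t->src_sector >> (lba_shift - 9);	/* sectors -> LBAs */
	range->nlb = (t->sectors >> (lba_shift - 9)) - 1;
	*sdlba = dst_sector >> (lba_shift - 9);
	return 0;
}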

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 drivers/nvme/host/core.c  | 116 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/fc.c    |   4 ++
 drivers/nvme/host/nvme.h  |   7 +++
 drivers/nvme/host/pci.c   |  25 ++++++++
 drivers/nvme/host/rdma.c  |   6 ++
 drivers/nvme/host/tcp.c   |  14 +++++
 drivers/nvme/host/trace.c |  19 +++++++
 include/linux/nvme.h      |  43 +++++++++++++-
 8 files changed, 229 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b9b0fbde97c8..9cbc8faace78 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,6 +724,87 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
+static inline blk_status_t nvme_setup_copy_read(struct nvme_ns *ns, struct request *req)
+{
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+
+	memcpy(token->subsys, "nvme", 4);
+	token->ns = ns;
+	token->src_sector = bio->bi_iter.bi_sector;
+	token->sectors = bio->bi_iter.bi_size >> 9;
+
+	return BLK_STS_OK;
+}
+
+static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns,
+	       struct request *req, struct nvme_command *cmnd)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_copy_range *range = NULL;
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+	sector_t src_sector, dst_sector, n_sectors;
+	u64 src_lba, dst_lba, n_lba;
+	unsigned short nr_range = 1;
+	u16 control = 0;
+	u32 dsmgmt = 0;
+
+	if (unlikely(memcmp(token->subsys, "nvme", 4)))
+		return BLK_STS_NOTSUPP;
+	if (unlikely(token->ns != ns))
+		return BLK_STS_NOTSUPP;
+
+	src_sector = token->src_sector;
+	dst_sector = bio->bi_iter.bi_sector;
+	n_sectors = token->sectors;
+	if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
+		return BLK_STS_NOTSUPP;
+
+	src_lba = nvme_sect_to_lba(ns, src_sector);
+	dst_lba = nvme_sect_to_lba(ns, dst_sector);
+	n_lba = nvme_sect_to_lba(ns, n_sectors);
+
+	if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
+			unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
+			unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
+		return BLK_STS_NOTSUPP;
+
+	if (WARN_ON(!n_lba))
+		return BLK_STS_NOTSUPP;
+
+	if (req->cmd_flags & REQ_FUA)
+		control |= NVME_RW_FUA;
+
+	if (req->cmd_flags & REQ_FAILFAST_DEV)
+		control |= NVME_RW_LR;
+
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->copy.opcode = nvme_cmd_copy;
+	cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->copy.sdlba = cpu_to_le64(dst_lba);
+
+	range = kmalloc_array(nr_range, sizeof(*range),
+			GFP_ATOMIC | __GFP_NOWARN);
+	if (!range)
+		return BLK_STS_RESOURCE;
+
+	range[0].slba = cpu_to_le64(src_lba);
+	range[0].nlb = cpu_to_le16(n_lba - 1);
+
+	cmnd->copy.nr_range = 0;
+
+	req->special_vec.bv_page = virt_to_page(range);
+	req->special_vec.bv_offset = offset_in_page(range);
+	req->special_vec.bv_len = sizeof(*range) * nr_range;
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	cmnd->copy.control = cpu_to_le16(control);
+	cmnd->copy.dspec = cpu_to_le32(dsmgmt);
+
+	return BLK_STS_OK;
+}
+
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -947,10 +1028,16 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
 	case REQ_OP_READ:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_read(ns, req);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
 		break;
 	case REQ_OP_WRITE:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_write(ns, req, cmd);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
 		break;
 	case REQ_OP_ZONE_APPEND:
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
@@ -1642,6 +1729,29 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
+static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns,
+				       struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct request_queue *q = disk->queue;
+
+	if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) {
+		blk_queue_max_copy_sectors(q, 0);
+		blk_queue_max_copy_range_sectors(q, 0);
+		blk_queue_max_copy_nr_ranges(q, 0);
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+		return;
+	}
+
+	/* setting copy limits */
+	if (blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, q))
+		return;
+
+	blk_queue_max_copy_sectors(q, nvme_lba_to_sect(ns, le32_to_cpu(id->mcl)));
+	blk_queue_max_copy_range_sectors(q, nvme_lba_to_sect(ns, le16_to_cpu(id->mssrl)));
+	blk_queue_max_copy_nr_ranges(q, id->msrc + 1);
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
 	return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1841,6 +1951,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	set_capacity_and_notify(disk, capacity);
 
 	nvme_config_discard(disk, ns);
+	nvme_config_copy(disk, ns, id);
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
 }
@@ -4833,6 +4944,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 080f85f4105f..0fea231b7ccb 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2788,6 +2788,10 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		return ret;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
 	/*
 	 * nvme core doesn't quite treat the rq opaquely. Commands such
 	 * as WRITE ZEROES will return a non-zero rq payload_bytes yet
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a2b53ca63335..dc51fc647f23 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -482,6 +482,13 @@ struct nvme_ns {
 
 };
 
+struct nvme_copy_token {
+	char subsys[4];
+	struct nvme_ns *ns;
+	u64 src_sector;
+	u64 sectors;
+};
+
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
 static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3aacf1c0d5a5..b9081c983b6f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -511,6 +511,14 @@ static inline void nvme_sq_copy_cmd(struct nvme_queue *nvmeq,
 		nvmeq->sq_tail = 0;
 }
 
+static void nvme_commit_sqdb(struct nvme_queue *nvmeq)
+{
+	spin_lock(&nvmeq->sq_lock);
+	if (nvmeq->sq_tail != nvmeq->last_sq_tail)
+		nvme_write_sq_db(nvmeq, true);
+	spin_unlock(&nvmeq->sq_lock);
+}
+
 static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
@@ -918,6 +926,11 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	if (ret)
 		return ret;
 
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) {
+		blk_mq_start_request(req);
+		return BLK_STS_OK;
+	}
+
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &iod->cmd);
 		if (ret)
@@ -931,6 +944,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	}
 
 	blk_mq_start_request(req);
+
 	return BLK_STS_OK;
 out_unmap_data:
 	nvme_unmap_data(dev, req);
@@ -964,6 +978,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	ret = nvme_prep_rq(dev, req);
 	if (unlikely(ret))
 		return ret;
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) {
+		blk_mq_set_request_complete(req);
+		blk_mq_end_request(req, BLK_STS_OK);
+		/* Commit the sq if copy read was the last req in the list,
+		 * as copy read deoesn't update sq db
+		 * as copy read doesn't update sq db
+		if (bd->last)
+			nvme_commit_sqdb(nvmeq);
+		return ret;
+	}
+
 	spin_lock(&nvmeq->sq_lock);
 	nvme_sq_copy_cmd(nvmeq, &iod->cmd);
 	nvme_write_sq_db(nvmeq, bd->last);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a69a45c5bd6..78af337c51bb 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2087,6 +2087,12 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		goto unmap_qe;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		ret = BLK_STS_OK;
+		goto unmap_qe;
+	}
+
 	blk_mq_start_request(rq);
 
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index ad3a2bf2f1e9..4e4cdcf8210a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2394,6 +2394,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
 	if (ret)
 		return ret;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_start_request(req);
+		return BLK_STS_OK;
+	}
+
 	req->state = NVME_TCP_SEND_CMD_PDU;
 	req->status = cpu_to_le16(NVME_SC_SUCCESS);
 	req->offset = 0;
@@ -2462,6 +2467,15 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_set_request_complete(rq);
+		blk_mq_end_request(rq, BLK_STS_OK);
+		/* if copy read is the last req, queue the pending tcp reqs */
+		if (bd->last && nvme_tcp_queue_more(queue))
+			queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+		return ret;
+	}
+
 	nvme_tcp_queue_request(req, true, bd->last);
 
 	return BLK_STS_OK;
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
index 2a89c5aa0790..ab72bf546a13 100644
--- a/drivers/nvme/host/trace.c
+++ b/drivers/nvme/host/trace.c
@@ -150,6 +150,23 @@ static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10)
 	return ret;
 }
 
+static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 slba = get_unaligned_le64(cdw10);
+	u8 nr_range = get_unaligned_le16(cdw10 + 8);
+	u16 control = get_unaligned_le16(cdw10 + 10);
+	u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
+	u32 reftag = get_unaligned_le32(cdw10 +  16);
+
+	trace_seq_printf(p,
+			 "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
+			 slba, nr_range, control, dsmgmt, reftag);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
 static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
@@ -243,6 +260,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
 		return nvme_trace_zone_mgmt_send(p, cdw10);
 	case nvme_cmd_zone_mgmt_recv:
 		return nvme_trace_zone_mgmt_recv(p, cdw10);
+	case nvme_cmd_copy:
+		return nvme_trace_copy(p, cdw10);
 	default:
 		return nvme_trace_common(p, cdw10);
 	}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index f626a445d1a8..ec12492b3063 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -316,7 +316,7 @@ struct nvme_id_ctrl {
 	__u8			nvscc;
 	__u8			nwpc;
 	__le16			acwu;
-	__u8			rsvd534[2];
+	__le16			ocfs;
 	__le32			sgls;
 	__le32			mnan;
 	__u8			rsvd544[224];
@@ -344,6 +344,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_COPY			= 1 << 8,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_NS_MNGT_SUPP		= 1 << 3,
@@ -393,7 +394,10 @@ struct nvme_id_ns {
 	__le16			npdg;
 	__le16			npda;
 	__le16			nows;
-	__u8			rsvd74[18];
+	__le16			mssrl;
+	__le32			mcl;
+	__u8			msrc;
+	__u8			rsvd91[11];
 	__le32			anagrpid;
 	__u8			rsvd96[3];
 	__u8			nsattr;
@@ -750,6 +754,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_copy		= 0x19,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -771,7 +776,8 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
-		nvme_opcode_name(nvme_cmd_zone_append))
+		nvme_opcode_name(nvme_cmd_zone_append),		\
+		nvme_opcode_name(nvme_cmd_copy))
 
 
 
@@ -945,6 +951,36 @@ struct nvme_dsm_range {
 	__le64			slba;
 };
 
+struct nvme_copy_command {
+	__u8                    opcode;
+	__u8                    flags;
+	__u16                   command_id;
+	__le32                  nsid;
+	__u64                   rsvd2;
+	__le64                  metadata;
+	union nvme_data_ptr     dptr;
+	__le64                  sdlba;
+	__u8			nr_range;
+	__u8			rsvd12;
+	__le16                  control;
+	__le16                  rsvd13;
+	__le16			dspec;
+	__le32                  ilbrt;
+	__le16                  lbat;
+	__le16                  lbatm;
+};
+
+struct nvme_copy_range {
+	__le64			rsvd0;
+	__le64			slba;
+	__le16			nlb;
+	__le16			rsvd18;
+	__le32			rsvd20;
+	__le32			eilbrt;
+	__le16			elbat;
+	__le16			elbatm;
+};
+
 struct nvme_write_zeroes_cmd {
 	__u8			opcode;
 	__u8			flags;
@@ -1499,6 +1535,7 @@ struct nvme_command {
 		struct nvme_download_firmware dlfw;
 		struct nvme_format_cmd format;
 		struct nvme_dsm_cmd dsm;
+		struct nvme_copy_command copy;
 		struct nvme_write_zeroes_cmd write_zeroes;
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [dm-devel] [PATCH v4 05/10] nvme: add copy offload support
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, Nitesh Shetty, chaitanyak,
	Chaitanya Kulkarni, Mike Snitzer, josef, linux-block, dsterba,
	kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, Kanchan Joshi, martin.petersen,
	linux-kernel, Arnav Dawn, jack, linux-fsdevel,
	Javier González, lsf-pc, Damien Le Moal, Alexander Viro

For devices supporting native copy, the nvme driver receives read and
write requests with the REQ_COPY flag set.
For a read request, the nvme driver populates the payload with the
source information.
For a write request, the driver converts it to an nvme copy command
using the source information in the payload and submits it to the
device. The current design only supports a single source range.
This design is courtesy of Mikulas Patocka's token-based copy approach.

Trace event support is added for the nvme copy command.
The device-reported copy limits are set on the request queue.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 drivers/nvme/host/core.c  | 116 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/fc.c    |   4 ++
 drivers/nvme/host/nvme.h  |   7 +++
 drivers/nvme/host/pci.c   |  25 ++++++++
 drivers/nvme/host/rdma.c  |   6 ++
 drivers/nvme/host/tcp.c   |  14 +++++
 drivers/nvme/host/trace.c |  19 +++++++
 include/linux/nvme.h      |  43 +++++++++++++-
 8 files changed, 229 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b9b0fbde97c8..9cbc8faace78 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,6 +724,87 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
+static inline blk_status_t nvme_setup_copy_read(struct nvme_ns *ns, struct request *req)
+{
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+
+	memcpy(token->subsys, "nvme", 4);
+	token->ns = ns;
+	token->src_sector = bio->bi_iter.bi_sector;
+	token->sectors = bio->bi_iter.bi_size >> 9;
+
+	return BLK_STS_OK;
+}
+
+static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns,
+	       struct request *req, struct nvme_command *cmnd)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_copy_range *range = NULL;
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+	sector_t src_sector, dst_sector, n_sectors;
+	u64 src_lba, dst_lba, n_lba;
+	unsigned short nr_range = 1;
+	u16 control = 0;
+	u32 dsmgmt = 0;
+
+	if (unlikely(memcmp(token->subsys, "nvme", 4)))
+		return BLK_STS_NOTSUPP;
+	if (unlikely(token->ns != ns))
+		return BLK_STS_NOTSUPP;
+
+	src_sector = token->src_sector;
+	dst_sector = bio->bi_iter.bi_sector;
+	n_sectors = token->sectors;
+	if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
+		return BLK_STS_NOTSUPP;
+
+	src_lba = nvme_sect_to_lba(ns, src_sector);
+	dst_lba = nvme_sect_to_lba(ns, dst_sector);
+	n_lba = nvme_sect_to_lba(ns, n_sectors);
+
+	if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
+			unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
+			unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
+		return BLK_STS_NOTSUPP;
+
+	if (WARN_ON(!n_lba))
+		return BLK_STS_NOTSUPP;
+
+	if (req->cmd_flags & REQ_FUA)
+		control |= NVME_RW_FUA;
+
+	if (req->cmd_flags & REQ_FAILFAST_DEV)
+		control |= NVME_RW_LR;
+
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->copy.opcode = nvme_cmd_copy;
+	cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->copy.sdlba = cpu_to_le64(dst_lba);
+
+	range = kmalloc_array(nr_range, sizeof(*range),
+			GFP_ATOMIC | __GFP_NOWARN);
+	if (!range)
+		return BLK_STS_RESOURCE;
+
+	range[0].slba = cpu_to_le64(src_lba);
+	range[0].nlb = cpu_to_le16(n_lba - 1);
+
+	cmnd->copy.nr_range = 0;
+
+	req->special_vec.bv_page = virt_to_page(range);
+	req->special_vec.bv_offset = offset_in_page(range);
+	req->special_vec.bv_len = sizeof(*range) * nr_range;
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	cmnd->copy.control = cpu_to_le16(control);
+	cmnd->copy.dspec = cpu_to_le32(dsmgmt);
+
+	return BLK_STS_OK;
+}
+
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -947,10 +1028,16 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
 	case REQ_OP_READ:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_read(ns, req);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
 		break;
 	case REQ_OP_WRITE:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_write(ns, req, cmd);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
 		break;
 	case REQ_OP_ZONE_APPEND:
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
@@ -1642,6 +1729,29 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
+static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns,
+				       struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct request_queue *q = disk->queue;
+
+	if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) {
+		blk_queue_max_copy_sectors(q, 0);
+		blk_queue_max_copy_range_sectors(q, 0);
+		blk_queue_max_copy_nr_ranges(q, 0);
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+		return;
+	}
+
+	/* setting copy limits */
+	if (blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, q))
+		return;
+
+	blk_queue_max_copy_sectors(q, nvme_lba_to_sect(ns, le32_to_cpu(id->mcl)));
+	blk_queue_max_copy_range_sectors(q, nvme_lba_to_sect(ns, le16_to_cpu(id->mssrl)));
+	blk_queue_max_copy_nr_ranges(q, id->msrc + 1);
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
 	return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1841,6 +1951,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	set_capacity_and_notify(disk, capacity);
 
 	nvme_config_discard(disk, ns);
+	nvme_config_copy(disk, ns, id);
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
 }
@@ -4833,6 +4944,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 080f85f4105f..0fea231b7ccb 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2788,6 +2788,10 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		return ret;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
 	/*
 	 * nvme core doesn't quite treat the rq opaquely. Commands such
 	 * as WRITE ZEROES will return a non-zero rq payload_bytes yet
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a2b53ca63335..dc51fc647f23 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -482,6 +482,13 @@ struct nvme_ns {
 
 };
 
+struct nvme_copy_token {
+	char subsys[4];
+	struct nvme_ns *ns;
+	u64 src_sector;
+	u64 sectors;
+};
+
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
 static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3aacf1c0d5a5..b9081c983b6f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -511,6 +511,14 @@ static inline void nvme_sq_copy_cmd(struct nvme_queue *nvmeq,
 		nvmeq->sq_tail = 0;
 }
 
+static void nvme_commit_sqdb(struct nvme_queue *nvmeq)
+{
+	spin_lock(&nvmeq->sq_lock);
+	if (nvmeq->sq_tail != nvmeq->last_sq_tail)
+		nvme_write_sq_db(nvmeq, true);
+	spin_unlock(&nvmeq->sq_lock);
+}
+
 static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
@@ -918,6 +926,11 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	if (ret)
 		return ret;
 
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) {
+		blk_mq_start_request(req);
+		return BLK_STS_OK;
+	}
+
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &iod->cmd);
 		if (ret)
@@ -931,6 +944,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	}
 
 	blk_mq_start_request(req);
+
 	return BLK_STS_OK;
 out_unmap_data:
 	nvme_unmap_data(dev, req);
@@ -964,6 +978,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	ret = nvme_prep_rq(dev, req);
 	if (unlikely(ret))
 		return ret;
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) {
+		blk_mq_set_request_complete(req);
+		blk_mq_end_request(req, BLK_STS_OK);
+		/* Commit the sq if copy read was the last req in the list,
+		 * as copy read doesn't update the sq db
+		 */
+		if (bd->last)
+			nvme_commit_sqdb(nvmeq);
+		return ret;
+	}
+
 	spin_lock(&nvmeq->sq_lock);
 	nvme_sq_copy_cmd(nvmeq, &iod->cmd);
 	nvme_write_sq_db(nvmeq, bd->last);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a69a45c5bd6..78af337c51bb 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2087,6 +2087,12 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		goto unmap_qe;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		ret = BLK_STS_OK;
+		goto unmap_qe;
+	}
+
 	blk_mq_start_request(rq);
 
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index ad3a2bf2f1e9..4e4cdcf8210a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2394,6 +2394,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
 	if (ret)
 		return ret;
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_start_request(req);
+		return BLK_STS_OK;
+	}
+
 	req->state = NVME_TCP_SEND_CMD_PDU;
 	req->status = cpu_to_le16(NVME_SC_SUCCESS);
 	req->offset = 0;
@@ -2462,6 +2467,15 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
+	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
+		blk_mq_set_request_complete(rq);
+		blk_mq_end_request(rq, BLK_STS_OK);
+	/* if copy read is the last req in the batch, kick io_work for queued reqs */
+		if (bd->last && nvme_tcp_queue_more(queue))
+			queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+		return ret;
+	}
+
 	nvme_tcp_queue_request(req, true, bd->last);
 
 	return BLK_STS_OK;
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
index 2a89c5aa0790..ab72bf546a13 100644
--- a/drivers/nvme/host/trace.c
+++ b/drivers/nvme/host/trace.c
@@ -150,6 +150,23 @@ static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10)
 	return ret;
 }
 
+static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 slba = get_unaligned_le64(cdw10);
+	u8 nr_range = get_unaligned_le16(cdw10 + 8);
+	u16 control = get_unaligned_le16(cdw10 + 10);
+	u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
+	u32 reftag = get_unaligned_le32(cdw10 +  16);
+
+	trace_seq_printf(p,
+			 "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
+			 slba, nr_range, control, dsmgmt, reftag);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
 static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
@@ -243,6 +260,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
 		return nvme_trace_zone_mgmt_send(p, cdw10);
 	case nvme_cmd_zone_mgmt_recv:
 		return nvme_trace_zone_mgmt_recv(p, cdw10);
+	case nvme_cmd_copy:
+		return nvme_trace_copy(p, cdw10);
 	default:
 		return nvme_trace_common(p, cdw10);
 	}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index f626a445d1a8..ec12492b3063 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -316,7 +316,7 @@ struct nvme_id_ctrl {
 	__u8			nvscc;
 	__u8			nwpc;
 	__le16			acwu;
-	__u8			rsvd534[2];
+	__le16			ocfs;
 	__le32			sgls;
 	__le32			mnan;
 	__u8			rsvd544[224];
@@ -344,6 +344,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_COPY			= 1 << 8,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_NS_MNGT_SUPP		= 1 << 3,
@@ -393,7 +394,10 @@ struct nvme_id_ns {
 	__le16			npdg;
 	__le16			npda;
 	__le16			nows;
-	__u8			rsvd74[18];
+	__le16			mssrl;
+	__le32			mcl;
+	__u8			msrc;
+	__u8			rsvd91[11];
 	__le32			anagrpid;
 	__u8			rsvd96[3];
 	__u8			nsattr;
@@ -750,6 +754,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_copy		= 0x19,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -771,7 +776,8 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
-		nvme_opcode_name(nvme_cmd_zone_append))
+		nvme_opcode_name(nvme_cmd_zone_append),		\
+		nvme_opcode_name(nvme_cmd_copy))
 
 
 
@@ -945,6 +951,36 @@ struct nvme_dsm_range {
 	__le64			slba;
 };
 
+struct nvme_copy_command {
+	__u8                    opcode;
+	__u8                    flags;
+	__u16                   command_id;
+	__le32                  nsid;
+	__u64                   rsvd2;
+	__le64                  metadata;
+	union nvme_data_ptr     dptr;
+	__le64                  sdlba;
+	__u8			nr_range;
+	__u8			rsvd12;
+	__le16                  control;
+	__le16                  rsvd13;
+	__le16			dspec;
+	__le32                  ilbrt;
+	__le16                  lbat;
+	__le16                  lbatm;
+};
+
+struct nvme_copy_range {
+	__le64			rsvd0;
+	__le64			slba;
+	__le16			nlb;
+	__le16			rsvd18;
+	__le32			rsvd20;
+	__le32			eilbrt;
+	__le16			elbat;
+	__le16			elbatm;
+};
+
 struct nvme_write_zeroes_cmd {
 	__u8			opcode;
 	__u8			flags;
@@ -1499,6 +1535,7 @@ struct nvme_command {
 		struct nvme_download_firmware dlfw;
 		struct nvme_format_cmd format;
 		struct nvme_dsm_cmd dsm;
+		struct nvme_copy_command copy;
 		struct nvme_write_zeroes_cmd write_zeroes;
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
-- 
2.35.1.500.gb896f729e2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 06/10] nvmet: add copy command support for bdev and file ns
       [not found]   ` <CGME20220426102009epcas5p3e5b1ddfd5d3c7200972cecb139650da6@epcas5p3.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Arnav Dawn, Nitesh Shetty, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Alexander Viro, linux-kernel

From: Arnav Dawn <arnav.dawn@samsung.com>

Add support for handling the copy command on the NVMe target.
For bdev-ns we call into blkdev_issue_copy, which the block layer
completes either as an offloaded copy request to the backing bdev or by
emulating the copy.

For file-ns we call vfs_copy_file_range to service the request.

Currently the target always advertises copy capability by setting
NVME_CTRL_ONCS_COPY in the controller's ONCS field.
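
Both handlers convert the 0's based, LBA-sized fields of the copy
descriptors into byte offsets before issuing the copy. A minimal sketch
of that conversion, assuming the nvmet structures used in this patch
(the helper itself is illustrative and not added here):

	/*
	 * Illustration only: convert one copy descriptor to byte units,
	 * mirroring the shifts used by the bdev and file handlers below.
	 */
	static inline void nvmet_copy_range_to_bytes(struct nvmet_ns *ns,
						     struct nvme_copy_range *r,
						     u64 *src, u64 *len)
	{
		*src = le64_to_cpu(r->slba) << ns->blksize_shift;
		/* NLB is a 0's based count of logical blocks */
		*len = ((u64)le16_to_cpu(r->nlb) + 1) << ns->blksize_shift;
	}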

Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/nvme/host/tcp.c           |  2 +-
 drivers/nvme/target/admin-cmd.c   |  8 +++-
 drivers/nvme/target/io-cmd-bdev.c | 65 +++++++++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-file.c | 49 +++++++++++++++++++++++
 4 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 4e4cdcf8210a..2c77e5b596bb 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2395,7 +2395,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
 		return ret;
 
 	if (unlikely((rq->cmd_flags & REQ_COPY) && (req_op(rq) == REQ_OP_READ))) {
-		blk_mq_start_request(req);
+		blk_mq_start_request(rq);
 		return BLK_STS_OK;
 	}
 
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 397daaf51f1b..db32debdb528 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -431,8 +431,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
-			NVME_CTRL_ONCS_WRITE_ZEROES);
-
+			NVME_CTRL_ONCS_WRITE_ZEROES | NVME_CTRL_ONCS_COPY);
 	/* XXX: don't report vwc if the underlying device is write through */
 	id->vwc = NVME_CTRL_VWC_PRESENT;
 
@@ -534,6 +533,11 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	if (req->ns->bdev)
 		nvmet_bdev_set_limits(req->ns->bdev, id);
+	else {
+		id->msrc = to0based(BIO_MAX_VECS);
+		id->mssrl = cpu_to_le16(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT));
+		id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS);
+	}
 
 	/*
 	 * We just provide a single LBA format that matches what the
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 27a72504d31c..18666d36423f 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -47,6 +47,30 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 	id->npda = id->npdg;
 	/* NOWS = Namespace Optimal Write Size */
 	id->nows = to0based(ql->io_opt / ql->logical_block_size);
+
+	/* Copy limits */
+	if (ql->max_copy_sectors) {
+		id->mcl = cpu_to_le32((ql->max_copy_sectors << 9) / ql->logical_block_size);
+		id->mssrl = cpu_to_le16((ql->max_copy_range_sectors << 9) /
+				ql->logical_block_size);
+		id->msrc = to0based(ql->max_copy_nr_ranges);
+	} else {
+		if (ql->zoned == BLK_ZONED_NONE) {
+			id->msrc = to0based(BIO_MAX_VECS);
+			id->mssrl = cpu_to_le16(
+					(BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size);
+			id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS);
+#ifdef CONFIG_BLK_DEV_ZONED
+		} else {
+			/* TODO: get right values for zoned device */
+			id->msrc = to0based(BIO_MAX_VECS);
+			id->mssrl = cpu_to_le16(min((BIO_MAX_VECS << PAGE_SHIFT),
+					ql->chunk_sectors) / ql->logical_block_size);
+			id->mcl = cpu_to_le32(min(le16_to_cpu(id->mssrl) * BIO_MAX_VECS,
+						ql->chunk_sectors));
+#endif
+		}
+	}
 }
 
 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
@@ -442,6 +466,43 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	}
 }
 
+static void nvmet_bdev_execute_copy(struct nvmet_req *req)
+{
+	struct nvme_copy_range range;
+	struct range_entry *rlist;
+	struct nvme_command *cmnd = req->cmd;
+	sector_t dest, dest_off = 0;
+	int ret, id, nr_range;
+
+	nr_range = cmnd->copy.nr_range + 1;
+	dest = le64_to_cpu(cmnd->copy.sdlba) << req->ns->blksize_shift;
+	rlist = kmalloc_array(nr_range, sizeof(*rlist), GFP_KERNEL);
+
+	for (id = 0 ; id < nr_range; id++) {
+		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range, sizeof(range));
+		if (ret)
+			goto out;
+
+		rlist[id].dst = dest + dest_off;
+		rlist[id].src = le64_to_cpu(range.slba) << req->ns->blksize_shift;
+		rlist[id].len = (le16_to_cpu(range.nlb) + 1) << req->ns->blksize_shift;
+		rlist[id].comp_len = 0;
+		dest_off += rlist[id].len;
+	}
+	ret = blkdev_issue_copy(req->ns->bdev, nr_range, rlist, req->ns->bdev, GFP_KERNEL);
+	if (ret) {
+		for (id = 0 ; id < nr_range; id++) {
+			if (rlist[id].len != rlist[id].comp_len) {
+				req->cqe->result.u32 = cpu_to_le32(id);
+				break;
+			}
+		}
+	}
+out:
+	kfree(rlist);
+	nvmet_req_complete(req, errno_to_nvme_status(req, ret));
+}
+
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 {
 	switch (req->cmd->common.opcode) {
@@ -460,6 +521,10 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_copy:
+		req->execute = nvmet_bdev_execute_copy;
+		return 0;
+
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index f3d58abf11e0..fe26a9120436 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -338,6 +338,46 @@ static void nvmet_file_dsm_work(struct work_struct *w)
 	}
 }
 
+static void nvmet_file_copy_work(struct work_struct *w)
+{
+	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
+	int nr_range;
+	loff_t pos;
+	struct nvme_command *cmnd = req->cmd;
+	int ret = 0, len = 0, src, id;
+
+	nr_range = cmnd->copy.nr_range + 1;
+	pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift;
+	if (unlikely(pos + req->transfer_len > req->ns->size)) {
+		nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC));
+		return;
+	}
+
+	for (id = 0 ; id < nr_range; id++) {
+		struct nvme_copy_range range;
+
+		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
+					sizeof(range));
+		if (ret)
+			goto out;
+
+		len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift);
+		src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift));
+		ret = vfs_copy_file_range(req->ns->file, src, req->ns->file, pos, len, 0);
+out:
+		if (ret != len) {
+			pos += ret;
+			req->cqe->result.u32 = cpu_to_le32(id);
+			nvmet_req_complete(req, ret < 0 ? errno_to_nvme_status(req, ret) :
+					errno_to_nvme_status(req, -EIO));
+			return;
+
+		} else
+			pos += len;
+	}
+	nvmet_req_complete(req, ret);
+
+}
 static void nvmet_file_execute_dsm(struct nvmet_req *req)
 {
 	if (!nvmet_check_data_len_lte(req, nvmet_dsm_len(req)))
@@ -346,6 +386,12 @@ static void nvmet_file_execute_dsm(struct nvmet_req *req)
 	queue_work(nvmet_wq, &req->f.work);
 }
 
+static void nvmet_file_execute_copy(struct nvmet_req *req)
+{
+	INIT_WORK(&req->f.work, nvmet_file_copy_work);
+	schedule_work(&req->f.work);
+}
+
 static void nvmet_file_write_zeroes_work(struct work_struct *w)
 {
 	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
@@ -392,6 +438,9 @@ u16 nvmet_file_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_file_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_copy:
+		req->execute = nvmet_file_execute_copy;
+		return 0;
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 07/10] dm: Add support for copy offload.
       [not found]   ` <CGME20220426102017epcas5p295d3b62eaa250765e48c767962cbf08b@epcas5p2.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

Before enabling copy offload for a dm target, check whether the
underlying devices and the dm target support copy. Avoid splits inside
the dm target: fail early if the request would need a split, since
splitting copy requests is not supported yet.
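
Targets opt in per-instance via the new 'copy_offload_supported' flag;
dm_table_set_restrictions() then keeps QUEUE_FLAG_COPY only if every
underlying data device advertises copy as well. A minimal sketch of a
hypothetical bio-based target constructor opting in (the dm-linear
change follows in the next patch):

	/* Hypothetical constructor, for illustration only */
	static int example_ctr(struct dm_target *ti, unsigned int argc,
			       char **argv)
	{
		/* opt in to the flag introduced by this patch */
		ti->copy_offload_supported = 1;
		return 0;
	}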

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-table.c         | 45 +++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  6 +++++
 include/linux/device-mapper.h |  5 ++++
 3 files changed, 56 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index a37c7b763643..b7574f179ed6 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1893,6 +1893,38 @@ static bool dm_table_supports_nowait(struct dm_table *t)
 	return true;
 }
 
+static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev,
+				      sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return !blk_queue_copy(q);
+}
+
+static bool dm_table_supports_copy(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned int i;
+
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (!ti->copy_offload_supported)
+			return false;
+
+		/*
+		 * target provides copy support (as implied by setting 'copy_offload_supported')
+		 * and it relies on _all_ data devices having copy support.
+		 */
+		if (ti->copy_offload_supported &&
+		    (!ti->type->iterate_devices ||
+		     ti->type->iterate_devices(ti, device_not_copy_capable, NULL)))
+			return false;
+	}
+
+	return true;
+}
+
 static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev,
 				      sector_t start, sector_t len, void *data)
 {
@@ -1981,6 +2013,19 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 		q->limits.discard_misaligned = 0;
 	}
 
+	if (!dm_table_supports_copy(t)) {
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+		/* Must also clear copy limits... */
+		q->limits.max_copy_sectors = 0;
+		q->limits.max_hw_copy_sectors = 0;
+		q->limits.max_copy_range_sectors = 0;
+		q->limits.max_hw_copy_range_sectors = 0;
+		q->limits.max_copy_nr_ranges = 0;
+		q->limits.max_hw_copy_nr_ranges = 0;
+	} else {
+		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+	}
+
 	if (!dm_table_supports_secure_erase(t))
 		q->limits.max_secure_erase_sectors = 0;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7e3b5bdcf520..b995de127093 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1595,6 +1595,12 @@ static blk_status_t __split_and_process_bio(struct clone_info *ci)
 	else if (unlikely(ci->is_abnormal_io))
 		return __process_abnormal_io(ci, ti);
 
+	if (unlikely(op_is_copy(ci->bio->bi_opf) &&
+		     max_io_len(ti, ci->sector) < ci->sector_count)) {
+		DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n",
+				__func__, ci->sector_count, max_io_len(ti, ci->sector));
+		return BLK_STS_IOERR;
+	}
 	/*
 	 * Only support bio polling for normal IO, and the target io is
 	 * exactly inside the dm_io instance (verified in dm_poll_dm_io)
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index c2a3758c4aaa..9304e640c9b9 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -362,6 +362,11 @@ struct dm_target {
 	 * after returning DM_MAPIO_SUBMITTED from its map function.
 	 */
 	bool accounts_remapped_io:1;
+
+	/*
+	 * copy offload is supported
+	 */
+	bool copy_offload_supported:1;
 };
 
 void *dm_per_bio_data(struct bio *bio, size_t data_size);
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 08/10] dm: Enable copy offload for dm-linear target
       [not found]   ` <CGME20220426102025epcas5p299d9a88c30db8b9a04a05c57dc809ff7@epcas5p2.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Nitesh Shetty, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

Set the copy_offload_supported flag to enable copy offload for the
dm-linear target.
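
With dm-linear opted in, the stacked device keeps QUEUE_FLAG_COPY when
its underlying queue supports copy. A minimal sketch, assuming the
blk_queue_copy() helper from the block-layer patches in this series, of
how an in-kernel user could check the mapped device before issuing a
copy (illustration only):

	static bool example_can_offload(struct block_device *bdev)
	{
		/* true when the (stacked) queue still advertises copy */
		return blk_queue_copy(bdev_get_queue(bdev));
	}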

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-linear.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 0a6abbbe3745..3b8de6d5ca9c 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -61,6 +61,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->copy_offload_supported = 1;
 	ti->private = lc;
 	return 0;
 
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 09/10] dm kcopyd: use copy offload support
       [not found]   ` <CGME20220426102033epcas5p137171ff842e8b0a090d2708cfc0e3249@epcas5p1.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	SelvaKumar S, Arnav Dawn, Nitesh Shetty, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

From: SelvaKumar S <selvakuma.s1@samsung.com>

Introduce a copy_jobs list to use copy offload when the underlying
devices support it, and fall back to the existing page-based method
otherwise.

run_copy_job() calls the block layer copy offload API when the source
and destination share the same request queue and that queue supports
copy offload. On successful completion the copied destination's count
is set to zero; regions that failed are handed to the existing method.
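
For context, a minimal sketch of a kcopyd user (not part of this
patch); with this change such a job lands on the new copy_jobs list and
is tried as an offloaded copy first when source and destination sit on
the same disk, falling back to the page-based path otherwise. The names
are illustrative and error handling is omitted; 'kc' comes from
dm_kcopyd_client_create():

	#include <linux/completion.h>
	#include <linux/dm-io.h>
	#include <linux/dm-kcopyd.h>

	static void example_copy_done(int read_err, unsigned long write_err,
				      void *context)
	{
		complete(context);
	}

	static void example_copy(struct dm_kcopyd_client *kc,
				 struct block_device *bdev,
				 struct completion *done)
	{
		struct dm_io_region src = {
			.bdev = bdev, .sector = 0, .count = 128,
		};
		struct dm_io_region dst = {
			.bdev = bdev, .sector = 2048, .count = 128,
		};

		/* same disk for source and destination, so dispatch_job()
		 * pushes this job onto copy_jobs
		 */
		dm_kcopyd_copy(kc, &src, 1, &dst, 0, example_copy_done, done);
	}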

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-kcopyd.c | 55 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 37b03ab7e5c9..214fadd6d71f 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
 	atomic_t nr_jobs;
 
 /*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
  *
- * i)   jobs waiting for pages
- * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i)   jobs waiting to try copy offload
+ * ii)  jobs waiting for pages
+ * iii) jobs that have pages, and are waiting for the io to be issued.
+ * iv)  jobs that don't need to do any IO and just run a callback
+ * v)   jobs that have completed.
  *
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
  */
 	spinlock_t job_lock;
 	struct list_head callback_jobs;
 	struct list_head complete_jobs;
+	struct list_head copy_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
 };
@@ -579,6 +581,42 @@ static int run_io_job(struct kcopyd_job *job)
 	return r;
 }
 
+static int run_copy_job(struct kcopyd_job *job)
+{
+	int r, i, count = 0;
+	struct range_entry range;
+
+	struct request_queue *src_q, *dest_q;
+
+	for (i = 0; i < job->num_dests; i++) {
+		range.dst = job->dests[i].sector << SECTOR_SHIFT;
+		range.src = job->source.sector << SECTOR_SHIFT;
+		range.len = job->source.count << SECTOR_SHIFT;
+
+		src_q = bdev_get_queue(job->source.bdev);
+		dest_q = bdev_get_queue(job->dests[i].bdev);
+
+		if (src_q != dest_q || !blk_queue_copy(src_q))
+			break;
+
+		r = blkdev_issue_copy(job->source.bdev, 1, &range, job->dests[i].bdev, GFP_KERNEL);
+		if (r)
+			break;
+
+		job->dests[i].count = 0;
+		count++;
+	}
+
+	if (count == job->num_dests) {
+		push(&job->kc->complete_jobs, job);
+	} else {
+		push(&job->kc->pages_jobs, job);
+		r = 0;
+	}
+
+	return r;
+}
+
 static int run_pages_job(struct kcopyd_job *job)
 {
 	int r;
@@ -659,6 +697,7 @@ static void do_work(struct work_struct *work)
 	spin_unlock_irq(&kc->job_lock);
 
 	blk_start_plug(&plug);
+	process_jobs(&kc->copy_jobs, kc, run_copy_job);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
 	process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +715,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->callback_jobs, job);
+	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+		push(&kc->copy_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -916,6 +957,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 	spin_lock_init(&kc->job_lock);
 	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
+	INIT_LIST_HEAD(&kc->copy_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
 	kc->throttle = throttle;
@@ -971,6 +1013,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 
 	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
+	WARN_ON(!list_empty(&kc->copy_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
 	destroy_workqueue(kc->kcopyd_wq);
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 09/10] dm kcopyd: use copy offload support
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	SelvaKumar S, Arnav Dawn, Nitesh Shetty, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Alexander Viro,
	linux-kernel

From: SelvaKumar S <selvakuma.s1@samsung.com>

Introduce copy_jobs to use copy-offload, if supported by underlying devices
otherwise fall back to existing method.

run_copy_jobs() calls block layer copy offload API, if both source and
destination request queue are same and support copy offload.
On successful completion, destination regions copied count is made zero,
failed regions are processed via existing method.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-kcopyd.c | 55 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 37b03ab7e5c9..214fadd6d71f 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
 	atomic_t nr_jobs;
 
 /*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
  *
- * i)   jobs waiting for pages
- * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i)	jobs waiting to try copy offload
+ * ii)   jobs waiting for pages
+ * iii)  jobs that have pages, and are waiting for the io to be issued.
+ * iv) jobs that don't need to do any IO and just run a callback
+ * v) jobs that have completed.
  *
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
  */
 	spinlock_t job_lock;
 	struct list_head callback_jobs;
 	struct list_head complete_jobs;
+	struct list_head copy_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
 };
@@ -579,6 +581,42 @@ static int run_io_job(struct kcopyd_job *job)
 	return r;
 }
 
+static int run_copy_job(struct kcopyd_job *job)
+{
+	int r, i, count = 0;
+	struct range_entry range;
+
+	struct request_queue *src_q, *dest_q;
+
+	for (i = 0; i < job->num_dests; i++) {
+		range.dst = job->dests[i].sector << SECTOR_SHIFT;
+		range.src = job->source.sector << SECTOR_SHIFT;
+		range.len = job->source.count << SECTOR_SHIFT;
+
+		src_q = bdev_get_queue(job->source.bdev);
+		dest_q = bdev_get_queue(job->dests[i].bdev);
+
+		if (src_q != dest_q || !blk_queue_copy(src_q))
+			break;
+
+		r = blkdev_issue_copy(job->source.bdev, 1, &range, job->dests[i].bdev, GFP_KERNEL);
+		if (r)
+			break;
+
+		job->dests[i].count = 0;
+		count++;
+	}
+
+	if (count == job->num_dests) {
+		push(&job->kc->complete_jobs, job);
+	} else {
+		push(&job->kc->pages_jobs, job);
+		r = 0;
+	}
+
+	return r;
+}
+
 static int run_pages_job(struct kcopyd_job *job)
 {
 	int r;
@@ -659,6 +697,7 @@ static void do_work(struct work_struct *work)
 	spin_unlock_irq(&kc->job_lock);
 
 	blk_start_plug(&plug);
+	process_jobs(&kc->copy_jobs, kc, run_copy_job);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
 	process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +715,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->callback_jobs, job);
+	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+		push(&kc->copy_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -916,6 +957,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 	spin_lock_init(&kc->job_lock);
 	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
+	INIT_LIST_HEAD(&kc->copy_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
 	kc->throttle = throttle;
@@ -971,6 +1013,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 
 	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
+	WARN_ON(!list_empty(&kc->copy_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
 	destroy_workqueue(kc->kcopyd_wq);
-- 
2.35.1.500.gb896f729e2



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [dm-devel] [PATCH v4 09/10] dm kcopyd: use copy offload support
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: djwong, linux-nvme, clm, dm-devel, Chaitanya Kulkarni, osandov,
	Alasdair Kergon, Naohiro Aota, msnitzer, bvanassche, linux-scsi,
	gost.dev, nitheshshetty, James Smart, hch, Nitesh Shetty,
	chaitanyak, SelvaKumar S, Mike Snitzer, josef, linux-block,
	dsterba, kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, martin.petersen, linux-kernel,
	Arnav Dawn, jack, linux-fsdevel, lsf-pc, Damien Le Moal,
	Alexander Viro

From: SelvaKumar S <selvakuma.s1@samsung.com>

Introduce a copy_jobs list to use copy offload when it is supported by
the underlying devices, and fall back to the existing method otherwise.

run_copy_job() calls the block layer copy offload API when the source
and destination request queues are the same and support copy offload.
On successful completion, the destination region's copied count is set
to zero; regions that fail are processed via the existing method.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-kcopyd.c | 55 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 37b03ab7e5c9..214fadd6d71f 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
 	atomic_t nr_jobs;
 
 /*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
  *
- * i)   jobs waiting for pages
- * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i)	jobs waiting to try copy offload
+ * ii)   jobs waiting for pages
+ * iii)  jobs that have pages, and are waiting for the io to be issued.
+ * iv) jobs that don't need to do any IO and just run a callback
+ * v) jobs that have completed.
  *
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
  */
 	spinlock_t job_lock;
 	struct list_head callback_jobs;
 	struct list_head complete_jobs;
+	struct list_head copy_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
 };
@@ -579,6 +581,42 @@ static int run_io_job(struct kcopyd_job *job)
 	return r;
 }
 
+static int run_copy_job(struct kcopyd_job *job)
+{
+	int r, i, count = 0;
+	struct range_entry range;
+
+	struct request_queue *src_q, *dest_q;
+
+	for (i = 0; i < job->num_dests; i++) {
+		range.dst = job->dests[i].sector << SECTOR_SHIFT;
+		range.src = job->source.sector << SECTOR_SHIFT;
+		range.len = job->source.count << SECTOR_SHIFT;
+
+		src_q = bdev_get_queue(job->source.bdev);
+		dest_q = bdev_get_queue(job->dests[i].bdev);
+
+		if (src_q != dest_q || !blk_queue_copy(src_q))
+			break;
+
+		r = blkdev_issue_copy(job->source.bdev, 1, &range, job->dests[i].bdev, GFP_KERNEL);
+		if (r)
+			break;
+
+		job->dests[i].count = 0;
+		count++;
+	}
+
+	if (count == job->num_dests) {
+		push(&job->kc->complete_jobs, job);
+	} else {
+		push(&job->kc->pages_jobs, job);
+		r = 0;
+	}
+
+	return r;
+}
+
 static int run_pages_job(struct kcopyd_job *job)
 {
 	int r;
@@ -659,6 +697,7 @@ static void do_work(struct work_struct *work)
 	spin_unlock_irq(&kc->job_lock);
 
 	blk_start_plug(&plug);
+	process_jobs(&kc->copy_jobs, kc, run_copy_job);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
 	process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +715,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->callback_jobs, job);
+	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+		push(&kc->copy_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -916,6 +957,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 	spin_lock_init(&kc->job_lock);
 	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
+	INIT_LIST_HEAD(&kc->copy_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
 	kc->throttle = throttle;
@@ -971,6 +1013,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 
 	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
+	WARN_ON(!list_empty(&kc->copy_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
 	destroy_workqueue(kc->kcopyd_wq);
-- 
2.35.1.500.gb896f729e2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 10/10] fs: add support for copy file range in zonefs
       [not found]   ` <CGME20220426102042epcas5p201aa0d9143d7bc650ae7858383b69288@epcas5p2.samsung.com>
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Arnav Dawn, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

From: Arnav Dawn <arnav.dawn@samsung.com>

copy_file_range is implemented using copy offload; offloading the copy
to the device is enabled by default.
To disable copy offloading, mount with the "no_copy_offload" mount option.
At present copy offload is only used when the source and destination
files are on the same block device; otherwise the copy is completed by
the generic copy_file_range path.

copy_file_range is implemented as follows:
	- flush pending writes on the src and dest files
	- drop the page cache for the dest file if it is in a conv zone
	- copy the range using the offload
	- update the dest file info

For all failure cases we fall back to the generic copy_file_range path.
At present this implementation does not support conv (conventional
zone) aggregation.
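
From user space the offload is transparent: a plain copy_file_range(2)
call between two files on the same zonefs mount exercises this path,
and the kernel falls back to the generic copy whenever the offload
cannot be used. A rough sketch (paths and the length variable are
hypothetical, error handling omitted):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>

	int src_fd = open("/mnt/zonefs/seq/0", O_RDONLY);
	int dst_fd = open("/mnt/zonefs/seq/1", O_WRONLY);
	/* copy len bytes starting at the current file offsets */
	ssize_t copied = copy_file_range(src_fd, NULL, dst_fd, NULL, len, 0);

Mounting with -o no_copy_offload keeps the same user-visible behaviour
while disabling the device offload.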

Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 fs/zonefs/super.c  | 178 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/zonefs/zonefs.h |   1 +
 2 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index b3b0b71fdf6c..60563b592bf2 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -901,6 +901,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	else
 		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
 				   &zonefs_write_dio_ops, 0, 0);
+
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
@@ -1189,6 +1190,171 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int zonefs_is_file_size_ok(struct inode *src_inode, struct inode *dst_inode,
+			   loff_t src_off, loff_t dst_off, size_t len)
+{
+	loff_t size, endoff;
+
+	size = i_size_read(src_inode);
+	/* Don't copy beyond source file EOF. */
+	if (src_off + len > size) {
+		zonefs_err(src_inode->i_sb, "Copy beyond EOF (%llu + %zu > %llu)\n",
+		     src_off, len, size);
+		return -EOPNOTSUPP;
+	}
+
+	endoff = dst_off + len;
+	if (inode_newsize_ok(dst_inode, endoff))
+		return -EOPNOTSUPP;
+
+
+	return 0;
+}
+static ssize_t __zonefs_send_copy(struct zonefs_inode_info *src_zi, loff_t src_off,
+				struct zonefs_inode_info *dst_zi, loff_t dst_off, size_t len)
+{
+	struct block_device *src_bdev = src_zi->i_vnode.i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_zi->i_vnode.i_sb->s_bdev;
+	struct range_entry *rlist;
+	int ret = -EIO;
+
+	rlist = kmalloc(sizeof(*rlist), GFP_KERNEL);
+	rlist[0].dst = (dst_zi->i_zsector << SECTOR_SHIFT) + dst_off;
+	rlist[0].src = (src_zi->i_zsector << SECTOR_SHIFT) + src_off;
+	rlist[0].len = len;
+	rlist[0].comp_len = 0;
+	ret = blkdev_issue_copy(src_bdev, 1, rlist, dst_bdev, GFP_KERNEL);
+	if (ret) {
+		if (rlist[0].comp_len != len) {
+			ret = rlist[0].comp_len;
+			kfree(rlist);
+			return ret;
+		}
+	}
+	kfree(rlist);
+	return len;
+}
+static ssize_t __zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				      struct file *dst_file, loff_t dst_off,
+				      size_t len, unsigned int flags)
+{
+	struct inode *src_inode = file_inode(src_file);
+	struct inode *dst_inode = file_inode(dst_file);
+	struct zonefs_inode_info *src_zi = ZONEFS_I(src_inode);
+	struct zonefs_inode_info *dst_zi = ZONEFS_I(dst_inode);
+	struct block_device *src_bdev = src_inode->i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_inode->i_sb->s_bdev;
+	struct super_block *src_sb = src_inode->i_sb;
+	struct zonefs_sb_info *src_sbi = ZONEFS_SB(src_sb);
+	struct super_block *dst_sb = dst_inode->i_sb;
+	struct zonefs_sb_info *dst_sbi = ZONEFS_SB(dst_sb);
+	ssize_t ret = -EIO, bytes;
+
+	if (src_bdev != dst_bdev) {
+		zonefs_err(src_sb, "Copying files across two devices\n");
+			return -EXDEV;
+	}
+
+	/*
+	 * Some of the checks below will return -EOPNOTSUPP,
+	 * which will force a generic copy
+	 */
+
+	if (!(src_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		|| !(dst_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE))
+		return -EOPNOTSUPP;
+
+	/* Start by sync'ing the source and destination files for conv zones */
+	if (src_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(src_file, src_off, (src_off + len));
+		if (ret < 0) {
+			zonefs_err(src_sb, "failed to write source file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(dst_file, dst_off, (dst_off + len));
+		if (ret < 0) {
+			zonefs_err(dst_sb, "failed to write destination file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	mutex_lock(&dst_zi->i_truncate_mutex);
+	if (len > dst_zi->i_max_size - dst_zi->i_wpoffset) {
+		/* Adjust length */
+		len -= dst_zi->i_max_size - dst_zi->i_wpoffset;
+		if (len <= 0) {
+			mutex_unlock(&dst_zi->i_truncate_mutex);
+			return -EOPNOTSUPP;
+		}
+	}
+	if (dst_off != dst_zi->i_wpoffset) {
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		return -EOPNOTSUPP; /* copy not at zone write ptr */
+	}
+	mutex_lock(&src_zi->i_truncate_mutex);
+	ret = zonefs_is_file_size_ok(src_inode, dst_inode, src_off, dst_off, len);
+	if (ret < 0) {
+		mutex_unlock(&src_zi->i_truncate_mutex);
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		goto out;
+	}
+	mutex_unlock(&src_zi->i_truncate_mutex);
+
+	/* Drop dst file cached pages for a conv zone*/
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
+						    dst_off >> PAGE_SHIFT,
+						    (dst_off + len) >> PAGE_SHIFT);
+		if (ret < 0) {
+			zonefs_err(dst_sb, "Failed to invalidate inode pages (%zd)\n", ret);
+			ret = 0;
+		}
+	}
+	bytes = __zonefs_send_copy(src_zi, src_off, dst_zi, dst_off, len);
+	ret += bytes;
+
+	file_update_time(dst_file);
+	zonefs_update_stats(dst_inode, dst_off + bytes);
+	zonefs_i_size_write(dst_inode, dst_off + bytes);
+	dst_zi->i_wpoffset += bytes;
+	mutex_unlock(&dst_zi->i_truncate_mutex);
+
+
+
+	/*
+	 * if we still have some bytes left, do splice copy
+	 */
+	if (bytes && (bytes < len)) {
+		zonefs_info(src_sb, "Final partial copy of %zu bytes\n", len);
+		bytes = do_splice_direct(src_file, &src_off, dst_file,
+					 &dst_off, len, flags);
+		if (bytes > 0)
+			ret += bytes;
+		else
+			zonefs_info(src_sb, "Failed partial copy (%zd)\n", bytes);
+	}
+
+out:
+
+	return ret;
+}
+
+static ssize_t zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				    struct file *dst_file, loff_t dst_off,
+				    size_t len, unsigned int flags)
+{
+	ssize_t ret;
+
+	ret = __zonefs_copy_file_range(src_file, src_off, dst_file, dst_off,
+				     len, flags);
+
+	if (ret == -EOPNOTSUPP || ret == -EXDEV)
+		ret = generic_copy_file_range(src_file, src_off, dst_file,
+					      dst_off, len, flags);
+	return ret;
+}
+
 static const struct file_operations zonefs_file_operations = {
 	.open		= zonefs_file_open,
 	.release	= zonefs_file_release,
@@ -1200,6 +1366,7 @@ static const struct file_operations zonefs_file_operations = {
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.iopoll		= iocb_bio_iopoll,
+	.copy_file_range = zonefs_copy_file_range,
 };
 
 static struct kmem_cache *zonefs_inode_cachep;
@@ -1262,7 +1429,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 enum {
 	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
-	Opt_explicit_open, Opt_err,
+	Opt_explicit_open, Opt_no_copy_offload, Opt_err,
 };
 
 static const match_table_t tokens = {
@@ -1271,6 +1438,7 @@ static const match_table_t tokens = {
 	{ Opt_errors_zol,	"errors=zone-offline"},
 	{ Opt_errors_repair,	"errors=repair"},
 	{ Opt_explicit_open,	"explicit-open" },
+	{ Opt_no_copy_offload,	"no_copy_offload" },
 	{ Opt_err,		NULL}
 };
 
@@ -1280,6 +1448,7 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 	substring_t args[MAX_OPT_ARGS];
 	char *p;
 
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	if (!options)
 		return 0;
 
@@ -1310,6 +1479,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 		case Opt_explicit_open:
 			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
 			break;
+		case Opt_no_copy_offload:
+			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_COPY_FILE;
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -1330,6 +1502,8 @@ static int zonefs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",errors=zone-offline");
 	if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_REPAIR)
 		seq_puts(seq, ",errors=repair");
+	if (sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		seq_puts(seq, ",copy_offload");
 
 	return 0;
 }
@@ -1769,6 +1943,8 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 	atomic_set(&sbi->s_active_seq_files, 0);
 	sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev);
 
+	/* set copy support by default */
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	ret = zonefs_read_super(sb);
 	if (ret)
 		return ret;
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 4b3de66c3233..efa6632c4b6a 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -162,6 +162,7 @@ enum zonefs_features {
 	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
 	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
 #define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
+#define ZONEFS_MNTOPT_COPY_FILE		(1 << 5) /* enable copy file range offload to kernel */
 
 /*
  * In-memory Super block information.
-- 
2.35.1.500.gb896f729e2


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH v4 10/10] fs: add support for copy file range in zonefs
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Arnav Dawn, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

From: Arnav Dawn <arnav.dawn@samsung.com>

copy_file_range is implemented using copy offload; offloading the copy
to the device is enabled by default.
To disable copy offloading, mount with the "no_copy_offload" mount option.
At present copy offload is only used when the source and destination
files are on the same block device; otherwise the copy is completed by
the generic copy_file_range path.

copy_file_range is implemented as follows:
	- flush pending writes on the src and dest files
	- drop the page cache for the dest file if it is in a conv zone
	- copy the range using the offload
	- update the dest file info

For all failure cases we fall back to the generic copy_file_range path.
At present this implementation does not support conv (conventional
zone) aggregation.

Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 fs/zonefs/super.c  | 178 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/zonefs/zonefs.h |   1 +
 2 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index b3b0b71fdf6c..60563b592bf2 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -901,6 +901,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	else
 		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
 				   &zonefs_write_dio_ops, 0, 0);
+
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
@@ -1189,6 +1190,171 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int zonefs_is_file_size_ok(struct inode *src_inode, struct inode *dst_inode,
+			   loff_t src_off, loff_t dst_off, size_t len)
+{
+	loff_t size, endoff;
+
+	size = i_size_read(src_inode);
+	/* Don't copy beyond source file EOF. */
+	if (src_off + len > size) {
+		zonefs_err(src_inode->i_sb, "Copy beyond EOF (%llu + %zu > %llu)\n",
+		     src_off, len, size);
+		return -EOPNOTSUPP;
+	}
+
+	endoff = dst_off + len;
+	if (inode_newsize_ok(dst_inode, endoff))
+		return -EOPNOTSUPP;
+
+
+	return 0;
+}
+static ssize_t __zonefs_send_copy(struct zonefs_inode_info *src_zi, loff_t src_off,
+				struct zonefs_inode_info *dst_zi, loff_t dst_off, size_t len)
+{
+	struct block_device *src_bdev = src_zi->i_vnode.i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_zi->i_vnode.i_sb->s_bdev;
+	struct range_entry *rlist;
+	int ret = -EIO;
+
+	rlist = kmalloc(sizeof(*rlist), GFP_KERNEL);
+	rlist[0].dst = (dst_zi->i_zsector << SECTOR_SHIFT) + dst_off;
+	rlist[0].src = (src_zi->i_zsector << SECTOR_SHIFT) + src_off;
+	rlist[0].len = len;
+	rlist[0].comp_len = 0;
+	ret = blkdev_issue_copy(src_bdev, 1, rlist, dst_bdev, GFP_KERNEL);
+	if (ret) {
+		if (rlist[0].comp_len != len) {
+			ret = rlist[0].comp_len;
+			kfree(rlist);
+			return ret;
+		}
+	}
+	kfree(rlist);
+	return len;
+}
+static ssize_t __zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				      struct file *dst_file, loff_t dst_off,
+				      size_t len, unsigned int flags)
+{
+	struct inode *src_inode = file_inode(src_file);
+	struct inode *dst_inode = file_inode(dst_file);
+	struct zonefs_inode_info *src_zi = ZONEFS_I(src_inode);
+	struct zonefs_inode_info *dst_zi = ZONEFS_I(dst_inode);
+	struct block_device *src_bdev = src_inode->i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_inode->i_sb->s_bdev;
+	struct super_block *src_sb = src_inode->i_sb;
+	struct zonefs_sb_info *src_sbi = ZONEFS_SB(src_sb);
+	struct super_block *dst_sb = dst_inode->i_sb;
+	struct zonefs_sb_info *dst_sbi = ZONEFS_SB(dst_sb);
+	ssize_t ret = -EIO, bytes;
+
+	if (src_bdev != dst_bdev) {
+		zonefs_err(src_sb, "Copying files across two devices\n");
+			return -EXDEV;
+	}
+
+	/*
+	 * Some of the checks below will return -EOPNOTSUPP,
+	 * which will force a generic copy
+	 */
+
+	if (!(src_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		|| !(dst_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE))
+		return -EOPNOTSUPP;
+
+	/* Start by sync'ing the source and destination files for conv zones */
+	if (src_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(src_file, src_off, (src_off + len));
+		if (ret < 0) {
+			zonefs_err(src_sb, "failed to write source file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(dst_file, dst_off, (dst_off + len));
+		if (ret < 0) {
+			zonefs_err(dst_sb, "failed to write destination file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	mutex_lock(&dst_zi->i_truncate_mutex);
+	if (len > dst_zi->i_max_size - dst_zi->i_wpoffset) {
+		/* Adjust length */
+		len -= dst_zi->i_max_size - dst_zi->i_wpoffset;
+		if (len <= 0) {
+			mutex_unlock(&dst_zi->i_truncate_mutex);
+			return -EOPNOTSUPP;
+		}
+	}
+	if (dst_off != dst_zi->i_wpoffset) {
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		return -EOPNOTSUPP; /* copy not at zone write ptr */
+	}
+	mutex_lock(&src_zi->i_truncate_mutex);
+	ret = zonefs_is_file_size_ok(src_inode, dst_inode, src_off, dst_off, len);
+	if (ret < 0) {
+		mutex_unlock(&src_zi->i_truncate_mutex);
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		goto out;
+	}
+	mutex_unlock(&src_zi->i_truncate_mutex);
+
+	/* Drop dst file cached pages for a conv zone*/
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
+						    dst_off >> PAGE_SHIFT,
+						    (dst_off + len) >> PAGE_SHIFT);
+		if (ret < 0) {
+			zonefs_err(dst_sb, "Failed to invalidate inode pages (%zd)\n", ret);
+			ret = 0;
+		}
+	}
+	bytes = __zonefs_send_copy(src_zi, src_off, dst_zi, dst_off, len);
+	ret += bytes;
+
+	file_update_time(dst_file);
+	zonefs_update_stats(dst_inode, dst_off + bytes);
+	zonefs_i_size_write(dst_inode, dst_off + bytes);
+	dst_zi->i_wpoffset += bytes;
+	mutex_unlock(&dst_zi->i_truncate_mutex);
+
+
+
+	/*
+	 * if we still have some bytes left, do splice copy
+	 */
+	if (bytes && (bytes < len)) {
+		zonefs_info(src_sb, "Final partial copy of %zu bytes\n", len);
+		bytes = do_splice_direct(src_file, &src_off, dst_file,
+					 &dst_off, len, flags);
+		if (bytes > 0)
+			ret += bytes;
+		else
+			zonefs_info(src_sb, "Failed partial copy (%zd)\n", bytes);
+	}
+
+out:
+
+	return ret;
+}
+
+static ssize_t zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				    struct file *dst_file, loff_t dst_off,
+				    size_t len, unsigned int flags)
+{
+	ssize_t ret;
+
+	ret = __zonefs_copy_file_range(src_file, src_off, dst_file, dst_off,
+				     len, flags);
+
+	if (ret == -EOPNOTSUPP || ret == -EXDEV)
+		ret = generic_copy_file_range(src_file, src_off, dst_file,
+					      dst_off, len, flags);
+	return ret;
+}
+
 static const struct file_operations zonefs_file_operations = {
 	.open		= zonefs_file_open,
 	.release	= zonefs_file_release,
@@ -1200,6 +1366,7 @@ static const struct file_operations zonefs_file_operations = {
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.iopoll		= iocb_bio_iopoll,
+	.copy_file_range = zonefs_copy_file_range,
 };
 
 static struct kmem_cache *zonefs_inode_cachep;
@@ -1262,7 +1429,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 enum {
 	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
-	Opt_explicit_open, Opt_err,
+	Opt_explicit_open, Opt_no_copy_offload, Opt_err,
 };
 
 static const match_table_t tokens = {
@@ -1271,6 +1438,7 @@ static const match_table_t tokens = {
 	{ Opt_errors_zol,	"errors=zone-offline"},
 	{ Opt_errors_repair,	"errors=repair"},
 	{ Opt_explicit_open,	"explicit-open" },
+	{ Opt_no_copy_offload,	"no_copy_offload" },
 	{ Opt_err,		NULL}
 };
 
@@ -1280,6 +1448,7 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 	substring_t args[MAX_OPT_ARGS];
 	char *p;
 
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	if (!options)
 		return 0;
 
@@ -1310,6 +1479,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 		case Opt_explicit_open:
 			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
 			break;
+		case Opt_no_copy_offload:
+			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_COPY_FILE;
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -1330,6 +1502,8 @@ static int zonefs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",errors=zone-offline");
 	if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_REPAIR)
 		seq_puts(seq, ",errors=repair");
+	if (sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		seq_puts(seq, ",copy_offload");
 
 	return 0;
 }
@@ -1769,6 +1943,8 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 	atomic_set(&sbi->s_active_seq_files, 0);
 	sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev);
 
+	/* set copy support by default */
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	ret = zonefs_read_super(sb);
 	if (ret)
 		return ret;
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 4b3de66c3233..efa6632c4b6a 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -162,6 +162,7 @@ enum zonefs_features {
 	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
 	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
 #define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
+#define ZONEFS_MNTOPT_COPY_FILE		(1 << 5) /* enable copy file range offload to kernel */
 
 /*
  * In-memory Super block information.
-- 
2.35.1.500.gb896f729e2



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [dm-devel] [PATCH v4 10/10] fs: add support for copy file range in zonefs
@ 2022-04-26 10:12       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-26 10:12 UTC (permalink / raw)
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, martin.petersen, linux-kernel, Arnav Dawn, jack,
	linux-fsdevel, lsf-pc, Damien Le Moal, Alexander Viro

From: Arnav Dawn <arnav.dawn@samsung.com>

copy_file_range is implemented using copy offload; offloading the copy
to the device is enabled by default.
To disable copy offloading, mount with the "no_copy_offload" mount option.
At present copy offload is only used when the source and destination
files are on the same block device; otherwise the copy is completed by
the generic copy_file_range path.

copy_file_range is implemented as follows:
	- flush pending writes on the src and dest files
	- drop the page cache for the dest file if it is in a conv zone
	- copy the range using the offload
	- update the dest file info

For all failure cases we fall back to the generic copy_file_range path.
At present this implementation does not support conv (conventional
zone) aggregation.

Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 fs/zonefs/super.c  | 178 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/zonefs/zonefs.h |   1 +
 2 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index b3b0b71fdf6c..60563b592bf2 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -901,6 +901,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	else
 		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
 				   &zonefs_write_dio_ops, 0, 0);
+
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
@@ -1189,6 +1190,171 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int zonefs_is_file_size_ok(struct inode *src_inode, struct inode *dst_inode,
+			   loff_t src_off, loff_t dst_off, size_t len)
+{
+	loff_t size, endoff;
+
+	size = i_size_read(src_inode);
+	/* Don't copy beyond source file EOF. */
+	if (src_off + len > size) {
+		zonefs_err(src_inode->i_sb, "Copy beyond EOF (%llu + %zu > %llu)\n",
+		     src_off, len, size);
+		return -EOPNOTSUPP;
+	}
+
+	endoff = dst_off + len;
+	if (inode_newsize_ok(dst_inode, endoff))
+		return -EOPNOTSUPP;
+
+
+	return 0;
+}
+static ssize_t __zonefs_send_copy(struct zonefs_inode_info *src_zi, loff_t src_off,
+				struct zonefs_inode_info *dst_zi, loff_t dst_off, size_t len)
+{
+	struct block_device *src_bdev = src_zi->i_vnode.i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_zi->i_vnode.i_sb->s_bdev;
+	struct range_entry *rlist;
+	int ret = -EIO;
+
+	rlist = kmalloc(sizeof(*rlist), GFP_KERNEL);
+	rlist[0].dst = (dst_zi->i_zsector << SECTOR_SHIFT) + dst_off;
+	rlist[0].src = (src_zi->i_zsector << SECTOR_SHIFT) + src_off;
+	rlist[0].len = len;
+	rlist[0].comp_len = 0;
+	ret = blkdev_issue_copy(src_bdev, 1, rlist, dst_bdev, GFP_KERNEL);
+	if (ret) {
+		if (rlist[0].comp_len != len) {
+			ret = rlist[0].comp_len;
+			kfree(rlist);
+			return ret;
+		}
+	}
+	kfree(rlist);
+	return len;
+}
+static ssize_t __zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				      struct file *dst_file, loff_t dst_off,
+				      size_t len, unsigned int flags)
+{
+	struct inode *src_inode = file_inode(src_file);
+	struct inode *dst_inode = file_inode(dst_file);
+	struct zonefs_inode_info *src_zi = ZONEFS_I(src_inode);
+	struct zonefs_inode_info *dst_zi = ZONEFS_I(dst_inode);
+	struct block_device *src_bdev = src_inode->i_sb->s_bdev;
+	struct block_device *dst_bdev = dst_inode->i_sb->s_bdev;
+	struct super_block *src_sb = src_inode->i_sb;
+	struct zonefs_sb_info *src_sbi = ZONEFS_SB(src_sb);
+	struct super_block *dst_sb = dst_inode->i_sb;
+	struct zonefs_sb_info *dst_sbi = ZONEFS_SB(dst_sb);
+	ssize_t ret = -EIO, bytes;
+
+	if (src_bdev != dst_bdev) {
+		zonefs_err(src_sb, "Copying files across two devices\n");
+			return -EXDEV;
+	}
+
+	/*
+	 * Some of the checks below will return -EOPNOTSUPP,
+	 * which will force a generic copy
+	 */
+
+	if (!(src_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		|| !(dst_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE))
+		return -EOPNOTSUPP;
+
+	/* Start by sync'ing the source and destination files for conv zones */
+	if (src_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(src_file, src_off, (src_off + len));
+		if (ret < 0) {
+			zonefs_err(src_sb, "failed to write source file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = file_write_and_wait_range(dst_file, dst_off, (dst_off + len));
+		if (ret < 0) {
+			zonefs_err(dst_sb, "failed to write destination file (%zd)\n", ret);
+			goto out;
+		}
+	}
+	mutex_lock(&dst_zi->i_truncate_mutex);
+	if (len > dst_zi->i_max_size - dst_zi->i_wpoffset) {
+		/* Adjust length */
+		len -= dst_zi->i_max_size - dst_zi->i_wpoffset;
+		if (len <= 0) {
+			mutex_unlock(&dst_zi->i_truncate_mutex);
+			return -EOPNOTSUPP;
+		}
+	}
+	if (dst_off != dst_zi->i_wpoffset) {
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		return -EOPNOTSUPP; /* copy not at zone write ptr */
+	}
+	mutex_lock(&src_zi->i_truncate_mutex);
+	ret = zonefs_is_file_size_ok(src_inode, dst_inode, src_off, dst_off, len);
+	if (ret < 0) {
+		mutex_unlock(&src_zi->i_truncate_mutex);
+		mutex_unlock(&dst_zi->i_truncate_mutex);
+		goto out;
+	}
+	mutex_unlock(&src_zi->i_truncate_mutex);
+
+	/* Drop dst file cached pages for a conv zone*/
+	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
+		ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
+						    dst_off >> PAGE_SHIFT,
+						    (dst_off + len) >> PAGE_SHIFT);
+		if (ret < 0) {
+			zonefs_err(dst_sb, "Failed to invalidate inode pages (%zd)\n", ret);
+			ret = 0;
+		}
+	}
+	bytes = __zonefs_send_copy(src_zi, src_off, dst_zi, dst_off, len);
+	ret += bytes;
+
+	file_update_time(dst_file);
+	zonefs_update_stats(dst_inode, dst_off + bytes);
+	zonefs_i_size_write(dst_inode, dst_off + bytes);
+	dst_zi->i_wpoffset += bytes;
+	mutex_unlock(&dst_zi->i_truncate_mutex);
+
+
+
+	/*
+	 * if we still have some bytes left, do splice copy
+	 */
+	if (bytes && (bytes < len)) {
+		zonefs_info(src_sb, "Final partial copy of %zu bytes\n", len);
+		bytes = do_splice_direct(src_file, &src_off, dst_file,
+					 &dst_off, len, flags);
+		if (bytes > 0)
+			ret += bytes;
+		else
+			zonefs_info(src_sb, "Failed partial copy (%zd)\n", bytes);
+	}
+
+out:
+
+	return ret;
+}
+
+static ssize_t zonefs_copy_file_range(struct file *src_file, loff_t src_off,
+				    struct file *dst_file, loff_t dst_off,
+				    size_t len, unsigned int flags)
+{
+	ssize_t ret;
+
+	ret = __zonefs_copy_file_range(src_file, src_off, dst_file, dst_off,
+				     len, flags);
+
+	if (ret == -EOPNOTSUPP || ret == -EXDEV)
+		ret = generic_copy_file_range(src_file, src_off, dst_file,
+					      dst_off, len, flags);
+	return ret;
+}
+
 static const struct file_operations zonefs_file_operations = {
 	.open		= zonefs_file_open,
 	.release	= zonefs_file_release,
@@ -1200,6 +1366,7 @@ static const struct file_operations zonefs_file_operations = {
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.iopoll		= iocb_bio_iopoll,
+	.copy_file_range = zonefs_copy_file_range,
 };
 
 static struct kmem_cache *zonefs_inode_cachep;
@@ -1262,7 +1429,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 enum {
 	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
-	Opt_explicit_open, Opt_err,
+	Opt_explicit_open, Opt_no_copy_offload, Opt_err,
 };
 
 static const match_table_t tokens = {
@@ -1271,6 +1438,7 @@ static const match_table_t tokens = {
 	{ Opt_errors_zol,	"errors=zone-offline"},
 	{ Opt_errors_repair,	"errors=repair"},
 	{ Opt_explicit_open,	"explicit-open" },
+	{ Opt_no_copy_offload,	"no_copy_offload" },
 	{ Opt_err,		NULL}
 };
 
@@ -1280,6 +1448,7 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 	substring_t args[MAX_OPT_ARGS];
 	char *p;
 
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	if (!options)
 		return 0;
 
@@ -1310,6 +1479,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 		case Opt_explicit_open:
 			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
 			break;
+		case Opt_no_copy_offload:
+			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_COPY_FILE;
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -1330,6 +1502,8 @@ static int zonefs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",errors=zone-offline");
 	if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_REPAIR)
 		seq_puts(seq, ",errors=repair");
+	if (sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
+		seq_puts(seq, ",copy_offload");
 
 	return 0;
 }
@@ -1769,6 +1943,8 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 	atomic_set(&sbi->s_active_seq_files, 0);
 	sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev);
 
+	/* set copy support by default */
+	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
 	ret = zonefs_read_super(sb);
 	if (ret)
 		return ret;
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 4b3de66c3233..efa6632c4b6a 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -162,6 +162,7 @@ enum zonefs_features {
 	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
 	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
 #define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
+#define ZONEFS_MNTOPT_COPY_FILE		(1 << 5) /* enable copy file range offload to kernel */
 
 /*
  * In-memory Super block information.
-- 
2.35.1.500.gb896f729e2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  0:11         ` kernel test robot
  -1 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-27  0:11 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: llvm, kbuild-all, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack, nitheshshetty,
	gost.dev, Nitesh Shetty, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: hexagon-randconfig-r041-20220425 (https://download.01.org/0day-ci/archive/20220427/202204270754.pM0Ewhl5-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/3e91cba65ef73ba116953031d5548da7fd33a150
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 3e91cba65ef73ba116953031d5548da7fd33a150
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:178:5: warning: no previous prototype for function 'blk_copy_offload' [-Wmissing-prototypes]
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:178:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
   1 warning generated.
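
A likely minimal fix, following the robot's suggestion, would be to give
the helper internal linkage; alternatively a prototype could be added to
block/blk.h if the function is meant to be called from elsewhere. Sketch:

-int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
+static int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
 		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)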


vim +/blk_copy_offload +178 block/blk-lib.c

   173	
   174	/*
   175	 * blk_copy_offload	- Use device's native copy offload feature
   176	 * Go through user provide payload, prepare new payload based on device's copy offload limits.
   177	 */
 > 178	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   179			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   180	{
   181		struct request_queue *sq = bdev_get_queue(src_bdev);
   182		struct request_queue *dq = bdev_get_queue(dst_bdev);
   183		struct bio *read_bio, *write_bio;
   184		struct copy_ctx *ctx;
   185		struct cio *cio;
   186		struct page *token;
   187		sector_t src_blk, copy_len, dst_blk;
   188		sector_t remaining, max_copy_len = LONG_MAX;
   189		unsigned long flags;
   190		int ri = 0, ret = 0;
   191	
   192		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   193		if (!cio)
   194			return -ENOMEM;
   195		cio->rlist = rlist;
   196		spin_lock_init(&cio->lock);
   197	
   198		max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
   199		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   200				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   201	
   202		for (ri = 0; ri < nr_srcs; ri++) {
   203			cio->rlist[ri].comp_len = rlist[ri].len;
   204			src_blk = rlist[ri].src;
   205			dst_blk = rlist[ri].dst;
   206			for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
   207				copy_len = min(remaining, max_copy_len);
   208	
   209				token = alloc_page(gfp_mask);
   210				if (unlikely(!token)) {
   211					ret = -ENOMEM;
   212					goto err_token;
   213				}
   214	
   215				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   216				if (!ctx) {
   217					ret = -ENOMEM;
   218					goto err_ctx;
   219				}
   220				ctx->cio = cio;
   221				ctx->range_idx = ri;
   222				ctx->start_sec = dst_blk;
   223	
   224				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   225						gfp_mask);
   226				if (!read_bio) {
   227					ret = -ENOMEM;
   228					goto err_read_bio;
   229				}
   230				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   231				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   232				/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
   233				read_bio->bi_iter.bi_size = copy_len;
   234				ret = submit_bio_wait(read_bio);
   235				bio_put(read_bio);
   236				if (ret)
   237					goto err_read_bio;
   238	
   239				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   240						gfp_mask);
   241				if (!write_bio) {
   242					ret = -ENOMEM;
   243					goto err_read_bio;
   244				}
   245				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   246				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   247				/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
   248				write_bio->bi_iter.bi_size = copy_len;
   249				write_bio->bi_end_io = bio_copy_end_io;
   250				write_bio->bi_private = ctx;
   251	
   252				spin_lock_irqsave(&cio->lock, flags);
   253				++cio->refcount;
   254				spin_unlock_irqrestore(&cio->lock, flags);
   255	
   256				submit_bio(write_bio);
   257				src_blk += copy_len;
   258				dst_blk += copy_len;
   259			}
   260		}
   261	
   262		/* Wait for completion of all IO's*/
   263		return cio_await_completion(cio);
   264	
   265	err_read_bio:
   266		kfree(ctx);
   267	err_ctx:
   268		__free_page(token);
   269	err_token:
   270		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   271	
   272		cio->io_err = ret;
   273		return cio_await_completion(cio);
   274	}
   275	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27  0:11         ` kernel test robot
  0 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-27  0:11 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, llvm, linux-nvme, clm, dm-devel, osandov,
	Alasdair Kergon, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, Nitesh Shetty, chaitanyak,
	Mike Snitzer, josef, linux-block, dsterba, kbusch, tytso,
	Frederick.Knight, Sagi Grimberg, axboe, kbuild-all,
	martin.petersen, Arnav Dawn, jack, linux-fsdevel, lsf-pc

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: hexagon-randconfig-r041-20220425 (https://download.01.org/0day-ci/archive/20220427/202204270754.pM0Ewhl5-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/3e91cba65ef73ba116953031d5548da7fd33a150
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 3e91cba65ef73ba116953031d5548da7fd33a150
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:178:5: warning: no previous prototype for function 'blk_copy_offload' [-Wmissing-prototypes]
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:178:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
   1 warning generated.


vim +/blk_copy_offload +178 block/blk-lib.c

   173	
   174	/*
   175	 * blk_copy_offload	- Use device's native copy offload feature
   176	 * Go through user provide payload, prepare new payload based on device's copy offload limits.
   177	 */
 > 178	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   179			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   180	{
   181		struct request_queue *sq = bdev_get_queue(src_bdev);
   182		struct request_queue *dq = bdev_get_queue(dst_bdev);
   183		struct bio *read_bio, *write_bio;
   184		struct copy_ctx *ctx;
   185		struct cio *cio;
   186		struct page *token;
   187		sector_t src_blk, copy_len, dst_blk;
   188		sector_t remaining, max_copy_len = LONG_MAX;
   189		unsigned long flags;
   190		int ri = 0, ret = 0;
   191	
   192		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   193		if (!cio)
   194			return -ENOMEM;
   195		cio->rlist = rlist;
   196		spin_lock_init(&cio->lock);
   197	
   198		max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
   199		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   200				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   201	
   202		for (ri = 0; ri < nr_srcs; ri++) {
   203			cio->rlist[ri].comp_len = rlist[ri].len;
   204			src_blk = rlist[ri].src;
   205			dst_blk = rlist[ri].dst;
   206			for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
   207				copy_len = min(remaining, max_copy_len);
   208	
   209				token = alloc_page(gfp_mask);
   210				if (unlikely(!token)) {
   211					ret = -ENOMEM;
   212					goto err_token;
   213				}
   214	
   215				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   216				if (!ctx) {
   217					ret = -ENOMEM;
   218					goto err_ctx;
   219				}
   220				ctx->cio = cio;
   221				ctx->range_idx = ri;
   222				ctx->start_sec = dst_blk;
   223	
   224				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   225						gfp_mask);
   226				if (!read_bio) {
   227					ret = -ENOMEM;
   228					goto err_read_bio;
   229				}
   230				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   231				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   232				/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
   233				read_bio->bi_iter.bi_size = copy_len;
   234				ret = submit_bio_wait(read_bio);
   235				bio_put(read_bio);
   236				if (ret)
   237					goto err_read_bio;
   238	
   239				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   240						gfp_mask);
   241				if (!write_bio) {
   242					ret = -ENOMEM;
   243					goto err_read_bio;
   244				}
   245				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   246				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   247				/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
   248				write_bio->bi_iter.bi_size = copy_len;
   249				write_bio->bi_end_io = bio_copy_end_io;
   250				write_bio->bi_private = ctx;
   251	
   252				spin_lock_irqsave(&cio->lock, flags);
   253				++cio->refcount;
   254				spin_unlock_irqrestore(&cio->lock, flags);
   255	
   256				submit_bio(write_bio);
   257				src_blk += copy_len;
   258				dst_blk += copy_len;
   259			}
   260		}
   261	
   262		/* Wait for completion of all IO's*/
   263		return cio_await_completion(cio);
   264	
   265	err_read_bio:
   266		kfree(ctx);
   267	err_ctx:
   268		__free_page(token);
   269	err_token:
   270		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   271	
   272		cio->io_err = ret;
   273		return cio_await_completion(cio);
   274	}
   275	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 04/10] block: add emulation for copy
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  1:33         ` kernel test robot
  -1 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-27  1:33 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: llvm, kbuild-all, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack, nitheshshetty,
	gost.dev, Nitesh Shetty, Vincent Fu, Arnav Dawn, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: hexagon-randconfig-r041-20220425 (https://download.01.org/0day-ci/archive/20220427/202204270913.Ecb3uQx1-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/c406c5145dc7d628d4197f6726c23a3f1179b88e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout c406c5145dc7d628d4197f6726c23a3f1179b88e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   block/blk-lib.c:178:5: warning: no previous prototype for function 'blk_copy_offload' [-Wmissing-prototypes]
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:178:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
>> block/blk-lib.c:276:5: warning: no previous prototype for function 'blk_submit_rw_buf' [-Wmissing-prototypes]
   int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
       ^
   block/blk-lib.c:276:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
   ^
   static 
   2 warnings generated.
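
The same remedy would apply to this second helper, assuming it is only
used within blk-lib.c (otherwise a prototype in block/blk.h would be
the alternative). Sketch:

-int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
+static int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
 				sector_t sector, unsigned int op, gfp_t gfp_mask)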


vim +/blk_submit_rw_buf +276 block/blk-lib.c

   275	
 > 276	int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
   277					sector_t sector, unsigned int op, gfp_t gfp_mask)
   278	{
   279		struct request_queue *q = bdev_get_queue(bdev);
   280		struct bio *bio, *parent = NULL;
   281		sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
   282				queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT;
   283		sector_t len, remaining;
   284		int ret;
   285	
   286		for (remaining = buf_len; remaining > 0; remaining -= len) {
   287			len = min_t(int, max_hw_len, remaining);
   288	retry:
   289			bio = bio_map_kern(q, buf, len, gfp_mask);
   290			if (IS_ERR(bio)) {
   291				len >>= 1;
   292				if (len)
   293					goto retry;
   294				return PTR_ERR(bio);
   295			}
   296	
   297			bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
   298			bio->bi_opf = op;
   299			bio_set_dev(bio, bdev);
   300			bio->bi_end_io = NULL;
   301			bio->bi_private = NULL;
   302	
   303			if (parent) {
   304				bio_chain(parent, bio);
   305				submit_bio(parent);
   306			}
   307			parent = bio;
   308			sector += len;
   309			buf = (char *) buf + len;
   310		}
   311		ret = submit_bio_wait(bio);
   312		bio_put(bio);
   313	
   314		return ret;
   315	}
   316	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 10/10] fs: add support for copy file range in zonefs
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  1:42         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  1:42 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Arnav Dawn, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> From: Arnav Dawn <arnav.dawn@samsung.com>
> 
> copy_file_range is implemented using copy offload,
> copy offloading to device is always enabled.
> To disable copy offloading mount with "no_copy_offload" mount option.
> At present copy offload is only used, if the source and destination files
> are on same block device, otherwise copy file range is completed by
> generic copy file range.

Why not integrate copy offload inside generic_copy_file_range() ?

> 
> copy file range implemented as following:
> 	- write pending writes on the src and dest files
> 	- drop page cache for dest file if its conv zone
> 	- copy the range using offload
> 	- update dest file info
> 
> For all failure cases we fallback to generic file copy range
> At present this implementation does not support conv aggregation
> 
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  fs/zonefs/super.c  | 178 ++++++++++++++++++++++++++++++++++++++++++++-
>  fs/zonefs/zonefs.h |   1 +
>  2 files changed, 178 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index b3b0b71fdf6c..60563b592bf2 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -901,6 +901,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
>  	else
>  		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
>  				   &zonefs_write_dio_ops, 0, 0);
> +

Unnecessary white line change.

>  	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
>  	    (ret > 0 || ret == -EIOCBQUEUED)) {
>  		if (ret > 0)
> @@ -1189,6 +1190,171 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static int zonefs_is_file_size_ok(struct inode *src_inode, struct inode *dst_inode,
> +			   loff_t src_off, loff_t dst_off, size_t len)

This function is badly named. It is not checking if the size of the files is
OK, it is checking if the copy offsets are OK.

> +{
> +	loff_t size, endoff;
> +
> +	size = i_size_read(src_inode);
> +	/* Don't copy beyond source file EOF. */
> +	if (src_off + len > size) {
> +		zonefs_err(src_inode->i_sb, "Copy beyond EOF (%llu + %zu > %llu)\n",
> +		     src_off, len, size);
> +		return -EOPNOTSUPP;

Reading beyond EOF returns 0, not an error, for any FS, including zonefs.
So why return an error here ?
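Something like the below (rough, untested sketch; assumes the clamped length
can be passed back to the caller) would match the usual read semantics and
shorten the copy instead of failing it:

	size = i_size_read(src_inode);
	if (src_off >= size)
		return 0;
	/* shorten the copy instead of erroring out */
	len = min_t(size_t, len, size - src_off);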

> +	}
> +
> +	endoff = dst_off + len;
> +	if (inode_newsize_ok(dst_inode, endoff))
> +		return -EOPNOTSUPP;

This is not EOPNOTSUPP. This is EINVAL since the user is asking for
something that we know will fail due to the unaligned destination.

Furthermore, this code does not consider the zone type for the file. Since
the dest file could be an aggregated conventional zone file which is
larger than a sequential zone file, this is not using the right limit.
This must be checked against i_max_size of struct zonefs_inode_info.

Note that inode_newsize_ok() must be called with inode->i_mutex held but
you never took that lock.

Also, the dest file could be a conventional zone file used for swap. And
you are not checking that. You have plenty of other checks missing. See
generic_copy_file_checks(). Calling that function should be fine for
zonefs too.
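E.g. the destination side check could simply be something like (sketch, using
the limit from struct zonefs_inode_info):

	/* The destination must fit within the zone file maximum size */
	if (dst_off + len > dst_zi->i_max_size)
		return -EINVAL;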

> +
> +
> +	return 0;
> +}
> +static ssize_t __zonefs_send_copy(struct zonefs_inode_info *src_zi, loff_t src_off,
> +				struct zonefs_inode_info *dst_zi, loff_t dst_off, size_t len)

Please rename this to zonefs_issue_copy().

> +{
> +	struct block_device *src_bdev = src_zi->i_vnode.i_sb->s_bdev;
> +	struct block_device *dst_bdev = dst_zi->i_vnode.i_sb->s_bdev;
> +	struct range_entry *rlist;
> +	int ret = -EIO;

Initializing ret is not needed.

> +
> +	rlist = kmalloc(sizeof(*rlist), GFP_KERNEL);

No NULL check ?

> +	rlist[0].dst = (dst_zi->i_zsector << SECTOR_SHIFT) + dst_off;
> +	rlist[0].src = (src_zi->i_zsector << SECTOR_SHIFT) + src_off;
> +	rlist[0].len = len;
> +	rlist[0].comp_len = 0;
> +	ret = blkdev_issue_copy(src_bdev, 1, rlist, dst_bdev, GFP_KERNEL);
> +	if (ret) {
> +		if (rlist[0].comp_len != len) {

Pack this condition with the previous if using &&.

> +			ret = rlist[0].comp_len;
> +			kfree(rlist);
> +			return ret;

These 2 lines are not needed, the same is done below.

> +		}
> +	}
> +	kfree(rlist);
> +	return len;

And how do you handle this failing ? Where is zonefs_io_error() called ?
Without a call to that function, there is no way to guarantee that the
destination file state is still in sync with the zone state. This can fail
for all sorts of reasons (e.g. zone went offline), and that needs to be
checked.
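Roughly (untested sketch, assuming zonefs_io_error() is usable here and that
i_truncate_mutex is not held across the copy, see below), the failure path
needs to look more like:

	ret = blkdev_issue_copy(src_bdev, 1, rlist, dst_bdev, GFP_KERNEL);
	if (ret) {
		/* resync the inode size/wp with the actual zone state */
		zonefs_io_error(&dst_zi->i_vnode, true);
		kfree(rlist);
		return ret;
	}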

> +}
> +static ssize_t __zonefs_copy_file_range(struct file *src_file, loff_t src_off,
> +				      struct file *dst_file, loff_t dst_off,
> +				      size_t len, unsigned int flags)
> +{
> +	struct inode *src_inode = file_inode(src_file);
> +	struct inode *dst_inode = file_inode(dst_file);
> +	struct zonefs_inode_info *src_zi = ZONEFS_I(src_inode);
> +	struct zonefs_inode_info *dst_zi = ZONEFS_I(dst_inode);
> +	struct block_device *src_bdev = src_inode->i_sb->s_bdev;
> +	struct block_device *dst_bdev = dst_inode->i_sb->s_bdev;
> +	struct super_block *src_sb = src_inode->i_sb;
> +	struct zonefs_sb_info *src_sbi = ZONEFS_SB(src_sb);
> +	struct super_block *dst_sb = dst_inode->i_sb;
> +	struct zonefs_sb_info *dst_sbi = ZONEFS_SB(dst_sb);
> +	ssize_t ret = -EIO, bytes;
> +
> +	if (src_bdev != dst_bdev) {
> +		zonefs_err(src_sb, "Copying files across two devices\n");
> +			return -EXDEV;

Weird indentation. And the error message is not needed.
The test can also be simplified to

if (src_inode->i_sb != dst_inode->i_sb)

> +	}
> +
> +	/*
> +	 * Some of the checks below will return -EOPNOTSUPP,
> +	 * which will force a generic copy
> +	 */
> +
> +	if (!(src_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
> +		|| !(dst_sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE))
> +		return -EOPNOTSUPP;

I do not see the point of having this option. See below.

> +
> +	/* Start by sync'ing the source and destination files ifor conv zones */
> +	if (src_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
> +		ret = file_write_and_wait_range(src_file, src_off, (src_off + len));
> +		if (ret < 0) {
> +			zonefs_err(src_sb, "failed to write source file (%zd)\n", ret);
> +			goto out;
> +		}
> +	}
> +	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
> +		ret = file_write_and_wait_range(dst_file, dst_off, (dst_off + len));
> +		if (ret < 0) {
> +			zonefs_err(dst_sb, "failed to write destination file (%zd)\n", ret);
> +			goto out;
> +		}
> +	}

And what about inode_dio_wait() for sequential dst file ? Not needed ?

> +	mutex_lock(&dst_zi->i_truncate_mutex);
> +	if (len > dst_zi->i_max_size - dst_zi->i_wpoffset) {
> +		/* Adjust length */
> +		len -= dst_zi->i_max_size - dst_zi->i_wpoffset;
> +		if (len <= 0) {
> +			mutex_unlock(&dst_zi->i_truncate_mutex);
> +			return -EOPNOTSUPP;

If len is 0, return 0.

> +		}
> +	}
> +	if (dst_off != dst_zi->i_wpoffset) {
> +		mutex_unlock(&dst_zi->i_truncate_mutex);
> +		return -EOPNOTSUPP; /* copy not at zone write ptr */

This must be an EINVAL. See zonefs_file_dio_write().
The condition is also invalid since the file can be a conventional zone
file which allows to write anywhere. Did you really test this code properly ?
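Something like this instead (sketch):

	/* Sequential zone files can only be written at the write pointer */
	if (dst_zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
	    dst_off != dst_zi->i_wpoffset) {
		mutex_unlock(&dst_zi->i_truncate_mutex);
		return -EINVAL;
	}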

> +	}
> +	mutex_lock(&src_zi->i_truncate_mutex);
> +	ret = zonefs_is_file_size_ok(src_inode, dst_inode, src_off, dst_off, len);
> +	if (ret < 0) {
> +		mutex_unlock(&src_zi->i_truncate_mutex);
> +		mutex_unlock(&dst_zi->i_truncate_mutex);
> +		goto out;
> +	}
> +	mutex_unlock(&src_zi->i_truncate_mutex);
> +
> +	/* Drop dst file cached pages for a conv zone*/
> +	if (dst_zi->i_ztype == ZONEFS_ZTYPE_CNV) {
> +		ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
> +						    dst_off >> PAGE_SHIFT,
> +						    (dst_off + len) >> PAGE_SHIFT);
> +		if (ret < 0) {
> +			zonefs_err(dst_sb, "Failed to invalidate inode pages (%zd)\n", ret);
> +			ret = 0;

And you still go on ? That will corrupt data. No way. This error must be
returned to fail the copy.

> +		}
> +	}
> +	bytes = __zonefs_send_copy(src_zi, src_off, dst_zi, dst_off, len);
> +	ret += bytes;

You cannot hold i_truncate_mutex while doing IOs because if there is an
error, there will be a deadlock. Also, __zonefs_send_copy() can return an
error and that is not checked.

> +
> +	file_update_time(dst_file);
> +	zonefs_update_stats(dst_inode, dst_off + bytes);
> +	zonefs_i_size_write(dst_inode, dst_off + bytes);
> +	dst_zi->i_wpoffset += bytes;
> +	mutex_unlock(&dst_zi->i_truncate_mutex);
> +
> +
> +

2 extra unneeded blank lines.

> +	/*
> +	 * if we still have some bytes left, do splice copy
> +	 */

This comment fits on a single line.

> +	if (bytes && (bytes < len)) {
> +		zonefs_info(src_sb, "Final partial copy of %zu bytes\n", len);
> +		bytes = do_splice_direct(src_file, &src_off, dst_file,
> +					 &dst_off, len, flags);

And this can fail because other writes may be coming in since you never
locked inode->i_mutex.

> +		if (bytes > 0)
> +			ret += bytes;
> +		else
> +			zonefs_info(src_sb, "Failed partial copy (%zd)\n", bytes);

Error message not needed.

> +	}
> +
> +out:
> +
> +	return ret;
> +}
> +
> +static ssize_t zonefs_copy_file_range(struct file *src_file, loff_t src_off,
> +				    struct file *dst_file, loff_t dst_off,
> +				    size_t len, unsigned int flags)
> +{
> +	ssize_t ret;
> +
> +	ret = __zonefs_copy_file_range(src_file, src_off, dst_file, dst_off,
> +				     len, flags);
> +

Useless blank line. __zonefs_copy_file_range() needs to be split into
zonefs_copy_file_checks() and zonefs_copy_file(). The below call to
generic_copy_file_range() should go into zonefs_copy_file().
zonefs_copy_file() calling either generic_copy_file_range() if the device
does not have copy offload or zonefs_issue_copy() if it does.
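That is, something along these lines (rough sketch; the helper names are the
ones suggested above, and passing len by reference so that the checks can
clamp it is an assumption):

	static ssize_t zonefs_copy_file_range(struct file *src_file, loff_t src_off,
					      struct file *dst_file, loff_t dst_off,
					      size_t len, unsigned int flags)
	{
		ssize_t ret;

		/* validate offsets, zone types, sizes, mount options, ... */
		ret = zonefs_copy_file_checks(src_file, src_off, dst_file, dst_off,
					      &len, flags);
		if (ret)
			return ret;

		/* offload if available, generic_copy_file_range() otherwise */
		return zonefs_copy_file(src_file, src_off, dst_file, dst_off,
					len, flags);
	}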

> +	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> +		ret = generic_copy_file_range(src_file, src_off, dst_file,
> +					      dst_off, len, flags);

This function is not taking the inode_lock() for the source and dest files.
This means that this can run with concurent regular writes and truncate
and that potentially can result in some very weird results, unaligned
write errors and the FS going read-only.
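E.g. (sketch, using the zonefs_copy_file() split suggested above):

	lock_two_nondirectories(src_inode, dst_inode);
	ret = zonefs_copy_file(src_file, src_off, dst_file, dst_off, len, flags);
	unlock_two_nondirectories(src_inode, dst_inode);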


> +	return ret;
> +}
> +
>  static const struct file_operations zonefs_file_operations = {
>  	.open		= zonefs_file_open,
>  	.release	= zonefs_file_release,
> @@ -1200,6 +1366,7 @@ static const struct file_operations zonefs_file_operations = {
>  	.splice_read	= generic_file_splice_read,
>  	.splice_write	= iter_file_splice_write,
>  	.iopoll		= iocb_bio_iopoll,
> +	.copy_file_range = zonefs_copy_file_range,
>  };
>  
>  static struct kmem_cache *zonefs_inode_cachep;
> @@ -1262,7 +1429,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  
>  enum {
>  	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
> -	Opt_explicit_open, Opt_err,
> +	Opt_explicit_open, Opt_no_copy_offload, Opt_err,

This mount option does not make much sense. Copy file range was not
supported until now. Existing applications are thus not using it. Adding
support for copy_file_range op will not break these applications so it can
be unconditionally supported.


>  };
>  
>  static const match_table_t tokens = {
> @@ -1271,6 +1438,7 @@ static const match_table_t tokens = {
>  	{ Opt_errors_zol,	"errors=zone-offline"},
>  	{ Opt_errors_repair,	"errors=repair"},
>  	{ Opt_explicit_open,	"explicit-open" },
> +	{ Opt_no_copy_offload,	"no_copy_offload" },
>  	{ Opt_err,		NULL}
>  };
>  
> @@ -1280,6 +1448,7 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
>  	substring_t args[MAX_OPT_ARGS];
>  	char *p;
>  
> +	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
>  	if (!options)
>  		return 0;
>  
> @@ -1310,6 +1479,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
>  		case Opt_explicit_open:
>  			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
>  			break;
> +		case Opt_no_copy_offload:
> +			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_COPY_FILE;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> @@ -1330,6 +1502,8 @@ static int zonefs_show_options(struct seq_file *seq, struct dentry *root)
>  		seq_puts(seq, ",errors=zone-offline");
>  	if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_REPAIR)
>  		seq_puts(seq, ",errors=repair");
> +	if (sbi->s_mount_opts & ZONEFS_MNTOPT_COPY_FILE)
> +		seq_puts(seq, ",copy_offload");
>  
>  	return 0;
>  }
> @@ -1769,6 +1943,8 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
>  	atomic_set(&sbi->s_active_seq_files, 0);
>  	sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev);
>  
> +	/* set copy support by default */
> +	sbi->s_mount_opts |= ZONEFS_MNTOPT_COPY_FILE;
>  	ret = zonefs_read_super(sb);
>  	if (ret)
>  		return ret;
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 4b3de66c3233..efa6632c4b6a 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -162,6 +162,7 @@ enum zonefs_features {
>  	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
>  	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
>  #define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
> +#define ZONEFS_MNTOPT_COPY_FILE		(1 << 5) /* enable copy file range offload to kernel */
>  
>  /*
>   * In-memory Super block information.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-26 10:12   ` Nitesh Shetty
@ 2022-04-27  1:46     ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  1:46 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart,
	Chaitanya Kulkarni, Naohiro Aota, Johannes Thumshirn,
	Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.
> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
>     block and file backend)
> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
>     two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> - copy_file_range for zonefs
> 
> 4. In-kernel user
> - dm-kcopyd
> - copy_file_range in zonefs
> 
> For zonefs copy_file_range - Seems we cannot levearge fstest here. Limited
> testing is done at this point using a custom application for unit testing.

https://github.com/westerndigitalcorporation/zonefs-tools

./configure --with-tests
make
sudo make install

Then run tests/zonefs-tests.sh

Adding a test case is simple. Just add script files under tests/scripts.

I just realized that the README file of this project is not documenting
this. I will update it.

> 
> Appreciate the inputs on plumbing and how to test this further?
> Perhaps some of it can be discussed during LSF/MM too.
> 
> [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
> 
> Changes in v4:
> - added copy_file_range support for zonefs
> - added documentaion about new sysfs entries
> - incorporated review comments on v3
> - minor fixes
> 
> 
> Arnav Dawn (2):
>   nvmet: add copy command support for bdev and file ns
>   fs: add support for copy file range in zonefs
> 
> Nitesh Shetty (7):
>   block: Introduce queue limits for copy-offload support
>   block: Add copy offload support infrastructure
>   block: Introduce a new ioctl for copy
>   block: add emulation for copy
>   nvme: add copy offload support
>   dm: Add support for copy offload.
>   dm: Enable copy offload for dm-linear target
> 
> SelvaKumar S (1):
>   dm kcopyd: use copy offload support
> 
>  Documentation/ABI/stable/sysfs-block |  83 +++++++
>  block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
>  block/blk-map.c                      |   2 +-
>  block/blk-settings.c                 |  59 +++++
>  block/blk-sysfs.c                    | 138 +++++++++++
>  block/blk.h                          |   2 +
>  block/ioctl.c                        |  32 +++
>  drivers/md/dm-kcopyd.c               |  55 +++-
>  drivers/md/dm-linear.c               |   1 +
>  drivers/md/dm-table.c                |  45 ++++
>  drivers/md/dm.c                      |   6 +
>  drivers/nvme/host/core.c             | 116 ++++++++-
>  drivers/nvme/host/fc.c               |   4 +
>  drivers/nvme/host/nvme.h             |   7 +
>  drivers/nvme/host/pci.c              |  25 ++
>  drivers/nvme/host/rdma.c             |   6 +
>  drivers/nvme/host/tcp.c              |  14 ++
>  drivers/nvme/host/trace.c            |  19 ++
>  drivers/nvme/target/admin-cmd.c      |   8 +-
>  drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
>  drivers/nvme/target/io-cmd-file.c    |  49 ++++
>  fs/zonefs/super.c                    | 178 ++++++++++++-
>  fs/zonefs/zonefs.h                   |   1 +
>  include/linux/blk_types.h            |  21 ++
>  include/linux/blkdev.h               |  17 ++
>  include/linux/device-mapper.h        |   5 +
>  include/linux/nvme.h                 |  43 +++-
>  include/uapi/linux/fs.h              |  23 ++
>  28 files changed, 1367 insertions(+), 15 deletions(-)
> 
> 
> base-commit: e7d6987e09a328d4a949701db40ef63fbb970670


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  1:59         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  1:59 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Kanchan Joshi, Arnav Dawn, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>         - copy_offload (RW)
>         - copy_max_bytes (RW)
>         - copy_max_hw_bytes (RO)
>         - copy_max_range_bytes (RW)
>         - copy_max_range_hw_bytes (RO)
>         - copy_max_nr_ranges (RW)
>         - copy_max_nr_ranges_hw (RO)
> 
> Above limits help to split the copy payload in block layer.
> copy_offload, used for setting copy offload(1) or emulation(0).
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_range_bytes: maximum length in a single entry.
> copy_max_nr_ranges: maximum number of entries in a payload.
> copy_max_*_hw_*: Reflects the device supported maximum limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
>  block/blk-settings.c                 |  59 ++++++++++++
>  block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
>  include/linux/blkdev.h               |  13 +++
>  4 files changed, 293 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index e8797cd09aff..65e64b5a0105 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -155,6 +155,89 @@ Description:
>  		last zone of the device which may be smaller.
>  
>  
> +What:		/sys/block/<disk>/queue/copy_offload
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] When read, this file shows whether offloading copy to
> +		device is enabled (1) or disabled (0). Writing '0' to this
> +		file will disable offloading copies for this device.
> +		Writing any '1' value will enable this feature.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
> +		device, 'copy_max_bytes' setting is the software limit.
> +		Setting this value lower will make Linux issue smaller size
> +		copies.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_hw_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the number of bytes that can be offloaded
> +		in a single operation. The `copy_max_hw_bytes`
> +		parameter is set by the device driver to the maximum number of
> +		bytes that can be copied in a single operation. Copy
> +		requests issued to the device must not exceed this limit.
> +		A value of 0 means that the device does not
> +		support copy offload.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_nr_ranges
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
> +		device, 'copy_max_nr_ranges' setting is the software limit.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the number of ranges in single copy operation
> +		that can be offloaded in a single operation.
> +		A range is tuple of source, destination and length of data
> +		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
> +		the device driver to the maximum number of ranges that can be
> +		copied in a single operation. Copy requests issued to the device
> +		must not exceed this limit. A value of 0 means that the device
> +		does not support copy offload.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_range_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
> +		the device, 'copy_max_range_bytes' setting is the software
> +		limit.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the size of data, that can be copied in a
> +		single range within a single copy operation.
> +		A range is tuple of source, destination and length of data to be
> +		copied. The `copy_max_range_hw_bytes` parameter is set by the
> +		device driver to set the maximum length in bytes of a range
> +		that can be copied in an operation.
> +		Copy requests issued to the device must not exceed this limit.
> +		Sum of sizes of all ranges in a single opeartion should not
> +		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
> +		does not support copy offload.
> +
> +
>  What:		/sys/block/<disk>/queue/crypto/
>  Date:		February 2022
>  Contact:	linux-block@vger.kernel.org
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 6ccceb421ed2..70167aee3bf7 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
> +	lim->max_hw_copy_sectors = 0;

For readability, I would keep "hw" next to sectors/nr_ranges:

max_copy_hw_sectors
max_copy_sectors
max_copy_hw_nr_ranges
max_copy_nr_ranges
max_copy_range_hw_sectors
max_copy_range_sectors

> +	lim->max_copy_sectors = 0;
> +	lim->max_hw_copy_nr_ranges = 0;
> +	lim->max_copy_nr_ranges = 0;
> +	lim->max_hw_copy_range_sectors = 0;
> +	lim->max_copy_range_sectors = 0;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> @@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->max_dev_sectors = UINT_MAX;
>  	lim->max_write_zeroes_sectors = UINT_MAX;
>  	lim->max_zone_append_sectors = UINT_MAX;
> +	lim->max_hw_copy_sectors = ULONG_MAX;
> +	lim->max_copy_sectors = ULONG_MAX;
> +	lim->max_hw_copy_range_sectors = UINT_MAX;
> +	lim->max_copy_range_sectors = UINT_MAX;
> +	lim->max_hw_copy_nr_ranges = USHRT_MAX;
> +	lim->max_copy_nr_ranges = USHRT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>  
> @@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>  
> +/**
> + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
> + * @q:  the request queue for the device
> + * @max_copy_sectors: maximum number of sectors to copy
> + **/
> +void blk_queue_max_copy_sectors(struct request_queue *q,

This should be blk_queue_max_copy_hw_sectors().
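I.e. something like (sketch, with the queue_limits fields renamed as suggested
above):

	void blk_queue_max_copy_hw_sectors(struct request_queue *q,
					   unsigned int max_copy_sectors)
	{
		q->limits.max_copy_hw_sectors = max_copy_sectors;
		q->limits.max_copy_sectors = max_copy_sectors;
	}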

> +		unsigned int max_copy_sectors)
> +{
> +	q->limits.max_hw_copy_sectors = max_copy_sectors;
> +	q->limits.max_copy_sectors = max_copy_sectors;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> +
> +/**
> + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
> + * @q:  the request queue for the device
> + * @max_copy_range_sectors: maximum number of sectors to copy in a single range
> + **/
> +void blk_queue_max_copy_range_sectors(struct request_queue *q,

And this should be blk_queue_max_copy_range_hw_sectors(). Etc for the
other ones below.

> +		unsigned int max_copy_range_sectors)
> +{
> +	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
> +	q->limits.max_copy_range_sectors = max_copy_range_sectors;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
> +
> +/**
> + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
> + * @q:  the request queue for the device
> + * @max_copy_nr_ranges: maximum number of ranges
> + **/
> +void blk_queue_max_copy_nr_ranges(struct request_queue *q,
> +		unsigned int max_copy_nr_ranges)
> +{
> +	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
> +	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
> +
>  /**
>   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
>   * @q:  the request queue for the device
> @@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
>  
> +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
> +	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
> +	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
> +	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
> +						b->max_hw_copy_range_sectors);
> +	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
> +	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
> +
>  	t->misaligned |= b->misaligned;
>  
>  	alignment = queue_limit_alignment_offset(b, start);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 88bd41d4cb59..bae987c10f7f 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
>  	return queue_var_show(0, page);
>  }
>  
> +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(blk_queue_copy(q), page);
> +}
> +
> +static ssize_t queue_copy_offload_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long copy_offload;
> +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (copy_offload && !q->limits.max_hw_copy_sectors)
> +		return -EINVAL;
> +
> +	if (copy_offload)
> +		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
> +
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_copy_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_copy;
> +	ssize_t ret = queue_var_store(&max_copy, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;
> +
> +	max_copy >>= 9;
> +	if (max_copy > q->limits.max_hw_copy_sectors)
> +		max_copy = q->limits.max_hw_copy_sectors;
> +
> +	q->limits.max_copy_sectors = max_copy;
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_range_max_show(struct request_queue *q,
> +		char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_copy_range_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_range_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_copy;
> +	ssize_t ret = queue_var_store(&max_copy, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;
> +
> +	max_copy >>= 9;
> +	if (max_copy > UINT_MAX)

On 32-bit arches, unsigned long and unsigned int are the same size, so this
test is useless there. Better to declare max_copy as unsigned long long.

> +		return -EINVAL;
> +
> +	if (max_copy > q->limits.max_hw_copy_range_sectors)
> +		max_copy = q->limits.max_hw_copy_range_sectors;
> +
> +	q->limits.max_copy_range_sectors = max_copy;
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.max_hw_copy_nr_ranges, page);
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_nr_ranges, page);
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_nr;
> +	ssize_t ret = queue_var_store(&max_nr, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_nr > USHRT_MAX)
> +		return -EINVAL;
> +
> +	if (max_nr > q->limits.max_hw_copy_nr_ranges)
> +		max_nr = q->limits.max_hw_copy_nr_ranges;
> +
> +	q->limits.max_copy_nr_ranges = max_nr;
> +	return ret;
> +}
> +
>  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(0, page);
> @@ -596,6 +719,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
>  
> +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
> +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes");
> +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
> +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes");
> +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes");
> +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw");
> +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges");
> +
>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> @@ -642,6 +773,13 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_copy_offload_entry.attr,
> +	&queue_copy_max_hw_entry.attr,
> +	&queue_copy_max_entry.attr,
> +	&queue_copy_range_max_hw_entry.attr,
> +	&queue_copy_range_max_entry.attr,
> +	&queue_copy_nr_ranges_max_hw_entry.attr,
> +	&queue_copy_nr_ranges_max_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1b24c1fb3bb1..3596fd37fae7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -270,6 +270,13 @@ struct queue_limits {
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
>  
> +	unsigned long		max_hw_copy_sectors;
> +	unsigned long		max_copy_sectors;
> +	unsigned int		max_hw_copy_range_sectors;
> +	unsigned int		max_copy_range_sectors;
> +	unsigned short		max_hw_copy_nr_ranges;
> +	unsigned short		max_copy_nr_ranges;
> +
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
>  	unsigned short		max_discard_segments;
> @@ -574,6 +581,7 @@ struct request_queue {
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
>  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> +#define QUEUE_FLAG_COPY		30	/* supports copy offload */
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> @@ -596,6 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
>  #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
> +#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)	\
>  	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> @@ -960,6 +969,10 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
>  extern void blk_queue_max_segments(struct request_queue *, unsigned short);
>  extern void blk_queue_max_discard_segments(struct request_queue *,
>  		unsigned short);
> +extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors);
> +extern void blk_queue_max_copy_range_sectors(struct request_queue *q,
> +		unsigned int max_copy_range_sectors);
> +extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges);
>  void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>  		unsigned int max_sectors);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-27  1:59         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  1:59 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, Kanchan Joshi, martin.petersen, linux-kernel, Arnav Dawn,
	jack, linux-fsdevel, lsf-pc, Alexander Viro

On 4/26/22 19:12, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>         - copy_offload (RW)
>         - copy_max_bytes (RW)
>         - copy_max_hw_bytes (RO)
>         - copy_max_range_bytes (RW)
>         - copy_max_range_hw_bytes (RO)
>         - copy_max_nr_ranges (RW)
>         - copy_max_nr_ranges_hw (RO)
> 
> Above limits help to split the copy payload in block layer.
> copy_offload, used for setting copy offload(1) or emulation(0).
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_range_bytes: maximum length in a single entry.
> copy_max_nr_ranges: maximum number of entries in a payload.
> copy_max_*_hw_*: Reflects the device supported maximum limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
>  block/blk-settings.c                 |  59 ++++++++++++
>  block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
>  include/linux/blkdev.h               |  13 +++
>  4 files changed, 293 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index e8797cd09aff..65e64b5a0105 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -155,6 +155,89 @@ Description:
>  		last zone of the device which may be smaller.
>  
>  
> +What:		/sys/block/<disk>/queue/copy_offload
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] When read, this file shows whether offloading copy to
> +		device is enabled (1) or disabled (0). Writing '0' to this
> +		file will disable offloading copies for this device.
> +		Writing any '1' value will enable this feature.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
> +		device, 'copy_max_bytes' setting is the software limit.
> +		Setting this value lower will make Linux issue smaller size
> +		copies.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_hw_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the number of bytes that can be offloaded
> +		in a single operation. The `copy_max_hw_bytes`
> +		parameter is set by the device driver to the maximum number of
> +		bytes that can be copied in a single operation. Copy
> +		requests issued to the device must not exceed this limit.
> +		A value of 0 means that the device does not
> +		support copy offload.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_nr_ranges
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
> +		device, 'copy_max_nr_ranges' setting is the software limit.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the number of ranges that can be offloaded
> +		in a single copy operation.
> +		A range is a tuple of source, destination and length of data
> +		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
> +		the device driver to the maximum number of ranges that can be
> +		copied in a single operation. Copy requests issued to the device
> +		must not exceed this limit. A value of 0 means that the device
> +		does not support copy offload.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_range_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
> +		the device, 'copy_max_range_bytes' setting is the software
> +		limit.
> +
> +
> +What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
> +Date:		April 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Devices that support offloading copy functionality may have
> +		internal limits on the size of data that can be copied in a
> +		single range within a single copy operation.
> +		A range is a tuple of source, destination and length of data to be
> +		copied. The `copy_max_range_hw_bytes` parameter is set by the
> +		device driver to the maximum length in bytes of a range
> +		that can be copied in an operation.
> +		Copy requests issued to the device must not exceed this limit.
> +		Sum of sizes of all ranges in a single operation should not
> +		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
> +		does not support copy offload.
> +
> +
>  What:		/sys/block/<disk>/queue/crypto/
>  Date:		February 2022
>  Contact:	linux-block@vger.kernel.org
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 6ccceb421ed2..70167aee3bf7 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
> +	lim->max_hw_copy_sectors = 0;

For readability, I would keep "hw" next to sectors/nr_ranges:

max_copy_hw_sectors
max_copy_sectors
max_copy_hw_nr_ranges
max_copy_nr_ranges
max_copy_range_hw_sectors
max_copy_range_sectors

> +	lim->max_copy_sectors = 0;
> +	lim->max_hw_copy_nr_ranges = 0;
> +	lim->max_copy_nr_ranges = 0;
> +	lim->max_hw_copy_range_sectors = 0;
> +	lim->max_copy_range_sectors = 0;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> @@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->max_dev_sectors = UINT_MAX;
>  	lim->max_write_zeroes_sectors = UINT_MAX;
>  	lim->max_zone_append_sectors = UINT_MAX;
> +	lim->max_hw_copy_sectors = ULONG_MAX;
> +	lim->max_copy_sectors = ULONG_MAX;
> +	lim->max_hw_copy_range_sectors = UINT_MAX;
> +	lim->max_copy_range_sectors = UINT_MAX;
> +	lim->max_hw_copy_nr_ranges = USHRT_MAX;
> +	lim->max_copy_nr_ranges = USHRT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>  
> @@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>  
> +/**
> + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
> + * @q:  the request queue for the device
> + * @max_copy_sectors: maximum number of sectors to copy
> + **/
> +void blk_queue_max_copy_sectors(struct request_queue *q,

This should be blk_queue_max_copy_hw_sectors().

> +		unsigned int max_copy_sectors)
> +{
> +	q->limits.max_hw_copy_sectors = max_copy_sectors;
> +	q->limits.max_copy_sectors = max_copy_sectors;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> +
> +/**
> + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
> + * @q:  the request queue for the device
> + * @max_copy_range_sectors: maximum number of sectors to copy in a single range
> + **/
> +void blk_queue_max_copy_range_sectors(struct request_queue *q,

And this should be blk_queue_max_copy_range_hw_sectors(). Etc for the
other ones below, e.g. as sketched next.
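
Applied to the first of these helpers, that rename would look like this
(sketch only, assuming the queue_limits fields are also renamed as suggested
above; not a tested change):

void blk_queue_max_copy_hw_sectors(struct request_queue *q,
				   unsigned int max_copy_sectors)
{
	/* driver-provided hardware limit, also used as the initial soft limit */
	q->limits.max_copy_hw_sectors = max_copy_sectors;
	q->limits.max_copy_sectors = max_copy_sectors;
}
EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);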

> +		unsigned int max_copy_range_sectors)
> +{
> +	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
> +	q->limits.max_copy_range_sectors = max_copy_range_sectors;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
> +
> +/**
> + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
> + * @q:  the request queue for the device
> + * @max_copy_nr_ranges: maximum number of ranges
> + **/
> +void blk_queue_max_copy_nr_ranges(struct request_queue *q,
> +		unsigned int max_copy_nr_ranges)
> +{
> +	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
> +	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
> +
>  /**
>   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
>   * @q:  the request queue for the device
> @@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
>  
> +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
> +	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
> +	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
> +	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
> +						b->max_hw_copy_range_sectors);
> +	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
> +	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
> +
>  	t->misaligned |= b->misaligned;
>  
>  	alignment = queue_limit_alignment_offset(b, start);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 88bd41d4cb59..bae987c10f7f 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
>  	return queue_var_show(0, page);
>  }
>  
> +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(blk_queue_copy(q), page);
> +}
> +
> +static ssize_t queue_copy_offload_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long copy_offload;
> +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (copy_offload && !q->limits.max_hw_copy_sectors)
> +		return -EINVAL;
> +
> +	if (copy_offload)
> +		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
> +
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_copy_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_copy;
> +	ssize_t ret = queue_var_store(&max_copy, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;
> +
> +	max_copy >>= 9;
> +	if (max_copy > q->limits.max_hw_copy_sectors)
> +		max_copy = q->limits.max_hw_copy_sectors;
> +
> +	q->limits.max_copy_sectors = max_copy;
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_range_max_show(struct request_queue *q,
> +		char *page)
> +{
> +	return sprintf(page, "%llu\n",
> +		(unsigned long long)q->limits.max_copy_range_sectors << 9);
> +}
> +
> +static ssize_t queue_copy_range_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_copy;
> +	ssize_t ret = queue_var_store(&max_copy, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_copy & (queue_logical_block_size(q) - 1))
> +		return -EINVAL;
> +
> +	max_copy >>= 9;
> +	if (max_copy > UINT_MAX)

On 32-bit arches, unsigned long and unsigned int are the same size, so this
test is useless there. Better have max_copy declared as unsigned long long.
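
One way to make the check meaningful on 32-bit as well (sketch only, parsing
with kstrtoull() instead of queue_var_store()):

	unsigned long long max_copy;
	int err = kstrtoull(page, 10, &max_copy);

	if (err)
		return err;
	if (max_copy & (queue_logical_block_size(q) - 1))
		return -EINVAL;
	max_copy >>= 9;
	if (max_copy > UINT_MAX)
		return -EINVAL;
	if (max_copy > q->limits.max_hw_copy_range_sectors)
		max_copy = q->limits.max_hw_copy_range_sectors;
	q->limits.max_copy_range_sectors = max_copy;
	return count;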

> +		return -EINVAL;
> +
> +	if (max_copy > q->limits.max_hw_copy_range_sectors)
> +		max_copy = q->limits.max_hw_copy_range_sectors;
> +
> +	q->limits.max_copy_range_sectors = max_copy;
> +	return ret;
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_hw_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.max_hw_copy_nr_ranges, page);
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_nr_ranges, page);
> +}
> +
> +static ssize_t queue_copy_nr_ranges_max_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long max_nr;
> +	ssize_t ret = queue_var_store(&max_nr, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (max_nr > USHRT_MAX)
> +		return -EINVAL;
> +
> +	if (max_nr > q->limits.max_hw_copy_nr_ranges)
> +		max_nr = q->limits.max_hw_copy_nr_ranges;
> +
> +	q->limits.max_copy_nr_ranges = max_nr;
> +	return ret;
> +}
> +
>  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(0, page);
> @@ -596,6 +719,14 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
>  
> +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
> +QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_hw_bytes");
> +QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
> +QUEUE_RO_ENTRY(queue_copy_range_max_hw, "copy_max_range_hw_bytes");
> +QUEUE_RW_ENTRY(queue_copy_range_max, "copy_max_range_bytes");
> +QUEUE_RO_ENTRY(queue_copy_nr_ranges_max_hw, "copy_max_nr_ranges_hw");
> +QUEUE_RW_ENTRY(queue_copy_nr_ranges_max, "copy_max_nr_ranges");
> +
>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> @@ -642,6 +773,13 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_copy_offload_entry.attr,
> +	&queue_copy_max_hw_entry.attr,
> +	&queue_copy_max_entry.attr,
> +	&queue_copy_range_max_hw_entry.attr,
> +	&queue_copy_range_max_entry.attr,
> +	&queue_copy_nr_ranges_max_hw_entry.attr,
> +	&queue_copy_nr_ranges_max_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1b24c1fb3bb1..3596fd37fae7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -270,6 +270,13 @@ struct queue_limits {
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
>  
> +	unsigned long		max_hw_copy_sectors;
> +	unsigned long		max_copy_sectors;
> +	unsigned int		max_hw_copy_range_sectors;
> +	unsigned int		max_copy_range_sectors;
> +	unsigned short		max_hw_copy_nr_ranges;
> +	unsigned short		max_copy_nr_ranges;
> +
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
>  	unsigned short		max_discard_segments;
> @@ -574,6 +581,7 @@ struct request_queue {
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
>  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> +#define QUEUE_FLAG_COPY		30	/* supports copy offload */
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> @@ -596,6 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
>  #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
> +#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)	\
>  	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> @@ -960,6 +969,10 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
>  extern void blk_queue_max_segments(struct request_queue *, unsigned short);
>  extern void blk_queue_max_discard_segments(struct request_queue *,
>  		unsigned short);
> +extern void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int max_copy_sectors);
> +extern void blk_queue_max_copy_range_sectors(struct request_queue *q,
> +		unsigned int max_copy_range_sectors);
> +extern void blk_queue_max_copy_nr_ranges(struct request_queue *q, unsigned int max_copy_nr_ranges);
>  void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>  		unsigned int max_sectors);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-26 10:12   ` Nitesh Shetty
@ 2022-04-27  2:00     ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:00 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart,
	Chaitanya Kulkarni, Naohiro Aota, Johannes Thumshirn,
	Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.

Please reduce the distribution list. List servers (and email clients) are
complaining about it being too large.

> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
>     block and file backend)
> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
>     two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> - copy_file_range for zonefs
> 
> 4. In-kernel user
> - dm-kcopyd
> - copy_file_range in zonefs
> 
> For zonefs copy_file_range - Seems we cannot leverage fstest here. Limited
> testing is done at this point using a custom application for unit testing.
> 
> Appreciate the inputs on plumbing and how to test this further?
> Perhaps some of it can be discussed during LSF/MM too.
> 
> [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
> 
> Changes in v4:
> - added copy_file_range support for zonefs
> - added documentation about new sysfs entries
> - incorporated review comments on v3
> - minor fixes
> 
> 
> Arnav Dawn (2):
>   nvmet: add copy command support for bdev and file ns
>   fs: add support for copy file range in zonefs
> 
> Nitesh Shetty (7):
>   block: Introduce queue limits for copy-offload support
>   block: Add copy offload support infrastructure
>   block: Introduce a new ioctl for copy
>   block: add emulation for copy
>   nvme: add copy offload support
>   dm: Add support for copy offload.
>   dm: Enable copy offload for dm-linear target
> 
> SelvaKumar S (1):
>   dm kcopyd: use copy offload support
> 
>  Documentation/ABI/stable/sysfs-block |  83 +++++++
>  block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
>  block/blk-map.c                      |   2 +-
>  block/blk-settings.c                 |  59 +++++
>  block/blk-sysfs.c                    | 138 +++++++++++
>  block/blk.h                          |   2 +
>  block/ioctl.c                        |  32 +++
>  drivers/md/dm-kcopyd.c               |  55 +++-
>  drivers/md/dm-linear.c               |   1 +
>  drivers/md/dm-table.c                |  45 ++++
>  drivers/md/dm.c                      |   6 +
>  drivers/nvme/host/core.c             | 116 ++++++++-
>  drivers/nvme/host/fc.c               |   4 +
>  drivers/nvme/host/nvme.h             |   7 +
>  drivers/nvme/host/pci.c              |  25 ++
>  drivers/nvme/host/rdma.c             |   6 +
>  drivers/nvme/host/tcp.c              |  14 ++
>  drivers/nvme/host/trace.c            |  19 ++
>  drivers/nvme/target/admin-cmd.c      |   8 +-
>  drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
>  drivers/nvme/target/io-cmd-file.c    |  49 ++++
>  fs/zonefs/super.c                    | 178 ++++++++++++-
>  fs/zonefs/zonefs.h                   |   1 +
>  include/linux/blk_types.h            |  21 ++
>  include/linux/blkdev.h               |  17 ++
>  include/linux/device-mapper.h        |   5 +
>  include/linux/nvme.h                 |  43 +++-
>  include/uapi/linux/fs.h              |  23 ++
>  28 files changed, 1367 insertions(+), 15 deletions(-)
> 
> 
> base-commit: e7d6987e09a328d4a949701db40ef63fbb970670


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27  2:00     ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:00 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, martin.petersen, linux-kernel, jack, linux-fsdevel,
	lsf-pc, Alexander Viro

On 4/26/22 19:12, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.

Please reduce the distribution list. List servers (and email clients) are
complaining about it being too large.

> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
>     block and file backend)
> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
>     two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> - copy_file_range for zonefs
> 
> 4. In-kernel user
> - dm-kcopyd
> - copy_file_range in zonefs
> 
> For zonefs copy_file_range - Seems we cannot leverage fstest here. Limited
> testing is done at this point using a custom application for unit testing.
> 
> Appreciate the inputs on plumbing and how to test this further?
> Perhaps some of it can be discussed during LSF/MM too.
> 
> [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
> 
> Changes in v4:
> - added copy_file_range support for zonefs
> - added documentation about new sysfs entries
> - incorporated review comments on v3
> - minor fixes
> 
> 
> Arnav Dawn (2):
>   nvmet: add copy command support for bdev and file ns
>   fs: add support for copy file range in zonefs
> 
> Nitesh Shetty (7):
>   block: Introduce queue limits for copy-offload support
>   block: Add copy offload support infrastructure
>   block: Introduce a new ioctl for copy
>   block: add emulation for copy
>   nvme: add copy offload support
>   dm: Add support for copy offload.
>   dm: Enable copy offload for dm-linear target
> 
> SelvaKumar S (1):
>   dm kcopyd: use copy offload support
> 
>  Documentation/ABI/stable/sysfs-block |  83 +++++++
>  block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
>  block/blk-map.c                      |   2 +-
>  block/blk-settings.c                 |  59 +++++
>  block/blk-sysfs.c                    | 138 +++++++++++
>  block/blk.h                          |   2 +
>  block/ioctl.c                        |  32 +++
>  drivers/md/dm-kcopyd.c               |  55 +++-
>  drivers/md/dm-linear.c               |   1 +
>  drivers/md/dm-table.c                |  45 ++++
>  drivers/md/dm.c                      |   6 +
>  drivers/nvme/host/core.c             | 116 ++++++++-
>  drivers/nvme/host/fc.c               |   4 +
>  drivers/nvme/host/nvme.h             |   7 +
>  drivers/nvme/host/pci.c              |  25 ++
>  drivers/nvme/host/rdma.c             |   6 +
>  drivers/nvme/host/tcp.c              |  14 ++
>  drivers/nvme/host/trace.c            |  19 ++
>  drivers/nvme/target/admin-cmd.c      |   8 +-
>  drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
>  drivers/nvme/target/io-cmd-file.c    |  49 ++++
>  fs/zonefs/super.c                    | 178 ++++++++++++-
>  fs/zonefs/zonefs.h                   |   1 +
>  include/linux/blk_types.h            |  21 ++
>  include/linux/blkdev.h               |  17 ++
>  include/linux/device-mapper.h        |   5 +
>  include/linux/nvme.h                 |  43 +++-
>  include/uapi/linux/fs.h              |  23 ++
>  28 files changed, 1367 insertions(+), 15 deletions(-)
> 
> 
> base-commit: e7d6987e09a328d4a949701db40ef63fbb970670


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-26 10:12   ` Nitesh Shetty
@ 2022-04-27  2:19     ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:19 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart,
	Chaitanya Kulkarni, Naohiro Aota, Johannes Thumshirn,
	Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.
> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
>     block and file backend)

It would also be nice to have copy offload emulation in null_blk for testing.

> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
>     two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> - copy_file_range for zonefs
> 
> 4. In-kernel user
> - dm-kcopyd
> - copy_file_range in zonefs
> 
> For zonefs copy_file_range - Seems we cannot leverage fstest here. Limited
> testing is done at this point using a custom application for unit testing.
> 
> Appreciate the inputs on plumbing and how to test this further?
> Perhaps some of it can be discussed during LSF/MM too.
> 
> [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
> 
> Changes in v4:
> - added copy_file_range support for zonefs
> - added documentation about new sysfs entries
> - incorporated review comments on v3
> - minor fixes
> 
> 
> Arnav Dawn (2):
>   nvmet: add copy command support for bdev and file ns
>   fs: add support for copy file range in zonefs
> 
> Nitesh Shetty (7):
>   block: Introduce queue limits for copy-offload support
>   block: Add copy offload support infrastructure
>   block: Introduce a new ioctl for copy
>   block: add emulation for copy
>   nvme: add copy offload support
>   dm: Add support for copy offload.
>   dm: Enable copy offload for dm-linear target
> 
> SelvaKumar S (1):
>   dm kcopyd: use copy offload support
> 
>  Documentation/ABI/stable/sysfs-block |  83 +++++++
>  block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
>  block/blk-map.c                      |   2 +-
>  block/blk-settings.c                 |  59 +++++
>  block/blk-sysfs.c                    | 138 +++++++++++
>  block/blk.h                          |   2 +
>  block/ioctl.c                        |  32 +++
>  drivers/md/dm-kcopyd.c               |  55 +++-
>  drivers/md/dm-linear.c               |   1 +
>  drivers/md/dm-table.c                |  45 ++++
>  drivers/md/dm.c                      |   6 +
>  drivers/nvme/host/core.c             | 116 ++++++++-
>  drivers/nvme/host/fc.c               |   4 +
>  drivers/nvme/host/nvme.h             |   7 +
>  drivers/nvme/host/pci.c              |  25 ++
>  drivers/nvme/host/rdma.c             |   6 +
>  drivers/nvme/host/tcp.c              |  14 ++
>  drivers/nvme/host/trace.c            |  19 ++
>  drivers/nvme/target/admin-cmd.c      |   8 +-
>  drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
>  drivers/nvme/target/io-cmd-file.c    |  49 ++++
>  fs/zonefs/super.c                    | 178 ++++++++++++-
>  fs/zonefs/zonefs.h                   |   1 +
>  include/linux/blk_types.h            |  21 ++
>  include/linux/blkdev.h               |  17 ++
>  include/linux/device-mapper.h        |   5 +
>  include/linux/nvme.h                 |  43 +++-
>  include/uapi/linux/fs.h              |  23 ++
>  28 files changed, 1367 insertions(+), 15 deletions(-)
> 
> 
> base-commit: e7d6987e09a328d4a949701db40ef63fbb970670


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27  2:19     ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:19 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, martin.petersen, linux-kernel, jack, linux-fsdevel,
	lsf-pc, Alexander Viro

On 4/26/22 19:12, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.
> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
>     block and file backend)

It would also be nice to have copy offload emulation in null_blk for testing.

> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
>     two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> - copy_file_range for zonefs
> 
> 4. In-kernel user
> - dm-kcopyd
> - copy_file_range in zonefs
> 
> For zonefs copy_file_range - Seems we cannot leverage fstest here. Limited
> testing is done at this point using a custom application for unit testing.
> 
> Appreciate the inputs on plumbing and how to test this further?
> Perhaps some of it can be discussed during LSF/MM too.
> 
> [0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
> 
> Changes in v4:
> - added copy_file_range support for zonefs
> - added documentation about new sysfs entries
> - incorporated review comments on v3
> - minor fixes
> 
> 
> Arnav Dawn (2):
>   nvmet: add copy command support for bdev and file ns
>   fs: add support for copy file range in zonefs
> 
> Nitesh Shetty (7):
>   block: Introduce queue limits for copy-offload support
>   block: Add copy offload support infrastructure
>   block: Introduce a new ioctl for copy
>   block: add emulation for copy
>   nvme: add copy offload support
>   dm: Add support for copy offload.
>   dm: Enable copy offload for dm-linear target
> 
> SelvaKumar S (1):
>   dm kcopyd: use copy offload support
> 
>  Documentation/ABI/stable/sysfs-block |  83 +++++++
>  block/blk-lib.c                      | 358 +++++++++++++++++++++++++++
>  block/blk-map.c                      |   2 +-
>  block/blk-settings.c                 |  59 +++++
>  block/blk-sysfs.c                    | 138 +++++++++++
>  block/blk.h                          |   2 +
>  block/ioctl.c                        |  32 +++
>  drivers/md/dm-kcopyd.c               |  55 +++-
>  drivers/md/dm-linear.c               |   1 +
>  drivers/md/dm-table.c                |  45 ++++
>  drivers/md/dm.c                      |   6 +
>  drivers/nvme/host/core.c             | 116 ++++++++-
>  drivers/nvme/host/fc.c               |   4 +
>  drivers/nvme/host/nvme.h             |   7 +
>  drivers/nvme/host/pci.c              |  25 ++
>  drivers/nvme/host/rdma.c             |   6 +
>  drivers/nvme/host/tcp.c              |  14 ++
>  drivers/nvme/host/trace.c            |  19 ++
>  drivers/nvme/target/admin-cmd.c      |   8 +-
>  drivers/nvme/target/io-cmd-bdev.c    |  65 +++++
>  drivers/nvme/target/io-cmd-file.c    |  49 ++++
>  fs/zonefs/super.c                    | 178 ++++++++++++-
>  fs/zonefs/zonefs.h                   |   1 +
>  include/linux/blk_types.h            |  21 ++
>  include/linux/blkdev.h               |  17 ++
>  include/linux/device-mapper.h        |   5 +
>  include/linux/nvme.h                 |  43 +++-
>  include/uapi/linux/fs.h              |  23 ++
>  28 files changed, 1367 insertions(+), 15 deletions(-)
> 
> 
> base-commit: e7d6987e09a328d4a949701db40ef63fbb970670


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  2:45         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:45 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Arnav Dawn, Alasdair Kergon, Mike Snitzer, Sagi Grimberg,
	James Smart, Chaitanya Kulkarni, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and an array of (source, destination and copy length) tuples.
> Introduce REQ_COPY copy offload operation flag. Create a read-write
> bio pair with a token as payload and submitted to the device in order.
> Read request populates token with source specific information which
> is then passed with write request.
> This design is courtesy of Mikulas Patocka's token based copy.
> 
> Larger copy will be divided, based on max_copy_sectors,
> max_copy_range_sector limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>  block/blk.h               |   2 +
>  include/linux/blk_types.h |  21 ++++
>  include/linux/blkdev.h    |   2 +
>  include/uapi/linux/fs.h   |  14 +++
>  5 files changed, 271 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 09b7e1200c0f..ba9da2d2f429 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  }
>  EXPORT_SYMBOL(blkdev_issue_discard);
>  
> +/*
> + * Wait on and process all in-flight BIOs.  This must only be called once
> + * all bios have been issued so that the refcount can only decrease.
> + * This just waits for all bios to make it through bio_copy_end_io. IO
> + * errors are propagated through cio->io_error.
> + */
> +static int cio_await_completion(struct cio *cio)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (cio->refcount) {
> +		cio->waiter = current;
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		spin_unlock_irqrestore(&cio->lock, flags);
> +		blk_io_schedule();
> +		/* wake up sets us TASK_RUNNING */
> +		spin_lock_irqsave(&cio->lock, flags);
> +		cio->waiter = NULL;
> +		ret = cio->io_err;
> +	}
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	kvfree(cio);

cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?

> +
> +	return ret;
> +}
> +
> +static void bio_copy_end_io(struct bio *bio)
> +{
> +	struct copy_ctx *ctx = bio->bi_private;
> +	struct cio *cio = ctx->cio;
> +	sector_t clen;
> +	int ri = ctx->range_idx;
> +	unsigned long flags;
> +	bool wake = false;
> +
> +	if (bio->bi_status) {
> +		cio->io_err = bio->bi_status;
> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);

long line.

> +	}
> +	__free_page(bio->bi_io_vec[0].bv_page);
> +	kfree(ctx);
> +	bio_put(bio);
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (((--cio->refcount) <= 0) && cio->waiter)
> +		wake = true;
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	if (wake)
> +		wake_up_process(cio->waiter);
> +}
> +
> +/*
> + * blk_copy_offload	- Use device's native copy offload feature
> + * Go through the user provided payload, prepare a new payload based on device's copy offload limits.

long line.

> + */
> +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)

long line.

rlist is an array, but rlist naming implies a list. Why not call that
argument "ranges" ?

The argument ordering is also strange. I would make that:

blk_copy_offload(struct block_device *src_bdev,
	         struct block_device *dst_bdev,
		 struct range_entry *rlist, int nr_srcs,
		 gfp_t gfp_mask)

> +{
> +	struct request_queue *sq = bdev_get_queue(src_bdev);
> +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> +	struct bio *read_bio, *write_bio;
> +	struct copy_ctx *ctx;
> +	struct cio *cio;
> +	struct page *token;
> +	sector_t src_blk, copy_len, dst_blk;
> +	sector_t remaining, max_copy_len = LONG_MAX;
> +	unsigned long flags;
> +	int ri = 0, ret = 0;
> +
> +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> +	if (!cio)
> +		return -ENOMEM;
> +	cio->rlist = rlist;
> +	spin_lock_init(&cio->lock);
> +
> +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;

But max_copy_range_sectors is for a single range only, right ? So what is this
second min3() doing ? It is mixing up total length and one range length.
The device should not have reported a per range max length larger than the
total length in the first place, right ? If it does, that would be a very
strange device...
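
If both constraints are intended, the per-range limit and the total limit
could be computed separately (sketch only, using the limit names as posted):

	sector_t max_range_len, max_total_len;

	max_range_len = min_t(sector_t, sq->limits.max_copy_range_sectors,
			      dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
	max_total_len = min_t(sector_t, sq->limits.max_copy_sectors,
			      dq->limits.max_copy_sectors) << SECTOR_SHIFT;
	/* split each range on max_range_len, bound the sum by max_total_len */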

> +
> +	for (ri = 0; ri < nr_srcs; ri++) {
> +		cio->rlist[ri].comp_len = rlist[ri].len;
> +		src_blk = rlist[ri].src;
> +		dst_blk = rlist[ri].dst;
> +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> +			copy_len = min(remaining, max_copy_len);
> +
> +			token = alloc_page(gfp_mask);
> +			if (unlikely(!token)) {
> +				ret = -ENOMEM;
> +				goto err_token;
> +			}
> +
> +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> +			if (!ctx) {
> +				ret = -ENOMEM;
> +				goto err_ctx;
> +			}
> +			ctx->cio = cio;
> +			ctx->range_idx = ri;
> +			ctx->start_sec = dst_blk;
> +
> +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!read_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			read_bio->bi_iter.bi_size = copy_len;
> +			ret = submit_bio_wait(read_bio);
> +			bio_put(read_bio);
> +			if (ret)
> +				goto err_read_bio;
> +
> +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!write_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			write_bio->bi_iter.bi_size = copy_len;
> +			write_bio->bi_end_io = bio_copy_end_io;
> +			write_bio->bi_private = ctx;
> +
> +			spin_lock_irqsave(&cio->lock, flags);
> +			++cio->refcount;

Shouldn't this be an atomic_t ?
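
A minimal sketch of the atomic_t variant (field names from the patch; note
that publishing cio->waiter would still need to be ordered against the final
decrement, which is what the spinlock currently provides):

	/* in struct cio */
	atomic_t refcount;

	/* when issuing each write bio */
	atomic_inc(&cio->refcount);

	/* in bio_copy_end_io() */
	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
		wake_up_process(cio->waiter);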

And wrap lines please. Many are too long.

> +			spin_unlock_irqrestore(&cio->lock, flags);
> +
> +			submit_bio(write_bio);
> +			src_blk += copy_len;
> +			dst_blk += copy_len;
> +		}
> +	}
> +
> +	/* Wait for completion of all IO's*/
> +	return cio_await_completion(cio);
> +
> +err_read_bio:
> +	kfree(ctx);
> +err_ctx:
> +	__free_page(token);
> +err_token:
> +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> +
> +	cio->io_err = ret;
> +	return cio_await_completion(cio);
> +}
> +
> +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> +{
> +	unsigned int align_mask = max(
> +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> +	sector_t len = 0;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		if (rlist[i].len)
> +			len += rlist[i].len;
> +		else
> +			return -EINVAL;

Reverse the if condition and return to avoid the else.
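
That is, something like:

	if (!rlist[i].len)
		return -EINVAL;
	len += rlist[i].len;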

> +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> +				(rlist[i].len & align_mask))
> +			return -EINVAL;
> +		rlist[i].comp_len = 0;
> +	}
> +
> +	if (len && len >= MAX_COPY_TOTAL_LENGTH)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> +		struct request_queue *dest_q)
> +{
> +	if (blk_queue_copy(dest_q) && blk_queue_copy(src_q))
> +		return true;
> +
> +	return false;

return blk_queue_copy(dest_q) && blk_queue_copy(src_q);

would be simpler.

> +}
> +
> +/*
> + * blkdev_issue_copy - queue a copy
> + * @src_bdev:	source block device
> + * @nr_srcs:	number of source ranges to copy
> + * @rlist:	array of source/dest/len
> + * @dest_bdev:	destination block device
> + * @gfp_mask:   memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *	Copy source ranges from source block device to destination block device.
> + *	length of a source range cannot be zero.
> + */
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> +		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)

same comment as above about args order and naming.
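
For reference, an in-kernel caller of the interface as posted would look
roughly like this (sketch; src_offset, dst_offset and nr_bytes are assumed
locals, given in bytes and logical-block aligned as blk_copy_sanity_check()
above requires):

	struct range_entry range = {
		.src = src_offset,	/* byte offset on src_bdev */
		.dst = dst_offset,	/* byte offset on dest_bdev */
		.len = nr_bytes,
	};
	int ret;

	ret = blkdev_issue_copy(src_bdev, 1, &range, dest_bdev, GFP_KERNEL);
	if (ret)
		/* comp_len seems meant to report how much actually completed */
		pr_err("copy failed after %llu bytes\n",
		       (unsigned long long)range.comp_len);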

> +{
> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> +	int ret = -EINVAL;
> +
> +	if (!src_q || !dest_q)
> +		return -ENXIO;
> +
> +	if (!nr)
> +		return -EINVAL;
> +
> +	if (nr >= MAX_COPY_NR_RANGE)
> +		return -EINVAL;

Where do you check the number of ranges against what the device can do ?
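
One place for that check, using the queue limits from patch 1 (sketch only):

	if (nr > min(src_q->limits.max_copy_nr_ranges,
		     dest_q->limits.max_copy_nr_ranges))
		return -EINVAL;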

> +
> +	if (bdev_read_only(dest_bdev))
> +		return -EPERM;
> +
> +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> +	if (ret)
> +		return ret;

nr check should be in this function...

> +
> +	if (blk_check_copy_offload(src_q, dest_q))

...which should be only one function with this one.

> +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_issue_copy);
> +
>  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>  		struct bio **biop, unsigned flags)
> diff --git a/block/blk.h b/block/blk.h
> index 434017701403..6010eda58c70 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -291,6 +291,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
>  		break;
>  	}
>  
> +	if (unlikely(op_is_copy(bio->bi_opf)))
> +		return false;
>  	/*
>  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
>  	 * This is a quick and dirty check that relies on the fact that
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index c62274466e72..f5b01f284c43 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -418,6 +418,7 @@ enum req_flag_bits {
>  	/* for driver use */
>  	__REQ_DRV,
>  	__REQ_SWAP,		/* swapping request. */
> +	__REQ_COPY,		/* copy request */
>  	__REQ_NR_BITS,		/* stops here */
>  };
>  
> @@ -443,6 +444,7 @@ enum req_flag_bits {
>  
>  #define REQ_DRV			(1ULL << __REQ_DRV)
>  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> +#define REQ_COPY		(1ULL << __REQ_COPY)
>  
>  #define REQ_FAILFAST_MASK \
>  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> @@ -459,6 +461,11 @@ enum stat_group {
>  	NR_STAT_GROUPS
>  };
>  
> +static inline bool op_is_copy(unsigned int op)
> +{
> +	return (op & REQ_COPY);
> +}
> +
>  #define bio_op(bio) \
>  	((bio)->bi_opf & REQ_OP_MASK)
>  
> @@ -533,4 +540,18 @@ struct blk_rq_stat {
>  	u64 batch;
>  };
>  
> +struct cio {
> +	struct range_entry *rlist;

naming... This is an array, right ?

> +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> +	spinlock_t lock;		/* protects refcount and waiter */
> +	int refcount;
> +	blk_status_t io_err;
> +};
> +
> +struct copy_ctx {
> +	int range_idx;
> +	sector_t start_sec;
> +	struct cio *cio;
> +};
> +
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3596fd37fae7..c6cb3fe82ba2 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp);
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
>  
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index bdf7b404b3e7..822c28cebf3a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,20 @@ struct fstrim_range {
>  	__u64 minlen;
>  };
>  
> +/* Maximum no of entries supported */
> +#define MAX_COPY_NR_RANGE	(1 << 12)

This value should be used also when setting the limits in the previous
patch. max_copy_nr_ranges and max_hw_copy_nr_ranges must be bounded by it.
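
For instance, the setter in patch 1 could clamp against it (sketch, keeping
the field names as currently posted):

	q->limits.max_hw_copy_nr_ranges = min_t(unsigned short,
				max_copy_nr_ranges, MAX_COPY_NR_RANGE);
	q->limits.max_copy_nr_ranges = q->limits.max_hw_copy_nr_ranges;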

> +
> +/* maximum total copy length */
> +#define MAX_COPY_TOTAL_LENGTH	(1 << 27)

Same for this one. And where does this magic number come from ?
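
(For reference, (1 << 27) bytes is 128 MiB, and MAX_COPY_NR_RANGE above is
(1 << 12) = 4096 ranges.)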

> +
> +/* Source range entry for copy */
> +struct range_entry {
> +	__u64 src;
> +	__u64 dst;
> +	__u64 len;
> +	__u64 comp_len;

Please describe the fields of this structure. The meaning of them is
really not clear from the names.
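
Something along these lines, with the meanings inferred from the code above
(sketch, not authoritative):

	/* Copy range entry for the copy interface */
	struct range_entry {
		__u64 src;	/* source byte offset, logical-block aligned */
		__u64 dst;	/* destination byte offset, logical-block aligned */
		__u64 len;	/* number of bytes to copy */
		__u64 comp_len;	/* returned: number of bytes actually copied */
	};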

> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27  2:45         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:45 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, martin.petersen, linux-kernel, Arnav Dawn, jack,
	linux-fsdevel, lsf-pc, Alexander Viro

On 4/26/22 19:12, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and an array of (source, destination and copy length) tuples.
> Introduce REQ_COPY copy offload operation flag. Create a read-write
> bio pair with a token as payload and submitted to the device in order.
> Read request populates token with source specific information which
> is then passed with write request.
> This design is courtesy of Mikulas Patocka's token based copy.
> 
> Larger copy will be divided, based on max_copy_sectors,
> max_copy_range_sector limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>  block/blk.h               |   2 +
>  include/linux/blk_types.h |  21 ++++
>  include/linux/blkdev.h    |   2 +
>  include/uapi/linux/fs.h   |  14 +++
>  5 files changed, 271 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 09b7e1200c0f..ba9da2d2f429 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  }
>  EXPORT_SYMBOL(blkdev_issue_discard);
>  
> +/*
> + * Wait on and process all in-flight BIOs.  This must only be called once
> + * all bios have been issued so that the refcount can only decrease.
> + * This just waits for all bios to make it through bio_copy_end_io. IO
> + * errors are propagated through cio->io_error.
> + */
> +static int cio_await_completion(struct cio *cio)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (cio->refcount) {
> +		cio->waiter = current;
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		spin_unlock_irqrestore(&cio->lock, flags);
> +		blk_io_schedule();
> +		/* wake up sets us TASK_RUNNING */
> +		spin_lock_irqsave(&cio->lock, flags);
> +		cio->waiter = NULL;
> +		ret = cio->io_err;
> +	}
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	kvfree(cio);

cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?

> +
> +	return ret;
> +}
> +
> +static void bio_copy_end_io(struct bio *bio)
> +{
> +	struct copy_ctx *ctx = bio->bi_private;
> +	struct cio *cio = ctx->cio;
> +	sector_t clen;
> +	int ri = ctx->range_idx;
> +	unsigned long flags;
> +	bool wake = false;
> +
> +	if (bio->bi_status) {
> +		cio->io_err = bio->bi_status;
> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);

long line.

> +	}
> +	__free_page(bio->bi_io_vec[0].bv_page);
> +	kfree(ctx);
> +	bio_put(bio);
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (((--cio->refcount) <= 0) && cio->waiter)
> +		wake = true;
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	if (wake)
> +		wake_up_process(cio->waiter);
> +}
> +
> +/*
> + * blk_copy_offload	- Use device's native copy offload feature
> + * Go through the user provided payload, prepare a new payload based on device's copy offload limits.

long line.

> + */
> +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)

long line.

rlist is an array, but rlist naming implies a list. Why not call that
argument "ranges" ?

The argument ordering is also strange. I would make that:

blk_copy_offload(struct block_device *src_bdev,
	         struct block_device *dst_bdev,
		 struct range_entry *rlist, int nr_srcs,
		 gfp_t gfp_mask)

> +{
> +	struct request_queue *sq = bdev_get_queue(src_bdev);
> +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> +	struct bio *read_bio, *write_bio;
> +	struct copy_ctx *ctx;
> +	struct cio *cio;
> +	struct page *token;
> +	sector_t src_blk, copy_len, dst_blk;
> +	sector_t remaining, max_copy_len = LONG_MAX;
> +	unsigned long flags;
> +	int ri = 0, ret = 0;
> +
> +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> +	if (!cio)
> +		return -ENOMEM;
> +	cio->rlist = rlist;
> +	spin_lock_init(&cio->lock);
> +
> +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;

But max_copy_range_sectors is for a single range only, right ? So what is this
second min3() doing ? It is mixing up total length and one range length.
The device should not have reported a per range max length larger than the
total length in the first place, right ? If it does, that would be a very
strange device...

> +
> +	for (ri = 0; ri < nr_srcs; ri++) {
> +		cio->rlist[ri].comp_len = rlist[ri].len;
> +		src_blk = rlist[ri].src;
> +		dst_blk = rlist[ri].dst;
> +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> +			copy_len = min(remaining, max_copy_len);
> +
> +			token = alloc_page(gfp_mask);
> +			if (unlikely(!token)) {
> +				ret = -ENOMEM;
> +				goto err_token;
> +			}
> +
> +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> +			if (!ctx) {
> +				ret = -ENOMEM;
> +				goto err_ctx;
> +			}
> +			ctx->cio = cio;
> +			ctx->range_idx = ri;
> +			ctx->start_sec = dst_blk;
> +
> +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!read_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			read_bio->bi_iter.bi_size = copy_len;
> +			ret = submit_bio_wait(read_bio);
> +			bio_put(read_bio);
> +			if (ret)
> +				goto err_read_bio;
> +
> +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!write_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			write_bio->bi_iter.bi_size = copy_len;
> +			write_bio->bi_end_io = bio_copy_end_io;
> +			write_bio->bi_private = ctx;
> +
> +			spin_lock_irqsave(&cio->lock, flags);
> +			++cio->refcount;

Shouldn't this be an atomic_t ?

And wrap lines please. Many are too long.
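
For the refcount, something like this is what I have in mind (rough sketch;
the waiter assignment in cio_await_completion() would still need to be
ordered against the final decrement):

	/* in struct cio */
	atomic_t refcount;

	/* submission path */
	atomic_inc(&cio->refcount);

	/* completion path (bio_copy_end_io) */
	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
		wake_up_process(cio->waiter);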

> +			spin_unlock_irqrestore(&cio->lock, flags);
> +
> +			submit_bio(write_bio);
> +			src_blk += copy_len;
> +			dst_blk += copy_len;
> +		}
> +	}
> +
> +	/* Wait for completion of all IO's*/
> +	return cio_await_completion(cio);
> +
> +err_read_bio:
> +	kfree(ctx);
> +err_ctx:
> +	__free_page(token);
> +err_token:
> +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> +
> +	cio->io_err = ret;
> +	return cio_await_completion(cio);
> +}
> +
> +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> +{
> +	unsigned int align_mask = max(
> +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> +	sector_t len = 0;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		if (rlist[i].len)
> +			len += rlist[i].len;
> +		else
> +			return -EINVAL;

Reverse the if condition and return to avoid the else.
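
That is:

		if (!rlist[i].len)
			return -EINVAL;
		len += rlist[i].len;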

> +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> +				(rlist[i].len & align_mask))
> +			return -EINVAL;
> +		rlist[i].comp_len = 0;
> +	}
> +
> +	if (len && len >= MAX_COPY_TOTAL_LENGTH)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> +		struct request_queue *dest_q)
> +{
> +	if (blk_queue_copy(dest_q) && blk_queue_copy(src_q))
> +		return true;
> +
> +	return false;

return blk_queue_copy(dest_q) && blk_queue_copy(src_q);

would be simpler.

> +}
> +
> +/*
> + * blkdev_issue_copy - queue a copy
> + * @src_bdev:	source block device
> + * @nr_srcs:	number of source ranges to copy
> + * @rlist:	array of source/dest/len
> + * @dest_bdev:	destination block device
> + * @gfp_mask:   memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *	Copy source ranges from source block device to destination block device.
> + *	length of a source range cannot be zero.
> + */
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> +		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)

same comment as above about args order and naming.

> +{
> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> +	int ret = -EINVAL;
> +
> +	if (!src_q || !dest_q)
> +		return -ENXIO;
> +
> +	if (!nr)
> +		return -EINVAL;
> +
> +	if (nr >= MAX_COPY_NR_RANGE)
> +		return -EINVAL;

Where do you check the number of ranges against what the device can do ?
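
E.g. something like this (sketch; I am assuming the max_copy_nr_ranges limit
field name from patch 01 here):

	if (nr > min(src_q->limits.max_copy_nr_ranges,
		     dest_q->limits.max_copy_nr_ranges))
		return -EINVAL;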

> +
> +	if (bdev_read_only(dest_bdev))
> +		return -EPERM;
> +
> +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> +	if (ret)
> +		return ret;

nr check should be in this function...

> +
> +	if (blk_check_copy_offload(src_q, dest_q))

...which should be merged into a single function with this one.

> +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_issue_copy);
> +
>  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>  		struct bio **biop, unsigned flags)
> diff --git a/block/blk.h b/block/blk.h
> index 434017701403..6010eda58c70 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -291,6 +291,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
>  		break;
>  	}
>  
> +	if (unlikely(op_is_copy(bio->bi_opf)))
> +		return false;
>  	/*
>  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
>  	 * This is a quick and dirty check that relies on the fact that
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index c62274466e72..f5b01f284c43 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -418,6 +418,7 @@ enum req_flag_bits {
>  	/* for driver use */
>  	__REQ_DRV,
>  	__REQ_SWAP,		/* swapping request. */
> +	__REQ_COPY,		/* copy request */
>  	__REQ_NR_BITS,		/* stops here */
>  };
>  
> @@ -443,6 +444,7 @@ enum req_flag_bits {
>  
>  #define REQ_DRV			(1ULL << __REQ_DRV)
>  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> +#define REQ_COPY		(1ULL << __REQ_COPY)
>  
>  #define REQ_FAILFAST_MASK \
>  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> @@ -459,6 +461,11 @@ enum stat_group {
>  	NR_STAT_GROUPS
>  };
>  
> +static inline bool op_is_copy(unsigned int op)
> +{
> +	return (op & REQ_COPY);
> +}
> +
>  #define bio_op(bio) \
>  	((bio)->bi_opf & REQ_OP_MASK)
>  
> @@ -533,4 +540,18 @@ struct blk_rq_stat {
>  	u64 batch;
>  };
>  
> +struct cio {
> +	struct range_entry *rlist;

naming... This is an array, right ?

> +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> +	spinlock_t lock;		/* protects refcount and waiter */
> +	int refcount;
> +	blk_status_t io_err;
> +};
> +
> +struct copy_ctx {
> +	int range_idx;
> +	sector_t start_sec;
> +	struct cio *cio;
> +};
> +
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3596fd37fae7..c6cb3fe82ba2 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp);
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
>  
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index bdf7b404b3e7..822c28cebf3a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,20 @@ struct fstrim_range {
>  	__u64 minlen;
>  };
>  
> +/* Maximum no of entries supported */
> +#define MAX_COPY_NR_RANGE	(1 << 12)

This value should be used also when setting the limits in the previous
patch. max_copy_nr_ranges and max_hw_copy_nr_ranges must be bounded by it.

> +
> +/* maximum total copy length */
> +#define MAX_COPY_TOTAL_LENGTH	(1 << 27)

Same for this one. And where does this magic number come from ?
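
I.e. something along these lines when the limits are set (sketch; the hw
field names are assumed from patch 01, lim being the queue_limits):

	lim->max_copy_nr_ranges = min_t(unsigned int,
			lim->max_hw_copy_nr_ranges, MAX_COPY_NR_RANGE);
	lim->max_copy_sectors = min_t(unsigned int, lim->max_hw_copy_sectors,
			MAX_COPY_TOTAL_LENGTH >> SECTOR_SHIFT);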

> +
> +/* Source range entry for copy */
> +struct range_entry {
> +	__u64 src;
> +	__u64 dst;
> +	__u64 len;
> +	__u64 comp_len;

Please describe the fields of this structure. Their meaning is really
not clear from the names.
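
Maybe something like this (meanings inferred from the commit message and the
example code; all offsets and lengths are in bytes):

	/* Source range entry for copy */
	struct range_entry {
		__u64 src;	/* source offset, in bytes */
		__u64 dst;	/* destination offset, in bytes */
		__u64 len;	/* number of bytes to copy */
		__u64 comp_len;	/* out: number of bytes actually copied */
	};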

> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 03/10] block: Introduce a new ioctl for copy
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27  2:48         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:48 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	hare, kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong,
	josef, clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Javier González, Arnav Dawn, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 19:12, Nitesh Shetty wrote:
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> structure that contains no of range, a reserved field , followed by an
> array of ranges. Each source range is represented by 'range_entry' that
> contains source start offset, destination start offset and length of
> source ranges (in bytes)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three entries with [dest,src,len],
> * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> 
> int main(void)
> {
> 	int i, ret, fd;
> 	unsigned long src = 0, dst = 32768, len = 4096;
> 	struct copy_range *cr;
> 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> 					(sizeof(struct range_entry)*3));
> 	cr->nr_range = 3;
> 	cr->reserved = 0;
> 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> 		cr->range_list[i].dst = dst;
> 		cr->range_list[i].src = src;
> 		cr->range_list[i].len = len;
> 		cr->range_list[i].comp_len = 0;
> 	}
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 	ret = ioctl(fd, BLKCOPY, cr);
> 	if (ret != 0)
> 	       printf("copy failed, ret= %d\n", ret);
> 	for (i=0; i< cr->nr_range; i++)
> 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> 								i, cr->range_list[i].len,
> 								cr->range_list[i].comp_len);
> 	close(fd);
> 	free(cr);
> 	return ret;
> }

Nice to have a code example. But please format it correctly.
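
Something along these lines (same logic, just reformatted, with the missing
includes added; I am assuming the new definitions from this patch end up in
<linux/fs.h>):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
	unsigned long src = 0, dst = 32768, len = 4096;
	struct copy_range *cr;
	int i, ret, fd;

	cr = malloc(sizeof(*cr) + sizeof(struct range_entry) * 3);
	if (!cr)
		return 1;

	cr->nr_range = 3;
	cr->reserved = 0;
	for (i = 0; i < cr->nr_range; i++, src += len, dst += len) {
		cr->range_list[i].src = src;
		cr->range_list[i].dst = dst;
		cr->range_list[i].len = len;
		cr->range_list[i].comp_len = 0;
	}

	fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0)
		return 1;

	ret = ioctl(fd, BLKCOPY, cr);
	if (ret)
		printf("copy failed, ret = %d\n", ret);

	for (i = 0; i < cr->nr_range; i++)
		if (cr->range_list[i].len != cr->range_list[i].comp_len)
			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
			       i, cr->range_list[i].len,
			       cr->range_list[i].comp_len);

	close(fd);
	free(cr);
	return ret;
}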

> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Javier González <javier.gonz@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  block/ioctl.c           | 32 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/fs.h |  9 +++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 46949f1b0dba..58d93c20ff30 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -120,6 +120,36 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
>  	return err;
>  }
>  
> +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> +		unsigned long arg)
> +{
> +	struct copy_range crange, *ranges = NULL;
> +	size_t payload_size = 0;
> +	int ret;
> +
> +	if (!(mode & FMODE_WRITE))
> +		return -EBADF;
> +
> +	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> +		return -EFAULT;
> +
> +	if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
> +		return -EINVAL;
> +
> +	payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
> +
> +	ranges = memdup_user((void __user *)arg, payload_size);
> +	if (IS_ERR(ranges))
> +		return PTR_ERR(ranges);
> +
> +	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL);
> +	if (copy_to_user((void __user *)arg, ranges, payload_size))
> +		ret = -EFAULT;
> +
> +	kfree(ranges);
> +	return ret;
> +}
> +
>  static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
>  		void __user *argp)
>  {
> @@ -481,6 +511,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
>  		return blk_ioctl_discard(bdev, mode, arg);
>  	case BLKSECDISCARD:
>  		return blk_ioctl_secure_erase(bdev, mode, argp);
> +	case BLKCOPY:
> +		return blk_ioctl_copy(bdev, mode, arg);
>  	case BLKZEROOUT:
>  		return blk_ioctl_zeroout(bdev, mode, arg);
>  	case BLKGETDISKSEQ:
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 822c28cebf3a..a3b13406ffb8 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -78,6 +78,14 @@ struct range_entry {
>  	__u64 comp_len;
>  };
>  
> +struct copy_range {
> +	__u64 nr_range;
> +	__u64 reserved;
> +
> +	/* Range_list always must be at the end */
> +	struct range_entry range_list[];
> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1
> @@ -199,6 +207,7 @@ struct fsxattr {
>  #define BLKROTATIONAL _IO(0x12,126)
>  #define BLKZEROOUT _IO(0x12,127)
>  #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
> +#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
>  /*
>   * A jump here: 130-136 are reserved for zoned block devices
>   * (see uapi/linux/blkzoned.h)


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 03/10] block: Introduce a new ioctl for copy
@ 2022-04-27  2:48         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27  2:48 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, Sagi Grimberg, axboe, Johannes Thumshirn,
	tytso, martin.petersen, linux-kernel, Arnav Dawn, jack,
	linux-fsdevel, Javier González, lsf-pc, Alexander Viro

On 4/26/22 19:12, Nitesh Shetty wrote:
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> structure that contains no of range, a reserved field , followed by an
> array of ranges. Each source range is represented by 'range_entry' that
> contains source start offset, destination start offset and length of
> source ranges (in bytes)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three entries with [dest,src,len],
> * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> 
> int main(void)
> {
> 	int i, ret, fd;
> 	unsigned long src = 0, dst = 32768, len = 4096;
> 	struct copy_range *cr;
> 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> 					(sizeof(struct range_entry)*3));
> 	cr->nr_range = 3;
> 	cr->reserved = 0;
> 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> 		cr->range_list[i].dst = dst;
> 		cr->range_list[i].src = src;
> 		cr->range_list[i].len = len;
> 		cr->range_list[i].comp_len = 0;
> 	}
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 	ret = ioctl(fd, BLKCOPY, cr);
> 	if (ret != 0)
> 	       printf("copy failed, ret= %d\n", ret);
> 	for (i=0; i< cr->nr_range; i++)
> 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> 								i, cr->range_list[i].len,
> 								cr->range_list[i].comp_len);
> 	close(fd);
> 	free(cr);
> 	return ret;
> }

Nice to have a code example. But please format it correctly.

> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Javier González <javier.gonz@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  block/ioctl.c           | 32 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/fs.h |  9 +++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 46949f1b0dba..58d93c20ff30 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -120,6 +120,36 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
>  	return err;
>  }
>  
> +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
> +		unsigned long arg)
> +{
> +	struct copy_range crange, *ranges = NULL;
> +	size_t payload_size = 0;
> +	int ret;
> +
> +	if (!(mode & FMODE_WRITE))
> +		return -EBADF;
> +
> +	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
> +		return -EFAULT;
> +
> +	if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
> +		return -EINVAL;
> +
> +	payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
> +
> +	ranges = memdup_user((void __user *)arg, payload_size);
> +	if (IS_ERR(ranges))
> +		return PTR_ERR(ranges);
> +
> +	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL);
> +	if (copy_to_user((void __user *)arg, ranges, payload_size))
> +		ret = -EFAULT;
> +
> +	kfree(ranges);
> +	return ret;
> +}
> +
>  static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
>  		void __user *argp)
>  {
> @@ -481,6 +511,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
>  		return blk_ioctl_discard(bdev, mode, arg);
>  	case BLKSECDISCARD:
>  		return blk_ioctl_secure_erase(bdev, mode, argp);
> +	case BLKCOPY:
> +		return blk_ioctl_copy(bdev, mode, arg);
>  	case BLKZEROOUT:
>  		return blk_ioctl_zeroout(bdev, mode, arg);
>  	case BLKGETDISKSEQ:
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 822c28cebf3a..a3b13406ffb8 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -78,6 +78,14 @@ struct range_entry {
>  	__u64 comp_len;
>  };
>  
> +struct copy_range {
> +	__u64 nr_range;
> +	__u64 reserved;
> +
> +	/* Range_list always must be at the end */
> +	struct range_entry range_list[];
> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1
> @@ -199,6 +207,7 @@ struct fsxattr {
>  #define BLKROTATIONAL _IO(0x12,126)
>  #define BLKZEROOUT _IO(0x12,127)
>  #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
> +#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
>  /*
>   * A jump here: 130-136 are reserved for zoned block devices
>   * (see uapi/linux/blkzoned.h)


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27 10:29         ` Hannes Reinecke
  -1 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:29 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, Alexander Viro,
	kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, martin.petersen, linux-kernel,
	Arnav Dawn, jack, linux-fsdevel, lsf-pc, Damien Le Moal

On 4/26/22 12:12, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and an array of (source, destination and copy length) tuples.
> Introduce REQ_COPY copy offload operation flag. Create a read-write
> bio pair with a token as payload and submitted to the device in order.
> Read request populates token with source specific information which
> is then passed with write request.
> This design is courtesy Mikulas Patocka's token based copy
> 
> Larger copy will be divided, based on max_copy_sectors,
> max_copy_range_sector limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>   block/blk.h               |   2 +
>   include/linux/blk_types.h |  21 ++++
>   include/linux/blkdev.h    |   2 +
>   include/uapi/linux/fs.h   |  14 +++
>   5 files changed, 271 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 09b7e1200c0f..ba9da2d2f429 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>   }
>   EXPORT_SYMBOL(blkdev_issue_discard);
>   
> +/*
> + * Wait on and process all in-flight BIOs.  This must only be called once
> + * all bios have been issued so that the refcount can only decrease.
> + * This just waits for all bios to make it through bio_copy_end_io. IO
> + * errors are propagated through cio->io_error.
> + */
> +static int cio_await_completion(struct cio *cio)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (cio->refcount) {
> +		cio->waiter = current;
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		spin_unlock_irqrestore(&cio->lock, flags);
> +		blk_io_schedule();
> +		/* wake up sets us TASK_RUNNING */
> +		spin_lock_irqsave(&cio->lock, flags);
> +		cio->waiter = NULL;
> +		ret = cio->io_err;
> +	}
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	kvfree(cio);
> +
> +	return ret;
> +}
> +
> +static void bio_copy_end_io(struct bio *bio)
> +{
> +	struct copy_ctx *ctx = bio->bi_private;
> +	struct cio *cio = ctx->cio;
> +	sector_t clen;
> +	int ri = ctx->range_idx;
> +	unsigned long flags;
> +	bool wake = false;
> +
> +	if (bio->bi_status) {
> +		cio->io_err = bio->bi_status;
> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> +	}
> +	__free_page(bio->bi_io_vec[0].bv_page);
> +	kfree(ctx);
> +	bio_put(bio);
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (((--cio->refcount) <= 0) && cio->waiter)
> +		wake = true;
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	if (wake)
> +		wake_up_process(cio->waiter);
> +}
> +
> +/*
> + * blk_copy_offload	- Use device's native copy offload feature
> + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> + */
> +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> +{
> +	struct request_queue *sq = bdev_get_queue(src_bdev);
> +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> +	struct bio *read_bio, *write_bio;
> +	struct copy_ctx *ctx;
> +	struct cio *cio;
> +	struct page *token;
> +	sector_t src_blk, copy_len, dst_blk;
> +	sector_t remaining, max_copy_len = LONG_MAX;
> +	unsigned long flags;
> +	int ri = 0, ret = 0;
> +
> +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> +	if (!cio)
> +		return -ENOMEM;
> +	cio->rlist = rlist;
> +	spin_lock_init(&cio->lock);
> +
> +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> +
> +	for (ri = 0; ri < nr_srcs; ri++) {
> +		cio->rlist[ri].comp_len = rlist[ri].len;
> +		src_blk = rlist[ri].src;
> +		dst_blk = rlist[ri].dst;
> +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> +			copy_len = min(remaining, max_copy_len);
> +
> +			token = alloc_page(gfp_mask);
> +			if (unlikely(!token)) {
> +				ret = -ENOMEM;
> +				goto err_token;
> +			}
> +
> +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> +			if (!ctx) {
> +				ret = -ENOMEM;
> +				goto err_ctx;
> +			}
> +			ctx->cio = cio;
> +			ctx->range_idx = ri;
> +			ctx->start_sec = dst_blk;
> +
> +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!read_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			read_bio->bi_iter.bi_size = copy_len;
> +			ret = submit_bio_wait(read_bio);
> +			bio_put(read_bio);
> +			if (ret)
> +				goto err_read_bio;
> +
> +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!write_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			write_bio->bi_iter.bi_size = copy_len;
> +			write_bio->bi_end_io = bio_copy_end_io;
> +			write_bio->bi_private = ctx;
> +
> +			spin_lock_irqsave(&cio->lock, flags);
> +			++cio->refcount;
> +			spin_unlock_irqrestore(&cio->lock, flags);
> +
> +			submit_bio(write_bio);
> +			src_blk += copy_len;
> +			dst_blk += copy_len;
> +		}
> +	}
> +

Hmm. I'm not sure if I like the copy loop.
What I definitely would do is to allocate the write bio before reading 
data; after all, if we can't allocate the write bio reading is pretty 
much pointless.

But the real issue I have with this is that it's doing synchronous 
reads, thereby limiting the performance.

Can't you submit the write bio from the end_io function of the read bio?
That would disentangle things, and we should get better performance.
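
Something like this is what I have in mind (rough sketch; it assumes struct
copy_ctx grows a 'struct bio *write_bio' member, that both bios are set up
and the cio refcount is taken before the read is submitted):

static void bio_copy_read_end_io(struct bio *read_bio)
{
	struct copy_ctx *ctx = read_bio->bi_private;

	if (read_bio->bi_status) {
		/* propagate the error through the existing write
		 * completion path so the refcount is dropped there */
		ctx->write_bio->bi_status = read_bio->bi_status;
		bio_endio(ctx->write_bio);
	} else {
		/* the token now holds the source information,
		 * chain the write without blocking */
		submit_bio(ctx->write_bio);
	}
	bio_put(read_bio);
}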

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27 10:29         ` Hannes Reinecke
  0 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:29 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef,
	clm, dsterba, tytso, jack, nitheshshetty, gost.dev, Arnav Dawn,
	Alasdair Kergon, Mike Snitzer, Sagi Grimberg, James Smart,
	Chaitanya Kulkarni, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 12:12, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and an array of (source, destination and copy length) tuples.
> Introduce REQ_COPY copy offload operation flag. Create a read-write
> bio pair with a token as payload and submitted to the device in order.
> Read request populates token with source specific information which
> is then passed with write request.
> This design is courtesy Mikulas Patocka's token based copy
> 
> Larger copy will be divided, based on max_copy_sectors,
> max_copy_range_sector limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>   block/blk.h               |   2 +
>   include/linux/blk_types.h |  21 ++++
>   include/linux/blkdev.h    |   2 +
>   include/uapi/linux/fs.h   |  14 +++
>   5 files changed, 271 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 09b7e1200c0f..ba9da2d2f429 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>   }
>   EXPORT_SYMBOL(blkdev_issue_discard);
>   
> +/*
> + * Wait on and process all in-flight BIOs.  This must only be called once
> + * all bios have been issued so that the refcount can only decrease.
> + * This just waits for all bios to make it through bio_copy_end_io. IO
> + * errors are propagated through cio->io_error.
> + */
> +static int cio_await_completion(struct cio *cio)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (cio->refcount) {
> +		cio->waiter = current;
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		spin_unlock_irqrestore(&cio->lock, flags);
> +		blk_io_schedule();
> +		/* wake up sets us TASK_RUNNING */
> +		spin_lock_irqsave(&cio->lock, flags);
> +		cio->waiter = NULL;
> +		ret = cio->io_err;
> +	}
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	kvfree(cio);
> +
> +	return ret;
> +}
> +
> +static void bio_copy_end_io(struct bio *bio)
> +{
> +	struct copy_ctx *ctx = bio->bi_private;
> +	struct cio *cio = ctx->cio;
> +	sector_t clen;
> +	int ri = ctx->range_idx;
> +	unsigned long flags;
> +	bool wake = false;
> +
> +	if (bio->bi_status) {
> +		cio->io_err = bio->bi_status;
> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> +	}
> +	__free_page(bio->bi_io_vec[0].bv_page);
> +	kfree(ctx);
> +	bio_put(bio);
> +
> +	spin_lock_irqsave(&cio->lock, flags);
> +	if (((--cio->refcount) <= 0) && cio->waiter)
> +		wake = true;
> +	spin_unlock_irqrestore(&cio->lock, flags);
> +	if (wake)
> +		wake_up_process(cio->waiter);
> +}
> +
> +/*
> + * blk_copy_offload	- Use device's native copy offload feature
> + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> + */
> +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> +{
> +	struct request_queue *sq = bdev_get_queue(src_bdev);
> +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> +	struct bio *read_bio, *write_bio;
> +	struct copy_ctx *ctx;
> +	struct cio *cio;
> +	struct page *token;
> +	sector_t src_blk, copy_len, dst_blk;
> +	sector_t remaining, max_copy_len = LONG_MAX;
> +	unsigned long flags;
> +	int ri = 0, ret = 0;
> +
> +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> +	if (!cio)
> +		return -ENOMEM;
> +	cio->rlist = rlist;
> +	spin_lock_init(&cio->lock);
> +
> +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> +
> +	for (ri = 0; ri < nr_srcs; ri++) {
> +		cio->rlist[ri].comp_len = rlist[ri].len;
> +		src_blk = rlist[ri].src;
> +		dst_blk = rlist[ri].dst;
> +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> +			copy_len = min(remaining, max_copy_len);
> +
> +			token = alloc_page(gfp_mask);
> +			if (unlikely(!token)) {
> +				ret = -ENOMEM;
> +				goto err_token;
> +			}
> +
> +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> +			if (!ctx) {
> +				ret = -ENOMEM;
> +				goto err_ctx;
> +			}
> +			ctx->cio = cio;
> +			ctx->range_idx = ri;
> +			ctx->start_sec = dst_blk;
> +
> +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!read_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			read_bio->bi_iter.bi_size = copy_len;
> +			ret = submit_bio_wait(read_bio);
> +			bio_put(read_bio);
> +			if (ret)
> +				goto err_read_bio;
> +
> +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!write_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> +			write_bio->bi_iter.bi_size = copy_len;
> +			write_bio->bi_end_io = bio_copy_end_io;
> +			write_bio->bi_private = ctx;
> +
> +			spin_lock_irqsave(&cio->lock, flags);
> +			++cio->refcount;
> +			spin_unlock_irqrestore(&cio->lock, flags);
> +
> +			submit_bio(write_bio);
> +			src_blk += copy_len;
> +			dst_blk += copy_len;
> +		}
> +	}
> +

Hmm. I'm not sure if I like the copy loop.
What I definitely would do is to allocate the write bio before reading 
data; after all, if we can't allocate the write bio reading is pretty 
much pointless.

But the real issue I have with this is that it's doing synchronous 
reads, thereby limiting the performance.

Can't you submit the write bio from the end_io function of the read bio?
That would disentangle things, and we should get better performance.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27 10:30         ` Hannes Reinecke
  -1 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:30 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, Alexander Viro,
	kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, Kanchan Joshi, martin.petersen,
	linux-kernel, Arnav Dawn, jack, linux-fsdevel, lsf-pc,
	Damien Le Moal

On 4/26/22 12:12, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>          - copy_offload (RW)
>          - copy_max_bytes (RW)
>          - copy_max_hw_bytes (RO)
>          - copy_max_range_bytes (RW)
>          - copy_max_range_hw_bytes (RO)
>          - copy_max_nr_ranges (RW)
>          - copy_max_nr_ranges_hw (RO)
> 
> Above limits help to split the copy payload in block layer.
> copy_offload, used for setting copy offload(1) or emulation(0).
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_range_bytes: maximum length in a single entry.
> copy_max_nr_ranges: maximum number of entries in a payload.
> copy_max_*_hw_*: Reflects the device supported maximum limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
>   block/blk-settings.c                 |  59 ++++++++++++
>   block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
>   include/linux/blkdev.h               |  13 +++
>   4 files changed, 293 insertions(+)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-27 10:30         ` Hannes Reinecke
  0 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:30 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef,
	clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Kanchan Joshi, Arnav Dawn, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 12:12, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>          - copy_offload (RW)
>          - copy_max_bytes (RW)
>          - copy_max_hw_bytes (RO)
>          - copy_max_range_bytes (RW)
>          - copy_max_range_hw_bytes (RO)
>          - copy_max_nr_ranges (RW)
>          - copy_max_nr_ranges_hw (RO)
> 
> Above limits help to split the copy payload in block layer.
> copy_offload, used for setting copy offload(1) or emulation(0).
> copy_max_bytes: maximum total length of copy in single payload.
> copy_max_range_bytes: maximum length in a single entry.
> copy_max_nr_ranges: maximum number of entries in a payload.
> copy_max_*_hw_*: Reflects the device supported maximum limits.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
>   block/blk-settings.c                 |  59 ++++++++++++
>   block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
>   include/linux/blkdev.h               |  13 +++
>   4 files changed, 293 insertions(+)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 03/10] block: Introduce a new ioctl for copy
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-27 10:37         ` Hannes Reinecke
  -1 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:37 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	Naohiro Aota, msnitzer, bvanassche, linux-scsi, gost.dev,
	nitheshshetty, James Smart, hch, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, Alexander Viro,
	kbusch, Frederick.Knight, Sagi Grimberg, axboe,
	Johannes Thumshirn, tytso, martin.petersen, linux-kernel,
	Arnav Dawn, jack, linux-fsdevel, Javier González, lsf-pc,
	Damien Le Moal

On 4/26/22 12:12, Nitesh Shetty wrote:
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> structure that contains no of range, a reserved field , followed by an
> array of ranges. Each source range is represented by 'range_entry' that
> contains source start offset, destination start offset and length of
> source ranges (in bytes)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three entries with [dest,src,len],
> * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> 
> int main(void)
> {
> 	int i, ret, fd;
> 	unsigned long src = 0, dst = 32768, len = 4096;
> 	struct copy_range *cr;
> 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> 					(sizeof(struct range_entry)*3));
> 	cr->nr_range = 3;
> 	cr->reserved = 0;
> 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> 		cr->range_list[i].dst = dst;
> 		cr->range_list[i].src = src;
> 		cr->range_list[i].len = len;
> 		cr->range_list[i].comp_len = 0;
> 	}
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 	ret = ioctl(fd, BLKCOPY, cr);
> 	if (ret != 0)
> 	       printf("copy failed, ret= %d\n", ret);
> 	for (i=0; i< cr->nr_range; i++)
> 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> 								i, cr->range_list[i].len,
> 								cr->range_list[i].comp_len);
> 	close(fd);
> 	free(cr);
> 	return ret;
> }
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Javier González <javier.gonz@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   block/ioctl.c           | 32 ++++++++++++++++++++++++++++++++
>   include/uapi/linux/fs.h |  9 +++++++++
>   2 files changed, 41 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 03/10] block: Introduce a new ioctl for copy
@ 2022-04-27 10:37         ` Hannes Reinecke
  0 siblings, 0 replies; 101+ messages in thread
From: Hannes Reinecke @ 2022-04-27 10:37 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: chaitanyak, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, axboe, msnitzer, bvanassche, martin.petersen,
	kbusch, hch, Frederick.Knight, osandov, lsf-pc, djwong, josef,
	clm, dsterba, tytso, jack, nitheshshetty, gost.dev,
	Javier González, Arnav Dawn, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Alexander Viro, linux-kernel

On 4/26/22 12:12, Nitesh Shetty wrote:
> Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> structure that contains no of range, a reserved field , followed by an
> array of ranges. Each source range is represented by 'range_entry' that
> contains source start offset, destination start offset and length of
> source ranges (in bytes)
> 
> MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> 
> Example code, to issue BLKCOPY:
> /* Sample example to copy three entries with [dest,src,len],
> * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> 
> int main(void)
> {
> 	int i, ret, fd;
> 	unsigned long src = 0, dst = 32768, len = 4096;
> 	struct copy_range *cr;
> 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> 					(sizeof(struct range_entry)*3));
> 	cr->nr_range = 3;
> 	cr->reserved = 0;
> 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> 		cr->range_list[i].dst = dst;
> 		cr->range_list[i].src = src;
> 		cr->range_list[i].len = len;
> 		cr->range_list[i].comp_len = 0;
> 	}
> 	fd = open("/dev/nvme0n1", O_RDWR);
> 	if (fd < 0) return 1;
> 	ret = ioctl(fd, BLKCOPY, cr);
> 	if (ret != 0)
> 	       printf("copy failed, ret= %d\n", ret);
> 	for (i=0; i< cr->nr_range; i++)
> 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> 								i, cr->range_list[i].len,
> 								cr->range_list[i].comp_len);
> 	close(fd);
> 	free(cr);
> 	return ret;
> }
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Javier González <javier.gonz@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>   block/ioctl.c           | 32 ++++++++++++++++++++++++++++++++
>   include/uapi/linux/fs.h |  9 +++++++++
>   2 files changed, 41 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27  2:19     ` [dm-devel] " Damien Le Moal
@ 2022-04-27 12:49       ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 12:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 731 bytes --]

On Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > The patch series covers the points discussed in November 2021 virtual call
> > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > We have covered the Initial agreed requirements in this patchset.
> > Patchset borrows Mikulas's token based approach for 2 bdev
> > implementation.
> > 
> > Overall series supports –
> > 
> > 1. Driver
> > - NVMe Copy command (single NS), including support in nvme-target (for
> >     block and file backend)
> 
> It would also be nice to have copy offload emulation in null_blk for testing.
>

We can plan this in the next phase of copy support, once this series settles down.

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27 12:49       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 12:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

On Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > The patch series covers the points discussed in November 2021 virtual call
> > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > We have covered the Initial agreed requirements in this patchset.
> > Patchset borrows Mikulas's token based approach for 2 bdev
> > implementation.
> > 
> > Overall series supports –
> > 
> > 1. Driver
> > - NVMe Copy command (single NS), including support in nvme-target (for
> >     block and file backend)
> 
> It would also be nice to have copy offload emulation in null_blk for testing.
>

We can plan this in the next phase of copy support, once this series settles down.

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 03/10] block: Introduce a new ioctl for copy
  2022-04-27  2:48         ` [dm-devel] " Damien Le Moal
@ 2022-04-27 13:03           ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 13:03 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1908 bytes --]

On Wed, Apr 27, 2022 at 11:48:57AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> > to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> > structure that contains no of range, a reserved field , followed by an
> > array of ranges. Each source range is represented by 'range_entry' that
> > contains source start offset, destination start offset and length of
> > source ranges (in bytes)
> > 
> > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> > 
> > Example code, to issue BLKCOPY:
> > /* Sample example to copy three entries with [dest,src,len],
> > * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> > 
> > int main(void)
> > {
> > 	int i, ret, fd;
> > 	unsigned long src = 0, dst = 32768, len = 4096;
> > 	struct copy_range *cr;
> > 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> > 					(sizeof(struct range_entry)*3));
> > 	cr->nr_range = 3;
> > 	cr->reserved = 0;
> > 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> > 		cr->range_list[i].dst = dst;
> > 		cr->range_list[i].src = src;
> > 		cr->range_list[i].len = len;
> > 		cr->range_list[i].comp_len = 0;
> > 	}
> > 	fd = open("/dev/nvme0n1", O_RDWR);
> > 	if (fd < 0) return 1;
> > 	ret = ioctl(fd, BLKCOPY, cr);
> > 	if (ret != 0)
> > 	       printf("copy failed, ret= %d\n", ret);
> > 	for (i=0; i< cr->nr_range; i++)
> > 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> > 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> > 								i, cr->range_list[i].len,
> > 								cr->range_list[i].comp_len);
> > 	close(fd);
> > 	free(cr);
> > 	return ret;
> > }
> 
> Nice to have a code example. But please format it correctly.
>

acked

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 03/10] block: Introduce a new ioctl for copy
@ 2022-04-27 13:03           ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 13:03 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1908 bytes --]

On Wed, Apr 27, 2022 at 11:48:57AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Add new BLKCOPY ioctl that offloads copying of one or more sources ranges
> > to one or more destination in a device. COPY ioctl accepts a 'copy_range'
> > structure that contains no of range, a reserved field , followed by an
> > array of ranges. Each source range is represented by 'range_entry' that
> > contains source start offset, destination start offset and length of
> > source ranges (in bytes)
> > 
> > MAX_COPY_NR_RANGE, limits the number of entries for the IOCTL and
> > MAX_COPY_TOTAL_LENGTH limits the total copy length, IOCTL can handle.
> > 
> > Example code, to issue BLKCOPY:
> > /* Sample example to copy three entries with [dest,src,len],
> > * [32768, 0, 4096] [36864, 4096, 4096] [40960,8192,4096] on same device */
> > 
> > int main(void)
> > {
> > 	int i, ret, fd;
> > 	unsigned long src = 0, dst = 32768, len = 4096;
> > 	struct copy_range *cr;
> > 	cr = (struct copy_range *)malloc(sizeof(*cr)+
> > 					(sizeof(struct range_entry)*3));
> > 	cr->nr_range = 3;
> > 	cr->reserved = 0;
> > 	for (i = 0; i< cr->nr_range; i++, src += len, dst += len) {
> > 		cr->range_list[i].dst = dst;
> > 		cr->range_list[i].src = src;
> > 		cr->range_list[i].len = len;
> > 		cr->range_list[i].comp_len = 0;
> > 	}
> > 	fd = open("/dev/nvme0n1", O_RDWR);
> > 	if (fd < 0) return 1;
> > 	ret = ioctl(fd, BLKCOPY, cr);
> > 	if (ret != 0)
> > 	       printf("copy failed, ret= %d\n", ret);
> > 	for (i=0; i< cr->nr_range; i++)
> > 		if (cr->range_list[i].len != cr->range_list[i].comp_len)
> > 			printf("Partial copy for entry %d: requested %llu, completed %llu\n",
> > 								i, cr->range_list[i].len,
> > 								cr->range_list[i].comp_len);
> > 	close(fd);
> > 	free(cr);
> > 	return ret;
> > }
> 
> Nice to have a code example. But please format it correctly.
>

acked

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-27  2:45         ` [dm-devel] " Damien Le Moal
@ 2022-04-27 15:15           ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:15 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 15432 bytes --]

On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and an array of (source, destination and copy length) tuples.
> > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > bio pair with a token as payload and submitted to the device in order.
> > Read request populates token with source specific information which
> > is then passed with write request.
> > This design is courtesy Mikulas Patocka's token based copy
> > 
> > Larger copy will be divided, based on max_copy_sectors,
> > max_copy_range_sector limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >  block/blk.h               |   2 +
> >  include/linux/blk_types.h |  21 ++++
> >  include/linux/blkdev.h    |   2 +
> >  include/uapi/linux/fs.h   |  14 +++
> >  5 files changed, 271 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 09b7e1200c0f..ba9da2d2f429 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_discard);
> >  
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (cio->refcount) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		spin_unlock_irqrestore(&cio->lock, flags);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		spin_lock_irqsave(&cio->lock, flags);
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> > +	}
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	kvfree(cio);
> 
> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
>

acked.

> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +	unsigned long flags;
> > +	bool wake = false;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> 
> long line.

Is it because the line is more than 80 characters? I thought the limit is 100
now, so I went with the longer lines.

> 
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (((--cio->refcount) <= 0) && cio->waiter)
> > +		wake = true;
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	if (wake)
> > +		wake_up_process(cio->waiter);
> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> 
> long line.
> 

Same as above

> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> 
> long line.
>

Same as above

> rlist is an array, but rlist naming implies a list. Why not call that
> argument "ranges" ?
> 
> The argument ordering is also strange. I would make that:
> 
> blk_copy_offload(struct block_device *src_bdev,
> 	         struct block_device *dst_bdev,
> 		 struct range_entry *rlist, int nr_srcs,
> 		 gfp_t gfp_mask)
> 

Yes, that looks better. We will update it in the next version.
One doubt: is this argument ordering based on size?
We ordered the arguments on the logic that nr_srcs gives the number of entries
in rlist (ranges), so the count follows the array it describes.

> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	unsigned long flags;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	cio->rlist = rlist;
> > +	spin_lock_init(&cio->lock);
> > +
> > +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> 
> But max_copy_range_sectors is for one sector only, right ? So what is this
> second min3() doing ? It is mixing up total length and one range length.
> The device should not have reported a per range max length larger than the
> total length in the first place, right ? If it does, that would be a very
> strange device...
> 

Yeah, you are right, that makes sense. We will update it in the next version.
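Something like the below, where each submitted chunk is bounded only by the
per-range limits of the two queues and the total-length limit is checked
separately (a sketch, not the final code):

	max_copy_len = min_t(sector_t, sq->limits.max_copy_range_sectors,
			     dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;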

> > +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		src_blk = rlist[ri].src;
> > +		dst_blk = rlist[ri].dst;
> > +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_ctx;
> > +			}
> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = dst_blk;
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			ret = submit_bio_wait(read_bio);
> > +			bio_put(read_bio);
> > +			if (ret)
> > +				goto err_read_bio;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +
> > +			spin_lock_irqsave(&cio->lock, flags);
> > +			++cio->refcount;
> 
> Shouldn't this be an atomic_t ?
> 

We changed it to a plain integer protected by a single spinlock to avoid a
race between the refcount update and the wakeup of the waiting process in
the completion path. Since we also need to store the process context to make
the copy asynchronous, an atomic_t alone would still leave a race window;
see the earlier discussion:
https://lore.kernel.org/all/20220209102208.GB7698@test-zns/
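
In other words, the completion side has to observe the refcount and the
waiter pointer as a single atomic step, which the per-cio lock gives us.
This just restates the pattern already in the patch:

	spin_lock_irqsave(&cio->lock, flags);
	/* the refcount drop and the waiter check are serialized with the
	 * waiter's registration in cio_await_completion(), so the final
	 * wakeup cannot be missed */
	if (--cio->refcount <= 0 && cio->waiter)
		wake = true;
	spin_unlock_irqrestore(&cio->lock, flags);
	if (wake)
		wake_up_process(cio->waiter);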

> And wrap lines please. Many are too long.
> 
> > +			spin_unlock_irqrestore(&cio->lock, flags);
> > +
> > +			submit_bio(write_bio);
> > +			src_blk += copy_len;
> > +			dst_blk += copy_len;
> > +		}
> > +	}
> > +
> > +	/* Wait for completion of all IO's*/
> > +	return cio_await_completion(cio);
> > +
> > +err_read_bio:
> > +	kfree(ctx);
> > +err_ctx:
> > +	__free_page(token);
> > +err_token:
> > +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> > +
> > +	cio->io_err = ret;
> > +	return cio_await_completion(cio);
> > +}
> > +
> > +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> > +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> > +{
> > +	unsigned int align_mask = max(
> > +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> > +	sector_t len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		if (rlist[i].len)
> > +			len += rlist[i].len;
> > +		else
> > +			return -EINVAL;
> 
> Reverse the if condition and return to avoid the else.
> 

acked
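i.e. (sketch of the reversed check):

		if (!rlist[i].len)
			return -EINVAL;
		len += rlist[i].len;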

> > +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> > +				(rlist[i].len & align_mask))
> > +			return -EINVAL;
> > +		rlist[i].comp_len = 0;
> > +	}
> > +
> > +	if (len && len >= MAX_COPY_TOTAL_LENGTH)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> > +		struct request_queue *dest_q)
> > +{
> > +	if (blk_queue_copy(dest_q) && blk_queue_copy(src_q))
> > +		return true;
> > +
> > +	return false;
> 
> return blk_queue_copy(dest_q) && blk_queue_copy(src_q);
> 
> would be simpler.
> 

acked

> > +}
> > +
> > +/*
> > + * blkdev_issue_copy - queue a copy
> > + * @src_bdev:	source block device
> > + * @nr_srcs:	number of source ranges to copy
> > + * @rlist:	array of source/dest/len
> > + * @dest_bdev:	destination block device
> > + * @gfp_mask:   memory allocation flags (for bio_alloc)
> > + *
> > + * Description:
> > + *	Copy source ranges from source block device to destination block device.
> > + *	length of a source range cannot be zero.
> > + */
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> > +		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
> 
> same comment as above about args order and naming.
> 

acked
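So the exported prototype would become something like (sketch, following the
ordering and naming suggested above):

	int blkdev_issue_copy(struct block_device *src_bdev,
			      struct block_device *dst_bdev,
			      struct range_entry *ranges, int nr,
			      gfp_t gfp_mask);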

> > +{
> > +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> > +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> > +	int ret = -EINVAL;
> > +
> > +	if (!src_q || !dest_q)
> > +		return -ENXIO;
> > +
> > +	if (!nr)
> > +		return -EINVAL;
> > +
> > +	if (nr >= MAX_COPY_NR_RANGE)
> > +		return -EINVAL;
> 
> Where do you check the number of ranges against what the device can do ?
>

The present implementation submits only one range at a time. This was done
to keep the copy offload infrastructure generic, so that other copy
implementations such as SCSI XCOPY can use it as well. The downside, at
present, is that NVMe copy offload is not optimal.

> > +
> > +	if (bdev_read_only(dest_bdev))
> > +		return -EPERM;
> > +
> > +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> > +	if (ret)
> > +		return ret;
> 
> nr check should be in this function...
> 
> > +
> > +	if (blk_check_copy_offload(src_q, dest_q))
> 
> ...which should be only one function with this one.
> 

Sure, we can combine blk_copy_sanity_check and blk_check_copy_offload.
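Roughly along these lines (a sketch of the combined helper, not the final
code):

	static inline int blk_copy_sanity_check(struct block_device *src_bdev,
			struct block_device *dst_bdev,
			struct range_entry *ranges, int nr)
	{
		struct request_queue *src_q = bdev_get_queue(src_bdev);
		struct request_queue *dst_q = bdev_get_queue(dst_bdev);

		if (!blk_queue_copy(src_q) || !blk_queue_copy(dst_q))
			return -EOPNOTSUPP;
		if (!nr || nr >= MAX_COPY_NR_RANGE)
			return -EINVAL;
		/* per-range alignment and length checks as in this patch */
		return 0;
	}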

> > +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(blkdev_issue_copy);
> > +
> >  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
> >  		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> >  		struct bio **biop, unsigned flags)
> > diff --git a/block/blk.h b/block/blk.h
> > index 434017701403..6010eda58c70 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -291,6 +291,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
> >  		break;
> >  	}
> >  
> > +	if (unlikely(op_is_copy(bio->bi_opf)))
> > +		return false;
> >  	/*
> >  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
> >  	 * This is a quick and dirty check that relies on the fact that
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index c62274466e72..f5b01f284c43 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -418,6 +418,7 @@ enum req_flag_bits {
> >  	/* for driver use */
> >  	__REQ_DRV,
> >  	__REQ_SWAP,		/* swapping request. */
> > +	__REQ_COPY,		/* copy request */
> >  	__REQ_NR_BITS,		/* stops here */
> >  };
> >  
> > @@ -443,6 +444,7 @@ enum req_flag_bits {
> >  
> >  #define REQ_DRV			(1ULL << __REQ_DRV)
> >  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> > +#define REQ_COPY		(1ULL << __REQ_COPY)
> >  
> >  #define REQ_FAILFAST_MASK \
> >  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> > @@ -459,6 +461,11 @@ enum stat_group {
> >  	NR_STAT_GROUPS
> >  };
> >  
> > +static inline bool op_is_copy(unsigned int op)
> > +{
> > +	return (op & REQ_COPY);
> > +}
> > +
> >  #define bio_op(bio) \
> >  	((bio)->bi_opf & REQ_OP_MASK)
> >  
> > @@ -533,4 +540,18 @@ struct blk_rq_stat {
> >  	u64 batch;
> >  };
> >  
> > +struct cio {
> > +	struct range_entry *rlist;
> 
> naming... This is an array, right ?
>

acked, will update in next version.

> > +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> > +	spinlock_t lock;		/* protects refcount and waiter */
> > +	int refcount;
> > +	blk_status_t io_err;
> > +};
> > +
> > +struct copy_ctx {
> > +	int range_idx;
> > +	sector_t start_sec;
> > +	struct cio *cio;
> > +};
> > +
> >  #endif /* __LINUX_BLK_TYPES_H */
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 3596fd37fae7..c6cb3fe82ba2 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
> >  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
> >  		sector_t nr_sects, gfp_t gfp);
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
> >  
> >  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index bdf7b404b3e7..822c28cebf3a 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -64,6 +64,20 @@ struct fstrim_range {
> >  	__u64 minlen;
> >  };
> >  
> > +/* Maximum no of entries supported */
> > +#define MAX_COPY_NR_RANGE	(1 << 12)
> 
> This value should be used also when setting the limits in the previous
> patch. max_copy_nr_ranges and max_hw_copy_nr_ranges must be bounded by it.
> 

acked.
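Along these lines (sketch, with the limit names used in this series):

	lim->max_copy_nr_ranges = min_t(unsigned short,
					lim->max_hw_copy_nr_ranges,
					MAX_COPY_NR_RANGE);
	lim->max_copy_sectors = min_t(sector_t, lim->max_copy_sectors,
				      MAX_COPY_TOTAL_LENGTH >> SECTOR_SHIFT);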

> > +
> > +/* maximum total copy length */
> > +#define MAX_COPY_TOTAL_LENGTH	(1 << 27)
> 
> Same for this one. And where does this magic number come from ?
>

We used this as the maximum size for local testing, so as not to tie up
resources in the emulation case. Feel free to suggest better values if you
have anything in mind.

> > +
> > +/* Source range entry for copy */
> > +struct range_entry {
> > +	__u64 src;
> > +	__u64 dst;
> > +	__u64 len;
> > +	__u64 comp_len;
> 
> Please describe the fields of this structure. The meaning of them is
> really not clear from the names.
> 

acked
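Something like the following wording (ours, to be refined):

	/* Source range entry for copy */
	struct range_entry {
		__u64 src;	/* source offset to copy from, in bytes */
		__u64 dst;	/* destination offset to copy to, in bytes */
		__u64 len;	/* number of bytes to copy */
		__u64 comp_len;	/* set by the kernel: bytes actually copied */
	};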

> > +};
> > +
> >  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
> >  #define FILE_DEDUPE_RANGE_SAME		0
> >  #define FILE_DEDUPE_RANGE_DIFFERS	1
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
> 




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27 15:15           ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:15 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 15432 bytes --]

On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and an array of (source, destination and copy length) tuples.
> > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > bio pair with a token as payload and submitted to the device in order.
> > Read request populates token with source specific information which
> > is then passed with write request.
> > This design is courtesy Mikulas Patocka's token based copy
> > 
> > Larger copy will be divided, based on max_copy_sectors,
> > max_copy_range_sector limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >  block/blk.h               |   2 +
> >  include/linux/blk_types.h |  21 ++++
> >  include/linux/blkdev.h    |   2 +
> >  include/uapi/linux/fs.h   |  14 +++
> >  5 files changed, 271 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 09b7e1200c0f..ba9da2d2f429 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_discard);
> >  
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (cio->refcount) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		spin_unlock_irqrestore(&cio->lock, flags);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		spin_lock_irqsave(&cio->lock, flags);
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> > +	}
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	kvfree(cio);
> 
> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
>

acked.

> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +	unsigned long flags;
> > +	bool wake = false;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> 
> long line.

Is this because the lines are longer than 80 characters? I thought the
limit is 100 columns now, so I went with the longer lines.

> 
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (((--cio->refcount) <= 0) && cio->waiter)
> > +		wake = true;
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	if (wake)
> > +		wake_up_process(cio->waiter);
> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> 
> long line.
> 

Same as above

> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> 
> long line.
>

Same as above

> rlist is an array, but rlist naming implies a list. Why not call that
> argument "ranges" ?
> 
> The argument ordering is also strange. I would make that:
> 
> blk_copy_offload(struct block_device *src_bdev,
> 	         struct block_device *dst_bdev,
> 		 struct range_entry *rlist, int nr_srcs,
> 		 gfp_t gfp_mask)
> 

Yes, that looks better. We will update it in the next version.
One doubt: is this argument ordering based on size? We ordered the arguments
with the logic that nr_srcs gives the number of entries in rlist (ranges),
so the two were kept next to each other.

> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	unsigned long flags;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	cio->rlist = rlist;
> > +	spin_lock_init(&cio->lock);
> > +
> > +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> 
> But max_copy_range_sectors is for one sector only, right ? So what is this
> second min3() doing ? It is mixing up total length and one range length.
> The device should not have reported a per range max length larger than the
> total length in the first place, right ? If it does, that would be a very
> strange device...
> 

Yeah, you are right, that makes sense. We will update it in the next version.

> > +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		src_blk = rlist[ri].src;
> > +		dst_blk = rlist[ri].dst;
> > +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_ctx;
> > +			}
> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = dst_blk;
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			ret = submit_bio_wait(read_bio);
> > +			bio_put(read_bio);
> > +			if (ret)
> > +				goto err_read_bio;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +
> > +			spin_lock_irqsave(&cio->lock, flags);
> > +			++cio->refcount;
> 
> Shouldn't this be an atomic_t ?
> 

We changed it to a plain integer protected by a single spinlock to avoid a
race between the refcount update and the wakeup of the waiting process in
the completion path. Since we also need to store the process context to make
the copy asynchronous, an atomic_t alone would still leave a race window;
see the earlier discussion:
https://lore.kernel.org/all/20220209102208.GB7698@test-zns/

> And wrap lines please. Many are too long.
> 
> > +			spin_unlock_irqrestore(&cio->lock, flags);
> > +
> > +			submit_bio(write_bio);
> > +			src_blk += copy_len;
> > +			dst_blk += copy_len;
> > +		}
> > +	}
> > +
> > +	/* Wait for completion of all IO's*/
> > +	return cio_await_completion(cio);
> > +
> > +err_read_bio:
> > +	kfree(ctx);
> > +err_ctx:
> > +	__free_page(token);
> > +err_token:
> > +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> > +
> > +	cio->io_err = ret;
> > +	return cio_await_completion(cio);
> > +}
> > +
> > +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> > +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> > +{
> > +	unsigned int align_mask = max(
> > +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> > +	sector_t len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		if (rlist[i].len)
> > +			len += rlist[i].len;
> > +		else
> > +			return -EINVAL;
> 
> Reverse the if condition and return to avoid the else.
> 

acked

> > +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> > +				(rlist[i].len & align_mask))
> > +			return -EINVAL;
> > +		rlist[i].comp_len = 0;
> > +	}
> > +
> > +	if (len && len >= MAX_COPY_TOTAL_LENGTH)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> > +		struct request_queue *dest_q)
> > +{
> > +	if (blk_queue_copy(dest_q) && blk_queue_copy(src_q))
> > +		return true;
> > +
> > +	return false;
> 
> return blk_queue_copy(dest_q) && blk_queue_copy(src_q);
> 
> would be simpler.
> 

acked

> > +}
> > +
> > +/*
> > + * blkdev_issue_copy - queue a copy
> > + * @src_bdev:	source block device
> > + * @nr_srcs:	number of source ranges to copy
> > + * @rlist:	array of source/dest/len
> > + * @dest_bdev:	destination block device
> > + * @gfp_mask:   memory allocation flags (for bio_alloc)
> > + *
> > + * Description:
> > + *	Copy source ranges from source block device to destination block device.
> > + *	length of a source range cannot be zero.
> > + */
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> > +		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
> 
> same comment as above about args order and naming.
> 

acked

> > +{
> > +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> > +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> > +	int ret = -EINVAL;
> > +
> > +	if (!src_q || !dest_q)
> > +		return -ENXIO;
> > +
> > +	if (!nr)
> > +		return -EINVAL;
> > +
> > +	if (nr >= MAX_COPY_NR_RANGE)
> > +		return -EINVAL;
> 
> Where do you check the number of ranges against what the device can do ?
>

The present implementation submits only one range at a time. This was done
to keep the copy offload infrastructure generic, so that other copy
implementations such as SCSI XCOPY can use it as well. The downside, at
present, is that NVMe copy offload is not optimal.

> > +
> > +	if (bdev_read_only(dest_bdev))
> > +		return -EPERM;
> > +
> > +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> > +	if (ret)
> > +		return ret;
> 
> nr check should be in this function...
> 
> > +
> > +	if (blk_check_copy_offload(src_q, dest_q))
> 
> ...which should be only one function with this one.
> 

Sure, we can combine blk_copy_sanity_check and blk_check_copy_offload.

> > +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(blkdev_issue_copy);
> > +
> >  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
> >  		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> >  		struct bio **biop, unsigned flags)
> > diff --git a/block/blk.h b/block/blk.h
> > index 434017701403..6010eda58c70 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -291,6 +291,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
> >  		break;
> >  	}
> >  
> > +	if (unlikely(op_is_copy(bio->bi_opf)))
> > +		return false;
> >  	/*
> >  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
> >  	 * This is a quick and dirty check that relies on the fact that
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index c62274466e72..f5b01f284c43 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -418,6 +418,7 @@ enum req_flag_bits {
> >  	/* for driver use */
> >  	__REQ_DRV,
> >  	__REQ_SWAP,		/* swapping request. */
> > +	__REQ_COPY,		/* copy request */
> >  	__REQ_NR_BITS,		/* stops here */
> >  };
> >  
> > @@ -443,6 +444,7 @@ enum req_flag_bits {
> >  
> >  #define REQ_DRV			(1ULL << __REQ_DRV)
> >  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> > +#define REQ_COPY		(1ULL << __REQ_COPY)
> >  
> >  #define REQ_FAILFAST_MASK \
> >  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> > @@ -459,6 +461,11 @@ enum stat_group {
> >  	NR_STAT_GROUPS
> >  };
> >  
> > +static inline bool op_is_copy(unsigned int op)
> > +{
> > +	return (op & REQ_COPY);
> > +}
> > +
> >  #define bio_op(bio) \
> >  	((bio)->bi_opf & REQ_OP_MASK)
> >  
> > @@ -533,4 +540,18 @@ struct blk_rq_stat {
> >  	u64 batch;
> >  };
> >  
> > +struct cio {
> > +	struct range_entry *rlist;
> 
> naming... This is an array, right ?
>

acked, will update in next version.

> > +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> > +	spinlock_t lock;		/* protects refcount and waiter */
> > +	int refcount;
> > +	blk_status_t io_err;
> > +};
> > +
> > +struct copy_ctx {
> > +	int range_idx;
> > +	sector_t start_sec;
> > +	struct cio *cio;
> > +};
> > +
> >  #endif /* __LINUX_BLK_TYPES_H */
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 3596fd37fae7..c6cb3fe82ba2 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -1121,6 +1121,8 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
> >  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
> >  		sector_t nr_sects, gfp_t gfp);
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *src_rlist, struct block_device *dest_bdev, gfp_t gfp_mask);
> >  
> >  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index bdf7b404b3e7..822c28cebf3a 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -64,6 +64,20 @@ struct fstrim_range {
> >  	__u64 minlen;
> >  };
> >  
> > +/* Maximum no of entries supported */
> > +#define MAX_COPY_NR_RANGE	(1 << 12)
> 
> This value should be used also when setting the limits in the previous
> patch. max_copy_nr_ranges and max_hw_copy_nr_ranges must be bounded by it.
> 

acked.

> > +
> > +/* maximum total copy length */
> > +#define MAX_COPY_TOTAL_LENGTH	(1 << 27)
> 
> Same for this one. And where does this magic number come from ?
>

We used this as the maximum size for local testing, so as not to tie up
resources in the emulation case. Feel free to suggest better values if you
have anything in mind.

> > +
> > +/* Source range entry for copy */
> > +struct range_entry {
> > +	__u64 src;
> > +	__u64 dst;
> > +	__u64 len;
> > +	__u64 comp_len;
> 
> Please describe the fields of this structure. The meaning of them is
> really not clear from the names.
> 

acked

> > +};
> > +
> >  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
> >  #define FILE_DEDUPE_RANGE_SAME		0
> >  #define FILE_DEDUPE_RANGE_DIFFERS	1
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
> 





^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
  2022-04-27  1:59         ` [dm-devel] " Damien Le Moal
@ 2022-04-27 15:30           ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:30 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 12544 bytes --]

On Wed, Apr 27, 2022 at 10:59:01AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Add device limits as sysfs entries,
> >         - copy_offload (RW)
> >         - copy_max_bytes (RW)
> >         - copy_max_hw_bytes (RO)
> >         - copy_max_range_bytes (RW)
> >         - copy_max_range_hw_bytes (RO)
> >         - copy_max_nr_ranges (RW)
> >         - copy_max_nr_ranges_hw (RO)
> > 
> > Above limits help to split the copy payload in block layer.
> > copy_offload, used for setting copy offload(1) or emulation(0).
> > copy_max_bytes: maximum total length of copy in single payload.
> > copy_max_range_bytes: maximum length in a single entry.
> > copy_max_nr_ranges: maximum number of entries in a payload.
> > copy_max_*_hw_*: Reflects the device supported maximum limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
> >  block/blk-settings.c                 |  59 ++++++++++++
> >  block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
> >  include/linux/blkdev.h               |  13 +++
> >  4 files changed, 293 insertions(+)
> > 
> > diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> > index e8797cd09aff..65e64b5a0105 100644
> > --- a/Documentation/ABI/stable/sysfs-block
> > +++ b/Documentation/ABI/stable/sysfs-block
> > @@ -155,6 +155,89 @@ Description:
> >  		last zone of the device which may be smaller.
> >  
> >  
> > +What:		/sys/block/<disk>/queue/copy_offload
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] When read, this file shows whether offloading copy to
> > +		device is enabled (1) or disabled (0). Writing '0' to this
> > +		file will disable offloading copies for this device.
> > +		Writing any '1' value will enable this feature.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
> > +		device, 'copy_max_bytes' setting is the software limit.
> > +		Setting this value lower will make Linux issue smaller size
> > +		copies.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_hw_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the number of bytes that can be offloaded
> > +		in a single operation. The `copy_max_hw_bytes`
> > +		parameter is set by the device driver to the maximum number of
> > +		bytes that can be copied in a single operation. Copy
> > +		requests issued to the device must not exceed this limit.
> > +		A value of 0 means that the device does not
> > +		support copy offload.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_nr_ranges
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
> > +		device, 'copy_max_nr_ranges' setting is the software limit.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the number of ranges in single copy operation
> > +		that can be offloaded in a single operation.
> > +		A range is tuple of source, destination and length of data
> > +		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
> > +		the device driver to the maximum number of ranges that can be
> > +		copied in a single operation. Copy requests issued to the device
> > +		must not exceed this limit. A value of 0 means that the device
> > +		does not support copy offload.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_range_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
> > +		the device, 'copy_max_range_bytes' setting is the software
> > +		limit.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the size of data, that can be copied in a
> > +		single range within a single copy operation.
> > +		A range is tuple of source, destination and length of data to be
> > +		copied. The `copy_max_range_hw_bytes` parameter is set by the
> > +		device driver to set the maximum length in bytes of a range
> > +		that can be copied in an operation.
> > +		Copy requests issued to the device must not exceed this limit.
> > +		Sum of sizes of all ranges in a single opeartion should not
> > +		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
> > +		does not support copy offload.
> > +
> > +
> >  What:		/sys/block/<disk>/queue/crypto/
> >  Date:		February 2022
> >  Contact:	linux-block@vger.kernel.org
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 6ccceb421ed2..70167aee3bf7 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
> >  	lim->misaligned = 0;
> >  	lim->zoned = BLK_ZONED_NONE;
> >  	lim->zone_write_granularity = 0;
> > +	lim->max_hw_copy_sectors = 0;
> 
> For readability, I would keep "hw" next to sectors/nr_ranges:
> 
> max_copy_hw_sectors
> max_copy_sectors
> max_copy_hw_nr_ranges
> max_copy_nr_ranges
> max_copy_range_hw_sectors
> max_copy_range_sectors
>

acked

> > +	lim->max_copy_sectors = 0;
> > +	lim->max_hw_copy_nr_ranges = 0;
> > +	lim->max_copy_nr_ranges = 0;
> > +	lim->max_hw_copy_range_sectors = 0;
> > +	lim->max_copy_range_sectors = 0;
> >  }
> >  EXPORT_SYMBOL(blk_set_default_limits);
> >  
> > @@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> >  	lim->max_dev_sectors = UINT_MAX;
> >  	lim->max_write_zeroes_sectors = UINT_MAX;
> >  	lim->max_zone_append_sectors = UINT_MAX;
> > +	lim->max_hw_copy_sectors = ULONG_MAX;
> > +	lim->max_copy_sectors = ULONG_MAX;
> > +	lim->max_hw_copy_range_sectors = UINT_MAX;
> > +	lim->max_copy_range_sectors = UINT_MAX;
> > +	lim->max_hw_copy_nr_ranges = USHRT_MAX;
> > +	lim->max_copy_nr_ranges = USHRT_MAX;
> >  }
> >  EXPORT_SYMBOL(blk_set_stacking_limits);
> >  
> > @@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
> >  }
> >  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
> >  
> > +/**
> > + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_sectors: maximum number of sectors to copy
> > + **/
> > +void blk_queue_max_copy_sectors(struct request_queue *q,
> 
> This should be blk_queue_max_copy_hw_sectors().
>

Acked. The reasoning being that this function is used only by the driver,
and only once, to set the hw limits?
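If so, the driver-facing setter would become something like (sketch of the
rename; it programs the hw limit and seeds the soft limit from it):

	void blk_queue_max_copy_hw_sectors(struct request_queue *q,
					   unsigned int max_copy_sectors)
	{
		q->limits.max_copy_hw_sectors = max_copy_sectors;
		q->limits.max_copy_sectors = max_copy_sectors;
	}
	EXPORT_SYMBOL_GPL(blk_queue_max_copy_hw_sectors);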

> > +		unsigned int max_copy_sectors)
> > +{
> > +	q->limits.max_hw_copy_sectors = max_copy_sectors;
> > +	q->limits.max_copy_sectors = max_copy_sectors;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> > +
> > +/**
> > + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_range_sectors: maximum number of sectors to copy in a single range
> > + **/
> > +void blk_queue_max_copy_range_sectors(struct request_queue *q,
> 
> And this should be blk_queue_max_copy_range_hw_sectors(). Etc for the
> other ones below.
> 

acked

> > +		unsigned int max_copy_range_sectors)
> > +{
> > +	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
> > +	q->limits.max_copy_range_sectors = max_copy_range_sectors;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
> > +
> > +/**
> > + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_nr_ranges: maximum number of ranges
> > + **/
> > +void blk_queue_max_copy_nr_ranges(struct request_queue *q,
> > +		unsigned int max_copy_nr_ranges)
> > +{
> > +	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
> > +	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
> > +
> >  /**
> >   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
> >   * @q:  the request queue for the device
> > @@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  	t->max_segment_size = min_not_zero(t->max_segment_size,
> >  					   b->max_segment_size);
> >  
> > +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
> > +	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
> > +	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
> > +	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
> > +						b->max_hw_copy_range_sectors);
> > +	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
> > +	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
> > +
> >  	t->misaligned |= b->misaligned;
> >  
> >  	alignment = queue_limit_alignment_offset(b, start);
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 88bd41d4cb59..bae987c10f7f 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
> >  	return queue_var_show(0, page);
> >  }
> >  
> > +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> > +{
> > +	return queue_var_show(blk_queue_copy(q), page);
> > +}
> > +
> > +static ssize_t queue_copy_offload_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long copy_offload;
> > +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (copy_offload && !q->limits.max_hw_copy_sectors)
> > +		return -EINVAL;
> > +
> > +	if (copy_offload)
> > +		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
> > +	else
> > +		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_copy_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_max_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long max_copy;
> > +	ssize_t ret = queue_var_store(&max_copy, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (max_copy & (queue_logical_block_size(q) - 1))
> > +		return -EINVAL;
> > +
> > +	max_copy >>= 9;
> > +	if (max_copy > q->limits.max_hw_copy_sectors)
> > +		max_copy = q->limits.max_hw_copy_sectors;
> > +
> > +	q->limits.max_copy_sectors = max_copy;
> > +	return ret;
> > +}
> > +
> > +static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_range_max_show(struct request_queue *q,
> > +		char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_copy_range_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_range_max_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long max_copy;
> > +	ssize_t ret = queue_var_store(&max_copy, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (max_copy & (queue_logical_block_size(q) - 1))
> > +		return -EINVAL;
> > +
> > +	max_copy >>= 9;
> > +	if (max_copy > UINT_MAX)
> 
> On 32-bits arch, unsigned long and unsigned int are the same so this test
> is useless for these arch. Better have max_copy declared as unsigned long
> long.
>

acked
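For example, the store side could be reworked like this (a sketch;
kstrtoull() is used instead of queue_var_store() so the parsed value is
64-bit on 32-bit architectures as well):

	static ssize_t queue_copy_range_max_store(struct request_queue *q,
						  const char *page, size_t count)
	{
		unsigned long long max_copy;
		int ret = kstrtoull(page, 10, &max_copy);

		if (ret)
			return ret;

		if (max_copy & (queue_logical_block_size(q) - 1))
			return -EINVAL;

		max_copy >>= SECTOR_SHIFT;
		/* clamp to the hw limit; this also keeps it within UINT_MAX */
		if (max_copy > q->limits.max_hw_copy_range_sectors)
			max_copy = q->limits.max_hw_copy_range_sectors;

		q->limits.max_copy_range_sectors = max_copy;
		return count;
	}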

--
Nitesh Shetty




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-27 15:30           ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:30 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-kernel, linux-scsi, nitheshshetty, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 12544 bytes --]

On Wed, Apr 27, 2022 at 10:59:01AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > Add device limits as sysfs entries,
> >         - copy_offload (RW)
> >         - copy_max_bytes (RW)
> >         - copy_max_hw_bytes (RO)
> >         - copy_max_range_bytes (RW)
> >         - copy_max_range_hw_bytes (RO)
> >         - copy_max_nr_ranges (RW)
> >         - copy_max_nr_ranges_hw (RO)
> > 
> > Above limits help to split the copy payload in block layer.
> > copy_offload, used for setting copy offload(1) or emulation(0).
> > copy_max_bytes: maximum total length of copy in single payload.
> > copy_max_range_bytes: maximum length in a single entry.
> > copy_max_nr_ranges: maximum number of entries in a payload.
> > copy_max_*_hw_*: Reflects the device supported maximum limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  Documentation/ABI/stable/sysfs-block |  83 ++++++++++++++++
> >  block/blk-settings.c                 |  59 ++++++++++++
> >  block/blk-sysfs.c                    | 138 +++++++++++++++++++++++++++
> >  include/linux/blkdev.h               |  13 +++
> >  4 files changed, 293 insertions(+)
> > 
> > diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> > index e8797cd09aff..65e64b5a0105 100644
> > --- a/Documentation/ABI/stable/sysfs-block
> > +++ b/Documentation/ABI/stable/sysfs-block
> > @@ -155,6 +155,89 @@ Description:
> >  		last zone of the device which may be smaller.
> >  
> >  
> > +What:		/sys/block/<disk>/queue/copy_offload
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] When read, this file shows whether offloading copy to
> > +		device is enabled (1) or disabled (0). Writing '0' to this
> > +		file will disable offloading copies for this device.
> > +		Writing any '1' value will enable this feature.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_hw_bytes' is the hardware limit for the
> > +		device, 'copy_max_bytes' setting is the software limit.
> > +		Setting this value lower will make Linux issue smaller size
> > +		copies.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_hw_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the number of bytes that can be offloaded
> > +		in a single operation. The `copy_max_hw_bytes`
> > +		parameter is set by the device driver to the maximum number of
> > +		bytes that can be copied in a single operation. Copy
> > +		requests issued to the device must not exceed this limit.
> > +		A value of 0 means that the device does not
> > +		support copy offload.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_nr_ranges
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_nr_ranges_hw' is the hardware limit for the
> > +		device, 'copy_max_nr_ranges' setting is the software limit.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_nr_ranges_hw
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the number of ranges in single copy operation
> > +		that can be offloaded in a single operation.
> > +		A range is tuple of source, destination and length of data
> > +		to be copied. The `copy_max_nr_ranges_hw` parameter is set by
> > +		the device driver to the maximum number of ranges that can be
> > +		copied in a single operation. Copy requests issued to the device
> > +		must not exceed this limit. A value of 0 means that the device
> > +		does not support copy offload.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_range_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RW] While 'copy_max_range_hw_bytes' is the hardware limit for
> > +		the device, 'copy_max_range_bytes' setting is the software
> > +		limit.
> > +
> > +
> > +What:		/sys/block/<disk>/queue/copy_max_range_hw_bytes
> > +Date:		April 2022
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Devices that support offloading copy functionality may have
> > +		internal limits on the size of data, that can be copied in a
> > +		single range within a single copy operation.
> > +		A range is tuple of source, destination and length of data to be
> > +		copied. The `copy_max_range_hw_bytes` parameter is set by the
> > +		device driver to set the maximum length in bytes of a range
> > +		that can be copied in an operation.
> > +		Copy requests issued to the device must not exceed this limit.
> > +		Sum of sizes of all ranges in a single opeartion should not
> > +		exceed 'copy_max_hw_bytes'. A value of 0 means that the device
> > +		does not support copy offload.
> > +
> > +
> >  What:		/sys/block/<disk>/queue/crypto/
> >  Date:		February 2022
> >  Contact:	linux-block@vger.kernel.org
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 6ccceb421ed2..70167aee3bf7 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -57,6 +57,12 @@ void blk_set_default_limits(struct queue_limits *lim)
> >  	lim->misaligned = 0;
> >  	lim->zoned = BLK_ZONED_NONE;
> >  	lim->zone_write_granularity = 0;
> > +	lim->max_hw_copy_sectors = 0;
> 
> For readability, I would keep "hw" next to sectors/nr_ranges:
> 
> max_copy_hw_sectors
> max_copy_sectors
> max_copy_hw_nr_ranges
> max_copy_nr_ranges
> max_copy_range_hw_sectors
> max_copy_range_sectors
>

acked

> > +	lim->max_copy_sectors = 0;
> > +	lim->max_hw_copy_nr_ranges = 0;
> > +	lim->max_copy_nr_ranges = 0;
> > +	lim->max_hw_copy_range_sectors = 0;
> > +	lim->max_copy_range_sectors = 0;
> >  }
> >  EXPORT_SYMBOL(blk_set_default_limits);
> >  
> > @@ -81,6 +87,12 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> >  	lim->max_dev_sectors = UINT_MAX;
> >  	lim->max_write_zeroes_sectors = UINT_MAX;
> >  	lim->max_zone_append_sectors = UINT_MAX;
> > +	lim->max_hw_copy_sectors = ULONG_MAX;
> > +	lim->max_copy_sectors = ULONG_MAX;
> > +	lim->max_hw_copy_range_sectors = UINT_MAX;
> > +	lim->max_copy_range_sectors = UINT_MAX;
> > +	lim->max_hw_copy_nr_ranges = USHRT_MAX;
> > +	lim->max_copy_nr_ranges = USHRT_MAX;
> >  }
> >  EXPORT_SYMBOL(blk_set_stacking_limits);
> >  
> > @@ -177,6 +189,45 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
> >  }
> >  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
> >  
> > +/**
> > + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_sectors: maximum number of sectors to copy
> > + **/
> > +void blk_queue_max_copy_sectors(struct request_queue *q,
> 
> This should be blk_queue_max_copy_hw_sectors().
>

Acked. The reasoning being that this function is used only by the driver,
and only once, to set the hw limits?

> > +		unsigned int max_copy_sectors)
> > +{
> > +	q->limits.max_hw_copy_sectors = max_copy_sectors;
> > +	q->limits.max_copy_sectors = max_copy_sectors;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> > +
> > +/**
> > + * blk_queue_max_copy_range_sectors - set max sectors for a single range, in a copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_range_sectors: maximum number of sectors to copy in a single range
> > + **/
> > +void blk_queue_max_copy_range_sectors(struct request_queue *q,
> 
> And this should be blk_queue_max_copy_range_hw_sectors(). Etc for the
> other ones below.
> 

acked

> > +		unsigned int max_copy_range_sectors)
> > +{
> > +	q->limits.max_hw_copy_range_sectors = max_copy_range_sectors;
> > +	q->limits.max_copy_range_sectors = max_copy_range_sectors;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_range_sectors);
> > +
> > +/**
> > + * blk_queue_max_copy_nr_ranges - set max number of ranges, in a copy payload
> > + * @q:  the request queue for the device
> > + * @max_copy_nr_ranges: maximum number of ranges
> > + **/
> > +void blk_queue_max_copy_nr_ranges(struct request_queue *q,
> > +		unsigned int max_copy_nr_ranges)
> > +{
> > +	q->limits.max_hw_copy_nr_ranges = max_copy_nr_ranges;
> > +	q->limits.max_copy_nr_ranges = max_copy_nr_ranges;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_nr_ranges);
> > +
> >  /**
> >   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
> >   * @q:  the request queue for the device
> > @@ -572,6 +623,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  	t->max_segment_size = min_not_zero(t->max_segment_size,
> >  					   b->max_segment_size);
> >  
> > +	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
> > +	t->max_hw_copy_sectors = min(t->max_hw_copy_sectors, b->max_hw_copy_sectors);
> > +	t->max_copy_range_sectors = min(t->max_copy_range_sectors, b->max_copy_range_sectors);
> > +	t->max_hw_copy_range_sectors = min(t->max_hw_copy_range_sectors,
> > +						b->max_hw_copy_range_sectors);
> > +	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges, b->max_copy_nr_ranges);
> > +	t->max_hw_copy_nr_ranges = min(t->max_hw_copy_nr_ranges, b->max_hw_copy_nr_ranges);
> > +
> >  	t->misaligned |= b->misaligned;
> >  
> >  	alignment = queue_limit_alignment_offset(b, start);
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 88bd41d4cb59..bae987c10f7f 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -212,6 +212,129 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
> >  	return queue_var_show(0, page);
> >  }
> >  
> > +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> > +{
> > +	return queue_var_show(blk_queue_copy(q), page);
> > +}
> > +
> > +static ssize_t queue_copy_offload_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long copy_offload;
> > +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (copy_offload && !q->limits.max_hw_copy_sectors)
> > +		return -EINVAL;
> > +
> > +	if (copy_offload)
> > +		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
> > +	else
> > +		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_hw_copy_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_copy_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_max_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long max_copy;
> > +	ssize_t ret = queue_var_store(&max_copy, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (max_copy & (queue_logical_block_size(q) - 1))
> > +		return -EINVAL;
> > +
> > +	max_copy >>= 9;
> > +	if (max_copy > q->limits.max_hw_copy_sectors)
> > +		max_copy = q->limits.max_hw_copy_sectors;
> > +
> > +	q->limits.max_copy_sectors = max_copy;
> > +	return ret;
> > +}
> > +
> > +static ssize_t queue_copy_range_max_hw_show(struct request_queue *q, char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_hw_copy_range_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_range_max_show(struct request_queue *q,
> > +		char *page)
> > +{
> > +	return sprintf(page, "%llu\n",
> > +		(unsigned long long)q->limits.max_copy_range_sectors << 9);
> > +}
> > +
> > +static ssize_t queue_copy_range_max_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long max_copy;
> > +	ssize_t ret = queue_var_store(&max_copy, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (max_copy & (queue_logical_block_size(q) - 1))
> > +		return -EINVAL;
> > +
> > +	max_copy >>= 9;
> > +	if (max_copy > UINT_MAX)
> 
> On 32-bits arch, unsigned long and unsigned int are the same so this test
> is useless for these arch. Better have max_copy declared as unsigned long
> long.
>

acked

--
Nitesh Shetty





^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27  1:46     ` [dm-devel] " Damien Le Moal
@ 2022-04-27 15:38       ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:38 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]

On Wed, Apr 27, 2022 at 10:46:32AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > The patch series covers the points discussed in November 2021 virtual call
> > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > We have covered the Initial agreed requirements in this patchset.
> > Patchset borrows Mikulas's token based approach for 2 bdev
> > implementation.
> > 
> > Overall series supports –
> > 
> > 1. Driver
> > - NVMe Copy command (single NS), including support in nvme-target (for
> >     block and file backend)
> > 
> > 2. Block layer
> > - Block-generic copy (REQ_COPY flag), with interface accommodating
> >     two block-devs, and multi-source/destination interface
> > - Emulation, when offload is natively absent
> > - dm-linear support (for cases not requiring split)
> > 
> > 3. User-interface
> > - new ioctl
> > - copy_file_range for zonefs
> > 
> > 4. In-kernel user
> > - dm-kcopyd
> > - copy_file_range in zonefs
> > 
> > For zonefs copy_file_range - Seems we cannot levearge fstest here. Limited
> > testing is done at this point using a custom application for unit testing.
> 
> https://github.com/westerndigitalcorporation/zonefs-tools
> 
> ./configure --with-tests
> make
> sudo make install
> 
> Then run tests/zonefs-tests.sh
> 
> Adding test case is simple. Just add script files under tests/scripts
> 
> I just realized that the README file of this project is not documenting
> this. I will update it.
>

Thank you. We will try to use this.
Are there any plans to integrate this test suite with fstests (xfstests)?
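
For reference, a minimal copy_file_range() exercise on zonefs looks roughly
like this (a sketch only, not our actual test code; mount point, file names
and length are illustrative):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* copy 1 MiB between two zone files and report the result */
		int src = open("/mnt/zonefs/seq/0", O_RDONLY);
		int dst = open("/mnt/zonefs/seq/1", O_WRONLY);
		ssize_t ret;

		if (src < 0 || dst < 0)
			return 1;

		ret = copy_file_range(src, NULL, dst, NULL, 1 << 20, 0);
		printf("copy_file_range returned %zd\n", ret);

		close(src);
		close(dst);
		return ret < 0;
	}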

--
Nitesh Shetty




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27 15:38       ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:38 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1779 bytes --]

On Wed, Apr 27, 2022 at 10:46:32AM +0900, Damien Le Moal wrote:
> On 4/26/22 19:12, Nitesh Shetty wrote:
> > The patch series covers the points discussed in November 2021 virtual call
> > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > We have covered the Initial agreed requirements in this patchset.
> > Patchset borrows Mikulas's token based approach for 2 bdev
> > implementation.
> > 
> > Overall series supports –
> > 
> > 1. Driver
> > - NVMe Copy command (single NS), including support in nvme-target (for
> >     block and file backend)
> > 
> > 2. Block layer
> > - Block-generic copy (REQ_COPY flag), with interface accommodating
> >     two block-devs, and multi-source/destination interface
> > - Emulation, when offload is natively absent
> > - dm-linear support (for cases not requiring split)
> > 
> > 3. User-interface
> > - new ioctl
> > - copy_file_range for zonefs
> > 
> > 4. In-kernel user
> > - dm-kcopyd
> > - copy_file_range in zonefs
> > 
> > For zonefs copy_file_range - Seems we cannot levearge fstest here. Limited
> > testing is done at this point using a custom application for unit testing.
> 
> https://github.com/westerndigitalcorporation/zonefs-tools
> 
> ./configure --with-tests
> make
> sudo make install
> 
> Then run tests/zonefs-tests.sh
> 
> Adding test case is simple. Just add script files under tests/scripts
> 
> I just realized that the README file of this project is not documenting
> this. I will update it.
>

Thank you. We will try to use this.
Are there any plans to integrate this test suite with fstests (xfstests)?

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-27 10:29         ` Hannes Reinecke
@ 2022-04-27 15:48           ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:48 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6739 bytes --]

On Wed, Apr 27, 2022 at 12:29:15PM +0200, Hannes Reinecke wrote:
> On 4/26/22 12:12, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and an array of (source, destination and copy length) tuples.
> > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > bio pair with a token as payload and submitted to the device in order.
> > Read request populates token with source specific information which
> > is then passed with write request.
> > This design is courtesy Mikulas Patocka's token based copy
> > 
> > Larger copy will be divided, based on max_copy_sectors,
> > max_copy_range_sector limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >   block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >   block/blk.h               |   2 +
> >   include/linux/blk_types.h |  21 ++++
> >   include/linux/blkdev.h    |   2 +
> >   include/uapi/linux/fs.h   |  14 +++
> >   5 files changed, 271 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 09b7e1200c0f..ba9da2d2f429 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >   }
> >   EXPORT_SYMBOL(blkdev_issue_discard);
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (cio->refcount) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		spin_unlock_irqrestore(&cio->lock, flags);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		spin_lock_irqsave(&cio->lock, flags);
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> > +	}
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	kvfree(cio);
> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +	unsigned long flags;
> > +	bool wake = false;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (((--cio->refcount) <= 0) && cio->waiter)
> > +		wake = true;
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	if (wake)
> > +		wake_up_process(cio->waiter);
> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	unsigned long flags;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	cio->rlist = rlist;
> > +	spin_lock_init(&cio->lock);
> > +
> > +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> > +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		src_blk = rlist[ri].src;
> > +		dst_blk = rlist[ri].dst;
> > +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_ctx;
> > +			}
> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = dst_blk;
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			ret = submit_bio_wait(read_bio);
> > +			bio_put(read_bio);
> > +			if (ret)
> > +				goto err_read_bio;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +
> > +			spin_lock_irqsave(&cio->lock, flags);
> > +			++cio->refcount;
> > +			spin_unlock_irqrestore(&cio->lock, flags);
> > +
> > +			submit_bio(write_bio);
> > +			src_blk += copy_len;
> > +			dst_blk += copy_len;
> > +		}
> > +	}
> > +
> 
> Hmm. I'm not sure if I like the copy loop.
> What I definitely would do is to allocate the write bio before reading data;
> after all, if we can't allocate the write bio reading is pretty much
> pointless.
> 
> But the real issue I have with this is that it's doing synchronous reads,
> thereby limiting the performance.
> 
> Can't you submit the write bio from the end_io function of the read bio?
> That would disentangle things, and we should be getting a better
> performance.
> 

Agreed, that will make the code more efficient.
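
A minimal sketch of that kind of chaining, assuming struct copy_ctx carries a
pointer to the pre-allocated write bio and the read bio's bi_private/bi_end_io
are pointed at the helper below (names are illustrative, not from the posted
patch):

/* Illustrative only: issue the paired write from the read completion. */
static void bio_copy_read_end_io(struct bio *read_bio)
{
        struct copy_ctx *ctx = read_bio->bi_private;

        if (read_bio->bi_status) {
                /* Fail the paired write so bio_copy_end_io() records the error. */
                ctx->write_bio->bi_status = read_bio->bi_status;
                bio_endio(ctx->write_bio);
        } else {
                /* Token page is now populated; issue the paired write. */
                submit_bio(ctx->write_bio);
        }
        bio_put(read_bio);
}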

--
Thank you 
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27 15:48           ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-27 15:48 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 6739 bytes --]

On Wed, Apr 27, 2022 at 12:29:15PM +0200, Hannes Reinecke wrote:
> On 4/26/22 12:12, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and an array of (source, destination and copy length) tuples.
> > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > bio pair with a token as payload and submitted to the device in order.
> > Read request populates token with source specific information which
> > is then passed with write request.
> > This design is courtesy Mikulas Patocka's token based copy
> > 
> > Larger copy will be divided, based on max_copy_sectors,
> > max_copy_range_sector limits.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >   block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >   block/blk.h               |   2 +
> >   include/linux/blk_types.h |  21 ++++
> >   include/linux/blkdev.h    |   2 +
> >   include/uapi/linux/fs.h   |  14 +++
> >   5 files changed, 271 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 09b7e1200c0f..ba9da2d2f429 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >   }
> >   EXPORT_SYMBOL(blkdev_issue_discard);
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (cio->refcount) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		spin_unlock_irqrestore(&cio->lock, flags);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		spin_lock_irqsave(&cio->lock, flags);
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> > +	}
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	kvfree(cio);
> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +	unsigned long flags;
> > +	bool wake = false;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	spin_lock_irqsave(&cio->lock, flags);
> > +	if (((--cio->refcount) <= 0) && cio->waiter)
> > +		wake = true;
> > +	spin_unlock_irqrestore(&cio->lock, flags);
> > +	if (wake)
> > +		wake_up_process(cio->waiter);
> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	unsigned long flags;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	cio->rlist = rlist;
> > +	spin_lock_init(&cio->lock);
> > +
> > +	max_copy_len = min_t(sector_t, sq->limits.max_copy_sectors, dq->limits.max_copy_sectors);
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> > +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		src_blk = rlist[ri].src;
> > +		dst_blk = rlist[ri].dst;
> > +		for (remaining = rlist[ri].len; remaining > 0; remaining -= copy_len) {
> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_ctx;
> > +			}
> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = dst_blk;
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			ret = submit_bio_wait(read_bio);
> > +			bio_put(read_bio);
> > +			if (ret)
> > +				goto err_read_bio;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			/*__bio_add_page increases bi_size by len, so overwrite it with copy len*/
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +
> > +			spin_lock_irqsave(&cio->lock, flags);
> > +			++cio->refcount;
> > +			spin_unlock_irqrestore(&cio->lock, flags);
> > +
> > +			submit_bio(write_bio);
> > +			src_blk += copy_len;
> > +			dst_blk += copy_len;
> > +		}
> > +	}
> > +
> 
> Hmm. I'm not sure if I like the copy loop.
> What I definitely would do is to allocate the write bio before reading data;
> after all, if we can't allocate the write bio reading is pretty much
> pointless.
> 
> But the real issue I have with this is that it's doing synchronous reads,
> thereby limiting the performance.
> 
> Can't you submit the write bio from the end_io function of the read bio?
> That would disentangle things, and we should be getting a better
> performance.
> 

Agreed, that will make the code more efficient.

--
Thank you 
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27 15:38       ` [dm-devel] " Nitesh Shetty
@ 2022-04-27 21:56         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 21:56 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 4/28/22 00:38, Nitesh Shetty wrote:
> On Wed, Apr 27, 2022 at 10:46:32AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>>
>>> 2. Block layer
>>> - Block-generic copy (REQ_COPY flag), with interface accommodating
>>>     two block-devs, and multi-source/destination interface
>>> - Emulation, when offload is natively absent
>>> - dm-linear support (for cases not requiring split)
>>>
>>> 3. User-interface
>>> - new ioctl
>>> - copy_file_range for zonefs
>>>
>>> 4. In-kernel user
>>> - dm-kcopyd
>>> - copy_file_range in zonefs
>>>
>>> For zonefs copy_file_range - Seems we cannot levearge fstest here. Limited
>>> testing is done at this point using a custom application for unit testing.
>>
>> https://github.com/westerndigitalcorporation/zonefs-tools
>>
>> ./configure --with-tests
>> make
>> sudo make install
>>
>> Then run tests/zonefs-tests.sh
>>
>> Adding test case is simple. Just add script files under tests/scripts
>>
>> I just realized that the README file of this project is not documenting
>> this. I will update it.
>>
> 
> Thank you. We will try to use this.
> Any plans to integrate this testsuite with fstests(xfstest) ?

No. It is not a good fit since zonefs cannot pass most of the generic test
cases.

> 
> --
> Nitesh Shetty
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27 21:56         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 21:56 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 00:38, Nitesh Shetty wrote:
> On Wed, Apr 27, 2022 at 10:46:32AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>>
>>> 2. Block layer
>>> - Block-generic copy (REQ_COPY flag), with interface accommodating
>>>     two block-devs, and multi-source/destination interface
>>> - Emulation, when offload is natively absent
>>> - dm-linear support (for cases not requiring split)
>>>
>>> 3. User-interface
>>> - new ioctl
>>> - copy_file_range for zonefs
>>>
>>> 4. In-kernel user
>>> - dm-kcopyd
>>> - copy_file_range in zonefs
>>>
>>> For zonefs copy_file_range - Seems we cannot levearge fstest here. Limited
>>> testing is done at this point using a custom application for unit testing.
>>
>> https://github.com/westerndigitalcorporation/zonefs-tools
>>
>> ./configure --with-tests
>> make
>> sudo make install
>>
>> Then run tests/zonefs-tests.sh
>>
>> Adding test case is simple. Just add script files under tests/scripts
>>
>> I just realized that the README file of this project is not documenting
>> this. I will update it.
>>
> 
> Thank you. We will try to use this.
> Any plans to integrate this testsuite with fstests(xfstest) ?

No. It is not a good fit since zonefs cannot pass most of the generic test
cases.

> 
> --
> Nitesh Shetty
> 
> 


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
  2022-04-27 15:30           ` [dm-devel] " Nitesh Shetty
@ 2022-04-27 21:57             ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 21:57 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 4/28/22 00:30, Nitesh Shetty wrote:
>>> +/**
>>> + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
>>> + * @q:  the request queue for the device
>>> + * @max_copy_sectors: maximum number of sectors to copy
>>> + **/
>>> +void blk_queue_max_copy_sectors(struct request_queue *q,
>>
>> This should be blk_queue_max_copy_hw_sectors().
>>
> 
> acked. Reasoning being, this function is used only by driver once for setting hw
> limits ?

The function name should reflect which limit field it sets.
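
For illustration, the rename could look like the following (a sketch only;
the max_hw_copy_sectors field name in queue_limits is assumed here, not taken
from the posted patch):

/**
 * blk_queue_max_copy_hw_sectors - set max sectors for a single copy payload
 * @q:  the request queue for the device
 * @max_copy_sectors: maximum number of sectors to copy
 **/
void blk_queue_max_copy_hw_sectors(struct request_queue *q,
                                   unsigned int max_copy_sectors)
{
        /* Hardware limit, set once by the driver at probe time. */
        q->limits.max_hw_copy_sectors = max_copy_sectors;
}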

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 01/10] block: Introduce queue limits for copy-offload support
@ 2022-04-27 21:57             ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 21:57 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-kernel, linux-scsi, nitheshshetty, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 00:30, Nitesh Shetty wrote:
>>> +/**
>>> + * blk_queue_max_copy_sectors - set max sectors for a single copy payload
>>> + * @q:  the request queue for the device
>>> + * @max_copy_sectors: maximum number of sectors to copy
>>> + **/
>>> +void blk_queue_max_copy_sectors(struct request_queue *q,
>>
>> This should be blk_queue_max_copy_hw_sectors().
>>
> 
> acked. Reasoning being, this function is used only by driver once for setting hw
> limits ?

The function name should reflect which limit field it sets.

-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-27 15:15           ` [dm-devel] " Nitesh Shetty
@ 2022-04-27 22:04             ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 22:04 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 4/28/22 00:15, Nitesh Shetty wrote:
> On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> Introduce blkdev_issue_copy which supports source and destination bdevs,
>>> and an array of (source, destination and copy length) tuples.
>>> Introduce REQ_COPY copy offload operation flag. Create a read-write
>>> bio pair with a token as payload and submitted to the device in order.
>>> Read request populates token with source specific information which
>>> is then passed with write request.
>>> This design is courtesy Mikulas Patocka's token based copy
>>>
>>> Larger copy will be divided, based on max_copy_sectors,
>>> max_copy_range_sector limits.
>>>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
>>> ---
>>>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>>>  block/blk.h               |   2 +
>>>  include/linux/blk_types.h |  21 ++++
>>>  include/linux/blkdev.h    |   2 +
>>>  include/uapi/linux/fs.h   |  14 +++
>>>  5 files changed, 271 insertions(+)
>>>
>>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>>> index 09b7e1200c0f..ba9da2d2f429 100644
>>> --- a/block/blk-lib.c
>>> +++ b/block/blk-lib.c
>>> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>>>  }
>>>  EXPORT_SYMBOL(blkdev_issue_discard);
>>>  
>>> +/*
>>> + * Wait on and process all in-flight BIOs.  This must only be called once
>>> + * all bios have been issued so that the refcount can only decrease.
>>> + * This just waits for all bios to make it through bio_copy_end_io. IO
>>> + * errors are propagated through cio->io_error.
>>> + */
>>> +static int cio_await_completion(struct cio *cio)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&cio->lock, flags);
>>> +	if (cio->refcount) {
>>> +		cio->waiter = current;
>>> +		__set_current_state(TASK_UNINTERRUPTIBLE);
>>> +		spin_unlock_irqrestore(&cio->lock, flags);
>>> +		blk_io_schedule();
>>> +		/* wake up sets us TASK_RUNNING */
>>> +		spin_lock_irqsave(&cio->lock, flags);
>>> +		cio->waiter = NULL;
>>> +		ret = cio->io_err;
>>> +	}
>>> +	spin_unlock_irqrestore(&cio->lock, flags);
>>> +	kvfree(cio);
>>
>> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
>>
> 
> acked.
> 
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void bio_copy_end_io(struct bio *bio)
>>> +{
>>> +	struct copy_ctx *ctx = bio->bi_private;
>>> +	struct cio *cio = ctx->cio;
>>> +	sector_t clen;
>>> +	int ri = ctx->range_idx;
>>> +	unsigned long flags;
>>> +	bool wake = false;
>>> +
>>> +	if (bio->bi_status) {
>>> +		cio->io_err = bio->bi_status;
>>> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
>>> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
>>
>> long line.
> 
> Is it because line is more than 80 character, I thought limit is 100 now, so
> went with longer lines ?

When it is easy to wrap the lines without losing readability, please do so,
to keep things under 80 characters per line.
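
For example, the flagged assignment wraps cleanly (formatting illustration
only):

		cio->rlist[ri].comp_len = min_t(sector_t, clen,
						cio->rlist[ri].comp_len);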


>>> +{
>>> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
>>> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
>>> +	int ret = -EINVAL;
>>> +
>>> +	if (!src_q || !dest_q)
>>> +		return -ENXIO;
>>> +
>>> +	if (!nr)
>>> +		return -EINVAL;
>>> +
>>> +	if (nr >= MAX_COPY_NR_RANGE)
>>> +		return -EINVAL;
>>
>> Where do you check the number of ranges against what the device can do ?
>>
> 
> The present implementation submits only one range at a time. This was done to 
> make copy offload generic, so that other types of copy implementation such as
> XCOPY should be able to use same infrastructure. Downside at present being
> NVMe copy offload is not optimal.

If you issue one range at a time without checking the number of ranges,
what is the point of the nr ranges queue limit? The user can submit a
copy ioctl request exceeding it. Please either enforce that limit or
remove it entirely.
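
If the limit is kept, a check along these lines in blkdev_issue_copy() would
enforce it (sketch only; the max_copy_nr_ranges field name in queue_limits is
assumed):

	/* Reject requests exceeding what either device advertises. */
	if (nr > min(src_q->limits.max_copy_nr_ranges,
		     dest_q->limits.max_copy_nr_ranges))
		return -EINVAL;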


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-27 22:04             ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 22:04 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 00:15, Nitesh Shetty wrote:
> On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> Introduce blkdev_issue_copy which supports source and destination bdevs,
>>> and an array of (source, destination and copy length) tuples.
>>> Introduce REQ_COPY copy offload operation flag. Create a read-write
>>> bio pair with a token as payload and submitted to the device in order.
>>> Read request populates token with source specific information which
>>> is then passed with write request.
>>> This design is courtesy Mikulas Patocka's token based copy
>>>
>>> Larger copy will be divided, based on max_copy_sectors,
>>> max_copy_range_sector limits.
>>>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
>>> ---
>>>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
>>>  block/blk.h               |   2 +
>>>  include/linux/blk_types.h |  21 ++++
>>>  include/linux/blkdev.h    |   2 +
>>>  include/uapi/linux/fs.h   |  14 +++
>>>  5 files changed, 271 insertions(+)
>>>
>>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>>> index 09b7e1200c0f..ba9da2d2f429 100644
>>> --- a/block/blk-lib.c
>>> +++ b/block/blk-lib.c
>>> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>>>  }
>>>  EXPORT_SYMBOL(blkdev_issue_discard);
>>>  
>>> +/*
>>> + * Wait on and process all in-flight BIOs.  This must only be called once
>>> + * all bios have been issued so that the refcount can only decrease.
>>> + * This just waits for all bios to make it through bio_copy_end_io. IO
>>> + * errors are propagated through cio->io_error.
>>> + */
>>> +static int cio_await_completion(struct cio *cio)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&cio->lock, flags);
>>> +	if (cio->refcount) {
>>> +		cio->waiter = current;
>>> +		__set_current_state(TASK_UNINTERRUPTIBLE);
>>> +		spin_unlock_irqrestore(&cio->lock, flags);
>>> +		blk_io_schedule();
>>> +		/* wake up sets us TASK_RUNNING */
>>> +		spin_lock_irqsave(&cio->lock, flags);
>>> +		cio->waiter = NULL;
>>> +		ret = cio->io_err;
>>> +	}
>>> +	spin_unlock_irqrestore(&cio->lock, flags);
>>> +	kvfree(cio);
>>
>> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
>>
> 
> acked.
> 
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void bio_copy_end_io(struct bio *bio)
>>> +{
>>> +	struct copy_ctx *ctx = bio->bi_private;
>>> +	struct cio *cio = ctx->cio;
>>> +	sector_t clen;
>>> +	int ri = ctx->range_idx;
>>> +	unsigned long flags;
>>> +	bool wake = false;
>>> +
>>> +	if (bio->bi_status) {
>>> +		cio->io_err = bio->bi_status;
>>> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
>>> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
>>
>> long line.
> 
> Is it because line is more than 80 character, I thought limit is 100 now, so
> went with longer lines ?

When it is easy to wrap the lines without losing readability, please do so,
to keep things under 80 characters per line.


>>> +{
>>> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
>>> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
>>> +	int ret = -EINVAL;
>>> +
>>> +	if (!src_q || !dest_q)
>>> +		return -ENXIO;
>>> +
>>> +	if (!nr)
>>> +		return -EINVAL;
>>> +
>>> +	if (nr >= MAX_COPY_NR_RANGE)
>>> +		return -EINVAL;
>>
>> Where do you check the number of ranges against what the device can do ?
>>
> 
> The present implementation submits only one range at a time. This was done to 
> make copy offload generic, so that other types of copy implementation such as
> XCOPY should be able to use same infrastructure. Downside at present being
> NVMe copy offload is not optimal.

If you issue one range at a time without checking the number of ranges,
what is the point of the nr ranges queue limit? The user can submit a
copy ioctl request exceeding it. Please either enforce that limit or
remove it entirely.


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27 12:49       ` [dm-devel] " Nitesh Shetty
@ 2022-04-27 22:05         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 22:05 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 4/27/22 21:49, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>
>> It would also be nice to have copy offload emulation in null_blk for testing.
>>
> 
> We can plan this in next phase of copy support, once this series settles down.

So how can people test your series? Not a lot of drives out there with
copy support.

> 
> --
> Nitesh Shetty
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-27 22:05         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-27 22:05 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/27/22 21:49, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>
>> It would also be nice to have copy offload emulation in null_blk for testing.
>>
> 
> We can plan this in next phase of copy support, once this series settles down.

So how can people test your series? Not a lot of drives out there with
copy support.

> 
> --
> Nitesh Shetty
> 
> 


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27 22:05         ` [dm-devel] " Damien Le Moal
@ 2022-04-28  7:49           ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-28  7:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
> On 4/27/22 21:49, Nitesh Shetty wrote:
> > O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> >> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>> The patch series covers the points discussed in November 2021 virtual call
> >>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> >>> We have covered the Initial agreed requirements in this patchset.
> >>> Patchset borrows Mikulas's token based approach for 2 bdev
> >>> implementation.
> >>>
> >>> Overall series supports –
> >>>
> >>> 1. Driver
> >>> - NVMe Copy command (single NS), including support in nvme-target (for
> >>>     block and file backend)
> >>
> >> It would also be nice to have copy offload emulation in null_blk for testing.
> >>
> > 
> > We can plan this in next phase of copy support, once this series settles down.
> 
> So how can people test your series ? Not a lot of drives out there with
> copy support.
>

Yeah, not many drives at present; QEMU can be used to test NVMe copy.

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-28  7:49           ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-28  7:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1073 bytes --]

On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
> On 4/27/22 21:49, Nitesh Shetty wrote:
> > O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> >> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>> The patch series covers the points discussed in November 2021 virtual call
> >>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> >>> We have covered the Initial agreed requirements in this patchset.
> >>> Patchset borrows Mikulas's token based approach for 2 bdev
> >>> implementation.
> >>>
> >>> Overall series supports –
> >>>
> >>> 1. Driver
> >>> - NVMe Copy command (single NS), including support in nvme-target (for
> >>>     block and file backend)
> >>
> >> It would also be nice to have copy offload emulation in null_blk for testing.
> >>
> > 
> > We can plan this in next phase of copy support, once this series settles down.
> 
> So how can people test your series ? Not a lot of drives out there with
> copy support.
>

Yeah, not many drives at present; QEMU can be used to test NVMe copy.

--
Nitesh Shetty

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 02/10] block: Add copy offload support infrastructure
  2022-04-27 22:04             ` [dm-devel] " Damien Le Moal
@ 2022-04-28  8:01               ` Nitesh Shetty
  -1 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-28  8:01 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4403 bytes --]

On Thu, Apr 28, 2022 at 07:04:13AM +0900, Damien Le Moal wrote:
> On 4/28/22 00:15, Nitesh Shetty wrote:
> > On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
> >> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>> Introduce blkdev_issue_copy which supports source and destination bdevs,
> >>> and an array of (source, destination and copy length) tuples.
> >>> Introduce REQ_COPY copy offload operation flag. Create a read-write
> >>> bio pair with a token as payload and submitted to the device in order.
> >>> Read request populates token with source specific information which
> >>> is then passed with write request.
> >>> This design is courtesy Mikulas Patocka's token based copy
> >>>
> >>> Larger copy will be divided, based on max_copy_sectors,
> >>> max_copy_range_sector limits.
> >>>
> >>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> >>> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> >>> ---
> >>>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >>>  block/blk.h               |   2 +
> >>>  include/linux/blk_types.h |  21 ++++
> >>>  include/linux/blkdev.h    |   2 +
> >>>  include/uapi/linux/fs.h   |  14 +++
> >>>  5 files changed, 271 insertions(+)
> >>>
> >>> diff --git a/block/blk-lib.c b/block/blk-lib.c
> >>> index 09b7e1200c0f..ba9da2d2f429 100644
> >>> --- a/block/blk-lib.c
> >>> +++ b/block/blk-lib.c
> >>> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >>>  }
> >>>  EXPORT_SYMBOL(blkdev_issue_discard);
> >>>  
> >>> +/*
> >>> + * Wait on and process all in-flight BIOs.  This must only be called once
> >>> + * all bios have been issued so that the refcount can only decrease.
> >>> + * This just waits for all bios to make it through bio_copy_end_io. IO
> >>> + * errors are propagated through cio->io_error.
> >>> + */
> >>> +static int cio_await_completion(struct cio *cio)
> >>> +{
> >>> +	int ret = 0;
> >>> +	unsigned long flags;
> >>> +
> >>> +	spin_lock_irqsave(&cio->lock, flags);
> >>> +	if (cio->refcount) {
> >>> +		cio->waiter = current;
> >>> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> >>> +		spin_unlock_irqrestore(&cio->lock, flags);
> >>> +		blk_io_schedule();
> >>> +		/* wake up sets us TASK_RUNNING */
> >>> +		spin_lock_irqsave(&cio->lock, flags);
> >>> +		cio->waiter = NULL;
> >>> +		ret = cio->io_err;
> >>> +	}
> >>> +	spin_unlock_irqrestore(&cio->lock, flags);
> >>> +	kvfree(cio);
> >>
> >> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
> >>
> > 
> > acked.
> > 
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static void bio_copy_end_io(struct bio *bio)
> >>> +{
> >>> +	struct copy_ctx *ctx = bio->bi_private;
> >>> +	struct cio *cio = ctx->cio;
> >>> +	sector_t clen;
> >>> +	int ri = ctx->range_idx;
> >>> +	unsigned long flags;
> >>> +	bool wake = false;
> >>> +
> >>> +	if (bio->bi_status) {
> >>> +		cio->io_err = bio->bi_status;
> >>> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> >>> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> >>
> >> long line.
> > 
> > Is it because line is more than 80 character, I thought limit is 100 now, so
> > went with longer lines ?
> 
> When it is easy to wrap the lines without readability loss, please do to
> keep things under 80 char per line.
> 
>

acked

> >>> +{
> >>> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> >>> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> >>> +	int ret = -EINVAL;
> >>> +
> >>> +	if (!src_q || !dest_q)
> >>> +		return -ENXIO;
> >>> +
> >>> +	if (!nr)
> >>> +		return -EINVAL;
> >>> +
> >>> +	if (nr >= MAX_COPY_NR_RANGE)
> >>> +		return -EINVAL;
> >>
> >> Where do you check the number of ranges against what the device can do ?
> >>
> > 
> > The present implementation submits only one range at a time. This was done to 
> > make copy offload generic, so that other types of copy implementation such as
> > XCOPY should be able to use same infrastructure. Downside at present being
> > NVMe copy offload is not optimal.
> 
> If you issue one range at a time without checking the number of ranges,
> what is the point of the nr ranges queue limit ? The user can submit a
> copy ioctl request exceeding it. Please use that limit and enforce it or
> remove it entirely.
> 

Sure, we will remove this limit in the next version.

--
Thank you
Nitesh Shetty


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 02/10] block: Add copy offload support infrastructure
@ 2022-04-28  8:01               ` Nitesh Shetty
  0 siblings, 0 replies; 101+ messages in thread
From: Nitesh Shetty @ 2022-04-28  8:01 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 4403 bytes --]

On Thu, Apr 28, 2022 at 07:04:13AM +0900, Damien Le Moal wrote:
> On 4/28/22 00:15, Nitesh Shetty wrote:
> > On Wed, Apr 27, 2022 at 11:45:26AM +0900, Damien Le Moal wrote:
> >> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>> Introduce blkdev_issue_copy which supports source and destination bdevs,
> >>> and an array of (source, destination and copy length) tuples.
> >>> Introduce REQ_COPY copy offload operation flag. Create a read-write
> >>> bio pair with a token as payload and submitted to the device in order.
> >>> Read request populates token with source specific information which
> >>> is then passed with write request.
> >>> This design is courtesy Mikulas Patocka's token based copy
> >>>
> >>> Larger copy will be divided, based on max_copy_sectors,
> >>> max_copy_range_sector limits.
> >>>
> >>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> >>> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> >>> ---
> >>>  block/blk-lib.c           | 232 ++++++++++++++++++++++++++++++++++++++
> >>>  block/blk.h               |   2 +
> >>>  include/linux/blk_types.h |  21 ++++
> >>>  include/linux/blkdev.h    |   2 +
> >>>  include/uapi/linux/fs.h   |  14 +++
> >>>  5 files changed, 271 insertions(+)
> >>>
> >>> diff --git a/block/blk-lib.c b/block/blk-lib.c
> >>> index 09b7e1200c0f..ba9da2d2f429 100644
> >>> --- a/block/blk-lib.c
> >>> +++ b/block/blk-lib.c
> >>> @@ -117,6 +117,238 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >>>  }
> >>>  EXPORT_SYMBOL(blkdev_issue_discard);
> >>>  
> >>> +/*
> >>> + * Wait on and process all in-flight BIOs.  This must only be called once
> >>> + * all bios have been issued so that the refcount can only decrease.
> >>> + * This just waits for all bios to make it through bio_copy_end_io. IO
> >>> + * errors are propagated through cio->io_error.
> >>> + */
> >>> +static int cio_await_completion(struct cio *cio)
> >>> +{
> >>> +	int ret = 0;
> >>> +	unsigned long flags;
> >>> +
> >>> +	spin_lock_irqsave(&cio->lock, flags);
> >>> +	if (cio->refcount) {
> >>> +		cio->waiter = current;
> >>> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> >>> +		spin_unlock_irqrestore(&cio->lock, flags);
> >>> +		blk_io_schedule();
> >>> +		/* wake up sets us TASK_RUNNING */
> >>> +		spin_lock_irqsave(&cio->lock, flags);
> >>> +		cio->waiter = NULL;
> >>> +		ret = cio->io_err;
> >>> +	}
> >>> +	spin_unlock_irqrestore(&cio->lock, flags);
> >>> +	kvfree(cio);
> >>
> >> cio is allocated with kzalloc() == kmalloc(). So why the kvfree() here ?
> >>
> > 
> > acked.
> > 
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static void bio_copy_end_io(struct bio *bio)
> >>> +{
> >>> +	struct copy_ctx *ctx = bio->bi_private;
> >>> +	struct cio *cio = ctx->cio;
> >>> +	sector_t clen;
> >>> +	int ri = ctx->range_idx;
> >>> +	unsigned long flags;
> >>> +	bool wake = false;
> >>> +
> >>> +	if (bio->bi_status) {
> >>> +		cio->io_err = bio->bi_status;
> >>> +		clen = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - ctx->start_sec;
> >>> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> >>
> >> long line.
> > 
> > Is it because line is more than 80 character, I thought limit is 100 now, so
> > went with longer lines ?
> 
> When it is easy to wrap the lines without readability loss, please do to
> keep things under 80 char per line.
> 
>

acked

> >>> +{
> >>> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> >>> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> >>> +	int ret = -EINVAL;
> >>> +
> >>> +	if (!src_q || !dest_q)
> >>> +		return -ENXIO;
> >>> +
> >>> +	if (!nr)
> >>> +		return -EINVAL;
> >>> +
> >>> +	if (nr >= MAX_COPY_NR_RANGE)
> >>> +		return -EINVAL;
> >>
> >> Where do you check the number of ranges against what the device can do ?
> >>
> > 
> > The present implementation submits only one range at a time. This was done to 
> > make copy offload generic, so that other types of copy implementation such as
> > XCOPY should be able to use same infrastructure. Downside at present being
> > NVMe copy offload is not optimal.
> 
> If you issue one range at a time without checking the number of ranges,
> what is the point of the nr ranges queue limit ? The user can submit a
> copy ioctl request exceeding it. Please use that limit and enforce it or
> remove it entirely.
> 

Sure, we will remove this limit in the next version.

--
Thank you
Nitesh Shetty


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 05/10] nvme: add copy offload support
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-28 14:02         ` kernel test robot
  -1 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 14:02 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: kbuild-all, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack, nitheshshetty,
	gost.dev, Nitesh Shetty, Kanchan Joshi, Javier González,
	Arnav Dawn, Alasdair Kergon, Mike Snitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282136.kqIaq8aK-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/e029014185aff1d7c8facf6e19447487c6ce2b93
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout e029014185aff1d7c8facf6e19447487c6ce2b93
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/ drivers/nvme/host/ drivers/nvme/target/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/host/core.c:803:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] dspec @@     got restricted __le32 [usertype] @@
   drivers/nvme/host/core.c:803:26: sparse:     expected restricted __le16 [usertype] dspec
   drivers/nvme/host/core.c:803:26: sparse:     got restricted __le32 [usertype]

vim +803 drivers/nvme/host/core.c

   739	
   740	static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns,
   741		       struct request *req, struct nvme_command *cmnd)
   742	{
   743		struct nvme_ctrl *ctrl = ns->ctrl;
   744		struct nvme_copy_range *range = NULL;
   745		struct bio *bio = req->bio;
   746		struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
   747		sector_t src_sector, dst_sector, n_sectors;
   748		u64 src_lba, dst_lba, n_lba;
   749		unsigned short nr_range = 1;
   750		u16 control = 0;
   751		u32 dsmgmt = 0;
   752	
   753		if (unlikely(memcmp(token->subsys, "nvme", 4)))
   754			return BLK_STS_NOTSUPP;
   755		if (unlikely(token->ns != ns))
   756			return BLK_STS_NOTSUPP;
   757	
   758		src_sector = token->src_sector;
   759		dst_sector = bio->bi_iter.bi_sector;
   760		n_sectors = token->sectors;
   761		if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
   762			return BLK_STS_NOTSUPP;
   763	
   764		src_lba = nvme_sect_to_lba(ns, src_sector);
   765		dst_lba = nvme_sect_to_lba(ns, dst_sector);
   766		n_lba = nvme_sect_to_lba(ns, n_sectors);
   767	
   768		if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
   769				unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
   770				unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
   771			return BLK_STS_NOTSUPP;
   772	
   773		if (WARN_ON(!n_lba))
   774			return BLK_STS_NOTSUPP;
   775	
   776		if (req->cmd_flags & REQ_FUA)
   777			control |= NVME_RW_FUA;
   778	
   779		if (req->cmd_flags & REQ_FAILFAST_DEV)
   780			control |= NVME_RW_LR;
   781	
   782		memset(cmnd, 0, sizeof(*cmnd));
   783		cmnd->copy.opcode = nvme_cmd_copy;
   784		cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
   785		cmnd->copy.sdlba = cpu_to_le64(dst_lba);
   786	
   787		range = kmalloc_array(nr_range, sizeof(*range),
   788				GFP_ATOMIC | __GFP_NOWARN);
   789		if (!range)
   790			return BLK_STS_RESOURCE;
   791	
   792		range[0].slba = cpu_to_le64(src_lba);
   793		range[0].nlb = cpu_to_le16(n_lba - 1);
   794	
   795		cmnd->copy.nr_range = 0;
   796	
   797		req->special_vec.bv_page = virt_to_page(range);
   798		req->special_vec.bv_offset = offset_in_page(range);
   799		req->special_vec.bv_len = sizeof(*range) * nr_range;
   800		req->rq_flags |= RQF_SPECIAL_PAYLOAD;
   801	
   802		cmnd->copy.control = cpu_to_le16(control);
 > 803		cmnd->copy.dspec = cpu_to_le32(dsmgmt);
   804	
   805		return BLK_STS_OK;
   806	}
   807	
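
The warning is a simple width mismatch: copy.dspec is declared __le16 but is
assigned a cpu_to_le32() value. Assuming the command structure keeps dspec as
__le16, the fix would be to declare dsmgmt as u16 and use the matching helper
(sketch only):

	u16 dsmgmt = 0;				/* was u32 dsmgmt */

	/* ... */

	cmnd->copy.control = cpu_to_le16(control);
	cmnd->copy.dspec = cpu_to_le16(dsmgmt);	/* matches the __le16 field */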

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 05/10] nvme: add copy offload support
@ 2022-04-28 14:02         ` kernel test robot
  0 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 14:02 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	msnitzer, bvanassche, linux-scsi, gost.dev, nitheshshetty, hch,
	Nitesh Shetty, chaitanyak, Mike Snitzer, josef, linux-block,
	dsterba, kbusch, tytso, Frederick.Knight, axboe, kbuild-all,
	Kanchan Joshi, martin.petersen, Arnav Dawn, jack, linux-fsdevel,
	Javier González, lsf-pc

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282136.kqIaq8aK-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/e029014185aff1d7c8facf6e19447487c6ce2b93
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout e029014185aff1d7c8facf6e19447487c6ce2b93
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/ drivers/nvme/host/ drivers/nvme/target/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/host/core.c:803:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] dspec @@     got restricted __le32 [usertype] @@
   drivers/nvme/host/core.c:803:26: sparse:     expected restricted __le16 [usertype] dspec
   drivers/nvme/host/core.c:803:26: sparse:     got restricted __le32 [usertype]

vim +803 drivers/nvme/host/core.c

   739	
   740	static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns,
   741		       struct request *req, struct nvme_command *cmnd)
   742	{
   743		struct nvme_ctrl *ctrl = ns->ctrl;
   744		struct nvme_copy_range *range = NULL;
   745		struct bio *bio = req->bio;
   746		struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
   747		sector_t src_sector, dst_sector, n_sectors;
   748		u64 src_lba, dst_lba, n_lba;
   749		unsigned short nr_range = 1;
   750		u16 control = 0;
   751		u32 dsmgmt = 0;
   752	
   753		if (unlikely(memcmp(token->subsys, "nvme", 4)))
   754			return BLK_STS_NOTSUPP;
   755		if (unlikely(token->ns != ns))
   756			return BLK_STS_NOTSUPP;
   757	
   758		src_sector = token->src_sector;
   759		dst_sector = bio->bi_iter.bi_sector;
   760		n_sectors = token->sectors;
   761		if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
   762			return BLK_STS_NOTSUPP;
   763	
   764		src_lba = nvme_sect_to_lba(ns, src_sector);
   765		dst_lba = nvme_sect_to_lba(ns, dst_sector);
   766		n_lba = nvme_sect_to_lba(ns, n_sectors);
   767	
   768		if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
   769				unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
   770				unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
   771			return BLK_STS_NOTSUPP;
   772	
   773		if (WARN_ON(!n_lba))
   774			return BLK_STS_NOTSUPP;
   775	
   776		if (req->cmd_flags & REQ_FUA)
   777			control |= NVME_RW_FUA;
   778	
   779		if (req->cmd_flags & REQ_FAILFAST_DEV)
   780			control |= NVME_RW_LR;
   781	
   782		memset(cmnd, 0, sizeof(*cmnd));
   783		cmnd->copy.opcode = nvme_cmd_copy;
   784		cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
   785		cmnd->copy.sdlba = cpu_to_le64(dst_lba);
   786	
   787		range = kmalloc_array(nr_range, sizeof(*range),
   788				GFP_ATOMIC | __GFP_NOWARN);
   789		if (!range)
   790			return BLK_STS_RESOURCE;
   791	
   792		range[0].slba = cpu_to_le64(src_lba);
   793		range[0].nlb = cpu_to_le16(n_lba - 1);
   794	
   795		cmnd->copy.nr_range = 0;
   796	
   797		req->special_vec.bv_page = virt_to_page(range);
   798		req->special_vec.bv_offset = offset_in_page(range);
   799		req->special_vec.bv_len = sizeof(*range) * nr_range;
   800		req->rq_flags |= RQF_SPECIAL_PAYLOAD;
   801	
   802		cmnd->copy.control = cpu_to_le16(control);
 > 803		cmnd->copy.dspec = cpu_to_le32(dsmgmt);
   804	
   805		return BLK_STS_OK;
   806	}
   807	
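
The sparse warning above points at a real width mismatch: copy.dspec is declared
__le16, but the value is converted with cpu_to_le32(). A possible fix, sketched
here without testing against the series, is to keep the conversion (and, ideally,
the local dsmgmt variable) 16 bits wide:

        cmnd->copy.control = cpu_to_le16(control);
        cmnd->copy.dspec = cpu_to_le16(dsmgmt);    /* DSPEC is a 16-bit field */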

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 06/10] nvmet: add copy command support for bdev and file ns
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-28 14:53         ` kernel test robot
  -1 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 14:53 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: kbuild-all, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack, nitheshshetty,
	gost.dev, Arnav Dawn, Nitesh Shetty, Alasdair Kergon,
	Mike Snitzer, Sagi Grimberg, James Smart

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282248.B5VfX8LS-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/6a9ea8570c34a7222786ca4d129578f48426d2f2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 6a9ea8570c34a7222786ca4d129578f48426d2f2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/ drivers/nvme/target/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/target/io-cmd-bdev.c:56:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:56:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:56:26: sparse:     got restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse:     got restricted __le16
--
>> drivers/nvme/target/admin-cmd.c:537:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/admin-cmd.c:537:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/admin-cmd.c:537:26: sparse:     got restricted __le16

vim +56 drivers/nvme/target/io-cmd-bdev.c

    12	
    13	void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
    14	{
    15		const struct queue_limits *ql = &bdev_get_queue(bdev)->limits;
    16		/* Number of logical blocks per physical block. */
    17		const u32 lpp = ql->physical_block_size / ql->logical_block_size;
    18		/* Logical blocks per physical block, 0's based. */
    19		const __le16 lpp0b = to0based(lpp);
    20	
    21		/*
    22		 * For NVMe 1.2 and later, bit 1 indicates that the fields NAWUN,
    23		 * NAWUPF, and NACWU are defined for this namespace and should be
    24		 * used by the host for this namespace instead of the AWUN, AWUPF,
    25		 * and ACWU fields in the Identify Controller data structure. If
    26		 * any of these fields are zero that means that the corresponding
    27		 * field from the identify controller data structure should be used.
    28		 */
    29		id->nsfeat |= 1 << 1;
    30		id->nawun = lpp0b;
    31		id->nawupf = lpp0b;
    32		id->nacwu = lpp0b;
    33	
    34		/*
    35		 * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
    36		 * NOWS are defined for this namespace and should be used by
    37		 * the host for I/O optimization.
    38		 */
    39		id->nsfeat |= 1 << 4;
    40		/* NPWG = Namespace Preferred Write Granularity. 0's based */
    41		id->npwg = lpp0b;
    42		/* NPWA = Namespace Preferred Write Alignment. 0's based */
    43		id->npwa = id->npwg;
    44		/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
    45		id->npdg = to0based(ql->discard_granularity / ql->logical_block_size);
    46		/* NPDG = Namespace Preferred Deallocate Alignment */
    47		id->npda = id->npdg;
    48		/* NOWS = Namespace Optimal Write Size */
    49		id->nows = to0based(ql->io_opt / ql->logical_block_size);
    50	
    51		/*Copy limits*/
    52		if (ql->max_copy_sectors) {
    53			id->mcl = cpu_to_le32((ql->max_copy_sectors << 9) / ql->logical_block_size);
    54			id->mssrl = cpu_to_le16((ql->max_copy_range_sectors << 9) /
    55					ql->logical_block_size);
  > 56			id->msrc = to0based(ql->max_copy_nr_ranges);
    57		} else {
    58			if (ql->zoned == BLK_ZONED_NONE) {
    59				id->msrc = to0based(BIO_MAX_VECS);
    60				id->mssrl = cpu_to_le16(
    61						(BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size);
    62				id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS);
    63	#ifdef CONFIG_BLK_DEV_ZONED
    64			} else {
    65				/* TODO: get right values for zoned device */
    66				id->msrc = to0based(BIO_MAX_VECS);
    67				id->mssrl = cpu_to_le16(min((BIO_MAX_VECS << PAGE_SHIFT),
    68						ql->chunk_sectors) / ql->logical_block_size);
    69				id->mcl = cpu_to_le32(min(le16_to_cpu(id->mssrl) * BIO_MAX_VECS,
    70							ql->chunk_sectors));
    71	#endif
    72			}
    73		}
    74	}
    75	
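
Here the warnings come from filling the 8-bit MSRC identify field through
to0based(), which returns a __le16. One untested way to resolve it is a
byte-sized variant of that helper; nvmet_msrc_to0based() is a name invented
for this sketch, not something present in the tree:

        /* 0's based conversion for 8-bit identify fields such as MSRC. */
        static inline u8 nvmet_msrc_to0based(u32 a)
        {
                return clamp(a, 1U, 256U) - 1;
        }

and then, in the assignments flagged above (including the admin-cmd.c:537 one):

        id->msrc = nvmet_msrc_to0based(ql->max_copy_nr_ranges);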

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 06/10] nvmet: add copy command support for bdev and file ns
@ 2022-04-28 14:53         ` kernel test robot
  0 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 14:53 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	msnitzer, bvanassche, linux-scsi, gost.dev, nitheshshetty,
	James Smart, hch, Nitesh Shetty, chaitanyak, Mike Snitzer, josef,
	linux-block, dsterba, kbusch, tytso, Frederick.Knight,
	Sagi Grimberg, axboe, kbuild-all, martin.petersen, Arnav Dawn,
	jack, linux-fsdevel, lsf-pc

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282248.B5VfX8LS-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/6a9ea8570c34a7222786ca4d129578f48426d2f2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 6a9ea8570c34a7222786ca4d129578f48426d2f2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/ drivers/nvme/target/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/target/io-cmd-bdev.c:56:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:56:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:56:26: sparse:     got restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:59:34: sparse:     got restricted __le16
--
>> drivers/nvme/target/admin-cmd.c:537:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/admin-cmd.c:537:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/admin-cmd.c:537:26: sparse:     got restricted __le16

vim +56 drivers/nvme/target/io-cmd-bdev.c

    12	
    13	void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
    14	{
    15		const struct queue_limits *ql = &bdev_get_queue(bdev)->limits;
    16		/* Number of logical blocks per physical block. */
    17		const u32 lpp = ql->physical_block_size / ql->logical_block_size;
    18		/* Logical blocks per physical block, 0's based. */
    19		const __le16 lpp0b = to0based(lpp);
    20	
    21		/*
    22		 * For NVMe 1.2 and later, bit 1 indicates that the fields NAWUN,
    23		 * NAWUPF, and NACWU are defined for this namespace and should be
    24		 * used by the host for this namespace instead of the AWUN, AWUPF,
    25		 * and ACWU fields in the Identify Controller data structure. If
    26		 * any of these fields are zero that means that the corresponding
    27		 * field from the identify controller data structure should be used.
    28		 */
    29		id->nsfeat |= 1 << 1;
    30		id->nawun = lpp0b;
    31		id->nawupf = lpp0b;
    32		id->nacwu = lpp0b;
    33	
    34		/*
    35		 * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
    36		 * NOWS are defined for this namespace and should be used by
    37		 * the host for I/O optimization.
    38		 */
    39		id->nsfeat |= 1 << 4;
    40		/* NPWG = Namespace Preferred Write Granularity. 0's based */
    41		id->npwg = lpp0b;
    42		/* NPWA = Namespace Preferred Write Alignment. 0's based */
    43		id->npwa = id->npwg;
    44		/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
    45		id->npdg = to0based(ql->discard_granularity / ql->logical_block_size);
    46		/* NPDG = Namespace Preferred Deallocate Alignment */
    47		id->npda = id->npdg;
    48		/* NOWS = Namespace Optimal Write Size */
    49		id->nows = to0based(ql->io_opt / ql->logical_block_size);
    50	
    51		/*Copy limits*/
    52		if (ql->max_copy_sectors) {
    53			id->mcl = cpu_to_le32((ql->max_copy_sectors << 9) / ql->logical_block_size);
    54			id->mssrl = cpu_to_le16((ql->max_copy_range_sectors << 9) /
    55					ql->logical_block_size);
  > 56			id->msrc = to0based(ql->max_copy_nr_ranges);
    57		} else {
    58			if (ql->zoned == BLK_ZONED_NONE) {
    59				id->msrc = to0based(BIO_MAX_VECS);
    60				id->mssrl = cpu_to_le16(
    61						(BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size);
    62				id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS);
    63	#ifdef CONFIG_BLK_DEV_ZONED
    64			} else {
    65				/* TODO: get right values for zoned device */
    66				id->msrc = to0based(BIO_MAX_VECS);
    67				id->mssrl = cpu_to_le16(min((BIO_MAX_VECS << PAGE_SHIFT),
    68						ql->chunk_sectors) / ql->logical_block_size);
    69				id->mcl = cpu_to_le32(min(le16_to_cpu(id->mssrl) * BIO_MAX_VECS,
    70							ql->chunk_sectors));
    71	#endif
    72			}
    73		}
    74	}
    75	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 07/10] dm: Add support for copy offload.
  2022-04-26 10:12       ` Nitesh Shetty
@ 2022-04-28 15:54         ` kernel test robot
  -1 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 15:54 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: kbuild-all, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack, nitheshshetty,
	gost.dev, Nitesh Shetty, Alasdair Kergon, Mike Snitzer,
	Sagi Grimberg, James Smart, Chaitanya Kulkarni

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282336.7AY0GVKz-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/913c8c5197fea28ee3c8424e16eadd8b159a91f0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 913c8c5197fea28ee3c8424e16eadd8b159a91f0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/md/dm.c:1602:24: sparse: sparse: incorrect type in return expression (different base types) @@     expected restricted blk_status_t @@     got int @@
   drivers/md/dm.c:1602:24: sparse:     expected restricted blk_status_t
   drivers/md/dm.c:1602:24: sparse:     got int

vim +1602 drivers/md/dm.c

  1582	
  1583	/*
  1584	 * Select the correct strategy for processing a non-flush bio.
  1585	 */
  1586	static blk_status_t __split_and_process_bio(struct clone_info *ci)
  1587	{
  1588		struct bio *clone;
  1589		struct dm_target *ti;
  1590		unsigned len;
  1591	
  1592		ti = dm_table_find_target(ci->map, ci->sector);
  1593		if (unlikely(!ti))
  1594			return BLK_STS_IOERR;
  1595		else if (unlikely(ci->is_abnormal_io))
  1596			return __process_abnormal_io(ci, ti);
  1597	
  1598		if ((unlikely(op_is_copy(ci->bio->bi_opf)) &&
  1599					max_io_len(ti, ci->sector) < ci->sector_count)) {
  1600			DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n",
  1601					__func__, ci->sector_count, max_io_len(ti, ci->sector));
> 1602			return -EIO;
  1603		}
  1604		/*
  1605		 * Only support bio polling for normal IO, and the target io is
  1606		 * exactly inside the dm_io instance (verified in dm_poll_dm_io)
  1607		 */
  1608		ci->submit_as_polled = ci->bio->bi_opf & REQ_POLLED;
  1609	
  1610		len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count);
  1611		setup_split_accounting(ci, len);
  1612		clone = alloc_tio(ci, ti, 0, &len, GFP_NOIO);
  1613		__map_bio(clone);
  1614	
  1615		ci->sector += len;
  1616		ci->sector_count -= len;
  1617	
  1618		return BLK_STS_OK;
  1619	}
  1620	
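
The fix implied by the warning is mechanical, assuming the intent is simply to
fail the oversized copy bio: return a blk_status_t value, as the other error
paths in this function already do, rather than a negative errno. Untested:

        DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n",
                        __func__, ci->sector_count, max_io_len(ti, ci->sector));
        return BLK_STS_IOERR;   /* was: return -EIO; */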

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 07/10] dm: Add support for copy offload.
@ 2022-04-28 15:54         ` kernel test robot
  0 siblings, 0 replies; 101+ messages in thread
From: kernel test robot @ 2022-04-28 15:54 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Alasdair Kergon,
	msnitzer, bvanassche, linux-scsi, gost.dev, nitheshshetty,
	James Smart, hch, Nitesh Shetty, chaitanyak, Chaitanya Kulkarni,
	Mike Snitzer, josef, linux-block, dsterba, kbusch, tytso,
	Frederick.Knight, Sagi Grimberg, axboe, kbuild-all,
	martin.petersen, jack, linux-fsdevel, lsf-pc

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220422]
[cannot apply to axboe-block/for-next device-mapper-dm/for-next linus/master v5.18-rc4 v5.18-rc3 v5.18-rc2 v5.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
base:    e7d6987e09a328d4a949701db40ef63fbb970670
config: s390-randconfig-s032-20220427 (https://download.01.org/0day-ci/archive/20220428/202204282336.7AY0GVKz-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/913c8c5197fea28ee3c8424e16eadd8b159a91f0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20220426-201825
        git checkout 913c8c5197fea28ee3c8424e16eadd8b159a91f0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/md/dm.c:1602:24: sparse: sparse: incorrect type in return expression (different base types) @@     expected restricted blk_status_t @@     got int @@
   drivers/md/dm.c:1602:24: sparse:     expected restricted blk_status_t
   drivers/md/dm.c:1602:24: sparse:     got int

vim +1602 drivers/md/dm.c

  1582	
  1583	/*
  1584	 * Select the correct strategy for processing a non-flush bio.
  1585	 */
  1586	static blk_status_t __split_and_process_bio(struct clone_info *ci)
  1587	{
  1588		struct bio *clone;
  1589		struct dm_target *ti;
  1590		unsigned len;
  1591	
  1592		ti = dm_table_find_target(ci->map, ci->sector);
  1593		if (unlikely(!ti))
  1594			return BLK_STS_IOERR;
  1595		else if (unlikely(ci->is_abnormal_io))
  1596			return __process_abnormal_io(ci, ti);
  1597	
  1598		if ((unlikely(op_is_copy(ci->bio->bi_opf)) &&
  1599					max_io_len(ti, ci->sector) < ci->sector_count)) {
  1600			DMERR("%s: Error IO size(%u) is greater than maximum target size(%llu)\n",
  1601					__func__, ci->sector_count, max_io_len(ti, ci->sector));
> 1602			return -EIO;
  1603		}
  1604		/*
  1605		 * Only support bio polling for normal IO, and the target io is
  1606		 * exactly inside the dm_io instance (verified in dm_poll_dm_io)
  1607		 */
  1608		ci->submit_as_polled = ci->bio->bi_opf & REQ_POLLED;
  1609	
  1610		len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count);
  1611		setup_split_accounting(ci, len);
  1612		clone = alloc_tio(ci, ti, 0, &len, GFP_NOIO);
  1613		__map_bio(clone);
  1614	
  1615		ci->sector += len;
  1616		ci->sector_count -= len;
  1617	
  1618		return BLK_STS_OK;
  1619	}
  1620	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-28  7:49           ` [dm-devel] " Nitesh Shetty
@ 2022-04-28 21:37             ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-28 21:37 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 4/28/22 16:49, Nitesh Shetty wrote:
> On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
>> On 4/27/22 21:49, Nitesh Shetty wrote:
>>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>>> The patch series covers the points discussed in November 2021 virtual call
>>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>>> We have covered the Initial agreed requirements in this patchset.
>>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>>> implementation.
>>>>>
>>>>> Overall series supports –
>>>>>
>>>>> 1. Driver
>>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>>     block and file backend)
>>>>
>>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>>
>>>
>>> We can plan this in next phase of copy support, once this series settles down.
>>
>> So how can people test your series ? Not a lot of drives out there with
>> copy support.
>>
> 
> Yeah not many drives at present, Qemu can be used to test NVMe copy.

Upstream QEMU ? What are the command line options ? An example would be
nice. But I still think null_blk support would be easiest.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread
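
For reference, recent upstream QEMU releases do emulate the NVMe (simple) Copy
command, and the nvme-ns device exposes the copy-related identify fields as
properties (mssrl, mcl, msrc). An illustrative, untested invocation; the image
names, serial and limit values below are placeholders:

        qemu-system-x86_64 -m 4G -smp 4 \
            -drive file=rootfs.img,if=virtio \
            -drive file=nvm.img,if=none,id=nvm0,format=raw \
            -device nvme,id=nvme0,serial=deadbeef \
            -device nvme-ns,drive=nvm0,bus=nvme0,nsid=1,mssrl=128,mcl=1024,msrc=7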

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-28 21:37             ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-04-28 21:37 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 16:49, Nitesh Shetty wrote:
> On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
>> On 4/27/22 21:49, Nitesh Shetty wrote:
>>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>>> The patch series covers the points discussed in November 2021 virtual call
>>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>>> We have covered the Initial agreed requirements in this patchset.
>>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>>> implementation.
>>>>>
>>>>> Overall series supports –
>>>>>
>>>>> 1. Driver
>>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>>     block and file backend)
>>>>
>>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>>
>>>
>>> We can plan this in next phase of copy support, once this series settles down.
>>
>> So how can people test your series ? Not a lot of drives out there with
>> copy support.
>>
> 
> Yeah not many drives at present, Qemu can be used to test NVMe copy.

Upstream QEMU ? What are the command line options ? An example would be
nice. But I still think null_blk support would be easiest.


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
  2022-04-28 21:37             ` [dm-devel] " Damien Le Moal
@ 2022-04-29  3:39               ` Bart Van Assche
  -1 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2022-04-29  3:39 UTC (permalink / raw)
  To: Damien Le Moal, Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 14:37, Damien Le Moal wrote:
> On 4/28/22 16:49, Nitesh Shetty wrote:
>> On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
>>> On 4/27/22 21:49, Nitesh Shetty wrote:
>>>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>>>> The patch series covers the points discussed in November 2021 virtual call
>>>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>>>> We have covered the Initial agreed requirements in this patchset.
>>>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>>>> implementation.
>>>>>>
>>>>>> Overall series supports –
>>>>>>
>>>>>> 1. Driver
>>>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>>>      block and file backend)
>>>>>
>>>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>>>
>>>>
>>>> We can plan this in next phase of copy support, once this series settles down.
>>>
>>> So how can people test your series ? Not a lot of drives out there with
>>> copy support.
>>>
>>
>> Yeah not many drives at present, Qemu can be used to test NVMe copy.
> 
> Upstream QEMU ? What is the command line options ? An example would be
> nice. But I still think null_blk support would be easiest.

+1 for adding copy offloading support in null_blk. That enables running 
copy offloading tests without depending on Qemu.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-04-29  3:39               ` Bart Van Assche
  0 siblings, 0 replies; 101+ messages in thread
From: Bart Van Assche @ 2022-04-29  3:39 UTC (permalink / raw)
  To: Damien Le Moal, Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 4/28/22 14:37, Damien Le Moal wrote:
> On 4/28/22 16:49, Nitesh Shetty wrote:
>> On Thu, Apr 28, 2022 at 07:05:32AM +0900, Damien Le Moal wrote:
>>> On 4/27/22 21:49, Nitesh Shetty wrote:
>>>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>>>> The patch series covers the points discussed in November 2021 virtual call
>>>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>>>> We have covered the Initial agreed requirements in this patchset.
>>>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>>>> implementation.
>>>>>>
>>>>>> Overall series supports –
>>>>>>
>>>>>> 1. Driver
>>>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>>>      block and file backend)
>>>>>
>>>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>>>
>>>>
>>>> We can plan this in next phase of copy support, once this series settles down.
>>>
>>> So how can people test your series ? Not a lot of drives out there with
>>> copy support.
>>>
>>
>> Yeah not many drives at present, Qemu can be used to test NVMe copy.
> 
> Upstream QEMU ? What is the command line options ? An example would be
> nice. But I still think null_blk support would be easiest.

+1 for adding copy offloading support in null_blk. That enables running 
copy offloading tests without depending on Qemu.

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-04-27 12:49       ` [dm-devel] " Nitesh Shetty
@ 2022-05-02  4:09         ` Dave Chinner
  -1 siblings, 0 replies; 101+ messages in thread
From: Dave Chinner @ 2022-05-02  4:09 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: Damien Le Moal, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, nitheshshetty, linux-kernel

On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> > On 4/26/22 19:12, Nitesh Shetty wrote:
> > > The patch series covers the points discussed in November 2021 virtual call
> > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > > We have covered the Initial agreed requirements in this patchset.
> > > Patchset borrows Mikulas's token based approach for 2 bdev
> > > implementation.
> > > 
> > > Overall series supports –
> > > 
> > > 1. Driver
> > > - NVMe Copy command (single NS), including support in nvme-target (for
> > >     block and file backend)
> > 
> > It would also be nice to have copy offload emulation in null_blk for testing.
> >
> 
> We can plan this in next phase of copy support, once this series settles down.

Why not just hook the loopback driver up to copy_file_range() so
that the backend filesystem can just reflink copy the ranges being
passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
image files without needing any special block device setup at all...

i.e. I think you're doing this completely backwards by trying to
target non-existent hardware first....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 101+ messages in thread
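
To make the loopback suggestion concrete, a rough, untested sketch of the
lowest-level piece follows; lo_copy() and the way the source/destination
offsets would be carried in the request are assumptions made purely for
illustration, not an existing loop driver interface:

        /* Hypothetical helper in drivers/block/loop.c: service a copy request
         * by asking the backing filesystem to copy (and, where supported,
         * reflink) the range via copy_file_range().
         */
        static int lo_copy(struct loop_device *lo, loff_t src_pos,
                           loff_t dst_pos, size_t len)
        {
                struct file *file = lo->lo_backing_file;
                ssize_t copied;

                copied = vfs_copy_file_range(file, src_pos, file, dst_pos,
                                             len, 0);
                if (copied < 0)
                        return copied;
                /* Treat a short copy as an error to keep the sketch simple. */
                return copied == len ? 0 : -EIO;
        }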

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-05-02  4:09         ` Dave Chinner
  0 siblings, 0 replies; 101+ messages in thread
From: Dave Chinner @ 2022-05-02  4:09 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, Damien Le Moal, linux-kernel, linux-nvme,
	linux-block, dm-devel, linux-fsdevel, nitheshshetty

On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> > On 4/26/22 19:12, Nitesh Shetty wrote:
> > > The patch series covers the points discussed in November 2021 virtual call
> > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > > We have covered the Initial agreed requirements in this patchset.
> > > Patchset borrows Mikulas's token based approach for 2 bdev
> > > implementation.
> > > 
> > > Overall series supports –
> > > 
> > > 1. Driver
> > > - NVMe Copy command (single NS), including support in nvme-target (for
> > >     block and file backend)
> > 
> > It would also be nice to have copy offload emulation in null_blk for testing.
> >
> 
> We can plan this in next phase of copy support, once this series settles down.

Why not just hook the loopback driver up to copy_file_range() so
that the backend filesystem can just reflink copy the ranges being
passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
image files without needing any special block device setup at all...

i.e. I think you're doing this completely backwards by trying to
target non-existent hardware first....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
  2022-04-27 12:49       ` [dm-devel] " Nitesh Shetty
@ 2022-05-02 12:14         ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:14 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 2022/04/27 21:49, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>
>> It would also be nice to have copy offload emulation in null_blk for testing.
>>
> 
> We can plan this in next phase of copy support, once this series settles down.

Why ? How do you expect people to test simply without null_blk ? Sure, you said
QEMU can be used. But if copy offload is not upstream for QEMU either, there is
no easy way to test.

Adding that support to null_blk would not be hard at all.



-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-05-02 12:14         ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:14 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 2022/04/27 21:49, Nitesh Shetty wrote:
> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in November 2021 virtual call
>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>> We have covered the Initial agreed requirements in this patchset.
>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>> implementation.
>>>
>>> Overall series supports –
>>>
>>> 1. Driver
>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>     block and file backend)
>>
>> It would also be nice to have copy offload emulation in null_blk for testing.
>>
> 
> We can plan this in next phase of copy support, once this series settles down.

Why ? How do you expect people to test simply without null_blk ? Sure, you said
QEMU can be used. But if copy offload is not upstream for QEMU either, there is
no easy way to test.

Adding that support to null_blk would not be hard at all.



-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
  2022-05-02 12:14         ` Damien Le Moal
@ 2022-05-02 12:16           ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:16 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 2022/05/02 21:14, Damien Le Moal wrote:
> On 2022/04/27 21:49, Nitesh Shetty wrote:
>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>> The patch series covers the points discussed in November 2021 virtual call
>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>> We have covered the Initial agreed requirements in this patchset.
>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>> implementation.
>>>>
>>>> Overall series supports –
>>>>
>>>> 1. Driver
>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>     block and file backend)
>>>
>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>
>>
>> We can plan this in next phase of copy support, once this series settles down.
> 
> Why ? How do you expect people to test simply without null_blk ? Sutre, you said
> QEMU can be used. But if copy offload is not upstream for QEMU either, there is
> no easy way to test.
> 
> Adding that support to null_blk would not be hard at all.

Sorry. Replied again to an email I already replied to. vger keeps sending me
multiple copies of the same emails...

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-05-02 12:16           ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:16 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 2022/05/02 21:14, Damien Le Moal wrote:
> On 2022/04/27 21:49, Nitesh Shetty wrote:
>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>> The patch series covers the points discussed in November 2021 virtual call
>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>> We have covered the Initial agreed requirements in this patchset.
>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>> implementation.
>>>>
>>>> Overall series supports –
>>>>
>>>> 1. Driver
>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>     block and file backend)
>>>
>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>
>>
>> We can plan this in next phase of copy support, once this series settles down.
> 
> Why ? How do you expect people to test simply without null_blk ? Sutre, you said
> QEMU can be used. But if copy offload is not upstream for QEMU either, there is
> no easy way to test.
> 
> Adding that support to null_blk would not be hard at all.

Sorry. Replied again to an email I already replied to. vger keeps sending me
multiple copies of the same emails...

-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-05-02  4:09         ` [dm-devel] " Dave Chinner
@ 2022-05-02 12:54           ` Damien Le Moal
  -1 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:54 UTC (permalink / raw)
  To: Dave Chinner, Nitesh Shetty
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	nitheshshetty, linux-kernel

On 2022/05/02 13:09, Dave Chinner wrote:
> On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>> The patch series covers the points discussed in November 2021 virtual call
>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>> We have covered the Initial agreed requirements in this patchset.
>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>> implementation.
>>>>
>>>> Overall series supports –
>>>>
>>>> 1. Driver
>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>     block and file backend)
>>>
>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>
>>
>> We can plan this in next phase of copy support, once this series settles down.
> 
> Why not just hook the loopback driver up to copy_file_range() so
> that the backend filesystem can just reflink copy the ranges being
> passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
> image files without needing any special block device setup at all...

That is a very good idea ! But that will cover only the non-zoned case. For copy
offload on zoned devices, adding support in null_blk is probably the simplest
thing to do.

> 
> i.e. I think you're doing this compeltely backwards by trying to
> target non-existent hardware first....
> 
> Cheers,
> 
> Dave.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-05-02 12:54           ` Damien Le Moal
  0 siblings, 0 replies; 101+ messages in thread
From: Damien Le Moal @ 2022-05-02 12:54 UTC (permalink / raw)
  To: Dave Chinner, Nitesh Shetty
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel

On 2022/05/02 13:09, Dave Chinner wrote:
> On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
>> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
>>> On 4/26/22 19:12, Nitesh Shetty wrote:
>>>> The patch series covers the points discussed in November 2021 virtual call
>>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
>>>> We have covered the Initial agreed requirements in this patchset.
>>>> Patchset borrows Mikulas's token based approach for 2 bdev
>>>> implementation.
>>>>
>>>> Overall series supports –
>>>>
>>>> 1. Driver
>>>> - NVMe Copy command (single NS), including support in nvme-target (for
>>>>     block and file backend)
>>>
>>> It would also be nice to have copy offload emulation in null_blk for testing.
>>>
>>
>> We can plan this in next phase of copy support, once this series settles down.
> 
> Why not just hook the loopback driver up to copy_file_range() so
> that the backend filesystem can just reflink copy the ranges being
> passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
> image files without needing any special block device setup at all...

That is a very good idea ! But that will cover only the non-zoned case. For copy
offload on zoned devices, adding support in null_blk is probably the simplest
thing to do.

> 
> i.e. I think you're doing this compeltely backwards by trying to
> target non-existent hardware first....
> 
> Cheers,
> 
> Dave.


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH v4 00/10] Add Copy offload support
  2022-05-02 12:54           ` [dm-devel] " Damien Le Moal
@ 2022-05-02 23:20             ` Dave Chinner
  -1 siblings, 0 replies; 101+ messages in thread
From: Dave Chinner @ 2022-05-02 23:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Nitesh Shetty, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, nitheshshetty, linux-kernel

On Mon, May 02, 2022 at 09:54:55PM +0900, Damien Le Moal wrote:
> On 2022/05/02 13:09, Dave Chinner wrote:
> > On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
> >> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> >>> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>>> The patch series covers the points discussed in November 2021 virtual call
> >>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> >>>> We have covered the Initial agreed requirements in this patchset.
> >>>> Patchset borrows Mikulas's token based approach for 2 bdev
> >>>> implementation.
> >>>>
> >>>> Overall series supports –
> >>>>
> >>>> 1. Driver
> >>>> - NVMe Copy command (single NS), including support in nvme-target (for
> >>>>     block and file backend)
> >>>
> >>> It would also be nice to have copy offload emulation in null_blk for testing.
> >>>
> >>
> >> We can plan this in next phase of copy support, once this series settles down.
> > 
> > Why not just hook the loopback driver up to copy_file_range() so
> > that the backend filesystem can just reflink copy the ranges being
> > passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
> > image files without needing any special block device setup at all...
> 
> That is a very good idea ! But that will cover only the non-zoned case. For copy
> offload on zoned devices, adding support in null_blk is probably the simplest
> thing to do.

Sure, but that's a zoned device implementation issue, not a "how do
applications use this offload" issue.

i.e. zonefs support is not necessary to test the bio/block layer
interfaces at all. All we need is a block device that can decode the
bio-encoded offload packet and execute it to do full block layer
testing. We can build dm devices on top of loop devices, etc, so we
can test that the offload support is plumbed, sliced, diced, and
regurgitated correctly that way. We don't need actual low level
device drivers to test this.

And, unlike the nullblk device, using the loopback device w/
copy_file_range() will also allow data integrity testing if a
generic copy_file_range() offload implementation is added. That is,
we test a non-reflink capable filesystem on the loop device with the
image file hosted on a reflink-capable filesystem. The upper
filesystem copy then gets offloaded to reflinks in the lower
filesystem. We already have copy_file_range() support in fsx, so all
the data integrity fsx tests in fstests will exercise this offload
path and find all the data corruptions the initial block layer bugs
expose...

Further, fsstress also has copy_file_range() support, and so all the
fstests that generate stress tests or use fsstress as load for
failure testing will also exercise it.

Indeed, this then gives us fine-grained error injection capability
within fstests via devices like dm-flakey. What happens when
dm-flakey kills the device IO mid-offload? Does everything recover
correctly? Do we end up with data corruption? Are partial offload
completions when errors occur signalled correctly? Is there -any-
test coverage (or even capability for testing) of user driven copy
offload failure situations like this in any of the other test
suites?

I mean, once the loop device has cfr offload, we can use dm-flakey
to kill IO in the image file or even do a force shutdown of the
image host filesystem. Hence we can actually trash the copy offload
operation in mid-flight, not just error it out on full completion.
This is trivial to do with the fstests infrastructure - it just
relies on having generic copy_file_range() block offload support and
a loopback device offload of hardware copy bios back to
copy_file_range()....

This is what I mean about copy offload being designed the wrong way.
We have the high level hooks needed to implement it right through the
filesystems and block layer without any specific hardware support,
and we can test the whole stack without needing specific hardware
support. We already have filesystem level copy offload acceleration,
so the last thing we want to see is a block layer offload
implementation that is incompatible with the semantics we've already
exposed to userspace for copy offloads.

As I said:

> > i.e. I think you're doing this compeltely backwards by trying to
> > target non-existent hardware first....

Rather than tie the block layer offload function/implementation to
the specific quirks of a specific target hardware, we should be
adding generic support in the block layer for the copy offload
semantics we've already exposed to userspace. We already have test
coverage and infrastructure for this interface and is already in use
by applications.

Transparent hardware acceleration of data copies when the hardware
supports it is exactly where copy offloads are useful - implementing
support based around hardware made of unobtainium and then adding
high level user facing API support as an afterthought is putting the
cart before the horse. We need to make sure the high level
functionality is robust and handles errors correctly before we even
worry about what quirks the hardware might bring to the table.

Build a reference model first with the loop device and
copy-file-range, test it, validate it, make sure it all works. Then
hook up the hardware, and fix all the hardware bugs that are exposed
before the hardware is released to the general public....

Why haven't we learnt this lesson yet from all the problems we've
had with, say, broken discard/trim, zeroing, erase, etc in hardware
implementations, incompatible hardware protocol implementations of
equivalent functionality, etc? i.e. We haven't defined the OS
required behaviour that hardware must support and instead just tried
to make whatever has come from the hardware vendor's
"standardisation" process work ok?

In this case, we already have a functioning model, syscalls and user
applications making use of copy offloads at the OS level. Now we
need to implement those exact semantics at the block layer to build
a validated reference model for the block layer offload behaviour
that hardware must comply with. Then hardware offloads in actual
hardware can be compared and validated against the reference model
behaviour, and any hardware that doesn't match can be
quirked/blacklisted until the manufacturer fixes their firmware...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 101+ messages in thread
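
As a concrete illustration of the topology described above (a non-reflink
filesystem on a loop device whose image file lives on a reflink-capable
filesystem), the setup could be as simple as the following; device names and
sizes are placeholders:

        # Host filesystem with reflink support, backing the image file.
        mkfs.xfs -f -m reflink=1 /dev/sdb
        mount /dev/sdb /mnt/host
        truncate -s 10G /mnt/host/loop.img

        # Loop device on top, formatted with a non-reflink filesystem.
        losetup -f --show /mnt/host/loop.img    # prints e.g. /dev/loop0
        mkfs.ext4 /dev/loop0
        mount /dev/loop0 /mnt/test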

* Re: [dm-devel] [PATCH v4 00/10] Add Copy offload support
@ 2022-05-02 23:20             ` Dave Chinner
  0 siblings, 0 replies; 101+ messages in thread
From: Dave Chinner @ 2022-05-02 23:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, nitheshshetty, linux-kernel, linux-nvme, linux-block,
	dm-devel, linux-fsdevel, Nitesh Shetty

On Mon, May 02, 2022 at 09:54:55PM +0900, Damien Le Moal wrote:
> On 2022/05/02 13:09, Dave Chinner wrote:
> > On Wed, Apr 27, 2022 at 06:19:51PM +0530, Nitesh Shetty wrote:
> >> O Wed, Apr 27, 2022 at 11:19:48AM +0900, Damien Le Moal wrote:
> >>> On 4/26/22 19:12, Nitesh Shetty wrote:
> >>>> The patch series covers the points discussed in November 2021 virtual call
> >>>> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> >>>> We have covered the Initial agreed requirements in this patchset.
> >>>> Patchset borrows Mikulas's token based approach for 2 bdev
> >>>> implementation.
> >>>>
> >>>> Overall series supports –
> >>>>
> >>>> 1. Driver
> >>>> - NVMe Copy command (single NS), including support in nvme-target (for
> >>>>     block and file backend)
> >>>
> >>> It would also be nice to have copy offload emulation in null_blk for testing.
> >>>
> >>
> >> We can plan this in next phase of copy support, once this series settles down.
> > 
> > Why not just hook the loopback driver up to copy_file_range() so
> > that the backend filesystem can just reflink copy the ranges being
> > passed? That would enable testing on btrfs, XFS and NFSv4.2 hosted
> > image files without needing any special block device setup at all...
> 
> That is a very good idea ! But that will cover only the non-zoned case. For copy
> offload on zoned devices, adding support in null_blk is probably the simplest
> thing to do.

Sure, but that's a zoned device implementation issue, not a "how do
applications use this offload" issue.

i.e. zonefs support is not necessary to test the bio/block layer
interfaces at all. All we need is a block device that can decode the
bio-encoded offload packet and execute it to do full block layer
testing. We can build dm devices on top of loop devices, etc, so we
can test that the offload support is plumbed, sliced, diced, and
regurgitated correctly that way. We don't need actual low level
device drivers to test this.

And, unlike the nullblk device, using the loopback device w/
copy_file_range() will also allow data integrity testing if a
generic copy_file_range() offload implementation is added. That is,
we test a non-reflink capable filesystem on the loop device with the
image file hosted on a reflink-capable filesystem. The upper
filesystem copy then gets offloaded to reflinks in the lower
filesystem. We already have copy_file_range() support in fsx, so all
the data integrity fsx tests in fstests will exercise this offload
path and find all the data corruptions the initial block layer bugs
expose...

Further, fsstress also has copy_file_range() support, and so all the
fstests that generate stress tests or use fsstress as load for
failure testing will also exercise it.

Indeed, this then gives us fine-grained error injection capability
within fstests via devices like dm-flakey. What happens when
dm-flakey kills the device IO mid-offload? Does everything recover
correctly? Do we end up with data corruption? Are partial offload
completions when errors occur signalled correctly? Is there -any-
test coverage (or even capability for testing) of user driven copy
offload failure situations like this in any of the other test
suites?

I mean, once the loop device has cfr offload, we can use dm-flakey
to kill IO in the image file or even do a force shutdown of the
image host filesystem. Hence we can actually trash the copy offload
operation in mid-flight, not just error it out on full completion.
This is trivial to do with the fstests infrastructure - it just
relies on having generic copy_file_range() block offload support and
a loopback device offload of hardware copy bios back to
copy_file_range()....

This is what I mean about copy offload being designed the wrong way.
We have the high level hooks needed to implement it right through the
filesystems and block layer without any specific hardware support,
and we can test the whole stack without needing specific hardware
support. We already have filesystem level copy offload acceleration,
so the last thing we want to see is a block layer offload
implementation that is incompatible with the semantics we've already
exposed to userspace for copy offloads.

As I said:

> > i.e. I think you're doing this compeltely backwards by trying to
> > target non-existent hardware first....

Rather than tie the block layer offload function/implementation to
the specific quirks of a specific target hardware, we should be
adding generic support in the block layer for the copy offload
semantics we've already exposed to userspace. We already have test
coverage and infrastructure for this interface and is already in use
by applications.

Transparent hardware acceleration of data copies when the hardware
supports it is exactly where copy offloads are useful - implementing
support based around hardware made of unobtainium and then adding
high level user facing API support as an afterthought is putting the
cart before the horse. We need to make sure the high level
functionality is robust and handles errors correctly before we even
worry about what quirks the hardware might bring to the table.

Build a reference model first with the loop device and
copy-file-range, test it, validate it, make sure it all works. Then
hook up the hardware, and fix all the hardware bugs that are exposed
before the hardware is released to the general public....

Why haven't we learnt this lesson yet from all the problems we've
had with, say, broken discard/trim, zeroing, erase, etc in hardware
implementations, incompatible hardware protocol implementations of
equivalent functionality, etc? i.e. We haven't defined the OS
required behaviour that hardware must support and instead just tried
to make whatever has come from the hardware vendor's
"standardisation" process work ok?

In this case, we already have a functioning model, syscalls and user
applications making use of copy offloads at the OS level. Now we
need to implement those exact semantics at the block layer to build
a validated reference model for the block layer offload behaviour
that hardware must comply with. Then hardware offloads in actual
hardware can be compared and validated against the reference model
behaviour, and any hardware that doesn't match can be
quirked/blacklisted until the manufacturer fixes their firmware...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

end of thread, other threads:[~2022-05-02 23:23 UTC | newest]

Thread overview: 101+ messages
     [not found] <CGME20220426101804epcas5p4a0a325d3ce89e868e4924bbdeeba6d15@epcas5p4.samsung.com>
2022-04-26 10:12 ` [PATCH v4 00/10] Add Copy offload support Nitesh Shetty
2022-04-26 10:12   ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12   ` Nitesh Shetty
     [not found]   ` <CGME20220426101910epcas5p4fd64f83c6da9bbd891107d158a2743b5@epcas5p4.samsung.com>
2022-04-26 10:12     ` [PATCH v4 01/10] block: Introduce queue limits for copy-offload support Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-27  1:59       ` Damien Le Moal
2022-04-27  1:59         ` [dm-devel] " Damien Le Moal
2022-04-27 15:30         ` Nitesh Shetty
2022-04-27 15:30           ` [dm-devel] " Nitesh Shetty
2022-04-27 21:57           ` Damien Le Moal
2022-04-27 21:57             ` [dm-devel] " Damien Le Moal
2022-04-27 10:30       ` Hannes Reinecke
2022-04-27 10:30         ` Hannes Reinecke
     [not found]   ` <CGME20220426101921epcas5p341707619b5e836490284a42c92762083@epcas5p3.samsung.com>
2022-04-26 10:12     ` [PATCH v4 02/10] block: Add copy offload support infrastructure Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-27  0:11       ` kernel test robot
2022-04-27  0:11         ` [dm-devel] " kernel test robot
2022-04-27  2:45       ` Damien Le Moal
2022-04-27  2:45         ` [dm-devel] " Damien Le Moal
2022-04-27 15:15         ` Nitesh Shetty
2022-04-27 15:15           ` [dm-devel] " Nitesh Shetty
2022-04-27 22:04           ` Damien Le Moal
2022-04-27 22:04             ` [dm-devel] " Damien Le Moal
2022-04-28  8:01             ` Nitesh Shetty
2022-04-28  8:01               ` [dm-devel] " Nitesh Shetty
2022-04-27 10:29       ` Hannes Reinecke
2022-04-27 10:29         ` Hannes Reinecke
2022-04-27 15:48         ` Nitesh Shetty
2022-04-27 15:48           ` [dm-devel] " Nitesh Shetty
     [not found]   ` <CGME20220426101938epcas5p291690dd1f0e931cd9f8139daaf3f9296@epcas5p2.samsung.com>
2022-04-26 10:12     ` [PATCH v4 03/10] block: Introduce a new ioctl for copy Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-27  2:48       ` Damien Le Moal
2022-04-27  2:48         ` [dm-devel] " Damien Le Moal
2022-04-27 13:03         ` Nitesh Shetty
2022-04-27 13:03           ` [dm-devel] " Nitesh Shetty
2022-04-27 10:37       ` Hannes Reinecke
2022-04-27 10:37         ` Hannes Reinecke
     [not found]   ` <CGME20220426101951epcas5p1f53a2120010607354dc29bf8331f6af8@epcas5p1.samsung.com>
2022-04-26 10:12     ` [PATCH v4 04/10] block: add emulation " Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-27  1:33       ` kernel test robot
2022-04-27  1:33         ` [dm-devel] " kernel test robot
     [not found]   ` <CGME20220426102001epcas5p4e321347334971d704cb19ffa25f9d0b4@epcas5p4.samsung.com>
2022-04-26 10:12     ` [PATCH v4 05/10] nvme: add copy offload support Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-28 14:02       ` kernel test robot
2022-04-28 14:02         ` [dm-devel] " kernel test robot
     [not found]   ` <CGME20220426102009epcas5p3e5b1ddfd5d3c7200972cecb139650da6@epcas5p3.samsung.com>
2022-04-26 10:12     ` [PATCH v4 06/10] nvmet: add copy command support for bdev and file ns Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-28 14:53       ` kernel test robot
2022-04-28 14:53         ` [dm-devel] " kernel test robot
     [not found]   ` <CGME20220426102017epcas5p295d3b62eaa250765e48c767962cbf08b@epcas5p2.samsung.com>
2022-04-26 10:12     ` [PATCH v4 07/10] dm: Add support for copy offload Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-28 15:54       ` kernel test robot
2022-04-28 15:54         ` [dm-devel] " kernel test robot
     [not found]   ` <CGME20220426102025epcas5p299d9a88c30db8b9a04a05c57dc809ff7@epcas5p2.samsung.com>
2022-04-26 10:12     ` [PATCH v4 08/10] dm: Enable copy offload for dm-linear target Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
     [not found]   ` <CGME20220426102033epcas5p137171ff842e8b0a090d2708cfc0e3249@epcas5p1.samsung.com>
2022-04-26 10:12     ` [PATCH v4 09/10] dm kcopyd: use copy offload support Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
     [not found]   ` <CGME20220426102042epcas5p201aa0d9143d7bc650ae7858383b69288@epcas5p2.samsung.com>
2022-04-26 10:12     ` [PATCH v4 10/10] fs: add support for copy file range in zonefs Nitesh Shetty
2022-04-26 10:12       ` [dm-devel] " Nitesh Shetty
2022-04-26 10:12       ` Nitesh Shetty
2022-04-27  1:42       ` Damien Le Moal
2022-04-27  1:42         ` [dm-devel] " Damien Le Moal
2022-04-27  1:46   ` [PATCH v4 00/10] Add Copy offload support Damien Le Moal
2022-04-27  1:46     ` [dm-devel] " Damien Le Moal
2022-04-27 15:38     ` Nitesh Shetty
2022-04-27 15:38       ` [dm-devel] " Nitesh Shetty
2022-04-27 21:56       ` Damien Le Moal
2022-04-27 21:56         ` [dm-devel] " Damien Le Moal
2022-04-27  2:00   ` Damien Le Moal
2022-04-27  2:00     ` [dm-devel] " Damien Le Moal
2022-04-27  2:19   ` Damien Le Moal
2022-04-27  2:19     ` [dm-devel] " Damien Le Moal
2022-04-27 12:49     ` Nitesh Shetty
2022-04-27 12:49       ` [dm-devel] " Nitesh Shetty
2022-04-27 22:05       ` Damien Le Moal
2022-04-27 22:05         ` [dm-devel] " Damien Le Moal
2022-04-28  7:49         ` Nitesh Shetty
2022-04-28  7:49           ` [dm-devel] " Nitesh Shetty
2022-04-28 21:37           ` Damien Le Moal
2022-04-28 21:37             ` [dm-devel] " Damien Le Moal
2022-04-29  3:39             ` Bart Van Assche
2022-04-29  3:39               ` Bart Van Assche
2022-05-02  4:09       ` Dave Chinner
2022-05-02  4:09         ` [dm-devel] " Dave Chinner
2022-05-02 12:54         ` Damien Le Moal
2022-05-02 12:54           ` [dm-devel] " Damien Le Moal
2022-05-02 23:20           ` Dave Chinner
2022-05-02 23:20             ` [dm-devel] " Dave Chinner
2022-05-02 12:14       ` Damien Le Moal
2022-05-02 12:14         ` Damien Le Moal
2022-05-02 12:16         ` Damien Le Moal
2022-05-02 12:16           ` Damien Le Moal
