linux-block.vger.kernel.org archive mirror
* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2021-05-11  0:15 Chaitanya Kulkarni
  2021-05-11 21:15 ` Knight, Frederick
                   ` (9 more replies)
  0 siblings, 10 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-11  0:15 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded
performance is limited by the end of Dennard scaling and multi-threaded
performance is slowing down due to Moore's law limitations. With the rise
of the SNIA Computational Storage Technical Work Group (TWG) [2],
offloading computations to the device or over the fabrics is becoming
popular as there are several solutions available [2]. One common
operation that is in demand in the kernel but is not merged yet is Copy
offload over the fabrics or onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is available here [3]. The
latest work, posted by Mikulas [4], is not merged yet. These two
approaches are totally different from each other. Several storage
vendors discourage mixing copy offload requests with regular READ/WRITE
I/O. Also, the fact that the operation fails if a copy request ever
needs to be split as it traverses the stack has the unfortunate
side effect of preventing copy offload from working in pretty much
every common deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

With [3] it is hard to handle arbitrary DM/MD stacking without
splitting the command in two, one for copying IN and one for copying
OUT. [4] then demonstrates why [3] is not a suitable
candidate. Also, with [4] there is an unresolved problem with the
two-command approach: how to handle changes to the DM layout
between the IN and OUT operations.

* Why Linux Kernel Storage System needs Copy Offload support now ?
-----------------------------------------------------------------------

The SNIA Computational Storage TWG and its solutions [2], the
existing SCSI XCopy support in the protocol, recent advancements in
Linux Kernel file systems for Zoned devices (Zonefs [5]), Peer to Peer
DMA support in the Linux Kernel, mainly for NVMe devices [7], and
eventually NVMe devices and subsystems (NVMe PCIe/NVMeOF) will all
benefit from a Copy offload operation.

With this background we have a significant number of use-cases which are
strong candidates waiting for the outstanding Linux Kernel Block Layer Copy
Offload support, so that the Linux Kernel Storage subsystem can address the
previously mentioned problems [1] and allow efficient offloading of
data-related operations (such as move/copy).

For reference, the following is the list of use-cases/candidates waiting
for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session ?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation ?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward ?
5. How can we help to contribute and move this work forward ?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

Regards,
Chaitanya

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf


^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2021-05-11 21:15 ` Knight, Frederick
  2021-05-12  2:21 ` Bart Van Assche
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 62+ messages in thread
From: Knight, Frederick @ 2021-05-11 21:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	Hannes Reinecke, kbusch, rwheeler, hch, zach.brown, osandov

I'd love to participate in this discussion.

You mention the 2 different models (single command vs. multi-command).  Just as a reminder, there are specific reasons for those 2 different models.

Some applications know both the source and the destination, so can use the single command model (the application is aware it is doing a copy).  But, there is a group of applications that do NOT know both pieces of information at the same time, in the same thread, in the same context (the application is NOT aware it is doing a copy - the application thinks it is doing reads and writes).

That is why there are 2 different models - because the application engineers didn't want to change their application.  So, the author of the CP application (the shell copy command) wanted to use the existing READ / WRITE model (2 commands).  Just replace the READ with "get the data ready" and replace the WRITE with "use the data you got ready".  It was easier for that application to use the existing model, rather than totally redesigning the application.

But, other application engineers had a code base that already knew a copy was happening, and their code already knew both the source and destination in the same code path. A BACKUP application is one that generally fits into this camp.  So, it was easier for that application to replace that function with a single copy request.  Another application was a VM mastering/replicating application that could spin up new VM images very quickly - the source and destination are known to be able to use a single request.

When this offload journey began, both interfaces were needed and used.  But yes, it did bifurcate the space, creating 2 camps of engineers - each with their favorite method (based on the application where they planned to use it).  Each camp of engineers often sees no reason that the other camp can't just switch to do it the way they do - if they'd only see the light.  But, originally, there were 2 different sets of requirements that each drove a specific design of a copy offload model.
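
As a rough illustration of the two models described above, here is a toy sketch using an in-memory device. It is not any real kernel, SCSI, or NVMe interface: the class `ToyDevice` and the method names `copy`, `copy_in`, and `copy_out` are hypothetical, and the token stands in for whatever handle a real two-command scheme would return.

```python
import secrets


class ToyDevice:
    """In-memory stand-in for a storage device; purely illustrative."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self._tokens = {}  # token -> blocks captured by copy_in()

    def write(self, lba, data_blocks):
        for i, block in enumerate(data_blocks):
            self.blocks[lba + i] = block

    # Single-command model: the caller knows source and destination together.
    def copy(self, src_lba, dst_lba, nr_blocks):
        self.blocks[dst_lba:dst_lba + nr_blocks] = \
            self.blocks[src_lba:src_lba + nr_blocks]

    # Two-command model, step 1: takes the place of READ
    # ("get the data ready"). Only a token travels back to the caller.
    def copy_in(self, src_lba, nr_blocks):
        token = secrets.token_hex(8)
        self._tokens[token] = self.blocks[src_lba:src_lba + nr_blocks]
        return token

    # Two-command model, step 2: takes the place of WRITE
    # ("use the data you got ready").
    def copy_out(self, token, dst_lba):
        data = self._tokens.pop(token)
        self.blocks[dst_lba:dst_lba + len(data)] = data


dev = ToyDevice(num_blocks=8)
dev.write(0, [b"AAAA", b"BBBB"])

# Backup-style caller: one request carries both ends of the copy.
dev.copy(src_lba=0, dst_lba=4, nr_blocks=2)

# cp-style caller: keeps its read-then-write structure, moving only a token.
token = dev.copy_in(src_lba=0, nr_blocks=2)
dev.copy_out(token, dst_lba=6)
```

In both cases the data never passes through the caller; the difference is only whether the caller has to know the source and destination at the same moment.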

Even NVMe has recently joined the copy offload camp with a new COPY command (single namespace, multiple source ranges, single destination range - works well for defrag, and other use cases). I'm confident its capabilities will grow over time.
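
The shape of that request (one namespace, several source ranges, one contiguous destination) can be modelled as below. This is a minimal sketch over a Python list and says nothing about the actual command format; the function name `simple_copy` and its parameters are assumptions made for illustration.

```python
def simple_copy(namespace, source_ranges, dest_lba):
    """Gather several source ranges into one contiguous destination range.

    namespace     -- list of blocks modelling a single namespace
    source_ranges -- list of (start_lba, nr_blocks) tuples
    dest_lba      -- first LBA of the single destination range
    Returns the number of blocks copied.
    """
    gathered = []
    for start_lba, nr_blocks in source_ranges:
        if start_lba < 0 or start_lba + nr_blocks > len(namespace):
            raise ValueError("source range outside namespace")
        gathered.extend(namespace[start_lba:start_lba + nr_blocks])
    if dest_lba < 0 or dest_lba + len(gathered) > len(namespace):
        raise ValueError("destination range outside namespace")
    namespace[dest_lba:dest_lba + len(gathered)] = gathered
    return len(gathered)


# Defrag-style use: live blocks scattered at LBAs 0, 2 and 4 are packed
# into the free area starting at LBA 6.
ns = ["a", "-", "b", "-", "c", "-", "-", "-", "-"]
copied = simple_copy(ns, [(0, 1), (2, 1), (4, 1)], dest_lba=6)
```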

SO, I think this will be a great discussion to have!!!

	Fred Knight




* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
  2021-05-11 21:15 ` Knight, Frederick
@ 2021-05-12  2:21 ` Bart Van Assche
       [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 62+ messages in thread
From: Bart Van Assche @ 2021-05-12  2:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/10/21 5:15 PM, Chaitanya Kulkarni wrote:
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?

Are there any blockers left? My understanding is that what is needed is
to implement what has been proposed recently
(https://lore.kernel.org/linux-nvme/yq1blf3smcl.fsf@ca-mkp.ca.oracle.com/).
Anyway, I'm interested to attend the conversation about this topic.

Thanks,

Bart.


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
       [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
@ 2021-05-12  7:13   ` Javier González
  0 siblings, 0 replies; 62+ messages in thread
From: Javier González @ 2021-05-12  7:13 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, martin.petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Kanchan Joshi, SelvaKumar S



I would like to participate in this discussion too.

Cc'ing Selva and Kanchan, who have been posting several series for NVMe
Simple Copy (SCC). Even though SCC is a very narrow use-case of
copy offload, it seems like a good starting point for getting generic code
into the block layer.

Javier




* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (2 preceding siblings ...)
       [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
@ 2021-05-12  7:30 ` Johannes Thumshirn
       [not found]   ` <CGME20210928191342eucas1p23448dcd51b23495fa67cdc017e77435c@eucas1p2.samsung.com>
  2021-05-12 15:23 ` Hannes Reinecke
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 62+ messages in thread
From: Johannes Thumshirn @ 2021-05-12  7:30 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov


I would like to participate in this discussion as well. A generic block layer
copy API is extremely helpful for filesystem garbage collection and copy operations
like copy_file_range().


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (3 preceding siblings ...)
  2021-05-12  7:30 ` Johannes Thumshirn
@ 2021-05-12 15:23 ` Hannes Reinecke
  2021-05-12 15:45 ` Himanshu Madhani
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 62+ messages in thread
From: Hannes Reinecke @ 2021-05-12 15:23 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/11/21 2:15 AM, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
The neverending topic.

Count me in.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (4 preceding siblings ...)
  2021-05-12 15:23 ` Hannes Reinecke
@ 2021-05-12 15:45 ` Himanshu Madhani
  2021-05-17 16:39 ` Kanchan Joshi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 62+ messages in thread
From: Himanshu Madhani @ 2021-05-12 15:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, Martin Petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov



> On May 10, 2021, at 7:15 PM, Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> wrote:
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.

I would like to participate in this discussion as well. 

--
Himanshu Madhani	 Oracle Linux Engineering



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (5 preceding siblings ...)
  2021-05-12 15:45 ` Himanshu Madhani
@ 2021-05-17 16:39 ` Kanchan Joshi
  2021-05-18  0:15 ` Bart Van Assche
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 62+ messages in thread
From: Kanchan Joshi @ 2021-05-17 16:39 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, martin.petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov

> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite file system, block layer, and device drivers
> developers to:-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
I'd like to participate in the discussion.
Hopefully we can get consensus on some elements (or discover new
issues) before Dec.
An async interface (via io_uring) would be good to discuss while
we are at it.


-- 
Kanchan


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (6 preceding siblings ...)
  2021-05-17 16:39 ` Kanchan Joshi
@ 2021-05-18  0:15 ` Bart Van Assche
  2021-06-11  6:03 ` Chaitanya Kulkarni
  2021-06-11 15:35 ` Nikos Tsironis
  9 siblings, 0 replies; 62+ messages in thread
From: Bart Van Assche @ 2021-05-18  0:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/10/21 5:15 PM, Chaitanya Kulkarni wrote:
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?

We need to achieve agreement about an approach. The text below is my
attempt at guiding the discussion. An HTML version is available at
https://github.com/bvanassche/linux-kernel-copy-offload. As usual,
feedback is welcome.

Bart.


# Implementing Copy Offloading in the Linux Kernel

## Introduction

Efforts to add copy offloading support to the Linux kernel started a
considerable time ago. Despite this, copy offloading support is not yet
upstream and there is no detailed plan yet for how to implement it.

This document outlines a possible implementation. Its purpose is to help
guide the conversations around copy offloading.

## Block Layer

We need an interface to pass copy offload requests from user space or file
systems to block drivers. Although the first implementation of copy offloading
added a single operation to the block layer for copy offloading, there seems
to be agreement today to implement copy offloading as two operations,
namely `REQ_COPY_IN` and `REQ_COPY_OUT`.

A possible approach is as follows:

* Fall back to a non-offloaded copy operation if necessary, e.g. if copy
  offloading is not supported or if data is encrypted and the ciphertext
  depends on the LBA. The following code may be a good starting point:
  `drivers/md/dm-kcopyd.c`.
* If the block driver supports copy offloading, submit the `REQ_COPY_IN`
  operation first. The block driver stores the data ranges associated with the
  `REQ_COPY_IN` operation.
* Wait for completion of the `REQ_COPY_IN` operation.
* After the `REQ_COPY_IN` operation has completed, submit the `REQ_COPY_OUT`
  operation and include a reference to the `REQ_COPY_IN` operation. If the
  block driver that receives the `REQ_COPY_OUT` operation has received a matching
  `REQ_COPY_IN` operation, offload the copy operation. Otherwise report that no
  data has been copied and let the block layer perform a non-offloaded copy
  operation.
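
The sequence above can be modeled in user space. The following is a minimal
sketch and not kernel code: `struct toy_dev`, `toy_copy_in()`, `toy_copy_out()`
and `toy_copy()` are hypothetical names, a 64-bit id stands in for the
reference from `REQ_COPY_OUT` to `REQ_COPY_IN`, and `memmove()` stands in for
both the device-side copy and the onloaded read followed by a write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a block device and its driver state. */
struct toy_dev {
	uint8_t *data;     /* backing store */
	size_t size;       /* size of the backing store in bytes */
	bool offload;      /* does the "driver" support copy offloading? */
	bool in_valid;     /* has a COPY_IN been stored? */
	uint64_t in_id;    /* id of the stored COPY_IN */
	size_t in_off;     /* source offset recorded by COPY_IN */
	size_t in_len;     /* length recorded by COPY_IN */
};

enum toy_result { TOY_FAILED, TOY_OFFLOADED, TOY_ONLOADED };

static bool toy_range_ok(const struct toy_dev *d, size_t off, size_t len)
{
	return off <= d->size && len <= d->size - off;
}

/* COPY_IN: only record the source range; no data moves yet. */
static bool toy_copy_in(struct toy_dev *d, uint64_t id, size_t off, size_t len)
{
	if (!d->offload || !toy_range_ok(d, off, len))
		return false;
	d->in_valid = true;
	d->in_id = id;
	d->in_off = off;
	d->in_len = len;
	return true;
}

/* COPY_OUT: copy only if a matching COPY_IN was seen. Returns bytes copied. */
static size_t toy_copy_out(struct toy_dev *d, uint64_t id, size_t dst)
{
	if (!d->in_valid || d->in_id != id)
		return 0;
	d->in_valid = false;
	if (!toy_range_ok(d, dst, d->in_len))
		return 0;
	memmove(d->data + dst, d->data + d->in_off, d->in_len);
	return d->in_len;
}

/* Try the offloaded path first; fall back to a host-side copy. */
static enum toy_result toy_copy(struct toy_dev *d, size_t src, size_t dst,
				size_t len)
{
	static uint64_t next_id = 1;
	uint64_t id = next_id++;

	if (!toy_range_ok(d, src, len) || !toy_range_ok(d, dst, len))
		return TOY_FAILED;
	if (toy_copy_in(d, id, src, len) && toy_copy_out(d, id, dst) == len)
		return TOY_OFFLOADED;
	/* Non-offloaded fallback: the host moves the data itself. */
	memmove(d->data + dst, d->data + src, len);
	return TOY_ONLOADED;
}
```

The point of the model is that `toy_copy_out()` reports zero bytes copied
whenever it has no matching `toy_copy_in()`, which lets the caller decide to
onload without needing a separate error path.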

The operation type is stored in the low bits of the `bi_opf` member of struct
bio.  With each bio a single data buffer and a single contiguous byte range on
the storage medium are associated. Pointers to the data buffer occur in
`bi_io_vec[]`. The affected byte range is represented by `bi_iter.bi_sector` and
`bi_iter.bi_size`.

The NVMe and SCSI copy offload commands both support multiple source ranges.
However, XCOPY supports multiple destination ranges, while the NVMe simple copy
command supports a single destination range.

Possible approaches for passing the data ranges involved in a copy operation
from the block layer to block drivers are as follows:

* Attach a bio to each copy offload request and encode all relevant copy
  offload parameters in that data buffer. These parameters include source
  device and source ranges for `REQ_COPY_IN` and destination device and
  destination ranges for `REQ_COPY_OUT`. Let the block drivers translate these
  parameters into something the storage device understands (NVMe simple copy
  parameters or SCSI XCOPY parameters). Fill in the parameter structure size
  in `bi_iter.bi_size`. Set `bi_vcnt` to 1 and fill in `bio->bi_io_vec[0]`.
* Map each source range and each destination range onto a different bio. Link
  all the bios with the `bi_next` pointer and attach these bios to the copy
  offload requests. Leave `bi_vcnt` zero. This is related but not identical to
  the approach followed by `__blkdev_issue_discard()`.

I think that the first approach would require more changes in the device mapper
than the second approach since the device mapper code knows how to split bios
but not how to split a buffer with LBA range descriptors.
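
To make the first approach more concrete, here is a rough sketch of what such
a parameter buffer could look like. Everything in it is an assumption made for
illustration: `struct toy_copy_range`, `struct toy_copy_payload` and the helper
functions are hypothetical and do not exist in the kernel. It also shows that
the payload size, which would end up in `bi_iter.bi_size`, differs from the
number of bytes affected on the medium.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SECTOR_SHIFT 9 /* 512-byte sectors, as in the block layer */

/* Hypothetical descriptor for one contiguous LBA range. */
struct toy_copy_range {
	uint64_t sector;
	uint64_t nr_sectors;
};

/* Hypothetical parameter buffer that a bio would point at. */
struct toy_copy_payload {
	uint32_t nr_ranges;
	uint32_t reserved;
	struct toy_copy_range ranges[];
};

/* Size of the parameter buffer, i.e. the candidate value for bi_size. */
static size_t toy_payload_size(uint32_t nr_ranges)
{
	return sizeof(struct toy_copy_payload) +
	       (size_t)nr_ranges * sizeof(struct toy_copy_range);
}

/* Allocate a zeroed payload with room for nr_ranges descriptors. */
static struct toy_copy_payload *toy_payload_alloc(uint32_t nr_ranges)
{
	struct toy_copy_payload *p = calloc(1, toy_payload_size(nr_ranges));

	if (p)
		p->nr_ranges = nr_ranges;
	return p;
}

/* Number of bytes on the medium described by the payload. */
static uint64_t toy_payload_bytes_on_medium(const struct toy_copy_payload *p)
{
	uint64_t sectors = 0;

	for (uint32_t i = 0; i < p->nr_ranges; i++)
		sectors += p->ranges[i].nr_sectors;
	return sectors << TOY_SECTOR_SHIFT;
}
```

Code that derives the affected byte count from `bi_iter.bi_size` would have to
call something like `toy_payload_bytes_on_medium()` instead for copy requests.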

The following code needs to be modified no matter how copy offloading is
implemented:

* Request cloning. The code for checking the limits before requests are cloned
  compares `blk_rq_sectors()` with `max_sectors`. This is inappropriate for
  `REQ_COPY_*` requests.
* Request splitting. `bio_split()` assumes that `bi_iter.bi_size` represents
  the number of bytes affected on the medium.
* Code related to retrying the original requests of a merged request with
  mixed failfast attributes, e.g. `blk_rq_err_bytes()`.
* Code related to partially completing a request, e.g. `blk_update_request()`.
* The code for merging block layer requests.
* `blk_mq_end_request()` since it calls `blk_update_request()` and
  `blk_rq_bytes()`.
* The plugging code because of the following test in the plugging code:
  `blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE`.
* The I/O accounting code (`task_io_account_read()`) since that code uses
  `bio_has_data()` and hence skips discard, secure erase and write zeroes
  requests:
```
static inline bool bio_has_data(struct bio *bio)
{
	return bio && bio->bi_iter.bi_size &&
	    bio_op(bio) != REQ_OP_DISCARD &&
	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
	    bio_op(bio) != REQ_OP_WRITE_ZEROES;
}
```

Block drivers will need to use the `special_vec` member of struct request to
pass the copy offload parameters to the storage device. That member is used
e.g. when a REQ_OP_DISCARD operation is submitted to an NVMe driver. The SCSI
sd driver uses `special_vec` while processing an UNMAP or WRITE SAME command.

## Device Mapper

The device mapper may have to split a request. As an example, LVM is
based on the dm-linear driver. A request that is submitted to an LVM volume
has to be split if it affects multiple block devices. Copy offload requests
that affect multiple block devices should be split or should be onloaded.

The call chain for bio-based dm drivers is as follows:
```
dm_submit_bio(bio)
-> __split_and_process_bio(md, map, bio)
  -> __split_and_process_non_flush(clone_info)
    -> __clone_and_map_data_bio(clone_info, target_info, sector, len)
      -> clone_bio(dm_target_io, bio, sector, len)
      -> __map_bio(dm_target_io)
        -> ti->type->map(dm_target_io, clone)
```
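
As an illustration of the splitting described at the start of this section,
the sketch below maps one LBA range of a dm-linear style volume onto the
underlying devices. It is a user-space model under stated assumptions:
`struct toy_target`, `struct toy_piece` and `toy_split()` are hypothetical
names, and the target table is assumed to be sorted by start sector.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear target: volume sectors [begin, begin + len). */
struct toy_target {
	uint64_t begin;      /* first sector in the volume */
	uint64_t len;        /* number of sectors */
	int dev;             /* index of the underlying device */
	uint64_t dev_offset; /* first sector on the underlying device */
};

/* One piece of a split range, expressed in device sectors. */
struct toy_piece {
	int dev;
	uint64_t sector;
	uint64_t nr_sectors;
};

/*
 * Split [sector, sector + nr) across the targets in t[]. Returns the number
 * of pieces written to out[], or 0 if the range is not fully mapped or does
 * not fit in out[].
 */
static size_t toy_split(const struct toy_target *t, size_t nt,
			uint64_t sector, uint64_t nr,
			struct toy_piece *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nt && nr > 0; i++) {
		uint64_t end = t[i].begin + t[i].len;
		uint64_t chunk;

		if (sector < t[i].begin || sector >= end)
			continue;
		chunk = end - sector;
		if (chunk > nr)
			chunk = nr;
		if (n == max_out)
			return 0;
		out[n].dev = t[i].dev;
		out[n].sector = t[i].dev_offset + (sector - t[i].begin);
		out[n].nr_sectors = chunk;
		n++;
		sector += chunk;
		nr -= chunk;
	}
	return nr == 0 ? n : 0;
}
```

If a source or destination range yields pieces on more than one device, the
copy either has to be issued as several smaller copy operations or onloaded.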

## NVMe

Process copy offload commands by translating REQ_COPY_OUT requests into simple
copy commands.
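
A sketch of one step of that translation: filling in a single source range
entry. The layout used here (32 bytes; starting LBA at byte offset 8; a
zero-based 16-bit block count at byte offset 16; all fields little-endian) is
my reading of descriptor format 0h in TP 4065 and should be verified against
the specification. `toy_put_le()` and `toy_nvme_fill_src_range()` are
hypothetical helpers, not NVMe driver functions, and the protection
information fields are simply left zero.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SRC_RANGE_SIZE 32 /* assumed size of one source range entry */

/* Store the low nbytes of v at p in little-endian byte order. */
static void toy_put_le(uint8_t *p, uint64_t v, unsigned int nbytes)
{
	for (unsigned int i = 0; i < nbytes; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Fill one source range entry. nr_blocks is a one-based count; the entry
 * stores it zero-based in 16 bits, so valid values are 1..65536.
 * Returns 0 on success and -1 if nr_blocks is out of range.
 */
static int toy_nvme_fill_src_range(uint8_t entry[TOY_SRC_RANGE_SIZE],
				   uint64_t slba, uint32_t nr_blocks)
{
	if (nr_blocks == 0 || nr_blocks > 65536)
		return -1;
	memset(entry, 0, TOY_SRC_RANGE_SIZE);
	toy_put_le(entry + 8, slba, 8);           /* starting LBA */
	toy_put_le(entry + 16, nr_blocks - 1, 2); /* zero-based block count */
	return 0;
}
```

Writing the entry byte by byte keeps the sketch independent of host
endianness; a real driver would use the kernel's `__le64`/`__le16` types.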

## SCSI

From inside `sd_revalidate_disk()`, query the third-party copy VPD page. Extract
the following parameters (see also SPC-6):

* MAXIMUM CSCD DESCRIPTOR COUNT
* MAXIMUM SEGMENT DESCRIPTOR COUNT
* MAXIMUM DESCRIPTOR LIST LENGTH
* Supported third-party copy commands.
* SUPPORTED CSCD DESCRIPTOR ID (0 or more)
* ROD type descriptor (0 or more)
* TOTAL CONCURRENT COPIES
* MAXIMUM IDENTIFIED CONCURRENT COPIES
* MAXIMUM SEGMENT LENGTH
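
Once these values are known, the driver can decide whether a copy request fits
in a single command. The sketch below shows the arithmetic for two of the
limits only. `struct toy_tpc_limits` and `toy_segments_needed()` are
hypothetical, the segment length is assumed to be expressed in bytes, and
special values such as zero are not interpreted; SPC-6 defines their meaning.

```c
#include <stdint.h>

/* Hypothetical container for a subset of the values listed above. */
struct toy_tpc_limits {
	uint32_t max_segment_descriptor_count;
	uint64_t max_segment_length; /* assumed to be in bytes */
};

/*
 * Number of segment descriptors needed to copy `bytes` bytes, or 0 if the
 * copy does not fit in one command (or the limits are unusable here).
 */
static uint32_t toy_segments_needed(const struct toy_tpc_limits *l,
				    uint64_t bytes)
{
	uint64_t n;

	if (bytes == 0 || l->max_segment_length == 0)
		return 0;
	/* Round up without risking overflow in bytes + length - 1. */
	n = bytes / l->max_segment_length +
	    (bytes % l->max_segment_length != 0);
	return n <= l->max_segment_descriptor_count ? (uint32_t)n : 0;
}
```

A return value of 0 would mean that the request has to be split into several
commands or onloaded.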

From inside `sd_init_command()`, translate REQ_COPY_OUT into either EXTENDED
COPY or POPULATE TOKEN + WRITE USING TOKEN.

Set the parameters in the copy offload commands as follows:

* We may have to set the STR bit. From SPC-6: "A sequential striped (STR) bit
  set to one specifies to the copy manager that the majority of the block
  device references in the parameter list represent sequential access of
  several block devices that are striped. This may be used by the copy manager
  to perform reads from a copy source block device at any time and in any
  order during processing of an EXTENDED COPY command as described in
  6.6.5.3. A STR bit set to zero specifies to the copy manager that disk
  references, if any, may not be sequential."
* Set the LIST ID USAGE field to 3 and the LIST ID to 0. This means that
  neither "held data" nor the RECEIVE COPY STATUS command are supported. This
  improves security because the data that is being copied cannot be accessed
  via the LIST ID.
* We may have to set the G_SENSE (good with sense data) bit. From SPC-6: "If
  the G_SENSE bit is set to one and the copy manager completes the EXTENDED
  COPY command with GOOD status, then the copy manager shall include sense
  data with the GOOD status in which the sense key is set to COMPLETED, the
  additional sense code is set to EXTENDED COPY INFORMATION AVAILABLE, and the
  COMMAND-SPECIFIC INFORMATION field is set to the number of segment
  descriptors the copy manager has processed."
* Clear the IMMED bit.

## System Call Interface

To submit copy offload requests from user space, we need:

* A system call for passing these requests, e.g. copy_file_range() or io_uring.
* A copy offload parameter format description in the user space ABI. The
  parameters include source device, source ranges, destination device and
  destination ranges.
* A flag that indicates whether or not it is acceptable to fall back to
  onloading the copy operation.
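
For reference, this is roughly how the existing `copy_file_range()` system
call is used from user space today. The wrapper is a sketch:
`copy_fd_range()` and its `allow_fallback` parameter are hypothetical and only
imitate, in user space, the kind of flag proposed in the last item.
`copy_file_range()` itself currently requires its `flags` argument to be 0.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes from fd_in at off_in to fd_out at off_out.
 * Returns the number of bytes copied (less than len at end of file) or -1
 * with errno set. If copy_file_range() fails and allow_fallback is true,
 * the data is copied with pread()/pwrite() instead.
 */
static ssize_t copy_fd_range(int fd_in, off_t off_in, int fd_out,
			     off_t off_out, size_t len, bool allow_fallback)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		off_t in = off_in + (off_t)done;
		off_t out = off_out + (off_t)done;
		ssize_t n = copy_file_range(fd_in, &in, fd_out, &out,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;		/* end of input file */
		if (errno == EINTR)
			continue;
		if (!allow_fallback)
			return -1;

		/* Onloaded copy of one chunk through a bounce buffer. */
		in = off_in + (off_t)done;
		out = off_out + (off_t)done;
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = pread(fd_in, buf, want, in);

		if (r < 0)
			return -1;
		if (r == 0)
			break;
		for (ssize_t w = 0; w < r; ) {
			ssize_t k = pwrite(fd_out, buf + w, (size_t)(r - w),
					   out + w);
			if (k < 0)
				return -1;
			w += k;
		}
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```

Whether the kernel actually offloads such a call is invisible to the caller,
which is one reason an explicit flag is listed above.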

## Sysfs Interface

To do: define which aspects of copy offloading should be configurable through
new sysfs parameters under /sys/block/*/queue/.

## See Also

* Martin Petersen, [Copy
  Offload](https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg28998.html),
  linux-scsi, 28 May 2014.
* Mikulas Patocka, [ANNOUNCE: SCSI XCOPY support for the kernel and device
  mapper](https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg686111.html),
  15 July 2014.
* [kcopyd documentation](https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/kcopyd.html), kernel.org.
* Martin K. Petersen, [Copy Offload - Here Be Dragons](http://mkp.net/pubs/xcopy.pdf), 2019-08-21.
* Martin K. Petersen, [Re: [dm-devel] [RFC PATCH v2 1/2] block: add simple copy
support](https://lore.kernel.org/linux-nvme/yq1blf3smcl.fsf@ca-mkp.ca.oracle.com/), linux-nvme mailing list, 2020-12-08.
* NVM Express Organization, [NVMe - TP 4065b Simple Copy Command 2021.01.25 -
  Ratified.pdf](https://workspace.nvmexpress.org/apps/org/workgroup/allmembers/download.php/4773/NVMe%20-%20TP%204065b%20Simple%20Copy%20Command%202021.01.25%20-%20Ratified.pdf), 2021-01-25.
* Selvakumar S, [[RFC PATCH v5 0/4] add simple copy
  support](https://lore.kernel.org/linux-nvme/20210219124517.79359-1-selvakuma.s1@samsung.com/),
  linux-nvme, 2021-02-19.



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (7 preceding siblings ...)
  2021-05-18  0:15 ` Bart Van Assche
@ 2021-06-11  6:03 ` Chaitanya Kulkarni
  2021-06-11 15:35 ` Nikos Tsironis
  9 siblings, 0 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-11  6:03 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc; +Cc: ckulkarnilinux

On 5/10/21 17:15, Chaitanya Kulkarni wrote:
> Hi,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1] single threaded
> performance is limiting due to Denard scaling and multi-threaded
> performance is slowing down due Moore's law limitations. With the rise
> of SNIA Computation Technical Storage Working Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operation which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
>
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
>
> With this background we have significant number of use-cases which are
> strong candidates waiting for outstanding Linux Kernel Block Layer Copy
> Offload support, so that Linux Kernel Storage subsystem can to address
> previously mentioned problems [1] and allow efficient offloading of the
> data related operations. (Such as move/copy etc.)
>
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 7. File systems :- Local, NFS and Zonefs.
> 4. Block devices :- Distributed, local, and Zoned devices.
> 5. Peer to Peer DMA support solutions.
> 6. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
>
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite file system, block layer, and device drivers
> developers to:-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Keith Busch
>
> Regards,
> Chaitanya
>
> [1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
> https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>       https://www.eideticom.com/products.html
> https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
> namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
>

Mail server is dropping emails from the mailing list, adding personal
email address.




* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                   ` (8 preceding siblings ...)
  2021-06-11  6:03 ` Chaitanya Kulkarni
@ 2021-06-11 15:35 ` Nikos Tsironis
  9 siblings, 0 replies; 62+ messages in thread
From: Nikos Tsironis @ 2021-06-11 15:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov

On 5/11/21 3:15 AM, Chaitanya Kulkarni wrote:
> Hi,

I would like to participate in this discussion too.

Thanks,
Nikos


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
       [not found]   ` <CGME20210928191342eucas1p23448dcd51b23495fa67cdc017e77435c@eucas1p2.samsung.com>
@ 2021-09-28 19:13     ` Javier González
  2021-09-29  6:44       ` Johannes Thumshirn
                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Javier González @ 2021-09-28 19:13 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 12.05.2021 07:30, Johannes Thumshirn wrote:
>On 11/05/2021 02:15, Chaitanya Kulkarni wrote:
>> Hi,
>
>I would like to participate in this discussion as well. A generic block layer
>copy API is extremely helpful for filesystem garbage collection and copy operations
>like copy_file_range().


Hi all,

Since we are not going to be able to talk about this at LSF/MM, a few of
us thought about holding a dedicated virtual discussion about Copy
Offload. I believe we can use Chaitanya's thread as a start. Given the
current state of the current patches, I would propose that we focus on
the next step to get the minimal patchset that can go upstream so that
we can build from there.

Before we try to find a date and a time that fits most of us, who would
be interested in participating?

Thanks,
Javier


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13     ` Javier González
@ 2021-09-29  6:44       ` Johannes Thumshirn
  2021-09-30  9:43       ` Chaitanya Kulkarni
  2021-09-30 16:20       ` Bart Van Assche
  2 siblings, 0 replies; 62+ messages in thread
From: Johannes Thumshirn @ 2021-09-29  6:44 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 28/09/2021 21:13, Javier González wrote:
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the current patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 
> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?

I'd definitively be interested in participating.


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13     ` Javier González
  2021-09-29  6:44       ` Johannes Thumshirn
@ 2021-09-30  9:43       ` Chaitanya Kulkarni
  2021-09-30  9:53         ` Javier González
  2021-10-06 10:01         ` Javier González
  2021-09-30 16:20       ` Bart Van Assche
  2 siblings, 2 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2021-09-30  9:43 UTC (permalink / raw)
  To: Javier González, Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

Javier,

> 
> Hi all,
> 
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the current patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 

I agree with having a call as it has been two years I'm trying to have 
this discussion.

Before we setup a call, please summarize following here :-

1. Exactly what work has been done so far.
2. What kind of feedback you got.
3. What are the exact blockers/objections.
4. Potential ways of moving forward.

Although all of this information is present in the mailing list archives, it
is scattered all over the place. Looking at the long CC list above, we need
to get everyone on the same page in order to have a productive call.

Once we have had the above discussion, we can set up a precise agenda and
assign slots.

> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?
> 
> Thanks,
> Javier

-ck

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30  9:43       ` Chaitanya Kulkarni
@ 2021-09-30  9:53         ` Javier González
  2021-10-06 10:01         ` Javier González
  1 sibling, 0 replies; 62+ messages in thread
From: Javier González @ 2021-09-30  9:53 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>Javier,
>
>>
>> Hi all,
>>
>> Since we are not going to be able to talk about this at LSF/MM, a few of
>> us thought about holding a dedicated virtual discussion about Copy
>> Offload. I believe we can use Chaitanya's thread as a start. Given the
>> current state of the current patches, I would propose that we focus on
>> the next step to get the minimal patchset that can go upstream so that
>> we can build from there.
>>
>
>I agree with having a call as it has been two years I'm trying to have
>this discussion.
>
>Before we setup a call, please summarize following here :-
>
>1. Exactly what work has been done so far.
>2. What kind of feedback you got.
>3. What are the exact blockers/objections.
>4. Potential ways of moving forward.
>
>Although this all information is present in the mailing archives it is
>scattered all over the places, looking at the long CC list above we need
>to get the everyone on the same page in order to have a productive call.
>
>Once we have above discussion we can setup a precise agenda and assign
>slots.

Sounds reasonable. Let me collect all this information and post it here.
I will maintain a list of people who have shown interest in joining.
For now:

   - Martin
   - Johannes
   - Fred
   - Chaitanya
   - Adam
   - Kanchan
   - Selva
   - Nitesh
   - Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13     ` Javier González
  2021-09-29  6:44       ` Johannes Thumshirn
  2021-09-30  9:43       ` Chaitanya Kulkarni
@ 2021-09-30 16:20       ` Bart Van Assche
  2021-10-06 10:05         ` Javier González
  2 siblings, 1 reply; 62+ messages in thread
From: Bart Van Assche @ 2021-09-30 16:20 UTC (permalink / raw)
  To: Javier González, Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, martin.petersen, roland,
	mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 9/28/21 12:13 PM, Javier González wrote:
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the current patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 
> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?

Given the technical complexity of this topic and also that the people who are
interested live in multiple time zones, I prefer email to discuss the technical
aspects of this work. My attempt to summarize how to implement copy offloading
is available here: https://github.com/bvanassche/linux-kernel-copy-offload.
Feedback on this text is welcome.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30  9:43       ` Chaitanya Kulkarni
  2021-09-30  9:53         ` Javier González
@ 2021-10-06 10:01         ` Javier González
  2021-10-13  8:35           ` Javier González
  1 sibling, 1 reply; 62+ messages in thread
From: Javier González @ 2021-10-06 10:01 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>Javier,
>
>>
>> Hi all,
>>
>> Since we are not going to be able to talk about this at LSF/MM, a few of
>> us thought about holding a dedicated virtual discussion about Copy
>> Offload. I believe we can use Chaitanya's thread as a start. Given the
>> current state of the current patches, I would propose that we focus on
>> the next step to get the minimal patchset that can go upstream so that
>> we can build from there.
>>
>
>I agree with having a call as it has been two years I'm trying to have
>this discussion.
>
>Before we setup a call, please summarize following here :-
>
>1. Exactly what work has been done so far.


We can categorize that into two sets. First one for XCopy (2014), and
second one for NVMe Copy (2021).

XCOPY set *********
- block-generic copy command (single range, between one
   source/destination device)
- ioctl interface for the above
- SCSI plumbing (block-generic to XCOPY conversion)
- device-mapper support: offload copy whenever possible (if IO is not
   split while traveling layers of virtual devices)

NVMe-Copy set *************
- block-generic copy command (multiple ranges, between one
   source/destination device)
- ioctl interface for the above
- NVMe plumbing (block-generic to NVMe Copy conversion)
- copy-emulation (read + write) in block-layer
- device-mapper support: no offload, rather fall back to copy-emulation
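
The "offload if the device supports it, otherwise emulate with read +
write" behaviour listed above can be illustrated with a minimal userspace
sketch. Everything here is an assumption made for illustration: struct
toy_dev, struct copy_range and toy_copy() are hypothetical names, the
"device" is a plain memory buffer, and none of this is the interface used by
either patchset. It only shows a multi-range copy into one contiguous
destination with a bounce-buffer fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "device"; not a kernel structure. */
struct toy_dev {
	unsigned char *data;	/* backing store */
	size_t size;		/* capacity in bytes */
	int has_offload;	/* nonzero: device can copy natively */
	unsigned int offloaded;	/* ranges copied by the "device" */
	unsigned int emulated;	/* ranges copied by read + write */
};

/* One source range; all ranges land back to back at the destination. */
struct copy_range {
	size_t src;
	size_t len;
};

static int range_ok(const struct toy_dev *d, size_t off, size_t len)
{
	return off <= d->size && len <= d->size - off;
}

/*
 * Emulation path: "read" into a small bounce buffer, then "write".
 * Overlapping source and destination ranges are not handled here.
 */
static void emulate_copy(struct toy_dev *d, size_t src, size_t dst, size_t len)
{
	unsigned char bounce[8];

	while (len > 0) {
		size_t chunk = len < sizeof(bounce) ? len : sizeof(bounce);

		memcpy(bounce, d->data + src, chunk);
		memcpy(d->data + dst, bounce, chunk);
		src += chunk;
		dst += chunk;
		len -= chunk;
	}
	d->emulated++;
}

/* Copy nr source ranges to one contiguous destination starting at dst. */
static int toy_copy(struct toy_dev *d, const struct copy_range *r,
		    size_t nr, size_t dst)
{
	size_t i, total = 0;

	assert(d != NULL && r != NULL);

	/* Validate everything first so a bad request changes nothing. */
	for (i = 0; i < nr; i++) {
		if (!range_ok(d, r[i].src, r[i].len) ||
		    r[i].len > d->size - total)
			return -EINVAL;
		total += r[i].len;
	}
	if (!range_ok(d, dst, total))
		return -EINVAL;

	for (i = 0; i < nr; i++) {
		if (d->has_offload) {
			/* Stand-in for issuing a native copy command. */
			memmove(d->data + dst, d->data + r[i].src, r[i].len);
			d->offloaded++;
		} else {
			emulate_copy(d, r[i].src, dst, r[i].len);
		}
		dst += r[i].len;
	}
	return 0;
}
```

A caller would fill an array of struct copy_range and call toy_copy() once;
the same call works whether or not has_offload is set, which is the point of
keeping emulation behind the same entry point.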


>2. What kind of feedback you got.

For NVMe Copy, the major points are - a) add copy-emulation in
block-layer and use that if copy-offload is not natively supported by
device b) user-interface (ioctl) should be extendable for copy across
two devices (one source, one destination) c) device-mapper targets
should support copy-offload, whenever possible

"whenever possible" cases get reduced compared to XCOPY because NVMe
Copy is wit

>3. What are the exact blockers/objections.

I think it was device-mapper for XCOPY and remains the same for NVMe
Copy as well.  Device-mapper support requires decomposing the copy operation
into read and write.  While that is not great from an efficiency PoV, the
bigger concern is whether we are taking the same route as XCOPY.

 From Martin's document (http://mkp.net/pubs/xcopy.pdf), if I got it
right, one of the major blockers is having more failure cases than
successful ones, and that did not justify the effort/code to wire up
device mapper.  Is that a factor to consider for NVMe Copy (which is
narrower in scope than XCOPY)?
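
To make the "more failure cases than successful ones" concern concrete,
here is a rough sketch of the decision a stacked device has to make. It
assumes a hypothetical linear table (struct toy_target) that is sorted,
contiguous and starts at offset 0, and it assumes offload is only possible
when the source and destination ranges each stay inside one target and both
targets sit on the same underlying device. The names and the rule are
illustrative assumptions, not device-mapper code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear mapping entry: [start, start + len) -> dev_id. */
struct toy_target {
	size_t start;
	size_t len;
	int dev_id;
};

enum copy_path { COPY_OFFLOAD, COPY_EMULATE, COPY_INVALID };

/* Return the target containing offset off, or NULL if none does. */
static const struct toy_target *find_target(const struct toy_target *t,
					    size_t nr, size_t off)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (off >= t[i].start && off - t[i].start < t[i].len)
			return &t[i];
	return NULL;
}

/* True if [off, off + len) does not cross the end of target tgt. */
static int fits(const struct toy_target *tgt, size_t off, size_t len)
{
	return len <= tgt->start + tgt->len - off;
}

static enum copy_path classify_copy(const struct toy_target *t, size_t nr,
				    size_t src, size_t dst, size_t len)
{
	size_t end = nr ? t[nr - 1].start + t[nr - 1].len : 0;
	const struct toy_target *s, *d;

	assert(t != NULL || nr == 0);

	if (len == 0 || src > end || dst > end ||
	    len > end - src || len > end - dst)
		return COPY_INVALID;

	s = find_target(t, nr, src);
	d = find_target(t, nr, dst);
	if (s == NULL || d == NULL)
		return COPY_INVALID;

	/* A range that would have to be split cannot go down as one copy. */
	if (!fits(s, src, len) || !fits(d, dst, len))
		return COPY_EMULATE;

	/* Single-device copy only: both ends must share a backing device. */
	return s->dev_id == d->dev_id ? COPY_OFFLOAD : COPY_EMULATE;
}
```

With a two-target table, only copies that stay inside one target are
classified as offloadable; anything crossing the boundary, or spanning the
two backing devices, ends up on the read + write path.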

>4. Potential ways of moving forward.

a) we defer attempting device-mapper support (until NVMe has
support/usecase), and address everything else (reusable user-interface
etc.)

b) we attempt device-mapper support (by moving to composite read+write
communication between block-layer and nvme)


Is this enough in your mind to move forward with a specific agenda? If
we can, I would like to target the meetup in the next 2 weeks.

Thanks,
Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30 16:20       ` Bart Van Assche
@ 2021-10-06 10:05         ` Javier González
  2021-10-06 17:33           ` Bart Van Assche
  0 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2021-10-06 10:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:20, Bart Van Assche wrote:
>On 9/28/21 12:13 PM, Javier González wrote:
>>Since we are not going to be able to talk about this at LSF/MM, a few of
>>us thought about holding a dedicated virtual discussion about Copy
>>Offload. I believe we can use Chaitanya's thread as a start. Given the
>>current state of the current patches, I would propose that we focus on
>>the next step to get the minimal patchset that can go upstream so that
>>we can build from there.
>>
>>Before we try to find a date and a time that fits most of us, who would
>>be interested in participating?
>
>Given the technical complexity of this topic and also that the people who are
>interested live in multiple time zones, I prefer email to discuss the technical
>aspects of this work. My attempt to summarize how to implement copy offloading
>is available here: https://github.com/bvanassche/linux-kernel-copy-offload.
>Feedback on this text is welcome.

Thanks for sharing this Bart.

I agree that the topic is complex. However, we have not been able to
find a clear path forward in the mailing list.

What do you think about joining the call to talk through very specific next
steps to get a patchset that we can start reviewing in detail?

I think that your presence in the call will help us all.

What do you think?


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 10:05         ` Javier González
@ 2021-10-06 17:33           ` Bart Van Assche
  2021-10-08  6:49             ` Javier González
  0 siblings, 1 reply; 62+ messages in thread
From: Bart Van Assche @ 2021-10-06 17:33 UTC (permalink / raw)
  To: Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 10/6/21 3:05 AM, Javier González wrote:
> I agree that the topic is complex. However, we have not been able to
> find a clear path forward in the mailing list.

Hmm ... really? At least Martin Petersen and I consider device mapper 
support essential. How about starting from Mikulas' patch series that 
supports the device mapper? See also 
https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/

> What do you think about joining the call to talk very specific next
> steps to get a patchset that we can start reviewing in detail.

I can do that.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 17:33           ` Bart Van Assche
@ 2021-10-08  6:49             ` Javier González
  2021-10-29  0:21               ` Chaitanya Kulkarni
  0 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2021-10-08  6:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 06.10.2021 10:33, Bart Van Assche wrote:
>On 10/6/21 3:05 AM, Javier González wrote:
>>I agree that the topic is complex. However, we have not been able to
>>find a clear path forward in the mailing list.
>
>Hmm ... really? At least Martin Petersen and I consider device mapper 
>support essential. How about starting from Mikulas' patch series that 
>supports the device mapper? See also https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/

Thanks for the pointers. We are looking into Mikulas' patch - I agree
that it is a good start.

>>What do you think about joining the call to talk very specific next
>>steps to get a patchset that we can start reviewing in detail.
>
>I can do that.

Thanks. I will wait until Chaitanya's reply on his questions. We will
start suggesting some dates then.

Thanks,
Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 10:01         ` Javier González
@ 2021-10-13  8:35           ` Javier González
  0 siblings, 0 replies; 62+ messages in thread
From: Javier González @ 2021-10-13  8:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

Chaitanya,

Did you have a chance to look at the answers below?

I would like to start finding candidate dates throughout the next couple
of weeks.

Thanks,
Javier

On 06.10.2021 12:01, Javier González wrote:
>On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>>Javier,
>>
>>>
>>>Hi all,
>>>
>>>Since we are not going to be able to talk about this at LSF/MM, a few of
>>>us thought about holding a dedicated virtual discussion about Copy
>>>Offload. I believe we can use Chaitanya's thread as a start. Given the
>>>current state of the current patches, I would propose that we focus on
>>>the next step to get the minimal patchset that can go upstream so that
>>>we can build from there.
>>>
>>
>>I agree with having a call as it has been two years I'm trying to have
>>this discussion.
>>
>>Before we setup a call, please summarize following here :-
>>
>>1. Exactly what work has been done so far.
>
>
>We can categorize that into two sets. First one for XCopy (2014), and
>second one for NVMe Copy (2021).
>
>XCOPY set *********
>- block-generic copy command (single range, between one
>  source/destination device)
>- ioctl interface for the above
>- SCSI plumbing (block-generic to XCOPY conversion)
>- device-mapper support: offload copy whenever possible (if IO is not
>  split while traveling layers of virtual devices)
>
>NVMe-Copy set *************
>- block-generic copy command (multiple ranges, between one
>  source/destination device)
>- ioctl interface for the above
>- NVMe plumbing (block-generic to NVMe Copy conversion)
>- copy-emulation (read + write) in block-layer
>- device-mapper support: no offload, rather fall back to copy-emulation
>
>
>>2. What kind of feedback you got.
>
>For NVMe Copy, the major points are - a) add copy-emulation in
>block-layer and use that if copy-offload is not natively supported by
>device b) user-interface (ioctl) should be extendable for copy across
>two devices (one source, one destination) c) device-mapper targets
>should support copy-offload, whenever possible
>
>"whenever possible" cases get reduced compared to XCOPY because NVMe
>Copy is wit
>
>>3. What are the exact blockers/objections.
>
>I think it was device-mapper for XCOPY and remains the same for NVMe
>Copy as well.  Device-mapper support requires decomposing the copy operation
>into read and write.  While that is not great from an efficiency PoV, the
>bigger concern is whether we are taking the same route as XCOPY.
>
>From Martin's document (http://mkp.net/pubs/xcopy.pdf), if I got it
>right, one of the major blockers is having more failure cases than
>successful ones, and that did not justify the effort/code to wire up
>device mapper.  Is that a factor to consider for NVMe Copy (which is
>narrower in scope than XCOPY)?
>
>>4. Potential ways of moving forward.
>
>a) we defer attempting device-mapper support (until NVMe has
>support/usecase), and address everything else (reusable user-interface
>etc.)
>
>b) we attempt device-mapper support (by moving to composite read+write
>communication between block-layer and nvme)
>
>
>Is this enough in your mind to move forward with a specific agenda? If
>we can, I would like to target the meetup in the next 2 weeks.
>
>Thanks,
>Javier


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-08  6:49             ` Javier González
@ 2021-10-29  0:21               ` Chaitanya Kulkarni
  2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:14                 ` Javier González
  0 siblings, 2 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-29  0:21 UTC (permalink / raw)
  To: Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 10/7/21 11:49 PM, Javier González wrote:
> On 06.10.2021 10:33, Bart Van Assche wrote:
>> On 10/6/21 3:05 AM, Javier González wrote:
>>> I agree that the topic is complex. However, we have not been able to
>>> find a clear path forward in the mailing list.
>>
>> Hmm ... really? At least Martin Petersen and I consider device mapper
>> support essential. How about starting from Mikulas' patch series that
>> supports the device mapper? See also 
>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/ 
>>

When we add a new REQ_OP_XXX we need to make sure it will work with 
device mapper, so I agree with Bart and Martin.

Starting with Mikulas' patches is the right direction for now.

> 
> Thanks for the pointers. We are looking into Mikulas' patch - I agree
> that it is a good start.
> 
>>> What do you think about joining the call to talk very specific next
>>> steps to get a patchset that we can start reviewing in detail.
>>
>> I can do that.
> 
> Thanks. I will wait until Chaitanya's reply on his questions. We will
> start suggesting some dates then.
> 

I think at this point we need to at least decide on having a first call
focused on how to move forward with Mikulas' approach.

Javier, can you please organize a call with people you listed in this 
thread earlier ?

> Thanks,
> Javier


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  0:21               ` Chaitanya Kulkarni
@ 2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:16                   ` Javier González
  2021-10-29 16:15                   ` Bart Van Assche
  2021-10-29  8:14                 ` Javier González
  1 sibling, 2 replies; 62+ messages in thread
From: Hannes Reinecke @ 2021-10-29  5:51 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote:
> On 10/7/21 11:49 PM, Javier González wrote:
>> On 06.10.2021 10:33, Bart Van Assche wrote:
>>> On 10/6/21 3:05 AM, Javier González wrote:
>>>> I agree that the topic is complex. However, we have not been able to
>>>> find a clear path forward in the mailing list.
>>>
>>> Hmm ... really? At least Martin Petersen and I consider device mapper
>>> support essential. How about starting from Mikulas' patch series that
>>> supports the device mapper? See also
>>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>
> 
> When we add a new REQ_OP_XXX we need to make sure it will work with
> device mapper, so I agree with Bart and Martin.
> 
> Starting with Mikulas patches is a right direction as of now..
> 
>>
>> Thanks for the pointers. We are looking into Mikulas' patch - I agree
>> that it is a good start.
>>
>>>> What do you think about joining the call to talk very specific next
>>>> steps to get a patchset that we can start reviewing in detail.
>>>
>>> I can do that.
>>
>> Thanks. I will wait until Chaitanya's reply on his questions. We will
>> start suggesting some dates then.
>>
> 
> I think at this point we need to at least decide on having a first call
> focused on how to proceed forward with Mikulas approach  ...
> 
> Javier, can you please organize a call with people you listed in this
> thread earlier ?
> 
Also, Keith presented his work on a simple zone-based remapping block
device, which included an in-kernel copy offload facility.
The idea is to lift that out as a standalone patch so that we can use it as
a fallback (i.e. software) implementation if no other copy offload mechanism
is available.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  0:21               ` Chaitanya Kulkarni
  2021-10-29  5:51                 ` Hannes Reinecke
@ 2021-10-29  8:14                 ` Javier González
  2021-11-03 19:27                   ` Javier González
  1 sibling, 1 reply; 62+ messages in thread
From: Javier González @ 2021-10-29  8:14 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 29.10.2021 00:21, Chaitanya Kulkarni wrote:
>On 10/7/21 11:49 PM, Javier González wrote:
>> On 06.10.2021 10:33, Bart Van Assche wrote:
>>> On 10/6/21 3:05 AM, Javier González wrote:
>>>> I agree that the topic is complex. However, we have not been able to
>>>> find a clear path forward in the mailing list.
>>>
>>> Hmm ... really? At least Martin Petersen and I consider device mapper
>>> support essential. How about starting from Mikulas' patch series that
>>> supports the device mapper? See also
>>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>
>
>When we add a new REQ_OP_XXX we need to make sure it will work with
>device mapper, so I agree with Bart and Martin.
>
>Starting with Mikulas patches is a right direction as of now..
>
>>
>> Thanks for the pointers. We are looking into Mikulas' patch - I agree
>> that it is a good start.
>>
>>>> What do you think about joining the call to talk very specific next
>>>> steps to get a patchset that we can start reviewing in detail.
>>>
>>> I can do that.
>>
>> Thanks. I will wait until Chaitanya's reply on his questions. We will
>> start suggesting some dates then.
>>
>
>I think at this point we need to at least decide on having a first call
>focused on how to proceed forward with Mikulas approach  ...
>
>Javier, can you please organize a call with people you listed in this
>thread earlier ?

Here you have a Doodle for the end of next week and the week after OCP.
Please fill it out by Wednesday. I will set up a call with the
selected slot:

     https://doodle.com/poll/r2c8duy3r8g88v8q?utm_source=poll&utm_medium=link

Thanks,
Javier


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  5:51                 ` Hannes Reinecke
@ 2021-10-29  8:16                   ` Javier González
  2021-10-29 16:15                   ` Bart Van Assche
  1 sibling, 0 replies; 62+ messages in thread
From: Javier González @ 2021-10-29  8:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, kbusch, rwheeler,
	hch, Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu,
	Bart Van Assche

On 29.10.2021 07:51, Hannes Reinecke wrote:
>On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote:
>>On 10/7/21 11:49 PM, Javier González wrote:
>>>On 06.10.2021 10:33, Bart Van Assche wrote:
>>>>On 10/6/21 3:05 AM, Javier González wrote:
>>>>>I agree that the topic is complex. However, we have not been able to
>>>>>find a clear path forward in the mailing list.
>>>>
>>>>Hmm ... really? At least Martin Petersen and I consider device mapper
>>>>support essential. How about starting from Mikulas' patch series that
>>>>supports the device mapper? See also
>>>>https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>>
>>
>>When we add a new REQ_OP_XXX we need to make sure it will work with
>>device mapper, so I agree with Bart and Martin.
>>
>>Starting with Mikulas patches is a right direction as of now..
>>
>>>
>>>Thanks for the pointers. We are looking into Mikulas' patch - I agree
>>>that it is a good start.
>>>
>>>>>What do you think about joining the call to talk very specific next
>>>>>steps to get a patchset that we can start reviewing in detail.
>>>>
>>>>I can do that.
>>>
>>>Thanks. I will wait until Chaitanya's reply on his questions. We will
>>>start suggesting some dates then.
>>>
>>
>>I think at this point we need to at least decide on having a first call
>>focused on how to proceed forward with Mikulas approach  ...
>>
>>Javier, can you please organize a call with people you listed in this
>>thread earlier ?
>>
>Also Keith presented his work on a simple zone-based remapping block 
>device, which included an in-kernel copy offload facility.
>The idea is to lift that out as a standalone patch so that we can use it as a
>fallback (i.e. software) implementation if no other copy offload
>mechanism is available.
>

I believe this is in essence what we are trying to convey here: a
minimal patchset that enables Simple Copy and the infra around to extend
copy-offload use-cases.

I look forward to hearing Keith's ideas around this!

Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:16                   ` Javier González
@ 2021-10-29 16:15                   ` Bart Van Assche
  2021-11-01 17:54                     ` Keith Busch
  1 sibling, 1 reply; 62+ messages in thread
From: Bart Van Assche @ 2021-10-29 16:15 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc

On 10/28/21 10:51 PM, Hannes Reinecke wrote:
> Also Keith presented his work on a simple zone-based remapping block device, which included an in-kernel copy offload facility.
> The idea is to lift that out as a standalone patch so that we can use it as a fallback (i.e. software) implementation if no other copy offload mechanism is available.

Is a link to the presentation available?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29 16:15                   ` Bart Van Assche
@ 2021-11-01 17:54                     ` Keith Busch
  0 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2021-11-01 17:54 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Hannes Reinecke, linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc

On Fri, Oct 29, 2021 at 09:15:43AM -0700, Bart Van Assche wrote:
> On 10/28/21 10:51 PM, Hannes Reinecke wrote:
> > Also Keith presented his work on a simple zone-based remapping block device, which included an in-kernel copy offload facility.
> > The idea is to lift that out as a standalone patch so that we can use it as a fallback (i.e. software) implementation if no other copy offload mechanism is available.
> 
> Is a link to the presentation available?

Thanks for the interest.

I didn't post the slides online as the conference didn't provide for that,
and I don't think they would be particularly interesting without the
prepared speech anyway.

The presentation described a simple prototype implementing a redirection
table on zoned block devices. There was one bullet point explaining how a
generic kernel implementation would be an improvement. For zoned block
devices, an "append" like copy offload would be an even better option.
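
As a rough illustration of what an "append"-like copy could mean for a
redirection table, here is a small userspace sketch. It is an assumption
about the idea, not a description of the prototype: struct toy_zone,
struct toy_map, copy_append() and relocate() are hypothetical names. The
caller names only the source; the destination zone places the data at its
write pointer and reports where it landed, and the table is updated from
that result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ZONE_BLOCKS 8
#define COPY_FAILED ((size_t)-1)

/* Hypothetical zone: one byte per block, written sequentially up to wp. */
struct toy_zone {
	unsigned char blocks[ZONE_BLOCKS];
	size_t wp;	/* write pointer: next block to be written */
};

/* Redirection table entry: where a logical block currently lives. */
struct toy_map {
	struct toy_zone *zone;
	size_t blk;
};

/*
 * Append-like copy: copy nr written blocks starting at src_blk into dst at
 * dst's write pointer. Returns the block index where the data landed, or
 * COPY_FAILED if the source is unwritten or dst has no room.
 */
static size_t copy_append(struct toy_zone *dst, const struct toy_zone *src,
			  size_t src_blk, size_t nr)
{
	size_t at = dst->wp;

	if (src_blk > src->wp || nr > src->wp - src_blk)
		return COPY_FAILED;
	if (nr > ZONE_BLOCKS - dst->wp)
		return COPY_FAILED;

	memcpy(&dst->blocks[at], &src->blocks[src_blk], nr);
	dst->wp += nr;
	return at;
}

/* Move one logical block into dst and redirect the table entry to it. */
static int relocate(struct toy_map *map, size_t lba, struct toy_zone *dst)
{
	size_t at = copy_append(dst, map[lba].zone, map[lba].blk, 1);

	if (at == COPY_FAILED)
		return -1;
	map[lba].zone = dst;
	map[lba].blk = at;
	return 0;
}
```

The caller never chooses a destination offset, so several relocations can
be issued without coordinating write pointers; each one learns its final
location from the return value.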

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  8:14                 ` Javier González
@ 2021-11-03 19:27                   ` Javier González
  2021-11-16 13:43                     ` Javier González
  0 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2021-11-03 19:27 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 29.10.2021 10:14, Javier González wrote:
>On 29.10.2021 00:21, Chaitanya Kulkarni wrote:
>>On 10/7/21 11:49 PM, Javier González wrote:
>>>On 06.10.2021 10:33, Bart Van Assche wrote:
>>>>On 10/6/21 3:05 AM, Javier González wrote:
>>>>>I agree that the topic is complex. However, we have not been able to
>>>>>find a clear path forward in the mailing list.
>>>>
>>>>Hmm ... really? At least Martin Petersen and I consider device mapper
>>>>support essential. How about starting from Mikulas' patch series that
>>>>supports the device mapper? See also
>>>>https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>>
>>
>>When we add a new REQ_OP_XXX we need to make sure it will work with
>>device mapper, so I agree with Bart and Martin.
>>
>>Starting with Mikulas patches is a right direction as of now..
>>
>>>
>>>Thanks for the pointers. We are looking into Mikulas' patch - I agree
>>>that it is a good start.
>>>
>>>>>What do you think about joining the call to talk very specific next
>>>>>steps to get a patchset that we can start reviewing in detail.
>>>>
>>>>I can do that.
>>>
>>>Thanks. I will wait until Chaitanya's reply on his questions. We will
>>>start suggesting some dates then.
>>>
>>
>>I think at this point we need to at least decide on having a first call
>>focused on how to proceed forward with Mikulas approach  ...
>>
>>Javier, can you please organize a call with people you listed in this
>>thread earlier ?
>
>Here you have a Doodle for the end of next week and the week after OCP.
>Please fill it out by Wednesday. I will set up a call with the
>selected slot:
>
>    https://doodle.com/poll/r2c8duy3r8g88v8q?utm_source=poll&utm_medium=link
>
>Thanks,
>Javier

I sent the invite to the people who signed up in the Doodle. The
call will take place on Monday November 15th, 17.00-19.00 CET. See the
list of current participants below. If anyone else wants to participate,
please send me a note and I will extend the invite.

   Johannes.Thumshirn@wdc.com
   Vincent.fu@samsung.com
   a.dawn@samsung.com
   a.manzanares@samsung.com
   bvanassche@acm.org
   himanshu.madhani@oracle.com
   joshi.k@samsung.com
   kch@nvidia.com
   martin.petersen@oracle.com
   nj.shetty@samsung.com
   selvakuma.s1@samsung

Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-03 19:27                   ` Javier González
@ 2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
                                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Javier González @ 2021-11-16 13:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

Hi all,

Thanks for attending the call on Copy Offload yesterday. Here are the
meeting notes and two specific action items before we proceed with another
version of the patchset.

We will work on a version of the use-case matrix internally and reply
here in the next couple of days.

Please add to the notes and the matrix as you see fit.

Thanks,
Javier

----

ATTENDEES

- Adam
- Arnav
- Chaitanya
- Himanshu
- Johannes
- Kanchan
- Keith
- Martin
- Mikulas
- Niklas
- Nitesh
- Selva
- Vincent
- Bart

NOTES

- MD and DM are hard requirements
	- We need support for all the main users of the block layer
	- Same problem with crypto and integrity
- Martin would be OK with separating Simple Copy in ZNS and Copy Offload
- Why did Mikulas' work not get upstream?
	- Timing was an issue
		- Use-case was about copying data across VMs
		- No HW vendor support
		- Hard from a protocol perspective
			- At that point, SCSI was still adding support in the spec
			- MSFT could not implement extended copy command in the target (destination) device.
				- This is what triggered the token-based implementation
				- This triggered array vendors to implement support for copy offload as token-based. This allows mixing with normal read / write workloads
			- Martin lost the implementation and dropped it

DIRECTION

- Keeping the IOCTL interface is an option. It might make sense to move from IOCTL to an io_uring opcode
- Martin is happy to do the SCSI part if the block layer API is upstreamed
- Token-based implementation is the norm. This allows mixing normal read / write workloads to avoid DoS
	- This is the direction as opposed to the extended copy command
	- It addresses problems when integrating with DM and simplifies command multiplexing a single bio into many
	- It simplifies multiple bios
	- We should explore Mikulas' approach with pointers.
- Use-cases
	- ZNS GC
	- dm-kcopyd
	- file system GC
	- fabrics offload to the storage node
	- copy_file_range
- It is OK to implement support incrementally, but the interface needs to support all customers of the block layer
	- OK to not support specific DMs (e.g., RAID5)
	- We should support DM and MD as a framework and the personalities that are needed. Take inspiration from integrity
		- dm-linear
		- dm-crypt and dm-verity are needed for the F2FS use-case in Android
			- Here, we need copy emulation to support encryption without dealing with HW issues and garbage
- User-interface can wait and be carried out on the side
- Maybe it makes sense to start with internal users
	- copy_file_range
	- F2FS GC, btrfs GC
- User-space should be allowed to do anything and kernel-space can chop the command accordingly
- We need to define the heuristics of the sizes
- User-space should only work on block devices (no other constructs that are protocol-specific). Export capabilities in sysfs
	- Need copy domains to be exposed in sysfs
	- We need to start with bdev to bdev in block layer
	- No specific requirement on multi-namespace in NVMe, but it should be extensible
	- Plumbing will support all use-cases
- Try to start with one in-kernel consumer
- Emulation is a must
	- Needed for failed I/Os
	- Expose capabilities so that users can decide
- We can get help from btrfs and F2FS folks
- The use case for GC and for copy are different. We might have to reflect this in the interface, but the internal plumbing should allow both paths to be maintained as a single one.
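
The "emulation is a must" and "kernel-space can chop the command" points above can be illustrated with a small user-space analogy. This is only a sketch under stated assumptions, not the proposed block-layer interface: it uses the existing copy_file_range(2) call (via Python's os.copy_file_range) as a stand-in for the offload path, and CHUNK, emulate_copy() and copy_range() are hypothetical names chosen for illustration.

```python
import os

CHUNK = 1 << 20  # hypothetical per-request cap; the notes leave the real size heuristic open


def emulate_copy(src_fd, dst_fd, src_off, dst_off, length):
    """Emulated copy: bounce the data through host memory with pread/pwrite."""
    done = 0
    while done < length:
        buf = os.pread(src_fd, min(CHUNK, length - done), src_off + done)
        if not buf:  # end of source reached
            break
        written = 0
        while written < len(buf):  # pwrite may be short, so loop until the buffer is out
            written += os.pwrite(dst_fd, buf[written:], dst_off + done + written)
        done += len(buf)
    return done


def copy_range(src_fd, dst_fd, src_off, dst_off, length):
    """Chop the request into CHUNK-sized pieces; emulate once offload fails."""
    offload = getattr(os, "copy_file_range", None)  # missing on some platforms
    done = 0
    while done < length:
        want = min(CHUNK, length - done)
        n = None
        if offload is not None:
            try:
                n = offload(src_fd, dst_fd, want, src_off + done, dst_off + done)
            except OSError:
                offload = None  # stop trying the offload path for this request
        if n is None:
            n = emulate_copy(src_fd, dst_fd, src_off + done, dst_off + done, want)
        if n == 0:  # nothing left to copy
            break
        done += n
    return done
```

The caller never needs to know which path served a given piece, which is the property the notes ask for: capabilities can still be exported separately so that users who care can decide.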

ACTIONS

- [ ] Make a list of use-cases that we want to support in each specification and pick 1-2 examples for MD, DM. Make sure that the interfaces support this
- [ ] Vendors: Ask internally what the recommended size for copy is, if any


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
@ 2021-11-16 17:59                       ` Bart Van Assche
  2021-11-17 12:53                         ` Javier González
  2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 1 reply; 62+ messages in thread
From: Bart Van Assche @ 2021-11-16 17:59 UTC (permalink / raw)
  To: Javier González, Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 11/16/21 05:43, Javier González wrote:
>              - Here, we need copy emulation to support encryption 
> without dealing with HW issues and garbage

Hi Javier,

Thanks very much for having taken notes and also for having shared 
these. Regarding the above comment, after the meeting I learned that the 
above is not correct. Encryption in Android is LBA independent and hence 
it should be possible to offload F2FS garbage collection in Android once 
the (UFS) storage controller supports this.

For the general case, I propose to let the dm-crypt driver decide 
whether or not to offload data copying since that driver knows whether 
or not data copying can be offloaded.

Thanks,

Bart.
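
The proposal above, letting each stacking driver decide whether a copy may be offloaded, could be modeled roughly as follows. This is a toy sketch and not a kernel API: Layer, can_offload_copy() and plan_copy() are hypothetical names, and the only rule encoded is the one discussed in this message (ciphertext that depends on the LBA cannot be moved verbatim to a different LBA).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """One stacked block-layer component (hypothetical model, not a kernel structure)."""
    name: str
    lba_dependent_crypto: bool = False  # True if the ciphertext depends on the block address

    def can_offload_copy(self) -> bool:
        # Ciphertext tied to its LBA would decrypt to garbage at a new LBA,
        # so such a layer has to refuse a device-side copy.
        return not self.lba_dependent_crypto


def plan_copy(stack: List[Layer]) -> str:
    """Offload only if every layer in the stack agrees; otherwise emulate."""
    for layer in stack:
        if not layer.can_offload_copy():
            return f"emulate (refused by {layer.name})"
    return "offload"
```

For example, plan_copy([Layer("dm-linear"), Layer("dm-crypt", lba_dependent_crypto=True)]) yields "emulate (refused by dm-crypt)", while the same stack with LBA-independent encryption yields "offload".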

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 17:59                       ` Bart Van Assche
@ 2021-11-17 12:53                         ` Javier González
  2021-11-17 15:52                           ` Bart Van Assche
  0 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2021-11-17 12:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 16.11.2021 09:59, Bart Van Assche wrote:
>On 11/16/21 05:43, Javier González wrote:
>>             - Here, we need copy emulation to support encryption 
>>without dealing with HW issues and garbage
>
>Hi Javier,
>
>Thanks very much for having taken notes and also for having shared 
>these.

My pleasure. Thanks for attending. Happy to see this moving.

>Regarding the above comment, after the meeting I learned that 
>the above is not correct. Encryption in Android is LBA independent and 
>hence it should be possible to offload F2FS garbage collection in 
>Android once the (UFS) storage controller supports this.
>
>For the general case, I propose to let the dm-crypt driver decide 
>whether or not to offload data copying since that driver knows whether 
>or not data copying can be offloaded.

Thanks for sharing this. We will make sure that DM / MD are supported
and then we can cover examples. Hopefully, you guys can help with the
bits for dm-crypt to make the decision to offload when it makes sense.

I will update the notes to keep them alive. Maybe we can have them open
in your github page?

>
>Thanks,
>
>Bart.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-17 12:53                         ` Javier González
@ 2021-11-17 15:52                           ` Bart Van Assche
  2021-11-19  7:38                             ` Javier González
  0 siblings, 1 reply; 62+ messages in thread
From: Bart Van Assche @ 2021-11-17 15:52 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 11/17/21 04:53, Javier González wrote:
> Thanks for sharing this. We will make sure that DM / MD are supported
> and then we can cover examples. Hopefully, you guys can help with the
> bits for dm-crypt to make the decision to offload when it makes sense.

Will ask around to learn who should work on this.

> I will update the notes to keep them alive. Maybe we can have them open
> in your github page?

Feel free to submit a pull request.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-17 15:52                           ` Bart Van Assche
@ 2021-11-19  7:38                             ` Javier González
  0 siblings, 0 replies; 62+ messages in thread
From: Javier González @ 2021-11-19  7:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu



> On 17 Nov 2021, at 16.52, Bart Van Assche <bvanassche@acm.org> wrote:
> 
> On 11/17/21 04:53, Javier González wrote:
>> Thanks for sharing this. We will make sure that DM / MD are supported
>> and then we can cover examples. Hopefully, you guys can help with the
>> bits for dm-crypt to make the decision to offload when it makes sense.
> 
> Will ask around to learn who should work on this.

Great. Thanks. 
> 
>> I will update the notes to keep them alive. Maybe we can have them open
>> in your github page?
> 
> Feel free to submit a pull request.

Will do.

Javier 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
@ 2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-19 15:51                         ` Keith Busch
  2021-11-19 16:21                         ` Bart Van Assche
  2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 2 replies; 62+ messages in thread
From: Kanchan Joshi @ 2021-11-19 10:47 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu, Bart Van Assche

Given the multitude of things accumulated on this topic, Martin
suggested having a table/matrix.
Some of those should go in the initial patchset, and the remaining are
to be staged for subsequent work.
Here is an attempt to split the work into two buckets. Please correct
anything below that needs changing.

1. Driver
*********
Initial: NVMe Copy command (single NS)
Subsequent: Multi NS copy, XCopy/Token-based Copy

2. Block layer
**************
Initial:
- Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
- Emulation, when offload is natively absent
- DM support (at least dm-linear)

3. User-interface
*****************
Initial: new ioctl or io_uring opcode

4. In-kernel user
******************
Initial: at least one user
- dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)

Subsequent:
- copy_file_range

Thanks,

-- 
Joshi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-19 10:47                       ` Kanchan Joshi
@ 2021-11-19 15:51                         ` Keith Busch
  2021-11-19 16:21                         ` Bart Van Assche
  1 sibling, 0 replies; 62+ messages in thread
From: Keith Busch @ 2021-11-19 15:51 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: Javier González, Chaitanya Kulkarni, Johannes Thumshirn,
	Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, martin.petersen, roland,
	mpatocka, hare, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov, Adam Manzanares, SelvaKumar S, Nitesh Shetty,
	Kanchan Joshi, Vincent Fu, Bart Van Assche

On Fri, Nov 19, 2021 at 04:17:51PM +0530, Kanchan Joshi wrote:
> Given the multitude of things accumulated on this topic, Martin
> suggested to have a table/matrix.
> Some of those should go in the initial patchset, and the remaining are
> to be staged for subsequent work.
> Here is the attempt to split the stuff into two buckets. Please change
> if something needs to be changed below.
> 
> 1. Driver
> *********
> Initial: NVMe Copy command (single NS)

Does this point include implementing the copy command in the nvme target
driver or just the host side? Enabling the target should be pretty
straightforward, and would provide an in-kernel, standards-compliant
"device" that everyone could test future improvements against.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-19 15:51                         ` Keith Busch
@ 2021-11-19 16:21                         ` Bart Van Assche
  1 sibling, 0 replies; 62+ messages in thread
From: Bart Van Assche @ 2021-11-19 16:21 UTC (permalink / raw)
  To: Kanchan Joshi, Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 11/19/21 02:47, Kanchan Joshi wrote:
> Given the multitude of things accumulated on this topic, Martin
> suggested to have a table/matrix.
> Some of those should go in the initial patchset, and the remaining are
> to be staged for subsequent work.
> Here is the attempt to split the stuff into two buckets. Please change
> if something needs to be changed below.
> 
> 1. Driver
> *********
> Initial: NVMe Copy command (single NS)
> Subsequent: Multi NS copy, XCopy/Token-based Copy
> 
> 2. Block layer
> **************
> Initial:
> - Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
> - Emulation, when offload is natively absent
> - DM support (at least dm-linear)
> 
> 3. User-interface
> *****************
> Initial: new ioctl or io_uring opcode
> 
> 4. In-kernel user
> ******************
> Initial: at least one user
> - dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)
> 
> Subsequent:
> - copy_file_range

Integrity support and inline encryption support are missing from the above
overview. Both are supported by the block layer. See also block/blk-integrity.c
and include/linux/blk-crypto.h. I'm not claiming that these should be supported
in the first version but I think it would be good to add these to the above
overview.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
  2021-11-19 10:47                       ` Kanchan Joshi
@ 2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 0 replies; 62+ messages in thread
From: Kanchan Joshi @ 2021-11-22  7:39 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu, Bart Van Assche

Updated version (incorporating points from Keith and Bart):

Given the multitude of things accumulated on this topic, Martin
suggested having a table/matrix.
Some of those should go in the initial patchset, and the remaining are
to be staged for subsequent work.
Here is an attempt to split the work into two buckets. Please correct
anything below that needs changing.

1. Driver
*********
Initial: NVMe Copy command (single NS), including support in nvme-target
Subsequent: Multi NS copy, XCopy/Token-based Copy

2. Block layer
**************
Initial:
- Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
- Emulation, when offload is natively absent
- DM support (at least dm-linear)

Subsequent: Integrity and encryption support

3. User-interface
*****************
Initial: new ioctl or io_uring opcode

4. In-kernel user
******************
Initial: at least one user
- dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)

Subsequent:
- copy_file_range
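
To make the "DM support (at least dm-linear)" item above more concrete, here is a toy model of why a block-generic copy may need to be split when it passes through a linear mapping. It is a sketch under stated assumptions only: Segment, map_range() and split_copy() are hypothetical names, the table is assumed to be sorted and to cover every requested sector, and nothing here mirrors the actual device-mapper code.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One row of a dm-linear-style table (hypothetical model)."""
    start: int      # first logical sector covered by this row
    length: int     # number of sectors in this row
    dev: str        # backing device name
    dev_start: int  # sector on the backing device where this row begins


def map_range(table: List[Segment], sector: int, nr: int) -> List[Tuple[str, int, int]]:
    """Translate a logical range into (dev, dev_sector, nr_sectors) pieces."""
    pieces = []
    for seg in table:
        lo = max(sector, seg.start)
        hi = min(sector + nr, seg.start + seg.length)
        if lo < hi:  # the request overlaps this row
            pieces.append((seg.dev, seg.dev_start + (lo - seg.start), hi - lo))
    return pieces


def split_copy(table: List[Segment], src: int, dst: int, nr: int):
    """Split a copy so each piece has exactly one source and one destination device."""
    out = []
    done = 0
    while done < nr:
        s_dev, s_sec, s_len = map_range(table, src + done, nr - done)[0]
        d_dev, d_sec, d_len = map_range(table, dst + done, nr - done)[0]
        step = min(s_len, d_len)  # stop at whichever boundary comes first
        out.append(((s_dev, s_sec), (d_dev, d_sec), step))
        done += step
    return out
```

In this model, a piece whose source and destination land on different backing devices could not be served by a single-namespace copy command and would presumably fall back to emulation, while a piece that stays on one device remains a candidate for offload.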


-- 
Joshi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-09  8:51         ` Mikulas Patocka
@ 2022-03-09 15:49           ` Nikos Tsironis
  0 siblings, 0 replies; 62+ messages in thread
From: Nikos Tsironis @ 2022-03-09 15:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/9/22 10:51, Mikulas Patocka wrote:
> 
> Hi
> 
> Note that you must submit kcopyd callbacks from a single thread, otherwise
> there's a race condition in snapshot.
> 

Hi,

Thanks for the feedback. Yes, I'm aware of that.

> The snapshot code doesn't take locks in the copy_callback and it expects
> that the callbacks are serialized.
> 
> Maybe, adding the locks to copy_callback would solve it.
> 

That's what I did. I used a lock to ensure that kcopyd callbacks are
serialized for persistent snapshots.

For transient snapshots we can lift this limitation, and complete
pending exceptions out-of-order and in "parallel", i.e., without
explicitly serializing kcopyd callbacks. The locks in pending_complete()
are enough in this case.

Nikos
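
The locking approach described above can be illustrated with a small user-space model. This is not dm-snapshot code: SnapshotState, copy_callback() and run_copies() are hypothetical names, and the sequence counter merely stands in for whatever state the real callback updates. The point is only that callbacks may arrive from several threads while the update itself stays serialized.

```python
import threading


class SnapshotState:
    """Toy stand-in for state touched by copy-completion callbacks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.next_seq = 0
        self.log = []

    def copy_callback(self, chunk):
        # The read-modify-write of next_seq is the kind of update that must
        # not interleave between threads, so it is done under the lock.
        with self._lock:
            seq = self.next_seq
            self.next_seq = seq + 1
            self.log.append((seq, chunk))


def run_copies(state, nr_threads, per_thread):
    """Fire callbacks from several threads and return how many were recorded."""
    def worker(tid):
        for i in range(per_thread):
            state.copy_callback((tid, i))

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(nr_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(state.log)
```

With the lock in place, every callback gets a unique, consecutive sequence number regardless of which thread delivered it, which is the ordering guarantee the persistent-snapshot case relies on.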

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-08 20:48       ` Nikos Tsironis
@ 2022-03-09  8:51         ` Mikulas Patocka
  2022-03-09 15:49           ` Nikos Tsironis
  0 siblings, 1 reply; 62+ messages in thread
From: Mikulas Patocka @ 2022-03-09  8:51 UTC (permalink / raw)
  To: Nikos Tsironis
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack


On Tue, 8 Mar 2022, Nikos Tsironis wrote:

> My work focuses mainly on improving the IOPs and latency of the
> dm-snapshot target, in order to bring the performance of short-lived
> snapshots as close as possible to bare-metal performance.
> 
> My initial performance evaluation of dm-snapshot had revealed a big
> performance drop, while the snapshot is active; a drop which is not
> justified by COW alone.
> 
> Using fio with blktrace I had noticed that the per-CPU I/O distribution
> was uneven. Although many threads were doing I/O, only a couple of the
> CPUs ended up submitting I/O requests to the underlying device.
> 
> The same issue also affects dm-clone, when doing I/O with sizes smaller
> than the target's region size, where kcopyd is used for COW.
> 
> The bottleneck here is kcopyd serializing all I/O. Users of kcopyd, such
> as dm-snapshot and dm-clone, cannot take advantage of the increased I/O
> parallelism that comes with using blk-mq in modern multi-core systems,
> because I/Os are issued only by a single CPU at a time, the one on which
> kcopyd’s thread happens to be running.
> 
> So, I experimented redesigning kcopyd to prevent I/O serialization by
> respecting thread locality for I/Os and their completions. This made the
> distribution of I/O processing uniform across CPUs.
> 
> My measurements had shown that scaling kcopyd, in combination with
> scaling dm-snapshot itself [1] [2], can lead to an eventual performance
> improvement of ~300% increase in sustained throughput and ~80% decrease
> in I/O latency for transient snapshots, over the null_blk device.
> 
> The work for scaling dm-snapshot has been merged [1], but,
> unfortunately, I haven't been able to send upstream my work on kcopyd
> yet, because I have been really busy with other things the last couple
> of years.
> 
> I haven't looked into the details of copy offload yet, but it would be
> really interesting to see how it affects the performance of random and
> sequential workloads, and to check how, and if, scaling kcopyd affects
> the performance, in combination with copy offload.
> 
> Nikos

Hi

Note that you must submit kcopyd callbacks from a single thread, otherwise 
there's a race condition in snapshot.

The snapshot code doesn't take locks in the copy_callback and it expects 
that the callbacks are serialized.

Maybe, adding the locks to copy_callback would solve it.

Mikulas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 21:32     ` Chaitanya Kulkarni
  2022-03-03 18:36       ` Nikos Tsironis
@ 2022-03-08 20:48       ` Nikos Tsironis
  2022-03-09  8:51         ` Mikulas Patocka
  1 sibling, 1 reply; 62+ messages in thread
From: Nikos Tsironis @ 2022-03-08 20:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/1/22 23:32, Chaitanya Kulkarni wrote:
> Nikos,
> 
>>> [8] https://kernel.dk/io_uring.pdf
>>
>> I would like to participate in the discussion too.
>>
>> The dm-clone target would also benefit from copy offload, as it heavily
>> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
>> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
>> NVMe devices, but copy offload sounds even more promising, especially
>> for larger copies happening in the background (as is the case with
>> dm-clone's background hydration).
>>
>> Thanks,
>> Nikos
> 
> If you can document your findings here it will be great for me to
> add it to the agenda.
> 

My work focuses mainly on improving the IOPS and latency of the
dm-snapshot target, in order to bring the performance of short-lived
snapshots as close as possible to bare-metal performance.

My initial performance evaluation of dm-snapshot revealed a big
performance drop while the snapshot is active, a drop that COW alone
does not justify.

Using fio with blktrace, I noticed that the per-CPU I/O distribution
was uneven. Although many threads were doing I/O, only a couple of the
CPUs ended up submitting I/O requests to the underlying device.

The same issue also affects dm-clone, when doing I/O with sizes smaller
than the target's region size, where kcopyd is used for COW.

The bottleneck here is kcopyd serializing all I/O. Users of kcopyd, such
as dm-snapshot and dm-clone, cannot take advantage of the increased I/O
parallelism that comes with using blk-mq in modern multi-core systems,
because I/Os are issued only by a single CPU at a time, the one on which
kcopyd’s thread happens to be running.

So, I experimented with redesigning kcopyd to prevent I/O serialization
by respecting thread locality for I/Os and their completions. This made
the distribution of I/O processing uniform across CPUs.
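
A toy model of the difference, in Python and with hypothetical names
(make_jobs, issue_serialized, issue_with_locality). It is not kcopyd
code; it only counts which CPU ends up issuing each copy job under the
two designs.

```python
from collections import Counter


def make_jobs(num_cpus, jobs_per_cpu):
    """Build (submitting_cpu, sector) pairs, jobs_per_cpu for each CPU."""
    return [(cpu, cpu * jobs_per_cpu + i)
            for cpu in range(num_cpus)
            for i in range(jobs_per_cpu)]


def issue_serialized(jobs, kcopyd_cpu=0):
    """Serialized design: every job is issued by the one CPU on which
    the single worker thread happens to run (assumed to be kcopyd_cpu)."""
    return Counter(kcopyd_cpu for _ in jobs)


def issue_with_locality(jobs):
    """Locality-preserving design: each job is issued on the CPU that
    submitted it, so the per-CPU load follows the submitters."""
    return Counter(submitting_cpu for submitting_cpu, _sector in jobs)
```

For make_jobs(4, 10), the serialized model puts all 40 jobs on one CPU,
while the locality model spreads them evenly, 10 per CPU.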

My measurements showed that scaling kcopyd, in combination with scaling
dm-snapshot itself [1] [2], can lead to a ~300% increase in sustained
throughput and a ~80% decrease in I/O latency for transient snapshots,
over the null_blk device.

The work on scaling dm-snapshot has been merged [1], but unfortunately
I haven't been able to send my kcopyd work upstream yet, because I have
been really busy with other things for the last couple of years.

I haven't looked into the details of copy offload yet, but it would be
really interesting to see how it affects the performance of random and
sequential workloads, and to check how, and if, scaling kcopyd affects
the performance, in combination with copy offload.

Nikos

[1] https://lore.kernel.org/dm-devel/20190317122258.21760-1-ntsironis@arrikto.com/
[2] https://lore.kernel.org/dm-devel/425d7efe-ab3f-67be-264e-9c3b6db229bc@arrikto.com/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 21:32     ` Chaitanya Kulkarni
@ 2022-03-03 18:36       ` Nikos Tsironis
  2022-03-08 20:48       ` Nikos Tsironis
  1 sibling, 0 replies; 62+ messages in thread
From: Nikos Tsironis @ 2022-03-03 18:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/1/22 23:32, Chaitanya Kulkarni wrote:
> Nikos,
> 
>>> [8] https://kernel.dk/io_uring.pdf
>>
>> I would like to participate in the discussion too.
>>
>> The dm-clone target would also benefit from copy offload, as it heavily
>> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
>> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
>> NVMe devices, but copy offload sounds even more promising, especially
>> for larger copies happening in the background (as is the case with
>> dm-clone's background hydration).
>>
>> Thanks,
>> Nikos
> 
> If you can document your findings here it will be great for me to
> add it to the agenda.
> 

Hi,

Give me a few days to gather my notes, because it's been a while since
the last time I worked on this, and I will come back with a summary of
my findings.

Nikos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 17:34   ` Nikos Tsironis
@ 2022-03-01 21:32     ` Chaitanya Kulkarni
  2022-03-03 18:36       ` Nikos Tsironis
  2022-03-08 20:48       ` Nikos Tsironis
  0 siblings, 2 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2022-03-01 21:32 UTC (permalink / raw)
  To: Nikos Tsironis
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

Nikos,

>> [8] https://kernel.dk/io_uring.pdf
> 
> I would like to participate in the discussion too.
> 
> The dm-clone target would also benefit from copy offload, as it heavily
> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
> NVMe devices, but copy offload sounds even more promising, especially
> for larger copies happening in the background (as is the case with
> dm-clone's background hydration).
> 
> Thanks,
> Nikos

If you can document your findings here it will be great for me to
add it to the agenda.

-ck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
                     ` (5 preceding siblings ...)
  2022-02-07 10:45   ` David Disseldorp
@ 2022-03-01 17:34   ` Nikos Tsironis
  2022-03-01 21:32     ` Chaitanya Kulkarni
  6 siblings, 1 reply; 62+ messages in thread
From: Nikos Tsironis @ 2022-03-01 17:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 1/27/22 09:14, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited due to the end of Dennard scaling and
> multi-threaded performance is slowing down due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Work Group (TWG) [2], offloading computations to the device or over the
> fabrics is becoming popular, as several solutions are available [2].
> One common operation that is popular in the kernel but not merged yet
> is copy offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
> 
> We have conducted a call with interested people late last year since
> lack of LSFMMM and we would like to share the details with broader
> community members.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
> 
> With this background, we have a significant number of use-cases that
> are strong candidates waiting for Linux Kernel Block Layer Copy Offload
> support, so that the Linux Kernel Storage subsystem can address the
> previously mentioned problems [1] and allow efficient offloading of
> data-related operations (such as move/copy).
> 
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Keith Busch
> 
> -ck
> 
> [1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
> https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>         https://www.eideticom.com/products.html
> https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
> namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf

I would like to participate in the discussion too.

The dm-clone target would also benefit from copy offload, as it heavily
employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
achieve increased IOPS in dm-clone and dm-snapshot for small copies over
NVMe devices, but copy offload sounds even more promising, especially
for larger copies happening in the background (as is the case with
dm-clone's background hydration).

Thanks,
Nikos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
                     ` (4 preceding siblings ...)
  2022-02-02  5:57   ` Kanchan Joshi
@ 2022-02-07 10:45   ` David Disseldorp
  2022-03-01 17:34   ` Nikos Tsironis
  6 siblings, 0 replies; 62+ messages in thread
From: David Disseldorp @ 2022-02-07 10:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack

On Thu, 27 Jan 2022 07:14:13 +0000, Chaitanya Kulkarni wrote:

> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited due to the end of Dennard scaling and
> multi-threaded performance is slowing down due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Work Group (TWG) [2], offloading computations to the device or over the
> fabrics is becoming popular, as several solutions are available [2].
> One common operation that is popular in the kernel but not merged yet
> is copy offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
> 
> We have conducted a call with interested people late last year since 
> lack of LSFMMM and we would like to share the details with broader
> community members.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
> 
> With this background, we have a significant number of use-cases that
> are strong candidates waiting for Linux Kernel Block Layer Copy Offload
> support, so that the Linux Kernel Storage subsystem can address the
> previously mentioned problems [1] and allow efficient offloading of
> data-related operations (such as move/copy).
> 
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.

I'd like to attend this discussion. I've worked on the LIO XCOPY
implementation in drivers/target/target_core_xcopy.c and added Samba's
FSCTL_SRV_COPYCHUNK/FSCTL_DUPLICATE_EXTENTS_TO_FILE support.

Cheers, David

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-02-01 10:21   ` Javier González
@ 2022-02-07  9:57     ` Nitesh Shetty
  0 siblings, 0 replies; 62+ messages in thread
From: Nitesh Shetty @ 2022-02-07  9:57 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Chaitanya,

I would like to join the conversation.

Thanks,
Nitesh

On Sun, Feb 6, 2022 at 7:29 PM Javier González <javier@javigon.com> wrote:
>
> On 27.01.2022 07:14, Chaitanya Kulkarni wrote:
> >Hi,
> >
> >* Background :-
> >-----------------------------------------------------------------------
> >
> >Copy offload is a feature that allows file-systems or storage devices
> >to be instructed to copy files/logical blocks without requiring
> >involvement of the local CPU.
> >
> >With reference to the RISC-V summit keynote [1], single-threaded
> >performance is limited due to the end of Dennard scaling and
> >multi-threaded performance is slowing down due to Moore's law
> >limitations. With the rise of the SNIA Computational Storage Technical
> >Work Group (TWG) [2], offloading computations to the device or over the
> >fabrics is becoming popular, as several solutions are available [2].
> >One common operation that is popular in the kernel but not merged yet
> >is copy offload over the fabrics or onto the device.
> >
> >* Problem :-
> >-----------------------------------------------------------------------
> >
> >The original work which is done by Martin is present here [3]. The
> >latest work which is posted by Mikulas [4] is not merged yet. These two
> >approaches are totally different from each other. Several storage
> >vendors discourage mixing copy offload requests with regular READ/WRITE
> >I/O. Also, the fact that the operation fails if a copy request ever
> >needs to be split as it traverses the stack it has the unfortunate
> >side-effect of preventing copy offload from working in pretty much
> >every common deployment configuration out there.
> >
> >* Current state of the work :-
> >-----------------------------------------------------------------------
> >
> >With [3] being hard to handle arbitrary DM/MD stacking without
> >splitting the command in two, one for copying IN and one for copying
> >OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> >candidate. Also, with [4] there is an unresolved problem with the
> >two-command approach about how to handle changes to the DM layout
> >between an IN and OUT operations.
> >
> >We have conducted a call with interested people late last year since
> >lack of LSFMMM and we would like to share the details with broader
> >community members.
>
> Chaitanya,
>
> I would also like to join the F2F conversation as a follow up of the
> virtual one last year. We will have a first version of the patches
> posted in the next few weeks. This will hopefully serve as a good first
> step.
>
> Adding Kanchan to thread too.
>
> Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
                     ` (3 preceding siblings ...)
  2022-02-01 10:21   ` Javier González
@ 2022-02-02  5:57   ` Kanchan Joshi
  2022-02-07 10:45   ` David Disseldorp
  2022-03-01 17:34   ` Nikos Tsironis
  6 siblings, 0 replies; 62+ messages in thread
From: Kanchan Joshi @ 2022-02-02  5:57 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On Thu, Jan 27, 2022 at 12:51 PM Chaitanya Kulkarni
<chaitanyak@nvidia.com> wrote:
>
> Hi,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited due to the end of Dennard scaling and
> multi-threaded performance is slowing down due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Work Group (TWG) [2], offloading computations to the device or over the
> fabrics is becoming popular, as several solutions are available [2].
> One common operation that is popular in the kernel but not merged yet
> is copy offload over the fabrics or onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
>
> We have conducted a call with interested people late last year since
> lack of LSFMMM and we would like to share the details with broader
> community members.

I'm keen on this topic and would like to join the F2F discussion.
The November call did establish some consensus on requirements.
Planning to have a round or two of code-discussions soon.


Thanks,
-- 
Kanchan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
                     ` (2 preceding siblings ...)
  2022-02-01  1:54   ` Luis Chamberlain
@ 2022-02-01 10:21   ` Javier González
  2022-02-07  9:57     ` Nitesh Shetty
  2022-02-02  5:57   ` Kanchan Joshi
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2022-02-01 10:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On 27.01.2022 07:14, Chaitanya Kulkarni wrote:
>Hi,
>
>* Background :-
>-----------------------------------------------------------------------
>
>Copy offload is a feature that allows file-systems or storage devices
>to be instructed to copy files/logical blocks without requiring
>involvement of the local CPU.
>
>With reference to the RISC-V summit keynote [1], single-threaded
>performance is limited due to the end of Dennard scaling and
>multi-threaded performance is slowing down due to Moore's law
>limitations. With the rise of the SNIA Computational Storage Technical
>Work Group (TWG) [2], offloading computations to the device or over the
>fabrics is becoming popular, as several solutions are available [2].
>One common operation that is popular in the kernel but not merged yet
>is copy offload over the fabrics or onto the device.
>
>* Problem :-
>-----------------------------------------------------------------------
>
>The original work which is done by Martin is present here [3]. The
>latest work which is posted by Mikulas [4] is not merged yet. These two
>approaches are totally different from each other. Several storage
>vendors discourage mixing copy offload requests with regular READ/WRITE
>I/O. Also, the fact that the operation fails if a copy request ever
>needs to be split as it traverses the stack it has the unfortunate
>side-effect of preventing copy offload from working in pretty much
>every common deployment configuration out there.
>
>* Current state of the work :-
>-----------------------------------------------------------------------
>
>With [3] being hard to handle arbitrary DM/MD stacking without
>splitting the command in two, one for copying IN and one for copying
>OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
>candidate. Also, with [4] there is an unresolved problem with the
>two-command approach about how to handle changes to the DM layout
>between an IN and OUT operations.
>
>We have conducted a call with interested people late last year since
>lack of LSFMMM and we would like to share the details with broader
>community members.

Chaitanya,

I would also like to join the F2F conversation as a follow up of the
virtual one last year. We will have a first version of the patches
posted in the next few weeks. This will hopefully serve as a good first
step.

Adding Kanchan to thread too.

Javier

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
  2022-01-28 19:59   ` Adam Manzanares
  2022-01-31 19:03   ` Bart Van Assche
@ 2022-02-01  1:54   ` Luis Chamberlain
  2022-02-01 10:21   ` Javier González
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 62+ messages in thread
From: Luis Chamberlain @ 2022-02-01  1:54 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.

Consider me interested in this topic.

  Luis

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
  2022-01-28 19:59   ` Adam Manzanares
@ 2022-01-31 19:03   ` Bart Van Assche
  2022-02-01  1:54   ` Luis Chamberlain
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 62+ messages in thread
From: Bart Van Assche @ 2022-01-31 19:03 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 1/26/22 23:14, Chaitanya Kulkarni wrote:
> [1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
> https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>         https://www.eideticom.com/products.html
> https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
> namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf

Please consider adding the following link to the above list:
https://github.com/bvanassche/linux-kernel-copy-offload

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-28 19:59   ` Adam Manzanares
@ 2022-01-31 11:49     ` Johannes Thumshirn
  0 siblings, 0 replies; 62+ messages in thread
From: Johannes Thumshirn @ 2022-01-31 11:49 UTC (permalink / raw)
  To: chaitanyak, a.manzanares
  Cc: clm, josef, lsf-pc, linux-nvme, kbusch, osandov, bvanassche,
	dm-devel, linux-scsi, mpatocka, djwong, dsterba, msnitzer,
	Frederick.Knight, hch, hare, roland, tytso, axboe, linux-block,
	zach.brown, linux-fsdevel, jack, martin.petersen

On Fri, 2022-01-28 at 19:59 +0000, Adam Manzanares wrote:
> On Thu, Jan 27, 2022 at 07:14:13AM +0000, Chaitanya Kulkarni wrote:
> > 
> > * Current state of the work :-
> > -----------------------------------------------------------------------
> > 
> > With [3], it is hard to handle arbitrary DM/MD stacking without
> > splitting the command in two, one for copying IN and one for
> > copying OUT. [4] then demonstrates why [3] is not a suitable
> > candidate. Also, with [4] there is an unresolved problem with the
> > two-command approach: how to handle changes to the DM layout
> > between the IN and OUT operations.
> > 
> > We conducted a call with interested people late last year, since
> > there was no LSF/MM, and we would like to share the details with
> > the broader community.
> 
> Was on that call and I am interested in joining this discussion.

Same for me :)




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` Chaitanya Kulkarni
@ 2022-01-28 19:59   ` Adam Manzanares
  2022-01-31 11:49     ` Johannes Thumshirn
  2022-01-31 19:03   ` Bart Van Assche
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 62+ messages in thread
From: Adam Manzanares @ 2022-01-28 19:59 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On Thu, Jan 27, 2022 at 07:14:13AM +0000, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by the end of Dennard scaling, and multi-threaded
> performance is slowing down due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular, as there are several solutions available [2]. One of the common
> operations that is popular in the kernel but not merged yet is Copy
> offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work, done by Martin, is available here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs to
> be split as it traverses the stack has the unfortunate side-effect of
> preventing copy offload from working in pretty much every common
> deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3], it is hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
> 
> We conducted a call with interested people late last year, since there
> was no LSF/MM, and we would like to share the details with the broader
> community.

Was on that call and I am interested in joining this discussion.

> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to
> Peer DMA support in the Linux Kernel mainly for NVMe devices [7], NVMe
> devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit from
> a Copy offload operation.
> 
> With this background, we have a significant number of use-cases that
> are strong candidates waiting for outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy).
> 
> For reference, the following is the list of use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device driver
> developers to :-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> 
> -ck
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2022-01-27  7:14 ` Chaitanya Kulkarni
  2022-01-28 19:59   ` Adam Manzanares
                     ` (6 more replies)
  0 siblings, 7 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2022-01-27  7:14 UTC (permalink / raw)
  To: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded
performance is limited by the end of Dennard scaling, and multi-threaded
performance is slowing down due to Moore's law limitations. With the
rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
offloading computations to the device or over the fabrics is becoming
popular, as there are several solutions available [2]. One of the common
operations that is popular in the kernel but not merged yet is Copy
offload over the fabrics or onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is available here [3]. The latest
work, posted by Mikulas [4], is not merged yet. These two approaches
are totally different from each other. Several storage vendors
discourage mixing copy offload requests with regular READ/WRITE I/O.
Also, the fact that the operation fails if a copy request ever needs to
be split as it traverses the stack has the unfortunate side-effect of
preventing copy offload from working in pretty much every common
deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

With [3], it is hard to handle arbitrary DM/MD stacking without
splitting the command in two, one for copying IN and one for copying
OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
with [4] there is an unresolved problem with the two-command approach:
how to handle changes to the DM layout between the IN and OUT
operations.
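
[Editorial illustration, not part of the original message.] The
splitting problem can be sketched with a toy model of a linear
(concatenated) stacked device. This is only a sketch under stated
assumptions: the names Segment, map_range, plan_copy and same_device,
and the devices "sda"/"sdb", are hypothetical and are not taken from
[3] or [4]. It shows how one logical copy can turn into several pieces
once it is mapped onto the underlying devices.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """A contiguous run of sectors on one underlying device."""
    device: str
    offset: int  # first sector on the underlying device
    length: int  # number of sectors


def map_range(table: List[Tuple[int, str]], total: int,
              start: int, length: int) -> List[Segment]:
    """Map a logical sector range onto a linear (concatenated) target.

    `table` is a sorted list of (logical_start, device_name) entries and
    `total` is the size of the logical device in sectors.
    """
    if start < 0 or length <= 0 or start + length > total:
        raise ValueError("range outside the logical device")
    starts = [s for s, _ in table]
    segments = []
    pos, remaining = start, length
    while remaining:
        i = bisect_right(starts, pos) - 1
        seg_start, dev = table[i]
        seg_end = starts[i + 1] if i + 1 < len(table) else total
        n = min(remaining, seg_end - pos)
        segments.append(Segment(dev, pos - seg_start, n))
        pos += n
        remaining -= n
    return segments


def plan_copy(table, total, src, dst, length):
    """Split one logical copy into (source, destination) segment pairs.

    Each pair has equal length and touches exactly one device per side.
    """
    src_segs = map_range(table, total, src, length)
    dst_segs = map_range(table, total, dst, length)
    pairs = []
    si = di = s_used = d_used = 0
    while si < len(src_segs):
        s, d = src_segs[si], dst_segs[di]
        n = min(s.length - s_used, d.length - d_used)
        pairs.append((Segment(s.device, s.offset + s_used, n),
                      Segment(d.device, d.offset + d_used, n)))
        s_used += n
        d_used += n
        if s_used == s.length:
            si, s_used = si + 1, 0
        if d_used == d.length:
            di, d_used = di + 1, 0
    return pairs


def same_device(pair) -> bool:
    """True if a pair could be served by a single-device copy command."""
    return pair[0].device == pair[1].device
```

With a table of [(0, "sda"), (1000, "sdb")] and 2000 sectors in total,
plan_copy(table, 2000, src=900, dst=1500, length=200) yields two pairs:
one that reads from "sda" and writes to "sdb", and one that stays
within "sdb". A design that cannot split the request would have to
fail this copy as a whole.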

We conducted a call with interested people late last year, since there
was no LSF/MM, and we would like to share the details with the broader
community.

* Why Linux Kernel Storage System needs Copy Offload support now ?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and solutions [2],
existing SCSI XCopy support in the protocol, recent advancement in the
Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to
Peer DMA support in the Linux Kernel mainly for NVMe devices [7], NVMe
devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit from
a Copy offload operation.

With this background, we have a significant number of use-cases that
are strong candidates waiting for outstanding Linux Kernel Block Layer
Copy Offload support, so that the Linux Kernel Storage subsystem can
address the previously mentioned problems [1] and allow efficient
offloading of data-related operations (such as move/copy).

For reference, the following is the list of use-cases/candidates waiting
for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session ?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation ?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward ?
5. How can we help to contribute and move this work forward ?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver
developers to :-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

-ck

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
       https://www.eideticom.com/products.html
https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-02-13  5:11   ` joshi.k
@ 2020-02-13 13:09     ` Knight, Frederick
  0 siblings, 0 replies; 62+ messages in thread
From: Knight, Frederick @ 2020-02-13 13:09 UTC (permalink / raw)
  To: joshi.k, 'Chaitanya Kulkarni',
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, 'Martin K. Petersen',
	'Matias Bjorling', 'Stephen Bates',
	roland, mpatocka, hare, 'Keith Busch',
	rwheeler, 'Christoph Hellwig',
	zach.brown, javier

FWIW - the design of NVMe Simple Copy specifically included versioning of the data structure that describes what to copy.

The reason for that was random people's desire to complexify the Simple Copy command.  Specifically, there was room designed into the data structure to accommodate a source NSID (to allow cross-namespace copy - the intention being namespaces attached to the same controller), and room to accommodate the KPIO key tag value for each source range.  Other people thought they could use this data structure versioning to design a fully SCSI XCOPY compatible data structure.

My point is just to consider the flexibility and extensibility of the OS interfaces when thinking about "Simple Copy".

I'm just not sure how SIMPLE it will remain.

	Fred 
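
[Editorial illustration, not part of the original message.] One way to
picture the versioning point is a copy descriptor whose header carries
a format number that selects the per-range entry layout. The layout
below is purely hypothetical: the field sizes, the names HEADER, ENTRY,
encode and decode, and the choice of format numbers are assumptions
made for this sketch and are not the NVMe wire format.

```python
import struct
from typing import List, Optional, Tuple

# (source NSID or None, starting LBA, number of blocks)
Range = Tuple[Optional[int], int, int]

# Hypothetical header: format version (u8), pad byte, range count (u16).
HEADER = struct.Struct("<BxH")
# Hypothetical per-range entries, selected by the format version.
ENTRY = {
    0: struct.Struct("<QI"),   # format 0: starting LBA, block count
    1: struct.Struct("<IQI"),  # format 1: source NSID, starting LBA, block count
}


def encode(ranges: List[Range], version: int = 0) -> bytes:
    """Serialize source ranges using the entry layout for `version`."""
    if version not in ENTRY:
        raise ValueError(f"unknown descriptor format {version}")
    out = [HEADER.pack(version, len(ranges))]
    for nsid, slba, nlb in ranges:
        if version == 0:
            if nsid is not None:
                raise ValueError("format 0 cannot carry a source NSID")
            out.append(ENTRY[0].pack(slba, nlb))
        else:
            if nsid is None:
                raise ValueError("format 1 needs a source NSID")
            out.append(ENTRY[1].pack(nsid, slba, nlb))
    return b"".join(out)


def decode(buf: bytes) -> Tuple[int, List[Range]]:
    """Parse a descriptor, dispatching on the version in the header."""
    version, count = HEADER.unpack_from(buf, 0)
    if version not in ENTRY:
        raise ValueError(f"unknown descriptor format {version}")
    entry = ENTRY[version]
    ranges = []
    for i in range(count):
        fields = entry.unpack_from(buf, HEADER.size + i * entry.size)
        ranges.append((None, *fields) if version == 0 else fields)
    return version, ranges
```

An OS interface that hard-codes the format 0 shape (start, length) has
nowhere to put the extra per-range field once format 1 appears, which
is the extensibility concern raised above.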


-----Original Message-----
From: joshi.k@samsung.com <joshi.k@samsung.com> 
Sent: Thursday, February 13, 2020 12:11 AM
To: 'Chaitanya Kulkarni' <Chaitanya.Kulkarni@wdc.com>; linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-foundation.org
Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; 'Martin K. Petersen' <martin.petersen@oracle.com>; 'Matias Bjorling' <Matias.Bjorling@wdc.com>; 'Stephen Bates' <sbates@raithlin.com>; roland@purestorage.com; joshi.k@samsung.com; mpatocka@redhat.com; hare@suse.de; 'Keith Busch' <kbusch@kernel.org>; rwheeler@redhat.com; 'Christoph Hellwig' <hch@lst.de>; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; joshi.k@samsung.com; javier@javigon.com
Subject: RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

I am very keen on this topic.
I've been doing some work for "NVMe simple copy", and would like to discuss and solicit opinion of community on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copy within a single namespace. Some of the problems that original XCOPY work [2] faced may not be applicable for simple-copy, e.g. split of single copy due to differing device-specific limits.
Hope I'm not missing something in thinking so?

- [Block I/O] Async interface (through io-uring or AIO) so that multiple copy operations can be queued.

- [File I/O to user-space] I think it may make sense to extend copy_file_range API to do in-device copy as well.

- [F2FS] GC of F2FS may leverage the interface. Currently it uses page-cache, which is fair. But, for relatively cold/warm data (if that needs to be garbage-collected anyway), it can rather bypass the Host and skip running into a scenario when something (useful) gets thrown out of cache.

- [ZNS] ZNS users (kernel or user-space) would be log-structured, and will benefit from internal copy. But failure scenarios (partial copy, write-pointer position) need to be discussed.

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On 
> Behalf Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux- 
> nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux- 
> foundation.org
> Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; Martin K.
> Petersen <martin.petersen@oracle.com>; Matias Bjorling 
> <Matias.Bjorling@wdc.com>; Stephen Bates <sbates@raithlin.com>; 
> roland@purestorage.com; mpatocka@redhat.com; hare@suse.de; Keith Busch 
> <kbusch@kernel.org>; rwheeler@redhat.com; Christoph Hellwig 
> <hch@lst.de>; frederick.knight@netapp.com; zach.brown@ni.com
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
>
> Hi all,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by the end of Dennard scaling, and multi-threaded
> performance is slowing down due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular, as there are several solutions available [2]. One of the common
> operations that is popular in the kernel but not merged yet is Copy
> offload over the fabrics or onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work, done by Martin, is available here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs to
> be split as it traverses the stack has the unfortunate side-effect of
> preventing copy offload from working in pretty much every common
> deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> With [3], it is hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
>
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to
> Peer DMA support in the Linux Kernel mainly for NVMe devices [7], NVMe
> devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit from
> a Copy offload operation.
>
> With this background, we have a significant number of use-cases that
> are strong candidates waiting for outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy).
>
> For reference, the following is the list of use-cases/candidates waiting
> for Copy Offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
>
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite block layer, device driver, and file system
> developers to :-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
>
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
> Regards,
> Chaitanya



^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` Chaitanya Kulkarni
                     ` (2 preceding siblings ...)
  2020-01-24 14:23   ` Nikos Tsironis
@ 2020-02-13  5:11   ` joshi.k
  2020-02-13 13:09     ` Knight, Frederick
  3 siblings, 1 reply; 62+ messages in thread
From: joshi.k @ 2020-02-13  5:11 UTC (permalink / raw)
  To: 'Chaitanya Kulkarni',
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, 'Martin K. Petersen',
	'Matias Bjorling', 'Stephen Bates',
	roland, joshi.k, mpatocka, hare, 'Keith Busch',
	rwheeler, 'Christoph Hellwig',
	frederick.knight, zach.brown, joshi.k, javier

I am very keen on this topic.
I've been doing some work for "NVMe simple copy", and would like to discuss
and solicit opinion of community on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copy within a single
namespace. Some of the problems that original XCOPY work [2] faced may not
be applicable for simple-copy, e.g. split of single copy due to differing
device-specific limits.
Hope I'm not missing something in thinking so?

- [Block I/O] Async interface (through io-uring or AIO) so that multiple
copy operations can be queued.

- [File I/O to user-space] I think it may make sense to extend
copy_file_range API to do in-device copy as well.

- [F2FS] GC of F2FS may leverage the interface. Currently it uses
page-cache, which is fair. But, for relatively cold/warm data (if that needs
to be garbage-collected anyway), it can rather bypass the Host and skip
running into a scenario when something (useful) gets thrown out of cache.

- [ZNS] ZNS users (kernel or user-space) would be log-structured, and will
benefit from internal copy. But failure scenarios (partial copy,
write-pointer position) need to be discussed.
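
[Editorial illustration, not part of the original message.] For the
user-space side mentioned above, copy_file_range(2) is the existing
system call, and Python exposes it as os.copy_file_range() on Linux.
The sketch below is only an example of how a caller might use it
today: the helper name copy_range and its fallback policy are
assumptions of this sketch. Whether the kernel actually offloads the
copy depends on the file system and device; the call may return fewer
bytes than requested, so the caller has to loop.

```python
import errno
import os

# Errors after which this sketch gives up on copy_file_range() and
# falls back to plain reads and writes through host memory.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EOPNOTSUPP, errno.EINVAL}


def copy_range(src_fd: int, dst_fd: int, length: int,
               src_off: int, dst_off: int, chunk: int = 1 << 20) -> int:
    """Copy up to `length` bytes between two open files.

    Returns the number of bytes copied, which is smaller than `length`
    if the source ends early.
    """
    copied = 0
    use_cfr = hasattr(os, "copy_file_range")
    while copied < length:
        want = min(chunk, length - copied)
        if use_cfr:
            try:
                n = os.copy_file_range(src_fd, dst_fd, want,
                                       src_off + copied, dst_off + copied)
            except OSError as exc:
                if exc.errno not in _FALLBACK_ERRNOS:
                    raise
                use_cfr = False  # retry this chunk with pread/pwrite
                continue
        else:
            data = os.pread(src_fd, want, src_off + copied)
            n = os.pwrite(dst_fd, data, dst_off + copied) if data else 0
        if n == 0:
            break  # reached the end of the source file
        copied += n
    return copied
```

The partial-copy handling in the loop is the same class of question
raised for ZNS above: after a short or failed copy, the caller needs
to know exactly how much data landed at the destination.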

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf
> Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-
> nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-
> foundation.org
> Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; Martin K.
> Petersen <martin.petersen@oracle.com>; Matias Bjorling
> <Matias.Bjorling@wdc.com>; Stephen Bates <sbates@raithlin.com>;
> roland@purestorage.com; mpatocka@redhat.com; hare@suse.de; Keith Busch
> <kbusch@kernel.org>; rwheeler@redhat.com; Christoph Hellwig <hch@lst.de>;
> frederick.knight@netapp.com; zach.brown@ni.com
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
> 
> Hi all,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by the end of Dennard scaling, and multi-threaded
> performance is slowing down due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular, as there are several solutions available [2]. One of the common
> operations that is popular in the kernel but not merged yet is Copy
> offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work, done by Martin, is available here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs to
> be split as it traverses the stack has the unfortunate side-effect of
> preventing copy offload from working in pretty much every common
> deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3], it is hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2], the
> existing SCSI XCopy support in the protocol, the recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7], all of these,
> and eventually NVMe devices and the NVMe subsystem (NVMe PCIe/NVMeOF), will
> benefit from the Copy offload operation.
> 
> With this background we have a significant number of use-cases which are
> strong candidates waiting for outstanding Linux Kernel Block Layer Copy
> Offload support, so that the Linux Kernel Storage subsystem can address the
> previously mentioned problems [1] and allow efficient offloading of
> data-related operations (such as move/copy).
> 
> For reference, the following is the list of the use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite block layer, device drivers and file system
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 
> Regards,
> Chaitanya
> 
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
  2020-01-09  3:18   ` Bart Van Assche
@ 2020-01-24 14:23   ` Nikos Tsironis
  2020-02-13  5:11   ` joshi.k
  3 siblings, 0 replies; 62+ messages in thread
From: Nikos Tsironis @ 2020-01-24 14:23 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

On 1/7/20 8:14 PM, Chaitanya Kulkarni wrote:
> Hi all,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single threaded
> performance is limited due to Dennard scaling and multi-threaded
> performance is slowing down due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operations which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] it is hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> the existing SCSI XCopy support in the protocol, the recent advancement
> in the Linux Kernel File System for Zoned devices (Zonefs [5]), and
> Peer to Peer DMA support in the Linux Kernel mainly for NVMe devices
> [7], all of these, and eventually NVMe devices and the NVMe subsystem
> (NVMe PCIe/NVMeOF), will benefit from the Copy offload operation.
> 
> With this background we have a significant number of use-cases which
> are strong candidates waiting for outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy).
> 
> For reference, the following is the list of the use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite block layer, device drivers and file system
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 
> Regards,
> Chaitanya
> 

This is a very interesting topic and I would like to participate in the
discussion too.

The dm-clone target would also benefit from copy offload, as it heavily
employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
achieve increased IOPS in dm-clone and dm-snapshot for small copies over
NVMe devices, but copy offload sounds even more promising, especially
for larger copies happening in the background (as is the case with
dm-clone's background hydration).

Thanks,
Nikos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
  2020-01-09  5:56     ` Damien Le Moal
@ 2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 62+ messages in thread
From: Martin K. Petersen @ 2020-01-10  5:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling


Bart,

> * Copying must be supported not only within a single storage device but
>   also between storage devices.

Identifying which devices to permit copies between has been challenging.
That has since been addressed in T10.

> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).

I don't think LID1 vs LID4 is particularly interesting for the Linux use
case. It's just an additional command tag since the copy manager is a
third party.

> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).

Microsoft uses the token commands.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
@ 2020-01-09  5:56     ` Damien Le Moal
  2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 62+ messages in thread
From: Damien Le Moal @ 2020-01-09  5:56 UTC (permalink / raw)
  To: Bart Van Assche, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> With [3] being hard to handle arbitrary DM/MD stacking without
>> splitting the command in two, one for copying IN and one for copying
>> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
>> candidate. Also, with [4] there is an unresolved problem with the
>> two-command approach about how to handle changes to the DM layout
>> between an IN and OUT operations.
> 
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).

Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.

> 
> Thanks,
> 
> Bart.
> 
> 
> This is my own collection of two-year-old notes about copy offloading
> for the Linux kernel:
> 
> Potential Users
> * All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
>   dm-writecache and dm-zoned.
> * Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
>   and RAID, at least if RAID is supported by the filesystem. Note: the
>   BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
>   should use FICLONERANGE instead.
> * Network filesystems, e.g. NFS. Copying at the server side can reduce
>   network traffic significantly.
> * Linux SCSI initiator systems connected to SAN systems such that
>   copying can happen locally on the storage array. XCOPY is widely used
>   for provisioning virtual machine images.
> * Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.
> 
> Requirements
> * The block layer must gain support for XCOPY. The new XCOPY API must
>   support asynchronous operation such that users of this API are not
>   blocked while the XCOPY operation is in progress.
> * Copying must be supported not only within a single storage device but
>   also between storage devices.
> * The SCSI sd driver must gain support for XCOPY.
> * A user space API must be added and that API must support asynchronous
>   (non-blocking) operation.
> * The block layer XCOPY primitive must be supported by the device mapper.
> 
> SCSI Extended Copy (ANSI T10 SPC)
> The SCSI commands that support extended copy operations are:
> * POPULATE TOKEN + WRITE USING TOKEN.
> * EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
>   List Identifier length of 1 byte and LID4 stands for a List Identifier
>   length of 4 bytes.
> * SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
>   EXTENDED COPY(LID4) (83h/01h).
> 
> Existing Users and Implementations of SCSI XCOPY
> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).
> * Storage vendors all support XCOPY, but ODX support is growing.
> 
> Block Layer Notes
> The block layer supports the following types of block drivers:
> * blk-mq request-based drivers.
> * make_request drivers.
> 
> Notes:
> With each request a list of bios is associated.
> Since submit_bio() only accepts a single bio and not a bio list this
> means that all make_request block drivers process one bio at a time.
> 
> Device Mapper
> The device mapper core supports bio processing and blk-mq requests. The
> function in the device mapper that creates a request queue is called
> alloc_dev(). That function not only allocates a request queue but also
> associates a struct gendisk with the request queue. The
> DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
> DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
> causes the type of a dm device to be set to one of the following:
> DM_TYPE_NONE;
> DM_TYPE_BIO_BASED;
> DM_TYPE_REQUEST_BASED;
> DM_TYPE_MQ_REQUEST_BASED;
> DM_TYPE_DAX_BIO_BASED;
> DM_TYPE_NVME_BIO_BASED.
> 
> Device mapper drivers must implement target_type.map(),
> target_type.clone_and_map_rq() or both. .map() maps a bio list.
> .clone_and_map_rq() maps a single request. The multipath and error
> device mapper drivers implement both methods. All other dm drivers only
> implement the .map() method.
> 
> Device mapper bio processing
> submit_bio()
> -> generic_make_request()
>   -> dm_make_request()
>     -> __dm_make_request()
>       -> __split_and_process_bio()
>         -> __split_and_process_non_flush()
>           -> __clone_and_map_data_bio()
>           -> alloc_tio()
>           -> clone_bio()
>             -> bio_advance()
>           -> __map_bio()
> 
> Existing Linux Copy Offload APIs
> * The FICLONERANGE ioctl. From <include/uapi/linux/fs.h>:
>   #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
> 
> struct file_clone_range {
> 	__s64 src_fd;
> 	__u64 src_offset;
> 	__u64 src_length;
> 	__u64 dest_offset;
> };
> 
> * The sendfile() system call. sendfile() copies a given number of bytes
>   from one file to another. The output offset is the offset of the
>   output file descriptor. The input offset is either the input file
>   descriptor offset or can be specified explicitly. The sendfile()
>   prototype is as follows:
>   ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
>   ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
> * The copy_file_range() system call. See also vfs_copy_file_range(). Its
>   prototype is as follows:
>   ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>      loff_t *off_out, size_t len, unsigned int flags);
> * The splice() system call is not appropriate for adding extended copy
>   functionality since it copies data from or to a pipe. Its prototype is
>   as follows:
>   ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
>     loff_t *off_out, size_t len, unsigned int flags);
> 
> Existing Linux Block Layer Copy Offload Implementations
> * Martin Petersen's REQ_COPY bio, where source and destination block
>   device are both specified in the same bio. Only works for block
>   devices. Does not work for files. Adds a new blocking ioctl() for
>   XCOPY from user space.
> * Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
>   REQ_OP_COPY_READ operations. These are sent individually down stacked
>   drivers and are paired by the driver at the bottom of the stack.
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
@ 2020-01-09  4:01     ` Chaitanya Kulkarni
  2020-01-09  5:56     ` Damien Le Moal
  2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-09  4:01 UTC (permalink / raw)
  To: Bart Van Assche, linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).
>
> Thanks,
>
> Bart.
>

Thanks for sharing this, Bart; it is very helpful.

I've not found any notes on LWN for the session which was held in 2018.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
@ 2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
                       ` (2 more replies)
  2020-01-24 14:23   ` Nikos Tsironis
  2020-02-13  5:11   ` joshi.k
  3 siblings, 3 replies; 62+ messages in thread
From: Bart Van Assche @ 2020-01-09  3:18 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.

Was this last discussed during the 2018 edition of LSF/MM (see also
https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
taken notes during that session? I haven't found a report of that
session in the official proceedings (https://lwn.net/Articles/752509/).

Thanks,

Bart.


This is my own collection of two-year-old notes about copy offloading
for the Linux kernel:

Potential Users
* All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
  dm-writecache and dm-zoned.
* Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
  and RAID, at least if RAID is supported by the filesystem. Note: the
  BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
  should use FICLONERANGE instead.
* Network filesystems, e.g. NFS. Copying at the server side can reduce
  network traffic significantly.
* Linux SCSI initiator systems connected to SAN systems such that
  copying can happen locally on the storage array. XCOPY is widely used
  for provisioning virtual machine images.
* Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.

Requirements
* The block layer must gain support for XCOPY. The new XCOPY API must
  support asynchronous operation such that users of this API are not
  blocked while the XCOPY operation is in progress.
* Copying must be supported not only within a single storage device but
  also between storage devices.
* The SCSI sd driver must gain support for XCOPY.
* A user space API must be added and that API must support asynchronous
  (non-blocking) operation.
* The block layer XCOPY primitive must be supported by the device mapper.

SCSI Extended Copy (ANSI T10 SPC)
The SCSI commands that support extended copy operations are:
* POPULATE TOKEN + WRITE USING TOKEN.
* EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
  List Identifier length of 1 byte and LID4 stands for a List Identifier
  length of 4 bytes.
* SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
  EXTENDED COPY(LID4) (83h/01h).
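
As an illustration of the opcode/service-action pairs listed above, here
is a minimal user-space sketch that fills in a 16-byte EXTENDED COPY CDB.
The helper name xcopy_build_cdb() and the XCOPY_* macros are made up for
this example. The byte layout (service action in the low five bits of
byte 1, big-endian parameter list length in bytes 10..13) is my reading
of SPC-4 and should be checked against the standard before relying on it.

```c
#include <stdint.h>
#include <string.h>

#define XCOPY_OPCODE	0x83	/* EXTENDED COPY operation code */
#define XCOPY_SA_LID1	0x00	/* service action: EXTENDED COPY(LID1) */
#define XCOPY_SA_LID4	0x01	/* service action: EXTENDED COPY(LID4) */
#define XCOPY_CDB_LEN	16

/*
 * Hypothetical helper: build an EXTENDED COPY CDB.
 * Assumed layout: byte 0 = opcode, byte 1 bits 4..0 = service action,
 * bytes 10..13 = PARAMETER LIST LENGTH (big-endian), byte 15 = CONTROL.
 */
void xcopy_build_cdb(uint8_t cdb[XCOPY_CDB_LEN], uint8_t service_action,
		     uint32_t param_list_len)
{
	memset(cdb, 0, XCOPY_CDB_LEN);
	cdb[0] = XCOPY_OPCODE;
	cdb[1] = service_action & 0x1f;
	cdb[10] = (uint8_t)(param_list_len >> 24);
	cdb[11] = (uint8_t)(param_list_len >> 16);
	cdb[12] = (uint8_t)(param_list_len >> 8);
	cdb[13] = (uint8_t)param_list_len;
	/* cdb[15] (CONTROL) is left at zero by the memset() above. */
}
```

The segment and target descriptors that say what to copy travel in the
parameter list (data-out buffer), not in the CDB, so this sketch only
covers the command header.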

Existing Users and Implementations of SCSI XCOPY
* VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
* Microsoft, which uses ODX (aka LID4 because it has a four-byte length
  ID).
* Storage vendors all support XCOPY, but ODX support is growing.

Block Layer Notes
The block layer supports the following types of block drivers:
* blk-mq request-based drivers.
* make_request drivers.

Notes:
With each request a list of bios is associated.
Since submit_bio() only accepts a single bio and not a bio list this
means that all make_request block drivers process one bio at a time.

Device Mapper
The device mapper core supports bio processing and blk-mq requests. The
function in the device mapper that creates a request queue is called
alloc_dev(). That function not only allocates a request queue but also
associates a struct gendisk with the request queue. The
DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
causes the type of a dm device to be set to one of the following:
DM_TYPE_NONE;
DM_TYPE_BIO_BASED;
DM_TYPE_REQUEST_BASED;
DM_TYPE_MQ_REQUEST_BASED;
DM_TYPE_DAX_BIO_BASED;
DM_TYPE_NVME_BIO_BASED.

Device mapper drivers must implement target_type.map(),
target_type.clone_and_map_rq() or both. .map() maps a bio list.
.clone_and_map_rq() maps a single request. The multipath and error
device mapper drivers implement both methods. All other dm drivers only
implement the .map() method.

Device mapper bio processing
submit_bio()
-> generic_make_request()
  -> dm_make_request()
    -> __dm_make_request()
      -> __split_and_process_bio()
        -> __split_and_process_non_flush()
          -> __clone_and_map_data_bio()
          -> alloc_tio()
          -> clone_bio()
            -> bio_advance()
          -> __map_bio()

Existing Linux Copy Offload APIs
* The FICLONERANGE ioctl. From <include/uapi/linux/fs.h>:
  #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)

struct file_clone_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 src_length;
	__u64 dest_offset;
};

* The sendfile() system call. sendfile() copies a given number of bytes
  from one file to another. The output offset is the offset of the
  output file descriptor. The input offset is either the input file
  descriptor offset or can be specified explicitly. The sendfile()
  prototype is as follows:
  ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
  ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
* The copy_file_range() system call. See also vfs_copy_file_range(). Its
  prototype is as follows:
  ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
     loff_t *off_out, size_t len, unsigned int flags);
* The splice() system call is not appropriate for adding extended copy
  functionality since it copies data from or to a pipe. Its prototype is
  as follows:
  ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
    loff_t *off_out, size_t len, unsigned int flags);
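
To make the copy_file_range() entry above concrete, here is a minimal
user-space sketch of how an application might call it today. The helper
names copy_range() and copy_fallback() are invented for this example.
The sketch assumes glibc 2.27 or later for the copy_file_range()
wrapper, and it falls back to a plain read()/write() loop when the
kernel reports that it cannot service the call for this pair of
descriptors. Whether the copy is actually offloaded is up to the
filesystem; nothing here reaches the block layer directly.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for copy_file_range() in <unistd.h> */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: bounce the data through a user-space buffer. */
static ssize_t copy_fallback(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(fd_in, buf, want);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;			/* end of input */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Hypothetical helper: copy up to len bytes from the current offset of
 * fd_in to the current offset of fd_out. Returns the number of bytes
 * copied, or -1 with errno set.
 */
ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* NULL offsets: use and advance the descriptors' own offsets. */
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;			/* end of input */
		if (errno == EINTR)
			continue;
		if (errno == ENOSYS || errno == EXDEV ||
		    errno == EINVAL || errno == EOPNOTSUPP) {
			ssize_t r = copy_fallback(fd_in, fd_out, len - done);

			return r < 0 ? -1 : (ssize_t)(done + (size_t)r);
		}
		return -1;
	}
	return (ssize_t)done;
}
```

A caller opens both files, positions the offsets with lseek() if needed,
and calls copy_range(in, out, nbytes). copy_file_range() may copy fewer
bytes than requested, which is why the call sits in a loop.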

Existing Linux Block Layer Copy Offload Implementations
* Martin Petersen's REQ_COPY bio, where source and destination block
  device are both specified in the same bio. Only works for block
  devices. Does not work for files. Adds a new blocking ioctl() for
  XCOPY from user space.
* Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
  REQ_OP_COPY_READ operations. These are sent individually down stacked
  drivers and are paired by the driver at the bottom of the stack.
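
The stacking problem behind these two approaches can be shown with a toy
user-space model. This is not kernel code: struct toy_target and the
toy_* functions are invented for this example and model only a linear
(concatenated) mapping. The point is that a single-command copy can pass
through such a layer unsplit only if both its source range and its
destination range land inside one target each.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one entry in a linear mapping table: logical
 * sectors [start, start + len) live on underlying device 'dev'.
 */
struct toy_target {
	uint64_t start;		/* first logical sector covered */
	uint64_t len;		/* number of sectors covered */
	int dev;		/* index of the underlying device */
};

/* Return the index of the target containing 'sector', or -1 if unmapped. */
static int toy_find_target(const struct toy_target *t, size_t n,
			   uint64_t sector)
{
	for (size_t i = 0; i < n; i++)
		if (sector >= t[i].start && sector - t[i].start < t[i].len)
			return (int)i;
	return -1;
}

/* True if [sector, sector + nr) lies entirely inside a single target. */
bool toy_range_fits(const struct toy_target *t, size_t n,
		    uint64_t sector, uint64_t nr)
{
	int i = toy_find_target(t, n, sector);

	if (i < 0 || nr == 0)
		return false;
	return nr <= t[i].len - (sector - t[i].start);
}

/*
 * True if a copy of nr sectors from src to dst could be passed down as
 * one command: neither range may straddle a target boundary.
 */
bool toy_copy_unsplit(const struct toy_target *t, size_t n,
		      uint64_t src, uint64_t dst, uint64_t nr)
{
	return toy_range_fits(t, n, src, nr) && toy_range_fits(t, n, dst, nr);
}
```

With two 100-sector targets, a copy of 50 sectors from sector 0 to
sector 100 fits, while a copy of 20 sectors starting at sector 90 does
not, because the source range crosses from the first target into the
second. A real stack adds striping, RAID and table reloads on top of
this, which is where the single-bio approach runs into trouble.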


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-08 10:17   ` Javier González
@ 2020-01-09  0:51     ` Logan Gunthorpe
  0 siblings, 0 replies; 62+ messages in thread
From: Logan Gunthorpe @ 2020-01-09  0:51 UTC (permalink / raw)
  To: Javier González, Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, stephen



On 2020-01-08 3:17 a.m., Javier González wrote:
> I think this is a good topic and I would like to participate in the
> discussion too. I think that Logan Gunthorpe would also be interested
> (Cc). Adding Kanchan too, who is also working on this and can contribute
> to the discussion.
> 
> We discussed this in the context of P2P at different SNIA events in the
> context of computational offloads and also as the backend implementation
> for Simple Copy, which is coming in NVMe. Discussing this (again) at
> LSF/MM and finding a way to finally get XCOPY merged would be great.

Yes, I would definitely be interested in discussing copy offload
especially in the context of P2P. Sorting out a userspace interface for
this that supports a P2P use case would be very beneficial to a lot of
folks.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` Chaitanya Kulkarni
@ 2020-01-08 10:17   ` Javier González
  2020-01-09  0:51     ` Logan Gunthorpe
  2020-01-09  3:18   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Javier González @ 2020-01-08 10:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, Logan Gunthorpe, stephen

On 07.01.2020 18:14, Chaitanya Kulkarni wrote:
>Hi all,
>
>* Background :-
>-----------------------------------------------------------------------
>
>Copy offload is a feature that allows file-systems or storage devices
>to be instructed to copy files/logical blocks without requiring
>involvement of the local CPU.
>
>With reference to the RISC-V summit keynote [1], single threaded
>performance is limited due to Dennard scaling and multi-threaded
>performance is slowing down due to Moore's law limitations. With the
>rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
>offloading computations to the device or over the fabrics is becoming
>popular as there are several solutions available [2]. One of the common
>operations which is popular in the kernel and is not merged yet is Copy
>offload over the fabrics or on to the device.
>
>* Problem :-
>-----------------------------------------------------------------------
>
>The original work which is done by Martin is present here [3]. The
>latest work which is posted by Mikulas [4] is not merged yet. These two
>approaches are totally different from each other. Several storage
>vendors discourage mixing copy offload requests with regular READ/WRITE
>I/O. Also, the fact that the operation fails if a copy request ever
>needs to be split as it traverses the stack has the unfortunate
>side-effect of preventing copy offload from working in pretty much
>every common deployment configuration out there.
>
>* Current state of the work :-
>-----------------------------------------------------------------------
>
>With [3] it is hard to handle arbitrary DM/MD stacking without
>splitting the command in two, one for copying IN and one for copying
>OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
>with [4] there is an unresolved problem with the two-command approach:
>how to handle changes to the DM layout between the IN and OUT
>operations.
>
>* Why Linux Kernel Storage System needs Copy Offload support now ?
>-----------------------------------------------------------------------
>
>With the rise of the SNIA Computational Storage TWG and solutions [2],
>the existing SCSI XCopy support in the protocol, the recent advancement
>in the Linux Kernel File System for Zoned devices (Zonefs [5]), and
>Peer to Peer DMA support in the Linux Kernel mainly for NVMe devices
>[7], all of these, and eventually NVMe devices and the NVMe subsystem
>(NVMe PCIe/NVMeOF), will benefit from the Copy offload operation.
>
>With this background we have a significant number of use-cases which
>are strong candidates waiting for outstanding Linux Kernel Block Layer
>Copy Offload support, so that the Linux Kernel Storage subsystem can
>address the previously mentioned problems [1] and allow efficient
>offloading of data-related operations (such as move/copy).
>
>For reference, the following is the list of use-cases/candidates waiting
>for Copy Offload support :-
>
>1. SCSI-attached storage arrays.
>2. Stacking drivers supporting XCopy DM/MD.
>3. Computational Storage solutions.
>4. File systems :- Local, NFS and Zonefs.
>5. Block devices :- Distributed, local, and Zoned devices.
>6. Peer to Peer DMA support solutions.
>7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
>
>* What we will discuss in the proposed session ?
>-----------------------------------------------------------------------
>
>I'd like to propose a session to go over this topic to understand :-
>
>1. What are the blockers for Copy Offload implementation ?
>2. Discussion about having a file system interface.
>3. Discussion about having the right system call for user-space.
>4. What is the right way to move this work forward ?
>5. How can we help to contribute and move this work forward ?
>
>* Required Participants :-
>-----------------------------------------------------------------------
>
>I'd like to invite block layer, device drivers and file system
>developers to:-
>
>1. Share their opinion on the topic.
>2. Share their experience and any other issues with [4].
>3. Uncover additional details that are missing from this proposal.
>
>Required attendees :-
>
>Martin K. Petersen
>Jens Axboe
>Christoph Hellwig
>Bart Van Assche
>Stephen Bates
>Zach Brown
>Roland Dreier
>Ric Wheeler
>Trond Myklebust
>Mike Snitzer
>Keith Busch
>Sagi Grimberg
>Hannes Reinecke
>Frederick Knight
>Mikulas Patocka
>Matias Bjørling
>
>[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
>[2] https://www.snia.org/computational
>    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>    https://www.eideticom.com/products.html
>    https://www.xilinx.com/applications/data-center/computational-storage.html
>[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
>[4] https://www.spinics.net/lists/linux-block/msg00599.html
>[5] https://lwn.net/Articles/793585/
>[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
>[7] https://github.com/sbates130272/linux-p2pmem
>[8] https://kernel.dk/io_uring.pdf
>
>Regards,
>Chaitanya

I think this is a good topic and I would like to participate in the
discussion too. I think that Logan Gunthorpe would also be interested
(Cc). Adding Kanchan too, who is also working on this and can contribute
to the discussion.

We discussed this at different SNIA events, both in the context of P2P
and computational offloads and as the backend implementation for Simple
Copy, which is coming in NVMe. Discussing this (again) at LSF/MM and
finding a way to finally get XCOPY merged would be great.

Thanks,
Javier





* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2020-01-07 18:14 ` Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
                     ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-07 18:14 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

Hi all,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded
performance is limited by the end of Dennard scaling, and multi-threaded
performance is slowing down due to Moore's law limitations. With the rise
of the SNIA Computational Storage Technical Work Group (TWG) [2],
offloading computations to the device or over the fabrics is becoming
popular, as there are several solutions available [2]. One of the common
operations which is popular in the kernel and is not merged yet is Copy
offload over the fabrics or on to the device.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is available here [3]. The latest
work, posted by Mikulas [4], is not merged yet. These two approaches are
totally different from each other. Several storage vendors discourage
mixing copy offload requests with regular READ/WRITE I/O. Also, the
operation fails if a copy request ever needs to be split as it traverses
the stack, which has the unfortunate side-effect of preventing copy
offload from working in pretty much every common deployment
configuration out there.
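
To illustrate the splitting problem, here is a minimal Python sketch. It
is a toy model, not kernel code: it assumes a striped device with a fixed
chunk size, and the names CHUNK_SECTORS, split_at_chunk_boundaries and
needs_split are hypothetical and do not come from [3] or [4]. A copy
range that crosses a chunk boundary has to be broken into per-chunk
pieces, which is the case in which a single copy request would fail.

```python
# Hypothetical model: not kernel code, names are illustrative only.
CHUNK_SECTORS = 128  # assumed stripe chunk size, in sectors


def split_at_chunk_boundaries(start, length, chunk=CHUNK_SECTORS):
    """Return (start, length) pieces, none of which crosses a chunk boundary."""
    pieces = []
    end = start + length
    while start < end:
        # Sectors left before the next chunk boundary.
        room = chunk - (start % chunk)
        step = min(room, end - start)
        pieces.append((start, step))
        start += step
    return pieces


def needs_split(start, length, chunk=CHUNK_SECTORS):
    """True if the range cannot be passed down as a single request."""
    return len(split_at_chunk_boundaries(start, length, chunk)) > 1
```

In this model a 100-sector copy starting at sector 100 becomes two
pieces, (100, 28) and (128, 72), so an implementation that cannot split
a copy request would have to reject it.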

* Current state of the work :-
-----------------------------------------------------------------------

With [3], it is hard to handle arbitrary DM/MD stacking without
splitting the command in two, one for copying IN and one for copying
OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
with [4] there is an unresolved problem with the two-command approach:
how to handle changes to the DM layout between the IN and OUT
operations.
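
The layout-change problem can be sketched with a small Python model. This
is only an illustration under stated assumptions: the StackedDevice
class, the generation counter and the token format are all hypothetical,
and rejecting a stale token is just one possible policy, not what [4]
implements.

```python
# Hypothetical model of a two-command (IN then OUT) copy; not kernel code.
class StackedDevice:
    """Toy stacked device whose layout can change between commands."""

    def __init__(self):
        self.generation = 0  # assumed: bumped on every layout change

    def reload_table(self):
        """Simulate a DM layout change."""
        self.generation += 1

    def copy_in(self, src_sector, nr_sectors):
        # The token records the layout generation seen at IN time.
        return {"src": src_sector, "len": nr_sectors, "gen": self.generation}

    def copy_out(self, token, dst_sector):
        # One possible policy: refuse a token issued under an older layout,
        # so the caller can fall back to a regular read/write copy.
        return token["gen"] == self.generation
```

If reload_table() runs between copy_in() and copy_out(), the OUT step in
this model returns False, because the source mapping the token refers to
may no longer be valid.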

* Why Linux Kernel Storage System needs Copy Offload support now ?
-----------------------------------------------------------------------

Several developments make this timely: the rise of the SNIA
Computational Storage TWG and solutions [2], existing SCSI XCopy support
in the protocol, recent advancement in the Linux Kernel File System for
Zoned devices (Zonefs [5]), and Peer to Peer DMA support in the Linux
Kernel, mainly for NVMe devices [7]. Eventually, NVMe devices and the
NVMe subsystem (NVMe PCIe/NVMeOF) will also benefit from the Copy
offload operation.

With this background we have a significant number of use-cases which are
strong candidates waiting for outstanding Linux Kernel Block Layer Copy
Offload support, so that the Linux Kernel Storage subsystem can address
the previously mentioned problems [1] and allow efficient offloading of
data related operations (such as move/copy etc.).

For reference, the following is the list of use-cases/candidates waiting
for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session ?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation ?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward ?
5. How can we help to contribute and move this work forward ?
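
As a point of reference for item 3, Linux already provides
copy_file_range(2) for file-to-file copies, and Python exposes it on
Linux as os.copy_file_range. The sketch below is only an illustration:
the helper name copy_range and the read/write fallback are assumptions,
and whether any part of the copy is actually offloaded depends on the
kernel and the filesystem.

```python
import os


def copy_range(src_path, dst_path, length):
    """Copy up to `length` bytes; return the number of bytes copied."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        while copied < length:
            want = length - copied
            try:
                # In-kernel copy; the filesystem may offload it further.
                n = os.copy_file_range(src, dst, want)
            except (AttributeError, OSError):
                # Assumed fallback: bounce the data through user space.
                data = os.read(src, min(want, 1 << 20))
                view = memoryview(data)
                while view:
                    view = view[os.write(dst, view):]
                n = len(data)
            if n == 0:
                break  # end of source file
            copied += n
    finally:
        os.close(src)
        os.close(dst)
    return copied
```

The loop is needed because copy_file_range may copy fewer bytes than
requested. A block-layer interface would have to cover the same ground
for raw block devices, which this file-level call does not address.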

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite block layer, device driver and file system
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Stephen Bates
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka
Matias Bjørling

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf

Regards,
Chaitanya


Thread overview: 62+ messages
2021-05-11  0:15 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
2021-05-11 21:15 ` Knight, Frederick
2021-05-12  2:21 ` Bart Van Assche
     [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
2021-05-12  7:13   ` Javier González
2021-05-12  7:30 ` Johannes Thumshirn
     [not found]   ` <CGME20210928191342eucas1p23448dcd51b23495fa67cdc017e77435c@eucas1p2.samsung.com>
2021-09-28 19:13     ` Javier González
2021-09-29  6:44       ` Johannes Thumshirn
2021-09-30  9:43       ` Chaitanya Kulkarni
2021-09-30  9:53         ` Javier González
2021-10-06 10:01         ` Javier González
2021-10-13  8:35           ` Javier González
2021-09-30 16:20       ` Bart Van Assche
2021-10-06 10:05         ` Javier González
2021-10-06 17:33           ` Bart Van Assche
2021-10-08  6:49             ` Javier González
2021-10-29  0:21               ` Chaitanya Kulkarni
2021-10-29  5:51                 ` Hannes Reinecke
2021-10-29  8:16                   ` Javier González
2021-10-29 16:15                   ` Bart Van Assche
2021-11-01 17:54                     ` Keith Busch
2021-10-29  8:14                 ` Javier González
2021-11-03 19:27                   ` Javier González
2021-11-16 13:43                     ` Javier González
2021-11-16 17:59                       ` Bart Van Assche
2021-11-17 12:53                         ` Javier González
2021-11-17 15:52                           ` Bart Van Assche
2021-11-19  7:38                             ` Javier González
2021-11-19 10:47                       ` Kanchan Joshi
2021-11-19 15:51                         ` Keith Busch
2021-11-19 16:21                         ` Bart Van Assche
2021-11-22  7:39                       ` Kanchan Joshi
2021-05-12 15:23 ` Hannes Reinecke
2021-05-12 15:45 ` Himanshu Madhani
2021-05-17 16:39 ` Kanchan Joshi
2021-05-18  0:15 ` Bart Van Assche
2021-06-11  6:03 ` Chaitanya Kulkarni
2021-06-11 15:35 ` Nikos Tsironis
     [not found] <CGME20220127071544uscas1p2f70f4d2509f3ebd574b7ed746d3fa551@uscas1p2.samsung.com>
2022-01-27  7:14 ` Chaitanya Kulkarni
2022-01-28 19:59   ` Adam Manzanares
2022-01-31 11:49     ` Johannes Thumshirn
2022-01-31 19:03   ` Bart Van Assche
2022-02-01  1:54   ` Luis Chamberlain
2022-02-01 10:21   ` Javier González
2022-02-07  9:57     ` Nitesh Shetty
2022-02-02  5:57   ` Kanchan Joshi
2022-02-07 10:45   ` David Disseldorp
2022-03-01 17:34   ` Nikos Tsironis
2022-03-01 21:32     ` Chaitanya Kulkarni
2022-03-03 18:36       ` Nikos Tsironis
2022-03-08 20:48       ` Nikos Tsironis
2022-03-09  8:51         ` Mikulas Patocka
2022-03-09 15:49           ` Nikos Tsironis
     [not found] <CGME20200107181551epcas5p4f47eeafd807c28a26b4024245c4e00ab@epcas5p4.samsung.com>
2020-01-07 18:14 ` Chaitanya Kulkarni
2020-01-08 10:17   ` Javier González
2020-01-09  0:51     ` Logan Gunthorpe
2020-01-09  3:18   ` Bart Van Assche
2020-01-09  4:01     ` Chaitanya Kulkarni
2020-01-09  5:56     ` Damien Le Moal
2020-01-10  5:33     ` Martin K. Petersen
2020-01-24 14:23   ` Nikos Tsironis
2020-02-13  5:11   ` joshi.k
2020-02-13 13:09     ` Knight, Frederick
