Linux-Block Archive on lore.kernel.org
* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2020-01-07 18:14 ` Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
                     ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-07 18:14 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

Hi all,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.

As noted in the RISC-V Summit keynote [1], single-threaded performance
is limited by the end of Dennard scaling, and multi-threaded scaling is
slowing down due to the limits of Moore's law. With the rise of the SNIA
Computational Storage Technical Working Group (TWG) [2], offloading
computation to the device or over the fabrics is becoming popular, and
several solutions are already available [2]. One commonly requested
operation that is not merged in the kernel yet is Copy offload, either
over the fabrics or onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work was done by Martin and is available at [3]. The
latest work, posted by Mikulas [4], is not merged yet either. These two
approaches are totally different from each other. Several storage
vendors discourage mixing copy offload requests with regular READ/WRITE
I/O. Also, the fact that the operation fails whenever a copy request
needs to be split as it traverses the stack has the unfortunate
side-effect of preventing copy offload from working in pretty much
every common deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

The approach in [3] is hard to extend to arbitrary DM/MD stacking
without splitting the command in two: one for copying IN and one for
copying OUT. This is what [4] demonstrates and why [3] is not a
suitable candidate. With [4], however, there is an unresolved problem
in the two-command approach: how to handle changes to the DM layout
between the IN and OUT operations.

* Why does the Linux Kernel Storage System need Copy Offload support now ?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and its solutions
[2], the existing XCOPY support in the SCSI protocol, the recent
addition of a Linux Kernel File System for Zoned devices (Zonefs [5]),
and Peer to Peer DMA support in the Linux Kernel, mainly for NVMe
devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF) will
eventually benefit from a Copy offload operation as well.

Against this background we have a significant number of use-cases which
are strong candidates for Linux Kernel Block Layer Copy Offload
support, so that the Linux Kernel Storage subsystem can address the
previously mentioned problems [1] and allow efficient offloading of
data operations such as move and copy.

For reference, the following use-cases/candidates are waiting for Copy
Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers (DM/MD) supporting XCOPY.
3. Computational Storage solutions.
4. File systems :- local, NFS and Zonefs.
5. Block devices :- distributed, local, and zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.

* What will we discuss in the proposed session ?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic and understand :-

1. What are the blockers for Copy Offload implementation ?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward ?
5. How can we contribute and help move this work forward ?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite block layer, device driver and file system
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Stephen Bates
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka
Matias Bjørling

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf

Regards,
Chaitanya


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2020-01-08 10:17   ` Javier González
  2020-01-09  0:51     ` Logan Gunthorpe
  2020-01-09  3:18   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Javier González @ 2020-01-08 10:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, Logan Gunthorpe, stephen

On 07.01.2020 18:14, Chaitanya Kulkarni wrote:
> [full quote of the original proposal snipped]

I think this is a good topic and I would like to participate in the
discussion too. I think that Logan Gunthorpe would also be interested
(Cc). Adding Kanchan too, who is also working on this and can contribute
to the discussion.

We discussed this at different SNIA events in the context of P2P and
computational offloads, and also as the backend implementation for
Simple Copy, which is coming in NVMe. Discussing this (again) at
LSF/MM and finding a way to finally get XCOPY merged would be great.

Thanks,
Javier





* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-08 10:17   ` Javier González
@ 2020-01-09  0:51     ` Logan Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Logan Gunthorpe @ 2020-01-09  0:51 UTC (permalink / raw)
  To: Javier González, Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, stephen



On 2020-01-08 3:17 a.m., Javier González wrote:
> I think this is a good topic and I would like to participate in the
> discussion too. I think that Logan Gunthorpe would also be interested
> (Cc). Adding Kanchan too, who is also working on this and can contribute
> to the discussion.
> 
> We discussed this at different SNIA events in the context of P2P and
> computational offloads, and also as the backend implementation for
> Simple Copy, which is coming in NVMe. Discussing this (again) at
> LSF/MM and finding a way to finally get XCOPY merged would be great.

Yes, I would definitely be interested in discussing copy offload
especially in the context of P2P. Sorting out a userspace interface for
this that supports a P2P use case would be very beneficial to a lot of
folks.

Thanks,

Logan


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
@ 2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
                       ` (2 more replies)
  2020-01-24 14:23   ` Nikos Tsironis
  2020-02-13  5:11   ` joshi.k
  3 siblings, 3 replies; 10+ messages in thread
From: Bart Van Assche @ 2020-01-09  3:18 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> The approach in [3] is hard to extend to arbitrary DM/MD stacking
> without splitting the command in two: one for copying IN and one for
> copying OUT. This is what [4] demonstrates and why [3] is not a
> suitable candidate. With [4], however, there is an unresolved problem
> in the two-command approach: how to handle changes to the DM layout
> between the IN and OUT operations.

Was this last discussed during the 2018 edition of LSF/MM (see also
https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
taken notes during that session? I haven't found a report of that
session in the official proceedings (https://lwn.net/Articles/752509/).

Thanks,

Bart.


This is my own collection of two-year-old notes about copy offloading
for the Linux kernel:

Potential Users
* All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
  dm-writecache and dm-zoned.
* Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
  and RAID, at least if RAID is supported by the filesystem. Note: the
  BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
  should use FICLONERANGE instead.
* Network filesystems, e.g. NFS. Copying at the server side can reduce
  network traffic significantly.
* Linux SCSI initiator systems connected to SAN systems such that
  copying can happen locally on the storage array. XCOPY is widely used
  for provisioning virtual machine images.
* Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.

Requirements
* The block layer must gain support for XCOPY. The new XCOPY API must
  support asynchronous operation such that users of this API are not
  blocked while the XCOPY operation is in progress.
* Copying must be supported not only within a single storage device but
  also between storage devices.
* The SCSI sd driver must gain support for XCOPY.
* A user space API must be added and that API must support asynchronous
  (non-blocking) operation.
* The block layer XCOPY primitive must be supported by the device mapper.

SCSI Extended Copy (ANSI T10 SPC)
The SCSI commands that support extended copy operations are:
* POPULATE TOKEN + WRITE USING TOKEN.
* EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
  List Identifier length of 1 byte and LID4 stands for a List Identifier
  length of 4 bytes.
* SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
  EXTENDED COPY(LID4) (83h/01h).

Existing Users and Implementations of SCSI XCOPY
* VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
* Microsoft, which uses ODX (aka LID4 because it has a four-byte length
  ID).
* Storage vendors all support XCOPY, but ODX support is growing.

Block Layer Notes
The block layer supports the following types of block drivers:
* blk-mq request-based drivers.
* make_request drivers.

Notes:
A list of bios is associated with each request.
Since submit_bio() only accepts a single bio and not a bio list, all
make_request block drivers process one bio at a time.

Device Mapper
The device mapper core supports bio processing and blk-mq requests. The
function in the device mapper that creates a request queue is called
alloc_dev(). That function not only allocates a request queue but also
associates a struct gendisk with the request queue. The
DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
causes the type of a dm device to be set to one of the following:
DM_TYPE_NONE;
DM_TYPE_BIO_BASED;
DM_TYPE_REQUEST_BASED;
DM_TYPE_MQ_REQUEST_BASED;
DM_TYPE_DAX_BIO_BASED;
DM_TYPE_NVME_BIO_BASED.

Device mapper drivers must implement target_type.map(),
target_type.clone_and_map_rq() or both. .map() maps a single bio.
.clone_and_map_rq() maps a single request. The multipath and error
device mapper drivers implement both methods. All other dm drivers only
implement the .map() method.
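
For reference, a minimal sketch of what a bio-based target's .map()
method looks like, loosely modeled on dm-linear. The target name and
context structure are illustrative assumptions, not an in-tree driver:

#include <linux/bio.h>
#include <linux/device-mapper.h>
#include <linux/module.h>

struct example_ctx {
	struct dm_dev *dev;	/* underlying device */
	sector_t start;		/* offset of the mapped area on ->dev */
};

/* Remap each bio to the underlying device and let the core resubmit it. */
static int example_map(struct dm_target *ti, struct bio *bio)
{
	struct example_ctx *ec = ti->private;

	bio_set_dev(bio, ec->dev->bdev);
	bio->bi_iter.bi_sector = ec->start +
		dm_target_offset(ti, bio->bi_iter.bi_sector);

	return DM_MAPIO_REMAPPED;
}

static struct target_type example_target = {
	.name    = "example",
	.version = {1, 0, 0},
	.module  = THIS_MODULE,
	.map     = example_map,
	/* .ctr()/.dtr(), which set up and tear down example_ctx, omitted */
};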

Device mapper bio processing
submit_bio()
-> generic_make_request()
  -> dm_make_request()
    -> __dm_make_request()
      -> __split_and_process_bio()
        -> __split_and_process_non_flush()
          -> __clone_and_map_data_bio()
          -> alloc_tio()
          -> clone_bio()
            -> bio_advance()
          -> __map_bio()

Existing Linux Copy Offload APIs
* The FICLONERANGE ioctl (a user-space usage sketch follows this list).
  From <include/linux/fs.h>:
  #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)

struct file_clone_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 src_length;
	__u64 dest_offset;
};

* The sendfile() system call. sendfile() copies a given number of bytes
  from one file to another. The output offset is the offset of the
  output file descriptor. The input offset is either the input file
  descriptor offset or can be specified explicitly. The sendfile()
  prototype is as follows:
  ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
  ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
* The copy_file_range() system call. See also vfs_copy_file_range(). Its
  prototype is as follows:
  ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
     loff_t *off_out, size_t len, unsigned int flags);
* The splice() system call is not appropriate for adding extended copy
  functionality since it copies data from or to a pipe. Its prototype is
  as follows:
  long splice(struct file *in, loff_t *off_in, struct file *out,
    loff_t *off_out, size_t len, unsigned int flags);
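
A minimal user-space sketch of the FICLONERANGE path listed above. The
file names and range length are placeholder assumptions, and the call
only succeeds on filesystems that implement clone/reflink (e.g. XFS,
Btrfs):

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

int main(void)
{
	struct file_clone_range fcr;
	int src_fd = open("src.img", O_RDONLY);
	int dst_fd = open("dst.img", O_WRONLY | O_CREAT, 0644);

	if (src_fd < 0 || dst_fd < 0) {
		perror("open");
		return 1;
	}

	memset(&fcr, 0, sizeof(fcr));
	fcr.src_fd = src_fd;		/* file to clone from */
	fcr.src_offset = 0;		/* byte offset in the source */
	fcr.src_length = 1024 * 1024;	/* bytes to clone; 0 means up to EOF */
	fcr.dest_offset = 0;		/* byte offset in the destination */

	/* The ioctl is issued on the destination descriptor. */
	if (ioctl(dst_fd, FICLONERANGE, &fcr) < 0) {
		perror("ioctl(FICLONERANGE)");
		return 1;
	}
	return 0;
}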

Existing Linux Block Layer Copy Offload Implementations
* Martin Petersen's REQ_COPY bio, where source and destination block
  device are both specified in the same bio. Only works for block
  devices. Does not work for files. Adds a new blocking ioctl() for
  XCOPY from user space.
* Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
  REQ_OP_COPY_READ operations. These are sent individually down stacked
  drivers and are paired by the driver at the bottom of the stack.



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
@ 2020-01-09  4:01     ` Chaitanya Kulkarni
  2020-01-09  5:56     ` Damien Le Moal
  2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-09  4:01 UTC (permalink / raw)
  To: Bart Van Assche, linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).
>
> Thanks,
>
> Bart.
>

Thanks for sharing this, Bart; this is very helpful.

I haven't found any notes on LWN for the session that was held in 2018.




* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
@ 2020-01-09  5:56     ` Damien Le Moal
  2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2020-01-09  5:56 UTC (permalink / raw)
  To: Bart Van Assche, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> The approach in [3] is hard to extend to arbitrary DM/MD stacking
>> without splitting the command in two: one for copying IN and one for
>> copying OUT. This is what [4] demonstrates and why [3] is not a
>> suitable candidate. With [4], however, there is an unresolved problem
>> in the two-command approach: how to handle changes to the DM layout
>> between the IN and OUT operations.
> 
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).

Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.

> [remainder of Bart's message snipped]

-- 
Damien Le Moal
Western Digital Research


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18   ` Bart Van Assche
  2020-01-09  4:01     ` Chaitanya Kulkarni
  2020-01-09  5:56     ` Damien Le Moal
@ 2020-01-10  5:33     ` Martin K. Petersen
  2 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2020-01-10  5:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, hare,
	Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland,
	rwheeler, frederick.knight,
	Matias Bjorling


Bart,

> * Copying must be supported not only within a single storage device but
>   also between storage devices.

Identifying which devices to permit copies between has been challenging.
That has since been addressed in T10.

> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).

I don't think LID1 vs LID4 is particularly interesting for the Linux use
case. It's just an additional command tag since the copy manager is a
third party.

> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).

Microsoft uses the token commands.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
  2020-01-08 10:17   ` Javier González
  2020-01-09  3:18   ` Bart Van Assche
@ 2020-01-24 14:23   ` Nikos Tsironis
  2020-02-13  5:11   ` joshi.k
  3 siblings, 0 replies; 10+ messages in thread
From: Nikos Tsironis @ 2020-01-24 14:23 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

On 1/7/20 8:14 PM, Chaitanya Kulkarni wrote:
> [full quote of the original proposal snipped]

This is a very interesting topic and I would like to participate in the
discussion too.

The dm-clone target would also benefit from copy offload, as it heavily
employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
achieve increased IOPS in dm-clone and dm-snapshot for small copies over
NVMe devices, but copy offload sounds even more promising, especially
for larger copies happening in the background (as is the case with
dm-clone's background hydration).

Thanks,
Nikos


* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                     ` (2 preceding siblings ...)
  2020-01-24 14:23   ` Nikos Tsironis
@ 2020-02-13  5:11   ` joshi.k
  2020-02-13 13:09     ` Knight, Frederick
  3 siblings, 1 reply; 10+ messages in thread
From: joshi.k @ 2020-02-13  5:11 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, Martin K. Petersen, Matias Bjorling,
	Stephen Bates, roland, joshi.k, mpatocka, hare, Keith Busch,
	rwheeler, Christoph Hellwig, frederick.knight, zach.brown,
	joshi.k, javier

I am very keen on this topic.
I've been doing some work on "NVMe simple copy", and would like to discuss
and solicit the community's opinion on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copying within a single
namespace. Some of the problems that the original XCOPY work [2] faced may
not be applicable to simple-copy, e.g. splitting a single copy due to
differing device-specific limits.
I hope I'm not missing something in thinking so?

- [Block I/O] Async interface (through io-uring or AIO) so that multiple
copy operations can be queued.

- [File I/O to user-space] I think it may make sense to extend the
copy_file_range API to do in-device copy as well (a sketch of the existing
call follows this list).

- [F2FS] GC in F2FS may leverage the interface. Currently it uses the
page-cache, which is fair. But for relatively cold/warm data (if it needs
to be garbage-collected anyway), it could instead bypass the host and
avoid the scenario where something (useful) gets evicted from the cache.

- [ZNS] ZNS users (kernel or user-space) would be log-structured, and will
benefit from internal copy. But failure scenarios (partial copy,
write-pointer position) need to be discussed.
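
As a point of reference for the copy_file_range item above, a minimal
user-space sketch of the existing call that an in-device copy could plug
in behind. The file names and length are placeholder assumptions:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int in = open("src.img", O_RDONLY);
	int out = open("dst.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	off_t len = 1024 * 1024;	/* bytes still to copy */

	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}

	while (len > 0) {
		/* NULL offsets: use and advance both file offsets. */
		ssize_t ret = copy_file_range(in, NULL, out, NULL, len, 0);

		if (ret < 0) {
			perror("copy_file_range");
			return 1;
		}
		if (ret == 0)		/* hit EOF on the source */
			break;
		len -= ret;
	}
	return 0;
}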

Thanks,
Kanchan

> [full quote of the original proposal snipped]




* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-02-13  5:11   ` joshi.k
@ 2020-02-13 13:09     ` Knight, Frederick
  0 siblings, 0 replies; 10+ messages in thread
From: Knight, Frederick @ 2020-02-13 13:09 UTC (permalink / raw)
  To: joshi.k, Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, Martin K. Petersen, Matias Bjorling,
	Stephen Bates, roland, mpatocka, hare, Keith Busch, rwheeler,
	Christoph Hellwig, zach.brown, javier

FWIW - the design of NVMe Simple Copy specifically included versioning of the data structure that describes what to copy.

The reason for that was random people's desire to complexify the Simple Copy command.  Specifically, there was room designed into the data structure to accommodate a source NSID (to allow cross-namespace copy - the intention being namespaces attached to the same controller); and room to accommodate the KPIO key tag value for each source range.  Other people thought they could use this data structure versioning to design a fully SCSI XCOPY compatible data structure.

My point is just to consider the flexibility and extensibility of the OS interfaces when thinking about "Simple Copy".

I'm just not sure how SIMPLY it will remain.
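
To make the extensibility point concrete, here is a purely illustrative
sketch (not the NVMe specification layout) of how a versioned per-range
descriptor could leave room for such extensions; all names and fields
are assumptions for illustration:

#include <stdint.h>

struct copy_range_desc_v0 {	/* hypothetical descriptor format 0 */
	uint64_t slba;		/* starting LBA of the source range */
	uint16_t nlb;		/* number of logical blocks to copy */
	uint8_t  rsvd[6];	/* reserved for future extension */
};

struct copy_range_desc_v1 {	/* hypothetical descriptor format 1 */
	uint64_t slba;
	uint16_t nlb;
	uint32_t src_nsid;	/* cross-namespace copy extension */
	uint16_t key_tag;	/* e.g. a KPIO-style key tag */
};

struct copy_request {
	uint8_t desc_format;	/* selects which descriptor version follows */
	uint8_t nr_ranges;	/* number of descriptors that follow */
	/* descriptors in the format indicated by desc_format */
};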

	Fred 


-----Original Message-----
[full quote of the previous message snipped]




