linux-nvme.lists.infradead.org archive mirror
From: "Knight, Frederick" <Frederick.Knight@netapp.com>
To: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	"msnitzer@redhat.com" <msnitzer@redhat.com>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"roland@purestorage.com" <roland@purestorage.com>,
	"mpatocka@redhat.com" <mpatocka@redhat.com>,
	Hannes Reinecke <hare@suse.de>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	"rwheeler@redhat.com" <rwheeler@redhat.com>,
	"hch@lst.de" <hch@lst.de>,
	"zach.brown@ni.com" <zach.brown@ni.com>,
	"osandov@fb.com" <osandov@fb.com>
Subject: RE: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
Date: Tue, 11 May 2021 21:15:48 +0000	[thread overview]
Message-ID: <DM5PR0601MB3624FF16F91F5F03BB5F0CC1F1539@DM5PR0601MB3624.namprd06.prod.outlook.com> (raw)
In-Reply-To: <BYAPR04MB49652C4B75E38F3716F3C06386539@BYAPR04MB4965.namprd04.prod.outlook.com>

I'd love to participate in this discussion.

You mention the 2 different models (single command vs. multi-command).  Just as a reminder, there are specific reasons for those 2 different models.

Some applications know both the source and the destination, so can use the single command model (the application is aware it is doing a copy).  But, there is a group of applications that do NOT know both pieces of information at the same time, in the same thread, in the same context (the application is NOT aware it is doing a copy - the application thinks it is doing reads and writes).

That is why there are 2 different models - because the application engineers didn't want to change their application.  So, the author of the CP application (the shell copy command) wanted to use the existing READ / WRITE model (2 commands).  Just replace the READ with "get the data ready" and replace the WRITE with "use the data you got ready".  It was easier for that application to use the existing model, rather than totally redesigning the application.
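
To make that flow concrete, here is a rough sketch of the two-command model; populate_token()/write_using_token() are purely hypothetical stand-ins (loosely inspired by the SCSI token-based pair POPULATE TOKEN / WRITE USING TOKEN), not an existing kernel or library API:

/*
 * Illustrative only: the two-command ("prepare then consume") copy flow.
 * The helper names are hypothetical, not a real API.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct copy_token {
	uint8_t opaque[512];	/* device-generated representation of the data */
};

/* The READ-like half: "get the data ready", returns a token. */
static int populate_token(int src_fd, uint64_t lba, uint32_t nblocks,
			  struct copy_token *tok)
{
	/* A real implementation would issue the prepare command here. */
	memset(tok, 0, sizeof(*tok));
	printf("prepare: src_fd=%d lba=%llu blocks=%u\n",
	       src_fd, (unsigned long long)lba, nblocks);
	return 0;
}

/* The WRITE-like half: "use the data you got ready". */
static int write_using_token(int dst_fd, uint64_t lba,
			     const struct copy_token *tok)
{
	/* A real implementation would issue the consume command here. */
	(void)tok;
	printf("consume: dst_fd=%d lba=%llu\n",
	       dst_fd, (unsigned long long)lba);
	return 0;
}

int main(void)
{
	struct copy_token tok;

	/* The application still thinks it is "reading"... */
	if (populate_token(3, 0, 2048, &tok))
		return 1;

	/* ...and, possibly much later and in a different context, "writing". */
	return write_using_token(4, 4096, &tok);
}

The only point of the sketch is that the prepare and consume halves can run in different threads or contexts, which is exactly what the read/write-style applications need.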

But, other application engineers had a code base that already knew a copy was happening, and their code already knew both the source and destination in the same code path. A BACKUP application generally fits into this camp.  So, it was easier for that application to replace that function with a single copy request.  Another example was a VM mastering/replicating application that could spin up new VM images very quickly - there, the source and destination are both known, so a single request can be used.

When this offload journey began, both interfaces were needed and used.  But yes, it did bifurcate the space, creating 2 camps of engineers - each with their favorite method (based on the application where they planned to use it).  Each camp of engineers often sees no reason that the other camp can't just switch to do it the way they do - if they'd only see the light.  But, originally, there were 2 different sets of requirements that each drove a specific design of a copy offload model.

Even NVMe has recently joined the copy offload camp with a new COPY command (single namespace, multiple source ranges, single destination range - works well for defrag, and other use cases). I'm confident its capabilities will grow over time.
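
As a concrete (and unverified) sketch of what driving that NVMe Copy command from user space could look like via the Linux passthrough ioctl - this assumes opcode 0x19 for Copy, source-range descriptor format 0 with the source LBA at byte 8 and a 0-based block count at byte 16, the destination LBA in CDW10/11, the 0-based range count in CDW12, and a little-endian host; please verify the field layout against the current spec before relying on it:

/* Minimal sketch: copy one source range with the NVMe Copy command. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	int fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0) { perror("open"); return 1; }

	/* Copy source range descriptor, format 0 (32 bytes):
	 * bytes 8..15 = starting source LBA, bytes 16..17 = 0-based block count.
	 * Little-endian host assumed for the memcpy()s below. */
	uint8_t desc[32] = { 0 };
	uint64_t slba = 0;		/* copy from LBA 0 ...            */
	uint16_t nlb0 = 7;		/* ... 8 blocks (0-based count)   */
	memcpy(&desc[8], &slba, sizeof(slba));
	memcpy(&desc[16], &nlb0, sizeof(nlb0));

	uint64_t sdlba = 1024;		/* ... to destination LBA 1024    */

	struct nvme_passthru_cmd cmd = { 0 };
	cmd.opcode   = 0x19;			/* Copy */
	cmd.nsid     = 1;			/* namespace 1, adjust as needed */
	cmd.addr     = (uintptr_t)desc;		/* source range descriptor list */
	cmd.data_len = sizeof(desc);
	cmd.cdw10    = (uint32_t)(sdlba & 0xffffffff);
	cmd.cdw11    = (uint32_t)(sdlba >> 32);
	cmd.cdw12    = 0;			/* 1 range (0-based), descriptor format 0 */

	if (ioctl(fd, NVME_IOCTL_IO_CMD, &cmd) < 0)
		perror("NVME_IOCTL_IO_CMD");

	close(fd);
	return 0;
}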

So, I think this will be a great discussion to have!!!

	Fred Knight



-----Original Message-----
From: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> 
Sent: Monday, May 10, 2021 8:16 PM
To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-foundation.org
Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; martin.petersen@oracle.com; roland@purestorage.com; mpatocka@redhat.com; Hannes Reinecke <hare@suse.de>; kbusch@kernel.org; rwheeler@redhat.com; hch@lst.de; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; osandov@fb.com
Subject: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file systems or storage devices to be instructed to copy files/logical blocks without requiring the involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded performance is limited due to the end of Dennard scaling, and multi-threaded performance is slowing down due to the limitations of Moore's law. With the rise of the SNIA Computational Storage Technical Work Group (TWG) [2], offloading computation to the device or over the fabrics is becoming popular, and several solutions are already available [2]. One common operation which is frequently needed in the kernel but is not merged yet is copy offload, either over the fabrics or onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work done by Martin is available here [3]. The latest work, posted by Mikulas [4], is not merged yet. These two approaches are totally different from each other. Several storage vendors discourage mixing copy offload requests with regular READ/WRITE I/O. Also, the fact that the operation fails if a copy request ever needs to be split as it traverses the stack has the unfortunate side-effect of preventing copy offload from working in pretty much every common deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

With [3] it is hard to handle arbitrary DM/MD stacking without splitting the command in two, one for copying IN and one for copying OUT, which is why [4] demonstrates that [3] is not a suitable candidate. Also, with [4] there is an unresolved problem with the two-command approach: how to handle changes to the DM layout between the IN and OUT operations.

* Why does the Linux Kernel Storage System need Copy Offload support now?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and its solutions [2], the existing SCSI XCOPY support in the protocol, the recent advancements in Linux kernel file system support for zoned devices (zonefs [5]), and Peer-to-Peer DMA support in the Linux kernel (mainly for NVMe devices) [7], NVMe devices and the NVMe subsystem (NVMe PCIe/NVMeOF) will eventually benefit from the copy offload operation.

With this background, we have a significant number of use-cases which are strong candidates waiting for outstanding Linux kernel block layer Copy Offload support, so that the Linux kernel storage subsystem can address the previously mentioned problems [1] and allow efficient offloading of data-related operations (such as move/copy, etc.).

For reference, the following is the list of use-cases/candidates waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.

* What will we discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space (the
   copy_file_range(2) example after this list shows the closest existing interface).
4. What is the right way to move this work forward?
5. How can we help to contribute and move this work forward?
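
For point 3 above, the closest existing user-space interface today is probably copy_file_range(2), which already allows in-kernel copying (and server-side copy over NFSv4.2) even though block-layer copy offload is not plumbed in behind it; any new system call will likely be compared against it. A minimal usage example:

/* Copy the contents of one file to another with copy_file_range(2). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}

	int in = open(argv[1], O_RDONLY);
	int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) { perror("open"); return 1; }

	struct stat st;
	if (fstat(in, &st) < 0) { perror("fstat"); return 1; }

	off_t left = st.st_size;
	while (left > 0) {
		/* NULL offsets: the kernel uses and advances the file offsets. */
		ssize_t n = copy_file_range(in, NULL, out, NULL, left, 0);
		if (n < 0) { perror("copy_file_range"); return 1; }
		if (n == 0) break;	/* source ended early */
		left -= n;
	}
	return 0;
}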

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

Regards,
Chaitanya

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf


Thread overview: 63+ messages
2021-05-11  0:15 [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload Chaitanya Kulkarni
2021-05-11 21:15 ` Knight, Frederick [this message]
2021-05-12  2:21 ` Bart Van Assche
     [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
2021-05-12  7:13   ` Javier González
2021-05-12  7:30 ` Johannes Thumshirn
     [not found]   ` <CGME20210928191342eucas1p23448dcd51b23495fa67cdc017e77435c@eucas1p2.samsung.com>
2021-09-28 19:13     ` Javier González
2021-09-29  6:44       ` Johannes Thumshirn
2021-09-30  9:43       ` Chaitanya Kulkarni
2021-09-30  9:53         ` Javier González
2021-10-06 10:01         ` Javier González
2021-10-13  8:35           ` Javier González
2021-09-30 16:20       ` Bart Van Assche
2021-10-06 10:05         ` Javier González
2021-10-06 17:33           ` Bart Van Assche
2021-10-08  6:49             ` Javier González
2021-10-29  0:21               ` Chaitanya Kulkarni
2021-10-29  5:51                 ` Hannes Reinecke
2021-10-29  8:16                   ` Javier González
2021-10-29 16:15                   ` Bart Van Assche
2021-11-01 17:54                     ` Keith Busch
2021-10-29  8:14                 ` Javier González
2021-11-03 19:27                   ` Javier González
2021-11-16 13:43                     ` Javier González
2021-11-16 17:59                       ` Bart Van Assche
2021-11-17 12:53                         ` Javier González
2021-11-17 15:52                           ` Bart Van Assche
2021-11-19  7:38                             ` Javier González
2021-11-19 10:47                       ` Kanchan Joshi
2021-11-19 15:51                         ` Keith Busch
2021-11-19 16:21                         ` Bart Van Assche
2021-11-22  7:39                       ` Kanchan Joshi
2021-05-12 15:23 ` Hannes Reinecke
2021-05-12 15:45 ` Himanshu Madhani
2021-05-17 16:39 ` Kanchan Joshi
2021-05-18  0:15 ` Bart Van Assche
2021-06-11  6:03 ` Chaitanya Kulkarni
2021-06-11 15:35 ` Nikos Tsironis
     [not found] <CGME20230113094723epcas5p2f6f81ca1ad85f4b26829b87f8ec301ce@epcas5p2.samsung.com>
2023-01-13  9:46 ` Nitesh Shetty
     [not found] <CGME20220127071544uscas1p2f70f4d2509f3ebd574b7ed746d3fa551@uscas1p2.samsung.com>
2022-01-27  7:14 ` Chaitanya Kulkarni
2022-01-28 19:59   ` Adam Manzanares
2022-01-31 11:49     ` Johannes Thumshirn
2022-01-31 19:03   ` Bart Van Assche
2022-02-01  1:54   ` Luis Chamberlain
2022-02-01 10:21   ` Javier González
2022-02-07  9:57     ` Nitesh Shetty
2022-02-02  5:57   ` Kanchan Joshi
2022-02-07 10:45   ` David Disseldorp
2022-03-01 17:34   ` Nikos Tsironis
2022-03-01 21:32     ` Chaitanya Kulkarni
2022-03-03 18:36       ` Nikos Tsironis
2022-03-08 20:48       ` Nikos Tsironis
2022-03-09  8:51         ` Mikulas Patocka
2022-03-09 15:49           ` Nikos Tsironis
     [not found] <CGME20200107181551epcas5p4f47eeafd807c28a26b4024245c4e00ab@epcas5p4.samsung.com>
2020-01-07 18:14 ` Chaitanya Kulkarni
2020-01-08 10:17   ` Javier González
2020-01-09  0:51     ` Logan Gunthorpe
2020-01-09  3:18   ` Bart Van Assche
2020-01-09  4:01     ` Chaitanya Kulkarni
2020-01-09  5:56     ` Damien Le Moal
2020-01-10  5:33     ` Martin K. Petersen
2020-01-24 14:23   ` Nikos Tsironis
2020-02-13  5:11   ` joshi.k
2020-02-13 13:09     ` Knight, Frederick
