From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Bart Van Assche <bvanassche@acm.org>,
Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
"hare@suse.de" <hare@suse.de>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
Stephen Bates <sbates@raithlin.com>,
"msnitzer@redhat.com" <msnitzer@redhat.com>,
"mpatocka@redhat.com" <mpatocka@redhat.com>,
"zach.brown@ni.com" <zach.brown@ni.com>,
"roland@purestorage.com" <roland@purestorage.com>,
"rwheeler@redhat.com" <rwheeler@redhat.com>,
"frederick.knight@netapp.com" <frederick.knight@netapp.com>,
Matias Bjorling <Matias.Bjorling@wdc.com>
Subject: Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
Date: Thu, 9 Jan 2020 05:56:07 +0000 [thread overview]
Message-ID: <BYAPR04MB581697B0367321CBAD04F9C4E7390@BYAPR04MB5816.namprd04.prod.outlook.com> (raw)
In-Reply-To: <fda88fd3-2d75-085e-ca15-a29f89c1e781@acm.org>
On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> With [3] it is hard to handle arbitrary DM/MD stacking without
>> splitting the command in two, one for copying IN and one for copying
>> OUT; [4] demonstrates why [3] is not a suitable candidate. Also, with
>> [4] there is an unresolved problem with the two-command approach: how
>> to handle changes to the DM layout between the IN and OUT operations.
>
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).
Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.
>
> Thanks,
>
> Bart.
>
>
> This is my own collection of two-year-old notes about copy offloading
> for the Linux kernel:
>
> Potential Users
> * All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
> dm-writecache and dm-zoned.
> * Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
> and RAID, at least if RAID is supported by the filesystem. Note: the
> BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
> should use FICLONERANGE instead.
> * Network filesystems, e.g. NFS. Copying at the server side can reduce
> network traffic significantly.
> * Linux SCSI initiator systems connected to SAN systems such that
> copying can happen locally on the storage array. XCOPY is widely used
> for provisioning virtual machine images.
> * Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.
>
> Requirements
> * The block layer must gain support for XCOPY. The new XCOPY API must
> support asynchronous operation such that users of this API are not
> blocked while the XCOPY operation is in progress.
> * Copying must be supported not only within a single storage device but
> also between storage devices.
> * The SCSI sd driver must gain support for XCOPY.
> * A user space API must be added and that API must support asynchronous
> (non-blocking) operation.
> * The block layer XCOPY primitive must be supported by the device mapper.
>
> SCSI Extended Copy (ANSI T10 SPC)
> The SCSI commands that support extended copy operations are:
> * POPULATE TOKEN + WRITE USING TOKEN.
> * EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
> List Identifier length of 1 byte and LID4 stands for a List Identifier
> length of 4 bytes.
> * SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
> EXTENDED COPY(LID4) (83h/01h).
>
> Existing Users and Implementations of SCSI XCOPY
> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
> ID).
> * Storage vendors all support XCOPY; ODX support is growing as well.
>
> Block Layer Notes
> The block layer supports the following types of block drivers:
> * blk-mq request-based drivers.
> * make_request drivers.
>
> Notes:
> A list of bios is associated with each request.
> Since submit_bio() only accepts a single bio and not a bio list, all
> make_request block drivers process one bio at a time.
>
> Device Mapper
> The device mapper core supports bio processing and blk-mq requests. The
> function in the device mapper that creates a request queue is called
> alloc_dev(). That function not only allocates a request queue but also
> associates a struct gendisk with the request queue. The
> DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
> DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
> causes the type of a dm device to be set to one of the following:
> DM_TYPE_NONE;
> DM_TYPE_BIO_BASED;
> DM_TYPE_REQUEST_BASED;
> DM_TYPE_MQ_REQUEST_BASED;
> DM_TYPE_DAX_BIO_BASED;
> DM_TYPE_NVME_BIO_BASED.
>
> Device mapper drivers must implement target_type.map(),
> target_type.clone_and_map_rq() or both. .map() maps a bio list.
> .clone_and_map_rq() maps a single request. The multipath and error
> device mapper drivers implement both methods. All other dm drivers only
> implement the .map() method.
>
> Device mapper bio processing
> submit_bio()
> -> generic_make_request()
> -> dm_make_request()
> -> __dm_make_request()
> -> __split_and_process_bio()
> -> __split_and_process_non_flush()
> -> __clone_and_map_data_bio()
> -> alloc_tio()
> -> clone_bio()
> -> bio_advance()
> -> __map_bio()
>
> Existing Linux Copy Offload APIs
> * The FICLONERANGE ioctl. From <include/linux/fs.h>:
> #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
>
> struct file_clone_range {
>         __s64 src_fd;
>         __u64 src_offset;
>         __u64 src_length;
>         __u64 dest_offset;
> };
>
> * The sendfile() system call. sendfile() copies a given number of bytes
> from one file to another. The output offset is the offset of the
> output file descriptor. The input offset is either the input file
> descriptor offset or can be specified explicitly. The sendfile()
> prototype is as follows:
> ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
> ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
> * The copy_file_range() system call. See also vfs_copy_file_range(). Its
> prototype is as follows:
> ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
> loff_t *off_out, size_t len, unsigned int flags);
> * The splice() system call is not appropriate for adding extended copy
> functionality since it copies data from or to a pipe. Its prototype is
> as follows:
> long splice(struct file *in, loff_t *off_in, struct file *out,
> loff_t *off_out, size_t len, unsigned int flags);
>
> Existing Linux Block Layer Copy Offload Implementations
> * Martin Petersen's REQ_COPY bio, where source and destination block
> device are both specified in the same bio. Only works for block
> devices. Does not work for files. Adds a new blocking ioctl() for
> XCOPY from user space.
> * Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
> REQ_OP_COPY_READ operations. These are sent individually down stacked
> drivers and are paired by the driver at the bottom of the stack.
>
>
--
Damien Le Moal
Western Digital Research