* [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
@ 2022-01-27  7:14 ` Chaitanya Kulkarni
  2022-01-28 19:59     ` [dm-devel] " Adam Manzanares
                     ` (6 more replies)
  0 siblings, 7 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-01-27  7:14 UTC (permalink / raw)
  To: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, Mike Snitzer, Bart Van Assche,
	Martin K. Petersen, roland, mpatocka, Hannes Reinecke,
	Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file systems or storage devices
to be instructed to copy files/logical blocks without involving the
local CPU.
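
For context, the file-level form of this idea is already expressible in
user-space via the existing copy_file_range(2) system call, which asks
the kernel to perform the copy (and, where supported, offload it)
instead of bouncing data through user-space buffers. A minimal sketch
of a caller; the file names are placeholders:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int in = open("src.dat", O_RDONLY);	/* placeholder source */
	int out = open("dst.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	off_t len;

	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}
	len = lseek(in, 0, SEEK_END);	/* bytes left to copy */
	lseek(in, 0, SEEK_SET);

	/* The kernel may copy fewer bytes than requested; loop until done. */
	while (len > 0) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, len, 0);
		if (n <= 0) {
			perror("copy_file_range");
			return 1;
		}
		len -= n;
	}
	close(in);
	close(out);
	return 0;
}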

With reference to the RISC-V summit keynote [1], single-threaded
performance gains are limited by the end of Dennard scaling, and
multi-threaded performance gains are slowing down due to the limits of
Moore's law. With the rise of the SNIA Computational Storage Technical
Work Group (TWG) [2], offloading computations to the device or over the
fabrics is becoming popular, and several solutions are already
available [2]. One common operation that is frequently requested in the
kernel but is not merged yet is copy offload to the device or over the
fabrics.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is available here [3]. The latest
work, posted by Mikulas [4], is not merged yet. These two approaches
are entirely different from each other. Several storage vendors
discourage mixing copy offload requests with regular READ/WRITE I/O.
Also, the fact that the operation fails if a copy request ever needs to
be split as it traverses the stack has the unfortunate side effect of
preventing copy offload from working in pretty much every common
deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

[3] turned out to be hard to extend to arbitrary DM/MD stacking without
splitting the command in two, one for copying IN and one for copying
OUT; [4] demonstrates why [3] is therefore not a suitable candidate.
[4], in turn, has an unresolved problem with the two-command approach:
how to handle changes to the DM layout between the IN and OUT
operations.

Since LSF/MM did not take place last year, we conducted a call with
interested people late last year, and we would like to share the
details with the broader community.

* Why does the Linux Kernel Storage System need Copy Offload support now?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and its solutions
[2], the existing SCSI XCOPY support in the protocol, recent
advancements in Linux kernel file systems for zoned devices (zonefs
[5]), and peer-to-peer DMA support in the Linux kernel, mainly for NVMe
devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF) will
eventually benefit from a copy offload operation.

With this background we have a significant number of use-cases that are
strong candidates for the outstanding Linux kernel block layer copy
offload support, so that the Linux kernel storage subsystem can address
the previously mentioned problems [1] and allow efficient offloading of
data operations such as move and copy.

For reference, the following is the list of use-cases/candidates
waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- local, NFS, and zonefs.
5. Block devices :- distributed, local, and zoned devices.
6. Peer-to-peer DMA support solutions.
7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for a copy offload implementation?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward?
5. How can we help to contribute and move this work forward?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver
developers to :-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

-ck

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-01-28 19:59     ` Adam Manzanares
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-01-28 19:59 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, Mike Snitzer, Bart Van Assche,
	Martin K. Petersen, roland, mpatocka, Hannes Reinecke,
	Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On Thu, Jan 27, 2022 at 07:14:13AM +0000, Chaitanya Kulkarni wrote:
> [...]
> Since LSF/MM did not take place last year, we conducted a call with
> interested people late last year, and we would like to share the
> details with the broader community.

Was on that call and I am interested in joining this discussion.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
  2022-01-28 19:59     ` [dm-devel] " Adam Manzanares
@ 2022-01-31 11:49       ` Johannes Thumshirn
  -1 siblings, 0 replies; 272+ messages in thread
From: Johannes Thumshirn @ 2022-01-31 11:49 UTC (permalink / raw)
  To: chaitanyak, a.manzanares
  Cc: clm, josef, lsf-pc, linux-nvme, kbusch, osandov, bvanassche,
	dm-devel, linux-scsi, mpatocka, djwong, dsterba, msnitzer,
	Frederick.Knight, hch, hare, roland, tytso, axboe, linux-block,
	zach.brown, linux-fsdevel, jack, martin.petersen

On Fri, 2022-01-28 at 19:59 +0000, Adam Manzanares wrote:
> On Thu, Jan 27, 2022 at 07:14:13AM +0000, Chaitanya Kulkarni wrote:
> > 
> > * Current state of the work :-
> > -----------------------------------------------------------------------
> > 
> > [3] turned out to be hard to extend to arbitrary DM/MD stacking
> > without splitting the command in two, one for copying IN and one for
> > copying OUT; [4] demonstrates why [3] is therefore not a suitable
> > candidate. [4], in turn, has an unresolved problem with the
> > two-command approach: how to handle changes to the DM layout between
> > the IN and OUT operations.
> > 
> > Since LSF/MM did not take place last year, we conducted a call with
> > interested people late last year, and we would like to share the
> > details with the broader community.
> 
> Was on that call and I am interested in joining this discussion.

Same for me :)




^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-01-31 19:03     ` Bart Van Assche
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2022-01-31 19:03 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	Mike Snitzer, Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 1/26/22 23:14, Chaitanya Kulkarni wrote:
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf

Please consider adding the following link to the above list:
https://github.com/bvanassche/linux-kernel-copy-offload

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-02-01  1:54     ` Luis Chamberlain
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: Luis Chamberlain @ 2022-02-01  1:54 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, Mike Snitzer, Bart Van Assche,
	Martin K. Petersen, roland, mpatocka, Hannes Reinecke,
	Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

> * What we will discuss in the proposed session?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for a copy offload implementation?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward?
> 5. How can we help to contribute and move this work forward?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device driver
> developers to :-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.

Consider me interested in this topic.

  Luis

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BPF ATTEND] [LSF/MM/BPF TOPIC] Storage: Copy Offload Chaitanya Kulkarni
                     ` (2 preceding siblings ...)
  2022-02-01  1:54     ` [dm-devel] " Luis Chamberlain
@ 2022-02-01 10:21   ` Javier González
  2022-02-01 18:31       ` [dm-devel] " Mikulas Patocka
  2022-02-07  9:57       ` [dm-devel] " Nitesh Shetty
  2022-02-02  5:57     ` [dm-devel] " Kanchan Joshi
                     ` (2 subsequent siblings)
  6 siblings, 2 replies; 272+ messages in thread
From: Javier González @ 2022-02-01 10:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, Mike Snitzer, Bart Van Assche,
	Martin K. Petersen, roland, mpatocka, Hannes Reinecke,
	Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On 27.01.2022 07:14, Chaitanya Kulkarni wrote:
>[...]
>Since LSF/MM did not take place last year, we conducted a call with
>interested people late last year, and we would like to share the
>details with the broader community.

Chaitanya,

I would also like to join the F2F conversation as a follow-up to the
virtual one last year. We will have a first version of the patches
posted in the next few weeks. This will hopefully serve as a good first
step.

Adding Kanchan to the thread too.

Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* [RFC PATCH 0/3] NVMe copy offload patches
  2022-02-01 10:21   ` Javier González
@ 2022-02-01 18:31       ` Mikulas Patocka
  2022-02-07  9:57       ` [dm-devel] " Nitesh Shetty
  1 sibling, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 18:31 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	Mike Snitzer, Bart Van Assche, Martin K. Petersen, roland,
	Hannes Reinecke, Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Hi

Here I'm submitting the first version of the NVMe copy offload patches
as a request for comments. They use the token-based approach, as we
discussed on the phone call.

The first patch adds generic copy offload support to the block layer - it
adds two new bio types (REQ_OP_COPY_READ_TOKEN and
REQ_OP_COPY_WRITE_TOKEN) and a new ioctl BLKCOPY and a kernel function
blkdev_issue_copy.

The second patch adds copy offload support to the NVMe subsystem.

The third patch implements an "nvme-debug" driver - it is similar to
"scsi-debug": it simulates an NVMe host controller, keeps data in memory,
and supports copy offload according to the NVMe Command Set Specification
1.0a. (There are no hardware or software implementations supporting copy
offload so far, so I implemented it in nvme-debug.)

TODO:
* implement copy offload in device mapper linear target
* implement copy offload in software NVMe target driver
* make it possible to complete REQ_OP_COPY_WRITE_TOKEN bios asynchronously
* should we use copy_file_range instead of a new ioctl?

How to test this:
* apply the three patches
* select CONFIG_NVME_DEBUG
* compile the kernel
* modprobe nvme-debug; nvme connect -t debug -a 123 -n 456
* issue the BLKCOPY ioctl on /dev/nvme0n1, for example, you can use this
  program:
  http://people.redhat.com/~mpatocka/patches/kernel/xcopy/example/blkcopy.c
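
  For readers who cannot fetch that program, here is a minimal sketch of
  what such a caller could look like, based purely on the ABI described
  in patch 1 (three 64-bit input values, one 64-bit output value, all
  offsets and lengths 512-byte aligned); the device path and offsets are
  placeholders:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef BLKCOPY
#define BLKCOPY _IO(0x12, 129)	/* as defined by this patch set */
#endif

int main(void)
{
	uint64_t range[4] = {
		0,		/* source offset in bytes (placeholder) */
		1 << 20,	/* destination offset in bytes (placeholder) */
		1 << 20,	/* length in bytes (placeholder) */
		0,		/* out: number of bytes actually copied */
	};
	int fd = open("/dev/nvme0n1", O_RDWR);

	if (fd < 0 || ioctl(fd, BLKCOPY, range) < 0) {
		perror("BLKCOPY");
		return 1;
	}
	printf("copied %llu bytes\n", (unsigned long long)range[3]);
	close(fd);
	return 0;
}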

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* [RFC PATCH 1/3] block: add copy offload support
  2022-02-01 18:31       ` [dm-devel] " Mikulas Patocka
@ 2022-02-01 18:32         ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 18:32 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	Mike Snitzer, Bart Van Assche, Martin K. Petersen, roland,
	Hannes Reinecke, Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Add generic copy offload support to the block layer.

We add two new bio types: REQ_OP_COPY_READ_TOKEN and
REQ_OP_COPY_WRITE_TOKEN. Their bio vector has one entry - a page
containing the token.

When we need to copy data, we send REQ_OP_COPY_READ_TOKEN to the source
device and then we send REQ_OP_COPY_WRITE_TOKEN to the destination device.

This patch introduces a new ioctl, BLKCOPY, that submits the copy
operation. The BLKCOPY argument is an array of four 64-bit numbers: the
source offset, the destination offset, and the length; the fourth
number is filled in by the ioctl and holds the number of bytes that
were actually copied.

For in-kernel users, we introduce a function blkdev_issue_copy.

Copying may fail at any time; the caller is required to fall back to an
explicit copy.
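
As a sketch of the intended in-kernel calling convention (not part of
this patch; copy_explicit() is a hypothetical helper standing in for a
regular read/write copy loop):

/*
 * Try the offload first; fall back to an explicit copy when offload
 * is unsupported or stops early. blkdev_issue_copy() sets *copied to
 * the number of sectors it managed to copy before any failure.
 */
static int copy_with_fallback(struct block_device *src, sector_t s1,
			      struct block_device *dst, sector_t s2,
			      sector_t nr_sects)
{
	sector_t copied;
	int r = blkdev_issue_copy(src, s1, dst, s2, nr_sects, &copied,
				  GFP_KERNEL);

	if (r || copied < nr_sects)	/* e.g. -EOPNOTSUPP or partial copy */
		return copy_explicit(src, s1 + copied, dst, s2 + copied,
				     nr_sects - copied);
	return 0;
}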

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 block/blk-core.c          |    7 +++
 block/blk-lib.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-settings.c      |   12 ++++++
 block/blk-sysfs.c         |    7 +++
 block/blk.h               |    3 +
 block/ioctl.c             |   56 ++++++++++++++++++++++++++++
 include/linux/blk_types.h |    4 ++
 include/linux/blkdev.h    |   18 +++++++++
 include/uapi/linux/fs.h   |    1 
 9 files changed, 197 insertions(+)

Index: linux-2.6/block/blk-settings.c
===================================================================
--- linux-2.6.orig/block/blk-settings.c	2022-01-26 19:12:30.000000000 +0100
+++ linux-2.6/block/blk-settings.c	2022-01-27 20:43:27.000000000 +0100
@@ -57,6 +57,7 @@ void blk_set_default_limits(struct queue
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->max_copy_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -365,6 +366,17 @@ void blk_queue_zone_write_granularity(st
 EXPORT_SYMBOL_GPL(blk_queue_zone_write_granularity);
 
 /**
+ * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue
+ * @q:  the request queue for the device
+ * @size:  the maximum copy offload sectors
+ */
+void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
+{
+	q->limits.max_copy_sectors = size;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
+
+/**
  * blk_queue_alignment_offset - set physical block alignment offset
  * @q:	the request queue for the device
  * @offset: alignment offset in bytes
Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h	2022-01-26 19:12:30.000000000 +0100
+++ linux-2.6/include/linux/blkdev.h	2022-01-29 17:46:03.000000000 +0100
@@ -103,6 +103,7 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int		max_copy_sectors;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -706,6 +707,7 @@ extern void blk_queue_max_zone_append_se
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 void blk_queue_zone_write_granularity(struct request_queue *q,
 				      unsigned int size);
+void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size);
 extern void blk_queue_alignment_offset(struct request_queue *q,
 				       unsigned int alignment);
 void disk_update_readahead(struct gendisk *disk);
@@ -862,6 +864,10 @@ extern int __blkdev_issue_zeroout(struct
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned flags);
 
+extern int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
+		      struct block_device *bdev2, sector_t sector2,
+		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask);
+
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
@@ -1001,6 +1007,18 @@ bdev_zone_write_granularity(struct block
 	return queue_zone_write_granularity(bdev_get_queue(bdev));
 }
 
+static inline unsigned int
+queue_max_copy_sectors(const struct request_queue *q)
+{
+	return q->limits.max_copy_sectors;
+}
+
+static inline unsigned int
+bdev_max_copy_sectors(struct block_device *bdev)
+{
+	return queue_max_copy_sectors(bdev_get_queue(bdev));
+}
+
 static inline int queue_alignment_offset(const struct request_queue *q)
 {
 	if (q->limits.misaligned)
Index: linux-2.6/block/blk-sysfs.c
===================================================================
--- linux-2.6.orig/block/blk-sysfs.c	2022-01-26 19:12:30.000000000 +0100
+++ linux-2.6/block/blk-sysfs.c	2022-01-26 19:12:30.000000000 +0100
@@ -230,6 +230,11 @@ static ssize_t queue_zone_write_granular
 	return queue_var_show(queue_zone_write_granularity(q), page);
 }
 
+static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(queue_max_copy_sectors(q), page);
+}
+
 static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
 {
 	unsigned long long max_sectors = q->limits.max_zone_append_sectors;
@@ -591,6 +596,7 @@ QUEUE_RO_ENTRY(queue_write_same_max, "wr
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
+QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
 
 QUEUE_RO_ENTRY(queue_zoned, "zoned");
 QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
@@ -647,6 +653,7 @@ static struct attribute *queue_attrs[] =
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
+	&queue_max_copy_sectors_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
 	&queue_nr_zones_entry.attr,
Index: linux-2.6/include/linux/blk_types.h
===================================================================
--- linux-2.6.orig/include/linux/blk_types.h	2022-01-06 18:55:01.000000000 +0100
+++ linux-2.6/include/linux/blk_types.h	2022-01-29 17:47:44.000000000 +0100
@@ -371,6 +371,10 @@ enum req_opf {
 	/* reset all the zone present on the device */
 	REQ_OP_ZONE_RESET_ALL	= 17,
 
+	/* copy offload bios */
+	REQ_OP_COPY_READ_TOKEN	= 18,
+	REQ_OP_COPY_WRITE_TOKEN	= 19,
+
 	/* Driver private requests */
 	REQ_OP_DRV_IN		= 34,
 	REQ_OP_DRV_OUT		= 35,
Index: linux-2.6/block/blk-lib.c
===================================================================
--- linux-2.6.orig/block/blk-lib.c	2021-08-18 13:59:55.000000000 +0200
+++ linux-2.6/block/blk-lib.c	2022-01-30 17:33:04.000000000 +0100
@@ -440,3 +440,92 @@ retry:
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+static void bio_wake_completion(struct bio *bio)
+{
+	struct completion *comp = bio->bi_private;
+	complete(comp);
+}
+
+int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
+		      struct block_device *bdev2, sector_t sector2,
+		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask)
+{
+	struct page *token;
+	sector_t m;
+	int r = 0;
+	struct completion comp;
+
+	*copied = 0;
+
+	m = min(bdev_max_copy_sectors(bdev1), bdev_max_copy_sectors(bdev2));
+	if (!m)
+		return -EOPNOTSUPP;
+	m = min(m, (sector_t)round_down(UINT_MAX, PAGE_SIZE) >> 9);
+
+	if (unlikely(bdev_read_only(bdev2)))
+		return -EPERM;
+
+	token = alloc_page(gfp_mask);
+	if (unlikely(!token))
+		return -ENOMEM;
+
+	while (nr_sects) {
+		struct bio *read_bio, *write_bio;
+		sector_t this_step = min(nr_sects, m);
+
+		read_bio = bio_alloc(gfp_mask, 1);
+		if (unlikely(!read_bio)) {
+			r = -ENOMEM;
+			break;
+		}
+		bio_set_op_attrs(read_bio, REQ_OP_COPY_READ_TOKEN, REQ_NOMERGE);
+		bio_set_dev(read_bio, bdev1);
+		__bio_add_page(read_bio, token, PAGE_SIZE, 0);
+		read_bio->bi_iter.bi_sector = sector1;
+		read_bio->bi_iter.bi_size = this_step << 9;
+		read_bio->bi_private = &comp;
+		read_bio->bi_end_io = bio_wake_completion;
+		init_completion(&comp);
+		submit_bio(read_bio);
+		wait_for_completion(&comp);
+		if (unlikely(read_bio->bi_status != BLK_STS_OK)) {
+			r = blk_status_to_errno(read_bio->bi_status);
+			bio_put(read_bio);
+			break;
+		}
+		bio_put(read_bio);
+
+		write_bio = bio_alloc(gfp_mask, 1);
+		if (unlikely(!write_bio)) {
+			r = -ENOMEM;
+			break;
+		}
+		bio_set_op_attrs(write_bio, REQ_OP_COPY_WRITE_TOKEN, REQ_NOMERGE);
+		bio_set_dev(write_bio, bdev2);
+		__bio_add_page(write_bio, token, PAGE_SIZE, 0);
+		write_bio->bi_iter.bi_sector = sector2;
+		write_bio->bi_iter.bi_size = this_step << 9;
+		write_bio->bi_private = &comp;
+		write_bio->bi_end_io = bio_wake_completion;
+		reinit_completion(&comp);
+		submit_bio(write_bio);
+		wait_for_completion(&comp);
+		if (unlikely(write_bio->bi_status != BLK_STS_OK)) {
+			r = blk_status_to_errno(write_bio->bi_status);
+			bio_put(write_bio);
+			break;
+		}
+		bio_put(write_bio);
+
+		sector1 += this_step;
+		sector2 += this_step;
+		nr_sects -= this_step;
+		*copied += this_step;
+	}
+
+	__free_page(token);
+
+	return r;
+}
+EXPORT_SYMBOL(blkdev_issue_copy);
Index: linux-2.6/block/ioctl.c
===================================================================
--- linux-2.6.orig/block/ioctl.c	2022-01-24 15:10:40.000000000 +0100
+++ linux-2.6/block/ioctl.c	2022-01-30 13:43:35.000000000 +0100
@@ -165,6 +165,60 @@ fail:
 	return err;
 }
 
+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	uint64_t range[4];
+	uint64_t start1, start2, end1, end2, len;
+	sector_t copied = 0;
+	struct inode *inode = bdev->bd_inode;
+	int err;
+
+	if (!(mode & FMODE_WRITE)) {
+		err = -EBADF;
+		goto fail1;
+	}
+
+	if (copy_from_user(range, (void __user *)arg, 24)) {
+		err = -EFAULT;
+		goto fail1;
+	}
+
+	start1 = range[0];
+	start2 = range[1];
+	len = range[2];
+	end1 = start1 + len - 1;
+	end2 = start2 + len - 1;
+
+	if ((start1 | start2 | len) & 511)
+		return -EINVAL;
+	if (end1 >= (uint64_t)bdev_nr_bytes(bdev))
+		return -EINVAL;
+	if (end2 >= (uint64_t)bdev_nr_bytes(bdev))
+		return -EINVAL;
+	if (end1 < start1)
+		return -EINVAL;
+	if (end2 < start2)
+		return -EINVAL;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	err = truncate_bdev_range(bdev, mode, start2, end2);
+	if (err)
+		goto fail2;
+
+	err = blkdev_issue_copy(bdev, start1 >> 9, bdev, start2 >> 9, len >> 9, &copied, GFP_KERNEL);
+
+fail2:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+fail1:
+	range[3] = (uint64_t)copied << 9;
+	if (copy_to_user((void __user *)(arg + 24), &range[3], 8))
+		err = -EFAULT;
+
+	return err;
+}
+
 static int put_ushort(unsigned short __user *argp, unsigned short val)
 {
 	return put_user(val, argp);
@@ -459,6 +513,8 @@ static int blkdev_common_ioctl(struct bl
 		return blk_ioctl_zeroout(bdev, mode, arg);
 	case BLKGETDISKSEQ:
 		return put_u64(argp, bdev->bd_disk->diskseq);
+	case BLKCOPY:
+		return blk_ioctl_copy(bdev, mode, arg);
 	case BLKREPORTZONE:
 		return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
 	case BLKRESETZONE:
Index: linux-2.6/include/uapi/linux/fs.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/fs.h	2021-09-23 17:07:02.000000000 +0200
+++ linux-2.6/include/uapi/linux/fs.h	2022-01-27 19:05:46.000000000 +0100
@@ -185,6 +185,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IO(0x12,129)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
Index: linux-2.6/block/blk.h
===================================================================
--- linux-2.6.orig/block/blk.h	2022-01-24 15:10:40.000000000 +0100
+++ linux-2.6/block/blk.h	2022-01-29 18:10:28.000000000 +0100
@@ -288,6 +288,9 @@ static inline bool blk_may_split(struct
 	case REQ_OP_WRITE_ZEROES:
 	case REQ_OP_WRITE_SAME:
 		return true; /* non-trivial splitting decisions */
+	case REQ_OP_COPY_READ_TOKEN:
+	case REQ_OP_COPY_WRITE_TOKEN:
+		return false;
 	default:
 		break;
 	}
Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c	2022-01-24 15:10:40.000000000 +0100
+++ linux-2.6/block/blk-core.c	2022-02-01 15:53:39.000000000 +0100
@@ -124,6 +124,8 @@ static const char *const blk_op_name[] =
 	REQ_OP_NAME(ZONE_APPEND),
 	REQ_OP_NAME(WRITE_SAME),
 	REQ_OP_NAME(WRITE_ZEROES),
+	REQ_OP_NAME(COPY_READ_TOKEN),
+	REQ_OP_NAME(COPY_WRITE_TOKEN),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
 };
@@ -758,6 +760,11 @@ noinline_for_stack bool submit_bio_check
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_COPY_READ_TOKEN:
+	case REQ_OP_COPY_WRITE_TOKEN:
+		if (!q->limits.max_copy_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}


^ permalink raw reply	[flat|nested] 272+ messages in thread


* [RFC PATCH 2/3] nvme: add copy offload support
  2022-02-01 18:31       ` [dm-devel] " Mikulas Patocka
@ 2022-02-01 18:33         ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 18:33 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

This patch adds copy offload support to the nvme host driver.

The function nvme_setup_read_token stores the namespace and source
location in the token; the function nvme_setup_write_token retrieves
that information from the token and submits the Copy command to the
device.
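
As a worked example of the descriptor math in nvme_setup_write_token
below: with a 512-byte LBA format, copying 1 GiB gives n_lba = 2097152;
each copy descriptor carries a 16-bit zero-based length (at most 65536
LBAs per descriptor), so the command is built with (2097152 + 0xffff) /
0x10000 = 32 descriptors and the starting destination LBA in sdlba.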

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/core.c   |   94 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/fc.c     |    5 ++
 drivers/nvme/host/nvme.h   |    1 
 drivers/nvme/host/pci.c    |    5 ++
 drivers/nvme/host/rdma.c   |    5 ++
 drivers/nvme/host/tcp.c    |    5 ++
 drivers/nvme/target/loop.c |    5 ++
 include/linux/nvme.h       |   33 +++++++++++++++
 8 files changed, 153 insertions(+)

Index: linux-2.6/drivers/nvme/host/core.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/core.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/core.c	2022-02-01 18:34:19.000000000 +0100
@@ -975,6 +975,85 @@ static inline blk_status_t nvme_setup_rw
 	return 0;
 }
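+/*
+ * The copy token travels in the single bio_vec page of a
+ * REQ_OP_COPY_READ_TOKEN / REQ_OP_COPY_WRITE_TOKEN bio; the "nvme"
+ * magic lets the write side check that the token was filled in by
+ * this driver.
+ */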
 
+struct nvme_copy_token {
+	char subsys[4];
+	struct nvme_ns *ns;
+	u64 src_sector;
+	u64 sectors;
+};
+
+static inline blk_status_t nvme_setup_read_token(struct nvme_ns *ns, struct request *req)
+{
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset;
+	memcpy(token->subsys, "nvme", 4);
+	token->ns = ns;
+	token->src_sector = bio->bi_iter.bi_sector;
+	token->sectors = bio->bi_iter.bi_size >> 9;
+	return 0;
+}
+
+static inline blk_status_t nvme_setup_write_token(struct nvme_ns *ns,
+		struct request *req, struct nvme_command *cmnd)
+{
+	sector_t src_sector, dst_sector, n_sectors;
+	u64 src_lba, dst_lba, n_lba;
+
+	unsigned n_descriptors, i;
+	struct nvme_copy_desc *descriptors;
+
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset;
+	if (unlikely(memcmp(token->subsys, "nvme", 4)))
+		return BLK_STS_NOTSUPP;
+	if (unlikely(token->ns != ns))
+		return BLK_STS_NOTSUPP;
+
+	src_sector = token->src_sector;
+	dst_sector = bio->bi_iter.bi_sector;
+	n_sectors = token->sectors;
+	if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
+		return BLK_STS_NOTSUPP;
+
+	src_lba = nvme_sect_to_lba(ns, src_sector);
+	dst_lba = nvme_sect_to_lba(ns, dst_sector);
+	n_lba = nvme_sect_to_lba(ns, n_sectors);
+
+	if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
+	    unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
+	    unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
+		return BLK_STS_NOTSUPP;
+
+	if (WARN_ON(!n_lba))
+		return BLK_STS_NOTSUPP;
+
+	n_descriptors = (n_lba + 0xffff) / 0x10000;
+	descriptors = kzalloc(n_descriptors * sizeof(struct nvme_copy_desc), GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(!descriptors))
+		return BLK_STS_RESOURCE;
+
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->copy.opcode = nvme_cmd_copy;
+	cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->copy.sdlba = cpu_to_le64(dst_lba);
+	cmnd->copy.length = n_descriptors - 1;
+
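+	/*
+	 * Split the copy into descriptors of at most 0x10000 LBAs;
+	 * the descriptor length field is 0's based.
+	 */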
+	for (i = 0; i < n_descriptors; i++) {
+		u64 this_step = min(n_lba, (u64)0x10000);
+		descriptors[i].slba = cpu_to_le64(src_lba);
+		descriptors[i].length = cpu_to_le16(this_step - 1);
+		src_lba += this_step;
+		n_lba -= this_step;
+	}
+
+	req->special_vec.bv_page = virt_to_page(descriptors);
+	req->special_vec.bv_offset = offset_in_page(descriptors);
+	req->special_vec.bv_len = n_descriptors * sizeof(struct nvme_copy_desc);
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	return 0;
+}
+
 void nvme_cleanup_cmd(struct request *req)
 {
 	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
@@ -1032,6 +1111,12 @@ blk_status_t nvme_setup_cmd(struct nvme_
 	case REQ_OP_ZONE_APPEND:
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
 		break;
+	case REQ_OP_COPY_READ_TOKEN:
+		ret = nvme_setup_read_token(ns, req);
+		break;
+	case REQ_OP_COPY_WRITE_TOKEN:
+		ret = nvme_setup_write_token(ns, req, cmd);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return BLK_STS_IOERR;
@@ -1865,6 +1950,8 @@ static void nvme_update_disk_info(struct
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
 
+	blk_queue_max_copy_sectors(disk->queue, ns->ctrl->max_copy_sectors);
+
 	set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) ||
 		test_bit(NVME_NS_FORCE_RO, &ns->flags));
 }
@@ -2891,6 +2978,12 @@ static int nvme_init_non_mdts_limits(str
 	else
 		ctrl->max_zeroes_sectors = 0;
 
+	if (ctrl->oncs & NVME_CTRL_ONCS_COPY) {
+		ctrl->max_copy_sectors = 1U << 24;
+	} else {
+		ctrl->max_copy_sectors = 0;
+	}
+
 	if (nvme_ctrl_limited_cns(ctrl))
 		return 0;
 
@@ -4716,6 +4809,7 @@ static inline void _nvme_check_size(void
 {
 	BUILD_BUG_ON(sizeof(struct nvme_common_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_rw_command) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_identify) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_features) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64);
Index: linux-2.6/drivers/nvme/host/nvme.h
===================================================================
--- linux-2.6.orig/drivers/nvme/host/nvme.h	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/nvme.h	2022-02-01 18:34:19.000000000 +0100
@@ -277,6 +277,7 @@ struct nvme_ctrl {
 #ifdef CONFIG_BLK_DEV_ZONED
 	u32 max_zone_append;
 #endif
+	u32 max_copy_sectors;
 	u16 crdt[3];
 	u16 oncs;
 	u16 oacs;
Index: linux-2.6/include/linux/nvme.h
===================================================================
--- linux-2.6.orig/include/linux/nvme.h	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/include/linux/nvme.h	2022-02-01 18:34:19.000000000 +0100
@@ -335,6 +335,8 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_VERIFY			= 1 << 7,
+	NVME_CTRL_ONCS_COPY			= 1 << 8,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
@@ -704,6 +706,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_copy		= 0x19,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -872,6 +875,35 @@ enum {
 	NVME_RW_DTYPE_STREAMS		= 1 << 4,
 };
 
+struct nvme_copy_command {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__u64			rsvd2;
+	__le64			metadata;
+	union nvme_data_ptr	dptr;
+	__le64			sdlba;
+	__u8			length;
+	__u8			control2;
+	__le16			control;
+	__le32			dspec;
+	__le32			reftag;
+	__le16			apptag;
+	__le16			appmask;
+};
+
+struct nvme_copy_desc {
+	__u64			rsvd;
+	__le64			slba;
+	__le16			length;
+	__u16			rsvd2;
+	__u32			rsvd3;
+	__le32			reftag;
+	__le16			apptag;
+	__le16			appmask;
+};
+
 struct nvme_dsm_cmd {
 	__u8			opcode;
 	__u8			flags;
@@ -1441,6 +1473,7 @@ struct nvme_command {
 	union {
 		struct nvme_common_command common;
 		struct nvme_rw_command rw;
+		struct nvme_copy_command copy;
 		struct nvme_identify identify;
 		struct nvme_features features;
 		struct nvme_create_cq create_cq;
Index: linux-2.6/drivers/nvme/host/pci.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/pci.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/pci.c	2022-02-01 18:34:19.000000000 +0100
@@ -949,6 +949,11 @@ static blk_status_t nvme_queue_rq(struct
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
 
+	if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	/*
 	 * We should not need to do this, but we're still using this to
 	 * ensure we can drain requests on a dying queue.
Index: linux-2.6/drivers/nvme/host/fc.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/fc.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/fc.c	2022-02-01 18:34:19.000000000 +0100
@@ -2780,6 +2780,11 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *h
 	u32 data_len;
 	blk_status_t ret;
 
+	if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE ||
 	    !nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
 		return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq);
Index: linux-2.6/drivers/nvme/host/rdma.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/rdma.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/rdma.c	2022-02-01 18:34:19.000000000 +0100
@@ -2048,6 +2048,11 @@ static blk_status_t nvme_rdma_queue_rq(s
 	blk_status_t ret;
 	int err;
 
+	if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	WARN_ON_ONCE(rq->tag < 0);
 
 	if (!nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
Index: linux-2.6/drivers/nvme/host/tcp.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/tcp.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/host/tcp.c	2022-02-01 18:34:19.000000000 +0100
@@ -2372,6 +2372,11 @@ static blk_status_t nvme_tcp_queue_rq(st
 	bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags);
 	blk_status_t ret;
 
+	if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(rq, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	if (!nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
 		return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq);
 
Index: linux-2.6/drivers/nvme/target/loop.c
===================================================================
--- linux-2.6.orig/drivers/nvme/target/loop.c	2022-02-01 18:34:19.000000000 +0100
+++ linux-2.6/drivers/nvme/target/loop.c	2022-02-01 18:34:19.000000000 +0100
@@ -138,6 +138,11 @@ static blk_status_t nvme_loop_queue_rq(s
 	bool queue_ready = test_bit(NVME_LOOP_Q_LIVE, &queue->flags);
 	blk_status_t ret;
 
+	if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	if (!nvme_check_ready(&queue->ctrl->ctrl, req, queue_ready))
 		return nvme_fail_nonready_command(&queue->ctrl->ctrl, req);
 


^ permalink raw reply	[flat|nested] 272+ messages in thread


* [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-01 18:31       ` [dm-devel] " Mikulas Patocka
@ 2022-02-01 18:33         ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 18:33 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

This patch adds a new driver, "nvme-debug". It uses system memory as a
backing store and is intended for testing the copy offload functionality.
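
One plausible way to exercise it, given the module parameters and the
fabrics transport registered below (the exact connect invocation is an
assumption, not part of this patch): load the module with
"modprobe nvme-debug dev_size_mb=64 sector_size=4096", then create a
controller instance through the usual nvme-fabrics interface with
transport=debug; traddr is the only required option per
nvme_debug_transport.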

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/Kconfig      |   13 
 drivers/nvme/host/Makefile     |    1 
 drivers/nvme/host/nvme-debug.c |  838 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 852 insertions(+)

Index: linux-2.6/drivers/nvme/host/Kconfig
===================================================================
--- linux-2.6.orig/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
+++ linux-2.6/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
@@ -83,3 +83,16 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_DEBUG
+	tristate "NVM Express debug"
+	depends on INET
+	depends on BLOCK
+	select NVME_CORE
+	select NVME_FABRICS
+	select CRYPTO
+	select CRYPTO_CRC32C
+	help
+	  This pseudo driver simulates a NVMe adapter.
+
+	  If unsure, say N.
Index: linux-2.6/drivers/nvme/host/Makefile
===================================================================
--- linux-2.6.orig/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
+++ linux-2.6/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabr
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_DEBUG)		+= nvme-debug.o
 
 nvme-core-y				:= core.o ioctl.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
Index: linux-2.6/drivers/nvme/host/nvme-debug.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/drivers/nvme/host/nvme-debug.c	2022-02-01 18:34:22.000000000 +0100
@@ -0,0 +1,838 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe debug
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/blk-mq.h>
+#include <linux/sort.h>
+#include <linux/version.h>
+
+#include "nvme.h"
+#include "fabrics.h"
+
+static ulong dev_size_mb = 16;
+module_param_named(dev_size_mb, dev_size_mb, ulong, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_size_mb, "size in MiB of the namespace (def=16)");
+
+static unsigned sector_size = 512;
+module_param_named(sector_size, sector_size, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
+
+struct nvme_debug_ctrl {
+	struct nvme_ctrl ctrl;
+	uint32_t reg_cc;
+	struct blk_mq_tag_set admin_tag_set;
+	struct blk_mq_tag_set io_tag_set;
+	struct workqueue_struct *admin_wq;
+	struct workqueue_struct *io_wq;
+	struct list_head namespaces;
+	struct list_head list;
+};
+
+struct nvme_debug_namespace {
+	struct list_head list;
+	uint32_t nsid;
+	unsigned char sector_size_bits;
+	size_t n_sectors;
+	void *space;
+	char uuid[16];
+};
+
+struct nvme_debug_request {
+	struct nvme_request req;
+	struct nvme_command cmd;
+	struct work_struct work;
+};
+
+static LIST_HEAD(nvme_debug_ctrl_list);
+static DEFINE_MUTEX(nvme_debug_ctrl_mutex);
+
+DEFINE_STATIC_PERCPU_RWSEM(nvme_debug_sem);
+
+static inline struct nvme_debug_ctrl *to_debug_ctrl(struct nvme_ctrl *nctrl)
+{
+	return container_of(nctrl, struct nvme_debug_ctrl, ctrl);
+}
+
+static struct nvme_debug_namespace *nvme_debug_find_namespace(struct nvme_debug_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_debug_namespace *ns;
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->nsid == nsid)
+			return ns;
+	}
+	return NULL;
+}
+
+static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl)
+{
+	struct nvme_debug_namespace *ns;
+	unsigned s;
+	size_t dsm;
+
+	ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL);
+	if (!ns)
+		goto fail0;
+
+	ns->nsid = 1;
+	while (nvme_debug_find_namespace(ctrl, ns->nsid))
+		ns->nsid++;
+
+	s = READ_ONCE(sector_size);
+	if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s))
+		goto fail1;
+	ns->sector_size_bits = __ffs(s);
+	dsm = READ_ONCE(dev_size_mb);
+	ns->n_sectors = dsm << (20 - ns->sector_size_bits);
+	if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm)
+		goto fail1;
+
+	if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors)
+		goto fail1;
+
+	ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
+	if (!ns->space)
+		goto fail1;
+
+	generate_random_uuid(ns->uuid);
+
+	list_add(&ns->list, &ctrl->namespaces);
+	return true;
+
+fail1:
+	kfree(ns);
+fail0:
+	return false;
+}
+
+static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns)
+{
+	vfree(ns->space);
+	list_del(&ns->list);
+	kfree(ns);
+}
+
+static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+	switch (off) {
+		case NVME_REG_VS: {
+			*val = 0x20000;
+			break;
+		}
+		case NVME_REG_CC: {
+			*val = ctrl->reg_cc;
+			break;
+		}
+		case NVME_REG_CSTS: {
+			*val = 0;
+			if (ctrl->reg_cc & NVME_CC_ENABLE)
+				*val |= NVME_CSTS_RDY;
+			if (ctrl->reg_cc & NVME_CC_SHN_MASK)
+				*val |= NVME_CSTS_SHST_CMPLT;
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_read32: %x\n", off);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
+{
+	switch (off) {
+		case NVME_REG_CAP: {
+			*val = (1ULL << 0) | (1ULL << 37);
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_read64: %x\n", off);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+	switch (off) {
+		case NVME_REG_CC: {
+			ctrl->reg_cc = val;
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_write32: %x, %x\n", off, val);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static void nvme_debug_submit_async_event(struct nvme_ctrl *nctrl)
+{
+	printk("nvme_debug_submit_async_event\n");
+}
+
+static void nvme_debug_delete_ctrl(struct nvme_ctrl *nctrl)
+{
+	nvme_shutdown_ctrl(nctrl);
+}
+
+static void nvme_debug_free_namespaces(struct nvme_debug_ctrl *ctrl)
+{
+	while (!list_empty(&ctrl->namespaces)) {
+		struct nvme_debug_namespace *ns = container_of(ctrl->namespaces.next, struct nvme_debug_namespace, list);
+		nvme_debug_free_namespace(ns);
+	}
+}
+
+static void nvme_debug_free_ctrl(struct nvme_ctrl *nctrl)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+
+	flush_workqueue(ctrl->admin_wq);
+	flush_workqueue(ctrl->io_wq);
+
+	nvme_debug_free_namespaces(ctrl);
+
+	if (list_empty(&ctrl->list))
+		goto free_ctrl;
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_del(&ctrl->list);
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+
+	nvmf_free_options(nctrl->opts);
+free_ctrl:
+	destroy_workqueue(ctrl->admin_wq);
+	destroy_workqueue(ctrl->io_wq);
+	kfree(ctrl);
+}
+
+static int nvme_debug_get_address(struct nvme_ctrl *nctrl, char *buf, int size)
+{
+	int len = 0;
+	len += snprintf(buf, size, "debug");
+	return len;
+}
+
+static void nvme_debug_reset_ctrl_work(struct work_struct *work)
+{
+	printk("nvme_reset_ctrl_work\n");
+}
+
+static void copy_data_request(struct request *req, void *data, size_t size, bool to_req)
+{
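+	/*
+	 * Requests with RQF_SPECIAL_PAYLOAD carry their data in
+	 * special_vec rather than in the regular bio segments.
+	 */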
+	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
+		void *addr;
+		struct bio_vec bv = req->special_vec;
+		addr = kmap_atomic(bv.bv_page);
+		if (to_req) {
+			memcpy(addr + bv.bv_offset, data, bv.bv_len);
+			flush_dcache_page(bv.bv_page);
+		} else {
+			flush_dcache_page(bv.bv_page);
+			memcpy(data, addr + bv.bv_offset, bv.bv_len);
+		}
+		kunmap_atomic(addr);
+		data += bv.bv_len;
+		size -= bv.bv_len;
+	} else {
+		struct req_iterator bi;
+		struct bio_vec bv;
+		rq_for_each_segment(bv, req, bi) {
+			void *addr;
+			addr = kmap_atomic(bv.bv_page);
+			if (to_req) {
+				memcpy(addr + bv.bv_offset, data, bv.bv_len);
+				flush_dcache_page(bv.bv_page);
+			} else {
+				flush_dcache_page(bv.bv_page);
+				memcpy(data, addr + bv.bv_offset, bv.bv_len);
+			}
+			kunmap_atomic(addr);
+			data += bv.bv_len;
+			size -= bv.bv_len;
+		}
+	}
+	if (size)
+		printk("size mismatch: %lx\n", (unsigned long)size);
+}
+
+static void nvme_debug_identify_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_id_ns *id;
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+
+	id = kzalloc(sizeof(struct nvme_id_ns), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
+	if (!ns) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		goto free_ret;
+	}
+
+	id->nsze = cpu_to_le64(ns->n_sectors);
+	id->ncap = id->nsze;
+	id->nuse = id->nsze;
+	/*id->nlbaf = 0;*/
+	id->dlfeat = 0x01;
+	id->lbaf[0].ds = ns->sector_size_bits;
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ns), true);
+
+free_ret:
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ctrl(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_debug_namespace *ns;
+	struct nvme_id_ctrl *id;
+	char ver[9];
+	size_t ver_len;
+
+	id = kzalloc(sizeof(struct nvme_id_ctrl), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	id->vid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
+	id->ssvid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
+	memset(id->sn, ' ', sizeof id->sn);
+	memset(id->mn, ' ', sizeof id->mn);
+	memcpy(id->mn, "nvme-debug", 10);
+	snprintf(ver, sizeof ver, "%X", LINUX_VERSION_CODE);
+	ver_len = min(strlen(ver), sizeof id->fr);
+	memset(id->fr, ' ', sizeof id->fr);
+	memcpy(id->fr, ver, ver_len);
+	memcpy(id->ieee, "\xe9\xf2\x40", sizeof id->ieee);
+	id->ver = cpu_to_le32(0x20000);
+	id->kas = cpu_to_le16(100);
+	id->sqes = 0x66;
+	id->cqes = 0x44;
+	id->maxcmd = cpu_to_le16(1);
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->nsid > le32_to_cpu(id->nn))
+			id->nn = cpu_to_le32(ns->nsid);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_COPY);
+	id->vwc = 0x6;
+	id->mnan = cpu_to_le32(0xffffffff);
+	strcpy(id->subnqn, "nqn.2021-09.com.redhat:nvme-debug");
+	id->ioccsz = cpu_to_le32(4);
+	id->iorcsz = cpu_to_le32(1);
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ctrl), true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static int cmp_ns(const void *a1, const void *a2)
+{
+	__u32 v1 = le32_to_cpu(*(__u32 *)a1);
+	__u32 v2 = le32_to_cpu(*(__u32 *)a2);
+	if (!v1)
+		v1 = 0xffffffffU;
+	if (!v2)
+		v2 = 0xffffffffU;
+	if (v1 < v2)
+		return -1;
+	if (v1 > v2)
+		return 1;
+	return 0;
+}
+
+static void nvme_debug_identify_active_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	unsigned size;
+	__u32 *id;
+	unsigned idp;
+
+	if (le32_to_cpu(ndr->cmd.identify.nsid) >= 0xFFFFFFFE) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		blk_mq_end_request(req, BLK_STS_OK);
+		return;
+	}
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	size = 0;
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		size++;
+	}
+	size = min(size, 1024U);
+
+	id = kzalloc(sizeof(__u32) * 1024, GFP_NOIO);
+	if (!id) {
+		mutex_unlock(&nvme_debug_ctrl_mutex);
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	idp = 0;
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (idp < size && ns->nsid > le32_to_cpu(ndr->cmd.identify.nsid))
+			id[idp++] = cpu_to_le32(ns->nsid);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	sort(id, idp, sizeof(__u32), cmp_ns, NULL);
+
+	copy_data_request(req, id, sizeof(__u32) * 1024, true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ns_desc_list(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_ns_id_desc *id;
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	id = kzalloc(4096, GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
+	if (!ns) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		goto free_ret;
+	}
+
+	id->nidt = NVME_NIDT_UUID;
+	id->nidl = NVME_NIDT_UUID_LEN;
+	memcpy((char *)(id + 1), ns->uuid, NVME_NIDT_UUID_LEN);
+
+	copy_data_request(req, id, 4096, true);
+
+free_ret:
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ctrl_cs(struct request *req)
+{
+	struct nvme_id_ctrl_nvm *id;
+	id = kzalloc(sizeof(struct nvme_id_ctrl_nvm), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ctrl_nvm), true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_admin_rq(struct work_struct *w)
+{
+	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
+	struct request *req = blk_mq_rq_from_pdu(ndr);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+
+	switch (ndr->cmd.common.opcode) {
+		case nvme_admin_identify: {
+			switch (ndr->cmd.identify.cns) {
+				case NVME_ID_CNS_NS: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ns(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				};
+				case NVME_ID_CNS_CTRL: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ctrl(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_NS_ACTIVE_LIST: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_active_ns(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_NS_DESC_LIST: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ns_desc_list(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_CS_CTRL: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ctrl_cs(req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				default: {
+					printk("nvme_admin_identify: %x\n", ndr->cmd.identify.cns);
+					break;
+				}
+			}
+			break;
+		}
+		default: {
+			printk("nvme_debug_admin_rq: %x\n", ndr->cmd.common.opcode);
+			break;
+		}
+	}
+	blk_mq_end_request(req, BLK_STS_NOTSUPP);
+}
+
+static void nvme_debug_rw(struct nvme_debug_namespace *ns, struct request *req)
+{
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	__u64 lba = le64_to_cpu(ndr->cmd.rw.slba);
+	__u64 len = (__u64)le16_to_cpu(ndr->cmd.rw.length) + 1;
+	void *addr;
+	if (unlikely(lba + len < lba) || unlikely(lba + len > ns->n_sectors)) {
+		blk_mq_end_request(req, BLK_STS_NOTSUPP);
+		return;
+	}
+	addr = ns->space + (lba << ns->sector_size_bits);
+	copy_data_request(req, addr, len << ns->sector_size_bits, ndr->cmd.rw.opcode == nvme_cmd_read);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_copy(struct nvme_debug_namespace *ns, struct request *req)
+{
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	__u64 dlba = le64_to_cpu(ndr->cmd.copy.sdlba);
+	unsigned n_descs = ndr->cmd.copy.length + 1;
+	struct nvme_copy_desc *descs;
+	unsigned i;
+	blk_status_t ret;
+
+	descs = kmalloc(sizeof(struct nvme_copy_desc) * n_descs, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+	if (!descs) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	copy_data_request(req, descs, sizeof(struct nvme_copy_desc) * n_descs, false);
+
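+	/*
+	 * Each descriptor copies one contiguous LBA range; the
+	 * destination advances sequentially from sdlba.
+	 */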
+	for (i = 0; i < n_descs; i++) {
+		struct nvme_copy_desc *desc = &descs[i];
+		__u64 slba = le64_to_cpu(desc->slba);
+		__u64 len = (__u64)le16_to_cpu(desc->length) + 1;
+		void *saddr, *daddr;
+
+		if (unlikely(slba + len < slba) || unlikely(slba + len > ns->n_sectors) ||
+		    unlikely(dlba + len < dlba) || unlikely(dlba + len > ns->n_sectors)) {
+			ret = BLK_STS_NOTSUPP;
+			goto free_ret;
+		}
+
+		saddr = ns->space + (slba << ns->sector_size_bits);
+		daddr = ns->space + (dlba << ns->sector_size_bits);
+
+		memcpy(daddr, saddr, len << ns->sector_size_bits);
+
+		dlba += len;
+	}
+
+	ret = BLK_STS_OK;
+
+free_ret:
+	kfree(descs);
+
+	blk_mq_end_request(req, ret);
+}
+
+static void nvme_debug_io_rq(struct work_struct *w)
+{
+	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
+	struct request *req = blk_mq_rq_from_pdu(ndr);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+	__u32 nsid = le32_to_cpu(ndr->cmd.common.nsid);
+	struct nvme_debug_namespace *ns;
+
+	percpu_down_read(&nvme_debug_sem);
+	ns = nvme_debug_find_namespace(ctrl, nsid);
+	if (unlikely(!ns))
+		goto ret_notsupp;
+
+	switch (ndr->cmd.common.opcode) {
+		case nvme_cmd_flush: {
+			blk_mq_end_request(req, BLK_STS_OK);
+			goto ret;
+		}
+		case nvme_cmd_read:
+		case nvme_cmd_write: {
+			nvme_debug_rw(ns, req);
+			goto ret;
+		}
+		case nvme_cmd_copy: {
+			nvme_debug_copy(ns, req);
+			goto ret;
+		}
+		default: {
+			printk("nvme_debug_io_rq: %x\n", ndr->cmd.common.opcode);
+			break;
+		}
+	}
+ret_notsupp:
+	blk_mq_end_request(req, BLK_STS_NOTSUPP);
+ret:
+	percpu_up_read(&nvme_debug_sem);
+}
+
+static blk_status_t nvme_debug_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	blk_status_t r;
+
+	r = nvme_setup_cmd(ns, req);
+	if (unlikely(r))
+		return r;
+
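+	/*
+	 * No hctx queuedata means this is the admin queue. Copy-read
+	 * tokens were already filled in by nvme_setup_cmd above, so
+	 * they complete without touching the simulated device.
+	 */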
+	if (!ns) {
+		INIT_WORK(&ndr->work, nvme_debug_admin_rq);
+		queue_work(ctrl->admin_wq, &ndr->work);
+		return BLK_STS_OK;
+	} else if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return BLK_STS_OK;
+	} else {
+		INIT_WORK(&ndr->work, nvme_debug_io_rq);
+		queue_work(ctrl->io_wq, &ndr->work);
+		return BLK_STS_OK;
+	}
+}
+
+static int nvme_debug_init_request(struct blk_mq_tag_set *set, struct request *req, unsigned hctx_idx, unsigned numa_node)
+{
+	struct nvme_debug_ctrl *ctrl = set->driver_data;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	nvme_req(req)->ctrl = &ctrl->ctrl;
+	nvme_req(req)->cmd = &ndr->cmd;
+	return 0;
+}
+
+static int nvme_debug_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned hctx_idx)
+{
+	struct nvme_debug_ctrl *ctrl = data;
+	hctx->driver_data = ctrl;
+	return 0;
+}
+
+static const struct blk_mq_ops nvme_debug_mq_ops = {
+	.queue_rq = nvme_debug_queue_rq,
+	.init_request = nvme_debug_init_request,
+	.init_hctx = nvme_debug_init_hctx,
+};
+
+static int nvme_debug_configure_admin_queue(struct nvme_debug_ctrl *ctrl)
+{
+	int r;
+
+	memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
+	ctrl->admin_tag_set.ops = &nvme_debug_mq_ops;
+	ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	ctrl->admin_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
+	ctrl->admin_tag_set.numa_node = ctrl->ctrl.numa_node;
+	ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_debug_request);
+	ctrl->admin_tag_set.driver_data = ctrl;
+	ctrl->admin_tag_set.nr_hw_queues = 1;
+	ctrl->admin_tag_set.timeout = NVME_ADMIN_TIMEOUT;
+	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
+
+	r = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
+	if (r)
+		goto ret0;
+	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
+
+	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	if (IS_ERR(ctrl->ctrl.admin_q)) {
+		r = PTR_ERR(ctrl->ctrl.admin_q);
+		goto ret1;
+	}
+
+	r = nvme_enable_ctrl(&ctrl->ctrl);
+	if (r)
+		goto ret2;
+
+	nvme_start_admin_queue(&ctrl->ctrl);
+
+	r = nvme_init_ctrl_finish(&ctrl->ctrl);
+	if (r)
+		goto ret3;
+
+	return 0;
+
+ret3:
+	nvme_stop_admin_queue(&ctrl->ctrl);
+	blk_sync_queue(ctrl->ctrl.admin_q);
+ret2:
+	blk_cleanup_queue(ctrl->ctrl.admin_q);
+ret1:
+	blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ret0:
+	return r;
+}
+
+static int nvme_debug_configure_io_queue(struct nvme_debug_ctrl *ctrl)
+{
+	int r;
+
+	memset(&ctrl->io_tag_set, 0, sizeof(ctrl->io_tag_set));
+	ctrl->io_tag_set.ops = &nvme_debug_mq_ops;
+	ctrl->io_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	ctrl->io_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
+	ctrl->io_tag_set.numa_node = ctrl->ctrl.numa_node;
+	ctrl->io_tag_set.cmd_size = sizeof(struct nvme_debug_request);
+	ctrl->io_tag_set.driver_data = ctrl;
+	ctrl->io_tag_set.nr_hw_queues = 1;
+	ctrl->io_tag_set.timeout = NVME_ADMIN_TIMEOUT;
+	ctrl->io_tag_set.flags = BLK_MQ_F_NO_SCHED;
+
+	r = blk_mq_alloc_tag_set(&ctrl->io_tag_set);
+	if (r)
+		goto ret0;
+	ctrl->ctrl.tagset = &ctrl->io_tag_set;
+	return 0;
+
+ret0:
+	return r;
+}
+
+static const struct nvme_ctrl_ops nvme_debug_ctrl_ops = {
+	.name = "debug",
+	.module = THIS_MODULE,
+	.flags = NVME_F_FABRICS,
+	.reg_read32 = nvme_debug_reg_read32,
+	.reg_read64 = nvme_debug_reg_read64,
+	.reg_write32 = nvme_debug_reg_write32,
+	.free_ctrl = nvme_debug_free_ctrl,
+	.submit_async_event = nvme_debug_submit_async_event,
+	.delete_ctrl = nvme_debug_delete_ctrl,
+	.get_address = nvme_debug_get_address,
+};
+
+static struct nvme_ctrl *nvme_debug_create_ctrl(struct device *dev,
+		struct nvmf_ctrl_options *opts)
+{
+	int r;
+	struct nvme_debug_ctrl *ctrl;
+
+	ctrl = kzalloc(sizeof(struct nvme_debug_ctrl), GFP_KERNEL);
+	if (!ctrl) {
+		r = -ENOMEM;
+		goto ret0;
+	}
+
+	INIT_LIST_HEAD(&ctrl->list);
+	INIT_LIST_HEAD(&ctrl->namespaces);
+	ctrl->ctrl.opts = opts;
+	ctrl->ctrl.queue_count = 2;
+	INIT_WORK(&ctrl->ctrl.reset_work, nvme_debug_reset_ctrl_work);
+
+	ctrl->admin_wq = alloc_workqueue("nvme-debug-admin", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+	if (!ctrl->admin_wq) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	ctrl->io_wq = alloc_workqueue("nvme-debug-io", WQ_MEM_RECLAIM, 0);
+	if (!ctrl->io_wq) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	if (!nvme_debug_alloc_namespace(ctrl)) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	r = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_debug_ctrl_ops, 0);
+	if (r)
+		goto ret1;
+
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+		r = -ENODEV;
+		goto ret2;
+	}
+
+	r = nvme_debug_configure_admin_queue(ctrl);
+	if (r)
+		goto ret2;
+
+	r = nvme_debug_configure_io_queue(ctrl);
+	if (r)
+		goto ret2;
+
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
+		r = -ENODEV;
+		goto ret2;
+	}
+
+	nvme_start_ctrl(&ctrl->ctrl);
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_add_tail(&ctrl->list, &nvme_debug_ctrl_list);
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+
+	return &ctrl->ctrl;
+
+ret2:
+	nvme_uninit_ctrl(&ctrl->ctrl);
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ERR_PTR(r);
+ret1:
+	nvme_debug_free_namespaces(ctrl);
+	if (ctrl->admin_wq)
+		destroy_workqueue(ctrl->admin_wq);
+	if (ctrl->io_wq)
+		destroy_workqueue(ctrl->io_wq);
+	kfree(ctrl);
+ret0:
+	return ERR_PTR(r);
+}
+
+static struct nvmf_transport_ops nvme_debug_transport = {
+	.name		= "debug",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_CTRL_LOSS_TMO,
+	.create_ctrl	= nvme_debug_create_ctrl,
+};
+
+static int __init nvme_debug_init_module(void)
+{
+	return nvmf_register_transport(&nvme_debug_transport);
+}
+
+static void __exit nvme_debug_cleanup_module(void)
+{
+	struct nvme_debug_ctrl *ctrl;
+
+	nvmf_unregister_transport(&nvme_debug_transport);
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_debug_ctrl_list, list) {
+		nvme_delete_ctrl(&ctrl->ctrl);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	flush_workqueue(nvme_delete_wq);
+}
+
+module_init(nvme_debug_init_module);
+module_exit(nvme_debug_cleanup_module);
+
+MODULE_LICENSE("GPL v2");


^ permalink raw reply	[flat|nested] 272+ messages in thread

* [dm-devel] [RFC PATCH 3/3] nvme: add the "debug" host driver
@ 2022-02-01 18:33         ` Mikulas Patocka
  0 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 18:33 UTC (permalink / raw)
  To: Javier González
  Cc: djwong, linux-nvme, clm, dm-devel, osandov,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche, linux-scsi, Christoph Hellwig, roland,
	zach.brown, Chaitanya Kulkarni, josef, linux-block, dsterba,
	kbus @imap.gmail.com>> Keith Busch, Frederick.Knight,
	Jens Axboe, tytso, Kanchan Joshi,
	martin.petersen@oracle.com >> Martin K. Petersen, jack,
	linux-fsdevel, lsf-pc

This patch adds a new driver "nvme-debug". It uses memory as a backing
store and it is used to test the copy offload functionality.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/Kconfig      |   13 
 drivers/nvme/host/Makefile     |    1 
 drivers/nvme/host/nvme-debug.c |  838 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 852 insertions(+)

Index: linux-2.6/drivers/nvme/host/Kconfig
===================================================================
--- linux-2.6.orig/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
+++ linux-2.6/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
@@ -83,3 +83,16 @@ config NVME_TCP
 	  from https://github.com/linux-nvme/nvme-cli.
 
 	  If unsure, say N.
+
+config NVME_DEBUG
+	tristate "NVM Express debug"
+	depends on INET
+	depends on BLOCK
+	select NVME_CORE
+	select NVME_FABRICS
+	select CRYPTO
+	select CRYPTO_CRC32C
+	help
+	  This pseudo driver simulates a NVMe adapter.
+
+	  If unsure, say N.
Index: linux-2.6/drivers/nvme/host/Makefile
===================================================================
--- linux-2.6.orig/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
+++ linux-2.6/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabr
 obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
+obj-$(CONFIG_NVME_DEBUG)		+= nvme-debug.o
 
 nvme-core-y				:= core.o ioctl.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
Index: linux-2.6/drivers/nvme/host/nvme-debug.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/drivers/nvme/host/nvme-debug.c	2022-02-01 18:34:22.000000000 +0100
@@ -0,0 +1,838 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe debug
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/blk-mq.h>
+#include <linux/sort.h>
+#include <linux/version.h>
+#include <linux/vmalloc.h>
+
+#include "nvme.h"
+#include "fabrics.h"
+
+static ulong dev_size_mb = 16;
+module_param_named(dev_size_mb, dev_size_mb, ulong, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_size_mb, "size in MiB of the namespace (def=16)");
+
+static unsigned sector_size = 512;
+module_param_named(sector_size, sector_size, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
+
+struct nvme_debug_ctrl {
+	struct nvme_ctrl ctrl;
+	uint32_t reg_cc;
+	struct blk_mq_tag_set admin_tag_set;
+	struct blk_mq_tag_set io_tag_set;
+	struct workqueue_struct *admin_wq;
+	struct workqueue_struct *io_wq;
+	struct list_head namespaces;
+	struct list_head list;
+};
+
+struct nvme_debug_namespace {
+	struct list_head list;
+	uint32_t nsid;
+	unsigned char sector_size_bits;
+	size_t n_sectors;
+	void *space;
+	char uuid[16];
+};
+
+struct nvme_debug_request {
+	struct nvme_request req;
+	struct nvme_command cmd;
+	struct work_struct work;
+};
+
+static LIST_HEAD(nvme_debug_ctrl_list);
+static DEFINE_MUTEX(nvme_debug_ctrl_mutex);
+
+DEFINE_STATIC_PERCPU_RWSEM(nvme_debug_sem);
+
+static inline struct nvme_debug_ctrl *to_debug_ctrl(struct nvme_ctrl *nctrl)
+{
+	return container_of(nctrl, struct nvme_debug_ctrl, ctrl);
+}
+
+static struct nvme_debug_namespace *nvme_debug_find_namespace(struct nvme_debug_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_debug_namespace *ns;
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->nsid == nsid)
+			return ns;
+	}
+	return NULL;
+}
+
+static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl)
+{
+	struct nvme_debug_namespace *ns;
+	unsigned s;
+	size_t dsm;
+
+	ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL);
+	if (!ns)
+		goto fail0;
+
+	ns->nsid = 1;
+	while (nvme_debug_find_namespace(ctrl, ns->nsid))
+		ns->nsid++;
+
+	s = READ_ONCE(sector_size);
+	if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s))
+		goto fail1;
+	ns->sector_size_bits = __ffs(s);
+	dsm = READ_ONCE(dev_size_mb);
+	ns->n_sectors = dsm << (20 - ns->sector_size_bits);
+	if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm)
+		goto fail1;
+
+	if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors)
+		goto fail1;
+
+	ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
+	if (!ns->space)
+		goto fail1;
+
+	generate_random_uuid(ns->uuid);
+
+	list_add(&ns->list, &ctrl->namespaces);
+	return true;
+
+fail1:
+	kfree(ns);
+fail0:
+	return false;
+}
+
+static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns)
+{
+	vfree(ns->space);
+	list_del(&ns->list);
+	kfree(ns);
+}
+
+static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+	switch (off) {
+		case NVME_REG_VS: {
+			*val = 0x20000;
+			break;
+		}
+		case NVME_REG_CC: {
+			*val = ctrl->reg_cc;
+			break;
+		}
+		case NVME_REG_CSTS: {
+			*val = 0;
+			if (ctrl->reg_cc & NVME_CC_ENABLE)
+				*val |= NVME_CSTS_RDY;
+			if (ctrl->reg_cc & NVME_CC_SHN_MASK)
+				*val |= NVME_CSTS_SHST_CMPLT;
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_read32: %x\n", off);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
+{
+	switch (off) {
+		case NVME_REG_CAP: {
+			*val = (1ULL << 0) | (1ULL << 37);
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_read64: %x\n", off);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+	switch (off) {
+		case NVME_REG_CC: {
+			ctrl->reg_cc = val;
+			break;
+		}
+		default: {
+			printk("nvme_debug_reg_write32: %x, %x\n", off, val);
+			return -ENOSYS;
+		}
+	}
+	return 0;
+}
+
+static void nvme_debug_submit_async_event(struct nvme_ctrl *nctrl)
+{
+	printk("nvme_debug_submit_async_event\n");
+}
+
+static void nvme_debug_delete_ctrl(struct nvme_ctrl *nctrl)
+{
+	nvme_shutdown_ctrl(nctrl);
+}
+
+static void nvme_debug_free_namespaces(struct nvme_debug_ctrl *ctrl)
+{
+	while (!list_empty(&ctrl->namespaces)) {
+		struct nvme_debug_namespace *ns = container_of(ctrl->namespaces.next, struct nvme_debug_namespace, list);
+		nvme_debug_free_namespace(ns);
+	}
+}
+
+static void nvme_debug_free_ctrl(struct nvme_ctrl *nctrl)
+{
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
+
+	flush_workqueue(ctrl->admin_wq);
+	flush_workqueue(ctrl->io_wq);
+
+	nvme_debug_free_namespaces(ctrl);
+
+	if (list_empty(&ctrl->list))
+		goto free_ctrl;
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_del(&ctrl->list);
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+
+	nvmf_free_options(nctrl->opts);
+free_ctrl:
+	destroy_workqueue(ctrl->admin_wq);
+	destroy_workqueue(ctrl->io_wq);
+	kfree(ctrl);
+}
+
+static int nvme_debug_get_address(struct nvme_ctrl *nctrl, char *buf, int size)
+{
+	int len = 0;
+	len += snprintf(buf, size, "debug");
+	return len;
+}
+
+static void nvme_debug_reset_ctrl_work(struct work_struct *work)
+{
+	printk("nvme_reset_ctrl_work\n");
+}
+
+static void copy_data_request(struct request *req, void *data, size_t size, bool to_req)
+{
+	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
+		void *addr;
+		struct bio_vec bv = req->special_vec;
+		addr = kmap_atomic(bv.bv_page);
+		if (to_req) {
+			memcpy(addr + bv.bv_offset, data, bv.bv_len);
+			flush_dcache_page(bv.bv_page);
+		} else {
+			flush_dcache_page(bv.bv_page);
+			memcpy(data, addr + bv.bv_offset, bv.bv_len);
+		}
+		kunmap_atomic(addr);
+		data += bv.bv_len;
+		size -= bv.bv_len;
+	} else {
+		struct req_iterator bi;
+		struct bio_vec bv;
+		rq_for_each_segment(bv, req, bi) {
+			void *addr;
+			addr = kmap_atomic(bv.bv_page);
+			if (to_req) {
+				memcpy(addr + bv.bv_offset, data, bv.bv_len);
+				flush_dcache_page(bv.bv_page);
+			} else {
+				flush_dcache_page(bv.bv_page);
+				memcpy(data, addr + bv.bv_offset, bv.bv_len);
+			}
+			kunmap_atomic(addr);
+			data += bv.bv_len;
+			size -= bv.bv_len;
+		}
+	}
+	if (size)
+		printk("size mismatch: %lx\n", (unsigned long)size);
+}
+
+static void nvme_debug_identify_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_id_ns *id;
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+
+	id = kzalloc(sizeof(struct nvme_id_ns), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
+	if (!ns) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		goto free_ret;
+	}
+
+	id->nsze = cpu_to_le64(ns->n_sectors);
+	id->ncap = id->nsze;
+	id->nuse = id->nsze;
+	/*id->nlbaf = 0;*/
+	id->dlfeat = 0x01;
+	id->lbaf[0].ds = ns->sector_size_bits;
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ns), true);
+
+free_ret:
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ctrl(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_debug_namespace *ns;
+	struct nvme_id_ctrl *id;
+	char ver[9];
+	size_t ver_len;
+
+	id = kzalloc(sizeof(struct nvme_id_ctrl), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	id->vid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
+	id->ssvid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
+	memset(id->sn, ' ', sizeof id->sn);
+	memset(id->mn, ' ', sizeof id->mn);
+	memcpy(id->mn, "nvme-debug", 10);
+	snprintf(ver, sizeof ver, "%X", LINUX_VERSION_CODE);
+	ver_len = min(strlen(ver), sizeof id->fr);
+	memset(id->fr, ' ', sizeof id->fr);
+	memcpy(id->fr, ver, ver_len);
+	memcpy(id->ieee, "\xe9\xf2\x40", sizeof id->ieee);
+	id->ver = cpu_to_le32(0x20000);
+	id->kas = cpu_to_le16(100);
+	id->sqes = 0x66;
+	id->cqes = 0x44;
+	id->maxcmd = cpu_to_le16(1);
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->nsid > le32_to_cpu(id->nn))
+			id->nn = cpu_to_le32(ns->nsid);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_COPY);
+	id->vwc = 0x6;
+	id->mnan = cpu_to_le32(0xffffffff);
+	strcpy(id->subnqn, "nqn.2021-09.com.redhat:nvme-debug");
+	id->ioccsz = cpu_to_le32(4);
+	id->iorcsz = cpu_to_le32(1);
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ctrl), true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static int cmp_ns(const void *a1, const void *a2)
+{
+	__u32 v1 = le32_to_cpu(*(__u32 *)a1);
+	__u32 v2 = le32_to_cpu(*(__u32 *)a2);
+	if (!v1)
+		v1 = 0xffffffffU;
+	if (!v2)
+		v2 = 0xffffffffU;
+	if (v1 < v2)
+		return -1;
+	if (v1 > v2)
+		return 1;
+	return 0;
+}
+
+static void nvme_debug_identify_active_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	__u32 *id;
+	unsigned idp;
+
+	if (le32_to_cpu(ndr->cmd.identify.nsid) >= 0xFFFFFFFE) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		blk_mq_end_request(req, BLK_STS_OK);
+		return;
+	}
+
+	/*
+	 * The reply is always a full 4KiB (1024-entry) list, zero-padded;
+	 * allocate all 1024 slots so copy_data_request() below never reads
+	 * past the buffer.
+	 */
+	id = kzalloc(sizeof(__u32) * 1024, GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	idp = 0;
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (idp >= 1024)
+			break;
+		if (ns->nsid > le32_to_cpu(ndr->cmd.identify.nsid))
+			id[idp++] = cpu_to_le32(ns->nsid);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	sort(id, idp, sizeof(__u32), cmp_ns, NULL);
+
+	copy_data_request(req, id, sizeof(__u32) * 1024, true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ns_desc_list(struct nvme_debug_ctrl *ctrl, struct request *req)
+{
+	struct nvme_ns_id_desc *id;
+	struct nvme_debug_namespace *ns;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	id = kzalloc(4096, GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
+	if (!ns) {
+		nvme_req(req)->status = NVME_SC_INVALID_NS;
+		goto free_ret;
+	}
+
+	id->nidt = NVME_NIDT_UUID;
+	id->nidl = NVME_NIDT_UUID_LEN;
+	memcpy((char *)(id + 1), ns->uuid, NVME_NIDT_UUID_LEN);
+
+	copy_data_request(req, id, 4096, true);
+
+free_ret:
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_identify_ctrl_cs(struct request *req)
+{
+	struct nvme_id_ctrl_nvm *id;
+	id = kzalloc(sizeof(struct nvme_id_ctrl_nvm), GFP_NOIO);
+	if (!id) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	copy_data_request(req, id, sizeof(struct nvme_id_ctrl_nvm), true);
+
+	kfree(id);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_admin_rq(struct work_struct *w)
+{
+	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
+	struct request *req = blk_mq_rq_from_pdu(ndr);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+
+	switch (ndr->cmd.common.opcode) {
+		case nvme_admin_identify: {
+			switch (ndr->cmd.identify.cns) {
+				case NVME_ID_CNS_NS: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ns(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				};
+				case NVME_ID_CNS_CTRL: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ctrl(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_NS_ACTIVE_LIST: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_active_ns(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_NS_DESC_LIST: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ns_desc_list(ctrl, req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				case NVME_ID_CNS_CS_CTRL: {
+					percpu_down_read(&nvme_debug_sem);
+					nvme_debug_identify_ctrl_cs(req);
+					percpu_up_read(&nvme_debug_sem);
+					return;
+				}
+				default: {
+					printk("nvme_admin_identify: %x\n", ndr->cmd.identify.cns);
+					break;
+				}
+			}
+			break;
+		}
+		default: {
+			printk("nvme_debug_admin_rq: %x\n", ndr->cmd.common.opcode);
+			break;
+		}
+	}
+	blk_mq_end_request(req, BLK_STS_NOTSUPP);
+}
+
+static void nvme_debug_rw(struct nvme_debug_namespace *ns, struct request *req)
+{
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	__u64 lba = le64_to_cpu(ndr->cmd.rw.slba);
+	__u64 len = (__u64)le16_to_cpu(ndr->cmd.rw.length) + 1;
+	void *addr;
+	if (unlikely(lba + len < lba) || unlikely(lba + len > ns->n_sectors)) {
+		blk_mq_end_request(req, BLK_STS_NOTSUPP);
+		return;
+	}
+	addr = ns->space + (lba << ns->sector_size_bits);
+	copy_data_request(req, addr, len << ns->sector_size_bits, ndr->cmd.rw.opcode == nvme_cmd_read);
+	blk_mq_end_request(req, BLK_STS_OK);
+}
+
+static void nvme_debug_copy(struct nvme_debug_namespace *ns, struct request *req)
+{
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	__u64 dlba = le64_to_cpu(ndr->cmd.copy.sdlba);
+	unsigned n_descs = ndr->cmd.copy.length + 1;
+	struct nvme_copy_desc *descs;
+	unsigned i;
+	blk_status_t ret;
+
+	descs = kmalloc(sizeof(struct nvme_copy_desc) * n_descs, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+	if (!descs) {
+		blk_mq_end_request(req, BLK_STS_RESOURCE);
+		return;
+	}
+
+	copy_data_request(req, descs, sizeof(struct nvme_copy_desc) * n_descs, false);
+
+	for (i = 0; i < n_descs; i++) {
+		struct nvme_copy_desc *desc = &descs[i];
+		__u64 slba = le64_to_cpu(desc->slba);
+		__u64 len = (__u64)le16_to_cpu(desc->length) + 1;
+		void *saddr, *daddr;
+
+		if (unlikely(slba + len < slba) || unlikely(slba + len > ns->n_sectors) ||
+		    unlikely(dlba + len < dlba) || unlikely(dlba + len > ns->n_sectors)) {
+			ret = BLK_STS_NOTSUPP;
+			goto free_ret;
+		}
+
+		saddr = ns->space + (slba << ns->sector_size_bits);
+		daddr = ns->space + (dlba << ns->sector_size_bits);
+
+		memcpy(daddr, saddr, len << ns->sector_size_bits);
+
+		dlba += len;
+	}
+
+	ret = BLK_STS_OK;
+
+free_ret:
+	kfree(descs);
+
+	blk_mq_end_request(req, ret);
+}
+
+static void nvme_debug_io_rq(struct work_struct *w)
+{
+	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
+	struct request *req = blk_mq_rq_from_pdu(ndr);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+	__u32 nsid = le32_to_cpu(ndr->cmd.common.nsid);
+	struct nvme_debug_namespace *ns;
+
+	percpu_down_read(&nvme_debug_sem);
+	ns = nvme_debug_find_namespace(ctrl, nsid);
+	if (unlikely(!ns))
+		goto ret_notsupp;
+
+	switch (ndr->cmd.common.opcode) {
+		case nvme_cmd_flush: {
+			blk_mq_end_request(req, BLK_STS_OK);
+			goto ret;
+		}
+		case nvme_cmd_read:
+		case nvme_cmd_write: {
+			nvme_debug_rw(ns, req);
+			goto ret;
+		}
+		case nvme_cmd_copy: {
+			nvme_debug_copy(ns, req);
+			goto ret;
+		}
+		default: {
+			printk("nvme_debug_io_rq: %x\n", ndr->cmd.common.opcode);
+			break;
+		}
+	}
+ret_notsupp:
+	blk_mq_end_request(req, BLK_STS_NOTSUPP);
+ret:
+	percpu_up_read(&nvme_debug_sem);
+}
+
+static blk_status_t nvme_debug_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
+	struct nvme_ns *ns = hctx->queue->queuedata;
+	blk_status_t r;
+
+	r = nvme_setup_cmd(ns, req);
+	if (unlikely(r))
+		return r;
+
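+	/*
+	 * Dispatch: on the admin queue there is no queuedata, so a NULL ns
+	 * means an admin command, handled on admin_wq.  A COPY_READ_TOKEN
+	 * request carries no payload for this driver and completes right
+	 * away; everything else is deferred to io_wq.
+	 */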
+	if (!ns) {
+		INIT_WORK(&ndr->work, nvme_debug_admin_rq);
+		queue_work(ctrl->admin_wq, &ndr->work);
+		return BLK_STS_OK;
+	} else if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return BLK_STS_OK;
+	} else {
+		INIT_WORK(&ndr->work, nvme_debug_io_rq);
+		queue_work(ctrl->io_wq, &ndr->work);
+		return BLK_STS_OK;
+	}
+}
+
+static int nvme_debug_init_request(struct blk_mq_tag_set *set, struct request *req, unsigned hctx_idx, unsigned numa_node)
+{
+	struct nvme_debug_ctrl *ctrl = set->driver_data;
+	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
+	nvme_req(req)->ctrl = &ctrl->ctrl;
+	nvme_req(req)->cmd = &ndr->cmd;
+	return 0;
+}
+
+static int nvme_debug_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned hctx_idx)
+{
+	struct nvme_debug_ctrl *ctrl = data;
+	hctx->driver_data = ctrl;
+	return 0;
+}
+
+static const struct blk_mq_ops nvme_debug_mq_ops = {
+	.queue_rq = nvme_debug_queue_rq,
+	.init_request = nvme_debug_init_request,
+	.init_hctx = nvme_debug_init_hctx,
+};
+
+static int nvme_debug_configure_admin_queue(struct nvme_debug_ctrl *ctrl)
+{
+	int r;
+
+	memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
+	ctrl->admin_tag_set.ops = &nvme_debug_mq_ops;
+	ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	ctrl->admin_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
+	ctrl->admin_tag_set.numa_node = ctrl->ctrl.numa_node;
+	ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_debug_request);
+	ctrl->admin_tag_set.driver_data = ctrl;
+	ctrl->admin_tag_set.nr_hw_queues = 1;
+	ctrl->admin_tag_set.timeout = NVME_ADMIN_TIMEOUT;
+	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
+
+	r = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
+	if (r)
+		goto ret0;
+	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
+
+	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	if (IS_ERR(ctrl->ctrl.admin_q)) {
+		r = PTR_ERR(ctrl->ctrl.admin_q);
+		goto ret1;
+	}
+
+	r = nvme_enable_ctrl(&ctrl->ctrl);
+	if (r)
+		goto ret2;
+
+	nvme_start_admin_queue(&ctrl->ctrl);
+
+	r = nvme_init_ctrl_finish(&ctrl->ctrl);
+	if (r)
+		goto ret3;
+
+	return 0;
+
+ret3:
+	nvme_stop_admin_queue(&ctrl->ctrl);
+	blk_sync_queue(ctrl->ctrl.admin_q);
+ret2:
+	blk_cleanup_queue(ctrl->ctrl.admin_q);
+ret1:
+	blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ret0:
+	return r;
+}
+
+static int nvme_debug_configure_io_queue(struct nvme_debug_ctrl *ctrl)
+{
+	int r;
+
+	memset(&ctrl->io_tag_set, 0, sizeof(ctrl->io_tag_set));
+	ctrl->io_tag_set.ops = &nvme_debug_mq_ops;
+	ctrl->io_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	ctrl->io_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
+	ctrl->io_tag_set.numa_node = ctrl->ctrl.numa_node;
+	ctrl->io_tag_set.cmd_size = sizeof(struct nvme_debug_request);
+	ctrl->io_tag_set.driver_data = ctrl;
+	ctrl->io_tag_set.nr_hw_queues = 1;
+	ctrl->io_tag_set.timeout = NVME_IO_TIMEOUT;
+	ctrl->io_tag_set.flags = BLK_MQ_F_NO_SCHED;
+
+	r = blk_mq_alloc_tag_set(&ctrl->io_tag_set);
+	if (r)
+		return r;
+	ctrl->ctrl.tagset = &ctrl->io_tag_set;
+	return 0;
+}
+
+static const struct nvme_ctrl_ops nvme_debug_ctrl_ops = {
+	.name = "debug",
+	.module = THIS_MODULE,
+	.flags = NVME_F_FABRICS,
+	.reg_read32 = nvme_debug_reg_read32,
+	.reg_read64 = nvme_debug_reg_read64,
+	.reg_write32 = nvme_debug_reg_write32,
+	.free_ctrl = nvme_debug_free_ctrl,
+	.submit_async_event = nvme_debug_submit_async_event,
+	.delete_ctrl = nvme_debug_delete_ctrl,
+	.get_address = nvme_debug_get_address,
+};
+
+static struct nvme_ctrl *nvme_debug_create_ctrl(struct device *dev,
+		struct nvmf_ctrl_options *opts)
+{
+	int r;
+	struct nvme_debug_ctrl *ctrl;
+
+	ctrl = kzalloc(sizeof(struct nvme_debug_ctrl), GFP_KERNEL);
+	if (!ctrl) {
+		r = -ENOMEM;
+		goto ret0;
+	}
+
+	INIT_LIST_HEAD(&ctrl->list);
+	INIT_LIST_HEAD(&ctrl->namespaces);
+	ctrl->ctrl.opts = opts;
+	ctrl->ctrl.queue_count = 2;
+	INIT_WORK(&ctrl->ctrl.reset_work, nvme_debug_reset_ctrl_work);
+
+	ctrl->admin_wq = alloc_workqueue("nvme-debug-admin", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+	if (!ctrl->admin_wq) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	ctrl->io_wq = alloc_workqueue("nvme-debug-io", WQ_MEM_RECLAIM, 0);
+	if (!ctrl->io_wq) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	if (!nvme_debug_alloc_namespace(ctrl)) {
+		r = -ENOMEM;
+		goto ret1;
+	}
+
+	r = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_debug_ctrl_ops, 0);
+	if (r)
+		goto ret1;
+
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+		r = -ENODEV;
+		goto ret2;
+	}
+
+	r = nvme_debug_configure_admin_queue(ctrl);
+	if (r)
+		goto ret2;
+
+	r = nvme_debug_configure_io_queue(ctrl);
+	if (r)
+		goto ret2;
+
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
+		r = -ENODEV;
+		goto ret2;
+	}
+
+	nvme_start_ctrl(&ctrl->ctrl);
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_add_tail(&ctrl->list, &nvme_debug_ctrl_list);
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+
+	return &ctrl->ctrl;
+
+ret2:
+	nvme_uninit_ctrl(&ctrl->ctrl);
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ERR_PTR(r);
+ret1:
+	nvme_debug_free_namespaces(ctrl);
+	if (ctrl->admin_wq)
+		destroy_workqueue(ctrl->admin_wq);
+	if (ctrl->io_wq)
+		destroy_workqueue(ctrl->io_wq);
+	kfree(ctrl);
+ret0:
+	return ERR_PTR(r);
+}
+
+static struct nvmf_transport_ops nvme_debug_transport = {
+	.name		= "debug",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_CTRL_LOSS_TMO,
+	.create_ctrl	= nvme_debug_create_ctrl,
+};
+
+static int __init nvme_debug_init_module(void)
+{
+	return nvmf_register_transport(&nvme_debug_transport);
+}
+
+static void __exit nvme_debug_cleanup_module(void)
+{
+	struct nvme_debug_ctrl *ctrl;
+
+	nvmf_unregister_transport(&nvme_debug_transport);
+
+	mutex_lock(&nvme_debug_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_debug_ctrl_list, list) {
+		nvme_delete_ctrl(&ctrl->ctrl);
+	}
+	mutex_unlock(&nvme_debug_ctrl_mutex);
+	flush_workqueue(nvme_delete_wq);
+}
+
+module_init(nvme_debug_init_module);
+module_exit(nvme_debug_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
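
(For anyone who wants to try this: a hypothetical smoke test, assuming an
nvme-cli build with fabrics support; the traddr value is a placeholder, since
the transport only requires that the option be present:

	modprobe nvme-debug dev_size_mb=64 sector_size=4096
	nvme connect --transport=debug --traddr=debug \
		--nqn=nqn.2021-09.com.redhat:nvme-debug

The namespace should then appear as an ordinary /dev/nvmeXnY node.)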

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 2/3] nvme: add copy offload support
  2022-02-01 18:33         ` [dm-devel] " Mikulas Patocka
@ 2022-02-01 19:18           ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2022-02-01 19:18 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel

On 2/1/22 10:33, Mikulas Patocka wrote:
> +static inline blk_status_t nvme_setup_read_token(struct nvme_ns *ns, struct request *req)
> +{
> +	struct bio *bio = req->bio;
> +	struct nvme_copy_token *token = page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset;

Hmm ... shouldn't this function use bvec_kmap_local() instead of 
page_to_virt()?
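
For reference, a minimal sketch of what that could look like (hypothetical; it
keeps the patch's assumption that the whole token sits in the first bvec):

	static inline blk_status_t nvme_setup_read_token(struct nvme_ns *ns,
							 struct request *req)
	{
		struct bio *bio = req->bio;
		/* map the bvec instead of assuming a lowmem page */
		struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);

		/* ... validate/consume the token as the patch does ... */

		kunmap_local(token);
		return BLK_STS_OK;
	}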

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-01 18:32         ` [dm-devel] " Mikulas Patocka
@ 2022-02-01 19:18           ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2022-02-01 19:18 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel

On 2/1/22 10:32, Mikulas Patocka wrote:
>   /**
> + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue
> + * @q:  the request queue for the device
> + * @size:  the maximum copy offload sectors
> + */
> +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
> +{
> +	q->limits.max_copy_sectors = size;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);

Please either change the unit of 'size' into bytes or change its type 
into sector_t.

> +extern int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> +		      struct block_device *bdev2, sector_t sector2,
> +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask);
> +

Only supporting copying between contiguous LBA ranges seems restrictive 
to me. I expect garbage collection by filesystems for UFS devices to 
perform better if multiple LBA ranges are submitted as a single SCSI 
XCOPY command.

A general comment about the approach: encoding the LBA range information 
in a bio payload is not compatible with bio splitting. How can the dm 
driver implement copy offloading without the ability to split copy 
offload bios?

> +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> +		      struct block_device *bdev2, sector_t sector2,
> +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask)
> +{
> +	struct page *token;
> +	sector_t m;
> +	int r = 0;
> +	struct completion comp;

Consider using DECLARE_COMPLETION_ONSTACK() instead of a separate 
declaration and init_completion() call.
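
A self-contained illustration of the on-stack pattern (hypothetical helper,
not code from the patch):

	#include <linux/bio.h>
	#include <linux/completion.h>

	static void copy_bio_end_io(struct bio *bio)
	{
		complete(bio->bi_private);
	}

	/* submit a bio and wait for it; the completion lives on the stack */
	static int submit_bio_and_wait(struct bio *bio)
	{
		DECLARE_COMPLETION_ONSTACK(comp);	/* declaration + init in one */

		bio->bi_private = &comp;
		bio->bi_end_io = copy_bio_end_io;
		submit_bio(bio);
		wait_for_completion_io(&comp);
		return blk_status_to_errno(bio->bi_status);
	}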

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 2/3] nvme: add copy offload support
  2022-02-01 19:18           ` [dm-devel] " Bart Van Assche
@ 2022-02-01 19:25             ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-01 19:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel



On Tue, 1 Feb 2022, Bart Van Assche wrote:

> On 2/1/22 10:33, Mikulas Patocka wrote:
> > +static inline blk_status_t nvme_setup_read_token(struct nvme_ns *ns, struct
> > request *req)
> > +{
> > +	struct bio *bio = req->bio;
> > +	struct nvme_copy_token *token =
> > page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset;
> 
> Hmm ... shouldn't this function use bvec_kmap_local() instead of
> page_to_virt()?
> 
> Thanks,
> 
> Bart.

.bv_page is allocated only in blkdev_issue_copy() with alloc_page(), so
page_to_virt() works.

But you are right that bvec_kmap_local() may be nicer.

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-01 18:33         ` [dm-devel] " Mikulas Patocka
  (?)
@ 2022-02-01 21:32         ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-01 21:32 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6417 bytes --]

Hi Mikulas,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v5.17-rc2 next-20220201]
[cannot apply to linux-nvme/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikulas-Patocka/block-add-copy-offload-support/20220202-033318
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: alpha-allmodconfig (https://download.01.org/0day-ci/archive/20220202/202202020535.upnRkg5V-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b59e9d971770c50aaac1d07d6edaa10a4a47171b
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikulas-Patocka/block-add-copy-offload-support/20220202-033318
        git checkout b59e9d971770c50aaac1d07d6edaa10a4a47171b
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=alpha SHELL=/bin/bash drivers/nvme/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/nvme/host/nvme-debug.c: In function 'nvme_debug_alloc_namespace':
>> drivers/nvme/host/nvme-debug.c:98:21: error: implicit declaration of function 'vzalloc'; did you mean 'kvzalloc'? [-Werror=implicit-function-declaration]
      98 |         ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
         |                     ^~~~~~~
         |                     kvzalloc
>> drivers/nvme/host/nvme-debug.c:98:19: warning: assignment to 'void *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
      98 |         ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
         |                   ^
   drivers/nvme/host/nvme-debug.c: In function 'nvme_debug_free_namespace':
>> drivers/nvme/host/nvme-debug.c:115:9: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
     115 |         vfree(ns->space);
         |         ^~~~~
         |         kvfree
   drivers/nvme/host/nvme-debug.c: At top level:
>> drivers/nvme/host/nvme-debug.c:148:5: warning: no previous prototype for 'nvme_debug_reg_read64' [-Wmissing-prototypes]
     148 | int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
         |     ^~~~~~~~~~~~~~~~~~~~~
>> drivers/nvme/host/nvme-debug.c:163:5: warning: no previous prototype for 'nvme_debug_reg_write32' [-Wmissing-prototypes]
     163 | int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
         |     ^~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +98 drivers/nvme/host/nvme-debug.c

    71	
    72	static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl)
    73	{
    74		struct nvme_debug_namespace *ns;
    75		unsigned s;
    76		size_t dsm;
    77	
    78		ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL);
    79		if (!ns)
    80			goto fail0;
    81	
    82		ns->nsid = 1;
    83		while (nvme_debug_find_namespace(ctrl, ns->nsid))
    84			ns->nsid++;
    85	
    86		s = READ_ONCE(sector_size);
    87		if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s))
    88			goto fail1;
    89		ns->sector_size_bits = __ffs(s);
    90		dsm = READ_ONCE(dev_size_mb);
    91		ns->n_sectors = dsm << (20 - ns->sector_size_bits);
    92		if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm)
    93			goto fail1;
    94	
    95		if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors)
    96			goto fail1;
    97	
  > 98		ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
    99		if (!ns->space)
   100			goto fail1;
   101	
   102		generate_random_uuid(ns->uuid);
   103	
   104		list_add(&ns->list, &ctrl->namespaces);
   105		return true;
   106	
   107	fail1:
   108		kfree(ns);
   109	fail0:
   110		return false;
   111	}
   112	
   113	static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns)
   114	{
 > 115		vfree(ns->space);
   116		list_del(&ns->list);
   117		kfree(ns);
   118	}
   119	
   120	static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val)
   121	{
   122		struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
   123		switch (off) {
   124			case NVME_REG_VS: {
   125				*val = 0x20000;
   126				break;
   127			}
   128			case NVME_REG_CC: {
   129				*val = ctrl->reg_cc;
   130				break;
   131			}
   132			case NVME_REG_CSTS: {
   133				*val = 0;
   134				if (ctrl->reg_cc & NVME_CC_ENABLE)
   135					*val |= NVME_CSTS_RDY;
   136				if (ctrl->reg_cc & NVME_CC_SHN_MASK)
   137					*val |= NVME_CSTS_SHST_CMPLT;
   138				break;
   139			}
   140			default: {
   141				printk("nvme_debug_reg_read32: %x\n", off);
   142				return -ENOSYS;
   143			}
   144		}
   145		return 0;
   146	}
   147	
 > 148	int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
   149	{
   150		switch (off) {
   151			case NVME_REG_CAP: {
   152				*val = (1ULL << 0) | (1ULL << 37);
   153				break;
   154			}
   155			default: {
   156				printk("nvme_debug_reg_read64: %x\n", off);
   157				return -ENOSYS;
   158			}
   159		}
   160		return 0;
   161	}
   162	
 > 163	int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
   164	{
   165		struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
   166		switch (off) {
   167			case NVME_REG_CC: {
   168				ctrl->reg_cc = val;
   169				break;
   170			}
   171			default: {
   172				printk("nvme_debug_reg_write32: %x, %x\n", off, val);
   173				return -ENOSYS;
   174			}
   175		}
   176		return 0;
   177	}
   178	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-02-02  5:57     ` Kanchan Joshi
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: Kanchan Joshi @ 2022-02-02  5:57 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On Thu, Jan 27, 2022 at 12:51 PM Chaitanya Kulkarni
<chaitanyak@nvidia.com> wrote:
>
> Hi,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1] single threaded
> performance is limiting due to Denard scaling and multi-threaded
> performance is slowing down due Moore's law limitations. With the rise
> of SNIA Computation Technical Storage Working Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operation which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
>
> We have conducted a call with interested people late last year since
> lack of LSFMMM and we would like to share the details with broader
> community members.

I'm keen on this topic and would like to join the F2F discussion.
The November call did establish some consensus on requirements.
Planning to have a round or two of code discussions soon.


Thanks,
-- 
Kanchan

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-01 18:33         ` [dm-devel] " Mikulas Patocka
@ 2022-02-02  6:01           ` Adam Manzanares
  -1 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-02-02  6:01 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Tue, Feb 01, 2022 at 01:33:47PM -0500, Mikulas Patocka wrote:
> This patch adds a new driver "nvme-debug". It uses memory as a backing
> store and it is used to test the copy offload functionality.
>

We have looked at something similar to create a null nvme driver to test
interfaces poking the nvme driver without using the block layer. Have you
considered implementing this in the fabrics code, since it is already
virtualizing an SSD controller?

BTW I think having the target code be able to implement simple copy without 
moving data over the fabric would be a great way of showing off the command.


> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/nvme/host/Kconfig      |   13 
>  drivers/nvme/host/Makefile     |    1 
>  drivers/nvme/host/nvme-debug.c |  838 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 852 insertions(+)
> 
> Index: linux-2.6/drivers/nvme/host/Kconfig
> ===================================================================
> --- linux-2.6.orig/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
> +++ linux-2.6/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
> @@ -83,3 +83,16 @@ config NVME_TCP
>  	  from https://github.com/linux-nvme/nvme-cli.
>  
>  	  If unsure, say N.
> +
> +config NVME_DEBUG
> +	tristate "NVM Express debug"
> +	depends on INET
> +	depends on BLOCK
> +	select NVME_CORE
> +	select NVME_FABRICS
> +	select CRYPTO
> +	select CRYPTO_CRC32C
> +	help
> +	  This pseudo driver simulates a NVMe adapter.
> +
> +	  If unsure, say N.
> Index: linux-2.6/drivers/nvme/host/Makefile
> ===================================================================
> --- linux-2.6.orig/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
> +++ linux-2.6/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
> @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabr
>  obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
>  obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
>  obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
> +obj-$(CONFIG_NVME_DEBUG)		+= nvme-debug.o
>  
>  nvme-core-y				:= core.o ioctl.o
>  nvme-core-$(CONFIG_TRACING)		+= trace.o
> Index: linux-2.6/drivers/nvme/host/nvme-debug.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6/drivers/nvme/host/nvme-debug.c	2022-02-01 18:34:22.000000000 +0100
> @@ -0,0 +1,838 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe debug
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/err.h>
> +#include <linux/blk-mq.h>
> +#include <linux/sort.h>
> +#include <linux/version.h>
> +
> +#include "nvme.h"
> +#include "fabrics.h"
> +
> +static ulong dev_size_mb = 16;
> +module_param_named(dev_size_mb, dev_size_mb, ulong, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(dev_size_mb, "size in MiB of the namespace(def=8)");
> +
> +static unsigned sector_size = 512;
> +module_param_named(sector_size, sector_size, uint, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
> +
> +struct nvme_debug_ctrl {
> +	struct nvme_ctrl ctrl;
> +	uint32_t reg_cc;
> +	struct blk_mq_tag_set admin_tag_set;
> +	struct blk_mq_tag_set io_tag_set;
> +	struct workqueue_struct *admin_wq;
> +	struct workqueue_struct *io_wq;
> +	struct list_head namespaces;
> +	struct list_head list;
> +};
> +
> +struct nvme_debug_namespace {
> +	struct list_head list;
> +	uint32_t nsid;
> +	unsigned char sector_size_bits;
> +	size_t n_sectors;
> +	void *space;
> +	char uuid[16];
> +};
> +
> +struct nvme_debug_request {
> +	struct nvme_request req;
> +	struct nvme_command cmd;
> +	struct work_struct work;
> +};
> +
> +static LIST_HEAD(nvme_debug_ctrl_list);
> +static DEFINE_MUTEX(nvme_debug_ctrl_mutex);
> +
> +DEFINE_STATIC_PERCPU_RWSEM(nvme_debug_sem);
> +
> +static inline struct nvme_debug_ctrl *to_debug_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	return container_of(nctrl, struct nvme_debug_ctrl, ctrl);
> +}
> +
> +static struct nvme_debug_namespace *nvme_debug_find_namespace(struct nvme_debug_ctrl *ctrl, unsigned nsid)
> +{
> +	struct nvme_debug_namespace *ns;
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->nsid == nsid)
> +			return ns;
> +	}
> +	return NULL;
> +}
> +
> +static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl)
> +{
> +	struct nvme_debug_namespace *ns;
> +	unsigned s;
> +	size_t dsm;
> +
> +	ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL);
> +	if (!ns)
> +		goto fail0;
> +
> +	ns->nsid = 1;
> +	while (nvme_debug_find_namespace(ctrl, ns->nsid))
> +		ns->nsid++;
> +
> +	s = READ_ONCE(sector_size);
> +	if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s))
> +		goto fail1;
> +	ns->sector_size_bits = __ffs(s);
> +	dsm = READ_ONCE(dev_size_mb);
> +	ns->n_sectors = dsm << (20 - ns->sector_size_bits);
> +	if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm)
> +		goto fail1;
> +
> +	if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors)
> +		goto fail1;
> +
> +	ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
> +	if (!ns->space)
> +		goto fail1;
> +
> +	generate_random_uuid(ns->uuid);
> +
> +	list_add(&ns->list, &ctrl->namespaces);
> +	return true;
> +
> +fail1:
> +	kfree(ns);
> +fail0:
> +	return false;
> +}
> +
> +static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns)
> +{
> +	vfree(ns->space);
> +	list_del(&ns->list);
> +	kfree(ns);
> +}
> +
> +static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +	switch (off) {
> +		case NVME_REG_VS: {
> +			*val = 0x20000;
> +			break;
> +		}
> +		case NVME_REG_CC: {
> +			*val = ctrl->reg_cc;
> +			break;
> +		}
> +		case NVME_REG_CSTS: {
> +			*val = 0;
> +			if (ctrl->reg_cc & NVME_CC_ENABLE)
> +				*val |= NVME_CSTS_RDY;
> +			if (ctrl->reg_cc & NVME_CC_SHN_MASK)
> +				*val |= NVME_CSTS_SHST_CMPLT;
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_read32: %x\n", off);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
> +{
> +	switch (off) {
> +		case NVME_REG_CAP: {
> +			*val = (1ULL << 0) | (1ULL << 37);
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_read64: %x\n", off);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +	switch (off) {
> +		case NVME_REG_CC: {
> +			ctrl->reg_cc = val;
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_write32: %x, %x\n", off, val);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static void nvme_debug_submit_async_event(struct nvme_ctrl *nctrl)
> +{
> +	printk("nvme_debug_submit_async_event\n");
> +}
> +
> +static void nvme_debug_delete_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	nvme_shutdown_ctrl(nctrl);
> +}
> +
> +static void nvme_debug_free_namespaces(struct nvme_debug_ctrl *ctrl)
> +{
> +	if (!list_empty(&ctrl->namespaces)) {
> +		struct nvme_debug_namespace *ns = container_of(ctrl->namespaces.next, struct nvme_debug_namespace, list);
> +		nvme_debug_free_namespace(ns);
> +	}
> +}
> +
> +static void nvme_debug_free_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +
> +	flush_workqueue(ctrl->admin_wq);
> +	flush_workqueue(ctrl->io_wq);
> +
> +	nvme_debug_free_namespaces(ctrl);
> +
> +	if (list_empty(&ctrl->list))
> +		goto free_ctrl;
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_del(&ctrl->list);
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +
> +	nvmf_free_options(nctrl->opts);
> +free_ctrl:
> +	destroy_workqueue(ctrl->admin_wq);
> +	destroy_workqueue(ctrl->io_wq);
> +	kfree(ctrl);
> +}
> +
> +static int nvme_debug_get_address(struct nvme_ctrl *nctrl, char *buf, int size)
> +{
> +	int len = 0;
> +	len += snprintf(buf, size, "debug");
> +	return len;
> +}
> +
> +static void nvme_debug_reset_ctrl_work(struct work_struct *work)
> +{
> +	printk("nvme_debug_reset_ctrl_work\n");
> +}
> +
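> +/*
> + * Copy between the request's data pages and a linear kernel buffer.
> + * to_req == true fills the request (e.g. identify results); to_req ==
> + * false drains it (e.g. fetching copy descriptors written by the host).
> + */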
> +static void copy_data_request(struct request *req, void *data, size_t size, bool to_req)
> +{
> +	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
> +		void *addr;
> +		struct bio_vec bv = req->special_vec;
> +		addr = kmap_atomic(bv.bv_page);
> +		if (to_req) {
> +			memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +			flush_dcache_page(bv.bv_page);
> +		} else {
> +			flush_dcache_page(bv.bv_page);
> +			memcpy(data, addr + bv.bv_offset, bv.bv_len);
> +		}
> +		kunmap_atomic(addr);
> +		data += bv.bv_len;
> +		size -= bv.bv_len;
> +	} else {
> +		struct req_iterator bi;
> +		struct bio_vec bv;
> +		rq_for_each_segment(bv, req, bi) {
> +			void *addr;
> +			addr = kmap_atomic(bv.bv_page);
> +			if (to_req) {
> +				memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +				flush_dcache_page(bv.bv_page);
> +			} else {
> +				flush_dcache_page(bv.bv_page);
> +				memcpy(data, addr + bv.bv_offset, bv.bv_len);
> +			}
> +			kunmap_atomic(addr);
> +			data += bv.bv_len;
> +			size -= bv.bv_len;
> +		}
> +	}
> +	if (size)
> +		printk("size mismatch: %lx\n", (unsigned long)size);
> +}
> +
> +static void nvme_debug_identify_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_id_ns *id;
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +
> +	id = kzalloc(sizeof(struct nvme_id_ns), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
> +	if (!ns) {
> +		nvme_req(req)->status = NVME_SC_INVALID_NS; /* status is CPU-endian */
> +		goto free_ret;
> +	}
> +
> +	id->nsze = cpu_to_le64(ns->n_sectors);
> +	id->ncap = id->nsze;
> +	id->nuse = id->nsze;
> +	/*id->nlbaf = 0;*/
> +	id->dlfeat = 0x01;
> +	id->lbaf[0].ds = ns->sector_size_bits;
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ns), true);
> +
> +free_ret:
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ctrl(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_id_ctrl *id;
> +	char ver[9];
> +	size_t ver_len;
> +
> +	id = kzalloc(sizeof(struct nvme_id_ctrl), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	id->vid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
> +	id->ssvid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
> +	memset(id->sn, ' ', sizeof id->sn);
> +	memset(id->mn, ' ', sizeof id->mn);
> +	memcpy(id->mn, "nvme-debug", 10);
> +	snprintf(ver, sizeof ver, "%X", LINUX_VERSION_CODE);
> +	ver_len = min(strlen(ver), sizeof id->fr);
> +	memset(id->fr, ' ', sizeof id->fr);
> +	memcpy(id->fr, ver, ver_len);
> +	memcpy(id->ieee, "\xe9\xf2\x40", sizeof id->ieee);
> +	id->ver = cpu_to_le32(0x20000);
> +	id->kas = cpu_to_le16(100);
> +	id->sqes = 0x66;
> +	id->cqes = 0x44;
> +	id->maxcmd = cpu_to_le16(1);
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->nsid > le32_to_cpu(id->nn))
> +			id->nn = cpu_to_le32(ns->nsid);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_COPY);
> +	id->vwc = 0x6;
> +	id->mnan = cpu_to_le32(0xffffffff);
> +	strcpy(id->subnqn, "nqn.2021-09.com.redhat:nvme-debug");
> +	id->ioccsz = cpu_to_le32(4);
> +	id->iorcsz = cpu_to_le32(1);
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ctrl), true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static int cmp_ns(const void *a1, const void *a2)
> +{
> +	__u32 v1 = le32_to_cpu(*(__u32 *)a1);
> +	__u32 v2 = le32_to_cpu(*(__u32 *)a2);
> +	if (!v1)
> +		v1 = 0xffffffffU;
> +	if (!v2)
> +		v2 = 0xffffffffU;
> +	if (v1 < v2)
> +		return -1;
> +	if (v1 > v2)
> +		return 1;
> +	return 0;
> +}
> +
> +static void nvme_debug_identify_active_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	unsigned size;
> +	__u32 *id;
> +	unsigned idp;
> +
> +	if (le32_to_cpu(ndr->cmd.identify.nsid) >= 0xFFFFFFFE) {
> +		nvme_req(req)->status = NVME_SC_INVALID_NS; /* status is CPU-endian */
> +		blk_mq_end_request(req, BLK_STS_OK);
> +		return;
> +	}
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	/*
> +	 * The Identify data structure is a fixed 4KiB: allocate all 1024
> +	 * entries so the copy_data_request() below cannot read past the
> +	 * end of the buffer when fewer namespaces exist.
> +	 */
> +	size = 1024;
> +
> +	id = kzalloc(sizeof(__u32) * size, GFP_NOIO);
> +	if (!id) {
> +		mutex_unlock(&nvme_debug_ctrl_mutex);
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	idp = 0;
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (idp >= size)
> +			break;
> +		if (ns->nsid > le32_to_cpu(ndr->cmd.identify.nsid))
> +			id[idp++] = cpu_to_le32(ns->nsid);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	sort(id, idp, sizeof(__u32), cmp_ns, NULL);
> +
> +	copy_data_request(req, id, sizeof(__u32) * 1024, true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ns_desc_list(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_ns_id_desc *id;
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	id = kzalloc(4096, GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
> +	if (!ns) {
> +		nvme_req(req)->status = NVME_SC_INVALID_NS; /* status is CPU-endian */
> +		goto free_ret;
> +	}
> +
> +	id->nidt = NVME_NIDT_UUID;
> +	id->nidl = NVME_NIDT_UUID_LEN;
> +	memcpy((char *)(id + 1), ns->uuid, NVME_NIDT_UUID_LEN);
> +
> +	copy_data_request(req, id, 4096, true);
> +
> +free_ret:
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ctrl_cs(struct request *req)
> +{
> +	struct nvme_id_ctrl_nvm *id;
> +	id = kzalloc(sizeof(struct nvme_id_ctrl_nvm), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ctrl_nvm), true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_admin_rq(struct work_struct *w)
> +{
> +	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
> +	struct request *req = blk_mq_rq_from_pdu(ndr);
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +
> +	switch (ndr->cmd.common.opcode) {
> +		case nvme_admin_identify: {
> +			switch (ndr->cmd.identify.cns) {
> +				case NVME_ID_CNS_NS: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ns(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				};
> +				case NVME_ID_CNS_CTRL: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ctrl(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_NS_ACTIVE_LIST: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_active_ns(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_NS_DESC_LIST: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ns_desc_list(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_CS_CTRL: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ctrl_cs(req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				default: {
> +					printk("nvme_admin_identify: %x\n", ndr->cmd.identify.cns);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_admin_rq: %x\n", ndr->cmd.common.opcode);
> +			break;
> +		}
> +	}
> +	blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +}
> +
> +static void nvme_debug_rw(struct nvme_debug_namespace *ns, struct request *req)
> +{
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	__u64 lba = le64_to_cpu(ndr->cmd.rw.slba);
> +	__u64 len = (__u64)le16_to_cpu(ndr->cmd.rw.length) + 1;
> +	void *addr;
> +	if (unlikely(lba + len < lba) || unlikely(lba + len > ns->n_sectors)) {
> +		blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +		return;
> +	}
> +	addr = ns->space + (lba << ns->sector_size_bits);
> +	copy_data_request(req, addr, len << ns->sector_size_bits, ndr->cmd.rw.opcode == nvme_cmd_read);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
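> +/*
> + * NVMe Simple Copy: the host supplies an array of source-range
> + * descriptors and a single destination LBA (sdlba); the data never
> + * leaves the device.
> + */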
> +static void nvme_debug_copy(struct nvme_debug_namespace *ns, struct request *req)
> +{
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	__u64 dlba = le64_to_cpu(ndr->cmd.copy.sdlba);
> +	unsigned n_descs = ndr->cmd.copy.length + 1;
> +	struct nvme_copy_desc *descs;
> +	unsigned i;
> +	blk_status_t ret;
> +
> +	descs = kmalloc(sizeof(struct nvme_copy_desc) * n_descs, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
> +	if (!descs) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	copy_data_request(req, descs, sizeof(struct nvme_copy_desc) * n_descs, false);
> +
> +	for (i = 0; i < n_descs; i++) {
> +		struct nvme_copy_desc *desc = &descs[i];
> +		__u64 slba = le64_to_cpu(desc->slba);
> +		__u64 len = (__u64)le16_to_cpu(desc->length) + 1;
> +		void *saddr, *daddr;
> +
> +		if (unlikely(slba + len < slba) || unlikely(slba + len > ns->n_sectors) ||
> +		    unlikely(dlba + len < dlba) || unlikely(dlba + len > ns->n_sectors)) {
> +			ret = BLK_STS_NOTSUPP;
> +			goto free_ret;
> +		}
> +
> +		saddr = ns->space + (slba << ns->sector_size_bits);
> +		daddr = ns->space + (dlba << ns->sector_size_bits);
> +
> +		memcpy(daddr, saddr, len << ns->sector_size_bits);
> +
> +		dlba += len;
> +	}
> +
> +	ret = BLK_STS_OK;
> +
> +free_ret:
> +	kfree(descs);
> +
> +	blk_mq_end_request(req, ret);
> +}
> +
> +static void nvme_debug_io_rq(struct work_struct *w)
> +{
> +	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
> +	struct request *req = blk_mq_rq_from_pdu(ndr);
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +	__u32 nsid = le32_to_cpu(ndr->cmd.common.nsid);
> +	struct nvme_debug_namespace *ns;
> +
> +	percpu_down_read(&nvme_debug_sem);
> +	ns = nvme_debug_find_namespace(ctrl, nsid);
> +	if (unlikely(!ns))
> +		goto ret_notsupp;
> +
> +	switch (ndr->cmd.common.opcode) {
> +		case nvme_cmd_flush: {
> +			blk_mq_end_request(req, BLK_STS_OK);
> +			goto ret;
> +		}
> +		case nvme_cmd_read:
> +		case nvme_cmd_write: {
> +			nvme_debug_rw(ns, req);
> +			goto ret;
> +		}
> +		case nvme_cmd_copy: {
> +			nvme_debug_copy(ns, req);
> +			goto ret;
> +		}
> +		default: {
> +			printk("nvme_debug_io_rq: %x\n", ndr->cmd.common.opcode);
> +			break;
> +		}
> +	}
> +ret_notsupp:
> +	blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +ret:
> +	percpu_up_read(&nvme_debug_sem);
> +}
> +
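> +/*
> + * Dispatch: admin commands run on admin_wq, I/O on io_wq.
> + * REQ_OP_COPY_READ_TOKEN is completed immediately: in the token scheme
> + * from patch 1, the data movement is issued by the write-token half as
> + * an nvme_cmd_copy.
> + */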
> +static blk_status_t nvme_debug_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd)
> +{
> +	struct request *req = bd->rq;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +	struct nvme_ns *ns = hctx->queue->queuedata;
> +	blk_status_t r;
> +
> +	r = nvme_setup_cmd(ns, req);
> +	if (unlikely(r))
> +		return r;
> +
> +	if (!ns) {
> +		INIT_WORK(&ndr->work, nvme_debug_admin_rq);
> +		queue_work(ctrl->admin_wq, &ndr->work);
> +		return BLK_STS_OK;
> +	} else if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
> +		blk_mq_end_request(req, BLK_STS_OK);
> +		return BLK_STS_OK;
> +	} else {
> +		INIT_WORK(&ndr->work, nvme_debug_io_rq);
> +		queue_work(ctrl->io_wq, &ndr->work);
> +		return BLK_STS_OK;
> +	}
> +}
> +
> +static int nvme_debug_init_request(struct blk_mq_tag_set *set, struct request *req, unsigned hctx_idx, unsigned numa_node)
> +{
> +	struct nvme_debug_ctrl *ctrl = set->driver_data;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	nvme_req(req)->ctrl = &ctrl->ctrl;
> +	nvme_req(req)->cmd = &ndr->cmd;
> +	return 0;
> +}
> +
> +static int nvme_debug_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned hctx_idx)
> +{
> +	struct nvme_debug_ctrl *ctrl = data;
> +	hctx->driver_data = ctrl;
> +	return 0;
> +}
> +
> +static const struct blk_mq_ops nvme_debug_mq_ops = {
> +	.queue_rq = nvme_debug_queue_rq,
> +	.init_request = nvme_debug_init_request,
> +	.init_hctx = nvme_debug_init_hctx,
> +};
> +
> +static int nvme_debug_configure_admin_queue(struct nvme_debug_ctrl *ctrl)
> +{
> +	int r;
> +
> +	memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
> +	ctrl->admin_tag_set.ops = &nvme_debug_mq_ops;
> +	ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
> +	ctrl->admin_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
> +	ctrl->admin_tag_set.numa_node = ctrl->ctrl.numa_node;
> +	ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_debug_request);
> +	ctrl->admin_tag_set.driver_data = ctrl;
> +	ctrl->admin_tag_set.nr_hw_queues = 1;
> +	ctrl->admin_tag_set.timeout = NVME_ADMIN_TIMEOUT;
> +	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
> +
> +	r = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
> +	if (r)
> +		goto ret0;
> +	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
> +
> +	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
> +	if (IS_ERR(ctrl->ctrl.admin_q)) {
> +		r = PTR_ERR(ctrl->ctrl.admin_q);
> +		goto ret1;
> +	}
> +
> +	r = nvme_enable_ctrl(&ctrl->ctrl);
> +	if (r)
> +		goto ret2;
> +
> +	nvme_start_admin_queue(&ctrl->ctrl);
> +
> +	r = nvme_init_ctrl_finish(&ctrl->ctrl);
> +	if (r)
> +		goto ret3;
> +
> +	return 0;
> +
> +ret3:
> +	nvme_stop_admin_queue(&ctrl->ctrl);
> +	blk_sync_queue(ctrl->ctrl.admin_q);
> +ret2:
> +	blk_cleanup_queue(ctrl->ctrl.admin_q);
> +ret1:
> +	blk_mq_free_tag_set(&ctrl->admin_tag_set);
> +ret0:
> +	return r;
> +}
> +
> +static int nvme_debug_configure_io_queue(struct nvme_debug_ctrl *ctrl)
> +{
> +	int r;
> +
> +	memset(&ctrl->io_tag_set, 0, sizeof(ctrl->io_tag_set));
> +	ctrl->io_tag_set.ops = &nvme_debug_mq_ops;
> +	ctrl->io_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
> +	ctrl->io_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
> +	ctrl->io_tag_set.numa_node = ctrl->ctrl.numa_node;
> +	ctrl->io_tag_set.cmd_size = sizeof(struct nvme_debug_request);
> +	ctrl->io_tag_set.driver_data = ctrl;
> +	ctrl->io_tag_set.nr_hw_queues = 1;
> +	ctrl->io_tag_set.timeout = NVME_ADMIN_TIMEOUT;
> +	ctrl->io_tag_set.flags = BLK_MQ_F_NO_SCHED;
> +
> +	r = blk_mq_alloc_tag_set(&ctrl->io_tag_set);
> +	if (r)
> +		goto ret0;
> +	ctrl->ctrl.tagset = &ctrl->io_tag_set;
> +	return 0;
> +
> +ret0:
> +	return r;
> +}
> +
> +static const struct nvme_ctrl_ops nvme_debug_ctrl_ops = {
> +	.name = "debug",
> +	.module = THIS_MODULE,
> +	.flags = NVME_F_FABRICS,
> +	.reg_read32 = nvme_debug_reg_read32,
> +	.reg_read64 = nvme_debug_reg_read64,
> +	.reg_write32 = nvme_debug_reg_write32,
> +	.free_ctrl = nvme_debug_free_ctrl,
> +	.submit_async_event = nvme_debug_submit_async_event,
> +	.delete_ctrl = nvme_debug_delete_ctrl,
> +	.get_address = nvme_debug_get_address,
> +};
> +
> +static struct nvme_ctrl *nvme_debug_create_ctrl(struct device *dev,
> +		struct nvmf_ctrl_options *opts)
> +{
> +	int r;
> +	struct nvme_debug_ctrl *ctrl;
> +
> +	ctrl = kzalloc(sizeof(struct nvme_debug_ctrl), GFP_KERNEL);
> +	if (!ctrl) {
> +		r = -ENOMEM;
> +		goto ret0;
> +	}
> +
> +	INIT_LIST_HEAD(&ctrl->list);
> +	INIT_LIST_HEAD(&ctrl->namespaces);
> +	ctrl->ctrl.opts = opts;
> +	ctrl->ctrl.queue_count = 2;
> +	INIT_WORK(&ctrl->ctrl.reset_work, nvme_debug_reset_ctrl_work);
> +
> +	ctrl->admin_wq = alloc_workqueue("nvme-debug-admin", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
> +	if (!ctrl->admin_wq)
> +		goto ret1;
> +
> +	ctrl->io_wq = alloc_workqueue("nvme-debug-io", WQ_MEM_RECLAIM, 0);
> +	if (!ctrl->io_wq)
> +		goto ret1;
> +
> +	if (!nvme_debug_alloc_namespace(ctrl)) {
> +		r = -ENOMEM;
> +		goto ret1;
> +	}
> +
> +	r = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_debug_ctrl_ops, 0);
> +	if (r)
> +		goto ret1;
> +
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
> +		r = -EBUSY;
> +		goto ret2;
> +	}
> +
> +	r = nvme_debug_configure_admin_queue(ctrl);
> +	if (r)
> +		goto ret2;
> +
> +	r = nvme_debug_configure_io_queue(ctrl);
> +	if (r)
> +		goto ret2;
> +
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
> +		r = -EBUSY;
> +		goto ret2;
> +	}
> +
> +	nvme_start_ctrl(&ctrl->ctrl);
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_add_tail(&ctrl->list, &nvme_debug_ctrl_list);
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +
> +	return &ctrl->ctrl;
> +
> +ret2:
> +	nvme_uninit_ctrl(&ctrl->ctrl);
> +	nvme_put_ctrl(&ctrl->ctrl);
> +	return ERR_PTR(r);
> +ret1:
> +	nvme_debug_free_namespaces(ctrl);
> +	if (ctrl->admin_wq)
> +		destroy_workqueue(ctrl->admin_wq);
> +	if (ctrl->io_wq)
> +		destroy_workqueue(ctrl->io_wq);
> +	kfree(ctrl);
> +ret0:
> +	return ERR_PTR(r);
> +}
> +
> +static struct nvmf_transport_ops nvme_debug_transport = {
> +	.name		= "debug",
> +	.module		= THIS_MODULE,
> +	.required_opts	= NVMF_OPT_TRADDR,
> +	.allowed_opts	= NVMF_OPT_CTRL_LOSS_TMO,
> +	.create_ctrl	= nvme_debug_create_ctrl,
> +};
> +
> +static int __init nvme_debug_init_module(void)
> +{
> +	/* propagate registration failure instead of ignoring it */
> +	return nvmf_register_transport(&nvme_debug_transport);
> +}
> +
> +static void __exit nvme_debug_cleanup_module(void)
> +{
> +	struct nvme_debug_ctrl *ctrl;
> +
> +	nvmf_unregister_transport(&nvme_debug_transport);
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_for_each_entry(ctrl, &nvme_debug_ctrl_list, list) {
> +		nvme_delete_ctrl(&ctrl->ctrl);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	flush_workqueue(nvme_delete_wq);
> +}
> +
> +module_init(nvme_debug_init_module);
> +module_exit(nvme_debug_cleanup_module);
> +
> +MODULE_LICENSE("GPL v2");
> 
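Usage note (not part of the patch): with the series applied and this
module loaded, a controller instance should be creatable through the
normal fabrics interface. The option string below is an assumption based
on the required_opts above; the traddr value is required by the parser
but looks otherwise unused here:

	modprobe nvme-debug
	echo "transport=debug,traddr=debug,nqn=nqn.2021-09.com.redhat:nvme-debug" > /dev/nvme-fabrics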

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [dm-devel] [RFC PATCH 3/3] nvme: add the "debug" host driver
@ 2022-02-02  6:01           ` Adam Manzanares
  0 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-02-02  6:01 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, Javier González,
	Bart Van Assche, linux-scsi, Christoph Hellwig, roland,
	zach.brown, Chaitanya Kulkarni,
	msnitzer@redhat.com >> msnitzer@redhat.com, josef,
	linux-block, dsterba, kbus @imap.gmail.com>> Keith Busch,
	Frederick.Knight, Jens Axboe, tytso, Kanchan Joshi,
	martin.petersen@oracle.com >> Martin K. Petersen, jack,
	linux-fsdevel, lsf-pc

On Tue, Feb 01, 2022 at 01:33:47PM -0500, Mikulas Patocka wrote:
> This patch adds a new driver "nvme-debug". It uses memory as a backing
> store and it is used to test the copy offload functionality.
>

We have looked at something similar: a null NVMe driver for testing
interfaces that poke the NVMe driver without going through the block
layer. Have you considered implementing this in the fabrics target code,
since that already virtualizes an SSD controller?

BTW, I think having the target code implement Simple Copy without moving
data over the fabric would be a great way of showing off the command, as
sketched below.
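
Rough sketch of what I mean (hypothetical code: ns_space()/sector_bits()
are made-up helpers for a memory-backed target namespace, the cmd->copy
layout is the one this series introduces, and bounds checks are omitted;
nvmet_copy_from_sgl() and nvmet_req_complete() are the existing nvmet
helpers):

	static void nvmet_execute_copy(struct nvmet_req *req)
	{
		u64 dlba = le64_to_cpu(req->cmd->copy.sdlba);
		unsigned int i, nr = req->cmd->copy.length + 1;
		u16 status = NVME_SC_SUCCESS;

		for (i = 0; i < nr; i++) {
			struct nvme_copy_desc desc;

			/* only the descriptors cross the fabric... */
			status = nvmet_copy_from_sgl(req, i * sizeof(desc),
						     &desc, sizeof(desc));
			if (status)
				break;

			/* ...the data itself moves inside the target */
			memcpy(ns_space(req->ns) +
			       (dlba << sector_bits(req->ns)),
			       ns_space(req->ns) +
			       (le64_to_cpu(desc.slba) << sector_bits(req->ns)),
			       (u64)(le16_to_cpu(desc.length) + 1) << sector_bits(req->ns));

			dlba += le16_to_cpu(desc.length) + 1;
		}
		nvmet_req_complete(req, status);
	}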


> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/nvme/host/Kconfig      |   13 
>  drivers/nvme/host/Makefile     |    1 
>  drivers/nvme/host/nvme-debug.c |  838 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 852 insertions(+)
> 
> Index: linux-2.6/drivers/nvme/host/Kconfig
> ===================================================================
> --- linux-2.6.orig/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
> +++ linux-2.6/drivers/nvme/host/Kconfig	2022-02-01 18:34:22.000000000 +0100
> @@ -83,3 +83,16 @@ config NVME_TCP
>  	  from https://urldefense.com/v3/__https://protect2.fireeye.com/v1/url?k=26ebb71c-79708e5a-26ea3c53-0cc47a3003e8-a628bd4c53dd84d1&q=1&e=9f16193b-4232-4453-9889-7cdf5d653922&u=https*3A*2F*2Fgithub.com*2Flinux-nvme*2Fnvme-cli__;JSUlJSU!!EwVzqGoTKBqv-0DWAJBm!HyJHAJgOq3M4SIOA0HvhX95q50ACkRtsiHWmAQERqjGLQ0pLb_Jru8QcwOQyix0tTVJA$ .
>  
>  	  If unsure, say N.
> +
> +config NVME_DEBUG
> +	tristate "NVM Express debug"
> +	depends on INET
> +	depends on BLOCK
> +	select NVME_CORE
> +	select NVME_FABRICS
> +	select CRYPTO
> +	select CRYPTO_CRC32C
> +	help
> +	  This pseudo driver simulates a NVMe adapter.
> +
> +	  If unsure, say N.
> Index: linux-2.6/drivers/nvme/host/Makefile
> ===================================================================
> --- linux-2.6.orig/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
> +++ linux-2.6/drivers/nvme/host/Makefile	2022-02-01 18:34:22.000000000 +0100
> @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)		+= nvme-fabr
>  obj-$(CONFIG_NVME_RDMA)			+= nvme-rdma.o
>  obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
>  obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
> +obj-$(CONFIG_NVME_DEBUG)		+= nvme-debug.o
>  
>  nvme-core-y				:= core.o ioctl.o
>  nvme-core-$(CONFIG_TRACING)		+= trace.o
> Index: linux-2.6/drivers/nvme/host/nvme-debug.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6/drivers/nvme/host/nvme-debug.c	2022-02-01 18:34:22.000000000 +0100
> @@ -0,0 +1,838 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe debug
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/err.h>
> +#include <linux/blk-mq.h>
> +#include <linux/sort.h>
> +#include <linux/version.h>
> +
> +#include "nvme.h"
> +#include "fabrics.h"
> +
> +static ulong dev_size_mb = 16;
> +module_param_named(dev_size_mb, dev_size_mb, ulong, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(dev_size_mb, "size in MiB of the namespace(def=8)");
> +
> +static unsigned sector_size = 512;
> +module_param_named(sector_size, sector_size, uint, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
> +
> +struct nvme_debug_ctrl {
> +	struct nvme_ctrl ctrl;
> +	uint32_t reg_cc;
> +	struct blk_mq_tag_set admin_tag_set;
> +	struct blk_mq_tag_set io_tag_set;
> +	struct workqueue_struct *admin_wq;
> +	struct workqueue_struct *io_wq;
> +	struct list_head namespaces;
> +	struct list_head list;
> +};
> +
> +struct nvme_debug_namespace {
> +	struct list_head list;
> +	uint32_t nsid;
> +	unsigned char sector_size_bits;
> +	size_t n_sectors;
> +	void *space;
> +	char uuid[16];
> +};
> +
> +struct nvme_debug_request {
> +	struct nvme_request req;
> +	struct nvme_command cmd;
> +	struct work_struct work;
> +};
> +
> +static LIST_HEAD(nvme_debug_ctrl_list);
> +static DEFINE_MUTEX(nvme_debug_ctrl_mutex);
> +
> +DEFINE_STATIC_PERCPU_RWSEM(nvme_debug_sem);
> +
> +static inline struct nvme_debug_ctrl *to_debug_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	return container_of(nctrl, struct nvme_debug_ctrl, ctrl);
> +}
> +
> +static struct nvme_debug_namespace *nvme_debug_find_namespace(struct nvme_debug_ctrl *ctrl, unsigned nsid)
> +{
> +	struct nvme_debug_namespace *ns;
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->nsid == nsid)
> +			return ns;
> +	}
> +	return NULL;
> +}
> +
> +static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl)
> +{
> +	struct nvme_debug_namespace *ns;
> +	unsigned s;
> +	size_t dsm;
> +
> +	ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL);
> +	if (!ns)
> +		goto fail0;
> +
> +	ns->nsid = 1;
> +	while (nvme_debug_find_namespace(ctrl, ns->nsid))
> +		ns->nsid++;
> +
> +	s = READ_ONCE(sector_size);
> +	if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s))
> +		goto fail1;
> +	ns->sector_size_bits = __ffs(s);
> +	dsm = READ_ONCE(dev_size_mb);
> +	ns->n_sectors = dsm << (20 - ns->sector_size_bits);
> +	if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm)
> +		goto fail1;
> +
> +	if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors)
> +		goto fail1;
> +
> +	ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits);
> +	if (!ns->space)
> +		goto fail1;
> +
> +	generate_random_uuid(ns->uuid);
> +
> +	list_add(&ns->list, &ctrl->namespaces);
> +	return true;
> +
> +fail1:
> +	kfree(ns);
> +fail0:
> +	return false;
> +}
> +
> +static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns)
> +{
> +	vfree(ns->space);
> +	list_del(&ns->list);
> +	kfree(ns);
> +}
> +
> +static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +	switch (off) {
> +		case NVME_REG_VS: {
> +			*val = 0x20000;
> +			break;
> +		}
> +		case NVME_REG_CC: {
> +			*val = ctrl->reg_cc;
> +			break;
> +		}
> +		case NVME_REG_CSTS: {
> +			*val = 0;
> +			if (ctrl->reg_cc & NVME_CC_ENABLE)
> +				*val |= NVME_CSTS_RDY;
> +			if (ctrl->reg_cc & NVME_CC_SHN_MASK)
> +				*val |= NVME_CSTS_SHST_CMPLT;
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_read32: %x\n", off);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
> +{
> +	switch (off) {
> +		case NVME_REG_CAP: {
> +			*val = (1ULL << 0) | (1ULL << 37);
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_read64: %x\n", off);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +	switch (off) {
> +		case NVME_REG_CC: {
> +			ctrl->reg_cc = val;
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_reg_write32: %x, %x\n", off, val);
> +			return -ENOSYS;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static void nvme_debug_submit_async_event(struct nvme_ctrl *nctrl)
> +{
> +	printk("nvme_debug_submit_async_event\n");
> +}
> +
> +static void nvme_debug_delete_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	nvme_shutdown_ctrl(nctrl);
> +}
> +
> +static void nvme_debug_free_namespaces(struct nvme_debug_ctrl *ctrl)
> +{
> +	if (!list_empty(&ctrl->namespaces)) {
> +		struct nvme_debug_namespace *ns = container_of(ctrl->namespaces.next, struct nvme_debug_namespace, list);
> +		nvme_debug_free_namespace(ns);
> +	}
> +}
> +
> +static void nvme_debug_free_ctrl(struct nvme_ctrl *nctrl)
> +{
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
> +
> +	flush_workqueue(ctrl->admin_wq);
> +	flush_workqueue(ctrl->io_wq);
> +
> +	nvme_debug_free_namespaces(ctrl);
> +
> +	if (list_empty(&ctrl->list))
> +		goto free_ctrl;
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_del(&ctrl->list);
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +
> +	nvmf_free_options(nctrl->opts);
> +free_ctrl:
> +	destroy_workqueue(ctrl->admin_wq);
> +	destroy_workqueue(ctrl->io_wq);
> +	kfree(ctrl);
> +}
> +
> +static int nvme_debug_get_address(struct nvme_ctrl *nctrl, char *buf, int size)
> +{
> +	int len = 0;
> +	len += snprintf(buf, size, "debug");
> +	return len;
> +}
> +
> +static void nvme_debug_reset_ctrl_work(struct work_struct *work)
> +{
> +	printk("nvme_reset_ctrl_work\n");
> +}
> +
> +static void copy_data_request(struct request *req, void *data, size_t size, bool to_req)
> +{
> +	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
> +		void *addr;
> +		struct bio_vec bv = req->special_vec;
> +		addr = kmap_atomic(bv.bv_page);
> +		if (to_req) {
> +			memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +			flush_dcache_page(bv.bv_page);
> +		} else {
> +			flush_dcache_page(bv.bv_page);
> +			memcpy(data, addr + bv.bv_offset, bv.bv_len);
> +		}
> +		kunmap_atomic(addr);
> +		data += bv.bv_len;
> +		size -= bv.bv_len;
> +	} else {
> +		struct req_iterator bi;
> +		struct bio_vec bv;
> +		rq_for_each_segment(bv, req, bi) {
> +			void *addr;
> +			addr = kmap_atomic(bv.bv_page);
> +			if (to_req) {
> +				memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +				flush_dcache_page(bv.bv_page);
> +			} else {
> +				flush_dcache_page(bv.bv_page);
> +				memcpy(data, addr + bv.bv_offset, bv.bv_len);
> +			}
> +			kunmap_atomic(addr);
> +			data += bv.bv_len;
> +			size -= bv.bv_len;
> +		}
> +	}
> +	if (size)
> +		printk("size mismatch: %lx\n", (unsigned long)size);
> +}
> +
> +static void nvme_debug_identify_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_id_ns *id;
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +
> +	id = kzalloc(sizeof(struct nvme_id_ns), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
> +	if (!ns) {
> +		nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS);
> +		goto free_ret;
> +	}
> +
> +	id->nsze = cpu_to_le64(ns->n_sectors);
> +	id->ncap = id->nsze;
> +	id->nuse = id->nsze;
> +	/*id->nlbaf = 0;*/
> +	id->dlfeat = 0x01;
> +	id->lbaf[0].ds = ns->sector_size_bits;
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ns), true);
> +
> +free_ret:
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ctrl(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_id_ctrl *id;
> +	char ver[9];
> +	size_t ver_len;
> +
> +	id = kzalloc(sizeof(struct nvme_id_ctrl), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	id->vid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
> +	id->ssvid = cpu_to_le16(PCI_VENDOR_ID_REDHAT);
> +	memset(id->sn, ' ', sizeof id->sn);
> +	memset(id->mn, ' ', sizeof id->mn);
> +	memcpy(id->mn, "nvme-debug", 10);
> +	snprintf(ver, sizeof ver, "%X", LINUX_VERSION_CODE);
> +	ver_len = min(strlen(ver), sizeof id->fr);
> +	memset(id->fr, ' ', sizeof id->fr);
> +	memcpy(id->fr, ver, ver_len);
> +	memcpy(id->ieee, "\xe9\xf2\x40", sizeof id->ieee);
> +	id->ver = cpu_to_le32(0x20000);
> +	id->kas = cpu_to_le16(100);
> +	id->sqes = 0x66;
> +	id->cqes = 0x44;
> +	id->maxcmd = cpu_to_le16(1);
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->nsid > le32_to_cpu(id->nn))
> +			id->nn = cpu_to_le32(ns->nsid);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_COPY);
> +	id->vwc = 0x6;
> +	id->mnan = cpu_to_le32(0xffffffff);
> +	strcpy(id->subnqn, "nqn.2021-09.com.redhat:nvme-debug");
> +	id->ioccsz = cpu_to_le32(4);
> +	id->iorcsz = cpu_to_le32(1);
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ctrl), true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static int cmp_ns(const void *a1, const void *a2)
> +{
> +	__u32 v1 = le32_to_cpu(*(__u32 *)a1);
> +	__u32 v2 = le32_to_cpu(*(__u32 *)a2);
> +	if (!v1)
> +		v1 = 0xffffffffU;
> +	if (!v2)
> +		v2 = 0xffffffffU;
> +	if (v1 < v2)
> +		return -1;
> +	if (v1 > v2)
> +		return 1;
> +	return 0;
> +}
> +
> +static void nvme_debug_identify_active_ns(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	unsigned size;
> +	__u32 *id;
> +	unsigned idp;
> +
> +	if (le32_to_cpu(ndr->cmd.identify.nsid) >= 0xFFFFFFFE) {
> +		nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS);
> +		blk_mq_end_request(req, BLK_STS_OK);
> +		return;
> +	}
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	size = 0;
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		size++;
> +	}
> +	size = min(size, 1024U);
> +
> +	id = kzalloc(sizeof(__u32) * size, GFP_NOIO);
> +	if (!id) {
> +		mutex_unlock(&nvme_debug_ctrl_mutex);
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	idp = 0;
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->nsid > le32_to_cpu(ndr->cmd.identify.nsid))
> +			id[idp++] = cpu_to_le32(ns->nsid);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	sort(id, idp, sizeof(__u32), cmp_ns, NULL);
> +
> +	copy_data_request(req, id, sizeof(__u32) * 1024, true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ns_desc_list(struct nvme_debug_ctrl *ctrl, struct request *req)
> +{
> +	struct nvme_ns_id_desc *id;
> +	struct nvme_debug_namespace *ns;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	id = kzalloc(4096, GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid));
> +	if (!ns) {
> +		nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS);
> +		goto free_ret;
> +	}
> +
> +	id->nidt = NVME_NIDT_UUID;
> +	id->nidl = NVME_NIDT_UUID_LEN;
> +	memcpy((char *)(id + 1), ns->uuid, NVME_NIDT_UUID_LEN);
> +
> +	copy_data_request(req, id, 4096, true);
> +
> +free_ret:
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_identify_ctrl_cs(struct request *req)
> +{
> +	struct nvme_id_ctrl_nvm *id;
> +	id = kzalloc(sizeof(struct nvme_id_ctrl_nvm), GFP_NOIO);
> +	if (!id) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	copy_data_request(req, id, sizeof(struct nvme_id_ctrl_nvm), true);
> +
> +	kfree(id);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_admin_rq(struct work_struct *w)
> +{
> +	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
> +	struct request *req = (struct request *)ndr - 1;
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +
> +	switch (ndr->cmd.common.opcode) {
> +		case nvme_admin_identify: {
> +			switch (ndr->cmd.identify.cns) {
> +				case NVME_ID_CNS_NS: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ns(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				};
> +				case NVME_ID_CNS_CTRL: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ctrl(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_NS_ACTIVE_LIST: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_active_ns(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_NS_DESC_LIST: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ns_desc_list(ctrl, req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				case NVME_ID_CNS_CS_CTRL: {
> +					percpu_down_read(&nvme_debug_sem);
> +					nvme_debug_identify_ctrl_cs(req);
> +					percpu_up_read(&nvme_debug_sem);
> +					return;
> +				}
> +				default: {
> +					printk("nvme_admin_identify: %x\n", ndr->cmd.identify.cns);
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +		default: {
> +			printk("nvme_debug_admin_rq: %x\n", ndr->cmd.common.opcode);
> +			break;
> +		}
> +	}
> +	blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +}
> +
> +static void nvme_debug_rw(struct nvme_debug_namespace *ns, struct request *req)
> +{
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	__u64 lba = cpu_to_le64(ndr->cmd.rw.slba);
> +	__u64 len = (__u64)cpu_to_le16(ndr->cmd.rw.length) + 1;
> +	void *addr;
> +	if (unlikely(lba + len < lba) || unlikely(lba + len > ns->n_sectors)) {
> +		blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +		return;
> +	}
> +	addr = ns->space + (lba << ns->sector_size_bits);
> +	copy_data_request(req, addr, len << ns->sector_size_bits, ndr->cmd.rw.opcode == nvme_cmd_read);
> +	blk_mq_end_request(req, BLK_STS_OK);
> +}
> +
> +static void nvme_debug_copy(struct nvme_debug_namespace *ns, struct request *req)
> +{
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	__u64 dlba = cpu_to_le64(ndr->cmd.copy.sdlba);
> +	unsigned n_descs = ndr->cmd.copy.length + 1;
> +	struct nvme_copy_desc *descs;
> +	unsigned i, ret;
> +
> +	descs = kmalloc(sizeof(struct nvme_copy_desc) * n_descs, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
> +	if (!descs) {
> +		blk_mq_end_request(req, BLK_STS_RESOURCE);
> +		return;
> +	}
> +
> +	copy_data_request(req, descs, sizeof(struct nvme_copy_desc) * n_descs, false);
> +
> +	for (i = 0; i < n_descs; i++) {
> +		struct nvme_copy_desc *desc = &descs[i];
> +		__u64 slba = cpu_to_le64(desc->slba);
> +		__u64 len = (__u64)cpu_to_le16(desc->length) + 1;
> +		void *saddr, *daddr;
> +
> +		if (unlikely(slba + len < slba) || unlikely(slba + len > ns->n_sectors) ||
> +		    unlikely(dlba + len < dlba) || unlikely(dlba + len > ns->n_sectors)) {
> +			ret = BLK_STS_NOTSUPP;
> +			goto free_ret;
> +		}
> +
> +		saddr = ns->space + (slba << ns->sector_size_bits);
> +		daddr = ns->space + (dlba << ns->sector_size_bits);
> +
> +		memcpy(daddr, saddr, len << ns->sector_size_bits);
> +
> +		dlba += len;
> +	}
> +
> +	ret = BLK_STS_OK;
> +
> +free_ret:
> +	kfree(descs);
> +
> +	blk_mq_end_request(req, ret);
> +}
> +
> +static void nvme_debug_io_rq(struct work_struct *w)
> +{
> +	struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work);
> +	struct request *req = (struct request *)ndr - 1;
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +	__u32 nsid = le32_to_cpu(ndr->cmd.common.nsid);
> +	struct nvme_debug_namespace *ns;
> +
> +	percpu_down_read(&nvme_debug_sem);
> +	ns = nvme_debug_find_namespace(ctrl, nsid);
> +	if (unlikely(!ns))
> +		goto ret_notsupp;
> +
> +	switch (ndr->cmd.common.opcode) {
> +		case nvme_cmd_flush: {
> +			blk_mq_end_request(req, BLK_STS_OK);
> +			goto ret;
> +		}
> +		case nvme_cmd_read:
> +		case nvme_cmd_write: {
> +			nvme_debug_rw(ns, req);
> +			goto ret;
> +		}
> +		case nvme_cmd_copy: {
> +			nvme_debug_copy(ns, req);
> +			goto ret;
> +		}
> +		default: {
> +			printk("nvme_debug_io_rq: %x\n", ndr->cmd.common.opcode);
> +			break;
> +		}
> +	}
> +ret_notsupp:
> +	blk_mq_end_request(req, BLK_STS_NOTSUPP);
> +ret:
> +	percpu_up_read(&nvme_debug_sem);
> +}
> +
> +static blk_status_t nvme_debug_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd)
> +{
> +	struct request *req = bd->rq;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl);
> +	struct nvme_ns *ns = hctx->queue->queuedata;
> +	blk_status_t r;
> +
> +	r = nvme_setup_cmd(ns, req);
> +	if (unlikely(r))
> +		return r;
> +
> +	if (!ns) {
> +		INIT_WORK(&ndr->work, nvme_debug_admin_rq);
> +		queue_work(ctrl->admin_wq, &ndr->work);
> +		return BLK_STS_OK;
> +	} else if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) {
> +		blk_mq_end_request(req, BLK_STS_OK);
> +		return BLK_STS_OK;
> +	} else {
> +		INIT_WORK(&ndr->work, nvme_debug_io_rq);
> +		queue_work(ctrl->io_wq, &ndr->work);
> +		return BLK_STS_OK;
> +	}
> +}
> +
> +static int nvme_debug_init_request(struct blk_mq_tag_set *set, struct request *req, unsigned hctx_idx, unsigned numa_node)
> +{
> +	struct nvme_debug_ctrl *ctrl = set->driver_data;
> +	struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req);
> +	nvme_req(req)->ctrl = &ctrl->ctrl;
> +	nvme_req(req)->cmd = &ndr->cmd;
> +	return 0;
> +}
> +
> +static int nvme_debug_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned hctx_idx)
> +{
> +	struct nvme_debug_ctrl *ctrl = data;
> +	hctx->driver_data = ctrl;
> +	return 0;
> +}
> +
> +static const struct blk_mq_ops nvme_debug_mq_ops = {
> +	.queue_rq = nvme_debug_queue_rq,
> +	.init_request = nvme_debug_init_request,
> +	.init_hctx = nvme_debug_init_hctx,
> +};
> +
> +static int nvme_debug_configure_admin_queue(struct nvme_debug_ctrl *ctrl)
> +{
> +	int r;
> +
> +	memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
> +	ctrl->admin_tag_set.ops = &nvme_debug_mq_ops;
> +	ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
> +	ctrl->admin_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
> +	ctrl->admin_tag_set.numa_node = ctrl->ctrl.numa_node;
> +	ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_debug_request);
> +	ctrl->admin_tag_set.driver_data = ctrl;
> +	ctrl->admin_tag_set.nr_hw_queues = 1;
> +	ctrl->admin_tag_set.timeout = NVME_ADMIN_TIMEOUT;
> +	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
> +
> +	r = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
> +	if (r)
> +		goto ret0;
> +	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
> +
> +	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
> +	if (IS_ERR(ctrl->ctrl.admin_q)) {
> +		r = PTR_ERR(ctrl->ctrl.admin_q);
> +		goto ret1;
> +	}
> +
> +	r = nvme_enable_ctrl(&ctrl->ctrl);
> +	if (r)
> +		goto ret2;
> +
> +	nvme_start_admin_queue(&ctrl->ctrl);
> +
> +	r = nvme_init_ctrl_finish(&ctrl->ctrl);
> +	if (r)
> +		goto ret3;
> +
> +	return 0;
> +
> +ret3:
> +	nvme_stop_admin_queue(&ctrl->ctrl);
> +	blk_sync_queue(ctrl->ctrl.admin_q);
> +ret2:
> +	blk_cleanup_queue(ctrl->ctrl.admin_q);
> +ret1:
> +	blk_mq_free_tag_set(&ctrl->admin_tag_set);
> +ret0:
> +	return r;
> +}
> +
> +static int nvme_debug_configure_io_queue(struct nvme_debug_ctrl *ctrl)
> +{
> +	int r;
> +
> +	memset(&ctrl->io_tag_set, 0, sizeof(ctrl->io_tag_set));
> +	ctrl->io_tag_set.ops = &nvme_debug_mq_ops;
> +	ctrl->io_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
> +	ctrl->io_tag_set.reserved_tags = NVMF_RESERVED_TAGS;
> +	ctrl->io_tag_set.numa_node = ctrl->ctrl.numa_node;
> +	ctrl->io_tag_set.cmd_size = sizeof(struct nvme_debug_request);
> +	ctrl->io_tag_set.driver_data = ctrl;
> +	ctrl->io_tag_set.nr_hw_queues = 1;
> +	ctrl->io_tag_set.timeout = NVME_ADMIN_TIMEOUT;
> +	ctrl->io_tag_set.flags = BLK_MQ_F_NO_SCHED;
> +
> +	r = blk_mq_alloc_tag_set(&ctrl->io_tag_set);
> +	if (r)
> +		goto ret0;
> +	ctrl->ctrl.tagset = &ctrl->io_tag_set;
> +	return 0;
> +
> +ret0:
> +	return r;
> +}
> +
> +static const struct nvme_ctrl_ops nvme_debug_ctrl_ops = {
> +	.name = "debug",
> +	.module = THIS_MODULE,
> +	.flags = NVME_F_FABRICS,
> +	.reg_read32 = nvme_debug_reg_read32,
> +	.reg_read64 = nvme_debug_reg_read64,
> +	.reg_write32 = nvme_debug_reg_write32,
> +	.free_ctrl = nvme_debug_free_ctrl,
> +	.submit_async_event = nvme_debug_submit_async_event,
> +	.delete_ctrl = nvme_debug_delete_ctrl,
> +	.get_address = nvme_debug_get_address,
> +};
> +
> +static struct nvme_ctrl *nvme_debug_create_ctrl(struct device *dev,
> +		struct nvmf_ctrl_options *opts)
> +{
> +	int r;
> +	struct nvme_debug_ctrl *ctrl;
> +
> +	ctrl = kzalloc(sizeof(struct nvme_debug_ctrl), GFP_KERNEL);
> +	if (!ctrl) {
> +		r = -ENOMEM;
> +		goto ret0;
> +	}
> +
> +	INIT_LIST_HEAD(&ctrl->list);
> +	INIT_LIST_HEAD(&ctrl->namespaces);
> +	ctrl->ctrl.opts = opts;
> +	ctrl->ctrl.queue_count = 2;
> +	INIT_WORK(&ctrl->ctrl.reset_work, nvme_debug_reset_ctrl_work);
> +
> +	ctrl->admin_wq = alloc_workqueue("nvme-debug-admin", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
> +	if (!ctrl->admin_wq)
> +		goto ret1;
> +
> +	ctrl->io_wq = alloc_workqueue("nvme-debug-io", WQ_MEM_RECLAIM, 0);
> +	if (!ctrl->io_wq)
> +		goto ret1;
> +
> +	if (!nvme_debug_alloc_namespace(ctrl)) {
> +		r = -ENOMEM;
> +		goto ret1;
> +	}
> +
> +	r = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_debug_ctrl_ops, 0);
> +	if (r)
> +		goto ret1;
> +
> +        if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
> +		goto ret2;
> +
> +	r = nvme_debug_configure_admin_queue(ctrl);
> +	if (r)
> +		goto ret2;
> +
> +	r = nvme_debug_configure_io_queue(ctrl);
> +	if (r)
> +		goto ret2;
> +
> +        if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE))
> +		goto ret2;
> +
> +	nvme_start_ctrl(&ctrl->ctrl);
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_add_tail(&ctrl->list, &nvme_debug_ctrl_list);
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +
> +	return &ctrl->ctrl;
> +
> +ret2:
> +	nvme_uninit_ctrl(&ctrl->ctrl);
> +	nvme_put_ctrl(&ctrl->ctrl);
> +	return ERR_PTR(r);
> +ret1:
> +	nvme_debug_free_namespaces(ctrl);
> +	if (ctrl->admin_wq)
> +		destroy_workqueue(ctrl->admin_wq);
> +	if (ctrl->io_wq)
> +		destroy_workqueue(ctrl->io_wq);
> +	kfree(ctrl);
> +ret0:
> +	return ERR_PTR(r);
> +}
> +
> +static struct nvmf_transport_ops nvme_debug_transport = {
> +	.name		= "debug",
> +	.module		= THIS_MODULE,
> +	.required_opts	= NVMF_OPT_TRADDR,
> +	.allowed_opts	= NVMF_OPT_CTRL_LOSS_TMO,
> +	.create_ctrl	= nvme_debug_create_ctrl,
> +};
> +
> +static int __init nvme_debug_init_module(void)
> +{
> +	nvmf_register_transport(&nvme_debug_transport);
> +	return 0;
> +}
> +
> +static void __exit nvme_debug_cleanup_module(void)
> +{
> +	struct nvme_debug_ctrl *ctrl;
> +
> +	nvmf_unregister_transport(&nvme_debug_transport);
> +
> +	mutex_lock(&nvme_debug_ctrl_mutex);
> +	list_for_each_entry(ctrl, &nvme_debug_ctrl_list, list) {
> +		nvme_delete_ctrl(&ctrl->ctrl);
> +	}
> +	mutex_unlock(&nvme_debug_ctrl_mutex);
> +	flush_workqueue(nvme_delete_wq);
> +}
> +
> +module_init(nvme_debug_init_module);
> +module_exit(nvme_debug_cleanup_module);
> +
> +MODULE_LICENSE("GPL v2");
> 




^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-01 18:33         ` [dm-devel] " Mikulas Patocka
@ 2022-02-02  7:23           ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-02  7:23 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: llvm, kbuild-all

Hi Mikulas,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc2 next-20220202]
[cannot apply to linux-nvme/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikulas-Patocka/block-add-copy-offload-support/20220202-033318
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-allmodconfig (https://download.01.org/0day-ci/archive/20220202/202202021549.3g79sGNE-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b59e9d971770c50aaac1d07d6edaa10a4a47171b
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikulas-Patocka/block-add-copy-offload-support/20220202-033318
        git checkout b59e9d971770c50aaac1d07d6edaa10a4a47171b
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/media/cec/platform/seco/ drivers/net/ethernet/cavium/thunder/ drivers/net/wireless/ath/ drivers/nvme/host/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/nvme/host/nvme-debug.c:148:5: warning: no previous prototype for function 'nvme_debug_reg_read64' [-Wmissing-prototypes]
   int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
       ^
   drivers/nvme/host/nvme-debug.c:148:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
   ^
   static 
>> drivers/nvme/host/nvme-debug.c:163:5: warning: no previous prototype for function 'nvme_debug_reg_write32' [-Wmissing-prototypes]
   int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
       ^
   drivers/nvme/host/nvme-debug.c:163:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
   ^
   static 
>> drivers/nvme/host/nvme-debug.c:758:6: warning: variable 'r' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (!ctrl->io_wq)
               ^~~~~~~~~~~~
   drivers/nvme/host/nvme-debug.c:804:17: note: uninitialized use occurs here
           return ERR_PTR(r);
                          ^
   drivers/nvme/host/nvme-debug.c:758:2: note: remove the 'if' if its condition is always false
           if (!ctrl->io_wq)
           ^~~~~~~~~~~~~~~~~
   drivers/nvme/host/nvme-debug.c:754:6: warning: variable 'r' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (!ctrl->admin_wq)
               ^~~~~~~~~~~~~~~
   drivers/nvme/host/nvme-debug.c:804:17: note: uninitialized use occurs here
           return ERR_PTR(r);
                          ^
   drivers/nvme/host/nvme-debug.c:754:2: note: remove the 'if' if its condition is always false
           if (!ctrl->admin_wq)
           ^~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/nvme-debug.c:738:7: note: initialize the variable 'r' to silence this warning
           int r;
                ^
                 = 0
   4 warnings generated.


vim +/nvme_debug_reg_read64 +148 drivers/nvme/host/nvme-debug.c

   147	
 > 148	int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val)
   149	{
   150		switch (off) {
   151			case NVME_REG_CAP: {
   152				*val = (1ULL << 0) | (1ULL << 37);
   153				break;
   154			}
   155			default: {
   156				printk("nvme_debug_reg_read64: %x\n", off);
   157				return -ENOSYS;
   158			}
   159		}
   160		return 0;
   161	}
   162	
 > 163	int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val)
   164	{
   165		struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl);
   166		switch (off) {
   167			case NVME_REG_CC: {
   168				ctrl->reg_cc = val;
   169				break;
   170			}
   171			default: {
   172				printk("nvme_debug_reg_write32: %x, %x\n", off, val);
   173				return -ENOSYS;
   174			}
   175		}
   176		return 0;
   177	}
   178	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-01 18:33         ` [dm-devel] " Mikulas Patocka
@ 2022-02-02  8:00           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-02  8:00 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme,
	Javier González, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Mikulas,

On 2/1/22 10:33 AM, Mikulas Patocka wrote:
> External email: Use caution opening links or attachments
> 
> 
> This patch adds a new driver "nvme-debug". It uses memory as a backing
> store and it is used to test the copy offload functionality.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 


NVMe-controller-specific, memory-backed features that are targeted at
testing and debugging need to go into QEMU rather than the kernel, just
as we did for NVMe ZNS support in QEMU.

I don't see any special reason to make copy offload an exception.

-ck


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-02  8:00           ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-02 12:38             ` Klaus Jensen
  -1 siblings, 0 replies; 272+ messages in thread
From: Klaus Jensen @ 2022-02-02 12:38 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Mikulas Patocka, linux-block, linux-scsi, dm-devel, linux-nvme,
	Javier González, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Feb  2 08:00, Chaitanya Kulkarni wrote:
> Mikulas,
> 
> On 2/1/22 10:33 AM, Mikulas Patocka wrote:
> > This patch adds a new driver "nvme-debug". It uses memory as a backing
> > store and it is used to test the copy offload functionality.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> 
> 
> NVMe Controller specific memory backed features needs to go into
> QEMU which are targeted for testing and debugging, just like what
> we have done for NVMe ZNS QEMU support and not in kernel.
> 
> I don't see any special reason to make copy offload an exception.
> 

FWIW the emulated nvme device in QEMU already supports the Copy command.


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-01 18:32         ` [dm-devel] " Mikulas Patocka
@ 2022-02-02 16:21           ` Keith Busch
  -1 siblings, 0 replies; 272+ messages in thread
From: Keith Busch @ 2022-02-02 16:21 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On Tue, Feb 01, 2022 at 01:32:29PM -0500, Mikulas Patocka wrote:
> +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> +		      struct block_device *bdev2, sector_t sector2,
> +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask)
> +{
> +	struct page *token;
> +	sector_t m;
> +	int r = 0;
> +	struct completion comp;
> +
> +	*copied = 0;
> +
> +	m = min(bdev_max_copy_sectors(bdev1), bdev_max_copy_sectors(bdev2));
> +	if (!m)
> +		return -EOPNOTSUPP;
> +	m = min(m, (sector_t)round_down(UINT_MAX, PAGE_SIZE) >> 9);
> +
> +	if (unlikely(bdev_read_only(bdev2)))
> +		return -EPERM;
> +
> +	token = alloc_page(gfp_mask);
> +	if (unlikely(!token))
> +		return -ENOMEM;
> +
> +	while (nr_sects) {
> +		struct bio *read_bio, *write_bio;
> +		sector_t this_step = min(nr_sects, m);
> +
> +		read_bio = bio_alloc(gfp_mask, 1);
> +		if (unlikely(!read_bio)) {
> +			r = -ENOMEM;
> +			break;
> +		}
> +		bio_set_op_attrs(read_bio, REQ_OP_COPY_READ_TOKEN, REQ_NOMERGE);
> +		bio_set_dev(read_bio, bdev1);
> +		__bio_add_page(read_bio, token, PAGE_SIZE, 0);

You have this "token" payload as driver specific data, but there's no
check that bdev1 and bdev2 subscribe to the same driver specific format.

I thought we discussed defining something like a "copy domain" that
establishes which block devices can offload copy operations to/from each
other, and that should be checked before proceeding with the copy
operation.
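
As an illustration of that kind of gating, a rough sketch; struct
blk_copy_domain and bdev_copy_domain() are hypothetical names here, not
existing block layer API:

	/* Hypothetical: an opaque per-device cookie set by the driver. */
	struct blk_copy_domain;
	struct blk_copy_domain *bdev_copy_domain(struct block_device *bdev);

	/* Early in blkdev_issue_copy(), before any token is allocated: */
	static int blkdev_check_copy_domain(struct block_device *bdev1,
					    struct block_device *bdev2)
	{
		struct blk_copy_domain *d1 = bdev_copy_domain(bdev1);
		struct blk_copy_domain *d2 = bdev_copy_domain(bdev2);

		/* No domain at all means the device cannot offload. */
		if (!d1 || d1 != d2)
			return -EOPNOTSUPP;
		return 0;
	}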

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-02 16:21           ` [dm-devel] " Keith Busch
@ 2022-02-02 16:40             ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-02 16:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi



On Wed, 2 Feb 2022, Keith Busch wrote:

> On Tue, Feb 01, 2022 at 01:32:29PM -0500, Mikulas Patocka wrote:
> > +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> > +		      struct block_device *bdev2, sector_t sector2,
> > +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask)
> > +{
> > +	struct page *token;
> > +	sector_t m;
> > +	int r = 0;
> > +	struct completion comp;
> > +
> > +	*copied = 0;
> > +
> > +	m = min(bdev_max_copy_sectors(bdev1), bdev_max_copy_sectors(bdev2));
> > +	if (!m)
> > +		return -EOPNOTSUPP;
> > +	m = min(m, (sector_t)round_down(UINT_MAX, PAGE_SIZE) >> 9);
> > +
> > +	if (unlikely(bdev_read_only(bdev2)))
> > +		return -EPERM;
> > +
> > +	token = alloc_page(gfp_mask);
> > +	if (unlikely(!token))
> > +		return -ENOMEM;
> > +
> > +	while (nr_sects) {
> > +		struct bio *read_bio, *write_bio;
> > +		sector_t this_step = min(nr_sects, m);
> > +
> > +		read_bio = bio_alloc(gfp_mask, 1);
> > +		if (unlikely(!read_bio)) {
> > +			r = -ENOMEM;
> > +			break;
> > +		}
> > +		bio_set_op_attrs(read_bio, REQ_OP_COPY_READ_TOKEN, REQ_NOMERGE);
> > +		bio_set_dev(read_bio, bdev1);
> > +		__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> 
> You have this "token" payload as driver specific data, but there's no
> check that bdev1 and bdev2 subscribe to the same driver specific format.
> 
> I thought we discussed defining something like a "copy domain" that
> establishes which block devices can offload copy operations to/from each
> other, and that should be checked before proceeding with the copy
> operation.

There is nvme_setup_read_token that fills in the token:
	memcpy(token->subsys, "nvme", 4);
	token->ns = ns;
	token->src_sector = bio->bi_iter.bi_sector;
	token->sectors = bio->bi_iter.bi_size >> 9;

There is nvme_setup_write_token that checks these values:
	if (unlikely(memcmp(token->subsys, "nvme", 4)))
		return BLK_STS_NOTSUPP;
	if (unlikely(token->ns != ns))
		return BLK_STS_NOTSUPP;

So, if we attempt to copy data between the nvme subsystem and the scsi 
subsystem, the "subsys" check will fail. If we attempt to copy data 
between different nvme namespaces, the "ns" check will fail.
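
(For reference, the token layout implied by the two snippets above would
be something like the struct below; the exact definition in the patch
may differ, and the field types are assumed:)

	struct nvme_copy_token {
		char		subsys[4];	/* "nvme"; rejects cross-subsystem copies */
		struct nvme_ns	*ns;		/* source namespace; rejects cross-ns copies */
		sector_t	src_sector;	/* start of the source LBA range */
		sector_t	sectors;	/* length of the source range */
	};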

If the nvme standard gets extended with cross-namespace copies, we can 
check in nvme_setup_write_token if we can copy between the source and 
destination namespace.

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [RFC PATCH 1/3] block: add copy offload support
  2022-02-02 16:21           ` [dm-devel] " Keith Busch
@ 2022-02-02 18:40             ` Knight, Frederick
  -1 siblings, 0 replies; 272+ messages in thread
From: Knight, Frederick @ 2022-02-02 18:40 UTC (permalink / raw)
  To: Keith Busch, Mikulas Patocka
  Cc: Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Just FYI about copy domains.

SCSI already has copy domains defined (they are optional in the standard): the Third-party Copy Descriptor with its Copy Group Identifier. Here's the SCSI text:

A Copy Group is a set of logical units that have a high probability of using high performance methods (e.g.,
copy on write snapshot) for third-party copy operations involving logical units that are in the same Copy
Group. Each logical unit may be a member of zero or one Copy Group. A logical unit indicates membership in
a Copy Group using the Copy Group Identifier third-party copy descriptor (see 7.7.18.13).

If a third-party copy operation involves logical units that are in different Copy Groups, then that third-party copy
operation has a high probability of using low performance methods (e.g., copy manager read operations from
the source CSCD or copy manager write operations to the destination CSCD).

NVMe today can only copy between LBAs on the SAME namespace.  There is a new project just getting started to allow copies across namespaces in the same NVM subsystem (part of that project is defining how the copy domains will work).  So for NVMe, copy domains are still a work in progress.
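
To make that concrete, here is a sketch of the membership check a copy
manager could perform; the structure and field names are illustrative
only, not the actual SPC descriptor layout:

	/* Illustrative only: Copy Group membership of one logical unit. */
	struct copy_group_id {
		u8 len;		/* 0 means the LU is in no Copy Group */
		u8 id[16];	/* identifier bytes, 'len' of them valid */
	};

	static bool same_copy_group(const struct copy_group_id *a,
				    const struct copy_group_id *b)
	{
		if (!a->len || a->len != b->len)
			return false;	/* expect the low performance path */
		return !memcmp(a->id, b->id, a->len);
	}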

Just FYI.

	Fred

-----Original Message-----
From: Keith Busch <kbusch@kernel.org> 
Sent: Wednesday, February 2, 2022 11:22 AM
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Javier González <javier@javigon.com>; Chaitanya Kulkarni <chaitanyak@nvidia.com>; linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; dm-devel@redhat.com; linux-nvme@lists.infradead.org; linux-fsdevel <linux-fsdevel@vger.kernel.org>; Jens Axboe <axboe@kernel.dk>; msnitzer@redhat.com >> msnitzer@redhat.com <msnitzer@redhat.com>; Bart Van Assche <bvanassche@acm.org>; martin.petersen@oracle.com >> Martin K. Petersen <martin.petersen@oracle.com>; roland@purestorage.com; Hannes Reinecke <hare@suse.de>; Christoph Hellwig <hch@lst.de>; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; osandov@fb.com; lsf-pc@lists.linux-foundation.org; djwong@kernel.org; josef@toxicpanda.com; clm@fb.com; dsterba@suse.com; tytso@mit.edu; jack@suse.com; Kanchan Joshi <joshi.k@samsung.com>
Subject: Re: [RFC PATCH 1/3] block: add copy offload support

On Tue, Feb 01, 2022 at 01:32:29PM -0500, Mikulas Patocka wrote:
> +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> +                   struct block_device *bdev2, sector_t sector2,
> +                   sector_t nr_sects, sector_t *copied, gfp_t 
> +gfp_mask) {
> +     struct page *token;
> +     sector_t m;
> +     int r = 0;
> +     struct completion comp;
> +
> +     *copied = 0;
> +
> +     m = min(bdev_max_copy_sectors(bdev1), bdev_max_copy_sectors(bdev2));
> +     if (!m)
> +             return -EOPNOTSUPP;
> +     m = min(m, (sector_t)round_down(UINT_MAX, PAGE_SIZE) >> 9);
> +
> +     if (unlikely(bdev_read_only(bdev2)))
> +             return -EPERM;
> +
> +     token = alloc_page(gfp_mask);
> +     if (unlikely(!token))
> +             return -ENOMEM;
> +
> +     while (nr_sects) {
> +             struct bio *read_bio, *write_bio;
> +             sector_t this_step = min(nr_sects, m);
> +
> +             read_bio = bio_alloc(gfp_mask, 1);
> +             if (unlikely(!read_bio)) {
> +                     r = -ENOMEM;
> +                     break;
> +             }
> +             bio_set_op_attrs(read_bio, REQ_OP_COPY_READ_TOKEN, REQ_NOMERGE);
> +             bio_set_dev(read_bio, bdev1);
> +             __bio_add_page(read_bio, token, PAGE_SIZE, 0);

You have this "token" payload as driver specific data, but there's no check that bdev1 and bdev2 subscribe to the same driver specific format.

I thought we discussed defining something like a "copy domain" that establishes which block devices can offload copy operations to/from each other, and that should be checked before proceeding with the copy operation.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-02  8:00           ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-03 15:38             ` Luis Chamberlain
  -1 siblings, 0 replies; 272+ messages in thread
From: Luis Chamberlain @ 2022-02-03 15:38 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Mikulas Patocka, linux-block, linux-scsi, dm-devel, linux-nvme,
	Javier González, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Wed, Feb 02, 2022 at 08:00:12AM +0000, Chaitanya Kulkarni wrote:
> Mikulas,
> 
> On 2/1/22 10:33 AM, Mikulas Patocka wrote:
> > This patch adds a new driver "nvme-debug". It uses memory as a backing
> > store and it is used to test the copy offload functionality.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> 
> 
> > NVMe controller specific memory backed features that are targeted at
> > testing and debugging need to go into QEMU, just like what we have done
> > for NVMe ZNS QEMU support, and not into the kernel.
> 
> I don't see any special reason to make copy offload an exception.

One can instantiate scsi devices with qemu by using fake scsi devices,
but one can also just use scsi_debug to do the same. I see both efforts
as desirable, so long as someone maintains this.

For instance, blktests uses scsi_debug for simplicity.

In the end you decide what you want to use.

  Luis

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-02  6:01           ` [dm-devel] " Adam Manzanares
@ 2022-02-03 16:06             ` Luis Chamberlain
  -1 siblings, 0 replies; 272+ messages in thread
From: Luis Chamberlain @ 2022-02-03 16:06 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Mikulas Patocka, Javier González, Chaitanya Kulkarni,
	linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> BTW I think having the target code be able to implement simple copy without 
> moving data over the fabric would be a great way of showing off the command.

Do you mean this should be implemented as a fabrics backend instead,
because fabrics already instantiates and creates a virtual nvme device?
And so this would mean less code?

  Luis

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 16:06             ` [dm-devel] " Luis Chamberlain
@ 2022-02-03 16:15               ` Christoph Hellwig
  -1 siblings, 0 replies; 272+ messages in thread
From: Christoph Hellwig @ 2022-02-03 16:15 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Adam Manzanares, Mikulas Patocka, Javier González,
	Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
> On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> > BTW I think having the target code be able to implement simple copy without 
> > moving data over the fabric would be a great way of showing off the command.
> 
> Do you mean this should be implemented as a fabrics backend instead,
> because fabrics already instantiates and creates a virtual nvme device?
> And so this would mean less code?

It would be a lot less code.  In fact I don't think we need any new code
at all.  Just using nvme-loop on top of null_blk or brd should be all
that is needed.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 15:38             ` [dm-devel] " Luis Chamberlain
@ 2022-02-03 16:52               ` Keith Busch
  -1 siblings, 0 replies; 272+ messages in thread
From: Keith Busch @ 2022-02-03 16:52 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Chaitanya Kulkarni, Mikulas Patocka, linux-block, linux-scsi,
	dm-devel, linux-nvme, Javier González, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On Thu, Feb 03, 2022 at 07:38:43AM -0800, Luis Chamberlain wrote:
> On Wed, Feb 02, 2022 at 08:00:12AM +0000, Chaitanya Kulkarni wrote:
> > Mikulas,
> > 
> > On 2/1/22 10:33 AM, Mikulas Patocka wrote:
> > > This patch adds a new driver "nvme-debug". It uses memory as a backing
> > > store and it is used to test the copy offload functionality.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > 
> > 
> > 
> > NVMe controller specific memory backed features that are targeted at
> > testing and debugging need to go into QEMU, just like what we have done
> > for NVMe ZNS QEMU support, and not into the kernel.
> > 
> > I don't see any special reason to make copy offload an exception.
> 
> One can instantiate scsi devices with qemu by using fake scsi devices,
> but one can also just use scsi_debug to do the same. I see both efforts
> as desirable, so long as someone maintains this.
> 
> For instance, blktests uses scsi_debug for simplicity.
> 
> In the end you decide what you want to use.

Can we use the nvme-loop target instead?

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-01 19:18           ` [dm-devel] " Bart Van Assche
@ 2022-02-03 18:50             ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-03 18:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel



On Tue, 1 Feb 2022, Bart Van Assche wrote:

> On 2/1/22 10:32, Mikulas Patocka wrote:
> >   /**
> > + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the
> > queue
> > + * @q:  the request queue for the device
> > + * @size:  the maximum copy offload sectors
> > + */
> > +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
> > +{
> > +	q->limits.max_copy_sectors = size;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> 
> Please either change the unit of 'size' into bytes or change its type into
> sector_t.

blk_queue_chunk_sectors, blk_queue_max_discard_sectors, 
blk_queue_max_write_same_sectors, blk_queue_max_write_zeroes_sectors, 
blk_queue_max_zone_append_sectors also have the unit of sectors and the 
argument is "unsigned int". Should blk_queue_max_copy_sectors be 
different?

> > +extern int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> > +		      struct block_device *bdev2, sector_t sector2,
> > +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask);
> > +
> 
> Only supporting copying between contiguous LBA ranges seems restrictive to me.
> I expect garbage collection by filesystems for UFS devices to perform better
> if multiple LBA ranges are submitted as a single SCSI XCOPY command.

NVMe has the ability to copy multiple source ranges into one destination 
range. But I think that leveraging this capability would just make the 
code excessively complex.

> A general comment about the approach: encoding the LBA range information in a
> bio payload is not compatible with bio splitting. How can the dm driver
> implement copy offloading without the ability to split copy offload bio's?

I don't expect the copy bios to be split.

One possibility is to just return -EOPNOTSUPP and fall back to explicit 
copy if the bio crosses a dm target boundary (that's what my previous patch 
for SCSI XCOPY did).

Another possibility is to return the split boundary in the token and retry 
both bios with smaller lengths. But this approach would prevent us from 
submitting the REQ_OP_COPY_WRITE_TOKEN bios asynchronously.

I'm not sure which of these two approaches is better.
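
From the caller's side the first option would look roughly like this (a
sketch; copy_fallback_rw() is a stand-in for an explicit read/write copy
loop, not an existing helper):

	r = blkdev_issue_copy(bdev1, sector1, bdev2, sector2,
			      nr_sects, &copied, GFP_KERNEL);
	if (r == -EOPNOTSUPP) {
		/*
		 * The stack (e.g. a dm target boundary) refused the
		 * offloaded copy; do it the slow way with ordinary
		 * READs and WRITEs.
		 */
		r = copy_fallback_rw(bdev1, sector1, bdev2, sector2,
				     nr_sects, &copied);
	}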

> > +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1,
> > +		      struct block_device *bdev2, sector_t sector2,
> > +		      sector_t nr_sects, sector_t *copied, gfp_t gfp_mask)
> > +{
> > +	struct page *token;
> > +	sector_t m;
> > +	int r = 0;
> > +	struct completion comp;
> 
> Consider using DECLARE_COMPLETION_ONSTACK() instead of a separate declaration
> and init_completion() call.

OK.

> Thanks,
> 
> Bart.

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 16:15               ` [dm-devel] " Christoph Hellwig
@ 2022-02-03 19:34                 ` Luis Chamberlain
  -1 siblings, 0 replies; 272+ messages in thread
From: Luis Chamberlain @ 2022-02-03 19:34 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Adam Manzanares, Vincent Fu, Mikulas Patocka,
	Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Frederick.Knight, zach.brown, osandov, lsf-pc, djwong, josef,
	clm, dsterba, tytso, jack, Kanchan Joshi

On Thu, Feb 03, 2022 at 05:15:34PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
> > On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> > > BTW I think having the target code be able to implement simple copy without 
> > > moving data over the fabric would be a great way of showing off the command.
> > 
> > Do you mean this should be implemented as a fabrics backend instead,
> > because fabrics already instantiates and creates a virtual nvme device?
> > And so this would mean less code?
> 
> It would be a lot less code.  In fact I don't think we need any new code
> at all.  Just using nvme-loop on top of null_blk or brd should be all
> that is needed.

Mikulas,

That begs the question why add this instead of using null_blk with
nvme-loop?

  Luis

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 19:34                 ` [dm-devel] " Luis Chamberlain
@ 2022-02-03 19:46                   ` Adam Manzanares
  -1 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-02-03 19:46 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, Keith Busch, Vincent Fu, Mikulas Patocka,
	Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Thu, Feb 03, 2022 at 11:34:27AM -0800, Luis Chamberlain wrote:
> On Thu, Feb 03, 2022 at 05:15:34PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
> > > On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> > > > BTW I think having the target code be able to implement simple copy without 
> > > > moving data over the fabric would be a great way of showing off the command.
> > > 
> > > Do you mean this should be implemented as a fabrics backend instead,
> > > because fabrics already instantiates and creates a virtual nvme device?
> > > And so this would mean less code?
> > 
> > It would be a lot less code.  In fact I don't think we need any new code
> > at all.  Just using nvme-loop on top of null_blk or brd should be all
> > that is needed.
> 
> Mikulas,
> 
> That begs the question why add this instead of using null_blk with
> nvme-loop?

I think the only thing missing would be a handler for the simple copy command.

> 
>   Luis

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 16:52               ` [dm-devel] " Keith Busch
@ 2022-02-03 19:50                 ` Adam Manzanares
  -1 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-02-03 19:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: Luis Chamberlain, Chaitanya Kulkarni, Mikulas Patocka,
	linux-block, linux-scsi, dm-devel, linux-nvme,
	Javier González, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On Thu, Feb 03, 2022 at 08:52:38AM -0800, Keith Busch wrote:
> On Thu, Feb 03, 2022 at 07:38:43AM -0800, Luis Chamberlain wrote:
> > On Wed, Feb 02, 2022 at 08:00:12AM +0000, Chaitanya Kulkarni wrote:
> > > Mikulas,
> > > 
> > > On 2/1/22 10:33 AM, Mikulas Patocka wrote:
> > > > This patch adds a new driver "nvme-debug". It uses memory as a backing
> > > > store and it is used to test the copy offload functionality.
> > > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > 
> > > 
> > > 
> > > NVMe controller specific memory backed features that are targeted at
> > > testing and debugging need to go into QEMU, just like what we have done
> > > for NVMe ZNS QEMU support, and not into the kernel.
> > > 
> > > I don't see any special reason to make copy offload an exception.
> > 
> > One can instantiate scsi devices with qemu by using fake scsi devices,
> > but one can also just use scsi_debug to do the same. I see both efforts
> > as desirable, so long as someone maintains this.
> > 
> > For instance, blktests uses scsi_debug for simplicity.
> > 
> > In the end you decide what you want to use.
> 
> Can we use the nvme-loop target instead?

I am advocating for this approach as well. It presents a virtual nvme
controller already.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-03 18:50             ` [dm-devel] " Mikulas Patocka
@ 2022-02-03 20:11               ` Keith Busch
  -1 siblings, 0 replies; 272+ messages in thread
From: Keith Busch @ 2022-02-03 20:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Bart Van Assche, linux-block, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel

On Thu, Feb 03, 2022 at 01:50:06PM -0500, Mikulas Patocka wrote:
> On Tue, 1 Feb 2022, Bart Van Assche wrote:
> > Only supporting copying between contiguous LBA ranges seems restrictive to me.
> > I expect garbage collection by filesystems for UFS devices to perform better
> > if multiple LBA ranges are submitted as a single SCSI XCOPY command.
> 
> NVMe has the ability to copy multiple source ranges into one destination 
> range. But I think that leveraging this capability would just make the 
> code excessively complex.

The point is to defrag discontiguous blocks into a single range. The
capability loses a major value proposition without multiple sources.
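
A multi-source interface along those lines might look like this (purely
a sketch of a possible signature, not code anyone has posted):

	/* One contiguous source extent; several feed one destination. */
	struct copy_range {
		sector_t sector;
		sector_t nr_sects;
	};

	/*
	 * Sketch: gather nr_ranges discontiguous extents from bdev1 and
	 * write them back-to-back at dst_sector on bdev2, which is the
	 * garbage collection / defragmentation case described above.
	 */
	int blkdev_issue_copy_ranges(struct block_device *bdev1,
				     const struct copy_range *ranges,
				     int nr_ranges,
				     struct block_device *bdev2,
				     sector_t dst_sector,
				     sector_t *copied, gfp_t gfp_mask);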

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 19:34                 ` [dm-devel] " Luis Chamberlain
@ 2022-02-03 20:57                   ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-03 20:57 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, Keith Busch, Adam Manzanares, Vincent Fu,
	Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi



On Thu, 3 Feb 2022, Luis Chamberlain wrote:

> On Thu, Feb 03, 2022 at 05:15:34PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
> > > On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> > > > BTW I think having the target code be able to implement simple copy without 
> > > > moving data over the fabric would be a great way of showing off the command.
> > > 
> > > Do you mean this should be implemented instead as a fabrics backend
> > > instead because fabrics already instantiates and creates a virtual
> > > nvme device? And so this would mean less code?
> > 
> > It would be a lot less code.  In fact I don't think we need any new code
> > at all.  Just using nvme-loop on top of null_blk or brd should be all
> > that is needed.
> 
> Mikulas,
> 
> That begs the question why add this instead of using null_blk with
> nvme-loop?
> 
>   Luis

I think that nvme-debug (the patch 3) doesn't have to be added to the 
kernel.

Nvme-debug was an old student project that was canceled. I used it because 
it was very easy to add copy offload functionality to it - adding this 
capability took just one function with 43 lines of code (nvme_debug_copy).

I don't know whether anyone is interested in continuing the development of
nvme-debug. If so, I can continue it; if not, we can just drop it.

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-03 18:50             ` [dm-devel] " Mikulas Patocka
@ 2022-02-03 22:49               ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2022-02-03 22:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel

On 2/3/22 10:50, Mikulas Patocka wrote:
> On Tue, 1 Feb 2022, Bart Van Assche wrote:
>> On 2/1/22 10:32, Mikulas Patocka wrote:
>>>    /**
>>> + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue
>>> + * @q:  the request queue for the device
>>> + * @size:  the maximum copy offload sectors
>>> + */
>>> +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
>>> +{
>>> +	q->limits.max_copy_sectors = size;
>>> +}
>>> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
>>
>> Please either change the unit of 'size' into bytes or change its type into
>> sector_t.
> 
> blk_queue_chunk_sectors, blk_queue_max_discard_sectors,
> blk_queue_max_write_same_sectors, blk_queue_max_write_zeroes_sectors,
> blk_queue_max_zone_append_sectors also have the unit of sectors and the
> argument is "unsigned int". Should blk_queue_max_copy_sectors be
> different?

As far as I know, using the type sector_t for variables that represent a
number of sectors is a widely followed convention:

$ git grep -w sector_t | wc -l
2575

I would appreciate it if that convention were used consistently,
even if that means modifying existing code.
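
(For illustration, the setter with the suggested signature would read as
below; this is a sketch against the RFC's proposed max_copy_sectors
limit, which is not an upstream queue_limits field, and the field type
would have to change along with the argument:)

  void blk_queue_max_copy_sectors(struct request_queue *q, sector_t size)
  {
  	q->limits.max_copy_sectors = size;
  }
  EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);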

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 20:57                   ` [dm-devel] " Mikulas Patocka
@ 2022-02-03 22:52                     ` Adam Manzanares
  -1 siblings, 0 replies; 272+ messages in thread
From: Adam Manzanares @ 2022-02-03 22:52 UTC (permalink / raw)
  To: Mikulas Patocka, joshi.k
  Cc: Luis Chamberlain, Christoph Hellwig, Keith Busch, Vincent Fu,
	Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Thu, Feb 03, 2022 at 03:57:55PM -0500, Mikulas Patocka wrote:
> 
> 
> On Thu, 3 Feb 2022, Luis Chamberlain wrote:
> 
> > On Thu, Feb 03, 2022 at 05:15:34PM +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
> > > > On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
> > > > > BTW I think having the target code be able to implement simple copy without 
> > > > > moving data over the fabric would be a great way of showing off the command.
> > > > 
> > > > Do you mean this should be implemented instead as a fabrics backend
> > > > instead because fabrics already instantiates and creates a virtual
> > > > nvme device? And so this would mean less code?
> > > 
> > > It would be a lot less code.  In fact I don't think we need any new code
> > > at all.  Just using nvme-loop on top of null_blk or brd should be all
> > > that is needed.
> > 
> > Mikulas,
> > 
> > That begs the question why add this instead of using null_blk with
> > nvme-loop?
> > 
> >   Luis
> 
> I think that nvme-debug (the patch 3) doesn't have to be added to the 
> kernel.
> 
> Nvme-debug was an old student project that was canceled. I used it because 
> it was very easy to add copy offload functionality to it - adding this 
> capability took just one function with 43 lines of code (nvme_debug_copy).
>

BTW, Kanchan's group has looked at adding copy offload support to the target.
I'll let him respond on the timing of upstreaming; I'm under the assumption
that it is also a relatively small patch to the target code.

> I don't know if someone is interested in continuing the development of 
> nvme-debug. If yes, I can continue the development, if not, we can just 
> drop it.
> 
> Mikulas
> 

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 20:57                   ` [dm-devel] " Mikulas Patocka
@ 2022-02-04  3:00                     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-04  3:00 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Christoph Hellwig, Keith Busch, Adam Manzanares, Vincent Fu,
	Javier González, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi,
	Luis Chamberlain

Mikulas,

On 2/3/22 12:57, Mikulas Patocka wrote:
> 
> 
> On Thu, 3 Feb 2022, Luis Chamberlain wrote:
> 
>> On Thu, Feb 03, 2022 at 05:15:34PM +0100, Christoph Hellwig wrote:
>>> On Thu, Feb 03, 2022 at 08:06:33AM -0800, Luis Chamberlain wrote:
>>>> On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
>>>>> BTW I think having the target code be able to implement simple copy without
>>>>> moving data over the fabric would be a great way of showing off the command.
>>>>
>>>> Do you mean this should be implemented instead as a fabrics backend
>>>> instead because fabrics already instantiates and creates a virtual
>>>> nvme device? And so this would mean less code?
>>>
>>> It would be a lot less code.  In fact I don't think we need any new code
>>> at all.  Just using nvme-loop on top of null_blk or brd should be all
>>> that is needed.
>>
>> Mikulas,
>>
>> That begs the question why add this instead of using null_blk with
>> nvme-loop?
>>
>>    Luis
> 
> I think that nvme-debug (the patch 3) doesn't have to be added to the
> kernel.
> 
> Nvme-debug was an old student project that was canceled. I used it because
> it was very easy to add copy offload functionality to it - adding this
> capability took just one function with 43 lines of code (nvme_debug_copy).
> 
> I don't know if someone is interested in continuing the development of
> nvme-debug. If yes, I can continue the development, if not, we can just
> drop it.
> 
> Mikulas
> 

Thanks for the explanation; it seems we are on the same page. We don't
want controller-specific code such as this in the NVMe tree, including
the target.

-ck



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 16:06             ` [dm-devel] " Luis Chamberlain
@ 2022-02-04  3:05               ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-04  3:05 UTC (permalink / raw)
  To: Luis Chamberlain, Adam Manzanares
  Cc: Mikulas Patocka, Javier González, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On 2/3/22 08:06, Luis Chamberlain wrote:
> On Wed, Feb 02, 2022 at 06:01:13AM +0000, Adam Manzanares wrote:
>> BTW I think having the target code be able to implement simple copy without
>> moving data over the fabric would be a great way of showing off the command.
> 
> Do you mean this should be implemented instead as a fabrics backend
> instead because fabrics already instantiates and creates a virtual
> nvme device? And so this would mean less code?
> 
>    Luis
> 

This should not be implemented in the fabrics backend at all, or even
in nvme-loop. Use QEMU for testing memory-backed features.
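
(For reference, a file-backed NVMe controller comes up in QEMU with a
minimal invocation like the sketch below; device properties vary across
QEMU versions, so check `qemu-system-x86_64 -device nvme,help` first.
Zoned namespaces and other controller features are configured through
additional nvme/nvme-ns properties in recent releases.)

  qemu-img create -f raw nvme.img 1G
  qemu-system-x86_64 -machine q35 -m 2G \
      -drive file=nvme.img,if=none,id=nvm,format=raw \
      -device nvme,serial=deadbeef,drive=nvm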

-ck



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-03 19:50                 ` [dm-devel] " Adam Manzanares
@ 2022-02-04  3:12                   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-04  3:12 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Luis Chamberlain, Mikulas Patocka, linux-block, Keith Busch,
	linux-scsi, dm-devel, linux-nvme, Javier González,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi


>>> One can instantiate scsi devices with qemu by using fake scsi devices,
>>> but one can also just use scsi_debug to do the same. I see both efforts
>>> as desirable, so long as someone maintains this.
>>>

Why do you think both efforts are desirable?

The NVMe ZNS QEMU implementation proved to work just fine for testing;
copy offload is not an exception.

>>> For instance, blktests uses scsi_debug for simplicity.
>>>
>>> In the end you decide what you want to use.
>>
>> Can we use the nvme-loop target instead?
> 
> I am advocating for this approach as well. It presents a virtual NVMe
> controller already.
> 

It does that assuming the underlying block device, such as null_blk, or
the QEMU implementation supports the required features, so as not to
bloat the NVMeOF target.

-ck



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  3:12                   ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-04  6:28                     ` Damien Le Moal
  -1 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2022-02-04  6:28 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Adam Manzanares
  Cc: Luis Chamberlain, Mikulas Patocka, linux-block, Keith Busch,
	linux-scsi, dm-devel, linux-nvme, Javier González,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On 2/4/22 12:12, Chaitanya Kulkarni wrote:
> 
>>>> One can instantiate scsi devices with qemu by using fake scsi devices,
>>>> but one can also just use scsi_debug to do the same. I see both efforts
>>>> as desirable, so long as someone maintains this.
>>>>
> 
> Why do you think both efforts are desirable ?

When testing code using the functionality, it is far easier to get said
functionality with a simple "modprobe" rather than having to set up a
VM; cf. running blktests or fstests.

So personally, I also think it would be great to have a kernel-based
emulation of copy offload. And that should be very easy to implement
with the fabric code. Then loop back onto a nullblk device and you get a
quick, easy-to-set-up copy-offload device that can even be of the ZNS
variant if you want, since nullblk supports zones.
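
(For the zoned variant, that setup really is a single modprobe; the
parameters below are mainline null_blk module options:)

  modprobe null_blk nr_devices=1 zoned=1 zone_size=256 zone_nr_conv=0
  blkzone report /dev/nullb0	# confirm the zones are exposed

(Putting /dev/nullb0 behind an nvme-loop namespace then presents a zoned
NVMe controller without a VM.)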

> 
> NVMe ZNS QEMU implementation proved to be perfect and works just
> fine for testing, copy offload is not an exception.
> 
>>>> For instance, blktests uses scsi_debug for simplicity.
>>>>
>>>> In the end you decide what you want to use.
>>>
>>> Can we use the nvme-loop target instead?
>>
>> I am advocating for this approach as well. It presents a virtual NVMe
>> controller already.
>>
> 
> It does that assuming underlying block device such as null_blk or
> QEMU implementation supports required features not to bloat the the
> NVMeOF target.
> 
> -ck
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  6:28                     ` [dm-devel] " Damien Le Moal
@ 2022-02-04  7:58                       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-04  7:58 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Luis Chamberlain, Mikulas Patocka, linux-block, Keith Busch,
	Adam Manzanares, linux-scsi, dm-devel, linux-nvme,
	Javier González, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On 2/3/22 22:28, Damien Le Moal wrote:
> On 2/4/22 12:12, Chaitanya Kulkarni wrote:
>>
>>>>> One can instantiate scsi devices with qemu by using fake scsi devices,
>>>>> but one can also just use scsi_debug to do the same. I see both efforts
>>>>> as desirable, so long as someone maintains this.
>>>>>
>>
>> Why do you think both efforts are desirable ?
> 
> When testing code using the functionality, it is far easier to get said
> functionality doing a simple "modprobe" rather than having to setup a
> VM. C.f. running blktests or fstests.
> 

Agreed on simplicity, but then why do we have QEMU implementations for
the NVMe features (e.g. ZNS, NVMe Simple Copy)? We could just build a
memory-backed NVMeOF test target for the NVMe controller features.

Also, recognizing that simplicity, I initially proposed NVMe ZNS
fabrics-based emulation over QEMU (I think I still have the initial
state machine implementation code for ZNS somewhere). That was "nacked"
for the right reason, since we had decided to go with QEMU and use it as
the primary platform for testing, so I fail to understand what has
changed, given that QEMU already supports NVMe Simple Copy.

> So personally, I also think it would be great to have a kernel-based
> emulation of copy offload. And that should be very easy to implement
> with the fabric code. Then loopback onto a nullblk device and you get a
> quick and easy to setup copy-offload device that can even be of the ZNS
> variant if you want since nullblk supports zones.
> 

One can do that by creating a null_blk-based NVMeOF target namespace;
there is no need to emulate memory-backed simple copy code in the
fabrics with nvme-loop. It is as simple as inserting the module and
configuring the namespace with nvmetcli once we have finalized the
solution for copy offload. If you remember, I already have patches for
that...
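
(For completeness, the nvmetcli flow is about this short; loop.json is a
hypothetical saved configuration describing a loop port with a
null_blk-backed namespace:)

  modprobe null_blk nr_devices=1
  nvmetcli restore loop.json
  nvme connect -t loop -n testnqn

(`nvmetcli restore` replays a previously saved target configuration; the
interactive nvmetcli shell can create the same subsystem, namespace and
port objects by hand.)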

>>
>> NVMe ZNS QEMU implementation proved to be perfect and works just
>> fine for testing, copy offload is not an exception.
>>
>>>>> For instance, blktests uses scsi_debug for simplicity.
>>>>>
>>>>> In the end you decide what you want to use.
>>>>
>>>> Can we use the nvme-loop target instead?
>>>
>>>> I am advocating for this approach as well. It presents a virtual NVMe
>>> controller already.
>>>
>>
>> It does that assuming underlying block device such as null_blk or
>> QEMU implementation supports required features not to bloat the the
>> NVMeOF target.
>>
>> -ck
>>

-ck



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  7:58                       ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-04  8:24                         ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2022-02-04  8:24 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Damien Le Moal, Luis Chamberlain, Mikulas Patocka, linux-block,
	Keith Busch, Adam Manzanares, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On 04.02.2022 07:58, Chaitanya Kulkarni wrote:
>On 2/3/22 22:28, Damien Le Moal wrote:
>> On 2/4/22 12:12, Chaitanya Kulkarni wrote:
>>>
>>>>>> One can instantiate scsi devices with qemu by using fake scsi devices,
>>>>>> but one can also just use scsi_debug to do the same. I see both efforts
>>>>>> as desirable, so long as someone maintains this.
>>>>>>
>>>
>>> Why do you think both efforts are desirable ?
>>
>> When testing code using the functionality, it is far easier to get said
>> functionality doing a simple "modprobe" rather than having to setup a
>> VM. C.f. running blktests or fstests.
>>
>
>agree on simplicity but then why do we have QEMU implementations for
>the NVMe features (e.g. ZNS, NVMe Simple Copy) ? we can just build
>memory backed NVMeOF test target for NVMe controller features.
>
>Also, recognizing the simplicity I proposed initially NVMe ZNS
>fabrics based emulation over QEMU (I think I still have initial state
>machine implementation code for ZNS somewhere), those were "nacked" for
>the right reason, since we've decided go with QEMU and use that as a
>primary platform for testing, so I failed to understand what has
>changed.. since given that QEMU already supports NVMe simple copy ...

I was not part of this conversation, but as I see it each approach gives
a benefit. QEMU is fantastic for compliance testing and I am not sure
you get the same level of command analysis anywhere else; at least not
without writing dedicated code for this in a target.

This said, when we want to test for race conditions, QEMU is very slow.
For a software-only solution, we have experimented with something
similar to the nvme-debug code that Mikulas is proposing. Adam pointed
to the nvme-loop target as an alternative, and this seems to work pretty
nicely. I do not believe many changes should be needed to support copy
offload using it.

So in my view having both is not replication, and it gives more
flexibility for validation, which I believe is always good.

>
>> So personally, I also think it would be great to have a kernel-based
>> emulation of copy offload. And that should be very easy to implement
>> with the fabric code. Then loopback onto a nullblk device and you get a
>> quick and easy to setup copy-offload device that can even be of the ZNS
>> variant if you want since nullblk supports zones.
>>
>
>One can do that with creating null_blk based NVMeOF target namespace,
>no need to emulate simple copy memory backed code in the fabrics
>with nvme-loop.. it is as simple as inserting module and configuring
>ns with nvmetcli once we have finalized the solution for copy offload.
>If you remember, I already have patches for that...
>
>>>
>>> NVMe ZNS QEMU implementation proved to be perfect and works just
>>> fine for testing, copy offload is not an exception.
>>>
>>>>>> For instance, blktests uses scsi_debug for simplicity.
>>>>>>
>>>>>> In the end you decide what you want to use.
>>>>>
>>>>> Can we use the nvme-loop target instead?
>>>>
>>>> I am advocating for this approach as well. It presents a virtual NVMe
>>>> controller already.
>>>>
>>>
>>> It does that assuming underlying block device such as null_blk or
>>> QEMU implementation supports required features not to bloat the the
>>> NVMeOF target.
>>>
>>> -ck
>>>
>
>-ck
>

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  8:24                         ` [dm-devel] " Javier González
@ 2022-02-04  9:58                           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-04  9:58 UTC (permalink / raw)
  To: Javier González
  Cc: Damien Le Moal, Luis Chamberlain, Mikulas Patocka, linux-block,
	Keith Busch, Adam Manzanares, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On 2/4/22 12:24 AM, Javier González wrote:
> On 04.02.2022 07:58, Chaitanya Kulkarni wrote:
>> On 2/3/22 22:28, Damien Le Moal wrote:
>>> On 2/4/22 12:12, Chaitanya Kulkarni wrote:
>>>>
>>>>>>> One can instantiate scsi devices with qemu by using fake scsi 
>>>>>>> devices,
>>>>>>> but one can also just use scsi_debug to do the same. I see both 
>>>>>>> efforts
>>>>>>> as desirable, so long as someone maintains this.
>>>>>>>
>>>>
>>>> Why do you think both efforts are desirable ?
>>>
>>> When testing code using the functionality, it is far easier to get said
>>> functionality doing a simple "modprobe" rather than having to setup a
>>> VM. C.f. running blktests or fstests.
>>>
>>
>> agree on simplicity but then why do we have QEMU implementations for
>> the NVMe features (e.g. ZNS, NVMe Simple Copy) ? we can just build
>> memory backed NVMeOF test target for NVMe controller features.
>>
>> Also, recognizing the simplicity I proposed initially NVMe ZNS
>> fabrics based emulation over QEMU (I think I still have initial state
>> machine implementation code for ZNS somewhere), those were "nacked" for
>> the right reason, since we've decided go with QEMU and use that as a
>> primary platform for testing, so I failed to understand what has
>> changed.. since given that QEMU already supports NVMe simple copy ...
> 
> I was not part of this conversation, but as I see it each approach gives
> a benefit. QEMU is fantastic for compliance testing and I am not sure
> you get the same level of command analysis anywhere else; at least not
> without writing dedicated code for this in a target.
> 
> This said, when we want to test for race conditions, QEMU is very slow.

Can you please elaborate on the scenario, with numbers, behind QEMU's slowness?

For race-condition testing we can build an error-injection framework
around the implementation; such hooks are present throughout the kernel.
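
(One mainline example: the block layer's fail_make_request fault
injection can hammer error paths on a null_blk or loop-attached device
without a VM. The knobs below are documented under
Documentation/fault-injection and require CONFIG_FAIL_MAKE_REQUEST:)

  echo 10  > /sys/kernel/debug/fail_make_request/probability
  echo 100 > /sys/kernel/debug/fail_make_request/times
  echo 1   > /sys/block/nullb0/make-it-fail

(Running fio on top then exercises the completion and error paths that
are hard to reach through an emulator.)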

> For a software-only solution, we have experimented with something
> similar to the nvme-debug code that Mikulas is proposing. Adam pointed to
> the nvme-loop target as an alternative and this seems to work pretty
> nicely. I do not believe there should be many changes to support copy
> offload using this.
> 

If QEMU is so inadequate, then do we need to add every big feature to
the NVMeOF test target so that we can test it better? Is that what you
are proposing? Once we implement one feature, it will be hard to nack
any new feature that people come up with under the same rationale
("QEMU is slow and race conditions are hard to test, etc.").

And if that is the case, why don't we have memory-backed ZNS emulation
in the NVMeOF target? Isn't that a bigger and more complicated feature
than Simple Copy, given that controller states and AENs are involved?

ZNS kernel code testing is also done on QEMU; I've fixed bugs in the
ZNS kernel code that were discovered on QEMU and have not seen any
issues with that. Given that the Simple Copy feature is far smaller than
ZNS, it is even less likely to suffer from the slowness and other issues
(listed above) in QEMU.

My point is that if we allow one, we will be opening the floodgates, and
we need to be careful not to bloat the code unless it is _absolutely
necessary_, which I don't think it is, based on the Simple Copy
specification.

> So in my view having both is not replication and it gives more
> flexibility for validation, which I believe is always good.
> 

-ck


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  9:58                           ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-04 11:34                             ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2022-02-04 11:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Damien Le Moal, Luis Chamberlain, Mikulas Patocka, linux-block,
	Keith Busch, Adam Manzanares, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, Christoph Hellwig, Frederick.Knight, zach.brown,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	Kanchan Joshi

On 04.02.2022 09:58, Chaitanya Kulkarni wrote:
>On 2/4/22 12:24 AM, Javier González wrote:
>> On 04.02.2022 07:58, Chaitanya Kulkarni wrote:
>>> On 2/3/22 22:28, Damien Le Moal wrote:
>>>> On 2/4/22 12:12, Chaitanya Kulkarni wrote:
>>>>>
>>>>>>>> One can instantiate scsi devices with qemu by using fake scsi
>>>>>>>> devices,
>>>>>>>> but one can also just use scsi_debug to do the same. I see both
>>>>>>>> efforts
>>>>>>>> as desirable, so long as someone maintains this.
>>>>>>>>
>>>>>
>>>>> Why do you think both efforts are desirable ?
>>>>
>>>> When testing code using the functionality, it is far easier to get said
>>>> functionality doing a simple "modprobe" rather than having to setup a
>>>> VM. C.f. running blktests or fstests.
>>>>
>>>
>>> agree on simplicity but then why do we have QEMU implementations for
>>> the NVMe features (e.g. ZNS, NVMe Simple Copy) ? we can just build
>>> memoery backed NVMeOF test target for NVMe controller features.
>>>
>>> Also, recognizing the simplicity I proposed initially NVMe ZNS
>>> fabrics based emulation over QEMU (I think I still have initial state
>>> machine implementation code for ZNS somewhere), those were "nacked" for
>>> the right reason, since we've decided go with QEMU and use that as a
>>> primary platform for testing, so I failed to understand what has
>>> changed.. since given that QEMU already supports NVMe simple copy ...
>>
>> I was not part of this conversation, but as I see it each approach gives
>> a benefit. QEMU is fantastic for compliance testing and I am not sure
>> you get the same level of command analysis anywhere else; at least not
>> without writing dedicated code for this in a target.
>>
>> This said, when we want to test for race conditions, QEMU is very slow.
>
>Can you please elaborate the scenario and numbers for slowness of QEMU?

QEMU is an emulator, not a simulator, so we will not be able to stress
the host stack the way the null_blk device does. If we want to test code
in the NVMe driver, then we need the equivalent of null_blk in NVMe. It
seems like the nvme-loop target can achieve this.

Does this answer your concern?
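
(As a concrete illustration of the kind of stress meant here, an fio run
like the sketch below against the loop-attached controller drives the
real NVMe host path at memory speed; the device name is whatever
`nvme connect` created:)

  fio --name=race --filename=/dev/nvme1n1 --ioengine=io_uring \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
      --direct=1 --time_based --runtime=60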

>
>For race conditions testing we can build error injection framework
>around the code implementation which present in kernel everywhere.

True. This is also a good way to do this.


>
>> For a software-only solution, we have experimented with something
>> similar to the nvme-debug code that Mikulas is proposing. Adam pointed to
>> the nvme-loop target as an alternative and this seems to work pretty
>> nicely. I do not believe there should be many changes to support copy
>> offload using this.
>>
>
>If QEMU is so incompetent then we need to add every big feature into
>the NVMeOF test target so that we can test it better ? is that what
>you are proposing ? since if we implement one feature, it will be
>hard to nack any new features that ppl will come up with
>same rationale "with QEMU being slow and hard to test race
>conditions etc .."

In my opinion, if people want this and are willing to maintain it, there
is a case for it.

>
>and if that is the case why we don't have ZNS NVMeOF target
>memory backed emulation ? Isn't that a bigger and more
>complicated feature than Simple Copy where controller states
>are involved with AENs ?

I think this is a good idea.

>
>ZNS kernel code testing is also done on QEMU, I've also fixed
>bugs in the ZNS kernel code which are discovered on QEMU and I've not
>seen any issues with that. Given that simple copy feature is way smaller
>than ZNS it will less likely to suffer from slowness and etc (listed
>above) in QEMU.

QEMU is super useful: it is easy to use and it helps identify many
issues. But it is for compliance, not for performance. There was an
effort to make FEMU, but that seems to be an abandoned project.

>
>my point is if we allow one, we will be opening floodgates and we need
>to be careful not to bloat the code unless it is _absolutely
>necessary_ which I don't think it is based on the simple copy
>specification.

I understand, and this is a very valid point. It seems like the
nvme-loop device can give us a lot of what we need; all the necessary
extra logic can go into null_blk, and then we do not need NVMe-specific
code.

Do you see any inconvenience with this approach?
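
For concreteness, the null_blk side of that might take a shape like
the sketch below; the handler and its signature are illustrative
assumptions about an eventual interface, not current kernel code:

/* Hypothetical sketch: memory-backed copy handling for null_blk so
 * the block-layer copy path can be exercised at memory speed.
 * None of these symbols exist in today's null_blk. */
static blk_status_t null_handle_copy(struct nullb_device *dev,
				     sector_t src_sector,
				     sector_t dst_sector,
				     unsigned int nr_sectors)
{
	/* a real implementation would move data between the pages
	 * backing the two LBA ranges; the value of the driver is in
	 * exercising the queue-limit, splitting and error paths */
	return BLK_STS_OK;
}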


>
>> So in my view having both is not replication, and it gives more
>> flexibility for validation, which I believe is always good.
>>
>

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-03 22:49               ` [dm-devel] " Bart Van Assche
@ 2022-02-04 12:09                 ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-04 12:09 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel



On Thu, 3 Feb 2022, Bart Van Assche wrote:

> On 2/3/22 10:50, Mikulas Patocka wrote:
> > On Tue, 1 Feb 2022, Bart Van Assche wrote:
> > > On 2/1/22 10:32, Mikulas Patocka wrote:
> > > >    /**
> > > > + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue
> > > > + * @q:  the request queue for the device
> > > > + * @size:  the maximum copy offload sectors
> > > > + */
> > > > +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
> > > > +{
> > > > +	q->limits.max_copy_sectors = size;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
> > > 
> > > Please either change the unit of 'size' into bytes or change its type into
> > > sector_t.
> > 
> > blk_queue_chunk_sectors, blk_queue_max_discard_sectors,
> > blk_queue_max_write_same_sectors, blk_queue_max_write_zeroes_sectors,
> > blk_queue_max_zone_append_sectors also have the unit of sectors and the
> > argument is "unsigned int". Should blk_queue_max_copy_sectors be
> > different?
> 
> As far as I know using the type sector_t for variables that represent a number
> of sectors is a widely followed convention:
> 
> $ git grep -w sector_t | wc -l
> 2575
> 
> I would appreciate it if that convention would be used consistently, even if
> that means modifying existing code.
> 
> Thanks,
> 
> Bart.

Changing the sector limit variables in struct queue_limits from unsigned 
int to sector_t would increase the size of the structure and its cache 
footprint.

And we can't send bios larger than 4GiB anyway because bi_size is 32-bit.
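
To spell out the arithmetic (a back-of-the-envelope illustration, not
a patch): bi_size is a 32-bit byte count, so one bio tops out just
under 4GiB, which is only 2^23 512-byte sectors; an unsigned int
sector limit can already express 2^32 sectors (2 TiB), roughly 512
times more than any single bio can carry.

/* Illustrative only: why a 32-bit sector limit is not the bottleneck. */
#define MAX_BIO_BYTES	(1ULL << 32)		/* bi_size is a u32 */
#define MAX_BIO_SECTORS	(MAX_BIO_BYTES >> 9)	/* 2^23 = 8388608 sectors */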

Jens, what do you think about it? Should the sector limits be sector_t?

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 1/3] block: add copy offload support
  2022-02-04 12:09                 ` [dm-devel] " Mikulas Patocka
@ 2022-02-04 13:34                   ` Jens Axboe
  -1 siblings, 0 replies; 272+ messages in thread
From: Jens Axboe @ 2022-02-04 13:34 UTC (permalink / raw)
  To: Mikulas Patocka, Bart Van Assche
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel

On 2/4/22 5:09 AM, Mikulas Patocka wrote:
> 
> 
> On Thu, 3 Feb 2022, Bart Van Assche wrote:
> 
>> On 2/3/22 10:50, Mikulas Patocka wrote:
>>> On Tue, 1 Feb 2022, Bart Van Assche wrote:
>>>> On 2/1/22 10:32, Mikulas Patocka wrote:
>>>>>    /**
>>>>> + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue
>>>>> + * @q:  the request queue for the device
>>>>> + * @size:  the maximum copy offload sectors
>>>>> + */
>>>>> +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size)
>>>>> +{
>>>>> +	q->limits.max_copy_sectors = size;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
>>>>
>>>> Please either change the unit of 'size' into bytes or change its type into
>>>> sector_t.
>>>
>>> blk_queue_chunk_sectors, blk_queue_max_discard_sectors,
>>> blk_queue_max_write_same_sectors, blk_queue_max_write_zeroes_sectors,
>>> blk_queue_max_zone_append_sectors also have the unit of sectors and the
>>> argument is "unsigned int". Should blk_queue_max_copy_sectors be
>>> different?
>>
>> As far as I know using the type sector_t for variables that represent a number
>> of sectors is a widely followed convention:
>>
>> $ git grep -w sector_t | wc -l
>> 2575
>>
>> I would appreciate it if that convention would be used consistently, even if
>> that means modifying existing code.
>>
>> Thanks,
>>
>> Bart.
> 
> Changing the sector limit variables in struct queue_limits from
> unsigned int to sector_t would increase the size of the structure and
> its cache footprint.
> 
> And we can't send bios larger than 4GiB anyway because bi_size is
> 32-bit.
> 
> Jens, what do you think about it? Should the sector limits be
> sector_t?

Why make it larger than it needs to be?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04  9:58                           ` [dm-devel] " Chaitanya Kulkarni
@ 2022-02-04 14:15                             ` Hannes Reinecke
  -1 siblings, 0 replies; 272+ messages in thread
From: Hannes Reinecke @ 2022-02-04 14:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Javier González
  Cc: Damien Le Moal, Luis Chamberlain, Mikulas Patocka, linux-block,
	Keith Busch, Adam Manzanares, linux-scsi, dm-devel, linux-nvme,
	linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On 2/4/22 10:58, Chaitanya Kulkarni wrote:
> On 2/4/22 12:24 AM, Javier González wrote:
[ .. ]
>> For a software-only solution, we have experimented with something
>> similar to the nvme-debug code that Mikulas is proposing. Adam pointed to
>> the nvme-loop target as an alternative and this seems to work pretty
>> nicely. I do not believe there should be many changes to support copy
>> offload using this.
>>
> 
> If QEMU is so incompetent, then do we need to add every big feature to
> the NVMeOF test target so that we can test it better? Is that what
> you are proposing? Because if we implement one feature, it will be
> hard to nack any new features that people will come up with using the
> same rationale: "QEMU is slow and it is hard to test race
> conditions, etc."
> 

How would you use qemu for bare-metal testing?

> And if that is the case, why don't we have memory-backed ZNS
> emulation in the NVMeOF target? Isn't that a bigger and more
> complicated feature than Simple Copy, with controller states
> and AENs involved?
> 
> ZNS kernel code testing is also done on QEMU; I've fixed bugs in the
> ZNS kernel code that were discovered on QEMU, and I've not seen any
> issues with that. Given that the Simple Copy feature is far smaller
> than ZNS, it is less likely to suffer from the QEMU slowness listed
> above.
> 
> My point is that if we allow one, we will be opening the floodgates,
> and we need to be careful not to bloat the code unless it is
> _absolutely necessary_, which I don't think it is, based on the
> Simple Copy specification.
> 

I do have a slightly different view on the nvme target code; it should
provide the necessary means to test the nvme host code.
And simple copy is one of these features, especially as it will operate
as an exploiter of the new functionality.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04 14:15                             ` [dm-devel] " Hannes Reinecke
@ 2022-02-04 14:24                               ` Keith Busch
  -1 siblings, 0 replies; 272+ messages in thread
From: Keith Busch @ 2022-02-04 14:24 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chaitanya Kulkarni, Javier González, Damien Le Moal,
	Luis Chamberlain, Mikulas Patocka, linux-block, Adam Manzanares,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Fri, Feb 04, 2022 at 03:15:02PM +0100, Hannes Reinecke wrote:
> On 2/4/22 10:58, Chaitanya Kulkarni wrote:
> 
> > And if that is the case, why don't we have memory-backed ZNS
> > emulation in the NVMeOF target? Isn't that a bigger and more
> > complicated feature than Simple Copy, with controller states
> > and AENs involved?
> > 
> > ZNS kernel code testing is also done on QEMU; I've fixed bugs in the
> > ZNS kernel code that were discovered on QEMU, and I've not seen any
> > issues with that. Given that the Simple Copy feature is far smaller
> > than ZNS, it is less likely to suffer from the QEMU slowness listed
> > above.
> > 
> > My point is that if we allow one, we will be opening the floodgates,
> > and we need to be careful not to bloat the code unless it is
> > _absolutely necessary_, which I don't think it is, based on the
> > Simple Copy specification.
> > 
> 
> I do have a slightly different view on the nvme target code; it should
> provide the necessary means to test the nvme host code.
> And simple copy is one of these features, especially as it will operate
> as an exploiter of the new functionality.

The threshold for implementing a feature in the in-kernel fabrics
target should be whether it's useful in production.

Are users interested in copying data without using fabric bandwidth?
Yes.

Does anyone want a mocked-up ZNS that has all the constraints and none
of the benefits? No.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 3/3] nvme: add the "debug" host driver
  2022-02-04 14:15                             ` [dm-devel] " Hannes Reinecke
@ 2022-02-04 16:01                               ` Christoph Hellwig
  -1 siblings, 0 replies; 272+ messages in thread
From: Christoph Hellwig @ 2022-02-04 16:01 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chaitanya Kulkarni, Javier González, Damien Le Moal,
	Luis Chamberlain, Mikulas Patocka, linux-block, Keith Busch,
	Adam Manzanares, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

On Fri, Feb 04, 2022 at 03:15:02PM +0100, Hannes Reinecke wrote:
>> ZNS kernel code testing is also done on QEMU; I've fixed bugs in the
>> ZNS kernel code that were discovered on QEMU, and I've not seen any
>> issues with that. Given that the Simple Copy feature is far smaller
>> than ZNS, it is less likely to suffer from the QEMU slowness listed
>> above.
>>
>> My point is that if we allow one, we will be opening the floodgates,
>> and we need to be careful not to bloat the code unless it is
>> _absolutely necessary_, which I don't think it is, based on the
>> Simple Copy specification.
>>
>
> I do have a slightly different view on the nvme target code; it should
> provide the necessary means to test the nvme host code.
> And simple copy is one of these features, especially as it will operate
> as an exploiter of the new functionality.

Well, in general I'd like to have every useful feature supported in
nvmet, because it serves both testing and production.  If a feature
isn't all that useful (and there are lots of those in nvme these days),
we don't need to support it in nvmet, and probably not in the host code
either.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [RFC PATCH 0/3] NVMe copy offload patches
  2022-02-01 18:31       ` [dm-devel] " Mikulas Patocka
@ 2022-02-04 19:41         ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-04 19:41 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Javier González, Chaitanya Kulkarni, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi,
	arnav.dawn, Nitesh Shetty

On Wed, Feb 2, 2022 at 12:23 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> Here I'm submitting the first version of NVMe copy offload patches as a
> request for comment. They use the token-based approach as we discussed on
> the phone call.
>
> The first patch adds generic copy offload support to the block layer - it
> adds two new bio types (REQ_OP_COPY_READ_TOKEN and
> REQ_OP_COPY_WRITE_TOKEN) and a new ioctl BLKCOPY and a kernel function
> blkdev_issue_copy.
>
> The second patch adds copy offload support to the NVMe subsystem.
>
> The third patch implements a "nvme-debug" driver - it is similar to
> "scsi-debug"; it simulates an NVMe host controller, keeps data in memory,
> and it supports copy offload according to NVMe Command Set Specification
> 1.0a. (there are no hardware or software implementations supporting copy
> offload so far, so I implemented it in nvme-debug)
>
> TODO:
> * implement copy offload in device mapper linear target
> * implement copy offload in software NVMe target driver
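
For orientation, the user-space side of the new BLKCOPY ioctl would
look roughly like the sketch below. The argument structure and field
names here are invented for illustration; the real layout is whatever
the RFC's uapi header defines.

/* Hypothetical sketch only: "struct blkcopy_args" and its fields are
 * illustrative, not the actual uapi from the RFC patches. */
#include <stdint.h>
#include <sys/ioctl.h>

struct blkcopy_args {			/* illustrative layout */
	uint64_t src_offset;		/* bytes, on the same bdev */
	uint64_t dst_offset;		/* bytes */
	uint64_t length;		/* bytes */
};

static int copy_range(int bdev_fd, uint64_t src, uint64_t dst,
		      uint64_t len)
{
	struct blkcopy_args args = {
		.src_offset = src,
		.dst_offset = dst,
		.length = len,
	};

	/* BLKCOPY comes from the RFC's uapi header */
	return ioctl(bdev_fd, BLKCOPY, &args);
}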

We had a series that adds these two elements:
https://github.com/nitesh-shetty/linux_copy_offload/tree/main/v1

Overall, the series supports:
1. Multi-source/destination interface (yes, it does add complexity,
   but the GC use-case needs it)
2. Copy-emulation at the block layer
3. dm-linear and dm-kcopyd support (for cases not requiring a split)
4. nvmet support (for block and file backends)

These patches definitely need more feedback. If the links are hard to
read, we can send another RFC instead. But before that, it would be
great to have your inputs on the path forward.

PS: The payload scheme in your series is particularly interesting and
simplifies the plumbing, so you might notice that the above patches borrow it.

> * make it possible to complete REQ_OP_COPY_WRITE_TOKEN bios asynchronously
Patch [0] supports asynchronous copy writes if a multi-dst/src payload is sent.

[0] https://github.com/nitesh-shetty/linux_copy_offload/blob/main/v1/0003-block-Add-copy-offload-support-infrastructure.patch

-- Nitesh

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-02-01 10:21   ` Javier González
@ 2022-02-07  9:57       ` Nitesh Shetty
  2022-02-07  9:57       ` [dm-devel] " Nitesh Shetty
  1 sibling, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07  9:57 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack, Kanchan Joshi

Chaitanya,

I would like to join the conversation.

Thanks,
Nitesh

On Sun, Feb 6, 2022 at 7:29 PM Javier González <javier@javigon.com> wrote:
>
> On 27.01.2022 07:14, Chaitanya Kulkarni wrote:
> >Hi,
> >
> >* Background :-
> >-----------------------------------------------------------------------
> >
> >Copy offload is a feature that allows file-systems or storage devices
> >to be instructed to copy files/logical blocks without requiring
> >involvement of the local CPU.
> >
> >With reference to the RISC-V summit keynote [1] single threaded
> >performance is limiting due to Denard scaling and multi-threaded
> >performance is slowing down due Moore's law limitations. With the rise
> >of SNIA Computation Technical Storage Working Group (TWG) [2],
> >offloading computations to the device or over the fabrics is becoming
> >popular as there are several solutions available [2]. One of the common
> >operation which is popular in the kernel and is not merged yet is Copy
> >offload over the fabrics or on to the device.
> >
> >* Problem :-
> >-----------------------------------------------------------------------
> >
> >The original work which is done by Martin is present here [3]. The
> >latest work which is posted by Mikulas [4] is not merged yet. These two
> >approaches are totally different from each other. Several storage
> >vendors discourage mixing copy offload requests with regular READ/WRITE
> >I/O. Also, the fact that the operation fails if a copy request ever
> >needs to be split as it traverses the stack it has the unfortunate
> >side-effect of preventing copy offload from working in pretty much
> >every common deployment configuration out there.
> >
> >* Current state of the work :-
> >-----------------------------------------------------------------------
> >
> >With [3] being hard to handle arbitrary DM/MD stacking without
> >splitting the command in two, one for copying IN and one for copying
> >OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> >candidate. Also, with [4] there is an unresolved problem with the
> >two-command approach about how to handle changes to the DM layout
> >between an IN and OUT operations.
> >
> >We have conducted a call with interested people late last year since
> >lack of LSFMMM and we would like to share the details with broader
> >community members.
>
> Chaitanya,
>
> I would also like to join the F2F conversation as a follow-up to the
> virtual one last year. We will have a first version of the patches
> posted in the next few weeks. This will hopefully serve as a good first
> step.
>
> Adding Kanchan to thread too.
>
> Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-02-07 10:45     ` David Disseldorp
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: David Disseldorp @ 2022-02-07 10:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	lsf-pc, djwong, josef, clm, dsterba, tytso, jack

On Thu, 27 Jan 2022 07:14:13 +0000, Chaitanya Kulkarni wrote:

> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1] single threaded
> performance is limiting due to Denard scaling and multi-threaded
> performance is slowing down due Moore's law limitations. With the rise
> of SNIA Computation Technical Storage Working Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operation which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
> 
> We have conducted a call with interested people late last year since 
> lack of LSFMMM and we would like to share the details with broader
> community members.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
> 
> With this background we have significant number of use-cases which are
> strong candidates waiting for outstanding Linux Kernel Block Layer Copy
> Offload support, so that Linux Kernel Storage subsystem can to address
> previously mentioned problems [1] and allow efficient offloading of the
> data related operations. (Such as move/copy etc.)
> 
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.

I'd like to attend this discussion. I've worked on the LIO XCOPY
implementation in drivers/target/target_core_xcopy.c and added Samba's
FSCTL_SRV_COPYCHUNK/FSCTL_DUPLICATE_EXTENTS_TO_FILE support.

Cheers, David

^ permalink raw reply	[flat|nested] 272+ messages in thread

* [PATCH v2 00/10] Add Copy offload support
       [not found]         ` <CGME20220207141901epcas5p162ec2387815be7a1fd67ce0ab7082119@epcas5p1.samsung.com>
@ 2022-02-07 14:13             ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

The patch series covers the points discussed in the November 2021 virtual
call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0].
We have covered the initially agreed requirements in this patchset.
The patchset borrows Mikulas's token-based approach for the two-bdev
implementation.


This is on top of our previous patchset v1 [1].
Overall, the series supports:

1. Driver
- NVMe Copy command (single NS), including support in nvme-target (for
	block and file backend)

2. Block layer
- Block-generic copy (REQ_COPY flag), with interface accommodating
	two block-devs, and multi-source/destination interface
- Emulation, when offload is natively absent
- dm-linear support (for cases not requiring split)

3. User-interface
- new ioctl

4. In-kernel user
- dm-kcopyd (see the usage sketch after the links below)

[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
[1] https://lore.kernel.org/linux-block/20210817101423.12367-1-selvakuma.s1@samsung.com/
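
For item 4, dm-kcopyd already gives in-kernel users an asynchronous
copy API, so offload can slot in underneath existing callers. A
minimal sketch of such a caller (error handling elided; with this
series, kcopyd can use the offload under the covers when the devices
support it and fall back to read/write emulation when they do not):

#include <linux/dm-kcopyd.h>

static void copy_done(int read_err, unsigned long write_err, void *ctx)
{
	/* invoked when the copy (offloaded or emulated) finishes */
}

static void copy_one_extent(struct dm_kcopyd_client *kc,
			    struct block_device *src_bdev, sector_t src,
			    struct block_device *dst_bdev, sector_t dst,
			    sector_t len, void *ctx)
{
	struct dm_io_region from = {
		.bdev = src_bdev, .sector = src, .count = len,
	};
	struct dm_io_region to = {
		.bdev = dst_bdev, .sector = dst, .count = len,
	};

	dm_kcopyd_copy(kc, &from, 1, &to, 0, copy_done, ctx);
}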

Arnav Dawn (1):
  nvmet: add copy command support for bdev and file ns

Nitesh Shetty (6):
  block: Introduce queue limits for copy-offload support
  block: Add copy offload support infrastructure
  block: Introduce a new ioctl for copy
  block: add emulation for copy
  dm: Add support for copy offload.
  dm: Enable copy offload for dm-linear target

SelvaKumar S (3):
  block: make bio_map_kern() non static
  nvme: add copy support
  dm kcopyd: use copy offload support

 block/blk-lib.c                   | 335 ++++++++++++++++++++++++++++++
 block/blk-map.c                   |   2 +-
 block/blk-settings.c              |   6 +
 block/blk-sysfs.c                 |  51 +++++
 block/blk.h                       |   2 +
 block/ioctl.c                     |  37 ++++
 drivers/md/dm-kcopyd.c            |  57 ++++-
 drivers/md/dm-linear.c            |   1 +
 drivers/md/dm-table.c             |  43 ++++
 drivers/md/dm.c                   |   6 +
 drivers/nvme/host/core.c          | 121 ++++++++++-
 drivers/nvme/host/nvme.h          |   7 +
 drivers/nvme/host/pci.c           |   9 +
 drivers/nvme/host/trace.c         |  19 ++
 drivers/nvme/target/admin-cmd.c   |   8 +-
 drivers/nvme/target/io-cmd-bdev.c |  66 ++++++
 drivers/nvme/target/io-cmd-file.c |  48 +++++
 include/linux/blk_types.h         |  20 ++
 include/linux/blkdev.h            |  17 ++
 include/linux/device-mapper.h     |   5 +
 include/linux/nvme.h              |  43 +++-
 include/uapi/linux/fs.h           |  23 ++
 22 files changed, 912 insertions(+), 14 deletions(-)

-- 
2.30.0-rc0


^ permalink raw reply	[flat|nested] 272+ messages in thread

* [PATCH v2 01/10] block: make bio_map_kern() non static
       [not found]             ` <CGME20220207141908epcas5p4f270c89fc32434ea8b525fa973098231@epcas5p4.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

From: SelvaKumar S <selvakuma.s1@samsung.com>

Make bio_map_kern() non-static so that the copy-offload emulation
added later in this series can use it to map kernel buffers into bios.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 block/blk-map.c        | 2 +-
 include/linux/blkdev.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 4526adde0156..c110205df0d5 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -336,7 +336,7 @@ static void bio_map_kern_endio(struct bio *bio)
  *	Map the kernel address into a bio suitable for io to a block
  *	device. Returns an error pointer in case of error.
  */
-static struct bio *bio_map_kern(struct request_queue *q, void *data,
+struct bio *bio_map_kern(struct request_queue *q, void *data,
 		unsigned int len, gfp_t gfp_mask)
 {
 	unsigned long kaddr = (unsigned long)data;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3bfc75a2a450..efed3820cbf7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1106,6 +1106,8 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, int flags,
 		struct bio **biop);
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+		gfp_t gfp_mask);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [dm-devel] [PATCH v2 01/10] block: make bio_map_kern() non static
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	SelvaKumar S, msnitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, axboe, tytso, joshi.k, martin.petersen,
	arnav.dawn, jack, linux-fsdevel, lsf-pc

From: SelvaKumar S <selvakuma.s1@samsung.com>

Make bio_map_kern() non-static so that the copy-offload emulation
added later in this series can use it to map kernel buffers into bios.

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 block/blk-map.c        | 2 +-
 include/linux/blkdev.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 4526adde0156..c110205df0d5 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -336,7 +336,7 @@ static void bio_map_kern_endio(struct bio *bio)
  *	Map the kernel address into a bio suitable for io to a block
  *	device. Returns an error pointer in case of error.
  */
-static struct bio *bio_map_kern(struct request_queue *q, void *data,
+struct bio *bio_map_kern(struct request_queue *q, void *data,
 		unsigned int len, gfp_t gfp_mask)
 {
 	unsigned long kaddr = (unsigned long)data;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3bfc75a2a450..efed3820cbf7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1106,6 +1106,8 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, int flags,
 		struct bio **biop);
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+		gfp_t gfp_mask);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
-- 
2.30.0-rc0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 02/10] block: Introduce queue limits for copy-offload support
       [not found]             ` <CGME20220207141913epcas5p4d41cb549b7cca1ede5c7a66bbd110da0@epcas5p4.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

Add device limits as sysfs entries:
        - copy_offload (READ_WRITE)
        - max_copy_sectors (READ_ONLY)
        - max_copy_range_sectors (READ_ONLY)
        - max_copy_nr_ranges (READ_ONLY)

copy_offload is disabled by default (0) and must be enabled explicitly
before copy offload can be used.
max_copy_sectors = 0 indicates that the device does not support native
copy.
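
As an illustration, user space could flip the switch with a plain sysfs
write before issuing copies. A minimal sketch, assuming a disk at
/sys/block/nvme0n1 (the device name is only an example):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          /* Assumed path; substitute the target disk. */
          const char *path = "/sys/block/nvme0n1/queue/copy_offload";
          int fd = open(path, O_WRONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          /* A non-zero value selects BLK_COPY_OFFLOAD; the store
           * fails with EINVAL while max_copy_sectors is 0. */
          if (write(fd, "1", 1) != 1)
                  perror("write");
          close(fd);
          return 0;
  }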

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 block/blk-settings.c   |  4 ++++
 block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h | 12 ++++++++++
 3 files changed, 67 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b880c70e22e4..818454552cf8 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->copy_offload = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9f32882ceb2f..dc68ae6b55c9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -171,6 +171,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
 	return queue_var_show(q->limits.discard_granularity, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.copy_offload, page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && q->limits.max_copy_sectors == 0)
+		return -EINVAL;
+
+	if (copy_offload)
+		q->limits.copy_offload = BLK_COPY_OFFLOAD;
+	else
+		q->limits.copy_offload = 0;
+
+	return ret;
+}
+
+static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_copy_sectors, page);
+}
+
+static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_range_sectors, page);
+}
+
+static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
 static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
 {
 
@@ -597,6 +639,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -643,6 +690,10 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_max_copy_sectors_entry.attr,
+	&queue_max_copy_range_sectors_entry.attr,
+	&queue_max_copy_nr_ranges_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index efed3820cbf7..f63ae50f1de3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -51,6 +51,12 @@ extern struct class block_class;
 /* Doing classic polling */
 #define BLK_MQ_POLL_CLASSIC -1
 
+/* Define copy offload options */
+enum blk_copy {
+	BLK_COPY_EMULATE = 0,
+	BLK_COPY_OFFLOAD,
+};
+
 /*
  * Maximum number of blkcg policies allowed to be registered concurrently.
  * Defined here to simplify include dependency.
@@ -253,6 +259,10 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int            copy_offload;
+	unsigned int            max_copy_sectors;
+	unsigned short          max_copy_range_sectors;
+	unsigned short          max_copy_nr_ranges;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -562,6 +572,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_COPY		30	/* supports copy offload */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -585,6 +596,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
 #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_secure_erase(q) \
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [dm-devel] [PATCH v2 02/10] block: Introduce queue limits for copy-offload support
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	SelvaKumar S, msnitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, axboe, tytso, joshi.k, martin.petersen,
	arnav.dawn, jack, linux-fsdevel, lsf-pc

Add device limits as sysfs entries:
        - copy_offload (READ_WRITE)
        - max_copy_sectors (READ_ONLY)
        - max_copy_range_sectors (READ_ONLY)
        - max_copy_nr_ranges (READ_ONLY)

copy_offload is disabled by default (0) and must be enabled explicitly
before copy offload can be used.
max_copy_sectors = 0 indicates that the device does not support native
copy.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 block/blk-settings.c   |  4 ++++
 block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h | 12 ++++++++++
 3 files changed, 67 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b880c70e22e4..818454552cf8 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
+	lim->copy_offload = 0;
+	lim->max_copy_sectors = 0;
+	lim->max_copy_nr_ranges = 0;
+	lim->max_copy_range_sectors = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9f32882ceb2f..dc68ae6b55c9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -171,6 +171,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
 	return queue_var_show(q->limits.discard_granularity, page);
 }
 
+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.copy_offload, page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long copy_offload;
+	ssize_t ret = queue_var_store(&copy_offload, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_offload && q->limits.max_copy_sectors == 0)
+		return -EINVAL;
+
+	if (copy_offload)
+		q->limits.copy_offload = BLK_COPY_OFFLOAD;
+	else
+		q->limits.copy_offload = 0;
+
+	return ret;
+}
+
+static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.max_copy_sectors, page);
+}
+
+static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_range_sectors, page);
+}
+
+static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
+		char *page)
+{
+	return queue_var_show(q->limits.max_copy_nr_ranges, page);
+}
+
 static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
 {
 
@@ -597,6 +639,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
+QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
+
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -643,6 +690,10 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_max_entry.attr,
 	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_copy_offload_entry.attr,
+	&queue_max_copy_sectors_entry.attr,
+	&queue_max_copy_range_sectors_entry.attr,
+	&queue_max_copy_nr_ranges_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index efed3820cbf7..f63ae50f1de3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -51,6 +51,12 @@ extern struct class block_class;
 /* Doing classic polling */
 #define BLK_MQ_POLL_CLASSIC -1
 
+/* Define copy offload options */
+enum blk_copy {
+	BLK_COPY_EMULATE = 0,
+	BLK_COPY_OFFLOAD,
+};
+
 /*
  * Maximum number of blkcg policies allowed to be registered concurrently.
  * Defined here to simplify include dependency.
@@ -253,6 +259,10 @@ struct queue_limits {
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 	unsigned int		zone_write_granularity;
+	unsigned int            copy_offload;
+	unsigned int            max_copy_sectors;
+	unsigned short          max_copy_range_sectors;
+	unsigned short          max_copy_nr_ranges;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -562,6 +572,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_COPY		30	/* supports copy offload */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -585,6 +596,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
 #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
+#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
 #define blk_queue_secure_erase(q) \
-- 
2.30.0-rc0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 03/10] block: Add copy offload support infrastructure
       [not found]             ` <CGME20220207141918epcas5p4f9badc0c3f3f0913f091c850d2d3bd81@epcas5p4.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

Introduce blkdev_issue_copy, which supports source and destination bdevs
and an array of (source, destination, copy length) tuples.
Introduce the REQ_COPY copy offload operation flag. Create a read-write
bio pair with a token as payload, submitted to the device in order:
the read request populates the token with source-specific information,
which is then passed along with the write request.
This design is courtesy of Mikulas Patocka <mpatocka@>'s token-based
copy.

Larger copy operations may be split if necessary based on device
limits.
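
As a rough illustration of the intended call pattern for in-kernel
users (a sketch only, with error handling trimmed; src_bdev and
dst_bdev stand for block devices the caller already holds):

  /* Copy 1 MiB from byte offset 0 on src_bdev to byte offset 0 on
   * dst_bdev using the interface introduced here. */
  struct range_entry range = {
          .src = 0,               /* byte offset on the source */
          .dst = 0,               /* byte offset on the destination */
          .len = 1 << 20,         /* length in bytes */
  };
  int ret;

  ret = blkdev_issue_copy(src_bdev, 1, &range, dst_bdev, GFP_KERNEL, 0);
  if (ret)
          pr_err("copy failed after %llu bytes\n",
                 (unsigned long long)range.comp_len);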

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/blk-lib.c           | 216 ++++++++++++++++++++++++++++++++++++++
 block/blk-settings.c      |   2 +
 block/blk.h               |   2 +
 include/linux/blk_types.h |  20 ++++
 include/linux/blkdev.h    |   3 +
 include/uapi/linux/fs.h   |  14 +++
 6 files changed, 257 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1b8ced45e4e5..3ae2c27b566e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -135,6 +135,222 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
 
+/*
+ * Wait on and process all in-flight BIOs.  This must only be called once
+ * all bios have been issued so that the refcount can only decrease.
+ * This just waits for all bios to make it through bio_copy_end_io. IO
+ * errors are propagated through cio->io_err.
+ */
+static int cio_await_completion(struct cio *cio)
+{
+	int ret = 0;
+
+	while (atomic_read(&cio->refcount)) {
+		cio->waiter = current;
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		blk_io_schedule();
+		/* wake up sets us TASK_RUNNING */
+		cio->waiter = NULL;
+		ret = cio->io_err;
+	}
+	kvfree(cio);
+
+	return ret;
+}
+
+static void bio_copy_end_io(struct bio *bio)
+{
+	struct copy_ctx *ctx = bio->bi_private;
+	struct cio *cio = ctx->cio;
+	sector_t clen;
+	int ri = ctx->range_idx;
+
+	if (bio->bi_status) {
+		cio->io_err = bio->bi_status;
+		clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT;
+		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
+	}
+	__free_page(bio->bi_io_vec[0].bv_page);
+	kfree(ctx);
+	bio_put(bio);
+
+	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
+		wake_up_process(cio->waiter);
+}
+
+/*
+ * blk_copy_offload	- Use device's native copy offload feature
+ * Go through the user-provided payload and prepare a new payload based on the device's copy offload limits.
+ */
+int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
+{
+	struct request_queue *sq = bdev_get_queue(src_bdev);
+	struct request_queue *dq = bdev_get_queue(dst_bdev);
+	struct bio *read_bio, *write_bio;
+	struct copy_ctx *ctx;
+	struct cio *cio;
+	struct page *token;
+	sector_t src_blk, copy_len, dst_blk;
+	sector_t remaining, max_copy_len = LONG_MAX;
+	int ri = 0, ret = 0;
+
+	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
+	if (!cio)
+		return -ENOMEM;
+	atomic_set(&cio->refcount, 0);
+	cio->rlist = rlist;
+
+	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
+			(sector_t)dq->limits.max_copy_sectors);
+	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
+			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
+
+	for (ri = 0; ri < nr_srcs; ri++) {
+		cio->rlist[ri].comp_len = rlist[ri].len;
+		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
+			remaining > 0;
+			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
+			copy_len = min(remaining, max_copy_len);
+
+			token = alloc_page(gfp_mask);
+			if (unlikely(!token)) {
+				ret = -ENOMEM;
+				goto err_token;
+			}
+
+			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!read_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
+			read_bio->bi_iter.bi_size = copy_len;
+			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
+			ret = submit_bio_wait(read_bio);
+			if (ret) {
+				bio_put(read_bio);
+				goto err_read_bio;
+			}
+			bio_put(read_bio);
+			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
+			if (!ctx) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			ctx->cio = cio;
+			ctx->range_idx = ri;
+			ctx->start_sec = rlist[ri].src;
+
+			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!write_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+
+			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
+			write_bio->bi_iter.bi_size = copy_len;
+			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
+			write_bio->bi_end_io = bio_copy_end_io;
+			write_bio->bi_private = ctx;
+			atomic_inc(&cio->refcount);
+			submit_bio(write_bio);
+		}
+	}
+
+	/* Wait for completion of all IO's*/
+	return cio_await_completion(cio);
+
+err_read_bio:
+	__free_page(token);
+err_token:
+	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
+
+	cio->io_err = ret;
+	return cio_await_completion(cio);
+}
+
+static inline int blk_copy_sanity_check(struct block_device *src_bdev,
+		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
+{
+	unsigned int align_mask = max(
+			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
+	sector_t len = 0;
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		if (rlist[i].len)
+			len += rlist[i].len;
+		else
+			return -EINVAL;
+		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
+				(rlist[i].len & align_mask))
+			return -EINVAL;
+		rlist[i].comp_len = 0;
+	}
+
+	if (!len || len >= MAX_COPY_TOTAL_LENGTH)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline bool blk_check_copy_offload(struct request_queue *src_q,
+		struct request_queue *dest_q)
+{
+	if (dest_q->limits.copy_offload == BLK_COPY_OFFLOAD &&
+			src_q->limits.copy_offload == BLK_COPY_OFFLOAD)
+		return true;
+
+	return false;
+}
+
+/*
+ * blkdev_issue_copy - queue a copy
+ * @src_bdev:	source block device
+ * @nr:		number of source ranges to copy
+ * @rlist:	array of source ranges
+ * @dest_bdev:	destination block device
+ * @gfp_mask:   memory allocation flags (for bio_alloc)
+ * @flags:	BLKDEV_COPY_* flags to control behaviour
+ *
+ * Description:
+ *	Copy source ranges from source block device to destination block device.
+ *	length of a source range cannot be zero.
+ */
+int blkdev_issue_copy(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev,
+		gfp_t gfp_mask, int flags)
+{
+	struct request_queue *src_q = bdev_get_queue(src_bdev);
+	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
+	int ret = -EINVAL;
+
+	if (!src_q || !dest_q)
+		return -ENXIO;
+
+	if (!nr)
+		return -EINVAL;
+
+	if (nr >= MAX_COPY_NR_RANGE)
+		return -EINVAL;
+
+	if (bdev_read_only(dest_bdev))
+		return -EPERM;
+
+	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
+	if (ret)
+		return ret;
+
+	if (blk_check_copy_offload(src_q, dest_q))
+		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_copy);
+
 /**
  * __blkdev_issue_write_same - generate number of bios with same page
  * @bdev:	target blockdev
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 818454552cf8..4c8d48b8af25 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -545,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min_not_zero(t->max_copy_sectors, b->max_copy_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk.h b/block/blk.h
index abb663a2a147..94d2b055750b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		break;
 	}
 
+	if (unlikely(op_is_copy(bio->bi_opf)))
+		return false;
 	/*
 	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
 	 * This is a quick and dirty check that relies on the fact that
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5561e58d158a..0a3fee8ad61c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -418,6 +418,7 @@ enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_COPY,		/* copy request*/
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -442,6 +443,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_COPY		(1ULL << __REQ_COPY)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
@@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op)
 	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
 }
 
+static inline bool op_is_copy(unsigned int op)
+{
+	return (op & REQ_COPY);
+}
+
 /*
  * Check if a bio or request operation is a zone management operation, with
  * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
@@ -532,4 +539,17 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
+struct cio {
+	atomic_t refcount;
+	blk_status_t io_err;
+	struct range_entry *rlist;
+	struct task_struct *waiter;     /* waiting task (NULL if none) */
+};
+
+struct copy_ctx {
+	int range_idx;
+	sector_t start_sec;
+	struct cio *cio;
+};
+
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f63ae50f1de3..15597488040c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1120,6 +1120,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		struct bio **biop);
 struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
 		gfp_t gfp_mask);
+int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *src_rlist, struct block_device *dest_bdev,
+		gfp_t gfp_mask, int flags);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index bdf7b404b3e7..55bca8f6e8ed 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,20 @@ struct fstrim_range {
 	__u64 minlen;
 };
 
+/* Maximum no of entries supported */
+#define MAX_COPY_NR_RANGE	(1 << 12)
+
+/* maximum total copy length */
+#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
+
+/* Source range entry for copy */
+struct range_entry {
+	__u64 src;
+	__u64 dst;
+	__u64 len;
+	__u64 comp_len;
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [dm-devel] [PATCH v2 03/10] block: Add copy offload support infrastructure
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	SelvaKumar S, msnitzer, josef, linux-block, dsterba, kbusch,
	Frederick.Knight, axboe, tytso, joshi.k, martin.petersen,
	arnav.dawn, jack, linux-fsdevel, lsf-pc

Introduce blkdev_issue_copy, which supports source and destination bdevs
and an array of (source, destination, copy length) tuples.
Introduce the REQ_COPY copy offload operation flag. Create a read-write
bio pair with a token as payload, submitted to the device in order:
the read request populates the token with source-specific information,
which is then passed along with the write request.
This design is courtesy of Mikulas Patocka <mpatocka@>'s token-based
copy.

Larger copy operations may be split if necessary based on device
limits.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/blk-lib.c           | 216 ++++++++++++++++++++++++++++++++++++++
 block/blk-settings.c      |   2 +
 block/blk.h               |   2 +
 include/linux/blk_types.h |  20 ++++
 include/linux/blkdev.h    |   3 +
 include/uapi/linux/fs.h   |  14 +++
 6 files changed, 257 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1b8ced45e4e5..3ae2c27b566e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -135,6 +135,222 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
 
+/*
+ * Wait on and process all in-flight BIOs.  This must only be called once
+ * all bios have been issued so that the refcount can only decrease.
+ * This just waits for all bios to make it through bio_copy_end_io. IO
+ * errors are propagated through cio->io_err.
+ */
+static int cio_await_completion(struct cio *cio)
+{
+	int ret = 0;
+
+	while (atomic_read(&cio->refcount)) {
+		cio->waiter = current;
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		blk_io_schedule();
+		/* wake up sets us TASK_RUNNING */
+		cio->waiter = NULL;
+		ret = cio->io_err;
+	}
+	kvfree(cio);
+
+	return ret;
+}
+
+static void bio_copy_end_io(struct bio *bio)
+{
+	struct copy_ctx *ctx = bio->bi_private;
+	struct cio *cio = ctx->cio;
+	sector_t clen;
+	int ri = ctx->range_idx;
+
+	if (bio->bi_status) {
+		cio->io_err = bio->bi_status;
+		clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT;
+		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
+	}
+	__free_page(bio->bi_io_vec[0].bv_page);
+	kfree(ctx);
+	bio_put(bio);
+
+	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
+		wake_up_process(cio->waiter);
+}
+
+/*
+ * blk_copy_offload	- Use device's native copy offload feature
+ * Go through the user-provided payload and prepare a new payload based on the device's copy offload limits.
+ */
+int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
+{
+	struct request_queue *sq = bdev_get_queue(src_bdev);
+	struct request_queue *dq = bdev_get_queue(dst_bdev);
+	struct bio *read_bio, *write_bio;
+	struct copy_ctx *ctx;
+	struct cio *cio;
+	struct page *token;
+	sector_t src_blk, copy_len, dst_blk;
+	sector_t remaining, max_copy_len = LONG_MAX;
+	int ri = 0, ret = 0;
+
+	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
+	if (!cio)
+		return -ENOMEM;
+	atomic_set(&cio->refcount, 0);
+	cio->rlist = rlist;
+
+	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
+			(sector_t)dq->limits.max_copy_sectors);
+	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
+			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
+
+	for (ri = 0; ri < nr_srcs; ri++) {
+		cio->rlist[ri].comp_len = rlist[ri].len;
+		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
+			remaining > 0;
+			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
+			copy_len = min(remaining, max_copy_len);
+
+			token = alloc_page(gfp_mask);
+			if (unlikely(!token)) {
+				ret = -ENOMEM;
+				goto err_token;
+			}
+
+			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!read_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
+			read_bio->bi_iter.bi_size = copy_len;
+			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
+			ret = submit_bio_wait(read_bio);
+			if (ret) {
+				bio_put(read_bio);
+				goto err_read_bio;
+			}
+			bio_put(read_bio);
+			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
+			if (!ctx) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+			ctx->cio = cio;
+			ctx->range_idx = ri;
+			ctx->start_sec = rlist[ri].src;
+
+			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
+					gfp_mask);
+			if (!write_bio) {
+				ret = -ENOMEM;
+				goto err_read_bio;
+			}
+
+			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
+			write_bio->bi_iter.bi_size = copy_len;
+			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
+			write_bio->bi_end_io = bio_copy_end_io;
+			write_bio->bi_private = ctx;
+			atomic_inc(&cio->refcount);
+			submit_bio(write_bio);
+		}
+	}
+
+	/* Wait for completion of all IO's*/
+	return cio_await_completion(cio);
+
+err_read_bio:
+	__free_page(token);
+err_token:
+	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
+
+	cio->io_err = ret;
+	return cio_await_completion(cio);
+}
+
+static inline int blk_copy_sanity_check(struct block_device *src_bdev,
+		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
+{
+	unsigned int align_mask = max(
+			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
+	sector_t len = 0;
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		if (rlist[i].len)
+			len += rlist[i].len;
+		else
+			return -EINVAL;
+		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
+				(rlist[i].len & align_mask))
+			return -EINVAL;
+		rlist[i].comp_len = 0;
+	}
+
+	if (!len || len >= MAX_COPY_TOTAL_LENGTH)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline bool blk_check_copy_offload(struct request_queue *src_q,
+		struct request_queue *dest_q)
+{
+	if (dest_q->limits.copy_offload == BLK_COPY_OFFLOAD &&
+			src_q->limits.copy_offload == BLK_COPY_OFFLOAD)
+		return true;
+
+	return false;
+}
+
+/*
+ * blkdev_issue_copy - queue a copy
+ * @src_bdev:	source block device
+ * @nr:		number of source ranges to copy
+ * @rlist:	array of source ranges
+ * @dest_bdev:	destination block device
+ * @gfp_mask:   memory allocation flags (for bio_alloc)
+ * @flags:	BLKDEV_COPY_* flags to control behaviour
+ *
+ * Description:
+ *	Copy source ranges from source block device to destination block device.
+ *	length of a source range cannot be zero.
+ */
+int blkdev_issue_copy(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev,
+		gfp_t gfp_mask, int flags)
+{
+	struct request_queue *src_q = bdev_get_queue(src_bdev);
+	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
+	int ret = -EINVAL;
+
+	if (!src_q || !dest_q)
+		return -ENXIO;
+
+	if (!nr)
+		return -EINVAL;
+
+	if (nr >= MAX_COPY_NR_RANGE)
+		return -EINVAL;
+
+	if (bdev_read_only(dest_bdev))
+		return -EPERM;
+
+	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
+	if (ret)
+		return ret;
+
+	if (blk_check_copy_offload(src_q, dest_q))
+		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_copy);
+
 /**
  * __blkdev_issue_write_same - generate number of bios with same page
  * @bdev:	target blockdev
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 818454552cf8..4c8d48b8af25 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -545,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_copy_sectors = min_not_zero(t->max_copy_sectors, b->max_copy_sectors);
+
 	t->misaligned |= b->misaligned;
 
 	alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk.h b/block/blk.h
index abb663a2a147..94d2b055750b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		break;
 	}
 
+	if (unlikely(op_is_copy(bio->bi_opf)))
+		return false;
 	/*
 	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
 	 * This is a quick and dirty check that relies on the fact that
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5561e58d158a..0a3fee8ad61c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -418,6 +418,7 @@ enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_COPY,		/* copy request*/
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -442,6 +443,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_COPY		(1ULL << __REQ_COPY)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
@@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op)
 	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
 }
 
+static inline bool op_is_copy(unsigned int op)
+{
+	return (op & REQ_COPY);
+}
+
 /*
  * Check if a bio or request operation is a zone management operation, with
  * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
@@ -532,4 +539,17 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
+struct cio {
+	atomic_t refcount;
+	blk_status_t io_err;
+	struct range_entry *rlist;
+	struct task_struct *waiter;     /* waiting task (NULL if none) */
+};
+
+struct copy_ctx {
+	int range_idx;
+	sector_t start_sec;
+	struct cio *cio;
+};
+
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f63ae50f1de3..15597488040c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1120,6 +1120,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		struct bio **biop);
 struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
 		gfp_t gfp_mask);
+int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
+		struct range_entry *src_rlist, struct block_device *dest_bdev,
+		gfp_t gfp_mask, int flags);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index bdf7b404b3e7..55bca8f6e8ed 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,20 @@ struct fstrim_range {
 	__u64 minlen;
 };
 
+/* Maximum no of entries supported */
+#define MAX_COPY_NR_RANGE	(1 << 12)
+
+/* maximum total copy length */
+#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
+
+/* Source range entry for copy */
+struct range_entry {
+	__u64 src;
+	__u64 dst;
+	__u64 len;
+	__u64 comp_len;
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
-- 
2.30.0-rc0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 04/10] block: Introduce a new ioctl for copy
       [not found]             ` <CGME20220207141924epcas5p26ad9cf5de732224f408aded12ed0a577@epcas5p2.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

Add a new BLKCOPY ioctl that offloads copying of one or more source
ranges to one or more destinations in a device. The COPY ioctl accepts
a 'copy_range' structure that contains the number of ranges and a
reserved field, followed by an array of ranges. Each range is
represented by a 'range_entry' that contains the source start offset,
the destination start offset, and the length of the range (in bytes).
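
For example, a user-space caller could exercise the ioctl roughly as
below. This is a sketch, assuming the BLKCOPY definitions from this
patch are visible through <linux/fs.h> and that /dev/nvme0n1 is a
suitable test device:

  #include <fcntl.h>
  #include <linux/fs.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("/dev/nvme0n1", O_RDWR);
          struct copy_range *cr;

          if (fd < 0)
                  return 1;
          cr = calloc(1, sizeof(*cr) + sizeof(struct range_entry));
          if (!cr)
                  return 1;
          cr->nr_range = 1;
          /* Copy 64 KiB from byte offset 0 to byte offset 1 MiB;
           * offsets and length must be logical-block aligned. */
          cr->range_list[0].src = 0;
          cr->range_list[0].dst = 1 << 20;
          cr->range_list[0].len = 64 << 10;

          if (ioctl(fd, BLKCOPY, cr) < 0)
                  perror("BLKCOPY");
          else
                  printf("copied %llu bytes\n",
                         (unsigned long long)cr->range_list[0].comp_len);
          free(cr);
          close(fd);
          return 0;
  }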

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/ioctl.c           | 37 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h |  9 +++++++++
 2 files changed, 46 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 4a86340133e4..d77f6143287e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -124,6 +124,41 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 	return err;
 }
 
+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	struct copy_range crange, *ranges;
+	size_t payload_size = 0;
+	int ret;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
+		return -EFAULT;
+
+	if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
+		return -EINVAL;
+
+	payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
+
+	ranges = kmalloc(payload_size, GFP_KERNEL);
+	if (!ranges)
+		return -ENOMEM;
+
+	if (copy_from_user(ranges, (void __user *)arg, payload_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL, 0);
+	if (copy_to_user((void __user *)arg, ranges, payload_size))
+		ret = -EFAULT;
+out:
+	kfree(ranges);
+	return ret;
+}
+
 static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
@@ -455,6 +490,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 	case BLKSECDISCARD:
 		return blk_ioctl_discard(bdev, mode, arg,
 				BLKDEV_DISCARD_SECURE);
+	case BLKCOPY:
+		return blk_ioctl_copy(bdev, mode, arg);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
 	case BLKGETDISKSEQ:
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 55bca8f6e8ed..190911ea4311 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -78,6 +78,14 @@ struct range_entry {
 	__u64 comp_len;
 };
 
+struct copy_range {
+	__u64 nr_range;
+	__u64 reserved;
+
+	/* Range_list always must be at the end */
+	struct range_entry range_list[];
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
@@ -199,6 +207,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [dm-devel] [PATCH v2 04/10] block: Introduce a new ioctl for copy
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	msnitzer, josef, linux-block, dsterba, kbusch, Frederick.Knight,
	axboe, tytso, joshi.k, martin.petersen, arnav.dawn, jack,
	linux-fsdevel, lsf-pc

Add a new BLKCOPY ioctl that offloads copying of one or more source
ranges to one or more destinations in a device. The COPY ioctl accepts
a 'copy_range' structure that contains the number of ranges and a
reserved field, followed by an array of ranges. Each range is
represented by a 'range_entry' that contains the source start offset,
the destination start offset, and the length of the range (in bytes).

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
---
 block/ioctl.c           | 37 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h |  9 +++++++++
 2 files changed, 46 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 4a86340133e4..d77f6143287e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -124,6 +124,41 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 	return err;
 }
 
+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	struct copy_range crange, *ranges;
+	size_t payload_size = 0;
+	int ret;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
+		return -EFAULT;
+
+	if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
+		return -EINVAL;
+
+	payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
+
+	ranges = kmalloc(payload_size, GFP_KERNEL);
+	if (!ranges)
+		return -ENOMEM;
+
+	if (copy_from_user(ranges, (void __user *)arg, payload_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL, 0);
+	if (copy_to_user((void __user *)arg, ranges, payload_size))
+		ret = -EFAULT;
+out:
+	kfree(ranges);
+	return ret;
+}
+
 static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
@@ -455,6 +490,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 	case BLKSECDISCARD:
 		return blk_ioctl_discard(bdev, mode, arg,
 				BLKDEV_DISCARD_SECURE);
+	case BLKCOPY:
+		return blk_ioctl_copy(bdev, mode, arg);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
 	case BLKGETDISKSEQ:
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 55bca8f6e8ed..190911ea4311 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -78,6 +78,14 @@ struct range_entry {
 	__u64 comp_len;
 };
 
+struct copy_range {
+	__u64 nr_range;
+	__u64 reserved;
+
+	/* Range_list always must be at the end */
+	struct range_entry range_list[];
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME		0
 #define FILE_DEDUPE_RANGE_DIFFERS	1
@@ -199,6 +207,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
-- 
2.30.0-rc0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 05/10] block: add emulation for copy
       [not found]             ` <CGME20220207141930epcas5p2bcbff65f78ad1dede64648d73ddb3770@epcas5p2.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

For devices which do not support copy offload, copy emulation is added.
Copy emulation is implemented by reading from the source ranges into
memory and writing to the corresponding destinations synchronously.

TODO: Optimise emulation.
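
The buffer sizing in this patch follows a simple back-off strategy:
request a buffer as large as the largest range and halve the request
until an allocation succeeds. A minimal user-space analogue of that
pattern (illustrative only):

  #include <stdlib.h>

  /* Allocate the largest buffer possible, starting from req_size and
   * halving on failure, never going below min_size. Returns NULL if
   * even min_size cannot be satisfied. */
  static void *alloc_backoff(size_t req_size, size_t min_size,
                             size_t *alloc_size)
  {
          while (req_size >= min_size) {
                  void *buf = malloc(req_size);

                  if (buf) {
                          *alloc_size = req_size;
                          return buf;
                  }
                  req_size >>= 1;         /* retry with half the size */
          }
          return NULL;
  }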

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 block/blk-lib.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3ae2c27b566e..05c8cd02fffc 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -272,6 +272,65 @@ int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
 	return cio_await_completion(cio);
 }
 
+int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
+				sector_t sector, unsigned int op, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio, *parent = NULL;
+	sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
+			queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT;
+	sector_t len, remaining;
+	int ret;
+
+	for (remaining = buf_len; remaining > 0; remaining -= len) {
+		len = min_t(int, max_hw_len, remaining);
+retry:
+		bio = bio_map_kern(q, buf, len, gfp_mask);
+		if (IS_ERR(bio)) {
+			len >>= 1;
+			if (len)
+				goto retry;
+			return PTR_ERR(bio);
+		}
+
+		bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
+		bio->bi_opf = op;
+		bio_set_dev(bio, bdev);
+		bio->bi_end_io = NULL;
+		bio->bi_private = NULL;
+
+		if (parent) {
+			bio_chain(parent, bio);
+			submit_bio(parent);
+		}
+		parent = bio;
+		sector += len;
+		buf = (char *) buf + len;
+	}
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+
+	return ret;
+}
+
+static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask)
+{
+	int min_size = PAGE_SIZE;
+	void *buf;
+
+	while (req_size >= min_size) {
+		buf = kvmalloc(req_size, gfp_mask);
+		if (buf) {
+			*alloc_size = req_size;
+			return buf;
+		}
+		/* retry half the requested size */
+		req_size >>= 1;
+	}
+
+	return NULL;
+}
+
 static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
 {
@@ -297,6 +356,64 @@ static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 	return 0;
 }
 
+static inline sector_t blk_copy_max_range(struct range_entry *rlist, int nr, sector_t *max_len)
+{
+	int i;
+	sector_t len = 0;
+
+	*max_len = 0;
+	for (i = 0; i < nr; i++) {
+		*max_len = max(*max_len, rlist[i].len);
+		len += rlist[i].len;
+	}
+
+	return len;
+}
+
+/*
+ * If native copy offload feature is absent, this function tries to emulate,
+ * by copying data from source to a temporary buffer and from buffer to
+ * destination device.
+ */
+static int blk_copy_emulate(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
+{
+	void *buf = NULL;
+	int ret, nr_i = 0;
+	sector_t src, dst, copy_len, buf_len, read_len, copied_len, max_len = 0, remaining = 0;
+
+	copy_len = blk_copy_max_range(rlist, nr, &max_len);
+	buf = blk_alloc_buf(max_len, &buf_len, gfp_mask);
+	if (!buf)
+		return -ENOMEM;
+
+	for (copied_len = 0; copied_len < copy_len; copied_len += read_len) {
+		if (!remaining) {
+			rlist[nr_i].comp_len = 0;
+			src = rlist[nr_i].src;
+			dst = rlist[nr_i].dst;
+			remaining = rlist[nr_i++].len;
+		}
+
+		read_len = min_t(sector_t, remaining, buf_len);
+		ret = blk_submit_rw_buf(src_bdev, buf, read_len, src, REQ_OP_READ, gfp_mask);
+		if (ret)
+			goto out;
+		src += read_len;
+		remaining -= read_len;
+		ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dst, REQ_OP_WRITE,
+				gfp_mask);
+		if (ret)
+			goto out;
+		else
+			rlist[nr_i - 1].comp_len += read_len;
+		dst += read_len;
+	}
+out:
+	kvfree(buf);
+	return ret;
+}
+
 static inline bool blk_check_copy_offload(struct request_queue *src_q,
 		struct request_queue *dest_q)
 {
@@ -346,6 +463,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
 
 	if (blk_check_copy_offload(src_q, dest_q))
 		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+	else
+		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);
 
 	return ret;
 }
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [dm-devel] [PATCH v2 05/10] block: add emulation for copy
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	msnitzer, josef, linux-block, dsterba, kbusch, Frederick.Knight,
	axboe, tytso, joshi.k, martin.petersen, arnav.dawn, jack,
	linux-fsdevel, lsf-pc

For devices which do not support copy offload, copy emulation is added.
Copy emulation is implemented by reading from the source ranges into
memory and writing to the corresponding destinations synchronously.

TODO: Optimise emulation.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 block/blk-lib.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3ae2c27b566e..05c8cd02fffc 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -272,6 +272,65 @@ int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
 	return cio_await_completion(cio);
 }
 
+int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
+				sector_t sector, unsigned int op, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio, *parent = NULL;
+	sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
+			queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT;
+	sector_t len, remaining;
+	int ret;
+
+	for (remaining = buf_len; remaining > 0; remaining -= len) {
+		len = min_t(int, max_hw_len, remaining);
+retry:
+		bio = bio_map_kern(q, buf, len, gfp_mask);
+		if (IS_ERR(bio)) {
+			len >>= 1;
+			if (len)
+				goto retry;
+			return PTR_ERR(bio);
+		}
+
+		bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
+		bio->bi_opf = op;
+		bio_set_dev(bio, bdev);
+		bio->bi_end_io = NULL;
+		bio->bi_private = NULL;
+
+		if (parent) {
+			bio_chain(parent, bio);
+			submit_bio(parent);
+		}
+		parent = bio;
+		sector += len;
+		buf = (char *) buf + len;
+	}
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+
+	return ret;
+}
+
+static void *blk_alloc_buf(sector_t req_size, sector_t *alloc_size, gfp_t gfp_mask)
+{
+	int min_size = PAGE_SIZE;
+	void *buf;
+
+	while (req_size >= min_size) {
+		buf = kvmalloc(req_size, gfp_mask);
+		if (buf) {
+			*alloc_size = req_size;
+			return buf;
+		}
+		/* retry half the requested size */
+		req_size >>= 1;
+	}
+
+	return NULL;
+}
+
 static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
 {
@@ -297,6 +356,64 @@ static inline int blk_copy_sanity_check(struct block_device *src_bdev,
 	return 0;
 }
 
+static inline sector_t blk_copy_max_range(struct range_entry *rlist, int nr, sector_t *max_len)
+{
+	int i;
+	sector_t len = 0;
+
+	*max_len = 0;
+	for (i = 0; i < nr; i++) {
+		*max_len = max(*max_len, rlist[i].len);
+		len += rlist[i].len;
+	}
+
+	return len;
+}
+
+/*
+ * If native copy offload feature is absent, this function tries to emulate,
+ * by copying data from source to a temporary buffer and from buffer to
+ * destination device.
+ */
+static int blk_copy_emulate(struct block_device *src_bdev, int nr,
+		struct range_entry *rlist, struct block_device *dest_bdev, gfp_t gfp_mask)
+{
+	void *buf = NULL;
+	int ret = 0, nr_i = 0;
+	sector_t src, dst, copy_len, buf_len, read_len, copied_len, max_len = 0, remaining = 0;
+
+	copy_len = blk_copy_max_range(rlist, nr, &max_len);
+	buf = blk_alloc_buf(max_len, &buf_len, gfp_mask);
+	if (!buf)
+		return -ENOMEM;
+
+	for (copied_len = 0; copied_len < copy_len; copied_len += read_len) {
+		if (!remaining) {
+			rlist[nr_i].comp_len = 0;
+			src = rlist[nr_i].src;
+			dst = rlist[nr_i].dst;
+			remaining = rlist[nr_i++].len;
+		}
+
+		read_len = min_t(sector_t, remaining, buf_len);
+		ret = blk_submit_rw_buf(src_bdev, buf, read_len, src, REQ_OP_READ, gfp_mask);
+		if (ret)
+			goto out;
+		src += read_len;
+		remaining -= read_len;
+		ret = blk_submit_rw_buf(dest_bdev, buf, read_len, dst, REQ_OP_WRITE,
+				gfp_mask);
+		if (ret)
+			goto out;
+
+		rlist[nr_i - 1].comp_len += read_len;
+		dst += read_len;
+	}
+out:
+	kvfree(buf);
+	return ret;
+}
+
 static inline bool blk_check_copy_offload(struct request_queue *src_q,
 		struct request_queue *dest_q)
 {
@@ -346,6 +463,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
 
 	if (blk_check_copy_offload(src_q, dest_q))
 		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
+	else
+		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);
 
 	return ret;
 }
-- 
2.30.0-rc0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 06/10] nvme: add copy support
       [not found]             ` <CGME20220207141937epcas5p2bd57ae35056c69b3e2f9ee2348d6af19@epcas5p2.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S,
	Javier González

From: SelvaKumar S <selvakuma.s1@samsung.com>

Add support for the NVMe Copy command.
For devices supporting native copy, the nvme driver receives read and
write requests with the REQ_COPY op flag.
For the read request the nvme driver populates the payload with the
source information.
For the write request the driver converts it to an NVMe copy command
using the source information in the payload and submits it to the
device. The current design only supports a single source range.
This design is courtesy of Mikulas Patocka's token based copy.

Add trace event support for nvme_cmd_copy.
Set the device copy limits in the queue limits. By default copy_offload
is disabled.
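
For reference only (not part of the patch): a single-range copy like
the one this driver builds can be issued from user space through the
stock NVMe passthrough ioctl. The sketch below assumes the command
layout introduced here (SDLBA in cdw10/11, 0-based range count in
cdw12) and a local mirror of struct nvme_copy_range, since that
structure is not exported as UAPI; fd, nsid, src_lba, dst_lba and
n_lba are placeholders:

	#include <endian.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/nvme_ioctl.h>

	struct nvme_copy_range r = {
		.slba = htole64(src_lba),
		.nlb  = htole16(n_lba - 1),		/* 0-based count */
	};
	struct nvme_passthru_cmd c = {
		.opcode   = 0x19,			/* nvme_cmd_copy */
		.nsid     = nsid,
		.addr     = (uint64_t)(uintptr_t)&r,
		.data_len = sizeof(r),
		.cdw10    = (uint32_t)dst_lba,		/* SDLBA, low  */
		.cdw11    = (uint32_t)(dst_lba >> 32),	/* SDLBA, high */
		.cdw12    = 0,				/* NR = 0 -> one range */
	};
	int err = ioctl(fd, NVME_IOCTL_IO_CMD, &c);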

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
---
 drivers/nvme/host/core.c  | 121 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h  |   7 +++
 drivers/nvme/host/pci.c   |   9 +++
 drivers/nvme/host/trace.c |  19 ++++++
 include/linux/nvme.h      |  43 +++++++++++++-
 5 files changed, 194 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5e0bfda04bd7..49458001472e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -821,6 +821,90 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
+static inline blk_status_t nvme_setup_copy_read(struct nvme_ns *ns, struct request *req)
+{
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+
+	memcpy(token->subsys, "nvme", 4);
+	token->ns = ns;
+	token->src_sector = bio->bi_iter.bi_sector;
+	token->sectors = bio->bi_iter.bi_size >> 9;
+
+	return BLK_STS_OK;
+}
+
+static inline blk_status_t nvme_setup_copy_write(struct nvme_ns *ns,
+	       struct request *req, struct nvme_command *cmnd)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_copy_range *range = NULL;
+	struct bio *bio = req->bio;
+	struct nvme_copy_token *token = bvec_kmap_local(&bio->bi_io_vec[0]);
+	sector_t src_sector, dst_sector, n_sectors;
+	u64 src_lba, dst_lba, n_lba;
+	unsigned short nr_range = 1;
+	u16 control = 0;
+	u32 dsmgmt = 0;
+
+	if (unlikely(memcmp(token->subsys, "nvme", 4)))
+		return BLK_STS_NOTSUPP;
+	if (unlikely(token->ns != ns))
+		return BLK_STS_NOTSUPP;
+
+	src_sector = token->src_sector;
+	dst_sector = bio->bi_iter.bi_sector;
+	n_sectors = token->sectors;
+	if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9))
+		return BLK_STS_NOTSUPP;
+
+	src_lba = nvme_sect_to_lba(ns, src_sector);
+	dst_lba = nvme_sect_to_lba(ns, dst_sector);
+	n_lba = nvme_sect_to_lba(ns, n_sectors);
+
+	if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) ||
+			unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) ||
+			unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors))
+		return BLK_STS_NOTSUPP;
+
+	if (WARN_ON(!n_lba))
+		return BLK_STS_NOTSUPP;
+
+	if (req->cmd_flags & REQ_FUA)
+		control |= NVME_RW_FUA;
+
+	if (req->cmd_flags & REQ_FAILFAST_DEV)
+		control |= NVME_RW_LR;
+
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->copy.opcode = nvme_cmd_copy;
+	cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->copy.sdlba = cpu_to_le64(blk_rq_pos(req) >> (ns->lba_shift - 9));
+
+	range = kmalloc_array(nr_range, sizeof(*range),
+			GFP_ATOMIC | __GFP_NOWARN);
+	if (!range)
+		return BLK_STS_RESOURCE;
+
+	range[0].slba = cpu_to_le64(src_lba);
+	range[0].nlb = cpu_to_le16(n_lba - 1);
+
+	cmnd->copy.nr_range = nr_range - 1;	/* 0-based */
+
+	req->special_vec.bv_page = virt_to_page(range);
+	req->special_vec.bv_offset = offset_in_page(range);
+	req->special_vec.bv_len = sizeof(*range) * nr_range;
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	if (ctrl->nr_streams)
+		nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
+
+	cmnd->copy.control = cpu_to_le16(control);
+	cmnd->copy.dspec = cpu_to_le32(dsmgmt);
+
+	return BLK_STS_OK;
+}
+
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -1024,10 +1108,16 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
 	case REQ_OP_READ:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_read(ns, req);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
 		break;
 	case REQ_OP_WRITE:
-		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
+		if (unlikely(req->cmd_flags & REQ_COPY))
+			ret = nvme_setup_copy_write(ns, req, cmd);
+		else
+			ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
 		break;
 	case REQ_OP_ZONE_APPEND:
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
@@ -1682,6 +1772,31 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
+static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns,
+				       struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct request_queue *queue = disk->queue;
+
+	if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) {
+		queue->limits.copy_offload = 0;
+		queue->limits.max_copy_sectors = 0;
+		queue->limits.max_copy_range_sectors = 0;
+		queue->limits.max_copy_nr_ranges = 0;
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, queue);
+		return;
+	}
+
+	/* setting copy limits */
+	blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, queue);
+	queue->limits.copy_offload = 0;
+	queue->limits.max_copy_sectors = le64_to_cpu(id->mcl) *
+		(1 << (ns->lba_shift - 9));
+	queue->limits.max_copy_range_sectors = le32_to_cpu(id->mssrl) *
+		(1 << (ns->lba_shift - 9));
+	queue->limits.max_copy_nr_ranges = id->msrc + 1;
+}
+
 static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
 {
 	return !uuid_is_null(&ids->uuid) ||
@@ -1864,6 +1979,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	nvme_config_discard(disk, ns);
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
+	nvme_config_copy(disk, ns, id);
 
 	set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) ||
 		test_bit(NVME_NS_FORCE_RO, &ns->flags));
@@ -4721,6 +4837,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_format_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..117658a8cf5f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -474,6 +474,13 @@ struct nvme_ns {
 
 };
 
+struct nvme_copy_token {
+	char subsys[4];
+	struct nvme_ns *ns;
+	u64 src_sector;
+	u64 sectors;
+};
+
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
 static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6a99ed680915..a7b0f129a19d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -916,6 +916,11 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	if (ret)
 		return ret;
 
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ))) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return BLK_STS_OK;
+	}
+
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &iod->cmd);
 		if (ret)
@@ -929,6 +934,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	}
 
 	blk_mq_start_request(req);
+
 	return BLK_STS_OK;
 out_unmap_data:
 	nvme_unmap_data(dev, req);
@@ -962,6 +968,9 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	ret = nvme_prep_rq(dev, req);
 	if (unlikely(ret))
 		return ret;
+	if (unlikely((req->cmd_flags & REQ_COPY) && (req_op(req) == REQ_OP_READ)))
+		return ret;
+
 	spin_lock(&nvmeq->sq_lock);
 	nvme_sq_copy_cmd(nvmeq, &iod->cmd);
 	nvme_write_sq_db(nvmeq, bd->last);
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
index 2a89c5aa0790..ab72bf546a13 100644
--- a/drivers/nvme/host/trace.c
+++ b/drivers/nvme/host/trace.c
@@ -150,6 +150,23 @@ static const char *nvme_trace_read_write(struct trace_seq *p, u8 *cdw10)
 	return ret;
 }
 
+static const char *nvme_trace_copy(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 slba = get_unaligned_le64(cdw10);
+	u8 nr_range = cdw10[8];
+	u16 control = get_unaligned_le16(cdw10 + 10);
+	u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
+	u32 reftag = get_unaligned_le32(cdw10 +  16);
+
+	trace_seq_printf(p,
+			 "slba=%llu, nr_range=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
+			 slba, nr_range, control, dsmgmt, reftag);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
 static const char *nvme_trace_dsm(struct trace_seq *p, u8 *cdw10)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
@@ -243,6 +260,8 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
 		return nvme_trace_zone_mgmt_send(p, cdw10);
 	case nvme_cmd_zone_mgmt_recv:
 		return nvme_trace_zone_mgmt_recv(p, cdw10);
+	case nvme_cmd_copy:
+		return nvme_trace_copy(p, cdw10);
 	default:
 		return nvme_trace_common(p, cdw10);
 	}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 855dd9b3e84b..7ed966058f4c 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -309,7 +309,7 @@ struct nvme_id_ctrl {
 	__u8			nvscc;
 	__u8			nwpc;
 	__le16			acwu;
-	__u8			rsvd534[2];
+	__le16			ocfs;
 	__le32			sgls;
 	__le32			mnan;
 	__u8			rsvd544[224];
@@ -335,6 +335,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_COPY			= 1 << 8,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
@@ -383,7 +384,10 @@ struct nvme_id_ns {
 	__le16			npdg;
 	__le16			npda;
 	__le16			nows;
-	__u8			rsvd74[18];
+	__le16			mssrl;
+	__le32			mcl;
+	__u8			msrc;
+	__u8			rsvd91[11];
 	__le32			anagrpid;
 	__u8			rsvd96[3];
 	__u8			nsattr;
@@ -704,6 +708,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_copy		= 0x19,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -725,7 +730,8 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
-		nvme_opcode_name(nvme_cmd_zone_append))
+		nvme_opcode_name(nvme_cmd_zone_append),		\
+		nvme_opcode_name(nvme_cmd_copy))
 
 
 
@@ -898,6 +904,36 @@ struct nvme_dsm_range {
 	__le64			slba;
 };
 
+struct nvme_copy_command {
+	__u8                    opcode;
+	__u8                    flags;
+	__u16                   command_id;
+	__le32                  nsid;
+	__u64                   rsvd2;
+	__le64                  metadata;
+	union nvme_data_ptr     dptr;
+	__le64                  sdlba;
+	__u8			nr_range;
+	__u8			rsvd12;
+	__le16                  control;
+	__le16                  rsvd13;
+	__le16			dspec;
+	__le32                  ilbrt;
+	__le16                  lbat;
+	__le16                  lbatm;
+};
+
+struct nvme_copy_range {
+	__le64			rsvd0;
+	__le64			slba;
+	__le16			nlb;
+	__le16			rsvd18;
+	__le32			rsvd20;
+	__le32			eilbrt;
+	__le16			elbat;
+	__le16			elbatm;
+};
+
 struct nvme_write_zeroes_cmd {
 	__u8			opcode;
 	__u8			flags;
@@ -1449,6 +1485,7 @@ struct nvme_command {
 		struct nvme_download_firmware dlfw;
 		struct nvme_format_cmd format;
 		struct nvme_dsm_cmd dsm;
+		struct nvme_copy_command copy;
 		struct nvme_write_zeroes_cmd write_zeroes;
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
       [not found]             ` <CGME20220207141942epcas5p4bda894a5833513c9211dcecc7928a951@epcas5p4.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

From: Arnav Dawn <arnav.dawn@samsung.com>

Add support for handling the copy command on the NVMe target.
For a bdev-backed namespace we call into blkdev_issue_copy(), which the
block layer completes either by an offloaded copy request to the
backing bdev or by emulating the request.

For a file-backed namespace we call vfs_copy_file_range() to service
the request.

Currently the target always advertises copy capability by setting
NVME_CTRL_ONCS_COPY in the controller ONCS field.
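
Conceptually the file-ns path runs the same loop a user-space program
would run with copy_file_range(2): one call per source range, advancing
a single destination offset. A minimal sketch under that assumption,
with a hypothetical byte-extent array and error handling trimmed:

	/* Needs _GNU_SOURCE and <unistd.h> for copy_file_range(2).
	 * ranges[] is a hypothetical array of (src, len) byte extents. */
	loff_t pos = dst_off;
	for (int i = 0; i < nr_range; i++) {
		loff_t src = ranges[i].src;
		ssize_t n = copy_file_range(fd, &src, fd, &pos,
					    ranges[i].len, 0);
		if (n < 0 || (size_t)n != ranges[i].len)
			break;	/* report the failing range index */
	}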

Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/nvme/target/admin-cmd.c   |  8 +++-
 drivers/nvme/target/io-cmd-bdev.c | 66 +++++++++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-file.c | 48 ++++++++++++++++++++++
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 6fb24746de06..cbb967344d1d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -431,8 +431,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
-			NVME_CTRL_ONCS_WRITE_ZEROES);
-
+			NVME_CTRL_ONCS_WRITE_ZEROES | NVME_CTRL_ONCS_COPY);
 	/* XXX: don't report vwc if the underlying device is write through */
 	id->vwc = NVME_CTRL_VWC_PRESENT;
 
@@ -530,6 +529,11 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	if (req->ns->bdev)
 		nvmet_bdev_set_limits(req->ns->bdev, id);
+	else {
+		id->msrc = to0based(BIO_MAX_VECS);
+		id->mssrl = cpu_to_le32(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT));
+		id->mcl = cpu_to_le64(le32_to_cpu(id->mssrl) * BIO_MAX_VECS);
+	}
 
 	/*
 	 * We just provide a single LBA format that matches what the
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 95c2bbb0b2f5..9b403f394f21 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -46,6 +46,30 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 	id->npda = id->npdg;
 	/* NOWS = Namespace Optimal Write Size */
 	id->nows = to0based(ql->io_opt / ql->logical_block_size);
+
+	/* Copy limits */
+	if (ql->max_copy_sectors) {
+		id->mcl = cpu_to_le64((ql->max_copy_sectors << 9) / ql->logical_block_size);
+		id->mssrl = cpu_to_le32((ql->max_copy_range_sectors << 9) /
+				ql->logical_block_size);
+		id->msrc = to0based(ql->max_copy_nr_ranges);
+	} else {
+		if (ql->zoned == BLK_ZONED_NONE) {
+			id->msrc = to0based(BIO_MAX_VECS);
+			id->mssrl = cpu_to_le32(
+					(BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size);
+			id->mcl = cpu_to_le64(le32_to_cpu(id->mssrl) * BIO_MAX_VECS);
+#ifdef CONFIG_BLK_DEV_ZONED
+		} else {
+			/* TODO: get right values for zoned device */
+			id->msrc = to0based(BIO_MAX_VECS);
+			id->mssrl = cpu_to_le32(min((BIO_MAX_VECS << PAGE_SHIFT),
+					ql->chunk_sectors) / ql->logical_block_size);
+			id->mcl = cpu_to_le64(min(le32_to_cpu(id->mssrl) * BIO_MAX_VECS,
+						ql->chunk_sectors));
+#endif
+		}
+	}
 }
 
 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
@@ -433,6 +457,44 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	}
 }
 
+static void nvmet_bdev_execute_copy(struct nvmet_req *req)
+{
+	struct nvme_copy_range range;
+	struct range_entry *rlist;
+	struct nvme_command *cmnd = req->cmd;
+	sector_t dest, dest_off = 0;
+	int ret, id, nr_range;
+
+	nr_range = cmnd->copy.nr_range + 1;
+	dest = le64_to_cpu(cmnd->copy.sdlba) << req->ns->blksize_shift;
+	rlist = kmalloc_array(nr_range, sizeof(*rlist), GFP_KERNEL);
+
+	for (id = 0 ; id < nr_range; id++) {
+		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range, sizeof(range));
+		if (ret)
+			goto out;
+
+		rlist[id].dst = dest + dest_off;
+		rlist[id].src = le64_to_cpu(range.slba) << req->ns->blksize_shift;
+		rlist[id].len = (le16_to_cpu(range.nlb) + 1) << req->ns->blksize_shift;
+		rlist[id].comp_len = 0;
+		dest_off += rlist[id].len;
+	}
+	ret = blkdev_issue_copy(req->ns->bdev, nr_range, rlist, req->ns->bdev, GFP_KERNEL,
+			0);
+	if (ret) {
+		for (id = 0 ; id < nr_range; id++) {
+			if (rlist[id].len != rlist[id].comp_len) {
+				req->cqe->result.u32 = cpu_to_le32(id);
+				break;
+			}
+		}
+	}
+out:
+	kfree(rlist);
+	nvmet_req_complete(req, errno_to_nvme_status(req, ret));
+}
+
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 {
 	switch (req->cmd->common.opcode) {
@@ -451,6 +513,10 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_copy:
+		req->execute = nvmet_bdev_execute_copy;
+		return 0;
+
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 6be6e59d273b..665baa221a43 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -347,6 +347,46 @@ static void nvmet_file_dsm_work(struct work_struct *w)
 	}
 }
 
+static void nvmet_file_copy_work(struct work_struct *w)
+{
+	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
+	int nr_range;
+	loff_t pos, src;
+	struct nvme_command *cmnd = req->cmd;
+	int ret = 0, len = 0, id;
+
+	nr_range = cmnd->copy.nr_range + 1;
+	pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift;
+	if (unlikely(pos + req->transfer_len > req->ns->size)) {
+		nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC));
+		return;
+	}
+
+	for (id = 0 ; id < nr_range; id++) {
+		struct nvme_copy_range range;
+
+		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
+					sizeof(range));
+		if (ret)
+			goto out;
+
+		len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift);
+		src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift));
+		ret = vfs_copy_file_range(req->ns->file, src, req->ns->file, pos, len, 0);
+out:
+		if (ret != len) {
+			pos += ret;
+			req->cqe->result.u32 = cpu_to_le32(id);
+			nvmet_req_complete(req, ret < 0 ? errno_to_nvme_status(req, ret) :
+					errno_to_nvme_status(req, -EIO));
+			return;
+
+		} else
+			pos += len;
+	}
+	nvmet_req_complete(req, 0);
+
+}
 static void nvmet_file_execute_dsm(struct nvmet_req *req)
 {
 	if (!nvmet_check_data_len_lte(req, nvmet_dsm_len(req)))
@@ -355,6 +395,11 @@ static void nvmet_file_execute_dsm(struct nvmet_req *req)
 	schedule_work(&req->f.work);
 }
 
+static void nvmet_file_execute_copy(struct nvmet_req *req)
+{
+	INIT_WORK(&req->f.work, nvmet_file_copy_work);
+	schedule_work(&req->f.work);
+}
 static void nvmet_file_write_zeroes_work(struct work_struct *w)
 {
 	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
@@ -401,6 +446,9 @@ u16 nvmet_file_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_file_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_copy:
+		req->execute = nvmet_file_execute_copy;
+		return 0;
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 08/10] dm: Add support for copy offload.
       [not found]             ` <CGME20220207141948epcas5p4534f6bdc5a1e2e676d7d09c04f8b4a5b@epcas5p4.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

Before enabling copy for a dm target, check whether the underlying
devices and the dm target support copy. Avoid splits inside the dm
target: fail early if the request needs a split, since splitting a
copy request is currently not supported.
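
To spell out the resulting stacking rule (sketch only): a table
supports copy when every target opts in and every underlying data
device is copy capable. A hypothetical helper reusing the
device_not_copy_capable() iterator from this patch:

	/* Hedged sketch mirroring dm_table_supports_copy() for a single
	 * target: the target must opt in and all of its data devices
	 * must pass the copy-capability check. */
	static bool target_supports_copy(struct dm_target *ti)
	{
		return ti->copy_supported &&
		       ti->type->iterate_devices &&
		       !ti->type->iterate_devices(ti, device_not_copy_capable,
						  NULL);
	}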

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-table.c         | 43 +++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  6 +++++
 include/linux/device-mapper.h |  5 ++++
 3 files changed, 54 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e43096cfe9e2..cb5cdaf1d8b9 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1903,6 +1903,39 @@ static bool dm_table_supports_nowait(struct dm_table *t)
 	return true;
 }
 
+static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev,
+				      sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return !blk_queue_copy(q);
+}
+
+static bool dm_table_supports_copy(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned int i;
+
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (!ti->copy_supported)
+			return false;
+
+		/*
+		 * The target must advertise copy support (by setting
+		 * 'copy_supported') and _all_ of its data devices must
+		 * be copy capable as well.
+		 */
+		if (!ti->type->iterate_devices ||
+		    ti->type->iterate_devices(ti, device_not_copy_capable,
+					      NULL))
+			return false;
+	}
+
+	return true;
+}
+
 static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev,
 				      sector_t start, sector_t len, void *data)
 {
@@ -2000,6 +2033,16 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	} else
 		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
 
+	if (!dm_table_supports_copy(t)) {
+		blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+		/* Must also clear copy limits... */
+		q->limits.max_copy_sectors = 0;
+		q->limits.max_copy_range_sectors = 0;
+		q->limits.max_copy_nr_ranges = 0;
+	} else {
+		blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+	}
+
 	if (dm_table_supports_secure_erase(t))
 		blk_queue_flag_set(QUEUE_FLAG_SECERASE, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fa596b654c99..2a6d55722139 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1420,6 +1420,12 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 	if (__process_abnormal_io(ci, ti, &r))
 		return r;
 
+	if (unlikely(op_is_copy(ci->bio->bi_opf) &&
+				max_io_len(ti, ci->sector) < ci->sector_count)) {
+		DMERR("%s: copy IO size (%u) exceeds maximum target size (%llu)\n",
+				__func__, ci->sector_count, (unsigned long long)max_io_len(ti, ci->sector));
+		return -EIO;
+	}
 	len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count);
 
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index b26fecf6c8e8..acfd4018125a 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -362,6 +362,11 @@ struct dm_target {
 	 * zone append operations using regular writes.
 	 */
 	bool emulate_zone_append:1;
+
+	/*
+	 * copy offload is supported
+	 */
+	bool copy_supported:1;
 };
 
 void *dm_per_bio_data(struct bio *bio, size_t data_size);
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 09/10] dm: Enable copy offload for dm-linear target
       [not found]             ` <CGME20220207141953epcas5p32ccc3c0bbe642cea074edefcc32302a5@epcas5p3.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

Set the copy_supported flag to enable copy offload for the dm-linear
target.
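
As a usage sketch (assuming the block-layer patches earlier in this
series expose the copy limits under the queue sysfs directory; device
names are placeholders):

	# map the whole namespace 1:1 through dm-linear
	dmsetup create lin --table \
		"0 $(blockdev --getsz /dev/nvme0n1) linear /dev/nvme0n1 0"
	# the dm device should now report non-zero copy limits
	cat /sys/block/dm-0/queue/max_copy_sectors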

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-linear.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 1b97a11d7151..8910728bc8df 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_same_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->copy_supported = 1;
 	ti->private = lc;
 	return 0;
 
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* [PATCH v2 10/10] dm kcopyd: use copy offload support
       [not found]             ` <CGME20220207141958epcas5p25f1cd06726217696d13c2dfbea010565@epcas5p2.samsung.com>
@ 2022-02-07 14:13                 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-07 14:13 UTC (permalink / raw)
  To: mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

From: SelvaKumar S <selvakuma.s1@samsung.com>

Introduce a copy_jobs list to use copy offload when it is supported by
the underlying devices, and fall back to the existing method otherwise.

run_copy_job() calls the block layer copy offload API if the source and
destination request queues are the same and support copy offload.
On successful completion the copied destination region's count is set
to zero; failed regions are processed via the existing method.
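
Nothing changes for kcopyd users: a target keeps calling
dm_kcopyd_copy() and the client routes the job through the new
copy_jobs list when source and destination share a disk. A hedged
sketch of such a call, with placeholder regions, client and callback:

	struct dm_io_region src = {
		.bdev   = bdev,		/* placeholder block device */
		.sector = src_sector,
		.count  = nr_sectors,
	};
	struct dm_io_region dst = src;

	dst.sector = dst_sector;
	/* After this patch the job is queued on copy_jobs first and only
	 * falls back to the page-based path if blkdev_issue_copy() fails.
	 */
	dm_kcopyd_copy(kc, &src, 1, &dst, 0, copy_done_fn, context);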

Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 drivers/md/dm-kcopyd.c | 57 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 37b03ab7e5c9..64f17cc7b069 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
 	atomic_t nr_jobs;
 
 /*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
  *
- * i)   jobs waiting for pages
- * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i)   jobs waiting to try copy offload
+ * ii)  jobs waiting for pages
+ * iii) jobs that have pages, and are waiting for the io to be issued.
+ * iv)  jobs that don't need to do any IO and just run a callback
+ * v)   jobs that have completed.
  *
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
  */
 	spinlock_t job_lock;
 	struct list_head callback_jobs;
 	struct list_head complete_jobs;
+	struct list_head copy_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
 };
@@ -579,6 +581,44 @@ static int run_io_job(struct kcopyd_job *job)
 	return r;
 }
 
+static int run_copy_job(struct kcopyd_job *job)
+{
+	int r, i, count = 0;
+	unsigned long flags = 0;
+	struct range_entry range;
+
+	struct request_queue *src_q, *dest_q;
+
+	for (i = 0; i < job->num_dests; i++) {
+		range.dst = job->dests[i].sector << SECTOR_SHIFT;
+		range.src = job->source.sector << SECTOR_SHIFT;
+		range.len = job->source.count << SECTOR_SHIFT;
+
+		src_q = bdev_get_queue(job->source.bdev);
+		dest_q = bdev_get_queue(job->dests[i].bdev);
+
+		if (src_q != dest_q || !src_q->limits.copy_offload)
+			break;
+
+		r = blkdev_issue_copy(job->source.bdev, 1, &range,
+			job->dests[i].bdev, GFP_KERNEL, flags);
+		if (r)
+			break;
+
+		job->dests[i].count = 0;
+		count++;
+	}
+
+	if (count == job->num_dests) {
+		push(&job->kc->complete_jobs, job);
+	} else {
+		push(&job->kc->pages_jobs, job);
+		r = 0;
+	}
+
+	return r;
+}
+
 static int run_pages_job(struct kcopyd_job *job)
 {
 	int r;
@@ -659,6 +699,7 @@ static void do_work(struct work_struct *work)
 	spin_unlock_irq(&kc->job_lock);
 
 	blk_start_plug(&plug);
+	process_jobs(&kc->copy_jobs, kc, run_copy_job);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
 	process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +717,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->callback_jobs, job);
+	else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+		push(&kc->copy_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -916,6 +959,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 	spin_lock_init(&kc->job_lock);
 	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
+	INIT_LIST_HEAD(&kc->copy_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
 	kc->throttle = throttle;
@@ -971,6 +1015,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 
 	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
+	WARN_ON(!list_empty(&kc->copy_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
 	destroy_workqueue(kc->kcopyd_wq);
-- 
2.30.0-rc0


^ permalink raw reply related	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-07 18:10                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-07 18:10 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc3 next-20220207]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: microblaze-buildonly-randconfig-r003-20220207 (https://download.01.org/0day-ci/archive/20220208/202202080206.J6kilCeY-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6bb6ea64499e1ac27975e79bb2eee89f07861893
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 6bb6ea64499e1ac27975e79bb2eee89f07861893
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=microblaze SHELL=/bin/bash drivers/nvme/target/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/byteorder/big_endian.h:5,
                    from arch/microblaze/include/uapi/asm/byteorder.h:8,
                    from include/asm-generic/bitops/le.h:6,
                    from include/asm-generic/bitops.h:35,
                    from ./arch/microblaze/include/generated/asm/bitops.h:1,
                    from include/linux/bitops.h:33,
                    from include/linux/log2.h:12,
                    from include/asm-generic/div64.h:55,
                    from ./arch/microblaze/include/generated/asm/div64.h:1,
                    from include/linux/math.h:5,
                    from include/linux/math64.h:6,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/module.h:13,
                    from drivers/nvme/target/admin-cmd.c:7:
   drivers/nvme/target/admin-cmd.c: In function 'nvmet_execute_identify_ns':
>> include/uapi/linux/byteorder/big_endian.h:34:26: warning: conversion from 'unsigned int' to '__le16' {aka 'short unsigned int'} changes value from '524288' to '0' [-Woverflow]
      34 | #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
         |                          ^
   include/linux/byteorder/generic.h:88:21: note: in expansion of macro '__cpu_to_le32'
      88 | #define cpu_to_le32 __cpu_to_le32
         |                     ^~~~~~~~~~~~~
   drivers/nvme/target/admin-cmd.c:534:29: note: in expansion of macro 'cpu_to_le32'
     534 |                 id->mssrl = cpu_to_le32(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT));
         |                             ^~~~~~~~~~~


vim +34 include/uapi/linux/byteorder/big_endian.h

5921e6f8809b161 David Howells 2012-10-13  15  
5921e6f8809b161 David Howells 2012-10-13  16  #define __constant_htonl(x) ((__force __be32)(__u32)(x))
5921e6f8809b161 David Howells 2012-10-13  17  #define __constant_ntohl(x) ((__force __u32)(__be32)(x))
5921e6f8809b161 David Howells 2012-10-13  18  #define __constant_htons(x) ((__force __be16)(__u16)(x))
5921e6f8809b161 David Howells 2012-10-13  19  #define __constant_ntohs(x) ((__force __u16)(__be16)(x))
5921e6f8809b161 David Howells 2012-10-13  20  #define __constant_cpu_to_le64(x) ((__force __le64)___constant_swab64((x)))
5921e6f8809b161 David Howells 2012-10-13  21  #define __constant_le64_to_cpu(x) ___constant_swab64((__force __u64)(__le64)(x))
5921e6f8809b161 David Howells 2012-10-13  22  #define __constant_cpu_to_le32(x) ((__force __le32)___constant_swab32((x)))
5921e6f8809b161 David Howells 2012-10-13  23  #define __constant_le32_to_cpu(x) ___constant_swab32((__force __u32)(__le32)(x))
5921e6f8809b161 David Howells 2012-10-13  24  #define __constant_cpu_to_le16(x) ((__force __le16)___constant_swab16((x)))
5921e6f8809b161 David Howells 2012-10-13  25  #define __constant_le16_to_cpu(x) ___constant_swab16((__force __u16)(__le16)(x))
5921e6f8809b161 David Howells 2012-10-13  26  #define __constant_cpu_to_be64(x) ((__force __be64)(__u64)(x))
5921e6f8809b161 David Howells 2012-10-13  27  #define __constant_be64_to_cpu(x) ((__force __u64)(__be64)(x))
5921e6f8809b161 David Howells 2012-10-13  28  #define __constant_cpu_to_be32(x) ((__force __be32)(__u32)(x))
5921e6f8809b161 David Howells 2012-10-13  29  #define __constant_be32_to_cpu(x) ((__force __u32)(__be32)(x))
5921e6f8809b161 David Howells 2012-10-13  30  #define __constant_cpu_to_be16(x) ((__force __be16)(__u16)(x))
5921e6f8809b161 David Howells 2012-10-13  31  #define __constant_be16_to_cpu(x) ((__force __u16)(__be16)(x))
5921e6f8809b161 David Howells 2012-10-13  32  #define __cpu_to_le64(x) ((__force __le64)__swab64((x)))
5921e6f8809b161 David Howells 2012-10-13  33  #define __le64_to_cpu(x) __swab64((__force __u64)(__le64)(x))
5921e6f8809b161 David Howells 2012-10-13 @34  #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
5921e6f8809b161 David Howells 2012-10-13  35  #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
5921e6f8809b161 David Howells 2012-10-13  36  #define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
5921e6f8809b161 David Howells 2012-10-13  37  #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
5921e6f8809b161 David Howells 2012-10-13  38  #define __cpu_to_be64(x) ((__force __be64)(__u64)(x))
5921e6f8809b161 David Howells 2012-10-13  39  #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
5921e6f8809b161 David Howells 2012-10-13  40  #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
5921e6f8809b161 David Howells 2012-10-13  41  #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
5921e6f8809b161 David Howells 2012-10-13  42  #define __cpu_to_be16(x) ((__force __be16)(__u16)(x))
5921e6f8809b161 David Howells 2012-10-13  43  #define __be16_to_cpu(x) ((__force __u16)(__be16)(x))
5921e6f8809b161 David Howells 2012-10-13  44  
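
To make the reported numbers concrete, here is a small standalone
userspace sketch (assuming BIO_MAX_VECS == 256 and 4 KiB pages, which
this config uses) of the truncation: on big-endian, cpu_to_le32(2048)
byte-swaps to 524288, and the 16-bit mssrl field keeps only the low 16
bits, which are all zero:

#include <stdint.h>
#include <stdio.h>

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00) |
	       ((x << 8) & 0x00ff0000) | (x << 24);
}

int main(void)
{
	uint32_t segs = 256u << (12 - 9); /* BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT) = 2048 */
	uint32_t le32 = swab32(segs);     /* what cpu_to_le32() yields on big-endian: 524288 */
	uint16_t mssrl = (uint16_t)le32;  /* the 16-bit field truncates this to 0 */

	printf("le32 = %u, stored in a 16-bit field = %u\n", le32, mssrl);
	return 0;
}

The clang report further down in the thread flags the same width
mismatch on a little-endian config; in both cases the field calls for
a cpu_to_le16() conversion of a value that fits in 16 bits.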

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-07 20:12                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-07 20:12 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: llvm, kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc3 next-20220207]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm64-randconfig-r031-20220207 (https://download.01.org/0day-ci/archive/20220208/202202080346.u4ubCCIs-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0d8850ae2cae85d49bea6ae0799fa41c7202c05c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/6bb6ea64499e1ac27975e79bb2eee89f07861893
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 6bb6ea64499e1ac27975e79bb2eee89f07861893
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/nvme/target/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/nvme/target/admin-cmd.c:534:15: warning: implicit conversion from '__le32' (aka 'unsigned int') to '__le16' (aka 'unsigned short') changes value from 2097152 to 0 [-Wconstant-conversion]
                   id->mssrl = cpu_to_le32(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT));
                             ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/byteorder/generic.h:88:21: note: expanded from macro 'cpu_to_le32'
   #define cpu_to_le32 __cpu_to_le32
                       ^
   include/uapi/linux/byteorder/big_endian.h:34:27: note: expanded from macro '__cpu_to_le32'
   #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 warning generated.


vim +534 drivers/nvme/target/admin-cmd.c

   488	
   489	static void nvmet_execute_identify_ns(struct nvmet_req *req)
   490	{
   491		struct nvme_id_ns *id;
   492		u16 status;
   493	
   494		if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
   495			req->error_loc = offsetof(struct nvme_identify, nsid);
   496			status = NVME_SC_INVALID_NS | NVME_SC_DNR;
   497			goto out;
   498		}
   499	
   500		id = kzalloc(sizeof(*id), GFP_KERNEL);
   501		if (!id) {
   502			status = NVME_SC_INTERNAL;
   503			goto out;
   504		}
   505	
   506		/* return an all zeroed buffer if we can't find an active namespace */
   507		status = nvmet_req_find_ns(req);
   508		if (status) {
   509			status = 0;
   510			goto done;
   511		}
   512	
   513		nvmet_ns_revalidate(req->ns);
   514	
   515		/*
   516		 * nuse = ncap = nsze isn't always true, but we have no way to find
   517		 * that out from the underlying device.
   518		 */
   519		id->ncap = id->nsze =
   520			cpu_to_le64(req->ns->size >> req->ns->blksize_shift);
   521		switch (req->port->ana_state[req->ns->anagrpid]) {
   522		case NVME_ANA_INACCESSIBLE:
   523		case NVME_ANA_PERSISTENT_LOSS:
   524			break;
   525		default:
   526			id->nuse = id->nsze;
   527			break;
   528		}
   529	
   530		if (req->ns->bdev)
   531			nvmet_bdev_set_limits(req->ns->bdev, id);
   532		else {
   533			id->msrc = to0based(BIO_MAX_VECS);
 > 534			id->mssrl = cpu_to_le32(BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT));
   535			id->mcl = cpu_to_le64(le32_to_cpu(id->mssrl) * BIO_MAX_VECS);
   536		}
   537	
   538		/*
   539		 * We just provide a single LBA format that matches what the
   540		 * underlying device reports.
   541		 */
   542		id->nlbaf = 0;
   543		id->flbas = 0;
   544	
   545		/*
   546		 * Our namespace might always be shared.  Not just with other
   547		 * controllers, but also with any other user of the block device.
   548		 */
   549		id->nmic = NVME_NS_NMIC_SHARED;
   550		id->anagrpid = cpu_to_le32(req->ns->anagrpid);
   551	
   552		memcpy(&id->nguid, &req->ns->nguid, sizeof(id->nguid));
   553	
   554		id->lbaf[0].ds = req->ns->blksize_shift;
   555	
   556		if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns)) {
   557			id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
   558				  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
   559				  NVME_NS_DPC_PI_TYPE3;
   560			id->mc = NVME_MC_EXTENDED_LBA;
   561			id->dps = req->ns->pi_type;
   562			id->flbas = NVME_NS_FLBAS_META_EXT;
   563			id->lbaf[0].ms = cpu_to_le16(req->ns->metadata_size);
   564		}
   565	
   566		if (req->ns->readonly)
   567			id->nsattr |= (1 << 0);
   568	done:
   569		if (!status)
   570			status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
   571	
   572		kfree(id);
   573	out:
   574		nvmet_req_complete(req, status);
   575	}
   576	
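
A hedged sketch of a possible fix, assuming the nvme_id_ns fields
follow the NVMe copy command widths (mssrl is __le16, mcl is __le32;
this patchset may define them differently): clamp the segment length
to what the 16-bit field can hold and convert with the matching
helpers:

		/* widths assumed from the NVMe spec, not from this patchset */
		id->msrc  = to0based(BIO_MAX_VECS);
		id->mssrl = cpu_to_le16(min_t(u32,
				BIO_MAX_VECS << (PAGE_SHIFT - SECTOR_SHIFT), U16_MAX));
		id->mcl   = cpu_to_le32(le16_to_cpu(id->mssrl) * BIO_MAX_VECS);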

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-07 22:45                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-07 22:45 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: nios2-randconfig-r001-20220207 (https://download.01.org/0day-ci/archive/20220208/202202080650.48C9Ps00-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/12a9801a7301f1a1e2ea355c5a4438dab17894cf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 12a9801a7301f1a1e2ea355c5a4438dab17894cf
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:185:5: warning: no previous prototype for 'blk_copy_offload' [-Wmissing-prototypes]
     185 | int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
         |     ^~~~~~~~~~~~~~~~


vim +/blk_copy_offload +185 block/blk-lib.c

   180	
   181	/*
   182	 * blk_copy_offload	- Use device's native copy offload feature
   183	 * Go through the user-provided payload and prepare a new payload based on the device's copy offload limits.
   184	 */
 > 185	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   186			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   187	{
   188		struct request_queue *sq = bdev_get_queue(src_bdev);
   189		struct request_queue *dq = bdev_get_queue(dst_bdev);
   190		struct bio *read_bio, *write_bio;
   191		struct copy_ctx *ctx;
   192		struct cio *cio;
   193		struct page *token;
   194		sector_t src_blk, copy_len, dst_blk;
   195		sector_t remaining, max_copy_len = LONG_MAX;
   196		int ri = 0, ret = 0;
   197	
   198		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   199		if (!cio)
   200			return -ENOMEM;
   201		atomic_set(&cio->refcount, 0);
   202		cio->rlist = rlist;
   203	
   204		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
   205				(sector_t)dq->limits.max_copy_sectors);
   206		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   207				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   208	
   209		for (ri = 0; ri < nr_srcs; ri++) {
   210			cio->rlist[ri].comp_len = rlist[ri].len;
   211			for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
   212				remaining > 0;
   213				remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
   214				copy_len = min(remaining, max_copy_len);
   215	
   216				token = alloc_page(gfp_mask);
   217				if (unlikely(!token)) {
   218					ret = -ENOMEM;
   219					goto err_token;
   220				}
   221	
   222				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   223						gfp_mask);
   224				if (!read_bio) {
   225					ret = -ENOMEM;
   226					goto err_read_bio;
   227				}
   228				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   229				read_bio->bi_iter.bi_size = copy_len;
   230				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   231				ret = submit_bio_wait(read_bio);
   232				if (ret) {
   233					bio_put(read_bio);
   234					goto err_read_bio;
   235				}
   236				bio_put(read_bio);
   237				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   238				if (!ctx) {
   239					ret = -ENOMEM;
   240					goto err_read_bio;
   241				}
   242				ctx->cio = cio;
   243				ctx->range_idx = ri;
   244				ctx->start_sec = rlist[ri].src;
   245	
   246				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   247						gfp_mask);
   248				if (!write_bio) {
   249					ret = -ENOMEM;
   250					goto err_read_bio;
   251				}
   252	
   253				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   254				write_bio->bi_iter.bi_size = copy_len;
   255				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   256				write_bio->bi_end_io = bio_copy_end_io;
   257				write_bio->bi_private = ctx;
   258				atomic_inc(&cio->refcount);
   259				submit_bio(write_bio);
   260			}
   261		}
   262	
   263		/* Wait for completion of all I/Os */
   264		return cio_await_completion(cio);
   265	
   266	err_read_bio:
   267		__free_page(token);
   268	err_token:
   269		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   270	
   271		cio->io_err = ret;
   272		return cio_await_completion(cio);
   273	}
   274	
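
The usual -Wmissing-prototypes remedies are to mark the helper static
when it has no users outside blk-lib.c, or to declare it in a header
that blk-lib.c includes; a sketch of the latter (choosing block/blk.h
as the home for the declaration is an assumption):

/* in block/blk.h (placement assumed), visible to block/blk-lib.c: */
int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
		struct range_entry *rlist, struct block_device *dst_bdev,
		gfp_t gfp_mask);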

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [dm-devel] [PATCH v2 03/10] block: Add copy offload support infrastructure
@ 2022-02-07 22:45                   ` kernel test robot
  0 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-07 22:45 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: axboe, javier, msnitzer, kbuild-all, linux-scsi, chaitanyak,
	linux-nvme, linux-block, dm-devel, linux-fsdevel

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: nios2-randconfig-r001-20220207 (https://download.01.org/0day-ci/archive/20220208/202202080650.48C9Ps00-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/12a9801a7301f1a1e2ea355c5a4438dab17894cf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 12a9801a7301f1a1e2ea355c5a4438dab17894cf
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:185:5: warning: no previous prototype for 'blk_copy_offload' [-Wmissing-prototypes]
     185 | int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
         |     ^~~~~~~~~~~~~~~~


vim +/blk_copy_offload +185 block/blk-lib.c

   180	
   181	/*
   182	 * blk_copy_offload	- Use device's native copy offload feature
   183	 * Go through user provide payload, prepare new payload based on device's copy offload limits.
   184	 */
 > 185	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   186			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   187	{
   188		struct request_queue *sq = bdev_get_queue(src_bdev);
   189		struct request_queue *dq = bdev_get_queue(dst_bdev);
   190		struct bio *read_bio, *write_bio;
   191		struct copy_ctx *ctx;
   192		struct cio *cio;
   193		struct page *token;
   194		sector_t src_blk, copy_len, dst_blk;
   195		sector_t remaining, max_copy_len = LONG_MAX;
   196		int ri = 0, ret = 0;
   197	
   198		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   199		if (!cio)
   200			return -ENOMEM;
   201		atomic_set(&cio->refcount, 0);
   202		cio->rlist = rlist;
   203	
   204		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
   205				(sector_t)dq->limits.max_copy_sectors);
   206		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   207				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   208	
   209		for (ri = 0; ri < nr_srcs; ri++) {
   210			cio->rlist[ri].comp_len = rlist[ri].len;
   211			for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
   212				remaining > 0;
   213				remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
   214				copy_len = min(remaining, max_copy_len);
   215	
   216				token = alloc_page(gfp_mask);
   217				if (unlikely(!token)) {
   218					ret = -ENOMEM;
   219					goto err_token;
   220				}
   221	
   222				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   223						gfp_mask);
   224				if (!read_bio) {
   225					ret = -ENOMEM;
   226					goto err_read_bio;
   227				}
   228				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   229				read_bio->bi_iter.bi_size = copy_len;
   230				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   231				ret = submit_bio_wait(read_bio);
   232				if (ret) {
   233					bio_put(read_bio);
   234					goto err_read_bio;
   235				}
   236				bio_put(read_bio);
   237				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   238				if (!ctx) {
   239					ret = -ENOMEM;
   240					goto err_read_bio;
   241				}
   242				ctx->cio = cio;
   243				ctx->range_idx = ri;
   244				ctx->start_sec = rlist[ri].src;
   245	
   246				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   247						gfp_mask);
   248				if (!write_bio) {
   249					ret = -ENOMEM;
   250					goto err_read_bio;
   251				}
   252	
   253				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   254				write_bio->bi_iter.bi_size = copy_len;
   255				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   256				write_bio->bi_end_io = bio_copy_end_io;
   257				write_bio->bi_private = ctx;
   258				atomic_inc(&cio->refcount);
   259				submit_bio(write_bio);
   260			}
   261		}
   262	
   263		/* Wait for completion of all I/Os */
   264		return cio_await_completion(cio);
   265	
   266	err_read_bio:
   267		__free_page(token);
   268	err_token:
   269		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   270	
   271		cio->io_err = ret;
   272		return cio_await_completion(cio);
   273	}
   274	

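The usual fix for this W=1 warning is either to declare the function in a
header seen by both blk-lib.c and its callers, or to mark it static if it
is not used elsewhere. A minimal sketch of the former, assuming block/blk.h
is a suitable home for the declaration (the placement is an assumption,
not part of the posted patch):

	/* block/blk.h (sketch) */
	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
			struct range_entry *rlist, struct block_device *dst_bdev,
			gfp_t gfp_mask);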
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
  (?)
@ 2022-02-07 23:26                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-07 23:26 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: llvm, kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: hexagon-randconfig-r045-20220207 (https://download.01.org/0day-ci/archive/20220208/202202080735.lyaEe5Bq-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0d8850ae2cae85d49bea6ae0799fa41c7202c05c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/12a9801a7301f1a1e2ea355c5a4438dab17894cf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 12a9801a7301f1a1e2ea355c5a4438dab17894cf
        # save the config file to the linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add the following tag, as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-lib.c:185:5: warning: no previous prototype for function 'blk_copy_offload' [-Wmissing-prototypes]
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
       ^
   block/blk-lib.c:185:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   ^
   static 
   1 warning generated.


vim +/blk_copy_offload +185 block/blk-lib.c

   180	
   181	/*
   182	 * blk_copy_offload	- Use device's native copy offload feature
   183	 * Go through the user-provided payload and prepare a new payload based on the device's copy offload limits.
   184	 */
 > 185	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   186			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   187	{
   188		struct request_queue *sq = bdev_get_queue(src_bdev);
   189		struct request_queue *dq = bdev_get_queue(dst_bdev);
   190		struct bio *read_bio, *write_bio;
   191		struct copy_ctx *ctx;
   192		struct cio *cio;
   193		struct page *token;
   194		sector_t src_blk, copy_len, dst_blk;
   195		sector_t remaining, max_copy_len = LONG_MAX;
   196		int ri = 0, ret = 0;
   197	
   198		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   199		if (!cio)
   200			return -ENOMEM;
   201		atomic_set(&cio->refcount, 0);
   202		cio->rlist = rlist;
   203	
   204		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
   205				(sector_t)dq->limits.max_copy_sectors);
   206		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   207				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   208	
   209		for (ri = 0; ri < nr_srcs; ri++) {
   210			cio->rlist[ri].comp_len = rlist[ri].len;
   211			for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
   212				remaining > 0;
   213				remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
   214				copy_len = min(remaining, max_copy_len);
   215	
   216				token = alloc_page(gfp_mask);
   217				if (unlikely(!token)) {
   218					ret = -ENOMEM;
   219					goto err_token;
   220				}
   221	
   222				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   223						gfp_mask);
   224				if (!read_bio) {
   225					ret = -ENOMEM;
   226					goto err_read_bio;
   227				}
   228				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   229				read_bio->bi_iter.bi_size = copy_len;
   230				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   231				ret = submit_bio_wait(read_bio);
   232				if (ret) {
   233					bio_put(read_bio);
   234					goto err_read_bio;
   235				}
   236				bio_put(read_bio);
   237				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   238				if (!ctx) {
   239					ret = -ENOMEM;
   240					goto err_read_bio;
   241				}
   242				ctx->cio = cio;
   243				ctx->range_idx = ri;
   244				ctx->start_sec = rlist[ri].src;
   245	
   246				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   247						gfp_mask);
   248				if (!write_bio) {
   249					ret = -ENOMEM;
   250					goto err_read_bio;
   251				}
   252	
   253				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   254				write_bio->bi_iter.bi_size = copy_len;
   255				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   256				write_bio->bi_end_io = bio_copy_end_io;
   257				write_bio->bi_private = ctx;
   258				atomic_inc(&cio->refcount);
   259				submit_bio(write_bio);
   260			}
   261		}
   262	
   263		/* Wait for completion of all I/Os */
   264		return cio_await_completion(cio);
   265	
   266	err_read_bio:
   267		__free_page(token);
   268	err_token:
   269		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   270	
   271		cio->io_err = ret;
   272		return cio_await_completion(cio);
   273	}
   274	

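Clang's note above already spells out the alternative: if the helper is
only meant to be used inside blk-lib.c, the warning goes away by making
it static, e.g. (sketch, same signature as the quoted code):

	static int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
			struct range_entry *rlist, struct block_device *dst_bdev,
			gfp_t gfp_mask)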
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 05/10] block: add emulation for copy
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
  (?)
@ 2022-02-08  3:20                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-08  3:20 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: nios2-randconfig-r001-20220207 (https://download.01.org/0day-ci/archive/20220208/202202081132.axCkiVgv-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a7bb30870db803af4ad955a968992222bcfb478f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout a7bb30870db803af4ad955a968992222bcfb478f
        # save the config file to the linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add the following tag, as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   block/blk-lib.c:185:5: warning: no previous prototype for 'blk_copy_offload' [-Wmissing-prototypes]
     185 | int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
         |     ^~~~~~~~~~~~~~~~
>> block/blk-lib.c:275:5: warning: no previous prototype for 'blk_submit_rw_buf' [-Wmissing-prototypes]
     275 | int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
         |     ^~~~~~~~~~~~~~~~~


vim +/blk_submit_rw_buf +275 block/blk-lib.c

   180	
   181	/*
   182	 * blk_copy_offload	- Use device's native copy offload feature
   183	 * Go through the user-provided payload and prepare a new payload based on the device's copy offload limits.
   184	 */
 > 185	int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
   186			struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
   187	{
   188		struct request_queue *sq = bdev_get_queue(src_bdev);
   189		struct request_queue *dq = bdev_get_queue(dst_bdev);
   190		struct bio *read_bio, *write_bio;
   191		struct copy_ctx *ctx;
   192		struct cio *cio;
   193		struct page *token;
   194		sector_t src_blk, copy_len, dst_blk;
   195		sector_t remaining, max_copy_len = LONG_MAX;
   196		int ri = 0, ret = 0;
   197	
   198		cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
   199		if (!cio)
   200			return -ENOMEM;
   201		atomic_set(&cio->refcount, 0);
   202		cio->rlist = rlist;
   203	
   204		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
   205				(sector_t)dq->limits.max_copy_sectors);
   206		max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
   207				(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
   208	
   209		for (ri = 0; ri < nr_srcs; ri++) {
   210			cio->rlist[ri].comp_len = rlist[ri].len;
   211			for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
   212				remaining > 0;
   213				remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
   214				copy_len = min(remaining, max_copy_len);
   215	
   216				token = alloc_page(gfp_mask);
   217				if (unlikely(!token)) {
   218					ret = -ENOMEM;
   219					goto err_token;
   220				}
   221	
   222				read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
   223						gfp_mask);
   224				if (!read_bio) {
   225					ret = -ENOMEM;
   226					goto err_read_bio;
   227				}
   228				read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
   229				read_bio->bi_iter.bi_size = copy_len;
   230				__bio_add_page(read_bio, token, PAGE_SIZE, 0);
   231				ret = submit_bio_wait(read_bio);
   232				if (ret) {
   233					bio_put(read_bio);
   234					goto err_read_bio;
   235				}
   236				bio_put(read_bio);
   237				ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
   238				if (!ctx) {
   239					ret = -ENOMEM;
   240					goto err_read_bio;
   241				}
   242				ctx->cio = cio;
   243				ctx->range_idx = ri;
   244				ctx->start_sec = rlist[ri].src;
   245	
   246				write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
   247						gfp_mask);
   248				if (!write_bio) {
   249					ret = -ENOMEM;
   250					goto err_read_bio;
   251				}
   252	
   253				write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
   254				write_bio->bi_iter.bi_size = copy_len;
   255				__bio_add_page(write_bio, token, PAGE_SIZE, 0);
   256				write_bio->bi_end_io = bio_copy_end_io;
   257				write_bio->bi_private = ctx;
   258				atomic_inc(&cio->refcount);
   259				submit_bio(write_bio);
   260			}
   261		}
   262	
   263		/* Wait for completion of all I/Os */
   264		return cio_await_completion(cio);
   265	
   266	err_read_bio:
   267		__free_page(token);
   268	err_token:
   269		rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
   270	
   271		cio->io_err = ret;
   272		return cio_await_completion(cio);
   273	}
   274	
 > 275	int blk_submit_rw_buf(struct block_device *bdev, void *buf, sector_t buf_len,
   276					sector_t sector, unsigned int op, gfp_t gfp_mask)
   277	{
   278		struct request_queue *q = bdev_get_queue(bdev);
   279		struct bio *bio, *parent = NULL;
   280		sector_t max_hw_len = min_t(unsigned int, queue_max_hw_sectors(q),
   281				queue_max_segments(q) << (PAGE_SHIFT - SECTOR_SHIFT)) << SECTOR_SHIFT;
   282		sector_t len, remaining;
   283		int ret;
   284	
   285		for (remaining = buf_len; remaining > 0; remaining -= len) {
   286			len = min_t(int, max_hw_len, remaining);
   287	retry:
   288			bio = bio_map_kern(q, buf, len, gfp_mask);
   289			if (IS_ERR(bio)) {
   290				len >>= 1;
   291				if (len)
   292					goto retry;
   293				return PTR_ERR(bio);
   294			}
   295	
   296			bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
   297			bio->bi_opf = op;
   298			bio_set_dev(bio, bdev);
   299			bio->bi_end_io = NULL;
   300			bio->bi_private = NULL;
   301	
   302			if (parent) {
   303				bio_chain(parent, bio);
   304				submit_bio(parent);
   305			}
   306			parent = bio;
   307			sector += len;
   308			buf = (char *) buf + len;
   309		}
   310		ret = submit_bio_wait(bio);
   311		bio_put(bio);
   312	
   313		return ret;
   314	}
   315	

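For context, blk_submit_rw_buf() above takes byte-based buf_len and sector
arguments (it converts to sectors internally via SECTOR_SHIFT) and chains
all but the last bio, so only the final submit_bio_wait() blocks. A minimal
sketch of how an emulated copy could drive it, reading into a bounce buffer
and writing it back out (the function name and structure here are
illustrative, not taken from the posted series):

	static int blk_copy_emulate_range(struct block_device *src_bdev,
			struct block_device *dst_bdev, sector_t src, sector_t dst,
			sector_t len, gfp_t gfp_mask)
	{
		void *buf = kvmalloc(len, gfp_mask);	/* len is in bytes here */
		int ret;

		if (!buf)
			return -ENOMEM;

		ret = blk_submit_rw_buf(src_bdev, buf, len, src, REQ_OP_READ,
					gfp_mask);
		if (!ret)
			ret = blk_submit_rw_buf(dst_bdev, buf, len, dst,
						REQ_OP_WRITE, gfp_mask);

		kvfree(buf);
		return ret;
	}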
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 02/10] block: Introduce queue limits for copy-offload support
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-08  7:01                   ` Damien Le Moal
  -1 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2022-02-08  7:01 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, SelvaKumar S

On 2/7/22 23:13, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>         - copy_offload (READ_WRITE)
>         - max_copy_sectors (READ_ONLY)

Why read-only? With the name as you have it, it seems to be the soft
control for the max size of copy operations rather than the actual
device limit. So it would be better to align with other limits like
max_sectors/max_hw_sectors and have:

	max_copy_sectors (RW)
	max_hw_copy_sectors (RO)

>         - max_copy_ranges_sectors (READ_ONLY)
>         - max_copy_nr_ranges (READ_ONLY)

Same for these.

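The max_sectors/max_hw_sectors pattern keeps the hardware limit read-only
and lets the writable soft limit be clamped against it in the sysfs store.
An illustrative sketch using the names suggested above and the
queue_var_store() helper the posted patch already uses (not code from this
thread):

	static ssize_t queue_max_copy_sectors_store(struct request_queue *q,
			const char *page, size_t count)
	{
		unsigned long max_copy;
		ssize_t ret = queue_var_store(&max_copy, page, count);

		if (ret < 0)
			return ret;

		/* never raise the soft limit above what the device reports */
		q->limits.max_copy_sectors = min_t(unsigned int, max_copy,
				q->limits.max_hw_copy_sectors);

		return ret;
	}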
> 
> copy_offload(= 0), is disabled by default. This needs to be enabled if
> copy-offload needs to be used.

How does this work? This limit will be present for a DM device AND the
underlying devices of the DM target. But "offload" applies only to the
underlying devices, not the DM device...

Also, since this is not an underlying device limitation but an on/off
switch, this should probably be moved to a request_queue boolean field
or flag bit, controlled with sysfs.

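blk-sysfs.c already has a pattern for exactly this kind of switch: bits in
q->queue_flags exposed via the QUEUE_SYSFS_BIT_FNS() helper, as done for
flags like "nonrot" and "iostats". Assuming the QUEUE_FLAG_COPY bit
introduced later in this patch fits that helper, the sketch would be
roughly:

	/* block/blk-sysfs.c (sketch) */
	QUEUE_SYSFS_BIT_FNS(copy_offload, COPY, 0);
	QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");

which would replace the hand-rolled queue_copy_offload_show/store below
and drop the copy_offload field from queue_limits entirely.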
> max_copy_sectors = 0, indicates the device doesn't support native copy.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  block/blk-settings.c   |  4 ++++
>  block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h | 12 ++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b880c70e22e4..818454552cf8 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
> +	lim->copy_offload = 0;
> +	lim->max_copy_sectors = 0;
> +	lim->max_copy_nr_ranges = 0;
> +	lim->max_copy_range_sectors = 0;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9f32882ceb2f..dc68ae6b55c9 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -171,6 +171,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
>  	return queue_var_show(q->limits.discard_granularity, page);
>  }
>  
> +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.copy_offload, page);
> +}
> +
> +static ssize_t queue_copy_offload_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long copy_offload;
> +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (copy_offload && q->limits.max_copy_sectors == 0)
> +		return -EINVAL;
> +
> +	if (copy_offload)
> +		q->limits.copy_offload = BLK_COPY_OFFLOAD;
> +	else
> +		q->limits.copy_offload = 0;
> +
> +	return ret;
> +}
> +
> +static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_sectors, page);
> +}
> +
> +static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_range_sectors, page);
> +}
> +
> +static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_nr_ranges, page);
> +}
> +
>  static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
>  {
>  
> @@ -597,6 +639,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
>  
> +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
> +QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
> +QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
> +QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
> +
>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> @@ -643,6 +690,10 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_copy_offload_entry.attr,
> +	&queue_max_copy_sectors_entry.attr,
> +	&queue_max_copy_range_sectors_entry.attr,
> +	&queue_max_copy_nr_ranges_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index efed3820cbf7..f63ae50f1de3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -51,6 +51,12 @@ extern struct class block_class;
>  /* Doing classic polling */
>  #define BLK_MQ_POLL_CLASSIC -1
>  
> +/* Define copy offload options */
> +enum blk_copy {
> +	BLK_COPY_EMULATE = 0,
> +	BLK_COPY_OFFLOAD,
> +};
> +
>  /*
>   * Maximum number of blkcg policies allowed to be registered concurrently.
>   * Defined here to simplify include dependency.
> @@ -253,6 +259,10 @@ struct queue_limits {
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> +	unsigned int            copy_offload;
> +	unsigned int            max_copy_sectors;
> +	unsigned short          max_copy_range_sectors;
> +	unsigned short          max_copy_nr_ranges;
>  
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
> @@ -562,6 +572,7 @@ struct request_queue {
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
>  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> +#define QUEUE_FLAG_COPY		30	/* supports copy offload */

Then what is the point of the max_copy_sectors limit? You can test for
device support by checking that max_copy_sectors != 0, no? This flag
duplicates that information.
I would rather use it as the on/off switch for copy offload and remove
the copy_offload limit.

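For the support test, a trivial predicate on the hardware limit would do;
an illustrative helper (the name is made up for this sketch):

	/* capability is implied by a non-zero device copy limit */
	static inline bool bdev_copy_offload_capable(struct block_device *bdev)
	{
		return bdev_get_queue(bdev)->limits.max_copy_sectors != 0;
	}

leaving the QUEUE_FLAG_COPY bit free to act purely as the user-controlled
on/off switch.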
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> @@ -585,6 +596,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
>  #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
> +#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)	\
>  	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_secure_erase(q) \


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [dm-devel] [PATCH v2 02/10] block: Introduce queue limits for copy-offload support
@ 2022-02-08  7:01                   ` Damien Le Moal
  0 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2022-02-08  7:01 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, zach.brown, chaitanyak, SelvaKumar S,
	msnitzer, josef, linux-block, dsterba, kbusch, Frederick.Knight,
	axboe, tytso, joshi.k, martin.petersen, arnav.dawn, jack,
	linux-fsdevel, lsf-pc

On 2/7/22 23:13, Nitesh Shetty wrote:
> Add device limits as sysfs entries,
>         - copy_offload (READ_WRITE)
>         - max_copy_sectors (READ_ONLY)

Why read-only? With the name as you have it, it seems to be the soft
control for the max size of copy operations rather than the actual
device limit. So it would be better to align with other limits like
max_sectors/max_hw_sectors and have:

	max_copy_sectors (RW)
	max_hw_copy_sectors (RO)

>         - max_copy_ranges_sectors (READ_ONLY)
>         - max_copy_nr_ranges (READ_ONLY)

Same for these.

> 
> copy_offload(= 0), is disabled by default. This needs to be enabled if
> copy-offload needs to be used.

How does this work? This limit will be present for a DM device AND the
underlying devices of the DM target. But "offload" applies only to the
underlying devices, not the DM device...

Also, since this is not an underlying device limitation but an on/off
switch, this should probably be moved to a request_queue boolean field
or flag bit, controlled with sysfs.

> max_copy_sectors = 0, indicates the device doesn't support native copy.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  block/blk-settings.c   |  4 ++++
>  block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h | 12 ++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b880c70e22e4..818454552cf8 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
> +	lim->copy_offload = 0;
> +	lim->max_copy_sectors = 0;
> +	lim->max_copy_nr_ranges = 0;
> +	lim->max_copy_range_sectors = 0;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9f32882ceb2f..dc68ae6b55c9 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -171,6 +171,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
>  	return queue_var_show(q->limits.discard_granularity, page);
>  }
>  
> +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.copy_offload, page);
> +}
> +
> +static ssize_t queue_copy_offload_store(struct request_queue *q,
> +				       const char *page, size_t count)
> +{
> +	unsigned long copy_offload;
> +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (copy_offload && q->limits.max_copy_sectors == 0)
> +		return -EINVAL;
> +
> +	if (copy_offload)
> +		q->limits.copy_offload = BLK_COPY_OFFLOAD;
> +	else
> +		q->limits.copy_offload = 0;
> +
> +	return ret;
> +}
> +
> +static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_sectors, page);
> +}
> +
> +static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_range_sectors, page);
> +}
> +
> +static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
> +		char *page)
> +{
> +	return queue_var_show(q->limits.max_copy_nr_ranges, page);
> +}
> +
>  static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
>  {
>  
> @@ -597,6 +639,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
>  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
>  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
>  
> +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
> +QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
> +QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
> +QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
> +
>  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
>  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
>  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> @@ -643,6 +690,10 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_copy_offload_entry.attr,
> +	&queue_max_copy_sectors_entry.attr,
> +	&queue_max_copy_range_sectors_entry.attr,
> +	&queue_max_copy_nr_ranges_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index efed3820cbf7..f63ae50f1de3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -51,6 +51,12 @@ extern struct class block_class;
>  /* Doing classic polling */
>  #define BLK_MQ_POLL_CLASSIC -1
>  
> +/* Define copy offload options */
> +enum blk_copy {
> +	BLK_COPY_EMULATE = 0,
> +	BLK_COPY_OFFLOAD,
> +};
> +
>  /*
>   * Maximum number of blkcg policies allowed to be registered concurrently.
>   * Defined here to simplify include dependency.
> @@ -253,6 +259,10 @@ struct queue_limits {
>  	unsigned int		discard_granularity;
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
> +	unsigned int            copy_offload;
> +	unsigned int            max_copy_sectors;
> +	unsigned short          max_copy_range_sectors;
> +	unsigned short          max_copy_nr_ranges;
>  
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
> @@ -562,6 +572,7 @@ struct request_queue {
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
>  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> +#define QUEUE_FLAG_COPY		30	/* supports copy offload */

Then what is the point of max_copy_sectors limit ? You can test support
by the device by looking at max_copy_sectors != 0, no ? This flag is
duplicated information.
I would rather use it for the on/off switch for the copy offload,
removing the copy_offload limit.

>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> @@ -585,6 +596,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
>  #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
> +#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)	\
>  	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_secure_erase(q) \


-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-08  7:21                   ` Damien Le Moal
  -1 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2022-02-08  7:21 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	osandov, djwong, josef, clm, dsterba, tytso, jack, joshi.k,
	arnav.dawn, SelvaKumar S

On 2/7/22 23:13, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and a array of (source, destination and copy length) tuples.

s/a/an

> Introduce REQ_COP copy offload operation flag. Create a read-write

REQ_COPY ?

> bio pair with a token as payload and submitted to the device in order.
> the read request populates token with source specific information which
> is then passed with write request.
> Ths design is courtsey Mikulas Patocka<mpatocka@>'s token based copy

s/Ths design is courtsey/This design is courtesy of

> 
> Larger copy operation may be divided if necessary by looking at device
> limits.

may or will ?
by looking at -> depending on the ?

> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> ---
>  block/blk-lib.c           | 216 ++++++++++++++++++++++++++++++++++++++
>  block/blk-settings.c      |   2 +
>  block/blk.h               |   2 +
>  include/linux/blk_types.h |  20 ++++
>  include/linux/blkdev.h    |   3 +
>  include/uapi/linux/fs.h   |  14 +++
>  6 files changed, 257 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 1b8ced45e4e5..3ae2c27b566e 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -135,6 +135,222 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  }
>  EXPORT_SYMBOL(blkdev_issue_discard);
>  
> +/*
> + * Wait on and process all in-flight BIOs.  This must only be called once
> + * all bios have been issued so that the refcount can only decrease.
> + * This just waits for all bios to make it through bio_copy_end_io. IO
> + * errors are propagated through cio->io_error.
> + */
> +static int cio_await_completion(struct cio *cio)
> +{
> +	int ret = 0;
> +
> +	while (atomic_read(&cio->refcount)) {
> +		cio->waiter = current;
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		blk_io_schedule();
> +		/* wake up sets us TASK_RUNNING */
> +		cio->waiter = NULL;
> +		ret = cio->io_err;

Why is this in the loop ?
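Reading the error once after the wait loop would seem enough, i.e.
(untested):

	while (atomic_read(&cio->refcount)) {
		cio->waiter = current;
		__set_current_state(TASK_UNINTERRUPTIBLE);
		blk_io_schedule();
		/* wake up sets us TASK_RUNNING */
		cio->waiter = NULL;
	}
	ret = cio->io_err;
	kvfree(cio);

	return ret;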

> +	}
> +	kvfree(cio);
> +
> +	return ret;
> +}
> +
> +static void bio_copy_end_io(struct bio *bio)
> +{
> +	struct copy_ctx *ctx = bio->bi_private;
> +	struct cio *cio = ctx->cio;
> +	sector_t clen;
> +	int ri = ctx->range_idx;
> +
> +	if (bio->bi_status) {
> +		cio->io_err = bio->bi_status;
> +		clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT;
> +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> +	}
> +	__free_page(bio->bi_io_vec[0].bv_page);
> +	kfree(ctx);
> +	bio_put(bio);
> +
> +	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
> +		wake_up_process(cio->waiter);

This looks racy: the cio->waiter test and wakeup are not atomic.
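One way to close that window (a sketch, assuming switching to a struct
completion is acceptable here) is to hold an extra reference in the
submitter and pair atomic_dec_and_test() with complete():

	/* submitter: take an initial reference before issuing bios */
	atomic_set(&cio->refcount, 1);
	init_completion(&cio->done);

	/* ... issue the read/write token pairs ... */

	/* submitter: drop the initial reference, wait if IO is in flight */
	if (!atomic_dec_and_test(&cio->refcount))
		wait_for_completion_io(&cio->done);

	/* bio_copy_end_io(): the last reference completes the waiter */
	if (atomic_dec_and_test(&cio->refcount))
		complete(&cio->done);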

> +}
> +
> +/*
> + * blk_copy_offload	- Use device's native copy offload feature
> + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> + */
> +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> +{
> +	struct request_queue *sq = bdev_get_queue(src_bdev);
> +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> +	struct bio *read_bio, *write_bio;
> +	struct copy_ctx *ctx;
> +	struct cio *cio;
> +	struct page *token;
> +	sector_t src_blk, copy_len, dst_blk;
> +	sector_t remaining, max_copy_len = LONG_MAX;
> +	int ri = 0, ret = 0;
> +
> +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> +	if (!cio)
> +		return -ENOMEM;
> +	atomic_set(&cio->refcount, 0);
> +	cio->rlist = rlist;
> +
> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
> +			(sector_t)dq->limits.max_copy_sectors);

sq->limits.max_copy_sectors is already by definition smaller than
LONG_MAX, so there is no need for the min3 here.
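i.e. simply:

	max_copy_len = min(sq->limits.max_copy_sectors,
			   dq->limits.max_copy_sectors);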

> +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> +
> +	for (ri = 0; ri < nr_srcs; ri++) {
> +		cio->rlist[ri].comp_len = rlist[ri].len;
> +		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
> +			remaining > 0;
> +			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {

This is unreadable.
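Something along these lines (untested sketch) would be easier to read:

	src_blk = rlist[ri].src;
	dst_blk = rlist[ri].dst;
	remaining = rlist[ri].len;

	while (remaining > 0) {
		copy_len = min(remaining, max_copy_len);

		/* ... issue one read/write token pair of copy_len bytes ... */

		remaining -= copy_len;
		src_blk += copy_len;
		dst_blk += copy_len;
	}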

> +			copy_len = min(remaining, max_copy_len);
> +
> +			token = alloc_page(gfp_mask);
> +			if (unlikely(!token)) {
> +				ret = -ENOMEM;
> +				goto err_token;
> +			}
> +
> +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!read_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> +			read_bio->bi_iter.bi_size = copy_len;
> +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> +			ret = submit_bio_wait(read_bio);
> +			if (ret) {
> +				bio_put(read_bio);
> +				goto err_read_bio;
> +			}
> +			bio_put(read_bio);
> +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> +			if (!ctx) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}

This should be done before the read.
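i.e. allocate the context first, so an allocation failure is caught
before any IO is issued (sketch):

	ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
	if (!ctx) {
		ret = -ENOMEM;
		goto err_read_bio;	/* only the token needs freeing here */
	}
	/* ... then allocate and submit the read bio ... */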

> +			ctx->cio = cio;
> +			ctx->range_idx = ri;
> +			ctx->start_sec = rlist[ri].src;
> +
> +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> +					gfp_mask);
> +			if (!write_bio) {
> +				ret = -ENOMEM;
> +				goto err_read_bio;
> +			}
> +
> +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> +			write_bio->bi_iter.bi_size = copy_len;
> +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> +			write_bio->bi_end_io = bio_copy_end_io;
> +			write_bio->bi_private = ctx;
> +			atomic_inc(&cio->refcount);
> +			submit_bio(write_bio);
> +		}
> +	}
> +
> +	/* Wait for completion of all IO's*/
> +	return cio_await_completion(cio);
> +
> +err_read_bio:
> +	__free_page(token);
> +err_token:
> +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> +
> +	cio->io_err = ret;
> +	return cio_await_completion(cio);
> +}
> +
> +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> +{
> +	unsigned int align_mask = max(
> +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> +	sector_t len = 0;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		if (rlist[i].len)
> +			len += rlist[i].len;
> +		else
> +			return -EINVAL;
> +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> +				(rlist[i].len & align_mask))
> +			return -EINVAL;
> +		rlist[i].comp_len = 0;
> +	}
> +
> +	if (!len && len >= MAX_COPY_TOTAL_LENGTH)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> +		struct request_queue *dest_q)
> +{
> +	if (dest_q->limits.copy_offload == BLK_COPY_OFFLOAD &&
> +			src_q->limits.copy_offload == BLK_COPY_OFFLOAD)
> +		return true;
> +
> +	return false;
> +}
> +
> +/*
> + * blkdev_issue_copy - queue a copy
> + * @src_bdev:	source block device
> + * @nr_srcs:	number of source ranges to copy
> + * @src_rlist:	array of source ranges
> + * @dest_bdev:	destination block device
> + * @gfp_mask:   memory allocation flags (for bio_alloc)
> + * @flags:	BLKDEV_COPY_* flags to control behaviour
> + *
> + * Description:
> + *	Copy source ranges from source block device to destination block device.
> + *	length of a source range cannot be zero.
> + */
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> +		struct range_entry *rlist, struct block_device *dest_bdev,
> +		gfp_t gfp_mask, int flags)
> +{
> +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> +	int ret = -EINVAL;
> +
> +	if (!src_q || !dest_q)
> +		return -ENXIO;
> +
> +	if (!nr)
> +		return -EINVAL;
> +
> +	if (nr >= MAX_COPY_NR_RANGE)
> +		return -EINVAL;
> +
> +	if (bdev_read_only(dest_bdev))
> +		return -EPERM;
> +
> +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> +	if (ret)
> +		return ret;
> +
> +	if (blk_check_copy_offload(src_q, dest_q))
> +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(blkdev_issue_copy);
> +
>  /**
>   * __blkdev_issue_write_same - generate number of bios with same page
>   * @bdev:	target blockdev
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 818454552cf8..4c8d48b8af25 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -545,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
>  
> +	t->max_copy_sectors = min_not_zero(t->max_copy_sectors, b->max_copy_sectors);

Why min_not_zero ? If one of the underlying drives does not support
copy offload, you cannot report that the top drive does.
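i.e. a plain min(), so that a zero (no copy offload support) in any
underlying device propagates to the stacked device:

	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);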

> +
>  	t->misaligned |= b->misaligned;
>  
>  	alignment = queue_limit_alignment_offset(b, start);
> diff --git a/block/blk.h b/block/blk.h
> index abb663a2a147..94d2b055750b 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
>  		break;
>  	}
>  
> +	if (unlikely(op_is_copy(bio->bi_opf)))
> +		return false;
>  	/*
>  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
>  	 * This is a quick and dirty check that relies on the fact that
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 5561e58d158a..0a3fee8ad61c 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -418,6 +418,7 @@ enum req_flag_bits {
>  	/* for driver use */
>  	__REQ_DRV,
>  	__REQ_SWAP,		/* swapping request. */
> +	__REQ_COPY,		/* copy request*/
>  	__REQ_NR_BITS,		/* stops here */
>  };
>  
> @@ -442,6 +443,7 @@ enum req_flag_bits {
>  
>  #define REQ_DRV			(1ULL << __REQ_DRV)
>  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> +#define REQ_COPY		(1ULL << __REQ_COPY)
>  
>  #define REQ_FAILFAST_MASK \
>  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> @@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op)
>  	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
>  }
>  
> +static inline bool op_is_copy(unsigned int op)
> +{
> +	return (op & REQ_COPY);
> +}
> +
>  /*
>   * Check if a bio or request operation is a zone management operation, with
>   * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
> @@ -532,4 +539,17 @@ struct blk_rq_stat {
>  	u64 batch;
>  };
>  
> +struct cio {
> +	atomic_t refcount;
> +	blk_status_t io_err;
> +	struct range_entry *rlist;
> +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> +};
> +
> +struct copy_ctx {
> +	int range_idx;
> +	sector_t start_sec;
> +	struct cio *cio;
> +};
> +
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f63ae50f1de3..15597488040c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1120,6 +1120,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		struct bio **biop);
>  struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
>  		gfp_t gfp_mask);
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> +		struct range_entry *src_rlist, struct block_device *dest_bdev,
> +		gfp_t gfp_mask, int flags);
>  
>  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index bdf7b404b3e7..55bca8f6e8ed 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,20 @@ struct fstrim_range {
>  	__u64 minlen;
>  };
>  
> +/* Maximum no of entries supported */
> +#define MAX_COPY_NR_RANGE	(1 << 12)
> +
> +/* maximum total copy length */
> +#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
> +
> +/* Source range entry for copy */
> +struct range_entry {
> +	__u64 src;
> +	__u64 dst;
> +	__u64 len;
> +	__u64 comp_len;
> +};
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
>  #define FILE_DEDUPE_RANGE_SAME		0
>  #define FILE_DEDUPE_RANGE_DIFFERS	1


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 02/10] block: Introduce queue limits for copy-offload support
  2022-02-08  7:01                   ` [dm-devel] " Damien Le Moal
@ 2022-02-08 18:43                     ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-08 18:43 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: mpatocka, javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	joshi.k, arnav.dawn, SelvaKumar S, nitheshshetty

[-- Attachment #1: Type: text/plain, Size: 7627 bytes --]

On Tue, Feb 08, 2022 at 04:01:23PM +0900, Damien Le Moal wrote:
> On 2/7/22 23:13, Nitesh Shetty wrote:
> > Add device limits as sysfs entries,
> >         - copy_offload (READ_WRITE)
> >         - max_copy_sectors (READ_ONLY)
>
> Why read-only ? With the name as you have it, it seems to be the soft
> control for the max size of copy operations rather than the actual
> device limit. So it would be better to align to other limits like max
> sectors/max_hw_sectors and have:
> 
> 	max_copy_sectors (RW)
> 	max_hw_copy_sectors (RO)
>

The idea was to keep the number of sysfs entries minimal.
We will add the R/W limits in the next series.
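Modeled on queue_max_sectors_store(), the RW entry would then clamp
against the RO hardware limit, roughly (max_hw_copy_sectors being the
proposed, not yet existing, field):

	if (max_copy_sectors > q->limits.max_hw_copy_sectors)
		return -EINVAL;

	q->limits.max_copy_sectors = max_copy_sectors;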

> >         - max_copy_ranges_sectors (READ_ONLY)
> >         - max_copy_nr_ranges (READ_ONLY)
> 
> Same for these.
> 
> > 
> > copy_offload(= 0), is disabled by default. This needs to be enabled if
> > copy-offload needs to be used.
> 
> How does this work ? This limit will be present for a DM device AND the
> underlying devices of the DM target. But "offload" applies only to the
> underlying devices, not the DM device...
> 
> Also, since this is not an underlying device limitation but an on/off
> switch, this should probably be moved to a request_queue boolean field
> or flag bit, controlled with sysfs.

copy_offload was used as a flag to switch between emulation and
offload in the block layer.
Yeah, it makes sense to use a request_queue flag for this; it will
also make the dm limit calculations simpler. Will change this in the
next series.
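With the queue flag, the offload check in blk-lib.c would reduce to
something like:

	static inline bool blk_check_copy_offload(struct request_queue *src_q,
						  struct request_queue *dst_q)
	{
		return blk_queue_copy(src_q) && blk_queue_copy(dst_q);
	}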

> > max_copy_sectors = 0, indicates the device doesn't support native copy.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > ---
> >  block/blk-settings.c   |  4 ++++
> >  block/blk-sysfs.c      | 51 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/blkdev.h | 12 ++++++++++
> >  3 files changed, 67 insertions(+)
> > 
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index b880c70e22e4..818454552cf8 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -57,6 +57,10 @@ void blk_set_default_limits(struct queue_limits *lim)
> >  	lim->misaligned = 0;
> >  	lim->zoned = BLK_ZONED_NONE;
> >  	lim->zone_write_granularity = 0;
> > +	lim->copy_offload = 0;
> > +	lim->max_copy_sectors = 0;
> > +	lim->max_copy_nr_ranges = 0;
> > +	lim->max_copy_range_sectors = 0;
> >  }
> >  EXPORT_SYMBOL(blk_set_default_limits);
> >  
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 9f32882ceb2f..dc68ae6b55c9 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -171,6 +171,48 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
> >  	return queue_var_show(q->limits.discard_granularity, page);
> >  }
> >  
> > +static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
> > +{
> > +	return queue_var_show(q->limits.copy_offload, page);
> > +}
> > +
> > +static ssize_t queue_copy_offload_store(struct request_queue *q,
> > +				       const char *page, size_t count)
> > +{
> > +	unsigned long copy_offload;
> > +	ssize_t ret = queue_var_store(&copy_offload, page, count);
> > +
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (copy_offload && q->limits.max_copy_sectors == 0)
> > +		return -EINVAL;
> > +
> > +	if (copy_offload)
> > +		q->limits.copy_offload = BLK_COPY_OFFLOAD;
> > +	else
> > +		q->limits.copy_offload = 0;
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page)
> > +{
> > +	return queue_var_show(q->limits.max_copy_sectors, page);
> > +}
> > +
> > +static ssize_t queue_max_copy_range_sectors_show(struct request_queue *q,
> > +		char *page)
> > +{
> > +	return queue_var_show(q->limits.max_copy_range_sectors, page);
> > +}
> > +
> > +static ssize_t queue_max_copy_nr_ranges_show(struct request_queue *q,
> > +		char *page)
> > +{
> > +	return queue_var_show(q->limits.max_copy_nr_ranges, page);
> > +}
> > +
> >  static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
> >  {
> >  
> > @@ -597,6 +639,11 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
> >  QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
> >  QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
> >  
> > +QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
> > +QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors");
> > +QUEUE_RO_ENTRY(queue_max_copy_range_sectors, "max_copy_range_sectors");
> > +QUEUE_RO_ENTRY(queue_max_copy_nr_ranges, "max_copy_nr_ranges");
> > +
> >  QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
> >  QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
> >  QUEUE_RW_ENTRY(queue_poll, "io_poll");
> > @@ -643,6 +690,10 @@ static struct attribute *queue_attrs[] = {
> >  	&queue_discard_max_entry.attr,
> >  	&queue_discard_max_hw_entry.attr,
> >  	&queue_discard_zeroes_data_entry.attr,
> > +	&queue_copy_offload_entry.attr,
> > +	&queue_max_copy_sectors_entry.attr,
> > +	&queue_max_copy_range_sectors_entry.attr,
> > +	&queue_max_copy_nr_ranges_entry.attr,
> >  	&queue_write_same_max_entry.attr,
> >  	&queue_write_zeroes_max_entry.attr,
> >  	&queue_zone_append_max_entry.attr,
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index efed3820cbf7..f63ae50f1de3 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -51,6 +51,12 @@ extern struct class block_class;
> >  /* Doing classic polling */
> >  #define BLK_MQ_POLL_CLASSIC -1
> >  
> > +/* Define copy offload options */
> > +enum blk_copy {
> > +	BLK_COPY_EMULATE = 0,
> > +	BLK_COPY_OFFLOAD,
> > +};
> > +
> >  /*
> >   * Maximum number of blkcg policies allowed to be registered concurrently.
> >   * Defined here to simplify include dependency.
> > @@ -253,6 +259,10 @@ struct queue_limits {
> >  	unsigned int		discard_granularity;
> >  	unsigned int		discard_alignment;
> >  	unsigned int		zone_write_granularity;
> > +	unsigned int            copy_offload;
> > +	unsigned int            max_copy_sectors;
> > +	unsigned short          max_copy_range_sectors;
> > +	unsigned short          max_copy_nr_ranges;
> >  
> >  	unsigned short		max_segments;
> >  	unsigned short		max_integrity_segments;
> > @@ -562,6 +572,7 @@ struct request_queue {
> >  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
> >  #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
> >  #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
> > +#define QUEUE_FLAG_COPY		30	/* supports copy offload */
> 
> Then what is the point of the max_copy_sectors limit ? You can test
> device support by checking max_copy_sectors != 0, no ? This flag
> duplicates that information.
> I would rather use the flag as the on/off switch for copy offload,
> removing the copy_offload limit.
> 

Same as above

> >  
> >  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
> >  				 (1 << QUEUE_FLAG_SAME_COMP) |		\
> > @@ -585,6 +596,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
> >  #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
> >  #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
> >  #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
> > +#define blk_queue_copy(q)	test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
> >  #define blk_queue_zone_resetall(q)	\
> >  	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
> >  #define blk_queue_secure_erase(q) \
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
> 

--
Thank you
Nitesh

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-09  7:48   ` Dan Carpenter
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-08 23:31 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 9491 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20220207141348.4235-4-nj.shetty@samsung.com>
References: <20220207141348.4235-4-nj.shetty@samsung.com>
TO: Nitesh Shetty <nj.shetty@samsung.com>
TO: mpatocka(a)redhat.com
CC: javier(a)javigon.com
CC: chaitanyak(a)nvidia.com
CC: linux-block(a)vger.kernel.org
CC: linux-scsi(a)vger.kernel.org
CC: dm-devel(a)redhat.com
CC: linux-nvme(a)lists.infradead.org
CC: linux-fsdevel(a)vger.kernel.org
CC: axboe(a)kernel.dk
CC: msnitzer(a)redhat.com
CC: bvanassche(a)acm.org
CC: martin.petersen(a)oracle.com
CC: roland(a)purestorage.com
CC: hare(a)suse.de
CC: kbusch(a)kernel.org
CC: hch(a)lst.de
CC: Frederick.Knight(a)netapp.com
CC: zach.brown(a)ni.com
CC: osandov(a)fb.com
CC: lsf-pc(a)lists.linux-foundation.org
CC: djwong(a)kernel.org
CC: josef(a)toxicpanda.com
CC: clm(a)fb.com
CC: dsterba(a)suse.com
CC: tytso(a)mit.edu
CC: jack(a)suse.com
CC: joshi.k(a)samsung.com
CC: arnav.dawn(a)samsung.com
CC: nj.shetty(a)samsung.com
CC: SelvaKumar S <selvakuma.s1@samsung.com>

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
:::::: branch date: 32 hours ago
:::::: commit date: 32 hours ago
config: i386-randconfig-m021-20220207 (https://download.01.org/0day-ci/archive/20220209/202202090703.U5riBMIn-lkp(a)intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
block/blk-lib.c:272 blk_copy_offload() warn: possible memory leak of 'ctx'

vim +/ctx +272 block/blk-lib.c

12a9801a7301f1 Nitesh Shetty 2022-02-07  180  
12a9801a7301f1 Nitesh Shetty 2022-02-07  181  /*
12a9801a7301f1 Nitesh Shetty 2022-02-07  182   * blk_copy_offload	- Use device's native copy offload feature
12a9801a7301f1 Nitesh Shetty 2022-02-07  183   * Go through user provide payload, prepare new payload based on device's copy offload limits.
12a9801a7301f1 Nitesh Shetty 2022-02-07  184   */
12a9801a7301f1 Nitesh Shetty 2022-02-07  185  int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
12a9801a7301f1 Nitesh Shetty 2022-02-07  186  		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
12a9801a7301f1 Nitesh Shetty 2022-02-07  187  {
12a9801a7301f1 Nitesh Shetty 2022-02-07  188  	struct request_queue *sq = bdev_get_queue(src_bdev);
12a9801a7301f1 Nitesh Shetty 2022-02-07  189  	struct request_queue *dq = bdev_get_queue(dst_bdev);
12a9801a7301f1 Nitesh Shetty 2022-02-07  190  	struct bio *read_bio, *write_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  191  	struct copy_ctx *ctx;
12a9801a7301f1 Nitesh Shetty 2022-02-07  192  	struct cio *cio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  193  	struct page *token;
12a9801a7301f1 Nitesh Shetty 2022-02-07  194  	sector_t src_blk, copy_len, dst_blk;
12a9801a7301f1 Nitesh Shetty 2022-02-07  195  	sector_t remaining, max_copy_len = LONG_MAX;
12a9801a7301f1 Nitesh Shetty 2022-02-07  196  	int ri = 0, ret = 0;
12a9801a7301f1 Nitesh Shetty 2022-02-07  197  
12a9801a7301f1 Nitesh Shetty 2022-02-07  198  	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
12a9801a7301f1 Nitesh Shetty 2022-02-07  199  	if (!cio)
12a9801a7301f1 Nitesh Shetty 2022-02-07  200  		return -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  201  	atomic_set(&cio->refcount, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  202  	cio->rlist = rlist;
12a9801a7301f1 Nitesh Shetty 2022-02-07  203  
12a9801a7301f1 Nitesh Shetty 2022-02-07  204  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
12a9801a7301f1 Nitesh Shetty 2022-02-07  205  			(sector_t)dq->limits.max_copy_sectors);
12a9801a7301f1 Nitesh Shetty 2022-02-07  206  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
12a9801a7301f1 Nitesh Shetty 2022-02-07  207  			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  208  
12a9801a7301f1 Nitesh Shetty 2022-02-07  209  	for (ri = 0; ri < nr_srcs; ri++) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  210  		cio->rlist[ri].comp_len = rlist[ri].len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  211  		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
12a9801a7301f1 Nitesh Shetty 2022-02-07  212  			remaining > 0;
12a9801a7301f1 Nitesh Shetty 2022-02-07  213  			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  214  			copy_len = min(remaining, max_copy_len);
12a9801a7301f1 Nitesh Shetty 2022-02-07  215  
12a9801a7301f1 Nitesh Shetty 2022-02-07  216  			token = alloc_page(gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  217  			if (unlikely(!token)) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  218  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  219  				goto err_token;
12a9801a7301f1 Nitesh Shetty 2022-02-07  220  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  221  
12a9801a7301f1 Nitesh Shetty 2022-02-07  222  			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
12a9801a7301f1 Nitesh Shetty 2022-02-07  223  					gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  224  			if (!read_bio) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  225  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  226  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  227  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  228  			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  229  			read_bio->bi_iter.bi_size = copy_len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  230  			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  231  			ret = submit_bio_wait(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  232  			if (ret) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  233  				bio_put(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  234  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  235  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  236  			bio_put(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  237  			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  238  			if (!ctx) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  239  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  240  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  241  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  242  			ctx->cio = cio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  243  			ctx->range_idx = ri;
12a9801a7301f1 Nitesh Shetty 2022-02-07  244  			ctx->start_sec = rlist[ri].src;
12a9801a7301f1 Nitesh Shetty 2022-02-07  245  
12a9801a7301f1 Nitesh Shetty 2022-02-07  246  			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
12a9801a7301f1 Nitesh Shetty 2022-02-07  247  					gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  248  			if (!write_bio) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  249  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  250  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  251  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  252  
12a9801a7301f1 Nitesh Shetty 2022-02-07  253  			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  254  			write_bio->bi_iter.bi_size = copy_len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  255  			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  256  			write_bio->bi_end_io = bio_copy_end_io;
12a9801a7301f1 Nitesh Shetty 2022-02-07  257  			write_bio->bi_private = ctx;
12a9801a7301f1 Nitesh Shetty 2022-02-07  258  			atomic_inc(&cio->refcount);
12a9801a7301f1 Nitesh Shetty 2022-02-07  259  			submit_bio(write_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  260  		}
12a9801a7301f1 Nitesh Shetty 2022-02-07  261  	}
12a9801a7301f1 Nitesh Shetty 2022-02-07  262  
12a9801a7301f1 Nitesh Shetty 2022-02-07  263  	/* Wait for completion of all IO's*/
12a9801a7301f1 Nitesh Shetty 2022-02-07  264  	return cio_await_completion(cio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  265  
12a9801a7301f1 Nitesh Shetty 2022-02-07  266  err_read_bio:
12a9801a7301f1 Nitesh Shetty 2022-02-07  267  	__free_page(token);
12a9801a7301f1 Nitesh Shetty 2022-02-07  268  err_token:
12a9801a7301f1 Nitesh Shetty 2022-02-07  269  	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
12a9801a7301f1 Nitesh Shetty 2022-02-07  270  
12a9801a7301f1 Nitesh Shetty 2022-02-07  271  	cio->io_err = ret;
12a9801a7301f1 Nitesh Shetty 2022-02-07 @272  	return cio_await_completion(cio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  273  }
12a9801a7301f1 Nitesh Shetty 2022-02-07  274  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [dm-devel] [PATCH v2 04/10] block: Introduce a new ioctl for copy
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-09  3:39                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-09  3:39 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, roland, nj.shetty, zach.brown, chaitanyak,
	msnitzer, josef, linux-block, dsterba, kbusch, tytso,
	Frederick.Knight, axboe, kbuild-all, joshi.k, martin.petersen,
	arnav.dawn, jack, linux-fsdevel, lsf-pc

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20220208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-c001 (https://download.01.org/0day-ci/archive/20220209/202202091048.qDQvi6ab-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


cocci warnings: (new ones prefixed by >>)
>> block/ioctl.c:145:10-17: WARNING opportunity for memdup_user

vim +145 block/ioctl.c

   126	
   127	static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
   128			unsigned long arg)
   129	{
   130		struct copy_range crange, *ranges;
   131		size_t payload_size = 0;
   132		int ret;
   133	
   134		if (!(mode & FMODE_WRITE))
   135			return -EBADF;
   136	
   137		if (copy_from_user(&crange, (void __user *)arg, sizeof(crange)))
   138			return -EFAULT;
   139	
   140		if (unlikely(!crange.nr_range || crange.reserved || crange.nr_range >= MAX_COPY_NR_RANGE))
   141			return -EINVAL;
   142	
   143		payload_size = (crange.nr_range * sizeof(struct range_entry)) + sizeof(crange);
   144	
 > 145		ranges = kmalloc(payload_size, GFP_KERNEL);
   146		if (!ranges)
   147			return -ENOMEM;
   148	
   149		if (copy_from_user(ranges, (void __user *)arg, payload_size)) {
   150			ret = -EFAULT;
   151			goto out;
   152		}
   153	
   154		ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list, bdev, GFP_KERNEL, 0);
   155		if (copy_to_user((void __user *)arg, ranges, payload_size))
   156			ret = -EFAULT;
   157	out:
   158		kfree(ranges);
   159		return ret;
   160	}
   161	
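
For reference, the conversion coccinelle is pointing at would look
roughly like this (untested sketch; memdup_user() folds the kmalloc() +
copy_from_user() pair and returns an ERR_PTR on failure):

	ranges = memdup_user((void __user *)arg, payload_size);
	if (IS_ERR(ranges))
		return PTR_ERR(ranges);

	ret = blkdev_issue_copy(bdev, ranges->nr_range, ranges->range_list,
				bdev, GFP_KERNEL, 0);
	if (copy_to_user((void __user *)arg, ranges, payload_size))
		ret = -EFAULT;

	kfree(ranges);
	return ret;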

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
@ 2022-02-09  7:48   ` Dan Carpenter
  0 siblings, 0 replies; 272+ messages in thread
From: Dan Carpenter @ 2022-02-09  7:48 UTC (permalink / raw)
  To: kbuild, Nitesh Shetty, mpatocka
  Cc: lkp, kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

Hi Nitesh,

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-m021-20220207 (https://download.01.org/0day-ci/archive/20220209/202202090703.U5riBMIn-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
block/blk-lib.c:272 blk_copy_offload() warn: possible memory leak of 'ctx'

vim +/ctx +272 block/blk-lib.c

12a9801a7301f1 Nitesh Shetty 2022-02-07  185  int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
12a9801a7301f1 Nitesh Shetty 2022-02-07  186  		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
12a9801a7301f1 Nitesh Shetty 2022-02-07  187  {
12a9801a7301f1 Nitesh Shetty 2022-02-07  188  	struct request_queue *sq = bdev_get_queue(src_bdev);
12a9801a7301f1 Nitesh Shetty 2022-02-07  189  	struct request_queue *dq = bdev_get_queue(dst_bdev);
12a9801a7301f1 Nitesh Shetty 2022-02-07  190  	struct bio *read_bio, *write_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  191  	struct copy_ctx *ctx;
12a9801a7301f1 Nitesh Shetty 2022-02-07  192  	struct cio *cio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  193  	struct page *token;
12a9801a7301f1 Nitesh Shetty 2022-02-07  194  	sector_t src_blk, copy_len, dst_blk;
12a9801a7301f1 Nitesh Shetty 2022-02-07  195  	sector_t remaining, max_copy_len = LONG_MAX;
12a9801a7301f1 Nitesh Shetty 2022-02-07  196  	int ri = 0, ret = 0;
12a9801a7301f1 Nitesh Shetty 2022-02-07  197  
12a9801a7301f1 Nitesh Shetty 2022-02-07  198  	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
12a9801a7301f1 Nitesh Shetty 2022-02-07  199  	if (!cio)
12a9801a7301f1 Nitesh Shetty 2022-02-07  200  		return -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  201  	atomic_set(&cio->refcount, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  202  	cio->rlist = rlist;
12a9801a7301f1 Nitesh Shetty 2022-02-07  203  
12a9801a7301f1 Nitesh Shetty 2022-02-07  204  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
12a9801a7301f1 Nitesh Shetty 2022-02-07  205  			(sector_t)dq->limits.max_copy_sectors);
12a9801a7301f1 Nitesh Shetty 2022-02-07  206  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
12a9801a7301f1 Nitesh Shetty 2022-02-07  207  			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  208  
12a9801a7301f1 Nitesh Shetty 2022-02-07  209  	for (ri = 0; ri < nr_srcs; ri++) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  210  		cio->rlist[ri].comp_len = rlist[ri].len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  211  		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
12a9801a7301f1 Nitesh Shetty 2022-02-07  212  			remaining > 0;
12a9801a7301f1 Nitesh Shetty 2022-02-07  213  			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  214  			copy_len = min(remaining, max_copy_len);
12a9801a7301f1 Nitesh Shetty 2022-02-07  215  
12a9801a7301f1 Nitesh Shetty 2022-02-07  216  			token = alloc_page(gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  217  			if (unlikely(!token)) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  218  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  219  				goto err_token;
12a9801a7301f1 Nitesh Shetty 2022-02-07  220  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  221  
12a9801a7301f1 Nitesh Shetty 2022-02-07  222  			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
12a9801a7301f1 Nitesh Shetty 2022-02-07  223  					gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  224  			if (!read_bio) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  225  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  226  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  227  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  228  			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  229  			read_bio->bi_iter.bi_size = copy_len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  230  			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  231  			ret = submit_bio_wait(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  232  			if (ret) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  233  				bio_put(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  234  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  235  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  236  			bio_put(read_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  237  			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  238  			if (!ctx) {
12a9801a7301f1 Nitesh Shetty 2022-02-07  239  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  240  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  241  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  242  			ctx->cio = cio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  243  			ctx->range_idx = ri;
12a9801a7301f1 Nitesh Shetty 2022-02-07  244  			ctx->start_sec = rlist[ri].src;
12a9801a7301f1 Nitesh Shetty 2022-02-07  245  
12a9801a7301f1 Nitesh Shetty 2022-02-07  246  			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
12a9801a7301f1 Nitesh Shetty 2022-02-07  247  					gfp_mask);
12a9801a7301f1 Nitesh Shetty 2022-02-07  248  			if (!write_bio) {

Please call kfree(ctx) before the goto.

12a9801a7301f1 Nitesh Shetty 2022-02-07  249  				ret = -ENOMEM;
12a9801a7301f1 Nitesh Shetty 2022-02-07  250  				goto err_read_bio;
12a9801a7301f1 Nitesh Shetty 2022-02-07  251  			}
12a9801a7301f1 Nitesh Shetty 2022-02-07  252  
12a9801a7301f1 Nitesh Shetty 2022-02-07  253  			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
12a9801a7301f1 Nitesh Shetty 2022-02-07  254  			write_bio->bi_iter.bi_size = copy_len;
12a9801a7301f1 Nitesh Shetty 2022-02-07  255  			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
12a9801a7301f1 Nitesh Shetty 2022-02-07  256  			write_bio->bi_end_io = bio_copy_end_io;
12a9801a7301f1 Nitesh Shetty 2022-02-07  257  			write_bio->bi_private = ctx;
12a9801a7301f1 Nitesh Shetty 2022-02-07  258  			atomic_inc(&cio->refcount);
12a9801a7301f1 Nitesh Shetty 2022-02-07  259  			submit_bio(write_bio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  260  		}
12a9801a7301f1 Nitesh Shetty 2022-02-07  261  	}
12a9801a7301f1 Nitesh Shetty 2022-02-07  262  
12a9801a7301f1 Nitesh Shetty 2022-02-07  263  	/* Wait for completion of all IO's*/
12a9801a7301f1 Nitesh Shetty 2022-02-07  264  	return cio_await_completion(cio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  265  
12a9801a7301f1 Nitesh Shetty 2022-02-07  266  err_read_bio:
12a9801a7301f1 Nitesh Shetty 2022-02-07  267  	__free_page(token);
12a9801a7301f1 Nitesh Shetty 2022-02-07  268  err_token:
12a9801a7301f1 Nitesh Shetty 2022-02-07  269  	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
12a9801a7301f1 Nitesh Shetty 2022-02-07  270  
12a9801a7301f1 Nitesh Shetty 2022-02-07  271  	cio->io_err = ret;
12a9801a7301f1 Nitesh Shetty 2022-02-07 @272  	return cio_await_completion(cio);
12a9801a7301f1 Nitesh Shetty 2022-02-07  273  }
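
Untested sketch of the unwind being asked for above -- a dedicated label
that frees ctx and then falls through to the existing cleanup:

			if (!write_bio) {
				ret = -ENOMEM;
				goto err_write_bio;
			}
	...
err_write_bio:
	kfree(ctx);		/* ctx is only live on this path */
err_read_bio:
	__free_page(token);
err_token:
	...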

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-08  7:21                   ` [dm-devel] " Damien Le Moal
@ 2022-02-09 10:22                     ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-09 10:22 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: mpatocka, javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, hare, kbusch, hch, Frederick.Knight, osandov,
	djwong, josef, clm, dsterba, tytso, jack, joshi.k, arnav.dawn,
	SelvaKumar S

[-- Attachment #1: Type: text/plain, Size: 13602 bytes --]

On Tue, Feb 08, 2022 at 04:21:19PM +0900, Damien Le Moal wrote:
> On 2/7/22 23:13, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and a array of (source, destination and copy length) tuples.
> 
> s/a/an
>

acked

> > Introduce REQ_COP copy offload operation flag. Create a read-write
> 
> REQ_COPY ?
>

acked

> > bio pair with a token as payload and submitted to the device in order.
> > the read request populates token with source specific information which
> > is then passed with write request.
> > Ths design is courtsey Mikulas Patocka<mpatocka@>'s token based copy
> 
> s/Ths design is courtsey/This design is courtesy of
>

acked

> > 
> > Larger copy operation may be divided if necessary by looking at device
> > limits.
> 
> may or will ?
> by looking at -> depending on the ?
> 

Larger copies will be divided based on the max_copy_sectors and
max_copy_range_sectors limits. Will add this in the next series.

> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  block/blk-lib.c           | 216 ++++++++++++++++++++++++++++++++++++++
> >  block/blk-settings.c      |   2 +
> >  block/blk.h               |   2 +
> >  include/linux/blk_types.h |  20 ++++
> >  include/linux/blkdev.h    |   3 +
> >  include/uapi/linux/fs.h   |  14 +++
> >  6 files changed, 257 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 1b8ced45e4e5..3ae2c27b566e 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -135,6 +135,222 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_discard);
> >  
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +
> > +	while (atomic_read(&cio->refcount)) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> 
> Why is this in the loop ?
>

agreed.
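
For the record, the loop could then read the error once after waiting,
roughly:

	while (atomic_read(&cio->refcount)) {
		cio->waiter = current;
		__set_current_state(TASK_UNINTERRUPTIBLE);
		blk_io_schedule();
		/* wake up sets us TASK_RUNNING */
		cio->waiter = NULL;
	}
	ret = cio->io_err;
	kvfree(cio);

	return ret;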

> > +	}
> > +	kvfree(cio);
> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
> > +		wake_up_process(cio->waiter);
> 
> This looks racy: the cio->waiter test and wakeup are not atomic.

agreed, will drop the atomic refcount and do the waiter check and wakeup
under a lock in the next version.
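
Rough idea of the locked completion side (assumes refcount becomes a
plain int and a spinlock_t lock is added to struct cio, with the waiter
publishing and clearing cio->waiter under the same lock before
sleeping):

	struct task_struct *waiter = NULL;
	unsigned long flags;

	spin_lock_irqsave(&cio->lock, flags);
	if (!--cio->refcount)
		waiter = cio->waiter;
	spin_unlock_irqrestore(&cio->lock, flags);

	if (waiter)
		wake_up_process(waiter);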

> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	atomic_set(&cio->refcount, 0);
> > +	cio->rlist = rlist;
> > +
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
> > +			(sector_t)dq->limits.max_copy_sectors);
> 
> sq->limits.max_copy_sectors is already by definition smaller than
> LONG_MAX, so there is no need for the min3 here.
>

acked
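
i.e. the first statement could collapse to:

	max_copy_len = min((sector_t)sq->limits.max_copy_sectors,
			   (sector_t)dq->limits.max_copy_sectors);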

> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;> +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
> > +			remaining > 0;
> > +			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
> 
> This is unreadable.
> 

Sure, I will simplify the loops in the next version.
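
e.g. (untested) hoisting the cursors out of the for-statement:

	for (ri = 0; ri < nr_srcs; ri++) {
		sector_t src_blk = rlist[ri].src;
		sector_t dst_blk = rlist[ri].dst;
		sector_t remaining = rlist[ri].len;

		cio->rlist[ri].comp_len = rlist[ri].len;
		while (remaining > 0) {
			copy_len = min(remaining, max_copy_len);
			/* ... issue the read/write pair ... */
			remaining -= copy_len;
			src_blk += copy_len;
			dst_blk += copy_len;
		}
	}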

> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			ret = submit_bio_wait(read_bio);
> > +			if (ret) {
> > +				bio_put(read_bio);
> > +				goto err_read_bio;
> > +			}
> > +			bio_put(read_bio);
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> 
> This should be done before the read.
>

acked.
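
i.e. allocate ctx right after the token, before any I/O is issued (the
later failure paths then need a matching kfree(ctx)):

	ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
	if (!ctx) {
		ret = -ENOMEM;
		goto err_read_bio;	/* only the token to free here */
	}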

> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = rlist[ri].src;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +			atomic_inc(&cio->refcount);
> > +			submit_bio(write_bio);
> > +		}
> > +	}
> > +
> > +	/* Wait for completion of all IO's*/
> > +	return cio_await_completion(cio);
> > +
> > +err_read_bio:
> > +	__free_page(token);
> > +err_token:
> > +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> > +
> > +	cio->io_err = ret;
> > +	return cio_await_completion(cio);
> > +}
> > +
> > +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> > +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> > +{
> > +	unsigned int align_mask = max(
> > +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> > +	sector_t len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		if (rlist[i].len)
> > +			len += rlist[i].len;
> > +		else
> > +			return -EINVAL;
> > +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> > +				(rlist[i].len & align_mask))
> > +			return -EINVAL;
> > +		rlist[i].comp_len = 0;
> > +	}
> > +
> > +	if (!len && len >= MAX_COPY_TOTAL_LENGTH)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> > +		struct request_queue *dest_q)
> > +{
> > +	if (dest_q->limits.copy_offload == BLK_COPY_OFFLOAD &&
> > +			src_q->limits.copy_offload == BLK_COPY_OFFLOAD)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * blkdev_issue_copy - queue a copy
> > + * @src_bdev:	source block device
> > + * @nr_srcs:	number of source ranges to copy
> > + * @src_rlist:	array of source ranges
> > + * @dest_bdev:	destination block device
> > + * @gfp_mask:   memory allocation flags (for bio_alloc)
> > + * @flags:	BLKDEV_COPY_* flags to control behaviour
> > + *
> > + * Description:
> > + *	Copy source ranges from source block device to destination block device.
> > + *	length of a source range cannot be zero.
> > + */
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> > +		struct range_entry *rlist, struct block_device *dest_bdev,
> > +		gfp_t gfp_mask, int flags)
> > +{
> > +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> > +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> > +	int ret = -EINVAL;
> > +
> > +	if (!src_q || !dest_q)
> > +		return -ENXIO;
> > +
> > +	if (!nr)
> > +		return -EINVAL;
> > +
> > +	if (nr >= MAX_COPY_NR_RANGE)
> > +		return -EINVAL;
> > +
> > +	if (bdev_read_only(dest_bdev))
> > +		return -EPERM;
> > +
> > +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (blk_check_copy_offload(src_q, dest_q))
> > +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(blkdev_issue_copy);
> > +
> >  /**
> >   * __blkdev_issue_write_same - generate number of bios with same page
> >   * @bdev:	target blockdev
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 818454552cf8..4c8d48b8af25 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -545,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  	t->max_segment_size = min_not_zero(t->max_segment_size,
> >  					   b->max_segment_size);
> >  
> > +	t->max_copy_sectors = min_not_zero(t->max_copy_sectors, b->max_copy_sectors);
> 
> Why min_not_zero ? If one of the underlying drive does not support copy
> offload, you cannot report that the top drive does.
>

agreed. Will update this in the next series.
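
i.e. a plain min(), so a zero (no copy offload support) in any bottom
device zeroes out the stacked limit:

	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);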

> > +
> >  	t->misaligned |= b->misaligned;
> >  
> >  	alignment = queue_limit_alignment_offset(b, start);
> > diff --git a/block/blk.h b/block/blk.h
> > index abb663a2a147..94d2b055750b 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
> >  		break;
> >  	}
> >  
> > +	if (unlikely(op_is_copy(bio->bi_opf)))
> > +		return false;
> >  	/*
> >  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
> >  	 * This is a quick and dirty check that relies on the fact that
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 5561e58d158a..0a3fee8ad61c 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -418,6 +418,7 @@ enum req_flag_bits {
> >  	/* for driver use */
> >  	__REQ_DRV,
> >  	__REQ_SWAP,		/* swapping request. */
> > +	__REQ_COPY,		/* copy request*/
> >  	__REQ_NR_BITS,		/* stops here */
> >  };
> >  
> > @@ -442,6 +443,7 @@ enum req_flag_bits {
> >  
> >  #define REQ_DRV			(1ULL << __REQ_DRV)
> >  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> > +#define REQ_COPY		(1ULL << __REQ_COPY)
> >  
> >  #define REQ_FAILFAST_MASK \
> >  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> > @@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op)
> >  	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
> >  }
> >  
> > +static inline bool op_is_copy(unsigned int op)
> > +{
> > +	return (op & REQ_COPY);
> > +}
> > +
> >  /*
> >   * Check if a bio or request operation is a zone management operation, with
> >   * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
> > @@ -532,4 +539,17 @@ struct blk_rq_stat {
> >  	u64 batch;
> >  };
> >  
> > +struct cio {
> > +	atomic_t refcount;
> > +	blk_status_t io_err;
> > +	struct range_entry *rlist;
> > +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> > +};
> > +
> > +struct copy_ctx {
> > +	int range_idx;
> > +	sector_t start_sec;
> > +	struct cio *cio;
> > +};
> > +
> >  #endif /* __LINUX_BLK_TYPES_H */
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index f63ae50f1de3..15597488040c 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -1120,6 +1120,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  		struct bio **biop);
> >  struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
> >  		gfp_t gfp_mask);
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *src_rlist, struct block_device *dest_bdev,
> > +		gfp_t gfp_mask, int flags);
> >  
> >  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index bdf7b404b3e7..55bca8f6e8ed 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -64,6 +64,20 @@ struct fstrim_range {
> >  	__u64 minlen;
> >  };
> >  
> > +/* Maximum no of entries supported */
> > +#define MAX_COPY_NR_RANGE	(1 << 12)
> > +
> > +/* maximum total copy length */
> > +#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
> > +
> > +/* Source range entry for copy */
> > +struct range_entry {
> > +	__u64 src;
> > +	__u64 dst;
> > +	__u64 len;
> > +	__u64 comp_len;
> > +};
> > +
> >  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
> >  #define FILE_DEDUPE_RANGE_SAME		0
> >  #define FILE_DEDUPE_RANGE_DIFFERS	1
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
>

 -- 
Thank you
Nitesh


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [dm-devel] [PATCH v2 03/10] block: Add copy offload support infrastructure
@ 2022-02-09 10:22                     ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-09 10:22 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: djwong, linux-nvme, clm, dm-devel, osandov, javier, bvanassche,
	linux-scsi, hch, chaitanyak, SelvaKumar S, msnitzer, josef,
	linux-block, mpatocka, dsterba, Frederick.Knight, axboe, tytso,
	joshi.k, martin.petersen, arnav.dawn, kbusch, jack,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 13602 bytes --]

O Tue, Feb 08, 2022 at 04:21:19PM +0900, Damien Le Moal wrote:
> On 2/7/22 23:13, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and a array of (source, destination and copy length) tuples.
> 
> s/a/an
>

acked

> > Introduce REQ_COP copy offload operation flag. Create a read-write
> 
> REQ_COPY ?
>

acked

> > bio pair with a token as payload and submitted to the device in order.
> > the read request populates token with source specific information which
> > is then passed with write request.
> > Ths design is courtsey Mikulas Patocka<mpatocka@>'s token based copy
> 
> s/Ths design is courtsey/This design is courtesy of
>

acked

> > 
> > Larger copy operation may be divided if necessary by looking at device
> > limits.
> 
> may or will ?
> by looking at -> depending on the ?
> 

Larger copy will be divided, based on max_copy_sectors,max_copy_range_sector
limits. Will add in next series.

> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> > Signed-off-by: Arnav Dawn <arnav.dawn@samsung.com>
> > ---
> >  block/blk-lib.c           | 216 ++++++++++++++++++++++++++++++++++++++
> >  block/blk-settings.c      |   2 +
> >  block/blk.h               |   2 +
> >  include/linux/blk_types.h |  20 ++++
> >  include/linux/blkdev.h    |   3 +
> >  include/uapi/linux/fs.h   |  14 +++
> >  6 files changed, 257 insertions(+)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 1b8ced45e4e5..3ae2c27b566e 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -135,6 +135,222 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_discard);
> >  
> > +/*
> > + * Wait on and process all in-flight BIOs.  This must only be called once
> > + * all bios have been issued so that the refcount can only decrease.
> > + * This just waits for all bios to make it through bio_copy_end_io. IO
> > + * errors are propagated through cio->io_error.
> > + */
> > +static int cio_await_completion(struct cio *cio)
> > +{
> > +	int ret = 0;
> > +
> > +	while (atomic_read(&cio->refcount)) {
> > +		cio->waiter = current;
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		blk_io_schedule();
> > +		/* wake up sets us TASK_RUNNING */
> > +		cio->waiter = NULL;
> > +		ret = cio->io_err;
> 
> Why is this in the loop ?
>

agree.

> > +	}
> > +	kvfree(cio);
> > +
> > +	return ret;
> > +}
> > +
> > +static void bio_copy_end_io(struct bio *bio)
> > +{
> > +	struct copy_ctx *ctx = bio->bi_private;
> > +	struct cio *cio = ctx->cio;
> > +	sector_t clen;
> > +	int ri = ctx->range_idx;
> > +
> > +	if (bio->bi_status) {
> > +		cio->io_err = bio->bi_status;
> > +		clen = (bio->bi_iter.bi_sector - ctx->start_sec) << SECTOR_SHIFT;
> > +		cio->rlist[ri].comp_len = min_t(sector_t, clen, cio->rlist[ri].comp_len);
> > +	}
> > +	__free_page(bio->bi_io_vec[0].bv_page);
> > +	kfree(ctx);
> > +	bio_put(bio);
> > +
> > +	if (atomic_dec_and_test(&cio->refcount) && cio->waiter)
> > +		wake_up_process(cio->waiter);
> 
> This looks racy: the cio->waiter test and wakeup are not atomic.

agreed, will remove atomic for refcount and add if check and wakeup in locks
in next version.

> > +}
> > +
> > +/*
> > + * blk_copy_offload	- Use device's native copy offload feature
> > + * Go through user provide payload, prepare new payload based on device's copy offload limits.
> > + */
> > +int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> > +{
> > +	struct request_queue *sq = bdev_get_queue(src_bdev);
> > +	struct request_queue *dq = bdev_get_queue(dst_bdev);
> > +	struct bio *read_bio, *write_bio;
> > +	struct copy_ctx *ctx;
> > +	struct cio *cio;
> > +	struct page *token;
> > +	sector_t src_blk, copy_len, dst_blk;
> > +	sector_t remaining, max_copy_len = LONG_MAX;
> > +	int ri = 0, ret = 0;
> > +
> > +	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> > +	if (!cio)
> > +		return -ENOMEM;
> > +	atomic_set(&cio->refcount, 0);
> > +	cio->rlist = rlist;
> > +
> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
> > +			(sector_t)dq->limits.max_copy_sectors);
> 
> sq->limits.max_copy_sectors is already by definition smaller than
> LONG_MAX, so there is no need for the min3 here.
>

acked

> > +	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> > +			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;> +
> > +	for (ri = 0; ri < nr_srcs; ri++) {
> > +		cio->rlist[ri].comp_len = rlist[ri].len;
> > +		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
> > +			remaining > 0;
> > +			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
> 
> This is unreadable.
> 

Sure, I will simplify the loops in next version.

> > +			copy_len = min(remaining, max_copy_len);
> > +
> > +			token = alloc_page(gfp_mask);
> > +			if (unlikely(!token)) {
> > +				ret = -ENOMEM;
> > +				goto err_token;
> > +			}
> > +
> > +			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!read_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> > +			read_bio->bi_iter.bi_size = copy_len;
> > +			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> > +			ret = submit_bio_wait(read_bio);
> > +			if (ret) {
> > +				bio_put(read_bio);
> > +				goto err_read_bio;
> > +			}
> > +			bio_put(read_bio);
> > +			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> > +			if (!ctx) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> 
> This should be done before the read.
>

acked.

> > +			ctx->cio = cio;
> > +			ctx->range_idx = ri;
> > +			ctx->start_sec = rlist[ri].src;
> > +
> > +			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> > +					gfp_mask);
> > +			if (!write_bio) {
> > +				ret = -ENOMEM;
> > +				goto err_read_bio;
> > +			}
> > +
> > +			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> > +			write_bio->bi_iter.bi_size = copy_len;
> > +			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> > +			write_bio->bi_end_io = bio_copy_end_io;
> > +			write_bio->bi_private = ctx;
> > +			atomic_inc(&cio->refcount);
> > +			submit_bio(write_bio);
> > +		}
> > +	}
> > +
> > +	/* Wait for completion of all IO's*/
> > +	return cio_await_completion(cio);
> > +
> > +err_read_bio:
> > +	__free_page(token);
> > +err_token:
> > +	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> > +
> > +	cio->io_err = ret;
> > +	return cio_await_completion(cio);
> > +}
> > +
> > +static inline int blk_copy_sanity_check(struct block_device *src_bdev,
> > +		struct block_device *dst_bdev, struct range_entry *rlist, int nr)
> > +{
> > +	unsigned int align_mask = max(
> > +			bdev_logical_block_size(dst_bdev), bdev_logical_block_size(src_bdev)) - 1;
> > +	sector_t len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		if (rlist[i].len)
> > +			len += rlist[i].len;
> > +		else
> > +			return -EINVAL;
> > +		if ((rlist[i].dst & align_mask) || (rlist[i].src & align_mask) ||
> > +				(rlist[i].len & align_mask))
> > +			return -EINVAL;
> > +		rlist[i].comp_len = 0;
> > +	}
> > +
> > +	if (!len && len >= MAX_COPY_TOTAL_LENGTH)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static inline bool blk_check_copy_offload(struct request_queue *src_q,
> > +		struct request_queue *dest_q)
> > +{
> > +	if (dest_q->limits.copy_offload == BLK_COPY_OFFLOAD &&
> > +			src_q->limits.copy_offload == BLK_COPY_OFFLOAD)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * blkdev_issue_copy - queue a copy
> > + * @src_bdev:	source block device
> > + * @nr_srcs:	number of source ranges to copy
> > + * @src_rlist:	array of source ranges
> > + * @dest_bdev:	destination block device
> > + * @gfp_mask:   memory allocation flags (for bio_alloc)
> > + * @flags:	BLKDEV_COPY_* flags to control behaviour
> > + *
> > + * Description:
> > + *	Copy source ranges from source block device to destination block device.
> > + *	length of a source range cannot be zero.
> > + */
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> > +		struct range_entry *rlist, struct block_device *dest_bdev,
> > +		gfp_t gfp_mask, int flags)
> > +{
> > +	struct request_queue *src_q = bdev_get_queue(src_bdev);
> > +	struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> > +	int ret = -EINVAL;
> > +
> > +	if (!src_q || !dest_q)
> > +		return -ENXIO;
> > +
> > +	if (!nr)
> > +		return -EINVAL;
> > +
> > +	if (nr >= MAX_COPY_NR_RANGE)
> > +		return -EINVAL;
> > +
> > +	if (bdev_read_only(dest_bdev))
> > +		return -EPERM;
> > +
> > +	ret = blk_copy_sanity_check(src_bdev, dest_bdev, rlist, nr);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (blk_check_copy_offload(src_q, dest_q))
> > +		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(blkdev_issue_copy);
> > +
> >  /**
> >   * __blkdev_issue_write_same - generate number of bios with same page
> >   * @bdev:	target blockdev
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 818454552cf8..4c8d48b8af25 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -545,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  	t->max_segment_size = min_not_zero(t->max_segment_size,
> >  					   b->max_segment_size);
> >  
> > +	t->max_copy_sectors = min_not_zero(t->max_copy_sectors, b->max_copy_sectors);
> 
> Why min_not_zero ? If one of the underlying drive does not support copy
> offload, you cannot report that the top drive does.
>

Agreed, I will update this in the next series.
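
Concretely, the stacking will use a plain min() so that a zero limit (copy
offload unsupported) in any component device propagates to the stacked
device, something like:

	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);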

> > +
> >  	t->misaligned |= b->misaligned;
> >  
> >  	alignment = queue_limit_alignment_offset(b, start);
> > diff --git a/block/blk.h b/block/blk.h
> > index abb663a2a147..94d2b055750b 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -292,6 +292,8 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
> >  		break;
> >  	}
> >  
> > +	if (unlikely(op_is_copy(bio->bi_opf)))
> > +		return false;
> >  	/*
> >  	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
> >  	 * This is a quick and dirty check that relies on the fact that
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 5561e58d158a..0a3fee8ad61c 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -418,6 +418,7 @@ enum req_flag_bits {
> >  	/* for driver use */
> >  	__REQ_DRV,
> >  	__REQ_SWAP,		/* swapping request. */
> > +	__REQ_COPY,		/* copy request*/
> >  	__REQ_NR_BITS,		/* stops here */
> >  };
> >  
> > @@ -442,6 +443,7 @@ enum req_flag_bits {
> >  
> >  #define REQ_DRV			(1ULL << __REQ_DRV)
> >  #define REQ_SWAP		(1ULL << __REQ_SWAP)
> > +#define REQ_COPY		(1ULL << __REQ_COPY)
> >  
> >  #define REQ_FAILFAST_MASK \
> >  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> > @@ -498,6 +500,11 @@ static inline bool op_is_discard(unsigned int op)
> >  	return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
> >  }
> >  
> > +static inline bool op_is_copy(unsigned int op)
> > +{
> > +	return (op & REQ_COPY);
> > +}
> > +
> >  /*
> >   * Check if a bio or request operation is a zone management operation, with
> >   * the exception of REQ_OP_ZONE_RESET_ALL which is treated as a special case
> > @@ -532,4 +539,17 @@ struct blk_rq_stat {
> >  	u64 batch;
> >  };
> >  
> > +struct cio {
> > +	atomic_t refcount;
> > +	blk_status_t io_err;
> > +	struct range_entry *rlist;
> > +	struct task_struct *waiter;     /* waiting task (NULL if none) */
> > +};
> > +
> > +struct copy_ctx {
> > +	int range_idx;
> > +	sector_t start_sec;
> > +	struct cio *cio;
> > +};
> > +
> >  #endif /* __LINUX_BLK_TYPES_H */
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index f63ae50f1de3..15597488040c 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -1120,6 +1120,9 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  		struct bio **biop);
> >  struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
> >  		gfp_t gfp_mask);
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> > +		struct range_entry *src_rlist, struct block_device *dest_bdev,
> > +		gfp_t gfp_mask, int flags);
> >  
> >  #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
> >  #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index bdf7b404b3e7..55bca8f6e8ed 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -64,6 +64,20 @@ struct fstrim_range {
> >  	__u64 minlen;
> >  };
> >  
> > +/* Maximum no of entries supported */
> > +#define MAX_COPY_NR_RANGE	(1 << 12)
> > +
> > +/* maximum total copy length */
> > +#define MAX_COPY_TOTAL_LENGTH	(1 << 21)
> > +
> > +/* Source range entry for copy */
> > +struct range_entry {
> > +	__u64 src;
> > +	__u64 dst;
> > +	__u64 len;
> > +	__u64 comp_len;
> > +};
> > +
> >  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
> >  #define FILE_DEDUPE_RANGE_SAME		0
> >  #define FILE_DEDUPE_RANGE_DIFFERS	1
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
>

 -- 
Thank you
Nitesh


* Re: [PATCH v2 03/10] block: Add copy offload support infrastructure
  2022-02-09  7:48   ` Dan Carpenter
  (?)
@ 2022-02-09 10:32     ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-09 10:32 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: kbuild, mpatocka, kbuild-all, javier, chaitanyak, linux-block,
	linux-scsi, dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer,
	bvanassche, martin.petersen, hare, kbusch, hch, Frederick.Knight,
	osandov, lsf-pc, djwong, josef, clm, dsterba, tytso, jack,
	joshi.k, arnav.dawn, SelvaKumar S, nitheshshetty


On Wed, Feb 09, 2022 at 10:48:44AM +0300, Dan Carpenter wrote:
> Hi Nitesh,
> 
> url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
> config: i386-randconfig-m021-20220207 (https://download.01.org/0day-ci/archive/20220209/202202090703.U5riBMIn-lkp@intel.com/config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> smatch warnings:
> block/blk-lib.c:272 blk_copy_offload() warn: possible memory leak of 'ctx'
> 
> vim +/ctx +272 block/blk-lib.c
>

acked
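
The fix is what the report suggests: free the context on the write bio
allocation failure path (a sketch, labels as in the code below):

	write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
			gfp_mask);
	if (!write_bio) {
		ret = -ENOMEM;
		kfree(ctx);
		goto err_read_bio;
	}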

> 12a9801a7301f1 Nitesh Shetty 2022-02-07  185  int blk_copy_offload(struct block_device *src_bdev, int nr_srcs,
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  186  		struct range_entry *rlist, struct block_device *dst_bdev, gfp_t gfp_mask)
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  187  {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  188  	struct request_queue *sq = bdev_get_queue(src_bdev);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  189  	struct request_queue *dq = bdev_get_queue(dst_bdev);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  190  	struct bio *read_bio, *write_bio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  191  	struct copy_ctx *ctx;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  192  	struct cio *cio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  193  	struct page *token;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  194  	sector_t src_blk, copy_len, dst_blk;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  195  	sector_t remaining, max_copy_len = LONG_MAX;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  196  	int ri = 0, ret = 0;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  197  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  198  	cio = kzalloc(sizeof(struct cio), GFP_KERNEL);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  199  	if (!cio)
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  200  		return -ENOMEM;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  201  	atomic_set(&cio->refcount, 0);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  202  	cio->rlist = rlist;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  203  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  204  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_sectors,
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  205  			(sector_t)dq->limits.max_copy_sectors);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  206  	max_copy_len = min3(max_copy_len, (sector_t)sq->limits.max_copy_range_sectors,
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  207  			(sector_t)dq->limits.max_copy_range_sectors) << SECTOR_SHIFT;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  208  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  209  	for (ri = 0; ri < nr_srcs; ri++) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  210  		cio->rlist[ri].comp_len = rlist[ri].len;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  211  		for (remaining = rlist[ri].len, src_blk = rlist[ri].src, dst_blk = rlist[ri].dst;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  212  			remaining > 0;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  213  			remaining -= copy_len, src_blk += copy_len, dst_blk += copy_len) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  214  			copy_len = min(remaining, max_copy_len);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  215  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  216  			token = alloc_page(gfp_mask);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  217  			if (unlikely(!token)) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  218  				ret = -ENOMEM;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  219  				goto err_token;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  220  			}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  221  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  222  			read_bio = bio_alloc(src_bdev, 1, REQ_OP_READ | REQ_COPY | REQ_NOMERGE,
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  223  					gfp_mask);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  224  			if (!read_bio) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  225  				ret = -ENOMEM;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  226  				goto err_read_bio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  227  			}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  228  			read_bio->bi_iter.bi_sector = src_blk >> SECTOR_SHIFT;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  229  			read_bio->bi_iter.bi_size = copy_len;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  230  			__bio_add_page(read_bio, token, PAGE_SIZE, 0);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  231  			ret = submit_bio_wait(read_bio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  232  			if (ret) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  233  				bio_put(read_bio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  234  				goto err_read_bio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  235  			}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  236  			bio_put(read_bio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  237  			ctx = kzalloc(sizeof(struct copy_ctx), gfp_mask);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  238  			if (!ctx) {
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  239  				ret = -ENOMEM;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  240  				goto err_read_bio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  241  			}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  242  			ctx->cio = cio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  243  			ctx->range_idx = ri;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  244  			ctx->start_sec = rlist[ri].src;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  245  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  246  			write_bio = bio_alloc(dst_bdev, 1, REQ_OP_WRITE | REQ_COPY | REQ_NOMERGE,
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  247  					gfp_mask);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  248  			if (!write_bio) {
> 
> Please call kfree(ctx) before the goto.
> 
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  249  				ret = -ENOMEM;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  250  				goto err_read_bio;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  251  			}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  252  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  253  			write_bio->bi_iter.bi_sector = dst_blk >> SECTOR_SHIFT;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  254  			write_bio->bi_iter.bi_size = copy_len;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  255  			__bio_add_page(write_bio, token, PAGE_SIZE, 0);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  256  			write_bio->bi_end_io = bio_copy_end_io;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  257  			write_bio->bi_private = ctx;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  258  			atomic_inc(&cio->refcount);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  259  			submit_bio(write_bio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  260  		}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  261  	}
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  262  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  263  	/* Wait for completion of all IO's*/
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  264  	return cio_await_completion(cio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  265  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  266  err_read_bio:
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  267  	__free_page(token);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  268  err_token:
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  269  	rlist[ri].comp_len = min_t(sector_t, rlist[ri].comp_len, (rlist[ri].len - remaining));
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  270  
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  271  	cio->io_err = ret;
> 12a9801a7301f1 Nitesh Shetty 2022-02-07 @272  	return cio_await_completion(cio);
> 12a9801a7301f1 Nitesh Shetty 2022-02-07  273  }
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 
> 
> 

--
Thank you
Nitesh



* Re: [PATCH v2 06/10] nvme: add copy support
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
  (?)
@ 2022-02-10  7:08                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-10  7:08 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty, SelvaKumar S

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc3 next-20220209]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm64-randconfig-s032-20220207 (https://download.01.org/0day-ci/archive/20220210/202202101447.DEcXWmHU-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 11.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/22cbc1d3df11aaadd02b27ce5dcb702f9a8f4272
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 22cbc1d3df11aaadd02b27ce5dcb702f9a8f4272
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/nvme/host/ drivers/nvme/target/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/host/core.c:1793:42: sparse: sparse: cast to restricted __le64
   drivers/nvme/host/core.c:1793:42: sparse: sparse: cast from restricted __le32
>> drivers/nvme/host/core.c:1795:48: sparse: sparse: cast to restricted __le32
>> drivers/nvme/host/core.c:1795:48: sparse: sparse: cast from restricted __le16
>> drivers/nvme/host/core.c:903:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] dspec @@     got restricted __le32 [usertype] @@
   drivers/nvme/host/core.c:903:26: sparse:     expected restricted __le16 [usertype] dspec
   drivers/nvme/host/core.c:903:26: sparse:     got restricted __le32 [usertype]

vim +1793 drivers/nvme/host/core.c

  1774	
  1775	static void nvme_config_copy(struct gendisk *disk, struct nvme_ns *ns,
  1776					       struct nvme_id_ns *id)
  1777	{
  1778		struct nvme_ctrl *ctrl = ns->ctrl;
  1779		struct request_queue *queue = disk->queue;
  1780	
  1781		if (!(ctrl->oncs & NVME_CTRL_ONCS_COPY)) {
  1782			queue->limits.copy_offload = 0;
  1783			queue->limits.max_copy_sectors = 0;
  1784			queue->limits.max_copy_range_sectors = 0;
  1785			queue->limits.max_copy_nr_ranges = 0;
  1786			blk_queue_flag_clear(QUEUE_FLAG_COPY, queue);
  1787			return;
  1788		}
  1789	
  1790		/* setting copy limits */
  1791		blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, queue);
  1792		queue->limits.copy_offload = 0;
> 1793		queue->limits.max_copy_sectors = le64_to_cpu(id->mcl) *
  1794			(1 << (ns->lba_shift - 9));
> 1795		queue->limits.max_copy_range_sectors = le32_to_cpu(id->mssrl) *
  1796			(1 << (ns->lba_shift - 9));
  1797		queue->limits.max_copy_nr_ranges = id->msrc + 1;
  1798	}
  1799	
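
Going by the field widths sparse reports (mcl being __le32 and mssrl __le16
in struct nvme_id_ns), a likely fix is to match each conversion to the
declared type, e.g.:

	queue->limits.max_copy_sectors = le32_to_cpu(id->mcl) *
		(1 << (ns->lba_shift - 9));
	queue->limits.max_copy_range_sectors = le16_to_cpu(id->mssrl) *
		(1 << (ns->lba_shift - 9));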

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



* Re: [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
  (?)
@ 2022-02-10  8:31                   ` kernel test robot
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-10  8:31 UTC (permalink / raw)
  To: Nitesh Shetty, mpatocka
  Cc: kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc3 next-20220209]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm64-randconfig-s032-20220207 (https://download.01.org/0day-ci/archive/20220210/202202101647.VtWTgalm-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 11.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/6bb6ea64499e1ac27975e79bb2eee89f07861893
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
        git checkout 6bb6ea64499e1ac27975e79bb2eee89f07861893
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/nvme/target/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/target/io-cmd-bdev.c:52:25: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] mcl @@     got restricted __le64 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:52:25: sparse:     expected restricted __le32 [usertype] mcl
   drivers/nvme/target/io-cmd-bdev.c:52:25: sparse:     got restricted __le64 [usertype]
>> drivers/nvme/target/io-cmd-bdev.c:53:27: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] mssrl @@     got restricted __le32 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:53:27: sparse:     expected restricted __le16 [usertype] mssrl
   drivers/nvme/target/io-cmd-bdev.c:53:27: sparse:     got restricted __le32 [usertype]
>> drivers/nvme/target/io-cmd-bdev.c:55:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:55:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:55:26: sparse:     got restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:58:34: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:58:34: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:58:34: sparse:     got restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:59:35: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] mssrl @@     got restricted __le32 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:59:35: sparse:     expected restricted __le16 [usertype] mssrl
   drivers/nvme/target/io-cmd-bdev.c:59:35: sparse:     got restricted __le32 [usertype]
>> drivers/nvme/target/io-cmd-bdev.c:61:35: sparse: sparse: cast to restricted __le32
>> drivers/nvme/target/io-cmd-bdev.c:61:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:61:33: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] mcl @@     got restricted __le64 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:61:33: sparse:     expected restricted __le32 [usertype] mcl
   drivers/nvme/target/io-cmd-bdev.c:61:33: sparse:     got restricted __le64 [usertype]
   drivers/nvme/target/io-cmd-bdev.c:65:34: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/io-cmd-bdev.c:65:34: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/io-cmd-bdev.c:65:34: sparse:     got restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:66:35: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] mssrl @@     got restricted __le32 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:66:35: sparse:     expected restricted __le16 [usertype] mssrl
   drivers/nvme/target/io-cmd-bdev.c:66:35: sparse:     got restricted __le32 [usertype]
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast to restricted __le32
   drivers/nvme/target/io-cmd-bdev.c:68:35: sparse: sparse: cast from restricted __le16
   drivers/nvme/target/io-cmd-bdev.c:68:33: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] mcl @@     got restricted __le64 [usertype] @@
   drivers/nvme/target/io-cmd-bdev.c:68:33: sparse:     expected restricted __le32 [usertype] mcl
   drivers/nvme/target/io-cmd-bdev.c:68:33: sparse:     got restricted __le64 [usertype]
--
>> drivers/nvme/target/admin-cmd.c:533:26: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned char [usertype] msrc @@     got restricted __le16 @@
   drivers/nvme/target/admin-cmd.c:533:26: sparse:     expected unsigned char [usertype] msrc
   drivers/nvme/target/admin-cmd.c:533:26: sparse:     got restricted __le16
>> drivers/nvme/target/admin-cmd.c:534:27: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] mssrl @@     got restricted __le32 [usertype] @@
   drivers/nvme/target/admin-cmd.c:534:27: sparse:     expected restricted __le16 [usertype] mssrl
   drivers/nvme/target/admin-cmd.c:534:27: sparse:     got restricted __le32 [usertype]
>> drivers/nvme/target/admin-cmd.c:535:27: sparse: sparse: cast to restricted __le32
>> drivers/nvme/target/admin-cmd.c:535:27: sparse: sparse: cast from restricted __le16
>> drivers/nvme/target/admin-cmd.c:535:25: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] mcl @@     got restricted __le64 [usertype] @@
   drivers/nvme/target/admin-cmd.c:535:25: sparse:     expected restricted __le32 [usertype] mcl
   drivers/nvme/target/admin-cmd.c:535:25: sparse:     got restricted __le64 [usertype]

vim +52 drivers/nvme/target/io-cmd-bdev.c

    11	
    12	void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
    13	{
    14		const struct queue_limits *ql = &bdev_get_queue(bdev)->limits;
    15		/* Number of logical blocks per physical block. */
    16		const u32 lpp = ql->physical_block_size / ql->logical_block_size;
    17		/* Logical blocks per physical block, 0's based. */
    18		const __le16 lpp0b = to0based(lpp);
    19	
    20		/*
    21		 * For NVMe 1.2 and later, bit 1 indicates that the fields NAWUN,
    22		 * NAWUPF, and NACWU are defined for this namespace and should be
    23		 * used by the host for this namespace instead of the AWUN, AWUPF,
    24		 * and ACWU fields in the Identify Controller data structure. If
    25		 * any of these fields are zero that means that the corresponding
    26		 * field from the identify controller data structure should be used.
    27		 */
    28		id->nsfeat |= 1 << 1;
    29		id->nawun = lpp0b;
    30		id->nawupf = lpp0b;
    31		id->nacwu = lpp0b;
    32	
    33		/*
    34		 * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
    35		 * NOWS are defined for this namespace and should be used by
    36		 * the host for I/O optimization.
    37		 */
    38		id->nsfeat |= 1 << 4;
    39		/* NPWG = Namespace Preferred Write Granularity. 0's based */
    40		id->npwg = lpp0b;
    41		/* NPWA = Namespace Preferred Write Alignment. 0's based */
    42		id->npwa = id->npwg;
    43		/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
    44		id->npdg = to0based(ql->discard_granularity / ql->logical_block_size);
    45		/* NPDG = Namespace Preferred Deallocate Alignment */
    46		id->npda = id->npdg;
    47		/* NOWS = Namespace Optimal Write Size */
    48		id->nows = to0based(ql->io_opt / ql->logical_block_size);
    49	
    50		/*Copy limits*/
    51		if (ql->max_copy_sectors) {
  > 52			id->mcl = cpu_to_le64((ql->max_copy_sectors << 9) / ql->logical_block_size);
  > 53			id->mssrl = cpu_to_le32((ql->max_copy_range_sectors << 9) /
    54					ql->logical_block_size);
  > 55			id->msrc = to0based(ql->max_copy_nr_ranges);
    56		} else {
    57			if (ql->zoned == BLK_ZONED_NONE) {
    58				id->msrc = to0based(BIO_MAX_VECS);
    59				id->mssrl = cpu_to_le32(
    60						(BIO_MAX_VECS << PAGE_SHIFT) / ql->logical_block_size);
  > 61				id->mcl = cpu_to_le64(le32_to_cpu(id->mssrl) * BIO_MAX_VECS);
    62	#ifdef CONFIG_BLK_DEV_ZONED
    63			} else {
    64				/* TODO: get right values for zoned device */
    65				id->msrc = to0based(BIO_MAX_VECS);
    66				id->mssrl = cpu_to_le32(min((BIO_MAX_VECS << PAGE_SHIFT),
    67						ql->chunk_sectors) / ql->logical_block_size);
    68				id->mcl = cpu_to_le64(min(le32_to_cpu(id->mssrl) * BIO_MAX_VECS,
    69							ql->chunk_sectors));
    70	#endif
    71			}
    72		}
    73	}
    74	
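
Here, too, the warnings point at width mismatches: per the sparse output,
mcl is __le32, mssrl is __le16, msrc is a plain u8, and to0based() returns
__le16. Assuming those declarations, the first branch would become something
like (the open-coded 0's based msrc is an assumption):

	if (ql->max_copy_sectors) {
		id->mcl = cpu_to_le32((ql->max_copy_sectors << 9) /
				ql->logical_block_size);
		id->mssrl = cpu_to_le16((ql->max_copy_range_sectors << 9) /
				ql->logical_block_size);
		/* msrc is a 0's based u8; avoid the __le16-returning to0based() */
		id->msrc = clamp_t(u32, ql->max_copy_nr_ranges, 1, 256) - 1;
	}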

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-11  7:52 ` Dan Carpenter
  -1 siblings, 0 replies; 272+ messages in thread
From: kernel test robot @ 2022-02-10 22:49 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 5863 bytes --]

CC: kbuild-all@lists.01.org
In-Reply-To: <20220207141348.4235-8-nj.shetty@samsung.com>
References: <20220207141348.4235-8-nj.shetty@samsung.com>
TO: Nitesh Shetty <nj.shetty@samsung.com>
TO: mpatocka@redhat.com
CC: javier@javigon.com
CC: chaitanyak@nvidia.com
CC: linux-block@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: dm-devel@redhat.com
CC: linux-nvme@lists.infradead.org
CC: linux-fsdevel@vger.kernel.org
CC: axboe@kernel.dk
CC: msnitzer@redhat.com
CC: bvanassche@acm.org
CC: martin.petersen@oracle.com
CC: roland@purestorage.com
CC: hare@suse.de
CC: kbusch@kernel.org
CC: hch@lst.de
CC: Frederick.Knight@netapp.com
CC: zach.brown@ni.com
CC: osandov@fb.com
CC: lsf-pc@lists.linux-foundation.org
CC: djwong@kernel.org
CC: josef@toxicpanda.com
CC: clm@fb.com
CC: dsterba@suse.com
CC: tytso@mit.edu
CC: jack@suse.com
CC: joshi.k@samsung.com
CC: arnav.dawn@samsung.com
CC: nj.shetty@samsung.com

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v5.17-rc3 next-20220210]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: i386-randconfig-m021-20220207 (https://download.01.org/0day-ci/archive/20220211/202202110625.4yHrKaUn-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
drivers/nvme/target/io-cmd-file.c:377 nvmet_file_copy_work() error: uninitialized symbol 'len'.
drivers/nvme/target/io-cmd-bdev.c:479 nvmet_bdev_execute_copy() warn: should '(((range.nlb)) + 1) << req->ns->blksize_shift' be a 64 bit type?

Old smatch warnings:
drivers/nvme/target/io-cmd-bdev.c:377 nvmet_bdev_discard_range() warn: should '((range->nlb)) << (ns->blksize_shift - 9)' be a 64 bit type?
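
Both "should ... be a 64 bit type?" warnings point at byte-length
arithmetic carried out in int; the usual remedy is to widen before the
shift, e.g. (a sketch, not the posted fix):

	/* Promote to u64 before shifting so a large NLB cannot overflow. */
	u64 len = ((u64)le16_to_cpu(range.nlb) + 1) << req->ns->blksize_shift;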

vim +/len +377 drivers/nvme/target/io-cmd-file.c

d5eff33ee6f8086 Chaitanya Kulkarni 2018-05-23  349  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  350  static void nvmet_file_copy_work(struct work_struct *w)
6bb6ea64499e1ac Arnav Dawn         2022-02-07  351  {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  352  	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  353  	int nr_range;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  354  	loff_t pos;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  355  	struct nvme_command *cmnd = req->cmd;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  356  	int ret = 0, len, src, id;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  357  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  358  	nr_range = cmnd->copy.nr_range + 1;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  359  	pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  360  	if (unlikely(pos + req->transfer_len > req->ns->size)) {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  361  		nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  362  		return;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  363  	}
6bb6ea64499e1ac Arnav Dawn         2022-02-07  364  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  365  	for (id = 0 ; id < nr_range; id++) {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  366  		struct nvme_copy_range range;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  367  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  368  		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
6bb6ea64499e1ac Arnav Dawn         2022-02-07  369  					sizeof(range));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  370  		if (ret)
6bb6ea64499e1ac Arnav Dawn         2022-02-07  371  			goto out;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  372  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  373  		len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  374  		src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  375  		ret = vfs_copy_file_range(req->ns->file, src, req->ns->file, pos, len, 0);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  376  out:
6bb6ea64499e1ac Arnav Dawn         2022-02-07 @377  		if (ret != len) {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  378  			pos += ret;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  379  			req->cqe->result.u32 = cpu_to_le32(id);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  380  			nvmet_req_complete(req, ret < 0 ? errno_to_nvme_status(req, ret) :
6bb6ea64499e1ac Arnav Dawn         2022-02-07  381  					errno_to_nvme_status(req, -EIO));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  382  			return;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  383  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  384  		} else
6bb6ea64499e1ac Arnav Dawn         2022-02-07  385  			pos += len;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  386  }
6bb6ea64499e1ac Arnav Dawn         2022-02-07  387  	nvmet_req_complete(req, ret);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  388  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 07/10] nvmet: add copy command support for bdev and file ns
@ 2022-02-11  7:52 ` Dan Carpenter
  0 siblings, 0 replies; 272+ messages in thread
From: Dan Carpenter @ 2022-02-11  7:52 UTC (permalink / raw)
  To: kbuild, Nitesh Shetty, mpatocka
  Cc: lkp, kbuild-all, javier, chaitanyak, linux-block, linux-scsi,
	dm-devel, linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn, nj.shetty

Hi Nitesh,

url:    https://github.com/0day-ci/linux/commits/Nitesh-Shetty/block-make-bio_map_kern-non-static/20220207-231407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-m021-20220207 (https://download.01.org/0day-ci/archive/20220211/202202110625.4yHrKaUn-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
drivers/nvme/target/io-cmd-file.c:377 nvmet_file_copy_work() error: uninitialized symbol 'len'.

vim +/len +377 drivers/nvme/target/io-cmd-file.c

6bb6ea64499e1ac Arnav Dawn         2022-02-07  350  static void nvmet_file_copy_work(struct work_struct *w)
6bb6ea64499e1ac Arnav Dawn         2022-02-07  351  {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  352  	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  353  	int nr_range;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  354  	loff_t pos;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  355  	struct nvme_command *cmnd = req->cmd;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  356  	int ret = 0, len, src, id;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  357  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  358  	nr_range = cmnd->copy.nr_range + 1;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  359  	pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  360  	if (unlikely(pos + req->transfer_len > req->ns->size)) {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  361  		nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  362  		return;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  363  	}
6bb6ea64499e1ac Arnav Dawn         2022-02-07  364  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  365  	for (id = 0 ; id < nr_range; id++) {
6bb6ea64499e1ac Arnav Dawn         2022-02-07  366  		struct nvme_copy_range range;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  367  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  368  		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
6bb6ea64499e1ac Arnav Dawn         2022-02-07  369  					sizeof(range));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  370  		if (ret)
6bb6ea64499e1ac Arnav Dawn         2022-02-07  371  			goto out;
                                                                        ^^^^^^^^

6bb6ea64499e1ac Arnav Dawn         2022-02-07  372  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  373  		len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  374  		src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  375  		ret = vfs_copy_file_range(req->ns->file, src, req->ns->file, pos, len, 0);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  376  out:
6bb6ea64499e1ac Arnav Dawn         2022-02-07 @377  		if (ret != len) {
                                                                           ^^^
"len" is not initialized.

6bb6ea64499e1ac Arnav Dawn         2022-02-07  378  			pos += ret;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  379  			req->cqe->result.u32 = cpu_to_le32(id);
6bb6ea64499e1ac Arnav Dawn         2022-02-07  380  			nvmet_req_complete(req, ret < 0 ? errno_to_nvme_status(req, ret) :
6bb6ea64499e1ac Arnav Dawn         2022-02-07  381  					errno_to_nvme_status(req, -EIO));
6bb6ea64499e1ac Arnav Dawn         2022-02-07  382  			return;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  383  
6bb6ea64499e1ac Arnav Dawn         2022-02-07  384  		} else
6bb6ea64499e1ac Arnav Dawn         2022-02-07  385  			pos += len;
6bb6ea64499e1ac Arnav Dawn         2022-02-07  386  }
6bb6ea64499e1ac Arnav Dawn         2022-02-07  387  	nvmet_req_complete(req, ret);
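
One minimal way to address this, keeping the loop structure, is to
complete the request directly on an SGL copy failure instead of jumping
past the point where "len" is assigned (a sketch, not the posted fix):

		ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
					  sizeof(range));
		if (ret) {
			/* Completing here avoids reaching the "ret != len"
			 * test before "len" has ever been initialized. */
			req->cqe->result.u32 = cpu_to_le32(id);
			nvmet_req_complete(req,
					   errno_to_nvme_status(req, ret));
			return;
		}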

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 05/10] block: add emulation for copy
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-16 13:32                   ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-16 13:32 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn



On Mon, 7 Feb 2022, Nitesh Shetty wrote:

> +				goto retry;
> +			return PTR_ERR(bio);
> +		}
> +
> +		bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
> +		bio->bi_opf = op;
> +		bio_set_dev(bio, bdev);
> @@ -346,6 +463,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
>  
>  	if (blk_check_copy_offload(src_q, dest_q))
>  		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> +	else
> +		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);
>  
>  	return ret;
>  }

The emulation is not reliable because a device mapper device may be 
reconfigured and it may lose the copy capability between the calls to 
blk_check_copy_offload and blk_copy_offload.

You should call blk_copy_emulate if blk_copy_offload returns an error.
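
In other words, something along these lines, reusing the identifiers from
the quoted hunk (a sketch of the suggested control flow, not tested code):

	ret = -EOPNOTSUPP;
	if (blk_check_copy_offload(src_q, dest_q))
		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
	if (ret)
		/* Offload unavailable, or it failed underneath us: emulate. */
		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);

	return ret;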

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 08/10] dm: Add support for copy offload.
  2022-02-07 14:13                 ` [dm-devel] " Nitesh Shetty
@ 2022-02-16 13:51                   ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-16 13:51 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn



On Mon, 7 Feb 2022, Nitesh Shetty wrote:

> Before enabling copy for a dm target, check if the underlying devices and
> the dm target support copy. Avoid splits happening inside the dm target.
> Fail early if the request needs a split; splitting a copy request is
> currently not supported.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>

If a dm device is reconfigured, you must invalidate all the copy tokens 
that are in flight, otherwise they would copy stale data.

I suggest that you create a global variable "atomic64_t dm_changed".
In nvme_setup_copy_read you copy this variable to the token.
In nvme_setup_copy_write you compare the variable with the value in the 
token and fail if there is mismatch.
In dm.c:__bind you increase the variable, so that all the tokens will be 
invalidated if a dm table is changed.
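
Roughly like this; the token field and helper placement are illustrative
only, not an agreed interface:

	/* Global generation counter, incremented in dm.c:__bind. */
	atomic64_t dm_changed = ATOMIC64_INIT(0);

	/* nvme_setup_copy_read(): stamp the token with the generation. */
	token->dm_generation = atomic64_read(&dm_changed);

	/* nvme_setup_copy_write(): refuse tokens minted before a reload. */
	if (token->dm_generation != atomic64_read(&dm_changed))
		return BLK_STS_TARGET;	/* stale token, data may have moved */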

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 05/10] block: add emulation for copy
  2022-02-16 13:32                   ` [dm-devel] " Mikulas Patocka
@ 2022-02-17 13:18                     ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-17 13:18 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

On Wed, Feb 16, 2022 at 08:32:45AM -0500, Mikulas Patocka wrote:
> 
> 
> On Mon, 7 Feb 2022, Nitesh Shetty wrote:
> 
> > +				goto retry;
> > +			return PTR_ERR(bio);
> > +		}
> > +
> > +		bio->bi_iter.bi_sector = sector >> SECTOR_SHIFT;
> > +		bio->bi_opf = op;
> > +		bio_set_dev(bio, bdev);
> > @@ -346,6 +463,8 @@ int blkdev_issue_copy(struct block_device *src_bdev, int nr,
> >  
> >  	if (blk_check_copy_offload(src_q, dest_q))
> >  		ret = blk_copy_offload(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> > +	else
> > +		ret = blk_copy_emulate(src_bdev, nr, rlist, dest_bdev, gfp_mask);
> >  
> >  	return ret;
> >  }
> 
> The emulation is not reliable because a device mapper device may be 
> reconfigured and it may lose the copy capability between the calls to 
> blk_check_copy_offload and blk_copy_offload.
> 
> You should call blk_copy_emulate if blk_copy_offload returns an error.
> 
> Mikulas
> 
>

I agree; it was on our todo list to fall back to emulation for partial
copy offload failures. We will add this in the next version.

--
Nitesh Shetty





^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 08/10] dm: Add support for copy offload.
  2022-02-16 13:51                   ` [dm-devel] " Mikulas Patocka
@ 2022-02-24 12:42                     ` Nitesh Shetty
  -1 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2022-02-24 12:42 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

On Wed, Feb 16, 2022 at 08:51:08AM -0500, Mikulas Patocka wrote:
> 
> 
> On Mon, 7 Feb 2022, Nitesh Shetty wrote:
> 
> > Before enabling copy for a dm target, check if the underlying devices and
> > the dm target support copy. Avoid splits happening inside the dm target.
> > Fail early if the request needs a split; splitting a copy request is
> > currently not supported.
> > 
> > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> 
> If a dm device is reconfigured, you must invalidate all the copy tokens 
> that are in flight, otherwise they would copy stale data.
> 
> I suggest that you create a global variable "atomic64_t dm_changed".
> In nvme_setup_copy_read you copy this variable to the token.
> In nvme_setup_copy_write you compare the variable with the value in the 
> token and fail if there is mismatch.
> In dm.c:__bind you increase the variable, so that all the tokens will be 
> invalidated if a dm table is changed.
> 
> Mikulas
> 
>
Yes, you are right about the reconfiguration of a dm device. But wouldn't a
single global counter (dm_changed) invalidate all in-flight copy IO's
across all dm devices? Is my understanding correct?

--
Nitesh Shetty




^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [PATCH v2 08/10] dm: Add support for copy offload.
  2022-02-24 12:42                     ` [dm-devel] " Nitesh Shetty
@ 2022-02-25  9:12                       ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-02-25  9:12 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: javier, chaitanyak, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, axboe, msnitzer, bvanassche,
	martin.petersen, roland, hare, kbusch, hch, Frederick.Knight,
	zach.brown, osandov, lsf-pc, djwong, josef, clm, dsterba, tytso,
	jack, joshi.k, arnav.dawn



On Thu, 24 Feb 2022, Nitesh Shetty wrote:

> On Wed, Feb 16, 2022 at 08:51:08AM -0500, Mikulas Patocka wrote:
> > 
> > 
> > On Mon, 7 Feb 2022, Nitesh Shetty wrote:
> > 
> > > Before enabling copy for a dm target, check if the underlying devices and
> > > the dm target support copy. Avoid splits happening inside the dm target.
> > > Fail early if the request needs a split; splitting a copy request is
> > > currently not supported.
> > > 
> > > Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> > 
> > If a dm device is reconfigured, you must invalidate all the copy tokens 
> > that are in flight, otherwise they would copy stale data.
> > 
> > I suggest that you create a global variable "atomic64_t dm_changed".
> > In nvme_setup_copy_read you copy this variable to the token.
> > In nvme_setup_copy_write you compare the variable with the value in the 
> > token and fail if there is a mismatch.
> > In dm.c:__bind you increase the variable, so that all the tokens will be 
> > invalidated if a dm table is changed.
> > 
> > Mikulas
> > 
> >
> Yes, you are right about the reconfiguration of the dm device. But wouldn't
> a single global counter (dm_changed) invalidate the in-flight copy I/Os
> across all dm devices? Is my understanding correct?
> 
> --
> Nitesh Shetty

Yes, changing it will invalidate all the copy I/Os.

But invalidating only the I/Os affected by the table reload would be hard to
achieve.
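
For illustration only, a minimal sketch of the generation-counter scheme
described above (struct and function names are assumptions for the
example, not the actual patch code):

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/types.h>

/* bumped from dm.c:__bind() on every table change */
static atomic64_t dm_changed = ATOMIC64_INIT(0);

struct copy_token {
	sector_t src_sector;
	sector_t nr_sectors;
	u64 generation;		/* snapshot of dm_changed at read time */
};

/* read side: stamp the token with the current generation */
static void setup_copy_read(struct copy_token *tok)
{
	tok->generation = atomic64_read(&dm_changed);
}

/* write side: any table reload since the read invalidates the token */
static int setup_copy_write(const struct copy_token *tok)
{
	if (tok->generation != atomic64_read(&dm_changed))
		return -ESTALE;
	return 0;
}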

Mikulas


^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-01-27  7:14 ` [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload Chaitanya Kulkarni
@ 2022-03-01 17:34     ` Nikos Tsironis
  2022-01-31 19:03     ` [dm-devel] " Bart Van Assche
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2022-03-01 17:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 1/27/22 09:14, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1] single threaded
> performance is limiting due to Denard scaling and multi-threaded
> performance is slowing down due Moore's law limitations. With the rise
> of SNIA Computation Technical Storage Working Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operation which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
> 
> We have conducted a call with interested people late last year since
> lack of LSFMMM and we would like to share the details with broader
> community members.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
> 
> With this background we have significant number of use-cases which are
> strong candidates waiting for outstanding Linux Kernel Block Layer Copy
> Offload support, so that Linux Kernel Storage subsystem can to address
> previously mentioned problems [1] and allow efficient offloading of the
> data-related operations (such as move/copy, etc.).
> 
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> 
> -ck
> 
> [1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
> https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>         https://www.eideticom.com/products.html
> https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
> namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf

I would like to participate in the discussion too.

The dm-clone target would also benefit from copy offload, as it heavily
employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
achieve increased IOPS in dm-clone and dm-snapshot for small copies over
NVMe devices, but copy offload sounds even more promising, especially
for larger copies happening in the background (as is the case with
dm-clone's background hydration).

Thanks,
Nikos

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 17:34     ` [dm-devel] " Nikos Tsironis
  (?)
@ 2022-03-01 21:32     ` Chaitanya Kulkarni
  2022-03-03 18:36         ` [dm-devel] " Nikos Tsironis
  2022-03-08 20:48         ` [dm-devel] " Nikos Tsironis
  -1 siblings, 2 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2022-03-01 21:32 UTC (permalink / raw)
  To: Nikos Tsironis
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

Nikos,

>> [8] https://kernel.dk/io_uring.pdf
> 
> I would like to participate in the discussion too.
> 
> The dm-clone target would also benefit from copy offload, as it heavily
> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
> NVMe devices, but copy offload sounds even more promising, especially
> for larger copies happening in the background (as is the case with
> dm-clone's background hydration).
> 
> Thanks,
> Nikos

If you can document your findings here, that will be great, and I will
add them to the agenda.

-ck



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 21:32     ` Chaitanya Kulkarni
@ 2022-03-03 18:36         ` Nikos Tsironis
  2022-03-08 20:48         ` [dm-devel] " Nikos Tsironis
  1 sibling, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2022-03-03 18:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/1/22 23:32, Chaitanya Kulkarni wrote:
> Nikos,
> 
>>> [8] https://kernel.dk/io_uring.pdf
>>
>> I would like to participate in the discussion too.
>>
>> The dm-clone target would also benefit from copy offload, as it heavily
>> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
>> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
>> NVMe devices, but copy offload sounds even more promising, especially
>> for larger copies happening in the background (as is the case with
>> dm-clone's background hydration).
>>
>> Thanks,
>> Nikos
> 
> If you can document your findings here, that will be great, and I will
> add them to the agenda.
> 

Hi,

Give me a few days to gather my notes, because it's been a while since
the last time I worked on this, and I will come back with a summary of
my findings.

Nikos

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-01 21:32     ` Chaitanya Kulkarni
@ 2022-03-08 20:48         ` Nikos Tsironis
  2022-03-08 20:48         ` [dm-devel] " Nikos Tsironis
  1 sibling, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2022-03-08 20:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, dm-devel, linux-nvme, linux-fsdevel,
	Jens Axboe, msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	mpatocka, Hannes Reinecke, kbus >> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/1/22 23:32, Chaitanya Kulkarni wrote:
> Nikos,
> 
>>> [8] https://kernel.dk/io_uring.pdf
>>
>> I would like to participate in the discussion too.
>>
>> The dm-clone target would also benefit from copy offload, as it heavily
>> employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
>> achieve increased IOPS in dm-clone and dm-snapshot for small copies over
>> NVMe devices, but copy offload sounds even more promising, especially
>> for larger copies happening in the background (as is the case with
>> dm-clone's background hydration).
>>
>> Thanks,
>> Nikos
> 
> If you can document your findings here, that will be great, and I will
> add them to the agenda.
> 

My work focuses mainly on improving the IOPS and latency of the
dm-snapshot target, in order to bring the performance of short-lived
snapshots as close as possible to bare-metal performance.

My initial performance evaluation of dm-snapshot revealed a big
performance drop while the snapshot is active; a drop that is not
justified by COW alone.

Using fio with blktrace I had noticed that the per-CPU I/O distribution
was uneven. Although many threads were doing I/O, only a couple of the
CPUs ended up submitting I/O requests to the underlying device.

The same issue also affects dm-clone, when doing I/O with sizes smaller
than the target's region size, where kcopyd is used for COW.

The bottleneck here is kcopyd serializing all I/O. Users of kcopyd, such
as dm-snapshot and dm-clone, cannot take advantage of the increased I/O
parallelism that comes with using blk-mq in modern multi-core systems,
because I/Os are issued only by a single CPU at a time, the one on which
kcopyd’s thread happens to be running.

So, I experimented with redesigning kcopyd to prevent I/O serialization by
respecting thread locality for I/Os and their completions. This made the
distribution of I/O processing uniform across CPUs.
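
To make the direction concrete, here is a minimal sketch (illustrative
only, with assumed names; not the actual kcopyd rework):

#include <linux/smp.h>
#include <linux/workqueue.h>

/* assumed per-client workqueue, instead of the single kcopyd daemon */
static struct workqueue_struct *kcopyd_wq;

static void do_copy_job(struct work_struct *work)
{
	/* build and submit this job's read/write bios from the current CPU */
}

static void dispatch_copy_job(struct work_struct *job_work)
{
	/* preserve thread locality: keep the job on the submitting CPU */
	queue_work_on(raw_smp_processor_id(), kcopyd_wq, job_work);
}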

My measurements showed that scaling kcopyd, in combination with
scaling dm-snapshot itself [1] [2], can eventually lead to a ~300%
increase in sustained throughput and an ~80% decrease in I/O latency
for transient snapshots over the null_blk device.

The work for scaling dm-snapshot has been merged [1], but,
unfortunately, I haven't been able to send upstream my work on kcopyd
yet, because I have been really busy with other things the last couple
of years.

I haven't looked into the details of copy offload yet, but it would be
really interesting to see how it affects the performance of random and
sequential workloads, and to check how, and if, scaling kcopyd affects
the performance, in combination with copy offload.

Nikos

[1] https://lore.kernel.org/dm-devel/20190317122258.21760-1-ntsironis@arrikto.com/
[2] https://lore.kernel.org/dm-devel/425d7efe-ab3f-67be-264e-9c3b6db229bc@arrikto.com/

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-08 20:48         ` [dm-devel] " Nikos Tsironis
@ 2022-03-09  8:51           ` Mikulas Patocka
  -1 siblings, 0 replies; 272+ messages in thread
From: Mikulas Patocka @ 2022-03-09  8:51 UTC (permalink / raw)
  To: Nikos Tsironis
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack



On Tue, 8 Mar 2022, Nikos Tsironis wrote:

> My work focuses mainly on improving the IOPS and latency of the
> dm-snapshot target, in order to bring the performance of short-lived
> snapshots as close as possible to bare-metal performance.
> 
> My initial performance evaluation of dm-snapshot revealed a big
> performance drop while the snapshot is active; a drop that is not
> justified by COW alone.
> 
> Using fio with blktrace I had noticed that the per-CPU I/O distribution
> was uneven. Although many threads were doing I/O, only a couple of the
> CPUs ended up submitting I/O requests to the underlying device.
> 
> The same issue also affects dm-clone, when doing I/O with sizes smaller
> than the target's region size, where kcopyd is used for COW.
> 
> The bottleneck here is kcopyd serializing all I/O. Users of kcopyd, such
> as dm-snapshot and dm-clone, cannot take advantage of the increased I/O
> parallelism that comes with using blk-mq in modern multi-core systems,
> because I/Os are issued only by a single CPU at a time, the one on which
> kcopyd’s thread happens to be running.
> 
> So, I experimented with redesigning kcopyd to prevent I/O serialization by
> respecting thread locality for I/Os and their completions. This made the
> distribution of I/O processing uniform across CPUs.
> 
> My measurements showed that scaling kcopyd, in combination with
> scaling dm-snapshot itself [1] [2], can eventually lead to a ~300%
> increase in sustained throughput and an ~80% decrease in I/O latency
> for transient snapshots over the null_blk device.
> 
> The work for scaling dm-snapshot has been merged [1], but,
> unfortunately, I haven't been able to send upstream my work on kcopyd
> yet, because I have been really busy with other things the last couple
> of years.
> 
> I haven't looked into the details of copy offload yet, but it would be
> really interesting to see how it affects the performance of random and
> sequential workloads, and to check how, and if, scaling kcopyd affects
> the performance, in combination with copy offload.
> 
> Nikos

Hi

Note that you must submit kcopyd callbacks from a single thread, otherwise 
there's a race condition in snapshot.

The snapshot code doesn't take locks in the copy_callback and it expects 
that the callbacks are serialized.

Maybe, adding the locks to copy_callback would solve it.

Mikulas

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2022-03-09  8:51           ` [dm-devel] " Mikulas Patocka
@ 2022-03-09 15:49             ` Nikos Tsironis
  -1 siblings, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2022-03-09 15:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, dm-devel,
	linux-nvme, linux-fsdevel, Jens Axboe,
	msnitzer@redhat.com >> msnitzer@redhat.com,
	Bart Van Assche,
	martin.petersen@oracle.com >> Martin K. Petersen, roland,
	Hannes Reinecke, kbus @imap.gmail.com>> Keith Busch,
	Christoph Hellwig, Frederick.Knight, zach.brown, osandov, lsf-pc,
	djwong, josef, clm, dsterba, tytso, jack

On 3/9/22 10:51, Mikulas Patocka wrote:
> 
> Hi
> 
> Note that you must submit kcopyd callbacks from a single thread, otherwise
> there's a race condition in snapshot.
> 

Hi,

Thanks for the feedback. Yes, I'm aware of that.

> The snapshot code doesn't take locks in the copy_callback and it expects
> that the callbacks are serialized.
> 
> Maybe, adding the locks to copy_callback would solve it.
> 

That's what I did. I used a lock to ensure that kcopyd callbacks are
serialized for persistent snapshots.

For transient snapshots we can lift this limitation, and complete
pending exceptions out of order and in "parallel", i.e., without
explicitly serializing kcopyd callbacks. The locks in pending_complete()
are enough in this case.
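
Roughly, the serialization looks like the following minimal sketch (the
names are assumptions for illustration, not the actual dm-snapshot code):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(pe_lock);

/* matches the dm_kcopyd_notify_fn callback signature */
static void copy_callback(int read_err, unsigned long write_err, void *context)
{
	unsigned long flags;

	/* serialize completions for persistent snapshots */
	spin_lock_irqsave(&pe_lock, flags);
	/* ... complete the pending exception ... */
	spin_unlock_irqrestore(&pe_lock, flags);
}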

Nikos

^ permalink raw reply	[flat|nested] 272+ messages in thread


* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
       [not found] <CGME20230113094723epcas5p2f6f81ca1ad85f4b26829b87f8ec301ce@epcas5p2.samsung.com>
@ 2023-01-13  9:46 ` Nitesh Shetty
  0 siblings, 0 replies; 272+ messages in thread
From: Nitesh Shetty @ 2023-01-13  9:46 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-block, linux-scsi, linux-nvme, Nitesh Shetty

Hi all,

Background:
===========
Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.
We are working on copy offload[1], mainly focused on NVMe Copy command
(single NS, TP 4065) and NVMe over fabrics.
Previously, Chaitanya Kulkarni presented a talk on copy offload[2].

Current state of work:
======================
	Our latest patch series[1] covers
	1. Driver
		- NVMe Copy command (single NS, TP 4065), including support
		in nvme-target (for block and file back-ends).

	2. Block layer
		- Block-generic copy (REQ_COPY flag), with an interface
		accommodating two block-devs, and a multi-source/destination
		interface
		- Emulation, when offload is natively absent
		- dm-linear support (for cases not requiring split)

	3. User-interface
		- new ioctl

	4. In-kernel user
		- dm-kcopyd
		
	5. Tools:
		- fio
		- blktests

	Performance:
	With the async design of copy-emulation/offload using fio[3],
	we were able to see the following improvements as
	compared to userspace read and write on an NVMeOF TCP setup:
	Setup1: Network Speed: 1000Mb/s
		Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
		Target PC: AMD Ryzen 9 5900X 12-Core Processor
		block size 8k, range 1:
			635% improvement in IO BW (107 MiB/s to 787 MiB/s).
			Network utilisation drops from  97% to 14%.
		block-size 2M, range 16:
			2555% improvement in IO BW (100 MiB/s to 2655 MiB/s).
			Network utilisation drops from 89% to 0.62%.
	Setup2: Network Speed: 100Gb/s
		Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
			(host and target have the same configuration)
		block-size 8k, range 1:
			6.5% improvement in IO BW (791 MiB/s to 843 MiB/s).
			Network utilisation drops from  6.75% to 0.14%.
		block-size 2M, range 16:
			15% improvement in IO BW (1027 MiB/s to 1183 MiB/s).
			Network utilisation drops from 8.42% to ~0%.
	Overall, in these tests, kernel copy emulation reduces network utilisation
	drastically and improves IO bandwidth.
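	(For reference, the improvement figures above are computed as
	improvement% = (new_BW - old_BW) / old_BW * 100; e.g., for the
	first case, (787 - 107) / 107 * 100 ~= 635%.)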

What we will discuss in the proposed session ?
==============================================
Through this session I would like to discuss and get the community's
opinion on the minimum set of requirements for copy offload in this
phase. I would also like to share some of the blockers we are facing and
get opinions on how we can proceed further.

Required attendees:
===================
	Martin K. Petersen
	Jens Axboe
	Christoph Hellwig
	Mike Snitzer
	Hannes Reinecke
	Chaitanya Kulkarni
	Bart Van Assche
	Damien Le Moal
	Mikulas Patocka
	Keith Busch
	Sagi Grimberg
	Javier Gonzalez
	Kanchan Joshi

Links:
======
[1] https://lore.kernel.org/lkml/20230112115908.23662-1-nj.shetty@samsung.com/T/#m91a2f506eaa214035a8596fa9aa8d2b9f46654cc
[2] https://lore.kernel.org/all/BYAPR04MB49652C4B75E38F3716F3C06386539@BYAPR04MB4965.namprd04.prod.outlook.com/
[3] https://github.com/vincentkfu/fio/tree/copyoffload


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
  2021-11-19 10:47                       ` Kanchan Joshi
@ 2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 0 replies; 272+ messages in thread
From: Kanchan Joshi @ 2021-11-22  7:39 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu, Bart Van Assche

Updated one (points from Keith and Bart) -

Given the multitude of things accumulated on this topic, Martin
suggested having a table/matrix.
Some of those items should go in the initial patchset, and the remaining
ones are to be staged for subsequent work.
Here is an attempt to split the work into two buckets. Please change
anything below that needs to be changed.

1. Driver
*********
Initial: NVMe Copy command (single NS), including support in nvme-target
Subsequent: Multi NS copy, XCopy/Token-based Copy

2. Block layer
**************
Initial:
- Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
- Emulation, when offload is natively absent
- DM support (at least dm-linear)

Subsequent: Integrity and encryption support

3. User-interface
*****************
Initial: new ioctl or io_uring opcode

4. In-kernel user
******************
Initial: at least one user
- dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)

Subsequent:
- copy_file_range
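
Regarding the copy_file_range item above: it is the existing userspace
entry point that a block-layer copy offload could eventually service.
A minimal usage example of the syscall as it exists today (glibc >= 2.27);
note that a production caller should loop, since the kernel may copy
fewer bytes than requested:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	ssize_t n;
	int in, out;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}
	in = open(argv[1], O_RDONLY);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0 || fstat(in, &st) < 0) {
		perror("setup");
		return 1;
	}
	/* the kernel may offload, reflink, or fall back to an internal copy */
	n = copy_file_range(in, NULL, out, NULL, st.st_size, 0);
	if (n < 0) {
		perror("copy_file_range");
		return 1;
	}
	printf("copied %zd bytes\n", n);
	return 0;
}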

On Tue, Nov 16, 2021 at 7:15 PM Javier González <javier@javigon.com> wrote:
>
> Hi all,
>
> Thanks for attending the call on Copy Offload yesterday. Here you have
> the meeting notes and 2 specific actions before we proceed with another
> version of the patchset.
>
> We will work on a version of the use-case matrix internally and reply
> here in the next couple of days.
>
> Please, add to the notes and the matrix as you see fit.
>
> Thanks,
> Javier
>
> ----
>
> ATTENDEES
>
> - Adam
> - Arnav
> - Chaitanya
> - Himashu
> - Johannes
> - Kanchan
> - Keith
> - Martin
> - Mikulas
> - Niklas
> - Nitesh
> - Selva
> - Vincent
> - Bart
>
> NOTES
>
> - MD and DM are hard requirements
>         - We need support for all the main users of the block layer
>         - Same problem with crypto and integrity
> - Martin would be OK with separating Simple Copy in ZNS and Copy Offload
> - Why did Mikulas work not get upstream?
>         - Timing was an issue
>                 - Use-case was about copying data across VMs
>                 - No HW vendor support
>                 - Hard from a protocol perspective
>                         - At that point, SCSI was still adding support in the spec
>                         - MSFT could not implement extended copy command in the target (destination) device.
>                                 - This is what triggered the token-based implementation
>                                 - This triggered array vendors to implement support for copy offload as token-based. This allows mixing with normal read / write workloads
>                         - Martin lost the implementation and dropped it
>
> DIRECTION
>
> - Keeping the IOCTL interface is an option. It might make sense to move from IOCTL to io_uring opcode
> - Martin is happy to do the SCSI part if the block layer API is upstreamed
> - Token-based implementation is the norm. This allows mixing normal read / write workloads to avoid DoS
>         - This is the direction as opposed to the extended copy command
>         - It addresses problems when integrating with DM and simplifies command multiplexing a single bio into many
>         - It simplifies multiple bios
>         - We should explore Mikulas approach with pointers.
> - Use-cases
>         - ZNS GC
>         - dm-kcopyd
>         - file system GC
>         - fabrics offload to the storage node
>         - copy_file_range
> - It is OK to implement support incrementally, but the interface needs to support all customers of the block layer
>         - OK to not support specific DMs (e.g., RAID5)
>         - We should support DM and MD as a framework and the personalities that are needed. Inspiration in integrity
>                 - dm-linear
>                 - dm-crypt and dm-verity are needed for the F2FS use-case in Android
>                         - Here, we need copy emulation to support encryption without dealing with HW issues and garbage
> - User-interface can wait and be carried out on the side
> - Maybe it makes sense to start with internal users
>         - copy_file_range
>         - F2FS GC, btrfs GC
> - User-space should be allowed to do anything and kernel-space can chop the command accordingly
> - We need to define the heuristics of the sizes
> - User-space should only work on block devices (no other constructs that are protocol-specific). Export capabilities in sysfs
>         - Need copy domains to be exposed in sysfs
>         - We need to start with bdev to bdev in block layer
>         - Not specific requirement on multi-namespace in NVMe, but it should be extendable
>         - Plumbing will support all use-cases
> - Try to start with one in-kernel consumer
> - Emulation is a must
>         - Needed for failed I/Os
>         - Expose capabilities so that users can decide
> - We can get help from btrfs and F2FS folks
> - The use case for GC and for copy are different. We might have to reflect this in the interface, but the internal plumbing should allow both paths to be maintained as a single one.
>
> ACTIONS
>
> - [ ] Make a list of use-cases that we want to support in each specification and pick 1-2 examples for MD, DM. Make sure that the interfaces support this
> - [ ] Vendors: Ask internally what is the recommended size for copy, if
>    any
>


-- 
Joshi

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-19 15:51                         ` Keith Busch
@ 2021-11-19 16:21                         ` Bart Van Assche
  1 sibling, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-11-19 16:21 UTC (permalink / raw)
  To: Kanchan Joshi, Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 11/19/21 02:47, Kanchan Joshi wrote:
> Given the multitude of things accumulated on this topic, Martin
> suggested having a table/matrix.
> Some of those items should go in the initial patchset, and the remaining
> ones are to be staged for subsequent work.
> Here is an attempt to split the work into two buckets. Please change
> anything below that needs to be changed.
> 
> 1. Driver
> *********
> Initial: NVMe Copy command (single NS)
> Subsequent: Multi NS copy, XCopy/Token-based Copy
> 
> 2. Block layer
> **************
> Initial:
> - Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
> - Emulation, when offload is natively absent
> - DM support (at least dm-linear)
> 
> 3. User-interface
> *****************
> Initial: new ioctl or io_uring opcode
> 
> 4. In-kernel user
> ******************
> Initial: at least one user
> - dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)
> 
> Subsequent:
> - copy_file_range

Integrity support and inline encryption support are missing from the above
overview. Both are supported by the block layer. See also block/blk-integrity.c
and include/linux/blk-crypto.h. I'm not claiming that these should be supported
in the first version but I think it would be good to add these to the above
overview.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-19 10:47                       ` Kanchan Joshi
@ 2021-11-19 15:51                         ` Keith Busch
  2021-11-19 16:21                         ` Bart Van Assche
  1 sibling, 0 replies; 272+ messages in thread
From: Keith Busch @ 2021-11-19 15:51 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: Javier González, Chaitanya Kulkarni, Johannes Thumshirn,
	Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, martin.petersen, roland,
	mpatocka, hare, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov, Adam Manzanares, SelvaKumar S, Nitesh Shetty,
	Kanchan Joshi, Vincent Fu, Bart Van Assche

On Fri, Nov 19, 2021 at 04:17:51PM +0530, Kanchan Joshi wrote:
> Given the multitude of things accumulated on this topic, Martin
> suggested having a table/matrix.
> Some of those items should go in the initial patchset, and the remaining
> ones are to be staged for subsequent work.
> Here is an attempt to split the work into two buckets. Please change
> anything below that needs to be changed.
> 
> 1. Driver
> *********
> Initial: NVMe Copy command (single NS)

Does this point include implementing the copy command in the nvme target
driver or just the host side? Enabling the target should be pretty
straightforward, and would provide an in-kernel standard-compliant
"device" that everyone could test future improvements against.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
@ 2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-19 15:51                         ` Keith Busch
  2021-11-19 16:21                         ` Bart Van Assche
  2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 2 replies; 272+ messages in thread
From: Kanchan Joshi @ 2021-11-19 10:47 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu, Bart Van Assche

Given the multitude of things accumulated on this topic, Martin
suggested having a table/matrix.
Some of those items should go in the initial patchset, and the remaining
ones are to be staged for subsequent work.
Here is an attempt to split the work into two buckets. Please change
anything below that needs to be changed.

1. Driver
*********
Initial: NVMe Copy command (single NS)
Subsequent: Multi NS copy, XCopy/Token-based Copy

2. Block layer
**************
Initial:
- Block-generic copy (REQ_OP_COPY), with interface accommodating two block-devs
- Emulation, when offload is natively absent
- DM support (at least dm-linear)

3. User-interface
*****************
Initial: new ioctl or io_uring opcode

4. In-kernel user
******************
Initial: at least one user
- dm-kcopyd user (e.g. dm-clone), or FS requiring GC (F2FS/Btrfs)

Subsequent:
- copy_file_range

Thanks,
On Tue, Nov 16, 2021 at 7:15 PM Javier González <javier@javigon.com> wrote:
>
> Hi all,
>
> Thanks for attending the call on Copy Offload yesterday. Here you have
> the meeting notes and 2 specific actions before we proceed with another
> version of the patchset.
>
> We will work on a version of the use-case matrix internally and reply
> here in the next couple of days.
>
> Please, add to the notes and the matrix as you see fit.
>
> Thanks,
> Javier
>
> ----
>
> ATTENDEES
>
> - Adam
> - Arnav
> - Chaitanya
> - Himashu
> - Johannes
> - Kanchan
> - Keith
> - Martin
> - Mikulas
> - Niklas
> - Nitesh
> - Selva
> - Vincent
> - Bart
>
> NOTES
>
> - MD and DM are hard requirements
>         - We need support for all the main users of the block layer
>         - Same problem with crypto and integrity
> - Martin would be OK with separating Simple Copy in ZNS and Copy Offload
> - Why did Mikulas work not get upstream?
>         - Timing was an issue
>                 - Use-case was about copying data across VMs
>                 - No HW vendor support
>                 - Hard from a protocol perspective
>                         - At that point, SCSI was still adding support in the spec
>                         - MSFT could not implement extended copy command in the target (destination) device.
>                                 - This is what triggered the token-based implementation
>                                 - This triggered array vendors to implement support for copy offload as token-based. This allows mixing with normal read / write workloads
>                         - Martin lost the implementation and dropped it
>
> DIRECTION
>
> - Keeping the IOCTL interface is an option. It might make sense to move from IOCTL to io_uring opcode
> - Martin is happy to do the SCSI part if the block layer API is upstreamed
> - Token-based implementation is the norm. This allows mixing normal read / write workloads to avoid DoS
>         - This is the direction as opposed to the extended copy command
>         - It addresses problems when integrating with DM and simplifies command multiplexing a single bio into many
>         - It simplifies multiple bios
>         - We should explore Mikulas approach with pointers.
> - Use-cases
>         - ZNS GC
>         - dm-kcopyd
>         - file system GC
>         - fabrics offload to the storage node
>         - copy_file_range
> - It is OK to implement support incrementally, but the interface needs to support all customers of the block layer
>         - OK to not support specific DMs (e.g., RAID5)
>         - We should support DM and MD as a framework and the personalities that are needed. Inspiration in integrity
>                 - dm-linear
>                 - dm-crypt and dm-verity are needed for the F2FS use-case in Android
>                         - Here, we need copy emulation to support encryption without dealing with HW issues and garbage
> - User-interface can wait and be carried out on the side
> - Maybe it makes sense to start with internal users
>         - copy_file_range
>         - F2FS GC, btrfs GC
> - User-space should be allowed to do anything and kernel-space can chop the command accordingly
> - We need to define the heuristics of the sizes
> - User-space should only work on block devices (no other constructs that are protocol-specific). Export capabilities in sysfs
>         - Need copy domains to be exposed in sysfs
>         - We need to start with bdev to bdev in block layer
>         - Not specific requirement on multi-namespace in NVMe, but it should be extendable
>         - Plumbing will support all use-cases
> - Try to start with one in-kernel consumer
> - Emulation is a must
>         - Needed for failed I/Os
>         - Expose capabilities so that users can decide
> - We can get help from btrfs and F2FS folks
> - The use case for GC and for copy are different. We might have to reflect this in the interface, but the internal plumbing should allow both paths to be maintained as a single one.
>
> ACTIONS
>
> - [ ] Make a list of use-cases that we want to support in each specification and pick 1-2 examples for MD, DM. Make sure that the interfaces support this
> - [ ] Vendors: Ask internally what is the recommended size for copy, if
>    any
>


-- 
Joshi

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-17 15:52                           ` Bart Van Assche
@ 2021-11-19  7:38                             ` Javier González
  0 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-11-19  7:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu



> On 17 Nov 2021, at 16.52, Bart Van Assche <bvanassche@acm.org> wrote:
> 
> On 11/17/21 04:53, Javier González wrote:
>> Thanks for sharing this. We will make sure that DM / MD are supported
>> and then we can cover examples. Hopefully, you guys can help with the
>> bits for dm-crypt to make the decision to offload when it makes sense.
> 
> Will ask around to learn who should work on this.

Great. Thanks. 
> 
>> I will update the notes to keep them alive. Maybe we can have them open
>> in your github page?
> 
> Feel free to submit a pull request.

Will do.

Javier 

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-17 12:53                         ` Javier González
@ 2021-11-17 15:52                           ` Bart Van Assche
  2021-11-19  7:38                             ` Javier González
  0 siblings, 1 reply; 272+ messages in thread
From: Bart Van Assche @ 2021-11-17 15:52 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 11/17/21 04:53, Javier González wrote:
> Thanks for sharing this. We will make sure that DM / MD are supported
> and then we can cover examples. Hopefully, you guys can help with the
> bits for dm-crypt to make the decision to offload when it makes sense.

Will ask around to learn who should work on this.

> I will update the notes to keep them alive. Maybe we can have them open
> in your github page?

Feel free to submit a pull request.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 17:59                       ` Bart Van Assche
@ 2021-11-17 12:53                         ` Javier González
  2021-11-17 15:52                           ` Bart Van Assche
  0 siblings, 1 reply; 272+ messages in thread
From: Javier González @ 2021-11-17 12:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Adam Manzanares, SelvaKumar S, Nitesh Shetty, Kanchan Joshi,
	Vincent Fu

On 16.11.2021 09:59, Bart Van Assche wrote:
>On 11/16/21 05:43, Javier González wrote:
>>             - Here, we need copy emulation to support encryption 
>>without dealing with HW issues and garbage
>
>Hi Javier,
>
>Thanks very much for having taken notes and also for having shared 
>these.

My pleasure. Thanks for attending. Happy to see this moving.

>Regarding the above comment, after the meeting I learned that 
>the above is not correct. Encryption in Android is LBA independent and 
>hence it should be possible to offload F2FS garbage collection in 
>Android once the (UFS) storage controller supports this.
>
>For the general case, I propose to let the dm-crypt driver decide 
>whether or not to offload data copying since that driver knows whether 
>or not data copying can be offloaded.

Thanks for sharing this. We will make sure that DM / MD are supported
and then we can cover examples. Hopefully, you guys can help with the
bits for dm-crypt to make the decision to offload when it makes sense.

I will update the notes to keep them alive. Maybe we can have them open
in your github page?

>
>Thanks,
>
>Bart.


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-16 13:43                     ` Javier González
@ 2021-11-16 17:59                       ` Bart Van Assche
  2021-11-17 12:53                         ` Javier González
  2021-11-19 10:47                       ` Kanchan Joshi
  2021-11-22  7:39                       ` Kanchan Joshi
  2 siblings, 1 reply; 272+ messages in thread
From: Bart Van Assche @ 2021-11-16 17:59 UTC (permalink / raw)
  To: Javier González, Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 11/16/21 05:43, Javier González wrote:
>              - Here, we need copy emulation to support encryption 
> without dealing with HW issues and garbage

Hi Javier,

Thanks very much for having taken notes and also for having shared 
these. Regarding the above comment, after the meeting I learned that the 
above is not correct. Encryption in Android is LBA independent and hence 
it should be possible to offload F2FS garbage collection in Android once 
the (UFS) storage controller supports this.

For the general case, I propose to let the dm-crypt driver decide 
whether or not to offload data copying since that driver knows whether 
or not data copying can be offloaded.
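
To make that concrete, here is a minimal sketch of what such a per-target
decision could look like; the struct, field, and helper names are
hypothetical illustrations, not existing dm-crypt code:

/* Hypothetical sketch only -- not the device-mapper API. Models how a
 * dm-crypt-like target could decide whether offloading a copy is safe:
 * only when the ciphertext is LBA-independent, i.e. the IV does not
 * depend on the sector number being written. */
#include <stdbool.h>

struct crypt_cfg {
	bool iv_lba_independent;	/* set when parsing the IV mode */
};

static bool crypt_copy_offload_ok(const struct crypt_cfg *cc)
{
	/* If the IV is tied to the LBA, moving ciphertext to a new LBA
	 * would make it undecryptable, so fall back to emulation. */
	return cc->iv_lba_independent;
}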

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-11-03 19:27                   ` Javier González
@ 2021-11-16 13:43                     ` Javier González
  2021-11-16 17:59                       ` Bart Van Assche
                                         ` (2 more replies)
  0 siblings, 3 replies; 272+ messages in thread
From: Javier González @ 2021-11-16 13:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

Hi all,

Thanks for attending the call on Copy Offload yesterday. Here you have
the meeting notes and 2 specific actions before we proceed with another
version of the patchset.

We will work on a version of the use-case matrix internally and reply
here in the next couple of days.

Please add to the notes and the matrix as you see fit.

Thanks,
Javier

----

ATTENDEES

- Adam
- Arnav
- Chaitanya
- Himanshu
- Johannes
- Kanchan
- Keith
- Martin
- Mikulas
- Niklas
- Nitesh
- Selva
- Vincent
- Bart

NOTES

- MD and DM are hard requirements
	- We need support for all the main users of the block layer
	- Same problem with crypto and integrity
- Martin would be OK with separating Simple Copy in ZNS and Copy Offload
- Why did Mikulas' work not get upstream?
	- Timing was an issue
		- Use-case was about copying data across VMs
		- No HW vendor support
		- Hard from a protocol perspective
			- At that point, SCSI was still adding support in the spec
			- MSFT could not implement extended copy command in the target (destination) device.
				- This is what triggered the token-based implementation
				- This triggered array vendors to implement support for copy offload as token-based. This allows mixing with normal read / write workloads
			- Martin lost the implementation and dropped it

DIRECTION

- Keeping the IOCTL interface is an option. It might make sense to move from IOCTL to io_uring opcode
- Martin is happy to do the SCSI part if the block layer API is upstreamed
- Token-based implementation is the norm. This allows mixing with normal read / write workloads to avoid DoS (see the sketch after this list)
	- This is the direction as opposed to the extended copy command
	- It addresses problems when integrating with DM and simplifies multiplexing a single bio into many commands
	- It simplifies multiple bios
	- We should explore Mikulas' approach with pointers.
- Use-cases
	- ZNS GC
	- dm-kcopyd
	- file system GC
	- fabrics offload to the storage node
	- copy_file_range
- It is OK to implement support incrementally, but the interface needs to support all customers of the block layer
	- OK to not support specific DMs (e.g., RAID5)
	- We should support DM and MD as a framework and the personalities that are needed. Inspiration in integrity
		- dm-linear
		- dm-crypt and dm-verity are needed for the F2FS use-case in Android
			- Here, we need copy emulation to support encryption without dealing with HW issues and garbage
- User-interface can wait and be carried out on the side
- Maybe it makes sense to start with internal users
	- copy_file_range (usage example at the end of these notes)
	- F2FS GC, btrfs GC
- User-space should be allowed to do anything and kernel-space can chop the command accordingly
- We need to define the heuristics of the sizes
- User-space should only work on block devices (no other constructs that are protocol-specific). Export capabilities in sysfs
	- Need copy domains to be exposed in sysfs
	- We need to start with bdev to bdev in block layer
	- No specific requirement on multi-namespace in NVMe, but it should be extendable
	- Plumbing will support all use-cases
- Try to start with one in-kernel consumer
- Emulation is a must
	- Needed for failed I/Os
	- Expose capabilities so that users can decide
- We can get help from btrfs and F2FS folks
- The use cases for GC and for copy are different. We might have to reflect this in the interface, but the internal plumbing should allow both paths to be maintained as a single one.
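
As a sketch of the token-based flow above (the SCSI model: POPULATE
TOKEN on the source, WRITE USING TOKEN on the destination); the C
helpers below are hypothetical stand-ins for the actual commands, not an
existing kernel API:

/* Two-phase, token-based copy: the source produces an opaque token,
 * the destination consumes it, and the data movement happens inside
 * the storage. Normal reads and writes can be interleaved freely,
 * which is what avoids the DoS concern above. */
#include <stdint.h>

struct copy_range { uint64_t lba; uint64_t nr_sectors; };

/* Hypothetical helpers standing in for the real commands. */
int populate_token(int fd, const struct copy_range *r, uint8_t *token);
int write_using_token(int fd, const struct copy_range *r,
		      const uint8_t *token);

int token_copy(int src_fd, const struct copy_range *src,
	       int dst_fd, const struct copy_range *dst)
{
	uint8_t token[512];	/* opaque; never carries the data itself */

	if (populate_token(src_fd, src, token) < 0)
		return -1;
	return write_using_token(dst_fd, dst, token);
}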

ACTIONS

- [ ] Make a list of use-cases that we want to support in each specification and pick 1-2 examples for MD, DM. Make sure that the interfaces support this
- [ ] Vendors: Ask internally what is the recommended size for copy, if
   any
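
Since copy_file_range() appears above both as a use-case and as a
candidate first consumer, here is a standard usage example of the
existing syscall for reference (current userspace API, nothing new):

/* Copy a whole file with copy_file_range(2); with NULL offsets the
 * kernel advances both file offsets itself. A block-layer copy
 * offload could accelerate this path transparently. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	int in, out;
	struct stat st;
	off_t left;

	if (argc != 3)
		return 1;
	in = open(argv[1], O_RDONLY);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0 || fstat(in, &st) < 0)
		return 1;
	left = st.st_size;
	while (left > 0) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, left, 0);

		if (n <= 0)
			return 1;
		left -= n;
	}
	return 0;
}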


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  8:14                 ` Javier González
@ 2021-11-03 19:27                   ` Javier González
  2021-11-16 13:43                     ` Javier González
  0 siblings, 1 reply; 272+ messages in thread
From: Javier González @ 2021-11-03 19:27 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 29.10.2021 10:14, Javier González wrote:
>On 29.10.2021 00:21, Chaitanya Kulkarni wrote:
>>On 10/7/21 11:49 PM, Javier González wrote:
>>>On 06.10.2021 10:33, Bart Van Assche wrote:
>>>>On 10/6/21 3:05 AM, Javier González wrote:
>>>>>I agree that the topic is complex. However, we have not been able to
>>>>>find a clear path forward in the mailing list.
>>>>
>>>>Hmm ... really? At least Martin Petersen and I consider device mapper
>>>>support essential. How about starting from Mikulas' patch series that
>>>>supports the device mapper? See also
>>>>https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>>
>>
>>When we add a new REQ_OP_XXX we need to make sure it will work with
>>device mapper, so I agree with Bart and Martin.
>>
>>Starting with Mikulas' patches is the right direction as of now.
>>
>>>
>>>Thanks for the pointers. We are looking into Mikulas' patch - I agree
>>>that it is a good start.
>>>
>>>>>What do you think about joining the call to talk very specific next
>>>>>steps to get a patchset that we can start reviewing in detail.
>>>>
>>>>I can do that.
>>>
>>>Thanks. I will wait until Chaitanya's reply on his questions. We will
>>>start suggesting some dates then.
>>>
>>
>>I think at this point we need to at least decide on having a first call
>focused on how to proceed forward with Mikulas' approach ...
>>
>Javier, can you please organize a call with the people you listed in this
>thread earlier?
>
>Here you have a Doodle for the end of next week and the week after OCP.
>Please fill it out by Wednesday. I will set up a call with the
>selected slot:
>
>    https://doodle.com/poll/r2c8duy3r8g88v8q?utm_source=poll&utm_medium=link
>
>Thanks,
>Javier

I sent the invite to the people who signed up in the Doodle. The
call will take place on Monday November 15th, 17.00-19.00 CET. See the
list of current participants below. If anyone else wants to participate,
please send me a note and I will extend the invite.

   Johannes.Thumshirn@wdc.com
   Vincent.fu@samsung.com
   a.dawn@samsung.com
   a.manzanares@samsung.com
   bvanassche@acm.org
   himanshu.madhani@oracle.com
   joshi.k@samsung.com
   kch@nvidia.com
   martin.petersen@oracle.com
   nj.shetty@samsung.com
   selvakuma.s1@samsung

Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29 16:15                   ` Bart Van Assche
@ 2021-11-01 17:54                     ` Keith Busch
  0 siblings, 0 replies; 272+ messages in thread
From: Keith Busch @ 2021-11-01 17:54 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Hannes Reinecke, linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc

On Fri, Oct 29, 2021 at 09:15:43AM -0700, Bart Van Assche wrote:
> On 10/28/21 10:51 PM, Hannes Reinecke wrote:
> > Also Keith presented his work on a simple zone-based remapping block device, which included an in-kernel copy offload facility.
> > The idea is to lift that as a standalone patch such that we can use it as a fallback (i.e. software) implementation if no other copy offload mechanism is available.
> 
> Is a link to the presentation available?

Thanks for the interest.

I didn't post them online since the conference didn't provide a venue
for that, and I don't think the slides would be particularly interesting
without the prepared speech anyway.

The presentation described a simple prototype implementing a redirection
table on zoned block devices. There was one bullet point explaining how a
generic kernel implementation would be an improvement. For zoned block
devices, an "append" like copy offload would be an even better option.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:16                   ` Javier González
@ 2021-10-29 16:15                   ` Bart Van Assche
  2021-11-01 17:54                     ` Keith Busch
  1 sibling, 1 reply; 272+ messages in thread
From: Bart Van Assche @ 2021-10-29 16:15 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc

On 10/28/21 10:51 PM, Hannes Reinecke wrote:
> Also Keith presented his work on a simple zone-based remapping block device, which included an in-kernel copy offload facility.
> The idea is to lift that as a standalone patch such that we can use it as a fallback (i.e. software) implementation if no other copy offload mechanism is available.

Is a link to the presentation available?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  5:51                 ` Hannes Reinecke
@ 2021-10-29  8:16                   ` Javier González
  2021-10-29 16:15                   ` Bart Van Assche
  1 sibling, 0 replies; 272+ messages in thread
From: Javier González @ 2021-10-29  8:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chaitanya Kulkarni, Johannes Thumshirn, Chaitanya Kulkarni,
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, martin.petersen, roland, mpatocka, kbusch, rwheeler,
	hch, Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu,
	Bart Van Assche

On 29.10.2021 07:51, Hannes Reinecke wrote:
>On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote:
>>On 10/7/21 11:49 PM, Javier González wrote:
>>>On 06.10.2021 10:33, Bart Van Assche wrote:
>>>>On 10/6/21 3:05 AM, Javier González wrote:
>>>>>I agree that the topic is complex. However, we have not been able to
>>>>>find a clear path forward in the mailing list.
>>>>
>>>>Hmm ... really? At least Martin Petersen and I consider device mapper
>>>>support essential. How about starting from Mikulas' patch series that
>>>>supports the device mapper? See also
>>>>https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>>
>>
>>When we add a new REQ_OP_XXX we need to make sure it will work with
>>device mapper, so I agree with Bart and Martin.
>>
>>Starting with Mikulas' patches is the right direction as of now.
>>
>>>
>>>Thanks for the pointers. We are looking into Mikulas' patch - I agree
>>>that it is a good start.
>>>
>>>>>What do you think about joining the call to talk very specific next
>>>>>steps to get a patchset that we can start reviewing in detail.
>>>>
>>>>I can do that.
>>>
>>>Thanks. I will wait until Chaitanya's reply on his questions. We will
>>>start suggesting some dates then.
>>>
>>
>>I think at this point we need to at least decide on having a first call
>>focused on how to proceed forward with Mikulas' approach ...
>>
>>Javier, can you please organize a call with the people you listed in this
>>thread earlier?
>>
>Also Keith presented his work on a simple zone-based remapping block 
>device, which included an in-kernel copy offload facility.
>The idea is to lift that as a standalone patch such that we can use it
>as a fallback (i.e. software) implementation if no other copy offload
>mechanism is available.
>

I believe this is in essence what we are trying to convey here: a
minimal patchset that enables Simple Copy and the infrastructure around
it to extend copy-offload use-cases.

I look forward to hearing Keith's ideas around this!

Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  0:21               ` Chaitanya Kulkarni
  2021-10-29  5:51                 ` Hannes Reinecke
@ 2021-10-29  8:14                 ` Javier González
  2021-11-03 19:27                   ` Javier González
  1 sibling, 1 reply; 272+ messages in thread
From: Javier González @ 2021-10-29  8:14 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 29.10.2021 00:21, Chaitanya Kulkarni wrote:
>On 10/7/21 11:49 PM, Javier González wrote:
>> On 06.10.2021 10:33, Bart Van Assche wrote:
>>> On 10/6/21 3:05 AM, Javier González wrote:
>>>> I agree that the topic is complex. However, we have not been able to
>>>> find a clear path forward in the mailing list.
>>>
>>> Hmm ... really? At least Martin Petersen and I consider device mapper
>>> support essential. How about starting from Mikulas' patch series that
>>> supports the device mapper? See also
>>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>
>
>When we add a new REQ_OP_XXX we need to make sure it will work with
>device mapper, so I agree with Bart and Martin.
>
>Starting with Mikulas' patches is the right direction as of now.
>
>>
>> Thanks for the pointers. We are looking into Mikulas' patch - I agree
>> that it is a good start.
>>
>>>> What do you think about joining the call to talk very specific next
>>>> steps to get a patchset that we can start reviewing in detail.
>>>
>>> I can do that.
>>
>> Thanks. I will wait until Chaitanya's reply on his questions. We will
>> start suggesting some dates then.
>>
>
>I think at this point we need to at least decide on having a first call
>focused on how to proceed forward with Mikulas' approach ...
>
>Javier, can you please organize a call with the people you listed in this
>thread earlier?

Here you have a Doodle for the end of next week and the week after OCP.
Please fill it out by Wednesday. I will set up a call with the
selected slot:

     https://doodle.com/poll/r2c8duy3r8g88v8q?utm_source=poll&utm_medium=link

Thanks,
Javier


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-29  0:21               ` Chaitanya Kulkarni
@ 2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:16                   ` Javier González
  2021-10-29 16:15                   ` Bart Van Assche
  2021-10-29  8:14                 ` Javier González
  1 sibling, 2 replies; 272+ messages in thread
From: Hannes Reinecke @ 2021-10-29  5:51 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 10/29/21 2:21 AM, Chaitanya Kulkarni wrote:
> On 10/7/21 11:49 PM, Javier González wrote:
>> On 06.10.2021 10:33, Bart Van Assche wrote:
>>> On 10/6/21 3:05 AM, Javier González wrote:
>>>> I agree that the topic is complex. However, we have not been able to
>>>> find a clear path forward in the mailing list.
>>>
>>> Hmm ... really? At least Martin Petersen and I consider device mapper
>>> support essential. How about starting from Mikulas' patch series that
>>> supports the device mapper? See also
>>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/
>>>
> 
> When we add a new REQ_OP_XXX we need to make sure it will work with
> device mapper, so I agree with Bart and Martin.
> 
> Starting with Mikulas' patches is the right direction as of now.
> 
>>
>> Thanks for the pointers. We are looking into Mikulas' patch - I agree
>> that it is a good start.
>>
>>>> What do you think about joining the call to talk very specific next
>>>> steps to get a patchset that we can start reviewing in detail.
>>>
>>> I can do that.
>>
>> Thanks. I will wait until Chaitanya's reply on his questions. We will
>> start suggesting some dates then.
>>
> 
> I think at this point we need to at least decide on having a first call
> focused on how to proceed forward with Mikulas' approach ...
> 
> Javier, can you please organize a call with the people you listed in this
> thread earlier?
> 
Also Keith presented his work on a simple zone-based remapping block 
device, which included an in-kernel copy offload facility.
The idea is to lift that as a standalone patch such that we can use it
as a fallback (i.e. software) implementation if no other copy offload
mechanism is available.
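
As a rough illustration of that fallback idea, in user-space terms only
to show the shape (an in-kernel version would issue bios rather than
pread/pwrite):

/* Software copy fallback: when no offload mechanism is available,
 * degrade to plain reads and writes through a bounce buffer. */
#include <unistd.h>
#include <sys/types.h>

static int copy_fallback(int src_fd, off_t src_off,
			 int dst_fd, off_t dst_off, size_t len)
{
	char buf[1 << 16];

	while (len) {
		size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t n = pread(src_fd, buf, chunk, src_off);

		if (n <= 0)
			return -1;
		if (pwrite(dst_fd, buf, n, dst_off) != n)
			return -1;
		src_off += n;
		dst_off += n;
		len -= n;
	}
	return 0;
}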

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-08  6:49               ` Javier González
  (?)
@ 2021-10-29  0:21               ` Chaitanya Kulkarni
  2021-10-29  5:51                 ` Hannes Reinecke
  2021-10-29  8:14                 ` Javier González
  -1 siblings, 2 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-29  0:21 UTC (permalink / raw)
  To: Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu, Bart Van Assche

On 10/7/21 11:49 PM, Javier González wrote:
> On 06.10.2021 10:33, Bart Van Assche wrote:
>> On 10/6/21 3:05 AM, Javier González wrote:
>>> I agree that the topic is complex. However, we have not been able to
>>> find a clear path forward in the mailing list.
>>
>> Hmm ... really? At least Martin Petersen and I consider device mapper
>> support essential. How about starting from Mikulas' patch series that
>> supports the device mapper? See also 
>> https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/ 
>>

When we add a new REQ_OP_XXX we need to make sure it will work with 
device mapper, so I agree with Bart and Martin.

Starting with Mikulas' patches is the right direction as of now.

> 
> Thanks for the pointers. We are looking into Mikulas' patch - I agree
> that it is a good start.
> 
>>> What do you think about joining the call to talk very specific next
>>> steps to get a patchset that we can start reviewing in detail.
>>
>> I can do that.
> 
> Thanks. I will wait until Chaitanya's reply on his questions. We will
> start suggesting some dates then.
> 

I think at this point we need to at least decide on having a first call
focused on how to proceed forward with Mikulas' approach ...

Javier, can you please organize a call with the people you listed in this
thread earlier?

> Thanks,
> Javier


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 10:01           ` Javier González
@ 2021-10-13  8:35             ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-10-13  8:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

Chaitanya,

Did you have a chance to look at the answers below?

I would like to start finding candidate dates throughout the next couple
of weeks.

Thanks,
Javier

On 06.10.2021 12:01, Javier González wrote:
>On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>>Javier,
>>
>>>
>>>Hi all,
>>>
>>>Since we are not going to be able to talk about this at LSF/MM, a few of
>>>us thought about holding a dedicated virtual discussion about Copy
>>>Offload. I believe we can use Chaitanya's thread as a start. Given the
>>>current state of the current patches, I would propose that we focus on
>>>the next step to get the minimal patchset that can go upstream so that
>>>we can build from there.
>>>
>>
>>I agree with having a call, as it has been two years that I've been
>>trying to have this discussion.
>>
>>Before we set up a call, please summarize the following here :-
>>
>>1. Exactly what work has been done so far.
>
>
>We can categorize that into two sets: the first one for XCOPY (2014), and
>the second one for NVMe Copy (2021).
>
>XCOPY set
>*********
>- block-generic copy command (single range, between one
>  source/destination device)
>- ioctl interface for the above
>- SCSI plumbing (block-generic to XCOPY conversion)
>- device-mapper support: offload copy whenever possible (if IO is not
>  split while traveling layers of virtual devices)
>
>NVMe-Copy set
>*************
>- block-generic copy command (multiple ranges, between one
>  source/destination device)
>- ioctl interface for the above
>- NVMe plumbing (block-generic to NVMe Copy conversion)
>- copy-emulation (read + write) in block-layer
>- device-mapper support: no offload, rather fall back to copy-emulation
>
>
>>2. What kind of feedback you got.
>
>For NVMe Copy, the major points are: a) add copy-emulation in the
>block-layer and use that if copy-offload is not natively supported by the
>device; b) the user-interface (ioctl) should be extendable for copy across
>two devices (one source, one destination); c) device-mapper targets
>should support copy-offload, whenever possible.
>
>The "whenever possible" cases get reduced compared to XCOPY because NVMe
>Copy is within a single namespace.
>
>>3. What are the exact blockers/objections.
>
>I think it was device-mapper for XCOPY and remains the same for NVMe
>Copy as well.  Device-mapper support requires decomposing the copy operation
>into read and write.  While that is not great from an efficiency PoV, the
>bigger concern is to check whether we are taking the same route as XCOPY.
>
>From Martin's document (http://mkp.net/pubs/xcopy.pdf), if I got it
>right, one of the major blockers is having more failure cases than
>successful ones, and that did not justify the effort/code to wire up
>device mapper.  Is that a factor to consider for NVMe Copy (which is
>narrower in scope than XCOPY)?
>
>>4. Potential ways of moving forward.
>
>a) we defer attempting device-mapper support (until NVMe has
>support/usecase), and address everything else (reusable user-interface
>etc.)
>
>b) we attempt device-mapper support (by moving to composite read+write
>communication between block-layer and nvme)
>
>
>Is this enough in your mind to move forward with a specific agenda? If
>we can, I would like to target the meetup in the next 2 weeks.
>
>Thanks,
>Javier


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 17:33             ` Bart Van Assche
@ 2021-10-08  6:49               ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-10-08  6:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 06.10.2021 10:33, Bart Van Assche wrote:
>On 10/6/21 3:05 AM, Javier González wrote:
>>I agree that the topic is complex. However, we have not been able to
>>find a clear path forward in the mailing list.
>
>Hmm ... really? At least Martin Petersen and I consider device mapper 
>support essential. How about starting from Mikulas' patch series that 
>supports the device mapper? See also https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/

Thanks for the pointers. We are looking into Mikulas' patch - I agree
that it is a good start.

>>What do you think about joining the call to talk very specific next
>>steps to get a patchset that we can start reviewing in detail.
>
>I can do that.

Thanks. I will wait until Chaitanya's reply on his questions. We will
start suggesting some dates then.

Thanks,
Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-10-06 10:05           ` Javier González
@ 2021-10-06 17:33             ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-10-06 17:33 UTC (permalink / raw)
  To: Javier González
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 10/6/21 3:05 AM, Javier González wrote:
> I agree that the topic is complex. However, we have not been able to
> find a clear path forward in the mailing list.

Hmm ... really? At least Martin Petersen and I consider device mapper 
support essential. How about starting from Mikulas' patch series that 
supports the device mapper? See also 
https://lore.kernel.org/all/alpine.LRH.2.02.2108171630120.30363@file01.intranet.prod.int.rdu2.redhat.com/

> What do you think about joining the call to talk very specific next
> steps to get a patchset that we can start reviewing in detail.

I can do that.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30 16:20         ` Bart Van Assche
@ 2021-10-06 10:05           ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-10-06 10:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:20, Bart Van Assche wrote:
>On 9/28/21 12:13 PM, Javier González wrote:
>>Since we are not going to be able to talk about this at LSF/MM, a few of
>>us thought about holding a dedicated virtual discussion about Copy
>>Offload. I believe we can use Chaitanya's thread as a start. Given the
>>current state of the current patches, I would propose that we focus on
>>the next step to get the minimal patchset that can go upstream so that
>>we can build from there.
>>
>>Before we try to find a date and a time that fits most of us, who would
>>be interested in participating?
>
>Given the technical complexity of this topic and also that the people who are
>interested live in multiple time zones, I prefer email to discuss the technical
>aspects of this work. My attempt to summarize how to implement copy offloading
>is available here: https://github.com/bvanassche/linux-kernel-copy-offload.
>Feedback on this text is welcome.

Thanks for sharing this Bart.

I agree that the topic is complex. However, we have not been able to
find a clear path forward in the mailing list.

What do you think about joining the call to talk very specific next
steps to get a patchset that we can start reviewing in detail.

I think that your presence in the call will help us all.

What do you think?


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30  9:43         ` Chaitanya Kulkarni
@ 2021-10-06 10:01           ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-10-06 10:01 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>Javier,
>
>>
>> Hi all,
>>
>> Since we are not going to be able to talk about this at LSF/MM, a few of
>> us thought about holding a dedicated virtual discussion about Copy
>> Offload. I believe we can use Chaitanya's thread as a start. Given the
>> current state of the current patches, I would propose that we focus on
>> the next step to get the minimal patchset that can go upstream so that
>> we can build from there.
>>
>
>I agree with having a call, as it has been two years that I've been
>trying to have this discussion.
>
>Before we set up a call, please summarize the following here :-
>
>1. Exactly what work has been done so far.


We can categorize that into two sets: the first one for XCOPY (2014), and
the second one for NVMe Copy (2021).

XCOPY set
*********
- block-generic copy command (single range, between one
   source/destination device)
- ioctl interface for the above
- SCSI plumbing (block-generic to XCOPY conversion)
- device-mapper support: offload copy whenever possible (if IO is not
   split while traveling layers of virtual devices)

NVMe-Copy set
*************
- block-generic copy command (multiple ranges, between one
   source/destination device)
- ioctl interface for the above
- NVMe plumbing (block-generic to NVMe Copy conversion)
- copy-emulation (read + write) in block-layer
- device-mapper support: no offload, rather fall back to copy-emulation
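
For reference, a hedged reconstruction of the user-facing shape the two
sets share (multiple ranges submitted to one device); the struct layout
and ioctl number below are illustrative, not the exact uapi from the
posted patches:

/* Illustrative only -- not the uapi that was posted. */
#include <linux/ioctl.h>
#include <stdint.h>

struct copy_range {
	uint64_t src_sector;	/* source start, in 512-byte sectors */
	uint64_t dst_sector;	/* destination start */
	uint64_t nr_sectors;	/* length of this range */
};

struct copy_args {
	uint64_t nr_ranges;
	struct copy_range ranges[];	/* nr_ranges entries */
};

/* 0x12 is the block-layer ioctl magic; the number is made up. */
#define BLKCOPY _IOW(0x12, 142, struct copy_args)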


>2. What kind of feedback you got.

For NVMe Copy, the major points are: a) add copy-emulation in the
block-layer and use that if copy-offload is not natively supported by the
device; b) the user-interface (ioctl) should be extendable for copy across
two devices (one source, one destination); c) device-mapper targets
should support copy-offload, whenever possible.

The "whenever possible" cases get reduced compared to XCOPY because NVMe
Copy is within a single namespace.

>3. What are the exact blockers/objections.

I think it was device-mapper for XCOPY and remains the same for NVMe
Copy as well.  Device-mapper support requires decomposing the copy operation
into read and write.  While that is not great from an efficiency PoV, the
bigger concern is to check whether we are taking the same route as XCOPY.

From Martin's document (http://mkp.net/pubs/xcopy.pdf), if I got it
right, one of the major blockers is having more failure cases than
successful ones, and that did not justify the effort/code to wire up
device mapper.  Is that a factor to consider for NVMe Copy (which is
narrower in scope than XCOPY)?

>4. Potential ways of moving forward.

a) we defer attempting device-mapper support (until NVMe has
support/usecase), and address everything else (reusable user-interface
etc.)

b) we attempt device-mapper support (by moving to composite read+write
communication between block-layer and nvme)


Is this enough in your mind to move forward with a specific agenda? If
we can, I would like to target the meetup in the next 2 weeks.

Thanks,
Javier

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13       ` Javier González
@ 2021-09-30 16:20         ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-09-30 16:20 UTC (permalink / raw)
  To: Javier González, Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, martin.petersen, roland,
	mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 9/28/21 12:13 PM, Javier González wrote:
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the current patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 
> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?

Given the technical complexity of this topic and also that the people who are
interested live in multiple time zones, I prefer email to discuss the technical
aspects of this work. My attempt to summarize how to implement copy offloading
is available here: https://github.com/bvanassche/linux-kernel-copy-offload.
Feedback on this text is welcome.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-30  9:43         ` Chaitanya Kulkarni
@ 2021-09-30  9:53           ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-09-30  9:53 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Johannes Thumshirn, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc, axboe, msnitzer, bvanassche,
	martin.petersen, roland, mpatocka, hare, kbusch, rwheeler, hch,
	Frederick.Knight, zach.brown, osandov, Adam Manzanares,
	SelvaKumar S, Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 30.09.2021 09:43, Chaitanya Kulkarni wrote:
>Javier,
>
>>
>> Hi all,
>>
>> Since we are not going to be able to talk about this at LSF/MM, a few of
>> us thought about holding a dedicated virtual discussion about Copy
>> Offload. I believe we can use Chaitanya's thread as a start. Given the
>> current state of the current patches, I would propose that we focus on
>> the next step to get the minimal patchset that can go upstream so that
>> we can build from there.
>>
>
>I agree with having a call, as it has been two years that I've been
>trying to have this discussion.
>
>Before we set up a call, please summarize the following here :-
>
>1. Exactly what work has been done so far.
>2. What kind of feedback you got.
>3. What are the exact blockers/objections.
>4. Potential ways of moving forward.
>
>Although all this information is present in the mailing archives, it is
>scattered all over the place; looking at the long CC list above, we need
>to get everyone on the same page in order to have a productive call.
>
>Once we have the above discussion we can set up a precise agenda and
>assign slots.

Sounds reasonable. Let me collect all this information and post it here.
I will maintain a list of people who have shown interest in joining.
For now:

   - Martin
   - Johannes
   - Fred
   - Chaitanya
   - Adam
   - Kanchan
   - Selva
   - Nitesh
   - Javier

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13       ` Javier González
@ 2021-09-30  9:43         ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2021-09-30  9:43 UTC (permalink / raw)
  To: Javier González, Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

Javier,

> 
> Hi all,
> 
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 

I agree with having a call, as I have been trying to have this discussion
for two years.

Before we set up a call, please summarize the following here :-

1. Exactly what work has been done so far.
2. What kind of feedback you got.
3. What are the exact blockers/objections.
4. Potential ways of moving forward.

Although all this information is present in the mailing archives, it is
scattered all over the place. Looking at the long CC list above, we need
to get everyone on the same page in order to have a productive call.

Once we have had the above discussion, we can set up a precise agenda and
assign slots.

> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?
> 
> Thanks,
> Javier

-ck

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-09-28 19:13       ` Javier González
@ 2021-09-29  6:44         ` Johannes Thumshirn
  -1 siblings, 0 replies; 272+ messages in thread
From: Johannes Thumshirn @ 2021-09-29  6:44 UTC (permalink / raw)
  To: Javier González
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 28/09/2021 21:13, Javier González wrote:
> Since we are not going to be able to talk about this at LSF/MM, a few of
> us thought about holding a dedicated virtual discussion about Copy
> Offload. I believe we can use Chaitanya's thread as a start. Given the
> current state of the patches, I would propose that we focus on
> the next step to get the minimal patchset that can go upstream so that
> we can build from there.
> 
> Before we try to find a date and a time that fits most of us, who would
> be interested in participating?

I'd definitely be interested in participating.

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
       [not found]   ` <CGME20210928191342eucas1p23448dcd51b23495fa67cdc017e77435c@eucas1p2.samsung.com>
@ 2021-09-28 19:13       ` Javier González
  0 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-09-28 19:13 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, msnitzer, bvanassche, martin.petersen,
	roland, mpatocka, hare, kbusch, rwheeler, hch, Frederick.Knight,
	zach.brown, osandov, Adam Manzanares, SelvaKumar S,
	Nitesh Shetty, Kanchan Joshi, Vincent Fu

On 12.05.2021 07:30, Johannes Thumshirn wrote:
>On 11/05/2021 02:15, Chaitanya Kulkarni wrote:
>> Hi,
>>
>> * Background :-
>> -----------------------------------------------------------------------
>>
>> Copy offload is a feature that allows file-systems or storage devices
>> to be instructed to copy files/logical blocks without requiring
>> involvement of the local CPU.
>>
>> With reference to the RISC-V summit keynote [1], single-threaded
>> performance is limited due to the end of Dennard scaling, and
>> multi-threaded performance is slowing down due to Moore's law
>> limitations. With the rise of the SNIA Computational Storage Technical
>> Working Group (TWG) [2], offloading computations to the device or over
>> the fabrics is becoming popular, as there are several solutions
>> available [2]. One common operation which is popular in the kernel and
>> is not merged yet is copy offload over the fabrics or onto the device.
>>
>> * Problem :-
>> -----------------------------------------------------------------------
>>
>> The original work done by Martin is present here [3]. The latest work,
>> posted by Mikulas [4], is not merged yet. These two approaches are
>> totally different from each other. Several storage vendors discourage
>> mixing copy offload requests with regular READ/WRITE I/O. Also, the
>> fact that the operation fails if a copy request ever needs to be split
>> as it traverses the stack has the unfortunate side-effect of preventing
>> copy offload from working in pretty much every common deployment
>> configuration out there.
>>
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> [3] is hard to extend to arbitrary DM/MD stacking without splitting
>> the command in two, one for copying IN and one for copying OUT, which
>> is what [4] demonstrates and why [3] is not a suitable candidate.
>> Also, with [4] there is an unresolved problem with the two-command
>> approach: how to handle changes to the DM layout between the IN and
>> OUT operations.
>>
>> * Why Linux Kernel Storage System needs Copy Offload support now ?
>> -----------------------------------------------------------------------
>>
>> With the rise of the SNIA Computational Storage TWG and solutions [2],
>> existing SCSI XCOPY support in the protocol, recent advancements in
>> the Linux kernel file system for zoned devices (Zonefs [5]), and Peer
>> to Peer DMA support in the Linux kernel, mainly for NVMe devices [7],
>> NVMe devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit
>> from the copy offload operation.
>>
>> With this background we have a significant number of use-cases which
>> are strong candidates waiting for the outstanding Linux Kernel Block
>> Layer Copy Offload support, so that the Linux Kernel Storage subsystem
>> can address the previously mentioned problems [1] and allow efficient
>> offloading of data-related operations (such as move/copy etc.).
>>
>> For reference, the following is the list of use-cases/candidates
>> waiting for Copy Offload support :-
>>
>> 1. SCSI-attached storage arrays.
>> 2. Stacking drivers supporting XCopy DM/MD.
>> 3. Computational Storage solutions.
>> 4. File systems :- Local, NFS and Zonefs.
>> 5. Block devices :- Distributed, local, and Zoned devices.
>> 6. Peer to Peer DMA support solutions.
>> 7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.
>>
>> * What we will discuss in the proposed session ?
>> -----------------------------------------------------------------------
>>
>> I'd like to propose a session to go over this topic to understand :-
>>
>> 1. What are the blockers for Copy Offload implementation ?
>> 2. Discussion about having a file system interface.
>> 3. Discussion about having the right system call for user-space.
>> 4. What is the right way to move this work forward ?
>> 5. How can we help to contribute and move this work forward ?
>>
>> * Required Participants :-
>> -----------------------------------------------------------------------
>>
>> I'd like to invite file system, block layer, and device driver
>> developers to:-
>>
>> 1. Share their opinion on the topic.
>> 2. Share their experience and any other issues with [4].
>> 3. Uncover additional details that are missing from this proposal.
>>
>> Required attendees :-
>>
>> Martin K. Petersen
>> Jens Axboe
>> Christoph Hellwig
>> Bart Van Assche
>> Zach Brown
>> Roland Dreier
>> Ric Wheeler
>> Trond Myklebust
>> Mike Snitzer
>> Keith Busch
>> Sagi Grimberg
>> Hannes Reinecke
>> Frederick Knight
>> Mikulas Patocka
>>
>
>I would like to participate in this discussion as well. A generic block layer
>copy API is extremely helpful for filesystem garbage collection and copy operations
>like copy_file_range().


Hi all,

Since we are not going to be able to talk about this at LSF/MM, a few of
us thought about holding a dedicated virtual discussion about Copy
Offload. I believe we can use Chaitanya's thread as a start. Given the
current state of the patches, I would propose that we focus on
the next step to get the minimal patchset that can go upstream so that
we can build from there.

Before we try to find a date and a time that fits most of us, who would
be interested in participating?

Thanks,
Javier

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-06-11 15:35   ` Nikos Tsironis
  -1 siblings, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2021-06-11 15:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov

On 5/11/21 3:15 AM, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited due to the end of Dennard scaling, and
> multi-threaded performance is slowing down due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Working Group (TWG) [2], offloading computations to the device or over
> the fabrics is becoming popular, as there are several solutions
> available [2]. One common operation which is popular in the kernel and
> is not merged yet is copy offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work done by Martin is present here [3]. The latest work,
> posted by Mikulas [4], is not merged yet. These two approaches are
> totally different from each other. Several storage vendors discourage
> mixing copy offload requests with regular READ/WRITE I/O. Also, the
> fact that the operation fails if a copy request ever needs to be split
> as it traverses the stack has the unfortunate side-effect of preventing
> copy offload from working in pretty much every common deployment
> configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> [3] is hard to extend to arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT, which
> is what [4] demonstrates and why [3] is not a suitable candidate.
> Also, with [4] there is an unresolved problem with the two-command
> approach: how to handle changes to the DM layout between the IN and
> OUT operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCOPY support in the protocol, recent advancements in
> the Linux kernel file system for zoned devices (Zonefs [5]), and Peer
> to Peer DMA support in the Linux kernel, mainly for NVMe devices [7],
> NVMe devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit
> from the copy offload operation.
> 
> With this background we have a significant number of use-cases which
> are strong candidates waiting for the outstanding Linux Kernel Block
> Layer Copy Offload support, so that the Linux Kernel Storage subsystem
> can address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy etc.).
> 
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device driver
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> 
> Regards,
> Chaitanya
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 

I would like to participate in this discussion too.

Thanks,
Nikos

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-06-11  6:03   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-11  6:03 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc; +Cc: ckulkarnilinux

On 5/10/21 17:15, Chaitanya Kulkarni wrote:
> Hi,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited due to the end of Dennard scaling, and
> multi-threaded performance is slowing down due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Working Group (TWG) [2], offloading computations to the device or over
> the fabrics is becoming popular, as there are several solutions
> available [2]. One common operation which is popular in the kernel and
> is not merged yet is copy offload over the fabrics or onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work done by Martin is present here [3]. The latest work,
> posted by Mikulas [4], is not merged yet. These two approaches are
> totally different from each other. Several storage vendors discourage
> mixing copy offload requests with regular READ/WRITE I/O. Also, the
> fact that the operation fails if a copy request ever needs to be split
> as it traverses the stack has the unfortunate side-effect of preventing
> copy offload from working in pretty much every common deployment
> configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> [3] is hard to extend to arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT, which
> is what [4] demonstrates and why [3] is not a suitable candidate.
> Also, with [4] there is an unresolved problem with the two-command
> approach: how to handle changes to the DM layout between the IN and
> OUT operations.
>
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCOPY support in the protocol, recent advancements in
> the Linux kernel file system for zoned devices (Zonefs [5]), and Peer
> to Peer DMA support in the Linux kernel, mainly for NVMe devices [7],
> NVMe devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit
> from the copy offload operation.
>
> With this background we have a significant number of use-cases which
> are strong candidates waiting for the outstanding Linux Kernel Block
> Layer Copy Offload support, so that the Linux Kernel Storage subsystem
> can address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy etc.).
>
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.
>
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite file system, block layer, and device driver
> developers to:-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
>
> Regards,
> Chaitanya
>
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
>

The mail server is dropping emails from the mailing list, so I am adding my
personal email address.



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-18  0:15   ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-05-18  0:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/10/21 5:15 PM, Chaitanya Kulkarni wrote:
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?

We need to achieve agreement about an approach. The text below is my
attempt at guiding the discussion. An HTML version is available at
https://github.com/bvanassche/linux-kernel-copy-offload. As usual,
feedback is welcome.

Bart.


# Implementing Copy Offloading in the Linux Kernel

## Introduction

Efforts to add copy offloading support to the Linux kernel started a
considerable time ago. Despite this, copy offloading support is not yet
upstream and there is no detailed plan yet for how to implement it.

This document outlines a possible implementation. The purpose of this document
is to help guide the conversations around copy offloading.

## Block Layer

We need an interface to pass copy offload requests from user space or file
systems to block drivers. Although the first implementation added a single
block layer operation for copy offloading, there seems to be agreement today
to implement copy offloading as two operations, namely `REQ_COPY_IN` and
`REQ_COPY_OUT`.

A possible approach is as follows:

* Fall back to a non-offloaded copy operation if necessary, e.g. if copy
  offloading is not supported or if data is encrypted and the ciphertext
  depends on the LBA. The following code may be a good starting point:
  `drivers/md/dm-kcopyd.c`.
* If the block driver supports copy offloading, submit the `REQ_COPY_IN`
  operation first. The block driver stores the data ranges associated with the
  `REQ_COPY_IN` operation.
* Wait for completion of the `REQ_COPY_IN` operation.
* After the `REQ_COPY_IN` operation has completed, submit the `REQ_COPY_OUT`
  operation and include a reference to the `REQ_COPY_IN` operation. If the
  block driver that receives the `REQ_COPY_OUT` operation has received a
  matching `REQ_COPY_IN` operation, it offloads the copy operation. Otherwise
  it reports that no data has been copied and lets the block layer perform a
  non-offloaded copy operation, as sketched in the example below.
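
A rough sketch of this two-phase flow, in the style of the existing
`blkdev_issue_*()` helpers, is shown below. Note that `REQ_COPY_IN`,
`REQ_COPY_OUT`, `blk_queue_copy()`, `bio_attach_copy_token()` and
`blkdev_copy_emulate()` are all hypothetical names; none of this exists
upstream, and the sketch only illustrates the proposed control flow.

```
/* Hypothetical sketch; the copy-specific symbols do not exist upstream. */
static int blkdev_issue_copy(struct block_device *bdev, sector_t src,
			     sector_t dst, sector_t nr_sects, gfp_t gfp)
{
	struct bio *in_bio, *out_bio;
	int ret;

	/* Fall back to a non-offloaded copy if there is no HW support. */
	if (!blk_queue_copy(bdev_get_queue(bdev)))
		return blkdev_copy_emulate(bdev, src, dst, nr_sects, gfp);

	/* Phase 1: tell the block driver which ranges will be copied. */
	in_bio = bio_alloc(gfp, 1);
	bio_set_dev(in_bio, bdev);
	in_bio->bi_opf = REQ_COPY_IN;
	in_bio->bi_iter.bi_sector = src;
	ret = submit_bio_wait(in_bio);
	if (ret)
		goto out;

	/* Phase 2: issue the write side with a reference to the IN side. */
	out_bio = bio_alloc(gfp, 1);
	bio_set_dev(out_bio, bdev);
	out_bio->bi_opf = REQ_COPY_OUT;
	out_bio->bi_iter.bi_sector = dst;
	bio_attach_copy_token(out_bio, in_bio);
	ret = submit_bio_wait(out_bio);
	bio_put(out_bio);
	if (ret == -EOPNOTSUPP)
		/* The two halves did not match up: onload the copy. */
		ret = blkdev_copy_emulate(bdev, src, dst, nr_sects, gfp);
out:
	bio_put(in_bio);
	return ret;
}
```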

The operation type is stored in the top bits of the `bi_opf` member of struct
bio.  With each bio a single data buffer and a single contiguous byte range on
the storage medium are associated. Pointers to the data buffer occur in
`bi_io_vec[]`. The affected byte range is represented by `bi_iter.bi_sector` and
`bi_iter.bi_size`.

Both the NVMe and SCSI copy offload commands support multiple source ranges.
However, SCSI XCOPY also supports multiple destination ranges, while the NVMe
simple copy command supports a single destination range.

Possible approaches for passing the data ranges involved in a copy operation
from the block layer to block drivers are as follows:

* Attach a bio to each copy offload request and encode all relevant copy
  offload parameters in that data buffer. These parameters include source
  device and source ranges for `REQ_COPY_IN` and destination device and
  destination ranges for `REQ_COPY_OUT`. Let the block drivers translate these
  parameters into something the storage device understands (NVMe simple copy
  parameters or SCSI XCOPY parameters). Fill in the parameter structure size
  in `bi_iter.bi_size`. Set `bi_vcnt` to 1 and fill in `bio->bi_io_vec[0]`.
* Map each source range and each destination range onto a different bio. Link
  all the bios with the `bi_next` pointer and attach these bios to the copy
  offload requests. Leave `bi_vcnt` zero. This is related but not identical to
  the approach followed by `__blkdev_issue_discard()`.

I think that the first approach would require more changes in the device mapper
than the second approach since the device mapper code knows how to split bios
but not how to split a buffer with LBA range descriptors.
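
For concreteness, the parameter buffer of the first approach could look like
the sketch below; `struct blk_copy_range`, `struct blk_copy_params` and the
helper function are made-up names used only to illustrate the idea:

```
/*
 * Made-up parameter block for the first approach: all source ranges (for
 * REQ_COPY_IN) or destination ranges (for REQ_COPY_OUT) are encoded in a
 * single data buffer attached to the bio.
 */
struct blk_copy_range {
	sector_t		sector;		/* start of this range */
	sector_t		nr_sects;	/* length of this range */
};

struct blk_copy_params {
	dev_t			dev;	/* source or destination device */
	unsigned int		nr_ranges;
	struct blk_copy_range	ranges[];
};

/* Attach the parameter buffer to a copy offload bio. */
static void blk_copy_set_params(struct bio *bio, struct blk_copy_params *p,
				unsigned int len)
{
	bio->bi_io_vec[0].bv_page = virt_to_page(p);
	bio->bi_io_vec[0].bv_offset = offset_in_page(p);
	bio->bi_io_vec[0].bv_len = len;
	bio->bi_vcnt = 1;
	/* The parameter structure size goes into bi_iter.bi_size. */
	bio->bi_iter.bi_size = len;
}
```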

The following code needs to be modified no matter how copy offloading is
implemented:

* Request cloning. The code for checking the limits before requests are
  cloned compares `blk_rq_sectors()` with `max_sectors`. This is
  inappropriate for `REQ_COPY_*` requests.
* Request splitting. `bio_split()` assumes that `bi_iter.bi_size` represents
  the number of bytes affected on the medium.
* Code related to retrying the original requests of a merged request with
  mixed failfast attributes, e.g. `blk_rq_err_bytes()`.
* Code related to partially completing a request, e.g. `blk_update_request()`.
* The code for merging block layer requests.
* `blk_mq_end_request()` since it calls `blk_update_request()` and
  `blk_rq_bytes()`.
* The plugging code because of the following test in the plugging code:
  `blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE`.
* The I/O accounting code (task_io_account_read()) since that code uses
  bio_has_data() and hence skips discard, secure erase and write zeroes
  requests:
```
static inline bool bio_has_data(struct bio *bio)
{
	return bio && bio->bi_iter.bi_size &&
	    bio_op(bio) != REQ_OP_DISCARD &&
	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
	    bio_op(bio) != REQ_OP_WRITE_ZEROES;
}
```

Block drivers will need to use the `special_vec` member of struct request to
pass the copy offload parameters to the storage device. That member is used
e.g. when a REQ_OP_DISCARD operation is submitted to an NVMe driver. The SCSI
sd driver uses `special_vec` while processing an UNMAP or WRITE SAME command.
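
As a sketch of what that could look like for copy offload, modeled on how
`nvme_setup_discard()` attaches the DSM range page today: note that
`struct nvme_copy_range` (a possible layout is sketched in the NVMe section
below), `nvme_cmd_copy` and `blk_rq_nr_copy_ranges()` are assumptions, not
existing driver code.

```
/* Sketch modeled on nvme_setup_discard(); copy symbols are hypothetical. */
static blk_status_t nvme_setup_copy_out(struct nvme_ns *ns,
					struct request *req,
					struct nvme_command *cmnd)
{
	unsigned int nr = blk_rq_nr_copy_ranges(req);
	struct nvme_copy_range *range;

	range = kmalloc_array(nr, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return BLK_STS_RESOURCE;

	/* ... fill in the source ranges from the request here ... */

	cmnd->common.opcode = nvme_cmd_copy;	/* 0x19 per TP 4065 */

	/* Hand the parameter buffer to the DMA mapping code. */
	req->special_vec.bv_page = virt_to_page(range);
	req->special_vec.bv_offset = offset_in_page(range);
	req->special_vec.bv_len = nr * sizeof(*range);
	req->rq_flags |= RQF_SPECIAL_PAYLOAD;

	return BLK_STS_OK;
}
```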

## Device Mapper

The device mapper may have to split a request. As an example, LVM is
based on the dm-linear driver. A request that is submitted to an LVM volume
has to be split if it affects multiple block devices. Copy offload requests
that affect multiple block devices should be split or should be onloaded.

The call chain for bio-based dm drivers is as follows:
```
dm_submit_bio(bio)
-> __split_and_process_bio(md, map, bio)
  -> __split_and_process_non_flush(clone_info)
    -> __clone_and_map_data_bio(clone_info, target_info, sector, len)
      -> clone_bio(dm_target_io, bio, sector, len)
      -> __map_bio(dm_target_io)
        -> ti->type->map(dm_target_io, clone)
```

## NVMe

Process copy offload commands by translating REQ_COPY_OUT requests into simple
copy commands.
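
Per TP 4065, the destination LBA goes into CDW10/CDW11 of the copy command and
the zero-based number of ranges into the low byte of CDW12; the source ranges
are passed as a list of descriptors. Below is a sketch of the format zero
source range descriptor; this is my reading of the TP, not an upstream kernel
definition:

```
/* Source range descriptor, format zero, per NVMe TP 4065 (sketch only). */
struct nvme_copy_range {
	__le64	rsvd0;
	__le64	slba;		/* starting source LBA */
	__le16	nlb;		/* zero-based number of logical blocks */
	__le16	rsvd18;
	__le32	rsvd20;
	__le32	eilbrt;		/* expected initial logical block ref tag */
	__le16	elbat;		/* expected logical block application tag */
	__le16	elbatm;		/* expected application tag mask */
};
```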

## SCSI

From inside `sd_revalidate_disk()`, query the third-party copy VPD page. Extract
the following parameters (see also SPC-6):

* MAXIMUM CSCD DESCRIPTOR COUNT
* MAXIMUM SEGMENT DESCRIPTOR COUNT
* MAXIMUM DESCRIPTOR LIST LENGTH
* Supported third-party copy commands.
* SUPPORTED CSCD DESCRIPTOR ID (0 or more)
* ROD type descriptor (0 or more)
* TOTAL CONCURRENT COPIES
* MAXIMUM IDENTIFIED CONCURRENT COPIES
* MAXIMUM SEGMENT LENGTH

From inside `sd_init_command()`, translate REQ_COPY_OUT into either EXTENDED
COPY or POPULATE TOKEN + WRITE USING TOKEN.
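
The CDB side of that translation might look as follows. The opcode and service
action values are taken from SPC-6 (the third-party copy OUT commands share
opcode 0x83); `sd_setup_copy_out_cmnd()` and its parameters are hypothetical,
and building the parameter list itself (CSCD and segment descriptors, or the
ROD token) is omitted:

```
#define THIRD_PARTY_COPY_OUT	0x83	/* opcode, per SPC-6 */
#define SA_EXTENDED_COPY_LID1	0x00	/* service actions */
#define SA_POPULATE_TOKEN	0x10
#define SA_WRITE_USING_TOKEN	0x11

/* Hypothetical helper called from sd_init_command() for REQ_COPY_OUT. */
static blk_status_t sd_setup_copy_out_cmnd(struct scsi_cmnd *cmd,
					   bool use_tokens, u32 param_len)
{
	cmd->cmd_len = 16;
	memset(cmd->cmnd, 0, cmd->cmd_len);
	cmd->cmnd[0] = THIRD_PARTY_COPY_OUT;
	cmd->cmnd[1] = use_tokens ? SA_WRITE_USING_TOKEN :
				    SA_EXTENDED_COPY_LID1;
	/* The PARAMETER LIST LENGTH field occupies CDB bytes 10..13. */
	put_unaligned_be32(param_len, &cmd->cmnd[10]);
	return BLK_STS_OK;
}
```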

Set the parameters in the copy offload commands as follows:

* We may have to set the STR bit. From SPC-6: "A sequential striped (STR) bit
  set to one specifies to the copy manager that the majority of the block
  device references in the parameter list represent sequential access of
  several block devices that are striped. This may be used by the copy manager
  to perform reads from a copy source block device at any time and in any
  order during processing of an EXTENDED COPY command as described in
  6.6.5.3. A STR bit set to zero specifies to the copy manager that disk
  references, if any, may not be sequential."
* Set the LIST ID USAGE field to 3 and the LIST ID to 0. This means that
  neither "held data" nor the RECEIVE COPY STATUS command are supported. This
  improves security because the data that is being copied cannot be accessed
  via the LIST ID.
* We may have to set the G_SENSE (good with sense data) bit. From SPC-6: "If
  the G_SENSE bit is set to one and the copy manager completes the EXTENDED
  COPY command with GOOD status, then the copy manager shall include sense
  data with the GOOD status in which the sense key is set to COMPLETED, the
  additional sense code is set to EXTENDED COPY INFORMATION AVAILABLE, and the
  COMMAND-SPECIFIC INFORMATION field is set to the number of segment
  descriptors the copy manager has processed."
* Clear the IMMED bit.

## System Call Interface

To submit copy offload requests from user space, we need:

* A system call for passing these requests, e.g. copy_file_range() or
  io_uring (see the example after this list).
* Add a copy offload parameter format description to the user space ABI. The
  parameters include source device, source ranges, destination device and
  destination ranges.
* A flag that indicates whether or not it is acceptable to fall back to
  onloading the copy operation.
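
For the system call route, the existing copy_file_range() already shows what
the user-space side can look like. A minimal example follows; note that no
offload-versus-onload flag exists yet, so `flags` must currently be zero:

```
/* Minimal user-space example of the existing copy_file_range() syscall. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd_in, fd_out;
	ssize_t ret;
	size_t len = 1024 * 1024;	/* copy up to 1 MiB per call */

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return EXIT_FAILURE;
	}
	fd_in = open(argv[1], O_RDONLY);
	fd_out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd_in < 0 || fd_out < 0) {
		perror("open");
		return EXIT_FAILURE;
	}
	do {
		/* NULL offsets: use and advance the file positions. */
		ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
	} while (ret > 0);
	if (ret < 0)
		perror("copy_file_range");
	close(fd_in);
	close(fd_out);
	return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```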

## Sysfs Interface

To do: define which aspects of copy offloading should be configurable through
new sysfs parameters under /sys/block/*/queue/.

## See Also

* Martin Petersen, [Copy
  Offload](https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg28998.html),
  linux-scsi, 28 May 2014.
* Mikulas Patocka, [ANNOUNCE: SCSI XCOPY support for the kernel and device
  mapper](https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg686111.html),
  15 July 2014.
* [kcopyd documentation](https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/kcopyd.html), kernel.org.
* Martin K. Petersen, [Copy Offload - Here Be Dragons](http://mkp.net/pubs/xcopy.pdf), 2019-08-21.
* Martin K. Petersen, [Re: [dm-devel] [RFC PATCH v2 1/2] block: add simple copy
  support](https://lore.kernel.org/linux-nvme/yq1blf3smcl.fsf@ca-mkp.ca.oracle.com/),
  linux-nvme mailing list, 2020-12-08.
* NVM Express Organization, [NVMe - TP 4065b Simple Copy Command 2021.01.25 -
  Ratified.pdf](https://workspace.nvmexpress.org/apps/org/workgroup/allmembers/download.php/4773/NVMe%20-%20TP%204065b%20Simple%20Copy%20Command%202021.01.25%20-%20Ratified.pdf), 2021-01-25.
* Selvakumar S, [[RFC PATCH v5 0/4] add simple copy
  support](https://lore.kernel.org/linux-nvme/20210219124517.79359-1-selvakuma.s1@samsung.com/),
  linux-nvme, 2021-02-19.


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2021-05-18  0:15   ` Bart Van Assche
  0 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-05-18  0:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/10/21 5:15 PM, Chaitanya Kulkarni wrote:
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?

We need to achieve agreement about an approach. The text below is my
attempt at guiding the discussion. A HTML version is available at
https://github.com/bvanassche/linux-kernel-copy-offload. As usual,
feedback is welcome.

Bart.


# Implementing Copy Offloading in the Linux Kernel

## Introduction

Efforts to add copy offloading support in the Linux kernel started considerable
time ago. Despite this copy offloading support is not yet upstream and there is
no detailed plan yet of how to implement copy offloading.

This document outlines a possible implementation. The purpose of this document
is to help guiding the conversations around copy offloading.

## Block Layer

We need an interface to pass copy offload requests from user space or file
systems to block drivers. Although the first implementation added a single
block-layer operation for copy offloading, there seems to be agreement today
to implement copy offloading as two operations, namely `REQ_COPY_IN` and
`REQ_COPY_OUT`.

A possible approach is as follows:

* Fall back to a non-offloaded copy operation if necessary, e.g. if copy
  offloading is not supported or if data is encrypted and the ciphertext
  depends on the LBA. The following code may be a good starting point:
  `drivers/md/dm-kcopyd.c`.
* If the block driver supports copy offloading, submit the `REQ_COPY_IN`
  operation first. The block driver stores the data ranges associated with the
  `REQ_COPY_IN` operation.
* Wait for completion of the `REQ_COPY_IN` operation.
* After the `REQ_COPY_IN` operation has completed, submit the `REQ_COPY_OUT`
  operation and include a reference to the `REQ_COPY_IN` operation. If the
  block driver that receives the `REQ_COPY_OUT` operation has seen a matching
  `REQ_COPY_IN` operation, it offloads the copy. Otherwise it reports that no
  data has been copied and the block layer performs a non-offloaded copy
  operation instead. A sketch of this two-phase flow follows the list.
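
A minimal sketch of this two-phase flow follows, assuming a hypothetical
`blkdev_issue_copy()` helper and `REQ_OP_COPY_IN`/`REQ_OP_COPY_OUT` opcodes;
none of this exists upstream today, and how the data ranges are encoded is
discussed further below:
```
/*
 * Hypothetical sketch of the proposed two-phase copy submission.
 * REQ_OP_COPY_IN and REQ_OP_COPY_OUT are assumed opcodes; how the
 * ranges are carried is one of the open questions discussed below.
 */
static int blkdev_issue_copy(struct block_device *src_bdev, sector_t src,
			     struct block_device *dst_bdev, sector_t dst,
			     sector_t nr_sects, gfp_t gfp_mask)
{
	struct bio *bio;
	int ret;

	/* Phase 1: hand the source range to the block driver. */
	bio = bio_alloc(gfp_mask, 0);
	if (!bio)
		return -ENOMEM;
	bio_set_dev(bio, src_bdev);
	bio->bi_opf = REQ_OP_COPY_IN;		/* assumed opcode */
	bio->bi_iter.bi_sector = src;
	bio->bi_iter.bi_size = nr_sects << SECTOR_SHIFT;
	ret = submit_bio_wait(bio);		/* driver stores the range */
	bio_put(bio);
	if (ret)
		return ret;			/* caller falls back to onloading */

	/* Phase 2: reference the stored range and trigger the copy. */
	bio = bio_alloc(gfp_mask, 0);
	if (!bio)
		return -ENOMEM;
	bio_set_dev(bio, dst_bdev);
	bio->bi_opf = REQ_OP_COPY_OUT;		/* assumed opcode */
	bio->bi_iter.bi_sector = dst;
	bio->bi_iter.bi_size = nr_sects << SECTOR_SHIFT;
	ret = submit_bio_wait(bio);
	bio_put(bio);
	return ret;	/* nonzero: no data copied, fall back to onloading */
}
```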

The operation type is stored in the top bits of the `bi_opf` member of
`struct bio`. Each bio is associated with a single data buffer and a single
contiguous byte range on the storage medium. Pointers to the data buffer are
stored in `bi_io_vec[]`. The affected byte range is represented by
`bi_iter.bi_sector` and `bi_iter.bi_size`.

The NVMe and SCSI copy offload commands both support multiple source ranges.
SCSI XCOPY additionally supports multiple destination ranges, whereas the NVMe
simple copy command supports only a single destination range.

Possible approaches for passing the data ranges involved in a copy operation
from the block layer to block drivers are as follows:

* Attach a bio to each copy offload request and encode all relevant copy
  offload parameters in its data buffer. These parameters include the source
  device and source ranges for `REQ_COPY_IN`, and the destination device and
  destination ranges for `REQ_COPY_OUT`. Let the block drivers translate these
  parameters into something the storage device understands (NVMe simple copy
  parameters or SCSI XCOPY parameters). Fill in the parameter structure size
  in `bi_iter.bi_size`. Set `bi_vcnt` to 1 and fill in `bio->bi_io_vec[0]`.
  A sketch of such a parameter block follows this list.
* Map each source range and each destination range onto a different bio. Link
  all the bios with the `bi_next` pointer and attach these bios to the copy
  offload requests. Leave `bi_vcnt` zero. This is related but not identical to
  the approach followed by `__blkdev_issue_discard()`.
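
For illustration, a parameter block for the first approach might look as
follows; the `copy_params` layout and the helper name are assumptions for
discussion, not a defined format:
```
/*
 * Hypothetical parameter block for the first approach; the layout and
 * names are illustrative only and would have to be agreed upon.
 */
struct copy_params {
	__u64	dst_sector;		/* destination start LBA */
	__u32	nr_ranges;		/* number of entries in range[] */
	__u32	reserved;
	struct {
		__u64	src_sector;
		__u64	nr_sectors;
	} range[];
};

/* Attach the parameter block to the copy bio as described above. */
static void bio_attach_copy_params(struct bio *bio,
				   struct copy_params *params, size_t len)
{
	bio->bi_io_vec[0].bv_page   = virt_to_page(params);
	bio->bi_io_vec[0].bv_offset = offset_in_page(params);
	bio->bi_io_vec[0].bv_len    = len;
	bio->bi_vcnt = 1;
	bio->bi_iter.bi_size = len;	/* the parameter structure size */
}
```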

I think that the first approach would require more changes in the device
mapper than the second approach, since the device mapper code knows how to
split bios but not how to split a buffer that holds LBA range descriptors.

The following code needs to be modified no matter how copy offloading is
implemented:

* Request cloning. The code that checks the limits before requests are cloned
  compares `blk_rq_sectors()` with `max_sectors`. This is inappropriate for
  `REQ_COPY_*` requests.
* Request splitting. `bio_split()` assumes that `bi_iter.bi_size` represents
  the number of bytes affected on the medium.
* Code related to retrying the original requests of a merged request with
  mixed failfast attributes, e.g. `blk_rq_err_bytes()`.
* Code related to partially completing a request, e.g. `blk_update_request()`.
* The code for merging block layer requests.
* `blk_mq_end_request()` since it calls `blk_update_request()` and
  `blk_rq_bytes()`.
* The plugging code, because of the following test:
  `blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE`.
* The I/O accounting code (`task_io_account_read()`), since that code uses
  `bio_has_data()` and hence skips discard, secure erase and write zeroes
  requests (a possible extension is sketched after the listing):
```
static inline bool bio_has_data(struct bio *bio)
{
	return bio && bio->bi_iter.bi_size &&
	    bio_op(bio) != REQ_OP_DISCARD &&
	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
	    bio_op(bio) != REQ_OP_WRITE_ZEROES;
}
```
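
If `REQ_OP_COPY_IN`/`REQ_OP_COPY_OUT` opcodes were added (an assumption of
this document), the check might be extended along these lines:
```
/* Sketch: also skip the assumed copy opcodes in bio_has_data(). */
static inline bool bio_has_data(struct bio *bio)
{
	return bio && bio->bi_iter.bi_size &&
	    bio_op(bio) != REQ_OP_DISCARD &&
	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
	    bio_op(bio) != REQ_OP_COPY_IN &&	/* assumed opcode */
	    bio_op(bio) != REQ_OP_COPY_OUT;	/* assumed opcode */
}
```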

Block drivers will need to use the `special_vec` member of `struct request` to
pass the copy offload parameters to the storage device. That member is used,
e.g., when a `REQ_OP_DISCARD` operation is submitted to an NVMe driver. The
SCSI sd driver uses `special_vec` while processing an UNMAP or WRITE SAME
command.
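
Modeled on that existing `REQ_OP_DISCARD` handling, a block driver might
attach the copy parameters as follows; `copy_params` is the hypothetical
structure sketched earlier, while `special_vec` and `RQF_SPECIAL_PAYLOAD` are
existing kernel interfaces:
```
/*
 * Sketch modeled on the REQ_OP_DISCARD path: attach the (hypothetical)
 * copy parameter block to the request via special_vec so that the
 * low-level driver can map and transmit it.
 */
static void blk_rq_set_copy_payload(struct request *rq,
				    struct copy_params *params, size_t len)
{
	rq->special_vec.bv_page   = virt_to_page(params);
	rq->special_vec.bv_offset = offset_in_page(params);
	rq->special_vec.bv_len    = len;
	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;	/* existing request flag */
}
```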

## Device Mapper

The device mapper may have to split a request. As an example, LVM is based on
the dm-linear driver. A request that is submitted to an LVM volume has to be
split if it affects multiple block devices. Copy offload requests that affect
multiple block devices should either be split or be onloaded.

The call chain for bio-based dm drivers is as follows:
```
dm_submit_bio(bio)
-> __split_and_process_bio(md, map, bio)
  -> __split_and_process_non_flush(clone_info)
    -> __clone_and_map_data_bio(clone_info, target_info, sector, len)
      -> clone_bio(dm_target_io, bio, sector, len)
      -> __map_bio(dm_target_io)
        -> ti->type->map(dm_target_io, clone)
```
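
Given this call chain, one way a bio-based target could handle a copy bio it
cannot remap onto a single underlying device is to fail it so that the block
layer falls back to onloading. A sketch, in which `op_is_copy()` is an
assumed helper and the fallback semantics are the ones proposed above:
```
/*
 * Sketch based on dm-linear; struct linear_c is as defined in
 * drivers/md/dm-linear.c, op_is_copy() is an assumed helper.
 */
struct linear_c {
	struct dm_dev *dev;
	sector_t start;
};

static int linear_map(struct dm_target *ti, struct bio *bio)
{
	struct linear_c *lc = ti->private;

	if (op_is_copy(bio_op(bio)) &&		/* assumed helper */
	    dm_target_offset(ti, bio->bi_iter.bi_sector) +
	    bio_sectors(bio) > ti->len)
		return DM_MAPIO_KILL;	/* forces a non-offloaded copy */

	bio_set_dev(bio, lc->dev->bdev);
	bio->bi_iter.bi_sector =
		lc->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
	return DM_MAPIO_REMAPPED;
}
```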

## NVMe

Process copy offload commands by translating `REQ_COPY_OUT` requests into NVMe
simple copy commands.
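
For reference, the source range descriptor of the NVMe simple copy command
(format 0h, abbreviated from TP 4065; the struct name is illustrative). The
driver would translate the `REQ_COPY_OUT` parameters into an array of these
descriptors plus a destination SLBA in the command itself:
```
/*
 * NVMe Copy command source range descriptor, format 0h (abbreviated;
 * see TP 4065). The struct name is illustrative.
 */
struct nvme_copy_src_range {
	__le64	rsvd0;
	__le64	slba;		/* starting LBA of this source range */
	__le16	nlb;		/* number of logical blocks, 0-based */
	__u8	rsvd18[14];	/* end-to-end protection fields omitted */
};
```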

## SCSI

From inside `sd_revalidate_disk()`, query the third-party copy VPD page. Extract
the following parameters (see also SPC-6):

* MAXIMUM CSCD DESCRIPTOR COUNT
* MAXIMUM SEGMENT DESCRIPTOR COUNT
* MAXIMUM DESCRIPTOR LIST LENGTH
* Supported third-party copy commands.
* SUPPORTED CSCD DESCRIPTOR ID (0 or more)
* ROD type descriptor (0 or more)
* TOTAL CONCURRENT COPIES
* MAXIMUM IDENTIFIED CONCURRENT COPIES
* MAXIMUM SEGMENT LENGTH

From inside `sd_init_command()`, translate `REQ_COPY_OUT` into either EXTENDED
COPY or POPULATE TOKEN + WRITE USING TOKEN.
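
This would add a case to the request-op switch in `sd_init_command()`; in the
sketch below, `REQ_OP_COPY_OUT` is an assumed opcode and `sd_setup_copy_cmnd()`
a hypothetical helper:
```
/*
 * Sketch: dispatching inside sd_init_command(). sd_setup_copy_cmnd()
 * is a hypothetical helper that builds either EXTENDED COPY or
 * POPULATE TOKEN + WRITE USING TOKEN based on the VPD data above.
 */
static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
{
	struct request *rq = cmd->request;

	switch (req_op(rq)) {
	case REQ_OP_COPY_OUT:			/* assumed opcode */
		return sd_setup_copy_cmnd(cmd);	/* hypothetical helper */
	/* ... existing cases: read, write, discard, flush, ... */
	default:
		return BLK_STS_NOTSUPP;
	}
}
```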

Set the parameters in the copy offload commands as follows:

* We may have to set the STR bit. From SPC-6: "A sequential striped (STR) bit
  set to one specifies to the copy manager that the majority of the block
  device references in the parameter list represent sequential access of
  several block devices that are striped. This may be used by the copy manager
  to perform reads from a copy source block device at any time and in any
  order during processing of an EXTENDED COPY command as described in
  6.6.5.3. A STR bit set to zero specifies to the copy manager that disk
  references, if any, may not be sequential."
* Set the LIST ID USAGE field to 3 and the LIST ID to 0. This means that
  neither "held data" nor the RECEIVE COPY STATUS command are supported. This
  improves security because the data that is being copied cannot be accessed
  via the LIST ID.
* We may have to set the G_SENSE (good with sense data) bit. From SPC-6: "If
  the G_SENSE bit is set to one and the copy manager completes the EXTENDED
  COPY command with GOOD status, then the copy manager shall include sense
  data with the GOOD status in which the sense key is set to COMPLETED, the
  additional sense code is set to EXTENDED COPY INFORMATION AVAILABLE, and the
  COMMAND-SPECIFIC INFORMATION field is set to the number of segment
  descriptors the copy manager has processed."
* Clear the IMMED bit.

## System Call Interface

To submit copy offload requests from user space, we need:

* A system call for passing these requests, e.g. `copy_file_range()` or an
  io_uring operation.
* A description of the copy offload parameter format in the user-space ABI.
  The parameters include the source device, source ranges, destination device
  and destination ranges (a strawman layout follows this list).
* A flag that indicates whether or not it is acceptable to fall back to
  onloading the copy operation, i.e. performing the copy on the host CPU.
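
A strawman for such a parameter format follows; every name and field below is
an assumption for discussion, not an existing kernel interface:
```
/*
 * Strawman user-space parameter format; all names and fields are
 * assumptions for discussion, not an existing kernel ABI.
 */
struct copy_offload_range {
	__u64	src_offset;	/* byte offset on the source device */
	__u64	dst_offset;	/* byte offset on the destination device */
	__u64	len;		/* number of bytes to copy */
};

struct copy_offload_args {
	__s32	src_fd;		/* source block device or file */
	__s32	dst_fd;		/* destination block device or file */
	__u32	nr_ranges;	/* number of entries behind 'ranges' */
	__u32	flags;		/* e.g. COPY_OFFLOAD_NO_FALLBACK to fail
				   instead of onloading the copy */
	__u64	ranges;		/* pointer to struct copy_offload_range[] */
};
```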

## Sysfs Interface

To do: define which aspects of copy offloading should be configurable through
new sysfs parameters under `/sys/block/*/queue/`.

## See Also

* Martin K. Petersen, [Copy
  Offload](https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg28998.html),
  linux-scsi, 2014-05-28.
* Mikulas Patocka, [ANNOUNCE: SCSI XCOPY support for the kernel and device
  mapper](https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg686111.html),
  linux-kernel, 2014-07-15.
* [kcopyd documentation](https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/kcopyd.html), kernel.org.
* Martin K. Petersen, [Copy Offload - Here Be Dragons](http://mkp.net/pubs/xcopy.pdf), 2019-08-21.
* Martin K. Petersen, [Re: [dm-devel] [RFC PATCH v2 1/2] block: add simple copy
  support](https://lore.kernel.org/linux-nvme/yq1blf3smcl.fsf@ca-mkp.ca.oracle.com/),
  linux-nvme, 2020-12-08.
* NVM Express Organization, [NVMe - TP 4065b Simple Copy Command 2021.01.25 -
  Ratified.pdf](https://workspace.nvmexpress.org/apps/org/workgroup/allmembers/download.php/4773/NVMe%20-%20TP%204065b%20Simple%20Copy%20Command%202021.01.25%20-%20Ratified.pdf), 2021-01-25.
* Selvakumar S, [[RFC PATCH v5 0/4] add simple copy
  support](https://lore.kernel.org/linux-nvme/20210219124517.79359-1-selvakuma.s1@samsung.com/),
  linux-nvme, 2021-02-19.



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-17 16:39   ` Kanchan Joshi
  -1 siblings, 0 replies; 272+ messages in thread
From: Kanchan Joshi @ 2021-05-17 16:39 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, martin.petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov

> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite file system, block layer, and device drivers
> developers to:-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
I'd like to participate in the discussion.
Hopefully we can get consensus on some elements (or discover new
issues) before Dec.
An async interface (via io_uring) would be good to discuss while
we are at it.


-- 
Kanchan

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-12 15:45   ` Himanshu Madhani
  -1 siblings, 0 replies; 272+ messages in thread
From: Himanshu Madhani @ 2021-05-12 15:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, Martin Petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov



> On May 10, 2021, at 7:15 PM, Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> wrote:
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.

I would like to participate in this discussion as well. 

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-12 15:23   ` Hannes Reinecke
  -1 siblings, 0 replies; 272+ messages in thread
From: Hannes Reinecke @ 2021-05-12 15:23 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/11/21 2:15 AM, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
The neverending topic.

Count me in.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-12  7:30   ` Johannes Thumshirn
  -1 siblings, 0 replies; 272+ messages in thread
From: Johannes Thumshirn @ 2021-05-12  7:30 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov

On 11/05/2021 02:15, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1] single threaded
> performance is limiting due to Denard scaling and multi-threaded
> performance is slowing down due Moore's law limitations. With the rise
> of SNIA Computation Technical Storage Working Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular as there are several solutions available [2]. One of the common
> operation which is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or on to the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work which is done by Martin is present here [3]. The
> latest work which is posted by Mikulas [4] is not merged yet. These two
> approaches are totally different from each other. Several storage
> vendors discourage mixing copy offload requests with regular READ/WRITE
> I/O. Also, the fact that the operation fails if a copy request ever
> needs to be split as it traverses the stack it has the unfortunate
> side-effect of preventing copy offload from working in pretty much
> every common deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> With [3] being hard to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
> candidate. Also, with [4] there is an unresolved problem with the
> two-command approach about how to handle changes to the DM layout
> between an IN and OUT operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCopy support in the protocol, recent advancement in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
> DMA support in the Linux Kernel mainly for NVMe devices [7] and
> eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
> from Copy offload operation.
> 
> With this background we have significant number of use-cases which are
> strong candidates waiting for outstanding Linux Kernel Block Layer Copy
> Offload support, so that Linux Kernel Storage subsystem can to address
> previously mentioned problems [1] and allow efficient offloading of the
> data related operations. (Such as move/copy etc.)
> 
> For reference following is the list of the use-cases/candidates waiting
> for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 7. File systems :- Local, NFS and Zonefs.
> 4. Block devices :- Distributed, local, and Zoned devices.
> 5. Peer to Peer DMA support solutions.
> 6. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite file system, block layer, and device drivers
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Keith Busch
>

I would like to participate in this discussion as well. A generic block layer
copy API is extremely helpful for filesystem garbage collection and copy operations
like copy_file_range().

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
       [not found] ` <CGME20210512071321eucas1p2ca2253e90449108b9f3e4689bf8e0512@eucas1p2.samsung.com>
@ 2021-05-12  7:13     ` Javier González
  0 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2021-05-12  7:13 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	msnitzer, bvanassche, martin.petersen, roland, mpatocka, hare,
	kbusch, rwheeler, hch, Frederick.Knight, zach.brown, osandov,
	Kanchan Joshi, SelvaKumar S

On 11.05.2021 00:15, Chaitanya Kulkarni wrote:
>Hi,
>
>* Background :-
>-----------------------------------------------------------------------
>
>Copy offload is a feature that allows file-systems or storage devices
>to be instructed to copy files/logical blocks without requiring
>involvement of the local CPU.
>
>With reference to the RISC-V summit keynote [1] single threaded
>performance is limiting due to Denard scaling and multi-threaded
>performance is slowing down due Moore's law limitations. With the rise
>of SNIA Computation Technical Storage Working Group (TWG) [2],
>offloading computations to the device or over the fabrics is becoming
>popular as there are several solutions available [2]. One of the common
>operation which is popular in the kernel and is not merged yet is Copy
>offload over the fabrics or on to the device.
>
>* Problem :-
>-----------------------------------------------------------------------
>
>The original work which is done by Martin is present here [3]. The
>latest work which is posted by Mikulas [4] is not merged yet. These two
>approaches are totally different from each other. Several storage
>vendors discourage mixing copy offload requests with regular READ/WRITE
>I/O. Also, the fact that the operation fails if a copy request ever
>needs to be split as it traverses the stack it has the unfortunate
>side-effect of preventing copy offload from working in pretty much
>every common deployment configuration out there.
>
>* Current state of the work :-
>-----------------------------------------------------------------------
>
>With [3] being hard to handle arbitrary DM/MD stacking without
>splitting the command in two, one for copying IN and one for copying
>OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
>candidate. Also, with [4] there is an unresolved problem with the
>two-command approach about how to handle changes to the DM layout
>between an IN and OUT operations.
>
>* Why Linux Kernel Storage System needs Copy Offload support now ?
>-----------------------------------------------------------------------
>
>With the rise of the SNIA Computational Storage TWG and solutions [2],
>existing SCSI XCopy support in the protocol, recent advancement in the
>Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer
>DMA support in the Linux Kernel mainly for NVMe devices [7] and
>eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit
>from Copy offload operation.
>
>With this background we have significant number of use-cases which are
>strong candidates waiting for outstanding Linux Kernel Block Layer Copy
>Offload support, so that Linux Kernel Storage subsystem can to address
>previously mentioned problems [1] and allow efficient offloading of the
>data related operations. (Such as move/copy etc.)
>
>For reference following is the list of the use-cases/candidates waiting
>for Copy Offload support :-
>
>1. SCSI-attached storage arrays.
>2. Stacking drivers supporting XCopy DM/MD.
>3. Computational Storage solutions.
>7. File systems :- Local, NFS and Zonefs.
>4. Block devices :- Distributed, local, and Zoned devices.
>5. Peer to Peer DMA support solutions.
>6. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
>
>* What we will discuss in the proposed session ?
>-----------------------------------------------------------------------
>
>I'd like to propose a session to go over this topic to understand :-
>
>1. What are the blockers for Copy Offload implementation ?
>2. Discussion about having a file system interface.
>3. Discussion about having right system call for user-space.
>4. What is the right way to move this work forward ?
>5. How can we help to contribute and move this work forward ?
>
>* Required Participants :-
>-----------------------------------------------------------------------
>
>I'd like to invite file system, block layer, and device drivers
>developers to:-
>
>1. Share their opinion on the topic.
>2. Share their experience and any other issues with [4].
>3. Uncover additional details that are missing from this proposal.
>
>Required attendees :-
>
>Martin K. Petersen
>Jens Axboe
>Christoph Hellwig
>Bart Van Assche
>Zach Brown
>Roland Dreier
>Ric Wheeler
>Trond Myklebust
>Mike Snitzer
>Keith Busch
>Sagi Grimberg
>Hannes Reinecke
>Frederick Knight
>Mikulas Patocka
>Keith Busch
>
>Regards,
>Chaitanya
>
>[1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
>[2] https://www.snia.org/computational
>https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>      https://www.eideticom.com/products.html
>https://www.xilinx.com/applications/data-center/computational-storage.html
>[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
>[4] https://www.spinics.net/lists/linux-block/msg00599.html
>[5] https://lwn.net/Articles/793585/
>[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
>namespaces-zns-as-go-to-industry-technology/
>[7] https://github.com/sbates130272/linux-p2pmem
>[8] https://kernel.dk/io_uring.pdf


I would like to participate in this discussion too.

Cc'ing Selva and Kanchan, who have been posting several series for NVMe
Simple Copy (SCC). Even though SCC is a very narrow use-case of
copy offload, it seems like a good starting point for getting generic code
into the block layer.

Javier



^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-12  2:21   ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2021-05-12  2:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, martin.petersen, roland, mpatocka, hare, kbusch,
	rwheeler, hch, Frederick.Knight, zach.brown, osandov

On 5/10/21 5:15 PM, Chaitanya Kulkarni wrote:
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?

Are there any blockers left? My understanding is that what is needed is
to implement what has been proposed recently
(https://lore.kernel.org/linux-nvme/yq1blf3smcl.fsf@ca-mkp.ca.oracle.com/).
Anyway, I'm interested to attend the conversation about this topic.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2021-05-11  0:15 ` Chaitanya Kulkarni
@ 2021-05-11 21:15   ` Knight, Frederick
  -1 siblings, 0 replies; 272+ messages in thread
From: Knight, Frederick @ 2021-05-11 21:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	Hannes Reinecke, kbusch, rwheeler, hch, zach.brown, osandov

I'd love to participate in this discussion.

You mention the 2 different models (single command vs. multi-command).  Just as a reminder, there are specific reasons for those 2 different models.

Some applications know both the source and the destination, so can use the single command model (the application is aware it is doing a copy).  But, there is a group of applications that do NOT know both pieces of information at the same time, in the same thread, in the same context (the application is NOT aware it is doing a copy - the application thinks it is doing reads and writes).

That is why there are 2 different models - because the application engineers didn't want to change their application.  So, the author of the CP application (the shell copy command) wanted to use the existing READ / WRITE model (2 commands).  Just replace the READ with "get the data ready" and replace the WRITE with "use the data you got ready".  It was easier for that application to use the existing model, rather than totally redesigning the application.

But, other application engineers had a code base that already knew a copy was happening, and their code already knew both the source and destination in the same code path. A BACKUP application is one that generally fits into this camp.  So, it was easier for that application to replace that function with a single copy request.  Another application was a VM mastering/replicating application that could spin up new VM images very quickly - the source and destination are known to be able to use a single request.

When this offload journey began, both interfaces were needed and used.  But yes, it did bifurcate the space, creating 2 camps of engineers - each with their favorite method (based on the application where they planned to use it).  Each camp of engineers often sees no reason that the other camp can't just switch to do it the way they do - if they'd only see the light.  But, originally, there were 2 different sets of requirements that each drove a specific design of a copy offload model.

Even NVMe has recently joined the copy offload camp with a new COPY command (single namespace, multiple source ranges, single destination range - works well for defrag, and other use cases). I'm confident its capabilities will grow over time.

SO, I think this will be a great discussion to have!!!

	Fred Knight



-----Original Message-----
From: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> 
Sent: Monday, May 10, 2021 8:16 PM
To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-foundation.org
Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; martin.petersen@oracle.com; roland@purestorage.com; mpatocka@redhat.com; Hannes Reinecke <hare@suse.de>; kbusch@kernel.org; rwheeler@redhat.com; hch@lst.de; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; osandov@fb.com
Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices to be instructed to copy files/logical blocks without requiring involvement of the local CPU.

With reference to the RISC-V summit keynote [1] single threaded performance is limiting due to Denard scaling and multi-threaded performance is slowing down due Moore's law limitations. With the rise of SNIA Computation Technical Storage Working Group (TWG) [2], offloading computations to the device or over the fabrics is becoming popular as there are several solutions available [2]. One of the common operation which is popular in the kernel and is not merged yet is Copy offload over the fabrics or on to the device.

* Problem :-
-----------------------------------------------------------------------

The original work which is done by Martin is present here [3]. The latest work which is posted by Mikulas [4] is not merged yet. These two approaches are totally different from each other. Several storage vendors discourage mixing copy offload requests with regular READ/WRITE I/O. Also, the fact that the operation fails if a copy request ever needs to be split as it traverses the stack it has the unfortunate side-effect of preventing copy offload from working in pretty much every common deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

With [3] being hard to handle arbitrary DM/MD stacking without splitting the command in two, one for copying IN and one for copying OUT. Which is then demonstrated by the [4] why [3] it is not a suitable candidate. Also, with [4] there is an unresolved problem with the two-command approach about how to handle changes to the DM layout between an IN and OUT operations.

* Why Linux Kernel Storage System needs Copy Offload support now ?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and solutions [2], existing SCSI XCopy support in the protocol, recent advancement in the Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer DMA support in the Linux Kernel mainly for NVMe devices [7] and eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit from Copy offload operation.

With this background we have significant number of use-cases which are strong candidates waiting for outstanding Linux Kernel Block Layer Copy Offload support, so that Linux Kernel Storage subsystem can to address previously mentioned problems [1] and allow efficient offloading of the data related operations. (Such as move/copy etc.)

For reference following is the list of the use-cases/candidates waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
7. File systems :- Local, NFS and Zonefs.
4. Block devices :- Distributed, local, and Zoned devices.
5. Peer to Peer DMA support solutions.
6. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session ?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation ?
2. Discussion about having a file system interface.
3. Discussion about having right system call for user-space.
4. What is the right way to move this work forward ?
5. How can we help to contribute and move this work forward ?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device drivers developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka
Keith Busch

Regards,
Chaitanya

[1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
      https://www.eideticom.com/products.html
https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy [4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf


^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2021-05-11 21:15   ` Knight, Frederick
  0 siblings, 0 replies; 272+ messages in thread
From: Knight, Frederick @ 2021-05-11 21:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	Hannes Reinecke, kbusch, rwheeler, hch, zach.brown, osandov

I'd love to participate in this discussion.

You mention the 2 different models (single command vs. multi-command).  Just as a reminder, there are specific reasons for those 2 different models.

Some applications know both the source and the destination, so can use the single command model (the application is aware it is doing a copy).  But, there is a group of applications that do NOT know both pieces of information at the same time, in the same thread, in the same context (the application is NOT aware it is doing a copy - the application thinks it is doing reads and writes).

That is why there are 2 different models - because the application engineers didn't want to change their application.  So, the author of the CP application (the shell copy command) wanted to use the existing READ / WRITE model (2 commands).  Just replace the READ with "get the data ready" and replace the WRITE with "use the data you got ready".  It was easier for that application to use the existing model, rather than totally redesigning the application.

But, other application engineers had a code base that already knew a copy was happening, and their code already knew both the source and destination in the same code path. A BACKUP application is one that generally fits into this camp.  So, it was easier for that application to replace that function with a single copy request.  Another application was a VM mastering/replicating application that could spin up new VM images very quickly - the source and destination are known to be able to use a single request.

When this offload journey began, both interfaces were needed and used.  But yes, it did bifurcate the space, creating 2 camps of engineers - each with their favorite method (based on the application where they planned to use it).  Each camp of engineers often sees no reason that the other camp can't just switch to do it the way they do - if they'd only see the light.  But, originally, there were 2 different sets of requirements that each drove a specific design of a copy offload model.

Even NVMe has recently joined the copy offload camp with a new COPY command (single namespace, multiple source ranges, single destination range - works well for defrag, and other use cases). I'm confident its capabilities will grow over time.

SO, I think this will be a great discussion to have!!!

	Fred Knight



-----Original Message-----
From: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> 
Sent: Monday, May 10, 2021 8:16 PM
To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-foundation.org
Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; martin.petersen@oracle.com; roland@purestorage.com; mpatocka@redhat.com; Hannes Reinecke <hare@suse.de>; kbusch@kernel.org; rwheeler@redhat.com; hch@lst.de; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; osandov@fb.com
Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices to be instructed to copy files/logical blocks without requiring involvement of the local CPU.

With reference to the RISC-V summit keynote [1] single threaded performance is limiting due to Denard scaling and multi-threaded performance is slowing down due Moore's law limitations. With the rise of SNIA Computation Technical Storage Working Group (TWG) [2], offloading computations to the device or over the fabrics is becoming popular as there are several solutions available [2]. One of the common operation which is popular in the kernel and is not merged yet is Copy offload over the fabrics or on to the device.

* Problem :-
-----------------------------------------------------------------------

The original work which is done by Martin is present here [3]. The latest work which is posted by Mikulas [4] is not merged yet. These two approaches are totally different from each other. Several storage vendors discourage mixing copy offload requests with regular READ/WRITE I/O. Also, the fact that the operation fails if a copy request ever needs to be split as it traverses the stack it has the unfortunate side-effect of preventing copy offload from working in pretty much every common deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

With [3] being hard to handle arbitrary DM/MD stacking without splitting the command in two, one for copying IN and one for copying OUT. Which is then demonstrated by the [4] why [3] it is not a suitable candidate. Also, with [4] there is an unresolved problem with the two-command approach about how to handle changes to the DM layout between an IN and OUT operations.

* Why Linux Kernel Storage System needs Copy Offload support now ?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and solutions [2], existing SCSI XCopy support in the protocol, recent advancement in the Linux Kernel File System for Zoned devices (Zonefs [5]), Peer to Peer DMA support in the Linux Kernel mainly for NVMe devices [7] and eventually NVMe Devices and subsystem (NVMe PCIe/NVMeOF) will benefit from Copy offload operation.

With this background, we have a significant number of use-cases that are strong candidates for the outstanding Linux Kernel Block Layer Copy Offload support, so that the Linux Kernel Storage subsystem can address the previously mentioned problems [1] and allow efficient offloading of data-related operations (such as move/copy).

For reference, the following is the list of use-cases/candidates waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.

* What will we discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic and understand :-

1. What are the blockers for a Copy Offload implementation?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward?
5. How can we help to contribute and move this work forward?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

Regards,
Chaitanya

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf



^ permalink raw reply	[flat|nested] 272+ messages in thread

* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2021-05-11  0:15 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-11  0:15 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, martin.petersen, roland, mpatocka,
	hare, kbusch, rwheeler, hch, Frederick.Knight, zach.brown,
	osandov

Hi,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring the
involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded
performance is limited by the end of Dennard scaling, and
multi-threaded performance gains are slowing due to Moore's law
limitations. With the rise of the SNIA Computational Storage Technical
Work Group (TWG) [2], offloading computations to the device or over
the fabrics is becoming popular, and several solutions are already
available [2]. One common operation that keeps coming up in the kernel
but has never been merged is copy offload, whether over the fabrics or
onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is present here [3]. The latest
work, posted by Mikulas [4], is not merged yet. These two approaches
are totally different from each other. Several storage vendors
discourage mixing copy offload requests with regular READ/WRITE I/O.
Also, the fact that the operation fails if a copy request ever needs
to be split as it traverses the stack has the unfortunate side effect
of preventing copy offload from working in pretty much every common
deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

[3] is hard pressed to handle arbitrary DM/MD stacking without
splitting the command in two, one for copying IN and one for copying
OUT; [4] demonstrates why that makes [3] an unsuitable candidate.
However, [4]'s two-command approach has its own unresolved problem:
how to handle changes to the DM layout between the IN and the OUT
operation.

* Why does the Linux Kernel Storage System need Copy Offload support now?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and its solutions
[2], the existing SCSI XCOPY support in the protocol, recent
advancements in the Linux kernel file system for zoned devices
(Zonefs [5]), and peer-to-peer DMA support in the Linux kernel, mainly
for NVMe devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF)
will eventually benefit from a copy offload operation.

With this background, we have a significant number of use-cases that
are strong candidates for the outstanding Linux Kernel Block Layer
Copy Offload support, so that the Linux Kernel Storage subsystem can
address the previously mentioned problems [1] and allow efficient
offloading of data-related operations (such as move/copy).

For reference, the following is the list of use-cases/candidates
waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.

* What will we discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic and understand :-

1. What are the blockers for a Copy Offload implementation?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward?
5. How can we help to contribute and move this work forward?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite file system, block layer, and device driver
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka

Regards,
Chaitanya

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf


^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-02-13  5:11     ` joshi.k
  (?)
@ 2020-02-13 13:09       ` Knight, Frederick
  -1 siblings, 0 replies; 272+ messages in thread
From: Knight, Frederick @ 2020-02-13 13:09 UTC (permalink / raw)
  To: joshi.k, 'Chaitanya Kulkarni',
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, 'Martin K. Petersen',
	'Matias Bjorling', 'Stephen Bates',
	roland, mpatocka, hare, 'Keith Busch',
	rwheeler, 'Christoph Hellwig',
	zach.brown, javier

FWIW - the design of NVMe Simple Copy specifically included versioning of the data structure that describes what to copy.

The reason for that was various people's desire to complexify the Simple Copy command.  Specifically, room was designed into the data structure to accommodate a source NSID (to allow cross-namespace copy, the intention being namespaces attached to the same controller), and room to accommodate the KPIO key tag value for each source range.  Other people thought they could use this data structure versioning to design a fully SCSI XCOPY compatible data structure.
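
To make that versioning concrete, here is a sketch of the two layouts. Format 0h follows NVMe 2.0; the extended entry is purely hypothetical, only to show where a source NSID and a per-range KPIO key tag could live, not a ratified format:

#include <stdint.h>

struct src_range_f0 {		/* DESFMT 0h, per NVMe 2.0 */
	uint64_t rsvd0;
	uint64_t slba;		/* source LBA within the one namespace */
	uint16_t nlb;		/* 0-based block count */
	uint8_t  rsvd18[14];
} __attribute__((packed));

struct src_range_ext {		/* hypothetical future DESFMT */
	uint32_t snsid;		/* source namespace ID (cross-namespace) */
	uint32_t key_tag;	/* per-range KPIO key tag */
	uint64_t slba;
	uint16_t nlb;
	uint8_t  rsvd18[14];
} __attribute__((packed));

Because DESFMT selects the entry layout per command, new descriptor formats can be added without changing the Copy command itself.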

My point is just to consider the flexibility and extensibility of the OS interfaces when thinking about "Simple Copy".

I'm just not sure how SIMPLE it will remain.

	Fred 


-----Original Message-----
From: joshi.k@samsung.com <joshi.k@samsung.com> 
Sent: Thursday, February 13, 2020 12:11 AM
To: 'Chaitanya Kulkarni' <Chaitanya.Kulkarni@wdc.com>; linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-foundation.org
Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; 'Martin K. Petersen' <martin.petersen@oracle.com>; 'Matias Bjorling' <Matias.Bjorling@wdc.com>; 'Stephen Bates' <sbates@raithlin.com>; roland@purestorage.com; joshi.k@samsung.com; mpatocka@redhat.com; hare@suse.de; 'Keith Busch' <kbusch@kernel.org>; rwheeler@redhat.com; 'Christoph Hellwig' <hch@lst.de>; Knight, Frederick <Frederick.Knight@netapp.com>; zach.brown@ni.com; joshi.k@samsung.com; javier@javigon.com
Subject: RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

I am very keen on this topic.
I've been doing some work on "NVMe simple copy", and would like to discuss and solicit the community's opinion on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copies within a single namespace. Some of the problems that the original XCOPY work [2] faced may not be applicable to simple-copy, e.g. splitting a single copy due to differing device-specific limits.
I hope I'm not missing something in thinking so?

- [Block I/O] An async interface (through io_uring or AIO) so that multiple copy operations can be queued.

- [File I/O to user-space] I think it may make sense to extend the copy_file_range API to do an in-device copy as well.

- [F2FS] F2FS GC may leverage the interface. Currently it uses the page cache, which is fair. But for relatively cold/warm data (which needs to be garbage-collected anyway), it could instead bypass the host and avoid the scenario where something useful gets evicted from the cache.

- [ZNS] ZNS users (kernel or user-space) will be log-structured and will benefit from internal copy. But failure scenarios (partial copy, write-pointer position) need to be discussed.

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On 
> Behalf Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux- 
> nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux- 
> foundation.org
> Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; Martin K.
> Petersen <martin.petersen@oracle.com>; Matias Bjorling 
> <Matias.Bjorling@wdc.com>; Stephen Bates <sbates@raithlin.com>; 
> roland@purestorage.com; mpatocka@redhat.com; hare@suse.de; Keith Busch 
> <kbusch@kernel.org>; rwheeler@redhat.com; Christoph Hellwig 
> <hch@lst.de>; frederick.knight@netapp.com; zach.brown@ni.com
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
>
> Hi all,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by the end of Dennard scaling, and
> multi-threaded performance gains are slowing due to Moore's law
> limitations. With the rise of the SNIA Computational Storage Technical
> Work Group (TWG) [2], offloading computations to the device or over
> the fabrics is becoming popular, and several solutions are already
> available [2]. One common operation that keeps coming up in the kernel
> but has never been merged is copy offload, whether over the fabrics or
> onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work, done by Martin, is present here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs
> to be split as it traverses the stack has the unfortunate side effect
> of preventing copy offload from working in pretty much every common
> deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> [3] is hard pressed to handle arbitrary DM/MD stacking without
> splitting the command in two, one for copying IN and one for copying
> OUT; [4] demonstrates why that makes [3] an unsuitable candidate.
> However, [4]'s two-command approach has its own unresolved problem:
> how to handle changes to the DM layout between the IN and the OUT
> operation.
>
> * Why does the Linux Kernel Storage System need Copy Offload support now?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and its solutions
> [2], the existing SCSI XCOPY support in the protocol, recent
> advancements in the Linux kernel file system for zoned devices
> (Zonefs [5]), and peer-to-peer DMA support in the Linux kernel, mainly
> for NVMe devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF)
> will eventually benefit from a copy offload operation.
>
> With this background, we have a significant number of use-cases that
> are strong candidates for the outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy).
>
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.
>
> * What will we discuss in the proposed session?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic and understand :-
>
> 1. What are the blockers for a Copy Offload implementation?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward?
> 5. How can we help to contribute and move this work forward?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite block layer, device driver, and file system
> developers to:-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
>
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
> Regards,
> Chaitanya
>



^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14   ` Chaitanya Kulkarni
  (?)
@ 2020-02-13  5:11     ` joshi.k
  -1 siblings, 0 replies; 272+ messages in thread
From: joshi.k @ 2020-02-13  5:11 UTC (permalink / raw)
  To: 'Chaitanya Kulkarni',
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, bvanassche, 'Martin K. Petersen',
	'Matias Bjorling', 'Stephen Bates',
	roland, joshi.k, mpatocka, hare, 'Keith Busch',
	rwheeler, 'Christoph Hellwig',
	frederick.knight, zach.brown, joshi.k, javier

I am very keen on this topic.
I've been doing some work on "NVMe simple copy", and would like to discuss
and solicit the community's opinion on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copies within a single
namespace. Some of the problems that the original XCOPY work [2] faced may
not be applicable to simple-copy, e.g. splitting a single copy due to
differing device-specific limits (a sketch of such splitting follows).
I hope I'm not missing something in thinking so?
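
The field names below follow the NVMe 2.0 Identify Namespace copy limits (MSSRL, MSRC, MCL); the helper is illustrative only, not kernel code:

#include <stdint.h>

struct copy_limits {
	uint16_t mssrl;	/* max blocks in one source range */
	uint8_t  msrc;	/* max source ranges per command, 0-based */
	uint32_t mcl;	/* max total blocks per command */
};

/* How many blocks of one large copy fit into a single Copy command. */
static uint32_t one_copy_span(const struct copy_limits *l, uint32_t nlb)
{
	uint32_t per_cmd = ((uint32_t)l->msrc + 1) * l->mssrl;

	if (per_cmd > l->mcl)
		per_cmd = l->mcl;
	return nlb < per_cmd ? nlb : per_cmd;
}

Anything beyond that span becomes another Copy command rather than a
failure, which is exactly the splitting that tripped up XCOPY across
stacked devices.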

- [Block I/O] An async interface (through io_uring or AIO) so that
multiple copy operations can be queued.
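
For contrast, today's host-mediated "copy" under io_uring is a linked
READ+WRITE pair, so the payload still bounces through host memory (a
sketch using liburing; io_uring had no copy opcode at the time of this
thread):

#include <liburing.h>

/* Queue one copy unit as a linked read+write through a bounce buffer. */
static int queue_copy(struct io_uring *ring, int src_fd, int dst_fd,
		      void *buf, unsigned int len, off_t src_off,
		      off_t dst_off)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -1;
	io_uring_prep_read(sqe, src_fd, buf, len, src_off);
	sqe->flags |= IOSQE_IO_LINK;	/* run the write after the read */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		return -1;
	io_uring_prep_write(sqe, dst_fd, buf, len, dst_off);
	return 0;	/* caller does io_uring_submit() */
}

An offloaded copy opcode would queue the same unit of work without the
bounce buffer.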

- [File I/O to user-space] I think it may make sense to extend the
copy_file_range API to do an in-device copy as well.
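
Usage would be unchanged from today's copy_file_range(), which already
lets the kernel short-circuit the data path (reflink, NFS server-side
copy); an in-device copy would be one more back end behind the same call
(helper name ours):

#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Copy len bytes from src_fd to dst_fd, restarting on short copies. */
static ssize_t copy_whole(int src_fd, int dst_fd, size_t len)
{
	loff_t src_off = 0, dst_off = 0;
	ssize_t done = 0, n;

	while ((size_t)done < len) {
		n = copy_file_range(src_fd, &src_off, dst_fd, &dst_off,
				    len - done, 0);
		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* EOF on the source */
		done += n;
	}
	return done;
}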

- [F2FS] F2FS GC may leverage the interface. Currently it uses the page
cache, which is fair. But for relatively cold/warm data (which needs to be
garbage-collected anyway), it could instead bypass the host and avoid the
scenario where something useful gets evicted from the cache.

- [ZNS] ZNS users (kernel or user-space) will be log-structured and will
benefit from internal copy. But failure scenarios (partial copy,
write-pointer position) need to be discussed.

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf
> Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-
> nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-
> foundation.org
> Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; Martin K.
> Petersen <martin.petersen@oracle.com>; Matias Bjorling
> <Matias.Bjorling@wdc.com>; Stephen Bates <sbates@raithlin.com>;
> roland@purestorage.com; mpatocka@redhat.com; hare@suse.de; Keith Busch
> <kbusch@kernel.org>; rwheeler@redhat.com; Christoph Hellwig <hch@lst.de>;
> frederick.knight@netapp.com; zach.brown@ni.com
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
> 
> Hi all,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices to
> be instructed to copy files/logical blocks without requiring involvement
> of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by the end of Dennard scaling, and multi-threaded
> performance gains are slowing due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Work Group (TWG) [2],
> offloading computations to the device or over the fabrics is becoming
> popular, and several solutions are already available [2]. One common
> operation that keeps coming up in the kernel but has never been merged
> is copy offload, whether over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work, done by Martin, is present here [3]. The latest work,
> posted by Mikulas [4], is not merged yet. These two approaches are
> totally different from each other. Several storage vendors discourage
> mixing copy offload requests with regular READ/WRITE I/O. Also, the fact
> that the operation fails if a copy request ever needs to be split as it
> traverses the stack has the unfortunate side effect of preventing copy
> offload from working in pretty much every common deployment
> configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> [3] is hard pressed to handle arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT; [4]
> demonstrates why that makes [3] an unsuitable candidate. However, [4]'s
> two-command approach has its own unresolved problem: how to handle
> changes to the DM layout between the IN and the OUT operation.
> 
> * Why does the Linux Kernel Storage System need Copy Offload support now?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and its solutions
> [2], the existing SCSI XCOPY support in the protocol, recent
> advancements in the Linux kernel file system for zoned devices (Zonefs
> [5]), and peer-to-peer DMA support in the Linux kernel, mainly for NVMe
> devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF) will
> eventually benefit from a copy offload operation.
> 
> With this background, we have a significant number of use-cases that
> are strong candidates for the outstanding Linux Kernel Block Layer Copy
> Offload support, so that the Linux Kernel Storage subsystem can address
> the previously mentioned problems [1] and allow efficient offloading of
> data-related operations (such as move/copy).
> 
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.
> 
> * What will we discuss in the proposed session?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic and understand :-
> 
> 1. What are the blockers for a Copy Offload implementation?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward?
> 5. How can we help to contribute and move this work forward?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite block layer, device driver, and file system
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 
> Regards,
> Chaitanya
> 



^ permalink raw reply	[flat|nested] 272+ messages in thread

* RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2020-02-13  5:11     ` joshi.k
  0 siblings, 0 replies; 272+ messages in thread
From: joshi.k @ 2020-02-13  5:11 UTC (permalink / raw)
  To: 'Chaitanya Kulkarni',
	linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, frederick.knight, msnitzer, bvanassche,
	'Martin K. Petersen', javier, 'Stephen Bates',
	roland, 'Keith Busch',
	mpatocka, hare, joshi.k, rwheeler, 'Christoph Hellwig',
	'Matias Bjorling',
	zach.brown

I am very keen on this topic.
I've been doing some work for "NVMe simple copy", and would like to discuss
and solicit opinion of community on the following:

- Simple-copy, unlike XCOPY and P2P, is limited to copy within a single
namespace. Some of the problems that original XCOPY work [2] faced may not
be applicable for simple-copy, e.g. split of single copy due to differing
device-specific limits.
Hope I'm not missing something in thinking so?

- [Block I/O] Async interface (through io-uring or AIO) so that multiple
copy operations can be queued.

- [File I/O to user-space] I think it may make sense to extend
copy_file_range API to do in-device copy as well.

- [F2FS] F2FS garbage collection may leverage the interface. Currently it
copies through the page cache, which is fair. But for relatively cold/warm
data (which needs to be garbage-collected anyway), it could instead bypass
the host entirely and avoid evicting something useful from the cache.

- [ZNS] ZNS users (kernel or user-space) would be log-structured, and would
benefit from internal copy. But failure scenarios (partial copy,
write-pointer position) need to be discussed.
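
For reference, a minimal user-space sketch of the copy_file_range() call as
it exists today (the file paths and the 1 MiB length are illustrative, and
error handling is elided); the extension proposed above would keep this
interface and simply let the kernel execute the copy inside the device:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	int in = open("/mnt/src.img", O_RDONLY);
	int out = open("/mnt/dst.img", O_WRONLY | O_CREAT, 0644);
	loff_t off_in = 0, off_out = 0;
	ssize_t ret;

	/* Ask the kernel to copy up to 1 MiB; may return a short count. */
	ret = copy_file_range(in, &off_in, out, &off_out, 1 << 20, 0);
	if (ret < 0)
		perror("copy_file_range");
	else
		printf("copied %zd bytes\n", ret);

	close(in);
	close(out);
	return ret < 0 ? 1 : 0;
}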

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf
> Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; linux-
> nvme@lists.infradead.org; dm-devel@redhat.com; lsf-pc@lists.linux-
> foundation.org
> Cc: axboe@kernel.dk; msnitzer@redhat.com; bvanassche@acm.org; Martin K.
> Petersen <martin.petersen@oracle.com>; Matias Bjorling
> <Matias.Bjorling@wdc.com>; Stephen Bates <sbates@raithlin.com>;
> roland@purestorage.com; mpatocka@redhat.com; hare@suse.de; Keith Busch
> <kbusch@kernel.org>; rwheeler@redhat.com; Christoph Hellwig <hch@lst.de>;
> frederick.knight@netapp.com; zach.brown@ni.com
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
> 
> Hi all,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by Dennard scaling and multi-threaded performance
> is slowing down due to Moore's law limitations. With the rise of the
> SNIA Computational Storage Technical Working Group (TWG) [2], offloading
> computations to the device or over the fabrics is becoming popular, as
> several solutions are available [2]. One common operation that is
> popular in the kernel and is not merged yet is Copy offload over the
> fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work, done by Martin, is present here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs to
> be split as it traverses the stack has the unfortunate side-effect of
> preventing copy offload from working in pretty much every common
> deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> [3] finds it hard to handle arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT, which
> is why [4] demonstrates that [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCOPY support in the protocol, recent advancements in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to
> Peer DMA support in the Linux Kernel mainly for NVMe devices [7],
> eventually NVMe Devices and subsystems (NVMe PCIe/NVMeOF) will benefit
> from the Copy offload operation.
> 
> With this background we have a significant number of use-cases which
> are strong candidates waiting for outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy etc.).
> 
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite block layer, device driver and file system
> developers to :-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
> 
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 
> Regards,
> Chaitanya
> 




^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14   ` Chaitanya Kulkarni
  (?)
@ 2020-01-24 14:23     ` Nikos Tsironis
  -1 siblings, 0 replies; 272+ messages in thread
From: Nikos Tsironis @ 2020-01-24 14:23 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

On 1/7/20 8:14 PM, Chaitanya Kulkarni wrote:
> Hi all,
> 
> * Background :-
> -----------------------------------------------------------------------
> 
> Copy offload is a feature that allows file-systems or storage devices
> to be instructed to copy files/logical blocks without requiring
> involvement of the local CPU.
> 
> With reference to the RISC-V summit keynote [1], single-threaded
> performance is limited by Dennard scaling and multi-threaded
> performance is slowing down due to Moore's law limitations. With the
> rise of the SNIA Computational Storage Technical Working Group (TWG)
> [2], offloading computations to the device or over the fabrics is
> becoming popular, as several solutions are available [2]. One common
> operation that is popular in the kernel and is not merged yet is Copy
> offload over the fabrics or onto the device.
> 
> * Problem :-
> -----------------------------------------------------------------------
> 
> The original work, done by Martin, is present here [3]. The latest
> work, posted by Mikulas [4], is not merged yet. These two approaches
> are totally different from each other. Several storage vendors
> discourage mixing copy offload requests with regular READ/WRITE I/O.
> Also, the fact that the operation fails if a copy request ever needs
> to be split as it traverses the stack has the unfortunate side-effect
> of preventing copy offload from working in pretty much every common
> deployment configuration out there.
> 
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> [3] finds it hard to handle arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT, which
> is why [4] demonstrates that [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.
> 
> * Why Linux Kernel Storage System needs Copy Offload support now ?
> -----------------------------------------------------------------------
> 
> With the rise of the SNIA Computational Storage TWG and solutions [2],
> existing SCSI XCOPY support in the protocol, recent advancements in the
> Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer to
> Peer DMA support in the Linux Kernel mainly for NVMe devices [7],
> eventually NVMe Devices and subsystems (NVMe PCIe/NVMeOF) will benefit
> from the Copy offload operation.
> 
> With this background we have a significant number of use-cases which
> are strong candidates waiting for outstanding Linux Kernel Block Layer
> Copy Offload support, so that the Linux Kernel Storage subsystem can
> address the previously mentioned problems [1] and allow efficient
> offloading of data-related operations (such as move/copy etc.).
> 
> For reference, the following is the list of use-cases/candidates
> waiting for Copy Offload support :-
> 
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers supporting XCopy DM/MD.
> 3. Computational Storage solutions.
> 4. File systems :- Local, NFS and Zonefs.
> 5. Block devices :- Distributed, local, and Zoned devices.
> 6. Peer to Peer DMA support solutions.
> 7. Potentially NVMe subsystem both NVMe PCIe and NVMeOF.
> 
> * What we will discuss in the proposed session ?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session to go over this topic to understand :-
> 
> 1. What are the blockers for Copy Offload implementation ?
> 2. Discussion about having a file system interface.
> 3. Discussion about having right system call for user-space.
> 4. What is the right way to move this work forward ?
> 5. How can we help to contribute and move this work forward ?
> 
> * Required Participants :-
> -----------------------------------------------------------------------
> 
> I'd like to invite block layer, device driver and file system
> developers to:-
> 
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
> 
> Required attendees :-
> 
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
> 
> [1]https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
> https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>        https://www.eideticom.com/products.html
> https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-
> namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
> 
> Regards,
> Chaitanya
> 

This is a very interesting topic and I would like to participate in the
discussion too.

The dm-clone target would also benefit from copy offload, as it heavily
employs dm-kcopyd. I have been exploring redesigning kcopyd in order to
achieve increased IOPS in dm-clone and dm-snapshot for small copies over
NVMe devices, but copy offload sounds even more promising, especially
for larger copies happening in the background (as is the case with
dm-clone's background hydration).
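
For reference, a minimal kernel-side sketch of how a target drives
dm-kcopyd today (it assumes a client obtained from
dm_kcopyd_client_create(); the helper name and its parameters are
illustrative). A block-layer copy offload primitive could slot in
underneath this same interface:

#include <linux/dm-io.h>
#include <linux/dm-kcopyd.h>

static void copy_done(int read_err, unsigned long write_err, void *context)
{
	/* Runs in kcopyd context once the read and the write complete. */
}

static void start_copy(struct dm_kcopyd_client *kc,
		       struct block_device *src_bdev, sector_t src,
		       struct block_device *dst_bdev, sector_t dst,
		       sector_t nr_sectors, void *context)
{
	struct dm_io_region from = {
		.bdev   = src_bdev,
		.sector = src,
		.count  = nr_sectors,
	};
	struct dm_io_region to = {
		.bdev   = dst_bdev,
		.sector = dst,
		.count  = nr_sectors,
	};

	/* One source, one destination; completion goes to copy_done(). */
	dm_kcopyd_copy(kc, &from, 1, &to, 0, copy_done, context);
}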

Thanks,
Nikos

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18     ` Bart Van Assche
  (?)
@ 2020-01-10  5:33       ` Martin K. Petersen
  -1 siblings, 0 replies; 272+ messages in thread
From: Martin K. Petersen @ 2020-01-10  5:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc, axboe, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling


Bart,

> * Copying must be supported not only within a single storage device but
>   also between storage devices.

Identifying which devices to permit copies between has been challenging.
That has since been addressed in T10.

> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).

I don't think LID1 vs LID4 is particularly interesting for the Linux use
case. It's just an additional command tag since the copy manager is a
third party.

> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).

Microsoft uses the token commands.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18     ` Bart Van Assche
  (?)
@ 2020-01-09  5:56       ` Damien Le Moal
  -1 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2020-01-09  5:56 UTC (permalink / raw)
  To: Bart Van Assche, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> With [3] being hard to handle arbitrary DM/MD stacking without
>> splitting the command in two, one for copying IN and one for copying
>> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
>> candidate. Also, with [4] there is an unresolved problem with the
>> two-command approach about how to handle changes to the DM layout
>> between an IN and OUT operations.
> 
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).

Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.

> 
> Thanks,
> 
> Bart.
> 
> 
> This is my own collection with two year old notes about copy offloading
> for the Linux Kernel:
> 
> Potential Users
> * All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
>   dm-writecache and dm-zoned.
> * Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
>   and RAID, at least if RAID is supported by the filesystem. Note: the
>   BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
>   should use FICLONERANGE instead.
> * Network filesystems, e.g. NFS. Copying at the server side can reduce
>   network traffic significantly.
> * Linux SCSI initiator systems connected to SAN systems such that
>   copying can happen locally on the storage array. XCOPY is widely used
>   for provisioning virtual machine images.
> * Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.
> 
> Requirements
> * The block layer must gain support for XCOPY. The new XCOPY API must
>   support asynchronous operation such that users of this API are not
>   blocked while the XCOPY operation is in progress.
> * Copying must be supported not only within a single storage device but
>   also between storage devices.
> * The SCSI sd driver must gain support for XCOPY.
> * A user space API must be added and that API must support asynchronous
>   (non-blocking) operation.
> * The block layer XCOPY primitive must be supported by the device mapper.
> 
> SCSI Extended Copy (ANSI T10 SPC)
> The SCSI commands that support extended copy operations are:
> * POPULATE TOKEN + WRITE USING TOKEN.
> * EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
>   List Identifier length of 1 byte and LID4 stands for a List Identifier
>   length of 4 bytes.
> * SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
>   EXTENDED COPY(LID4) (83h/01h).
> 
> Existing Users and Implementations of SCSI XCOPY
> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).
> * Storage vendors all support XCOPY, but ODX support is growing.
> 
> Block Layer Notes
> The block layer supports the following types of block drivers:
> * blk-mq request-based drivers.
> * make_request drivers.
> 
> Notes:
> With each request a list of bio's is associated.
> Since submit_bio() only accepts a single bio and not a bio list this
> means that all make_request block drivers process one bio at a time.
> 
> Device Mapper
> The device mapper core supports bio processing and blk-mq requests. The
> function in the device mapper that creates a request queue is called
> alloc_dev(). That function not only allocates a request queue but also
> associates a struct gendisk with the request queue. The
> DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
> DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
> causes the type of a dm device to be set to one of the following:
> DM_TYPE_NONE;
> DM_TYPE_BIO_BASED;
> DM_TYPE_REQUEST_BASED;
> DM_TYPE_MQ_REQUEST_BASED;
> DM_TYPE_DAX_BIO_BASED;
> DM_TYPE_NVME_BIO_BASED.
> 
> Device mapper drivers must implement target_type.map(),
> target_type.clone_and_map_rq() or both. .map() maps a bio list.
> .clone_and_map_rq() maps a single request. The multipath and error
> device mapper drivers implement both methods. All other dm drivers only
> implement the .map() method.
> 
> Device mapper bio processing
> submit_bio()
> -> generic_make_request()
>   -> dm_make_request()
>     -> __dm_make_request()
>       -> __split_and_process_bio()
>         -> __split_and_process_non_flush()
>           -> __clone_and_map_data_bio()
>           -> alloc_tio()
>           -> clone_bio()
>             -> bio_advance()
>           -> __map_bio()
> 
> Existing Linux Copy Offload APIs
> * The FICLONERANGE ioctl. From <include/linux/fs.h>:
>   #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
> 
> struct file_clone_range {
> 	__s64 src_fd;
> 	__u64 src_offset;
> 	__u64 src_length;
> 	__u64 dest_offset;
> };
> 
> * The sendfile() system call. sendfile() copies a given number of bytes
>   from one file to another. The output offset is the offset of the
>   output file descriptor. The input offset is either the input file
>   descriptor offset or can be specified explicitly. The sendfile()
>   prototype is as follows:
>   ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
>   ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
> * The copy_file_range() system call. See also vfs_copy_file_range(). Its
>   prototype is as follows:
>   ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>      loff_t *off_out, size_t len, unsigned int flags);
> * The splice() system call is not appropriate for adding extended copy
>   functionality since it copies data from or to a pipe. Its prototype is
>   as follows:
>   long splice(struct file *in, loff_t *off_in, struct file *out,
>     loff_t *off_out, size_t len, unsigned int flags);
> 
> Existing Linux Block Layer Copy Offload Implementations
> * Martin Petersen's REQ_COPY bio, where source and destination block
>   device are both specified in the same bio. Only works for block
>   devices. Does not work for files. Adds a new blocking ioctl() for
>   XCOPY from user space.
> * Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
>   REQ_OP_COPY_READ operations. These are sent individually down stacked
>   drivers and are paired by the driver at the bottom of the stack.
> 
> 
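
For concreteness, a minimal user-space sketch of the FICLONERANGE ioctl
quoted above (the file names are illustrative, and the filesystem must
support reflinks, e.g. XFS or Btrfs):

#include <fcntl.h>
#include <linux/fs.h>	/* FICLONERANGE, struct file_clone_range */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int src = open("/mnt/golden.img", O_RDONLY);
	int dst = open("/mnt/clone.img", O_RDWR | O_CREAT, 0644);
	struct file_clone_range fcr = {
		.src_fd      = src,
		.src_offset  = 0,
		.src_length  = 1 << 20,	/* share 1 MiB of extents */
		.dest_offset = 0,
	};

	/* The ioctl is issued on the destination file descriptor. */
	if (ioctl(dst, FICLONERANGE, &fcr))
		perror("FICLONERANGE");

	close(src);
	close(dst);
	return 0;
}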


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 272+ messages in thread


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2020-01-09  5:56       ` Damien Le Moal
  0 siblings, 0 replies; 272+ messages in thread
From: Damien Le Moal @ 2020-01-09  5:56 UTC (permalink / raw)
  To: Bart Van Assche, Chaitanya Kulkarni, linux-block, linux-scsi,
	linux-nvme, dm-devel, lsf-pc
  Cc: axboe, msnitzer, Martin K. Petersen, Matias Bjorling,
	Stephen Bates, roland, mpatocka, Keith Busch, rwheeler,
	Christoph Hellwig, frederick.knight, zach.brown

On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> With [3] being hard to handle arbitrary DM/MD stacking without
>> splitting the command in two, one for copying IN and one for copying
>> OUT. Which is then demonstrated by the [4] why [3] it is not a suitable
>> candidate. Also, with [4] there is an unresolved problem with the
>> two-command approach about how to handle changes to the DM layout
>> between an IN and OUT operations.
> 
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).

Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.

> 
> Thanks,
> 
> Bart.
> 
> 
> This is my own collection with two year old notes about copy offloading
> for the Linux Kernel:
> 
> Potential Users
> * All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
>   dm-writecache and dm-zoned.
> * Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
>   and RAID, at least if RAID is supported by the filesystem. Note: the
>   BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
>   should use FICLONERANGE instead.
> * Network filesystems, e.g. NFS. Copying at the server side can reduce
>   network traffic significantly.
> * Linux SCSI initiator systems connected to SAN systems such that
>   copying can happen locally on the storage array. XCOPY is widely used
>   for provisioning virtual machine images.
> * Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.
> 
> Requirements
> * The block layer must gain support for XCOPY. The new XCOPY API must
>   support asynchronous operation such that users of this API are not
>   blocked while the XCOPY operation is in progress.
> * Copying must be supported not only within a single storage device but
>   also between storage devices.
> * The SCSI sd driver must gain support for XCOPY.
> * A user space API must be added and that API must support asynchronous
>   (non-blocking) operation.
> * The block layer XCOPY primitive must be support by the device mapper.
> 
> SCSI Extended Copy (ANSI T10 SPC)
> The SCSI commands that support extended copy operations are:
> * POPULATE TOKEN + WRITE USING TOKEN.
> * EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
>   List Identifier length of 1 byte and LID4 stands for a List Identifier
>   length of 4 bytes.
> * SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
>   EXTENDED COPY(LID4) (83h/01h).
> 
> Existing Users and Implementations of SCSI XCOPY
> * VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
> * Microsoft, which uses ODX (aka LID4 because it has a four-byte length
>   ID).
> * Storage vendors all support XCOPY, but ODX support is growing.
> 
> Block Layer Notes
> The block layer supports the following types of block drivers:
> * blk-mq request-based drivers.
> * make_request drivers.
> 
> Notes:
> With each request a list of bio's is associated.
> Since submit_bio() only accepts a single bio and not a bio list this
> means that all make_request block drivers process one bio at a time.
> 
> Device Mapper
> The device mapper core supports bio processing and blk-mq requests. The
> function in the device mapper that creates a request queue is called
> alloc_dev(). That function not only allocates a request queue but also
> associates a struct gendisk with the request queue. The
> DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
> DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
> causes the type of a dm device to be set to one of the following:
> DM_TYPE_NONE;
> DM_TYPE_BIO_BASED;
> DM_TYPE_REQUEST_BASED;
> DM_TYPE_MQ_REQUEST_BASED;
> DM_TYPE_DAX_BIO_BASED;
> DM_TYPE_NVME_BIO_BASED.
> 
> Device mapper drivers must implement target_type.map(),
> target_type.clone_and_map_rq() or both. .map() maps a bio list.
> .clone_and_map_rq() maps a single request. The multipath and error
> device mapper drivers implement both methods. All other dm drivers only
> implement the .map() method.
> 
> Device mapper bio processing
> submit_bio()
> -> generic_make_request()
>   -> dm_make_request()
>     -> __dm_make_request()
>       -> __split_and_process_bio()
>         -> __split_and_process_non_flush()
>           -> __clone_and_map_data_bio()
>           -> alloc_tio()
>           -> clone_bio()
>             -> bio_advance()
>           -> __map_bio()
> 
> Existing Linux Copy Offload APIs
> * The FICLONERANGE ioctl. From <include/linux/fs.h>:
>   #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
> 
> struct file_clone_range {
> 	__s64 src_fd;
> 	__u64 src_offset;
> 	__u64 src_length;
> 	__u64 dest_offset;
> };
> 
> * The sendfile() system call. sendfile() copies a given number of bytes
>   from one file to another. The output offset is the offset of the
>   output file descriptor. The input offset is either the input file
>   descriptor offset or can be specified explicitly. The sendfile()
>   prototype is as follows:
>   ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
>   ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
> * The copy_file_range() system call. See also vfs_copy_file_range(). Its
>   prototype is as follows:
>   ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>      loff_t *off_out, size_t len, unsigned int flags);
> * The splice() system call is not appropriate for adding extended copy
>   functionality since it copies data from or to a pipe. Its prototype is
>   as follows:
>   long splice(struct file *in, loff_t *off_in, struct file *out,
>     loff_t *off_out, size_t len, unsigned int flags);
> 
> Existing Linux Block Layer Copy Offload Implementations
> * Martin Petersen's REQ_COPY bio, where source and destination block
>   device are both specified in the same bio. Only works for block
>   devices. Does not work for files. Adds a new blocking ioctl() for
>   XCOPY from user space.
> * Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
>   REQ_OP_COPY_READ operations. These are sent individually down stacked
>   drivers and are paired by the driver at the bottom of the stack.
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 272+ messages in thread

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-09  3:18     ` Bart Van Assche
@ 2020-01-09  4:01       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-09  4:01 UTC (permalink / raw)
  To: Bart Van Assche, linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).
>
> Thanks,
>
> Bart.
>

Thanks for sharing this, Bart; this is very helpful.

I have not found any notes on LWN for the session that was held in 2018.



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14   ` Chaitanya Kulkarni
@ 2020-01-09  3:18     ` Bart Van Assche
  -1 siblings, 0 replies; 272+ messages in thread
From: Bart Van Assche @ 2020-01-09  3:18 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-scsi, linux-nvme,
	dm-devel, lsf-pc
  Cc: axboe, hare, Martin K. Petersen, Keith Busch, Christoph Hellwig,
	Stephen Bates, msnitzer, mpatocka, zach.brown, roland, rwheeler,
	frederick.knight, Matias Bjorling

On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
> * Current state of the work :-
> -----------------------------------------------------------------------
> 
> [3] is hard to extend to arbitrary DM/MD stacking without splitting
> the command in two, one for copying IN and one for copying OUT, which
> is why [4] demonstrates that [3] is not a suitable candidate. Also,
> with [4] there is an unresolved problem with the two-command approach:
> how to handle changes to the DM layout between the IN and OUT
> operations.

Was this last discussed during the 2018 edition of LSF/MM (see also
https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
taken notes during that session? I haven't found a report of that
session in the official proceedings (https://lwn.net/Articles/752509/).

Thanks,

Bart.


This is my own collection of two-year-old notes about copy offloading
for the Linux Kernel:

Potential Users
* All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
  dm-writecache and dm-zoned.
* Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
  and RAID, at least if RAID is supported by the filesystem. Note: the
  BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
  should use FICLONERANGE instead.
* Network filesystems, e.g. NFS. Copying at the server side can reduce
  network traffic significantly.
* Linux SCSI initiator systems connected to SAN systems such that
  copying can happen locally on the storage array. XCOPY is widely used
  for provisioning virtual machine images.
* Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.

Requirements
* The block layer must gain support for XCOPY. The new XCOPY API must
  support asynchronous operation such that users of this API are not
  blocked while the XCOPY operation is in progress.
* Copying must be supported not only within a single storage device but
  also between storage devices.
* The SCSI sd driver must gain support for XCOPY.
* A user space API must be added and that API must support asynchronous
  (non-blocking) operation.
* The block layer XCOPY primitive must be supported by the device mapper.

SCSI Extended Copy (ANSI T10 SPC)
The SCSI commands that support extended copy operations are:
* POPULATE TOKEN + WRITE USING TOKEN.
* EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
  List Identifier length of 1 byte and LID4 stands for a List Identifier
  length of 4 bytes.
* SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
  EXTENDED COPY(LID4) (83h/01h).

Existing Users and Implementations of SCSI XCOPY
* VMware, which uses XCOPY (with a one-byte length ID, aka LID1).
* Microsoft, which uses ODX (aka LID4 because it has a four-byte length
  ID).
* Storage vendors all support XCOPY, but ODX support is growing.

Block Layer Notes
The block layer supports the following types of block drivers:
* blk-mq request-based drivers.
* make_request drivers.

Notes:
With each request, a list of bios is associated.
Since submit_bio() only accepts a single bio and not a bio list, all
make_request block drivers process one bio at a time.
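
For illustration (a sketch only, not part of these notes; bdev, page
and sector are assumed to exist in the surrounding code, and the
pre-5.18 bio_alloc() signature is assumed), a kernel component submits
a single bio roughly as follows, one bio per submit_bio() call:

	struct bio *bio = bio_alloc(GFP_KERNEL, 1);	/* one bio_vec */

	bio_set_dev(bio, bdev);			/* target block device */
	bio->bi_iter.bi_sector = sector;	/* starting sector */
	bio->bi_opf = REQ_OP_READ;
	bio_add_page(bio, page, PAGE_SIZE, 0);	/* attach the data page */
	/* a real caller also sets bio->bi_end_io for completion */
	submit_bio(bio);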

Device Mapper
The device mapper core supports bio processing and blk-mq requests. The
function in the device mapper that creates a request queue is called
alloc_dev(). That function not only allocates a request queue but also
associates a struct gendisk with the request queue. The
DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
causes the type of a dm device to be set to one of the following:
DM_TYPE_NONE;
DM_TYPE_BIO_BASED;
DM_TYPE_REQUEST_BASED;
DM_TYPE_MQ_REQUEST_BASED;
DM_TYPE_DAX_BIO_BASED;
DM_TYPE_NVME_BIO_BASED.

Device mapper drivers must implement target_type.map(),
target_type.clone_and_map_rq() or both. .map() maps a bio list.
.clone_and_map_rq() maps a single request. The multipath and error
device mapper drivers implement both methods. All other dm drivers only
implement the .map() method.

Device mapper bio processing
submit_bio()
-> generic_make_request()
  -> dm_make_request()
    -> __dm_make_request()
      -> __split_and_process_bio()
        -> __split_and_process_non_flush()
          -> __clone_and_map_data_bio()
          -> alloc_tio()
          -> clone_bio()
            -> bio_advance()
          -> __map_bio()
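
To make the .map() step above concrete, a minimal bio-based target map
function in the style of dm-linear might look as follows (a sketch
only, not part of these notes; the sketch_* names are hypothetical and
the current bio-based DM API is assumed):

#include <linux/device-mapper.h>

struct sketch_ctx {
	struct dm_dev *dev;	/* underlying device */
	sector_t start;		/* offset into that device */
};

static int sketch_map(struct dm_target *ti, struct bio *bio)
{
	struct sketch_ctx *sc = ti->private;

	/* redirect the bio to the underlying device */
	bio_set_dev(bio, sc->dev->bdev);
	if (bio_sectors(bio))
		bio->bi_iter.bi_sector = sc->start +
			dm_target_offset(ti, bio->bi_iter.bi_sector);
	/* tell the DM core to resubmit the remapped bio */
	return DM_MAPIO_REMAPPED;
}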

Existing Linux Copy Offload APIs (a user-space usage sketch follows this list)
* The FICLONERANGE ioctl. From <include/linux/fs.h>:
  #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)

struct file_clone_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 src_length;
	__u64 dest_offset;
};

* The sendfile() system call. sendfile() copies a given number of bytes
  from one file to another. The output offset is the offset of the
  output file descriptor. The input offset is either the input file
  descriptor offset or can be specified explicitly. The sendfile()
  prototype is as follows:
  ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
  ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
* The copy_file_range() system call. See also vfs_copy_file_range(). Its
  prototype is as follows:
  ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
     loff_t *off_out, size_t len, unsigned int flags);
* The splice() system call is not appropriate for adding extended copy
  functionality since it copies data from or to a pipe. Its user-space
  prototype is as follows:
  ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
    loff_t *off_out, size_t len, unsigned int flags);
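
As a rough, self-contained illustration of the usable calls above (a
sketch only, not part of the original notes; the file names and the
1 MiB lengths are arbitrary, and error handling is abbreviated):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/sendfile.h>
#include <linux/fs.h>	/* FICLONERANGE, struct file_clone_range */

int main(void)
{
	int in = open("src.img", O_RDONLY);
	int out = open("dst.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (in < 0 || out < 0)
		return 1;

	/* copy_file_range(): in-kernel copy, a natural hook for offload */
	loff_t off_in = 0, off_out = 0;
	if (copy_file_range(in, &off_in, out, &off_out, 1 << 20, 0) < 0)
		perror("copy_file_range");

	/* sendfile(): in-kernel copy using the output fd's own offset */
	off_t pos = 0;
	if (sendfile(out, in, &pos, 1 << 20) < 0)
		perror("sendfile");

	/* FICLONERANGE: reflink a range (CoW filesystems only) */
	struct file_clone_range fcr = {
		.src_fd = in,
		.src_offset = 0,
		.src_length = 0,	/* 0 means "up to EOF" */
		.dest_offset = 0,
	};
	if (ioctl(out, FICLONERANGE, &fcr) < 0)
		perror("ioctl(FICLONERANGE)");

	close(in);
	close(out);
	return 0;
}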

Existing Linux Block Layer Copy Offload Implementations
* Martin Petersen's REQ_COPY bio, where source and destination block
  device are both specified in the same bio. Only works for block
  devices. Does not work for files. Adds a new blocking ioctl() for
  XCOPY from user space.
* Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
  REQ_OP_COPY_READ operations. These are sent individually down stacked
  drivers and are paired by the driver at the bottom of the stack.


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-08 10:17     ` Javier González
@ 2020-01-09  0:51       ` Logan Gunthorpe
  -1 siblings, 0 replies; 272+ messages in thread
From: Logan Gunthorpe @ 2020-01-09  0:51 UTC (permalink / raw)
  To: Javier González, Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, stephen



On 2020-01-08 3:17 a.m., Javier González wrote:
> I think this is a good topic and I would like to participate in the
> discussion too. I think that Logan Gunthorpe would also be interested
> (Cc). Adding Kanchan too, who is also working on this and can
> contribute to the discussion.
> 
> We discussed this at different SNIA events, both in the context of
> P2P and computational offloads and as the backend implementation for
> Simple Copy, which is coming in NVMe. Discussing this (again) at
> LSF/MM and finding a way to finally get XCOPY merged would be great.

Yes, I would definitely be interested in discussing copy offload,
especially in the context of P2P. Sorting out a user-space interface
for this that supports a P2P use case would be very beneficial to a
lot of folks.

Thanks,

Logan

* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
  2020-01-07 18:14   ` Chaitanya Kulkarni
@ 2020-01-08 10:17     ` Javier González
  -1 siblings, 0 replies; 272+ messages in thread
From: Javier González @ 2020-01-08 10:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc, axboe,
	bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling,
	Kanchan Joshi, Logan Gunthorpe, stephen

On 07.01.2020 18:14, Chaitanya Kulkarni wrote:
>Hi all,
>
>* Background :-
>-----------------------------------------------------------------------
>
>Copy offload is a feature that allows file-systems or storage devices
>to be instructed to copy files/logical blocks without requiring
>involvement of the local CPU.
>
>With reference to the RISC-V summit keynote [1], single-threaded
>performance is limited by the end of Dennard scaling, and
>multi-threaded performance is slowing down due to the limits of
>Moore's law. With the rise of the SNIA Computational Storage
>Technical Work Group (TWG) [2], offloading computations to the device
>or over the fabrics is becoming popular, and several solutions are
>already available [2]. One common operation that is frequently
>requested in the kernel but is not merged yet is Copy offload over
>the fabrics or onto the device.
>
>* Problem :-
>-----------------------------------------------------------------------
>
>The original work, done by Martin, is available here [3]. The latest
>work, posted by Mikulas [4], is not merged yet. These two approaches
>are completely different from each other. Several storage vendors
>discourage mixing copy offload requests with regular READ/WRITE I/O.
>Also, the fact that the operation fails if a copy request ever needs
>to be split as it traverses the stack has the unfortunate side effect
>of preventing copy offload from working in pretty much every common
>deployment configuration out there.
>
>* Current state of the work :-
>-----------------------------------------------------------------------
>
>[3] is hard to extend to arbitrary DM/MD stacking without splitting
>the command in two, one for copying IN and one for copying OUT, which
>is why [4] demonstrates that [3] is not a suitable candidate. Also,
>with [4] there is an unresolved problem with the two-command
>approach: how to handle changes to the DM layout between the IN and
>OUT operations.
>
>* Why the Linux Kernel Storage System needs Copy Offload support now?
>-----------------------------------------------------------------------
>
>With the rise of the SNIA Computational Storage TWG and solutions [2],
>existing SCSI XCopy support in the protocol, recent advancements in
>the Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer
>to Peer DMA support in the Linux Kernel, mainly for NVMe devices [7],
>NVMe Devices and subsystems (NVMe PCIe/NVMeOF) will eventually
>benefit from the Copy offload operation.
>
>With this background, we have a significant number of use-cases which
>are strong candidates waiting for the outstanding Linux Kernel Block
>Layer Copy Offload support, so that the Linux Kernel Storage
>subsystem can address the previously mentioned problems [1] and allow
>efficient offloading of data-related operations (such as move/copy
>etc.).
>
>For reference, the following is the list of use-cases/candidates
>waiting for Copy Offload support :-
>
>1. SCSI-attached storage arrays.
>2. Stacking drivers supporting XCopy DM/MD.
>3. Computational Storage solutions.
>4. File systems :- Local, NFS and Zonefs.
>5. Block devices :- Distributed, local, and Zoned devices.
>6. Peer to Peer DMA support solutions.
>7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.
>
>* What we will discuss in the proposed session?
>-----------------------------------------------------------------------
>
>I'd like to propose a session to go over this topic to understand :-
>
>1. What are the blockers for Copy Offload implementation?
>2. Discussion about having a file system interface.
>3. Discussion about having the right system call for user-space.
>4. What is the right way to move this work forward?
>5. How can we help to contribute and move this work forward?
>
>* Required Participants :-
>-----------------------------------------------------------------------
>
>I'd like to invite block layer, device driver, and file system
>developers to:-
>
>1. Share their opinion on the topic.
>2. Share their experience and any other issues with [4].
>3. Uncover additional details that are missing from this proposal.
>
>Required attendees :-
>
>Martin K. Petersen
>Jens Axboe
>Christoph Hellwig
>Bart Van Assche
>Stephen Bates
>Zach Brown
>Roland Dreier
>Ric Wheeler
>Trond Myklebust
>Mike Snitzer
>Keith Busch
>Sagi Grimberg
>Hannes Reinecke
>Frederick Knight
>Mikulas Patocka
>Matias Bjørling
>
>[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
>[2] https://www.snia.org/computational
>    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>    https://www.eideticom.com/products.html
>    https://www.xilinx.com/applications/data-center/computational-storage.html
>[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
>[4] https://www.spinics.net/lists/linux-block/msg00599.html
>[5] https://lwn.net/Articles/793585/
>[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
>[7] https://github.com/sbates130272/linux-p2pmem
>[8] https://kernel.dk/io_uring.pdf
>
>Regards,
>Chaitanya

I think this is a good topic and I would like to participate in the
discussion too. I think that Logan Gunthorpe would also be interested
(Cc). Adding Kanchan too, who is also working on this and can
contribute to the discussion.

We discussed this at different SNIA events, both in the context of
P2P and computational offloads and as the backend implementation for
Simple Copy, which is coming in NVMe. Discussing this (again) at
LSF/MM and finding a way to finally get XCOPY merged would be great.

Thanks,
Javier




* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
@ 2020-01-07 18:14   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 272+ messages in thread
From: Chaitanya Kulkarni @ 2020-01-07 18:14 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, dm-devel, lsf-pc
  Cc: axboe, bvanassche, hare, Martin K. Petersen, Keith Busch,
	Christoph Hellwig, Stephen Bates, msnitzer, mpatocka, zach.brown,
	roland, rwheeler, frederick.knight, Matias Bjorling

Hi all,

* Background :-
-----------------------------------------------------------------------

Copy offload is a feature that allows file-systems or storage devices
to be instructed to copy files/logical blocks without requiring
involvement of the local CPU.

With reference to the RISC-V summit keynote [1], single-threaded
performance is limited by the end of Dennard scaling, and
multi-threaded performance is slowing down due to the limits of
Moore's law. With the rise of the SNIA Computational Storage Technical
Work Group (TWG) [2], offloading computations to the device or over
the fabrics is becoming popular, and several solutions are already
available [2]. One common operation that is frequently requested in
the kernel but is not merged yet is Copy offload over the fabrics or
onto the device.

* Problem :-
-----------------------------------------------------------------------

The original work, done by Martin, is available here [3]. The latest
work, posted by Mikulas [4], is not merged yet. These two approaches
are completely different from each other. Several storage vendors
discourage mixing copy offload requests with regular READ/WRITE I/O.
Also, the fact that the operation fails if a copy request ever needs
to be split as it traverses the stack has the unfortunate side effect
of preventing copy offload from working in pretty much every common
deployment configuration out there.

* Current state of the work :-
-----------------------------------------------------------------------

[3] is hard to extend to arbitrary DM/MD stacking without splitting
the command in two, one for copying IN and one for copying OUT, which
is why [4] demonstrates that [3] is not a suitable candidate. Also,
with [4] there is an unresolved problem with the two-command approach:
how to handle changes to the DM layout between the IN and OUT
operations.

* Why the Linux Kernel Storage System needs Copy Offload support now?
-----------------------------------------------------------------------

With the rise of the SNIA Computational Storage TWG and solutions [2],
existing SCSI XCopy support in the protocol, recent advancements in
the Linux Kernel File System for Zoned devices (Zonefs [5]), and Peer
to Peer DMA support in the Linux Kernel, mainly for NVMe devices [7],
NVMe Devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit
from the Copy offload operation.

With this background, we have a significant number of use-cases which
are strong candidates waiting for the outstanding Linux Kernel Block
Layer Copy Offload support, so that the Linux Kernel Storage subsystem
can address the previously mentioned problems [1] and allow efficient
offloading of data-related operations (such as move/copy etc.).

For reference, the following is the list of use-cases/candidates
waiting for Copy Offload support :-

1. SCSI-attached storage arrays.
2. Stacking drivers supporting XCopy DM/MD.
3. Computational Storage solutions.
4. File systems :- Local, NFS and Zonefs.
5. Block devices :- Distributed, local, and Zoned devices.
6. Peer to Peer DMA support solutions.
7. Potentially NVMe subsystem, both NVMe PCIe and NVMeOF.

* What we will discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session to go over this topic to understand :-

1. What are the blockers for Copy Offload implementation?
2. Discussion about having a file system interface.
3. Discussion about having the right system call for user-space.
4. What is the right way to move this work forward?
5. How can we help to contribute and move this work forward?

* Required Participants :-
-----------------------------------------------------------------------

I'd like to invite block layer, device driver, and file system
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with [4].
3. Uncover additional details that are missing from this proposal.

Required attendees :-

Martin K. Petersen
Jens Axboe
Christoph Hellwig
Bart Van Assche
Stephen Bates
Zach Brown
Roland Dreier
Ric Wheeler
Trond Myklebust
Mike Snitzer
Keith Busch
Sagi Grimberg
Hannes Reinecke
Frederick Knight
Mikulas Patocka
Matias Bjørling

[1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
[2] https://www.snia.org/computational
    https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
    https://www.eideticom.com/products.html
    https://www.xilinx.com/applications/data-center/computational-storage.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
[4] https://www.spinics.net/lists/linux-block/msg00599.html
[5] https://lwn.net/Articles/793585/
[6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[7] https://github.com/sbates130272/linux-p2pmem
[8] https://kernel.dk/io_uring.pdf

Regards,
Chaitanya
