From: "Javier González" <javier@javigon.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Ajay Joshi" <Ajay.Joshi@wdc.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Keith Busch" <Keith.Busch@wdc.com>,
	"Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
	"Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Hans Holmberg" <Hans.Holmberg@wdc.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Christoph Hellwig" <hch@lst.de>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Wed, 17 Jun 2020 09:11:41 +0200
Message-ID: <20200617071141.rfy545k2vlzkroby@mpHalley.localdomain>
In-Reply-To: <CY4PR04MB3751808DFE9AF00EF172DFCCE79A0@CY4PR04MB3751.namprd04.prod.outlook.com>

On 17.06.2020 06:54, Damien Le Moal wrote:
>On 2020/06/17 15:18, Javier González wrote:
>> On 17.06.2020 00:38, Damien Le Moal wrote:
>>> On 2020/06/17 1:13, Javier González wrote:
>>>> On 16.06.2020 09:07, Keith Busch wrote:
>>>>> On Tue, Jun 16, 2020 at 05:55:26PM +0200, Javier González wrote:
>>>>>> On 16.06.2020 08:48, Keith Busch wrote:
>>>>>>> On Tue, Jun 16, 2020 at 05:02:17PM +0200, Javier González wrote:
>>>>>>>> This depends very much on how the FS / application is managing
>>>>>>>> striping. At the moment our main use case is enabling user-space
>>>>>>>> applications submitting I/Os to raw ZNS devices through the kernel.
>>>>>>>>
>>>>>>>> Can we enable this use case to start with?
>>>>>>>
>>>>>>> I think this already provides that. You can set the nsid value to
>>>>>>> whatever you want in the passthrough interface, so a namespace block
>>>>>>> device is not required to issue I/O to a ZNS namespace from user space.
>>>>>>
>>>>>> Mmmmm. Problem now is that the check on the nvme driver prevents the ZNS
>>>>>> namespace from being initialized. Am I missing something?
>>>>>
>>>>> Hm, okay, it may not work for you. We need the driver to create at least
>>>>> one namespace so that we have tags and request_queue. If you have that,
>>>>> you can issue IO to any other attached namespace through the passthrough
>>>>> interface, but we can't assume there is an available namespace.
>>>>
>>>> That makes sense for now.
>>>>
>>>> The next step for us is to enable a passthrough on uring, making sure
>>>> that I/Os do not split.
>>>
>>> Passthrough as in "application issues directly NVMe commands" like for SG_IO
>>> with SCSI ? Or do you mean raw block device file accesses by the application,
>>> meaning that the IO goes through the block IO stack as opposed to directly going
>>> to the driver ?
>>>
>>> For the latter case, I do not think it is possible to guarantee that an IO will
>>> not get split unless we are talking about single page IOs (e.g. 4K on X86). See
>>> a somewhat similar request here and comments about it.
>>>
>>> https://www.spinics.net/lists/linux-block/msg55079.html
>>
>> At the moment we are doing the former, but it looks like a hack to me to
>> go directly to the NVMe driver.
>
>That is what the nvme driver ioctl() is for no ? An application can send an NVMe
>command directly to the driver with it. That is not a hack, but the regular way
>of doing passthrough for NVMe, isn't it ?

We have enabled it through uring to get async passthru submission. It
looks like a hack at the moment, but we might just send an RFC to have
something concrete to base the discussion on.

>
>> I was thinking that we could enable the second path by making use of
>> chunk_sectors and limit the I/O size just as the append_max_io_size
>> does. Is this the complete wrong way of looking at it?
>
>The block layer cannot limit the size of a passthrough command since the command
>is protocol specific and the block layer is a protocol independent interface.

Agreed. At the moment, this work depends on the application being aware
of a max I/O size. Down the road, we will remove (or at least greatly
reduce) this constraint for ZNS devices that can eventually cache
out-of-order I/Os.

>SCSI SG does not split passthrough requests, it cannot. For passthrough
>commands, the command buffer can be dma-mapped or it cannot. If mapping
>succeeds, the command is issued. If it cannot, the command is failed. At least,
>that is my understanding of how the stack is working.

I am not familiar with SCSI SG. This looks like how the ioctl() passthru
works in NVMe, but as mentioned above, we would like to enable an async
passthru path.

Thanks,
Javier
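
Editor's illustration of the "application being aware of a max I/O size"
point above. This is only a minimal sketch under stated assumptions, not
code from the patch series or from any driver: the function name
split_io and the parameters max_io_sectors and zone_sectors are
hypothetical, and zones are assumed to be fixed-size and fully writable.
It shows how an application that submits passthrough commands (which the
block layer will not split) might cut a large I/O into pieces that stay
within a maximum size and never cross a zone boundary.

```python
def split_io(start, length, max_io_sectors, zone_sectors):
    """Split the range [start, start + length) into (offset, count) pieces.

    All values are in sectors. Every piece is at most max_io_sectors long
    and lies entirely inside one zone of zone_sectors sectors.
    Names and the fixed-size-zone model are assumptions for illustration.
    """
    if length < 0 or max_io_sectors <= 0 or zone_sectors <= 0:
        raise ValueError("length must be >= 0 and limits must be > 0")

    pieces = []
    pos = start
    end = start + length
    while pos < end:
        # First sector of the zone that follows the one containing pos.
        zone_end = (pos // zone_sectors + 1) * zone_sectors
        # Stop at whichever limit comes first: end of the request,
        # the maximum I/O size, or the zone boundary.
        count = min(end - pos, max_io_sectors, zone_end - pos)
        pieces.append((pos, count))
        pos += count
    return pieces


if __name__ == "__main__":
    # A 6-sector I/O starting 2 sectors before a 16-sector zone boundary.
    for offset, count in split_io(14, 6, 8, 16):
        print(offset, count)
```

Each resulting piece would then be submitted as its own command; how the
real limits are discovered (for example from the device or from sysfs)
is outside the scope of this sketch.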