From: "Javier González" <javier@javigon.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
"Niklas Cassel" <Niklas.Cassel@wdc.com>,
"Ajay Joshi" <Ajay.Joshi@wdc.com>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Keith Busch" <Keith.Busch@wdc.com>,
"Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
"Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"Hans Holmberg" <Hans.Holmberg@wdc.com>,
"Keith Busch" <kbusch@kernel.org>,
"Matias Bjørling" <mb@lightnvm.io>,
"Christoph Hellwig" <hch@lst.de>,
"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Wed, 17 Jun 2020 09:34:30 +0200
Message-ID: <20200617073430.htbwj6ybkbui7jai@mpHalley.localdomain>
In-Reply-To: <CY4PR04MB375134A3B37A43AD6AF07815E79A0@CY4PR04MB3751.namprd04.prod.outlook.com>
On 17.06.2020 07:29, Damien Le Moal wrote:
>On 2020/06/17 16:11, Javier González wrote:
>> On 17.06.2020 06:54, Damien Le Moal wrote:
>>> On 2020/06/17 15:18, Javier González wrote:
>>>> On 17.06.2020 00:38, Damien Le Moal wrote:
>>>>> On 2020/06/17 1:13, Javier González wrote:
>>>>>> On 16.06.2020 09:07, Keith Busch wrote:
>>>>>>> On Tue, Jun 16, 2020 at 05:55:26PM +0200, Javier González wrote:
>>>>>>>> On 16.06.2020 08:48, Keith Busch wrote:
>>>>>>>>> On Tue, Jun 16, 2020 at 05:02:17PM +0200, Javier González wrote:
>>>>>>>>>> This depends very much on how the FS / application is managing
>>>>>>>>>> striping. At the moment our main use case is enabling user-space
>>>>>>>>>> applications submitting I/Os to raw ZNS devices through the kernel.
>>>>>>>>>>
>>>>>>>>>> Can we enable this use case to start with?
>>>>>>>>>
>>>>>>>>> I think this already provides that. You can set the nsid value to
>>>>>>>>> whatever you want in the passthrough interface, so a namespace block
>>>>>>>>> device is not required to issue I/O to a ZNS namespace from user space.
>>>>>>>>
>>>>>>>> Mmmmm. Problem now is that the check on the nvme driver prevents the ZNS
>>>>>>>> namespace from being initialized. Am I missing something?
>>>>>>>
>>>>>>> Hm, okay, it may not work for you. We need the driver to create at least
>>>>>>> one namespace so that we have tags and request_queue. If you have that,
>>>>>>> you can issue IO to any other attached namespace through the passthrough
>>>>>>> interface, but we can't assume there is an available namespace.
>>>>>>
>>>>>> That makes sense for now.
>>>>>>
>>>>>> The next step for us is to enable a passthrough on uring, making sure
>>>>>> that I/Os do not split.
>>>>>
>>>>> Passthrough as in "application issues directly NVMe commands" like for SG_IO
>>>>> with SCSI ? Or do you mean raw block device file accesses by the application,
>>>>> meaning that the IO goes through the block IO stack as opposed to directly going
>>>>> to the driver ?
>>>>>
>>>>> For the latter case, I do not think it is possible to guarantee that an IO will
>>>>> not get split unless we are talking about single page IOs (e.g. 4K on X86). See
>>>>> a somewhat similar request here and comments about it.
>>>>>
>>>>> https://www.spinics.net/lists/linux-block/msg55079.html
>>>>
>>>> At the moment we are doing the former, but it looks like a hack to me to
>>>> go directly to the NVMe driver.
>>>
>>> That is what the nvme driver ioctl() is for no ? An application can send an NVMe
>>> command directly to the driver with it. That is not a hack, but the regular way
>>> of doing passthrough for NVMe, isn't it ?
>>
>> We have enabled it through uring to get async() passthru submission.
>> It looks like a hack at the moment, but we might just send an RFC to have
>> something concrete to base the discussion on.
>
>Yes, that would clarify things.
>
>>>> I was thinking that we could enable the second path by making use of
>>>> chunk_sectors and limit the I/O size just as the append_max_io_size
>>>> does. Is this the complete wrong way of looking at it?
>>>
>>> The block layer cannot limit the size of a passthrough command since the command
>>> is protocol specific and the block layer is a protocol independent interface.
>>
>> Agree. This work depends on the application being aware of a max I/O size
>> at the moment. Down the road, we will remove (or at least limit a lot)
>> this constraint for ZNS devices that can eventually cache out-of-order
>> I/Os.
>
>I/Os with a data buffer all need mapping for DMA, no matter the device
>functionalities or the command being executed. With passthrough, I do not think
>it is possible to have the block layer limit anything. It will likely always be
>pass-or-fail. With passthrough, the application needs to understand what it is
>doing.
Yes. It is definitely for applications that implement zone-aware logic
directly.
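
As a rough illustration of what "zone-aware" means for the application
here: since passthrough is pass-or-fail, the application has to cut its
own I/Os so that none exceeds the maximum I/O size and none crosses a
zone boundary. A minimal sketch follows. The function name split_io and
its parameters are hypothetical, and the zone size and maximum I/O size
(both in logical blocks) are assumed to have been read from the device
beforehand; nothing here is taken from the patch set.

```python
from typing import Iterator, Tuple


def split_io(start_lba: int, nr_blocks: int,
             max_io_blocks: int, zone_blocks: int) -> Iterator[Tuple[int, int]]:
    """Yield (lba, length) chunks covering [start_lba, start_lba + nr_blocks).

    Hypothetical helper: each chunk is at most max_io_blocks long and never
    crosses a zone boundary, assuming fixed-size zones of zone_blocks blocks.
    """
    if nr_blocks < 0 or max_io_blocks <= 0 or zone_blocks <= 0:
        raise ValueError("sizes must be positive and nr_blocks non-negative")

    lba = start_lba
    remaining = nr_blocks
    while remaining > 0:
        # Blocks left before the end of the zone that contains lba.
        to_zone_end = zone_blocks - (lba % zone_blocks)
        length = min(remaining, max_io_blocks, to_zone_end)
        yield lba, length
        lba += length
        remaining -= length
```

Each (lba, length) pair would then be submitted as one passthrough
command; if any of them fails, it is up to the application to handle it.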
>
>>
>>> SCSI SG does not split passthrough requests, it cannot. For passthrough
>>> commands, the command buffer can be dma-mapped or it cannot. If mapping
>>> succeeds, the command is issued. If it cannot, the command is failed. At least,
>>> that is my understanding of how the stack is working.
>>
>> I am not familiar with SCSI SG. This looks like how the ioctl() passthru
>> works in NVMe, but as mentioned above, we would like to enable an
>> async() passthru path.
>
>That is done with bsg for SCSI I believe. You may want to have a look around
>there. The SG driver used to have the write() system call mapped to "issuing a
>command" and read() for "getting a command result". That was removed however.
>But I think bsg has a replacement for that defunct async passthrough interface.
>Not sure. I have not looked at that for a while.
>
Thanks for the pointer; I was not aware of this. We will look into it.
Thanks again for the help, Damien!
Javier