From: "Javier González" <javier@javigon.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Ajay Joshi" <Ajay.Joshi@wdc.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Keith Busch" <Keith.Busch@wdc.com>,
	"Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
	"Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Hans Holmberg" <Hans.Holmberg@wdc.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Christoph Hellwig" <hch@lst.de>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Wed, 17 Jun 2020 09:11:41 +0200	[thread overview]
Message-ID: <20200617071141.rfy545k2vlzkroby@mpHalley.localdomain> (raw)
In-Reply-To: <CY4PR04MB3751808DFE9AF00EF172DFCCE79A0@CY4PR04MB3751.namprd04.prod.outlook.com>

On 17.06.2020 06:54, Damien Le Moal wrote:
>On 2020/06/17 15:18, Javier González wrote:
>> On 17.06.2020 00:38, Damien Le Moal wrote:
>>> On 2020/06/17 1:13, Javier González wrote:
>>>> On 16.06.2020 09:07, Keith Busch wrote:
>>>>> On Tue, Jun 16, 2020 at 05:55:26PM +0200, Javier González wrote:
>>>>>> On 16.06.2020 08:48, Keith Busch wrote:
>>>>>>> On Tue, Jun 16, 2020 at 05:02:17PM +0200, Javier González wrote:
>>>>>>>> This depends very much on how the FS / application is managing
>>>>>>>> striping. At the moment our main use case is enabling user-space
>>>>>>>> applications submitting I/Os to raw ZNS devices through the kernel.
>>>>>>>>
>>>>>>>> Can we enable this use case to start with?
>>>>>>>
>>>>>>> I think this already provides that. You can set the nsid value to
>>>>>>> whatever you want in the passthrough interface, so a namespace block
>>>>>>> device is not required to issue I/O to a ZNS namespace from user space.
>>>>>>
>>>>>> Mmmmm. The problem now is that the check in the nvme driver prevents the ZNS
>>>>>> namespace from being initialized. Am I missing something?
>>>>>
>>>>> Hm, okay, it may not work for you. We need the driver to create at least
>>>>> one namespace so that we have tags and a request_queue. If you have that,
>>>>> you can issue IO to any other attached namespace through the passthrough
>>>>> interface, but we can't assume there is an available namespace.
>>>>
>>>> That makes sense for now.
>>>>
>>>> The next step for us is to enable passthrough over io_uring, making sure
>>>> that I/Os are not split.
>>>
>>> Passthrough as in "application directly issues NVMe commands", like SG_IO
>>> with SCSI? Or do you mean raw block device file accesses by the application,
>>> meaning that the IO goes through the block IO stack as opposed to going
>>> directly to the driver?
>>>
>>> For the latter case, I do not think it is possible to guarantee that an IO will
>>> not get split unless we are talking about single-page IOs (e.g. 4K on x86). See
>>> a somewhat similar request, and the comments about it, here:
>>>
>>> https://www.spinics.net/lists/linux-block/msg55079.html
>>
>> At the moment we are doing the former, but going directly to the NVMe
>> driver looks like a hack to me.
>
>Isn't that what the nvme driver ioctl() is for? An application can send an NVMe
>command directly to the driver with it. That is not a hack but the regular way
>of doing passthrough for NVMe, isn't it?

We have enabled it through io_uring to get asynchronous passthrough
submission. It looks like a hack at the moment, but we might just send an
RFC so we have something concrete to base the discussion on.
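
For comparison, the synchronous ioctl() passthrough path described above can be sketched from user space. This is a minimal sketch, not the author's code: it mirrors struct nvme_passthru_cmd from <linux/nvme_ioctl.h> with ctypes, assumes the generic _IOC ioctl encoding (true on x86 and arm64) when computing NVME_IOCTL_IO_CMD, and leaves the LBA size to the caller. The helper names build_read() and read_lbas() and any device path are hypothetical. As noted earlier in the thread, nsid may name any attached namespace, provided the driver has initialized at least one namespace whose device node is opened.

```python
import ctypes
import fcntl
import os


class NvmePassthruCmd(ctypes.Structure):
    """Mirror of struct nvme_passthru_cmd (72 bytes) from <linux/nvme_ioctl.h>."""
    _fields_ = [
        ("opcode", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),
        ("rsvd1", ctypes.c_uint16),
        ("nsid", ctypes.c_uint32),
        ("cdw2", ctypes.c_uint32),
        ("cdw3", ctypes.c_uint32),
        ("metadata", ctypes.c_uint64),
        ("addr", ctypes.c_uint64),
        ("metadata_len", ctypes.c_uint32),
        ("data_len", ctypes.c_uint32),
        ("cdw10", ctypes.c_uint32),
        ("cdw11", ctypes.c_uint32),
        ("cdw12", ctypes.c_uint32),
        ("cdw13", ctypes.c_uint32),
        ("cdw14", ctypes.c_uint32),
        ("cdw15", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32),
        ("result", ctypes.c_uint32),
    ]


def _iowr(type_char, nr, size):
    # Generic _IOC layout (assumed; true on x86 and arm64): dir | size | type | nr.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


NVME_IOCTL_IO_CMD = _iowr("N", 0x43, ctypes.sizeof(NvmePassthruCmd))
NVME_CMD_READ = 0x02  # NVM command set Read opcode


def build_read(nsid, slba, nlb, buf):
    """Fill a Read command for `nlb` blocks at `slba` on namespace `nsid`."""
    if not 1 <= nlb <= 0x10000:
        raise ValueError("NLB must be between 1 and 65536")
    cmd = NvmePassthruCmd()
    cmd.opcode = NVME_CMD_READ
    cmd.nsid = nsid                    # target namespace, not necessarily the opened one
    cmd.addr = ctypes.addressof(buf)   # user buffer the data is transferred into
    cmd.data_len = len(buf)
    cmd.cdw10 = slba & 0xFFFFFFFF      # starting LBA, low 32 bits
    cmd.cdw11 = slba >> 32             # starting LBA, high 32 bits
    cmd.cdw12 = nlb - 1                # number of logical blocks, 0's based
    return cmd


def read_lbas(dev_path, nsid, slba, nlb, lba_size):
    """Synchronous passthrough read; blocks until the command completes."""
    buf = ctypes.create_string_buffer(nlb * lba_size)
    cmd = build_read(nsid, slba, nlb, buf)
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        # Negative errno values raise OSError; a positive value is an NVMe status.
        status = fcntl.ioctl(fd, NVME_IOCTL_IO_CMD, cmd)
    finally:
        os.close(fd)
    if status != 0:
        raise RuntimeError(f"NVMe status 0x{status:x}")
    return buf.raw
```

For example, read_lbas("/dev/nvme0n1", 2, 0, 8, 512) would read eight 512-byte blocks from namespace 2 (the path and LBA size are placeholders). Because fcntl.ioctl() does not return until the command completes, each submission blocks the caller, which is the limitation an asynchronous io_uring path would address.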

>
>> I was thinking that we could enable the second path by making use of
>> chunk_sectors and limiting the I/O size, just as append_max_io_size
>> does. Is this completely the wrong way of looking at it?
>
>The block layer cannot limit the size of a passthrough command since the command
>is protocol specific and the block layer is a protocol independent interface.

Agreed. At the moment, this work depends on the application being aware
of a maximum I/O size. Down the road, we will remove (or at least greatly
relax) this constraint for ZNS devices that can eventually cache
out-of-order I/Os.
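
The chunk_sectors idea and the application-side size limit can be illustrated together. The following is a user-space model under stated assumptions, not kernel code: max_sectors_at() follows the arithmetic of the block layer's blk_max_size_offset() in kernels of this period, which caps an I/O at max_sectors and at the distance to the next chunk_sectors boundary (chunk_sectors assumed to be a power of two). split_io() is a hypothetical helper that pre-splits a request so that no piece exceeds either limit; other queue limits, such as segment counts, are ignored here.

```python
def max_sectors_at(offset, max_sectors, chunk_sectors):
    """Largest I/O (in 512-byte sectors) starting at `offset` that neither
    exceeds max_sectors nor crosses a chunk_sectors boundary.
    chunk_sectors == 0 means no chunk limit."""
    if chunk_sectors == 0:
        return max_sectors
    if chunk_sectors & (chunk_sectors - 1):
        raise ValueError("chunk_sectors must be a power of two")
    # Distance from `offset` to the end of the chunk it falls in.
    return min(max_sectors, chunk_sectors - (offset & (chunk_sectors - 1)))


def split_io(offset, nr_sectors, max_sectors, chunk_sectors):
    """Split [offset, offset + nr_sectors) into (start, length) pieces that an
    application could issue one by one, each within both limits."""
    pieces = []
    while nr_sectors > 0:
        length = min(nr_sectors, max_sectors_at(offset, max_sectors, chunk_sectors))
        pieces.append((offset, length))
        offset += length
        nr_sectors -= length
    return pieces
```

For example, split_io(120, 300, 256, 128) yields [(120, 8), (128, 128), (256, 128), (384, 36)]: the first piece stops at the 128-sector boundary. For regular writes to a sequential-write-required zone, the pieces must also be issued in order, since the device expects each write at the zone's write pointer.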

>SCSI SG does not split passthrough requests; it cannot. For passthrough
>commands, the command buffer either can be DMA-mapped or it cannot. If mapping
>succeeds, the command is issued. If it does not, the command fails. At least,
>that is my understanding of how the stack works.

I am not familiar with SCSI SG. This looks like how ioctl() passthrough
works in NVMe, but as mentioned above, we would like to enable an
asynchronous passthrough path.
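
The all-or-nothing behavior described above means an application has to size passthrough commands itself. The following is a minimal sketch under stated assumptions: the limit is read from the queue's max_hw_sectors_kb sysfs attribute (a directory such as /sys/block/nvme0n1/queue is a placeholder), mdts_bytes() decodes the Identify Controller MDTS field assuming a 4 KiB minimum memory page size, and check_passthru_len() is a hypothetical guard that refuses an oversized command instead of splitting it.

```python
from pathlib import Path


def mdts_bytes(mdts, mpsmin_bytes=4096):
    """Decode Identify Controller MDTS: 2**mdts units of the minimum page size.
    An MDTS of 0 means the controller reports no limit (returned as None)."""
    if mdts == 0:
        return None
    return (1 << mdts) * mpsmin_bytes


def max_hw_bytes(queue_dir):
    """Read the queue's hardware transfer limit (max_hw_sectors_kb) from sysfs."""
    kb = int((Path(queue_dir) / "max_hw_sectors_kb").read_text().strip())
    return kb * 1024


def check_passthru_len(data_len, limit_bytes):
    """Refuse, rather than split, a command larger than the limit, mirroring how
    the passthrough path either maps the whole buffer or fails the command.
    A limit of None means no limit is known."""
    if limit_bytes is not None and data_len > limit_bytes:
        raise ValueError(
            f"{data_len}-byte passthrough exceeds the {limit_bytes}-byte limit; "
            "split it in the application"
        )
    return data_len
```

For instance, if max_hw_sectors_kb reads 128, max_hw_bytes() returns 131072 and a 256 KiB command would be rejected up front, leaving the application to issue it as several smaller commands.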

Thanks,
Javier

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

