linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Kanchan Joshi <joshiiitr@gmail.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
	Kanchan Joshi <joshi.k@samsung.com>,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
	sagi@grimberg.me, linux-nvme@lists.infradead.org,
	io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Javier Gonzalez <javier.gonz@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>,
	Selvakumar S <selvakuma.s1@samsung.com>
Subject: Re: [RFC PATCH 0/4] Asynchronous passthrough ioctl
Date: Thu, 28 Jan 2021 10:24:45 -0700	[thread overview]
Message-ID: <dd0392a1-acfa-ef4d-5531-5f1dddc9efe7@kernel.dk> (raw)
In-Reply-To: <CA+1E3rKeqaLXBuvpMcjZ37XH9RqJHjPnTFObJj0T-u8K9Otw-w@mail.gmail.com>

On 1/28/21 10:13 AM, Kanchan Joshi wrote:
> On Thu, Jan 28, 2021 at 8:08 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 1/28/21 5:04 AM, Kanchan Joshi wrote:
>>> On Wed, Jan 27, 2021 at 9:32 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>
>>>> On 27/01/2021 15:42, Pavel Begunkov wrote:
>>>>> On 27/01/2021 15:00, Kanchan Joshi wrote:
>>>>>> This RFC patchset adds asynchronous ioctl capability for NVMe devices.
>>>>>> Purpose of RFC is to get the feedback and optimize the path.
>>>>>>
>>>>>> At the uppermost io-uring layer, a new opcode IORING_OP_IOCTL_PT is
>>>>>> presented to user-space applications. Like regular-ioctl, it takes
>>>>>> ioctl opcode and an optional argument (ioctl-specific input/output
>>>>>> parameter). Unlike regular-ioctl, it is made to skip the block-layer
>>>>>> and reach directly to the underlying driver (nvme in the case of this
>>>>>> patchset). This path between io-uring and nvme is via a newly
>>>>>> introduced block-device operation "async_ioctl". This operation
>>>>>> expects io-uring to supply a callback function which can be used to
>>>>>> report completion at later stage.
>>>>>>
>>>>>> For a regular ioctl, NVMe driver submits the command to the device and
>>>>>> the submitter (task) is made to wait until completion arrives. For
>>>>>> async-ioctl, completion is decoupled from submission. Submitter goes
>>>>>> back to its business without waiting for nvme-completion. When
>>>>>> nvme-completion arrives, it informs io-uring via the registered
>>>>>> completion-handler. But some ioctls may require updating certain
>>>>>> ioctl-specific fields which can be accessed only in context of the
>>>>>> submitter task. For that reason, NVMe driver uses task-work infra for
>>>>>> that ioctl-specific update. Since task-work is not exported, it cannot
>>>>>> be referenced when nvme is compiled as a module. Therefore, one of the
>>>>>> patch exports task-work API.
>>>>>>
>>>>>> Here goes example of usage (pseudo-code).
>>>>>> Actual nvme-cli source, modified to issue all ioctls via this opcode
>>>>>> is present at-
>>>>>> https://github.com/joshkan/nvme-cli/commit/a008a733f24ab5593e7874cfbc69ee04e88068c5
>>>>>
>>>>> see https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops
>>>>>
>>>>> Looks like good time to bring that branch/discussion back
>>>>
>>>> a bit more context:
>>>> https://github.com/axboe/liburing/issues/270
>>>
>>> Thanks, it looked good. It seems key differences (compared to
>>> uring-patch that I posted) are -
>>> 1. using file-operation instead of block-dev operation.
>>
>> Right, it's meant to span wider than just block devices.
>>
>>> 2. repurpose the sqe memory for ioctl-cmd. If an application does
>>> ioctl with <=40 bytes of cmd, it does not have to allocate ioctl-cmd.
>>> That's nifty. We still need to support passing larger-cmd (e.g.
>>> nvme-passthru ioctl takes 72 bytes) but that shouldn't get too
>>> difficult I suppose.
>>
>> It's actually 48 bytes in the as-posted version, and I've bumped it to
>> 56 bytes in the latest branch. So not quite enough for everything,
>> nothing ever will be, but should work for a lot of cases without
>> requiring per-command allocations just for the actual command.
> 
> Agreed. But if I got it right, you are open to support both in-the-sqe
> command (<= 56 bytes) and out-of-sqe command (> 56 bytes) with this
> interface.
> Driver processing the ioctl can fetch the cmd from user-space in one
> case (as it does now), and skips in another.

Your out-of-seq command would be none of io_urings business, outside of
the fact that we'd need to ensure it's stable if we need to postpone
it. So yes, that would be fine, it just means your actual command is
passed in as a pointer, and you would be responsible for copying it
in for execution

We're going to need something to handle postponing, and something
for ensuring that eg cancelations free the allocated memory.

>>> And for some ioctls, driver may still need to use task-work to update
>>> the user-space pointers (embedded in uring/ioctl cmd) during
>>> completion.
>>>
>>> @Jens - will it be fine if I start looking at plumbing nvme-part of
>>> this series on top of your work?
>>
>> Sure, go ahead. Just beware that things are still changing, so you might
>> have to adapt it a few times. It's still early days, but I do think
>> that's the way forward in providing controlled access to what is
>> basically async ioctls.
> 
> Sounds good, I will start with the latest branch that you posted. Thanks.

It's io_uring-fops.v2 for now, use that one.

-- 
Jens Axboe


  reply	other threads:[~2021-01-28 17:28 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20210127150134epcas5p251fc1de3ff3581dd4c68b3fbe0b9dd91@epcas5p2.samsung.com>
2021-01-27 15:00 ` [RFC PATCH 0/4] Asynchronous passthrough ioctl Kanchan Joshi
     [not found]   ` <CGME20210127150140epcas5p32832cc0c0db953db199eb9dd326f2d4c@epcas5p3.samsung.com>
2021-01-27 15:00     ` [RFC PATCH 1/4] block: introduce async ioctl operation Kanchan Joshi
     [not found]   ` <CGME20210127150144epcas5p29ccb35d7e7170aba7947b5ee16fd2db0@epcas5p2.samsung.com>
2021-01-27 15:00     ` [RFC PATCH 2/4] kernel: export task_work_add Kanchan Joshi
     [not found]   ` <CGME20210127150149epcas5p4fa8edd47712f28ccdd9bac5139fc6e61@epcas5p4.samsung.com>
2021-01-27 15:00     ` [RFC PATCH 3/4] nvme: add async ioctl support Kanchan Joshi
     [not found]   ` <CGME20210127150156epcas5p26cdf368e4ff6bffb132fa1c7f9430653@epcas5p2.samsung.com>
2021-01-27 15:00     ` [RFC PATCH 4/4] io_uring: add async passthrough " Kanchan Joshi
2021-01-27 15:42   ` [RFC PATCH 0/4] Asynchronous passthrough ioctl Pavel Begunkov
2021-01-27 15:53     ` Pavel Begunkov
2021-01-28 12:04       ` Kanchan Joshi
2021-01-28 14:38         ` Jens Axboe
2021-01-28 17:13           ` Kanchan Joshi
2021-01-28 17:24             ` Jens Axboe [this message]
2021-02-22 13:42               ` Kanchan Joshi
2021-02-22 14:33                 ` Jens Axboe
2021-02-23  4:41                   ` Kanchan Joshi
2021-01-28 14:50         ` Jens Axboe
2021-01-28 17:25           ` Kanchan Joshi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dd0392a1-acfa-ef4d-5531-5f1dddc9efe7@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=asml.silence@gmail.com \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=javier.gonz@samsung.com \
    --cc=joshi.k@samsung.com \
    --cc=joshiiitr@gmail.com \
    --cc=kbusch@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=nj.shetty@samsung.com \
    --cc=sagi@grimberg.me \
    --cc=selvakuma.s1@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).