From: Mike Christie <michael.christie@oracle.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Hannes Reinecke <hare@suse.de>,
	lsf-pc@lists.linux-foundation.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] block drivers in user space
Date: Mon, 14 Mar 2022 12:12:04 -0500	[thread overview]
Message-ID: <50379fbf-0344-7471-365e-76bab8dc949e@oracle.com>
In-Reply-To: <6d831f69-06f4-fafe-ce17-13596e6f3f6d@grimberg.me>

On 3/13/22 4:15 PM, Sagi Grimberg wrote:
> 
>>>>>> Actually, I'd rather have something like an 'inverse io_uring', where
>>>>>> an application creates a memory region separated into several 'rings'
>>>>>> for submission and completion.
>>>>>> Then the kernel could write/map the incoming data onto the rings, and
>>>>>> the application can read from there.
>>>>>> Maybe it'll be worthwhile to look at virtio here.
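
As a concrete picture of that layout, here is a minimal sketch of such a
shared region. Every struct, field, and constant name below is made up
for illustration; the kernel would produce into the submission ring and
consume from the completion ring, the reverse of io_uring's roles:

#include <stdint.h>

#define INV_RING_ENTRIES 128			/* power of two */

struct inv_io_desc {				/* written by the kernel */
	uint8_t  opcode;			/* READ/WRITE/FLUSH/... */
	uint8_t  pad;
	uint16_t tag;				/* echoed in the completion */
	uint32_t nr_sectors;
	uint64_t start_sector;
	uint64_t buf_offset;			/* data offset in the region */
};

struct inv_comp_desc {				/* written by the application */
	uint16_t tag;
	int32_t  result;			/* bytes done or -errno */
};

struct inv_region {				/* mmap()ed by the server */
	uint32_t sq_head, sq_tail;		/* kernel produces at tail */
	struct inv_io_desc   sq[INV_RING_ENTRIES];
	uint32_t cq_head, cq_tail;		/* app produces at tail */
	struct inv_comp_desc cq[INV_RING_ENTRIES];
	/* data pages referenced by buf_offset follow the rings */
};

virtio's split rings solve the same swapped producer/consumer problem,
which is one reason it is worth a look here.
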
>>>>>
>>>>> There is lio loopback backed by tcmu... I'm assuming that nvmet can
>>>>> hook into the same/similar interface. nvmet is pretty lean, and we
>>>>> can probably help tcmu/equivalent scale better if that is a concern...
>>>>
>>>> Sagi,
>>>>
>>>> I looked at tcmu prior to starting this work.  Other than the tcmu
>>>> overhead, one concern was the complexity of a scsi device interface
>>>> versus sending block requests to userspace.
>>>
>>> The complexity is understandable, though it can be viewed as a
>>> capability as well. Note I do not have any desire to promote tcmu here,
>>> just trying to understand if we need a brand new interface rather than
>>> making the existing one better.
>>
>> Ccing tcmu maintainer Bodo.
>>
>> We don't want to re-use tcmu's interface.
>>
>> Bodo has been looking into a new interface to avoid the issues tcmu
>> has and to improve performance. If we're allowed to add a tcmu-like
>> backend to nvmet, that would be great, because lio was not really made
>> with mq and perf in mind, so it already starts with issues. I just
>> started doing the basics, like removing locks from the main lio IO
>> path, but it seems like there is just so much work.
> 
> Good to know...
> 
> So I hear there is a desire to do this. I think we should first list
> the use-cases, because they would lead to different design choices.
> For example, one use-case is just to send read/write/flush to
> userspace; another may want to pass nvme commands through to
> userspace; and there may be others...

We might want to discuss this at OLS or start a new thread.

Based on the work we did for tcmu and local nbd, the issue is how
complex handling nvme commands in the kernel can get. If you want to run
nvmet on a single node, then you can pass just read/write/flush to
userspace and it's not really an issue.
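
To make that concrete: in the single-node case the entire userspace side
can be a three-way dispatch. A sketch, where handle_io() and the
INV_OP_* names are hypothetical (following the ring sketch earlier in
this thread) and backing_fd is whatever file or device the server
exports:

#include <errno.h>
#include <stdint.h>
#include <unistd.h>

enum inv_opcode { INV_OP_READ, INV_OP_WRITE, INV_OP_FLUSH };

/* Complete one request against the backing file. Nothing SCSI- or
 * nvme-specific ever crosses the kernel/user boundary here. */
static int handle_io(int backing_fd, enum inv_opcode op,
		     uint64_t start_sector, uint32_t nr_sectors, void *data)
{
	ssize_t ret;

	switch (op) {
	case INV_OP_READ:
		ret = pread(backing_fd, data, (size_t)nr_sectors << 9,
			    (off_t)start_sector << 9);
		break;
	case INV_OP_WRITE:
		ret = pwrite(backing_fd, data, (size_t)nr_sectors << 9,
			     (off_t)start_sector << 9);
		break;
	case INV_OP_FLUSH:
		ret = fdatasync(backing_fd);
		break;
	default:
		return -EOPNOTSUPP;
	}
	return ret < 0 ? -errno : (int)ret;
}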

For tcmu/nbd the issue we are hitting is how to handle SCSI persistent
group reservations (PGRs) when you are running lio on multiple nodes and
the nodes export the same LU to the same initiators. You can do it all
in the kernel like Bart did for SCST and DLM
(https://blog.linuxplumbersconf.org/2015/ocw/sessions/2691.html).
However, for lio and tcmu some users didn't want pacemaker/corosync and
instead wanted to use their own project's clustering or message passing,
so pushing these commands to user space is nice.
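
To illustrate: once a PR command is punted to userspace, the daemon can
sync reservation state over whatever transport the project already has.
A rough sketch, where cluster_broadcast(), apply_pr_locally(), and the
pr_out_cmd fields are placeholders for that project-specific code:

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pr_out_cmd {			/* state the peers must agree on */
	uint8_t  service_action;	/* REGISTER, RESERVE, PREEMPT, ... */
	uint64_t reservation_key;
	uint64_t service_action_key;
};

/* Placeholders for the project's own clustering/message passing and
 * for applying the reservation state to the local export. */
extern bool cluster_broadcast(const void *msg, size_t len);
extern int apply_pr_locally(const struct pr_out_cmd *cmd);

static int handle_pr_out(const struct pr_out_cmd *cmd)
{
	/*
	 * Every node exporting the LU must see the new registration/
	 * reservation state before the command completes, or an
	 * initiator could read stale state through another port.
	 */
	if (!cluster_broadcast(cmd, sizeof(*cmd)))
		return -EAGAIN;		/* peers unreachable */

	return apply_pr_locally(cmd);
}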

There are/were also issues with things like ALUA commands and handling
failover across nodes, but I think nvme ANA avoids them. For example,
there is nothing in nvme ANA like the SCSI SET TARGET PORT GROUPS
command, which can set the state of what would be remote ports, right?
