From: Khazhy Kumykov <khazhy@google.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>,
	Hannes Reinecke <hare@suse.de>,
	lsf-pc@lists.linux-foundation.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] block drivers in user space
Date: Tue, 1 Mar 2022 15:24:17 -0800
Message-ID: <CACGdZY+SLWETvAxH6M+BhipB1KV=W+kS7cxFWgaiK=en4sqDPQ@mail.gmail.com>
In-Reply-To: <e0a6ca51-8202-0b61-dd50-349e6f27761b@grimberg.me>

On Thu, Feb 24, 2022 at 2:12 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >>> Actually, I'd rather have something like an 'inverse io_uring', where
> >>> an application creates a memory region separated into several 'ring'
> >>> for submission and completion.
> >>> Then the kernel could write/map the incoming data onto the rings, and
> >>> application can read from there.
> >>> Maybe it'll be worthwhile to look at virtio here.

Another advantage that comes to mind, especially if the userspace target
needs to operate on the data anyway: if we're forwarding to io_uring
based networking, or userspace networking in general, reading a direct
mapping may be quicker than opening a file and reading it.

(I think one idea for parallel/out-of-order processing was an fd per
request; if that's too much overhead, or too limited due to fd count,
perhaps mapping is just the way to go.)
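
To make the "inverse io_uring" idea a bit more concrete, below is a very
rough sketch of the kind of shared-memory layout I'm picturing: one ring
per queue, with the kernel as producer. All struct and field names here
are made up for illustration, not a proposal for an actual ABI.

/* Hypothetical submission ring: the kernel fills descriptors, userspace
 * consumes them and reads the payload straight out of the mapping. */
#include <stdint.h>

struct blk_user_desc {            /* one slot, written by the kernel */
        uint8_t  op;              /* read, write, flush, ... */
        uint8_t  flags;
        uint16_t tag;             /* echoed back on completion */
        uint32_t nr_sectors;
        uint64_t sector;
        uint64_t data_off;        /* offset of the payload in the mapping */
};

struct blk_user_ring {
        uint32_t head;            /* consumer (userspace) index */
        uint32_t tail;            /* producer (kernel) index */
        uint32_t ring_mask;
        uint32_t ring_entries;
        struct blk_user_desc desc[];  /* descriptors follow in the mapping */
};

The completion side would be a second ring in the same mapping with the
roles reversed, much like io_uring's SQ/CQ split, just pointed the other
way.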

> >>
> >> There is lio loopback backed by tcmu... I'm assuming that nvmet can
> >> hook into the same/similar interface. nvmet is pretty lean, and we
> >> can probably help tcmu/equivalent scale better if that is a concern...
> >
> > Sagi,
> >
> > I looked at tcmu prior to starting this work.  Other than the tcmu
> > overhead, one concern was the complexity of a scsi device interface
> > versus sending block requests to userspace.
>
> The complexity is understandable, though it can be viewed as a
> capability as well. Note I do not have any desire to promote tcmu here,
> just trying to understand if we need a brand new interface rather than
> making the existing one better.
>
> > What would be the advantage of doing it as a nvme target over delivering
> > directly to userspace as a block driver?
>
> Well, for starters you gain the features and tools that are extensively
> used with nvme. Plus you get the ecosystem support (development,
> features, capabilities and testing). There are clear advantages of
> plugging into an established ecosystem.

I recall that when we discussed an nvme-style approach, another advantage
was that the nvme target implementation could be reused if we expose the
same interface via this userspace block device interface, or e.g. present
an nvme device to a VM, etc.

That said, for a device that just needs to support read/write and
forward data to some userspace networked storage, the implementation
and interface overhead should be considered. If there's already a rich
set of tooling here for creating a custom nvme target, perhaps that
could be leveraged?

Maybe there's a middle ground? We could do an "inverse io_uring" -
forwarding the block interface into userspace - and let those who choose
to also implement passthrough commands (to get the extra "capability").
That is, provide an efficient mechanism to forward block requests to
userspace, then let each target implement its favorite flavor on top,
along the lines of the sketch below.
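
Continuing the made-up descriptor sketch from above, the dispatch side
could look roughly like this; the opcodes and backend helpers are
placeholders, not real interfaces:

#include <errno.h>
#include <stdint.h>

enum blk_user_op {
        BLK_USER_OP_READ,
        BLK_USER_OP_WRITE,
        BLK_USER_OP_PASSTHROUGH,  /* raw command carried in the data area */
};

/* Placeholders for whatever the target actually does: forward to
 * networked storage, call into an nvme target library, etc. */
int backend_read(uint64_t sector, uint32_t nr_sectors, void *buf);
int backend_write(uint64_t sector, uint32_t nr_sectors, const void *buf);
int backend_passthrough(void *cmd, uint32_t len);

static int handle_one(const struct blk_user_desc *d, void *data)
{
        switch (d->op) {
        case BLK_USER_OP_READ:
                return backend_read(d->sector, d->nr_sectors, data);
        case BLK_USER_OP_WRITE:
                return backend_write(d->sector, d->nr_sectors, data);
        case BLK_USER_OP_PASSTHROUGH:
                /* only targets that opted in would ever handle this */
                return backend_passthrough(data, d->nr_sectors);
        default:
                return -EOPNOTSUPP;
        }
}

A simple read/write target never opts into passthrough; an nvme-flavored
one could route BLK_USER_OP_PASSTHROUGH into the existing nvme target
code and get the richer capability "for free".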

Khazhy

Thread overview: 54+ messages
2022-02-21 19:59 [LSF/MM/BPF TOPIC] block drivers in user space Gabriel Krisman Bertazi
2022-02-21 23:16 ` Damien Le Moal
2022-02-21 23:30   ` Gabriel Krisman Bertazi
2022-02-22  6:57 ` Hannes Reinecke
2022-02-22 14:46   ` Sagi Grimberg
2022-02-22 17:46     ` Hannes Reinecke
2022-02-22 18:05     ` Gabriel Krisman Bertazi
2022-02-24  9:37       ` Xiaoguang Wang
2022-02-24 10:12       ` Sagi Grimberg
2022-03-01 23:24         ` Khazhy Kumykov [this message]
2022-03-02 16:16         ` Mike Christie
2022-03-13 21:15           ` Sagi Grimberg
2022-03-14 17:12             ` Mike Christie
2022-03-15  8:03               ` Sagi Grimberg
2022-03-14 19:21             ` Bart Van Assche
2022-03-15  6:52               ` Hannes Reinecke
2022-03-15  8:08                 ` Sagi Grimberg
2022-03-15  8:12                   ` Christoph Hellwig
2022-03-15  8:38                     ` Sagi Grimberg
2022-03-15  8:42                       ` Christoph Hellwig
2022-03-23 19:42                       ` Gabriel Krisman Bertazi
2022-03-24 17:05                         ` Sagi Grimberg
2022-03-15  8:04               ` Sagi Grimberg
2022-02-22 18:05   ` Bart Van Assche
2022-03-02 23:04   ` Gabriel Krisman Bertazi
2022-03-03  7:17     ` Hannes Reinecke
2022-03-27 16:35   ` Ming Lei
2022-03-28  5:47     ` Kanchan Joshi
2022-03-28  5:48     ` Hannes Reinecke
2022-03-28 20:20     ` Gabriel Krisman Bertazi
2022-03-29  0:30       ` Ming Lei
2022-03-29 17:20         ` Gabriel Krisman Bertazi
2022-03-30  1:55           ` Ming Lei
2022-03-30 18:22             ` Gabriel Krisman Bertazi
2022-03-31  1:38               ` Ming Lei
2022-03-31  3:49                 ` Bart Van Assche
2022-04-08  6:52     ` Xiaoguang Wang
2022-04-08  7:44       ` Ming Lei
2022-02-23  5:57 ` Gao Xiang
2022-02-23  7:46   ` Damien Le Moal
2022-02-23  8:11     ` Gao Xiang
2022-02-23 22:40       ` Damien Le Moal
2022-02-24  0:58         ` Gao Xiang
2022-06-09  2:01           ` Ming Lei
2022-06-09  2:28             ` Gao Xiang
2022-06-09  4:06               ` Ming Lei
2022-06-09  4:55                 ` Gao Xiang
2022-06-10  1:52                   ` Ming Lei
2022-07-28  8:23                 ` Pavel Machek
2022-03-02 16:52 ` Mike Christie
2022-03-03  7:09   ` Hannes Reinecke
2022-03-14 17:04     ` Mike Christie
2022-03-15  6:45       ` Hannes Reinecke
2022-03-05  7:29 ` Dongsheng Yang
