linux-fsdevel.vger.kernel.org archive mirror
From: Bernd Schubert <bschubert@ddn.com>
To: linux-fsdevel@vger.kernel.org
Cc: Dharmendra Singh <dsingh@ddn.com>,
	Bernd Schubert <bschubert@ddn.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Amir Goldstein <amir73il@gmail.com>,
	fuse-devel@lists.sourceforge.net
Subject: [RFC PATCH 00/13] fuse uring communication
Date: Tue, 21 Mar 2023 02:10:34 +0100
Message-ID: <20230321011047.3425786-1-bschubert@ddn.com>

This adds support for uring communication between the kernel
and the userspace daemon using the IORING_OP_URING_CMD opcode.
The basic approach was taken from ublk. The patches are in RFC
state - I'm not sure about all decisions and some questions are
marked with XXX.

The userspace side has to send ioctl(s) to configure the ring
queue(s) and it can choose to configure exactly one ring or one
ring per core. If there are use cases, we can also consider
allowing a different number of rings - the ioctl configuration
option is rather generic (number of queues).
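
As a rough illustration of the two choices above (not code from
the series), a daemon could derive the number of queues to
request as sketched below. The enum, struct and function names
are hypothetical; the real ioctl layout is defined by the
patches and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modes, named for this sketch only. */
enum sketch_ring_mode {
	SKETCH_RING_SINGLE,	/* exactly one ring */
	SKETCH_RING_PER_CORE,	/* one ring per core */
};

/* Hypothetical stand-in for the generic "number of queues" option. */
struct sketch_ring_cfg {
	uint32_t nr_queues;
};

/* Number of queues to request for a mode and a given core count. */
static size_t sketch_nr_queues(enum sketch_ring_mode mode, size_t nr_cores)
{
	if (mode == SKETCH_RING_PER_CORE && nr_cores > 0)
		return nr_cores;
	return 1;
}

/* Fill the (hypothetical) configuration that would be handed to the ioctl. */
static struct sketch_ring_cfg sketch_make_cfg(enum sketch_ring_mode mode,
					      size_t nr_cores)
{
	struct sketch_ring_cfg cfg = {
		.nr_queues = (uint32_t)sketch_nr_queues(mode, nr_cores),
	};

	return cfg;
}
```

In a daemon, nr_cores could come from
sysconf(_SC_NPROCESSORS_ONLN); the sketch takes it as a
parameter so that it stays independent of the machine it runs on.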

Right now a queue lock is taken for any ring entry state change,
mostly to correctly handle unmount/daemon-stop. In fact,
correctly stopping the ring took most of the development
time - new corner cases kept coming up.
I have run dozens of xfstest cycles; with some versions I once
saw a warning about the ring start_stop mutex being in the
wrong state - probably another stop issue, but I have not been
able to track it down yet.
Regarding the queue lock - I still need to do profiling, but
my assumption is that it should not matter for the
one-ring-per-core configuration. With the single-ring config
option, lock contention might come up, but I see that
configuration as mostly for development.
Adding more complexity and protecting ring entries with
their own locks can be done later.
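
To make the locking scheme above more concrete, here is a
minimal userspace model, assuming a pthread mutex in place of
the kernel lock. All type, state and function names are invented
for this sketch and do not match the patches; the point is only
that every entry state change, including stop, goes through the
one queue lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RING_DEPTH 4

/* Hypothetical entry states, for illustration only. */
enum sketch_ent_state {
	SKETCH_ENT_FREE,
	SKETCH_ENT_USERSPACE,
	SKETCH_ENT_STOPPED,
};

struct sketch_queue {
	pthread_mutex_t lock;	/* one lock for all entries of the queue */
	bool stopped;
	enum sketch_ent_state ents[SKETCH_RING_DEPTH];
};

static void sketch_queue_init(struct sketch_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->stopped = false;
	for (size_t i = 0; i < SKETCH_RING_DEPTH; i++)
		q->ents[i] = SKETCH_ENT_FREE;
}

/* Change one entry's state under the queue lock; fails after stop. */
static bool sketch_ent_set_state(struct sketch_queue *q, size_t idx,
				 enum sketch_ent_state state)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	if (!q->stopped && idx < SKETCH_RING_DEPTH) {
		q->ents[idx] = state;
		ok = true;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Stop takes the same lock, so it cannot race with a state change. */
static void sketch_queue_stop(struct sketch_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->stopped = true;
	for (size_t i = 0; i < SKETCH_RING_DEPTH; i++)
		q->ents[i] = SKETCH_ENT_STOPPED;
	pthread_mutex_unlock(&q->lock);
}
```

With one queue per core the lock is mostly taken from that core
only, which is the reason for the assumption that contention
should not matter there. Per-entry locks would remove the shared
lock, at the price of a more involved stop path.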

The current code also keeps the fuse request allocation;
initially I only had that for background requests when the
ring queue had no free entries left. The allocation is kept
to reduce initial complexity, especially for ring stop.
The allocation-free mode can be added back later.

Right now the ring queue of the submitting core is always
used. Especially for page-cache background requests we might
later consider also enqueueing on other cores' queues
(when these are not busy, of course).
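
The queue selection described above could be sketched as
follows. This is an illustration under assumptions, not the
code from the patches: the function names and the "busy" flag
array are hypothetical, and the fallback to other queues is
only the possible later extension, not current behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Current behaviour: always the queue of the submitting core
 * (queue 0 if only a single ring is configured). */
static size_t sketch_pick_queue(size_t submitting_core, size_t nr_queues)
{
	return nr_queues > 1 ? submitting_core % nr_queues : 0;
}

/* Possible later extension for background requests: if the own queue
 * is busy, try the other queues and fall back to the own queue when
 * all of them are busy. Requires nr_queues >= 1 and busy[] to hold
 * nr_queues flags. */
static size_t sketch_pick_queue_bg(const bool *busy, size_t nr_queues,
				   size_t submitting_core)
{
	size_t own = sketch_pick_queue(submitting_core, nr_queues);

	if (!busy[own])
		return own;

	for (size_t i = 1; i < nr_queues; i++) {
		size_t qid = (own + i) % nr_queues;

		if (!busy[qid])
			return qid;
	}
	return own;
}
```

How "busy" would be determined is left open here, as it is in
the text above.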

Splice/zero-copy is not supported yet; all requests go
through the shared memory queue entry buffer. I am also
following the splice and ublk/zc copy discussions and will
look into these options in the next days/weeks.
To have that buffer allocated on the right NUMA node,
a vmalloc is done per ring queue, on the NUMA node
the userspace daemon asks for.
My assumption is that the mmap offset parameter will be
a subject of debate and I'm curious what others think about
that approach.
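
Since the cover letter does not spell out how the mmap offset
is used, the following is only one conceivable encoding, shown
to make the discussion point tangible: the offset selects the
queue whose buffer is mapped. The page size constant, the
helper names and the layout are assumptions made for this
sketch; the actual scheme is the one in the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch; real code would query it. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a buffer length up to a multiple of the page size. */
static uint64_t sketch_round_up_page(uint64_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
	       SKETCH_PAGE_SIZE;
}

/* Hypothetical encoding: queue id times the page-aligned buffer size.
 * mmap offsets have to be page aligned, which this guarantees. */
static uint64_t sketch_queue_mmap_offset(unsigned int qid,
					 uint64_t queue_buf_len)
{
	return (uint64_t)qid * sketch_round_up_page(queue_buf_len);
}

/* Inverse mapping, as the kernel side of such a scheme would do.
 * Requires queue_buf_len > 0. */
static unsigned int sketch_queue_from_mmap_offset(uint64_t off,
						  uint64_t queue_buf_len)
{
	return (unsigned int)(off / sketch_round_up_page(queue_buf_len));
}
```

The daemon would then pass sketch_queue_mmap_offset(qid, len)
as the last argument of mmap() on the fuse device fd, once per
queue.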

Benchmarking and tuning are on my agenda for the next
days. For now I only have xfstest results - most of the
longer-running tests were running at about 2x, but somehow
I lost that when I cleaned up the patches for submission.
My development VM/kernel has all sanitizers enabled, which
makes it hard to profile what happened. Performance
results with profiling will be submitted in a few days.

The patches include a design document, which has a few more
details.

The corresponding libfuse patches are on my uring branch,
but need cleanup for submission - that will happen over the
next days.
https://github.com/bsbernd/libfuse/tree/uring

In case it makes review easier, the patches posted here are
on this branch:
https://github.com/bsbernd/linux/tree/fuse-uring-for-6.2


Bernd Schubert (13):
  fuse: Add uring data structures and documentation
  fuse: rename to fuse_dev_end_requests and make non-static
  fuse: Move fuse_get_dev to header file
  Add a vmalloc_node_user function
  fuse: Add a uring config ioctl and ring destruction
  fuse: Add an interval ring stop worker/monitor
  fuse: Add uring mmap method
  fuse: Move request bits
  fuse: Add wait stop ioctl support to the ring
  fuse: Handle SQEs - register commands
  fuse: Add support to copy from/to the ring buffer
  fuse: Add uring sqe commit and fetch support
  fuse: Allow to queue to the ring

 Documentation/filesystems/fuse-uring.rst |  179 +++
 fs/fuse/Makefile                         |    2 +-
 fs/fuse/dev.c                            |  193 +++-
 fs/fuse/dev_uring.c                      | 1292 ++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                    |   23 +
 fs/fuse/fuse_dev_i.h                     |   62 ++
 fs/fuse/fuse_i.h                         |  178 +++
 fs/fuse/inode.c                          |   10 +
 include/linux/vmalloc.h                  |    1 +
 include/uapi/linux/fuse.h                |  131 +++
 mm/nommu.c                               |    6 +
 mm/vmalloc.c                             |   41 +-
 12 files changed, 2064 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/filesystems/fuse-uring.rst
 create mode 100644 fs/fuse/dev_uring.c
 create mode 100644 fs/fuse/dev_uring_i.h
 create mode 100644 fs/fuse/fuse_dev_i.h

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: linux-fsdevel@vger.kernel.org
cc: Amir Goldstein <amir73il@gmail.com>
cc: fuse-devel@lists.sourceforge.net

-- 
2.37.2


Thread overview: 29+ messages
2023-03-21  1:10 Bernd Schubert [this message]
2023-03-21  1:10 ` [PATCH 01/13] fuse: Add uring data structures and documentation Bernd Schubert
2023-03-21  1:10 ` [PATCH 02/13] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2023-03-21  1:10 ` [PATCH 03/13] fuse: Move fuse_get_dev to header file Bernd Schubert
2023-03-21  1:10 ` [PATCH 04/13] Add a vmalloc_node_user function Bernd Schubert
2023-03-21 21:21   ` Andrew Morton
2023-03-21  1:10 ` [PATCH 05/13] fuse: Add a uring config ioctl and ring destruction Bernd Schubert
2023-03-21  1:10 ` [PATCH 06/13] fuse: Add an interval ring stop worker/monitor Bernd Schubert
2023-03-23 10:27   ` Miklos Szeredi
2023-03-23 11:04     ` Bernd Schubert
2023-03-23 12:35       ` Miklos Szeredi
2023-03-23 13:18         ` Bernd Schubert
2023-03-23 20:51           ` Bernd Schubert
2023-03-27 13:22             ` Pavel Begunkov
2023-03-27 14:02               ` Bernd Schubert
2023-03-23 13:26         ` Ming Lei
2023-03-21  1:10 ` [PATCH 07/13] fuse: Add uring mmap method Bernd Schubert
2023-03-21  1:10 ` [PATCH 08/13] fuse: Move request bits Bernd Schubert
2023-03-21  1:10 ` [PATCH 09/13] fuse: Add wait stop ioctl support to the ring Bernd Schubert
2023-03-21  1:10 ` [PATCH 10/13] fuse: Handle SQEs - register commands Bernd Schubert
2023-03-21  1:10 ` [PATCH 11/13] fuse: Add support to copy from/to the ring buffer Bernd Schubert
2023-03-21  1:10 ` [PATCH 12/13] fuse: Add uring sqe commit and fetch support Bernd Schubert
2023-03-21  1:10 ` [PATCH 13/13] fuse: Allow to queue to the ring Bernd Schubert
2023-03-21  1:26 ` [RFC PATCH 00/13] fuse uring communication Bernd Schubert
2023-03-21  9:35 ` Amir Goldstein
2023-03-23 11:18   ` Bernd Schubert
2023-03-23 11:55     ` Amir Goldstein
2023-06-07 14:20       ` Miklos Szeredi
2023-06-08 21:31         ` Bernd Schubert
