From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: Amir Goldstein <amir73il@gmail.com>, Bernd Schubert <bschubert@ddn.com>
Cc: linux-fsdevel@vger.kernel.org, Dharmendra Singh <dsingh@ddn.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	fuse-devel@lists.sourceforge.net
Subject: Re: [RFC PATCH 00/13] fuse uring communication
Date: Thu, 23 Mar 2023 12:18:11 +0100
Message-ID: <02f19f49-47f8-b1c5-224d-d7233b62bf32@fastmail.fm>
In-Reply-To: <CAOQ4uxjXZHr3DZUQVvcTisRy+HYNWSRWvzKDXuHP0w==QR8Yog@mail.gmail.com>



On 3/21/23 10:35, Amir Goldstein wrote:
> On Tue, Mar 21, 2023 at 3:11 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> This adds support for uring communication between kernel and
>> userspace daemon using the IORING_OP_URING_CMD opcode. The
>> basic approach was taken from ublk. The patches are in RFC
>> state - I'm not sure about all decisions and some questions
>> are marked with XXX.
>>
>> The userspace side has to send ioctl(s) to configure the ring
>> queue(s), and it has the choice to configure either exactly
>> one ring or one ring per core. If there are use cases, we can
>> also consider allowing a different number of rings - the
>> ioctl configuration option is rather generic (number of
>> queues).
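
To make that concrete, here is roughly what the configuration could
look like from the daemon side. The ioctl number, struct layout and
names below (FUSE_URING_IOC_CFG, struct fuse_uring_cfg) are
illustrative placeholders only, not the uAPI proposed in these
patches:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Illustrative placeholders - not the actual uAPI */
struct fuse_uring_cfg {
	uint32_t nr_queues;	/* 1, or one queue per core */
	uint32_t qdepth;	/* ring entries per queue */
};
#define FUSE_URING_IOC_CFG _IOW('f', 0x80, struct fuse_uring_cfg)

static int fuse_uring_configure(int dev_fd, unsigned int nr_queues,
				unsigned int qdepth)
{
	struct fuse_uring_cfg cfg = {
		.nr_queues = nr_queues,
		.qdepth = qdepth,
	};

	/* a single ioctl on the fuse device sets up all queues */
	return ioctl(dev_fd, FUSE_URING_IOC_CFG, &cfg);
}
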
>>
>> Right now a queue lock is taken for any ring entry state
>> change, mostly to correctly handle unmount/daemon-stop. In
>> fact, correctly stopping the ring took most of the
>> development time - new corner cases kept coming up.
>> I have run dozens of xfstests cycles; in earlier versions I
>> once saw a warning about the ring start_stop mutex being in
>> the wrong state - probably another stop issue, but I have
>> not been able to track it down yet.
>> Regarding the queue lock - I still need to do profiling, but
>> my assumption is that it should not matter for the
>> one-ring-per-core configuration. For the single-ring
>> configuration lock contention might come up, but I see that
>> configuration mostly for development only.
>> Adding more complexity and protecting ring entries with
>> their own locks can be done later.
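
To make the locking model concrete, below is a rough reconstruction
(struct, field and state names are mine for illustration, they are
not the patch code): every entry state transition happens under the
per-queue lock, so the stop path can take the same lock and observe
a consistent state for all entries.

#include <linux/spinlock.h>

/* Illustrative reconstruction - not the actual patch structures */
enum fuse_ring_entry_state {
	FRQ_ENTRY_AVAILABLE,
	FRQ_ENTRY_IN_USERSPACE,
	FRQ_ENTRY_COMMITTED,
	FRQ_ENTRY_STOPPED,
};

struct fuse_ring_entry {
	enum fuse_ring_entry_state state;
};

struct fuse_ring_queue {
	spinlock_t lock;	/* guards all entry state changes */
	struct fuse_ring_entry *entries;
	bool stopped;
};

static bool fuse_ring_entry_set_state(struct fuse_ring_queue *queue,
				      struct fuse_ring_entry *ent,
				      enum fuse_ring_entry_state state)
{
	bool ok;

	spin_lock(&queue->lock);
	/* unmount/daemon-stop wins over in-flight transitions */
	ok = !queue->stopped;
	if (ok)
		ent->state = state;
	spin_unlock(&queue->lock);

	return ok;
}
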
>>
>> The current code also keeps the fuse request allocation;
>> initially I only had that for background requests, for when
>> the ring queue had no free entries left. Keeping the
>> allocation reduces initial complexity, especially also for
>> ring stop. The allocation-free mode can be added back later.
>>
>> Right now the ring queue of the submitting core is always
>> used; especially for page-cached background requests we
>> might later consider also enqueuing on the queues of other
>> cores (when these are not busy, of course).
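
In other words, queue selection currently boils down to something
like the following (again an illustrative sketch, reusing the
fuse_ring_queue sketch from above; the fuse_ring struct and
function name are mine):

#include <linux/sched.h>

struct fuse_ring {
	unsigned int nr_queues;
	struct fuse_ring_queue *queues;
};

static struct fuse_ring_queue *fuse_uring_get_queue(struct fuse_ring *ring)
{
	unsigned int qid = 0;

	/* one-ring-per-core config: take the submitter's CPU queue */
	if (ring->nr_queues > 1)
		qid = task_cpu(current) % ring->nr_queues;

	return &ring->queues[qid];
}
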
>>
>> Splice/zero-copy is not supported yet; all requests go
>> through the shared-memory queue entry buffer. I am also
>> following the splice and ublk/zero-copy discussions and
>> will look into these options in the next days/weeks.
>> To have that buffer allocated on the right NUMA node, a
>> vmalloc is done per ring queue, on the NUMA node the
>> userspace daemon side asks for.
>> My assumption is that the mmap offset parameter will be
>> part of a debate and I'm curious what others think about
>> that approach.
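
For reference, the idea is that the daemon mmaps one buffer per
queue and the kernel derives the queue id from the mmap offset -
roughly as below. The offset encoding and constant are illustrative
guesses; only the vmalloc_node_user() helper name comes from patch
04 of this series:

#include <sys/mman.h>
#include <sys/types.h>

/* Illustrative offset encoding - not the proposed uAPI */
#define FUSE_URING_MMAP_OFF	0x10000000ULL

static void *fuse_uring_mmap_queue_buf(int dev_fd, unsigned int qid,
				       size_t buf_size)
{
	/*
	 * The kernel side backs this with a per-queue buffer from
	 * vmalloc_node_user() (patch 04), allocated on the NUMA
	 * node the daemon asked for. buf_size is assumed to be
	 * page-aligned.
	 */
	off_t off = FUSE_URING_MMAP_OFF + (off_t)qid * buf_size;

	return mmap(NULL, buf_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, dev_fd, off);
}
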
>>
>> Benchmarking and tuning are on my agenda for the next
>> days. For now I only have xfstests results - most
>> longer-running tests were running at about 2x, but somehow
>> when I cleaned up the patches for submission I lost that
>> speedup. My development VM/kernel has all sanitizers
>> enabled - hard to profile what happened. Performance
>> results with profiling will be submitted in a few days.
> 
> When posting those benchmarks, and with future RFC postings,
> it would be useful for people reading this introduction for
> the first time if you explicitly stated the motivation of
> your work, which currently can only be inferred from the
> mention of "benchmarks".
> 
> I think it would also be useful to link to prior work (ZUFS,
> fuse2) and mention the current FUSE performance issues
> related to context switches and cache-line bouncing that
> were discussed in those threads.

Oh yes sure, I entirely forgot to mention the motivation. Will do in 
the next patch round. You don't happen to have these links, by any 
chance? I know that there were several ZUFS threads, but I don't 
remember discussions about cache-line bouncing - maybe I missed 
them. I can try to read through the old threads in case you don't 
have the links.
Our own motivation for the ring basically comes from atomic-open 
benchmarks, which gave totally confusing results in multi-threaded 
libfuse mode - fewer requests caused lower IOPS - and switching to 
single-threaded mode then gave the expected IOPS increase. Part of 
that was due to a libfuse issue - persistent thread 
destruction/creation caused by min_idle_threads - but another part 
can be explained by thread switching alone. When I added (slight) 
spinning in fuse_dev_do_read(), the hard/impossible part was to 
avoid letting multiple threads spin - even with a single-threaded 
application creating/deleting files (like bonnie++), multiple 
libfuse threads started to spin for no good reason. So spinning 
resulted in much improved latency, but also high CPU usage, because 
multiple threads were spinning. I will add those explanations to 
the next patch set.
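
For the record, the spinning experiment had roughly the following
shape - a bounded spin before falling back to sleeping on the
waitqueue. This is a simplified reconstruction, not the actual
experiment code: the fiq->spinner field and the spin bound are made
up here, and the pending check is reduced to the request list (the
real request_pending() in fs/fuse/dev.c also covers forgets and
interrupts).

#include "fuse_i.h"		/* struct fuse_iqueue */
#include <linux/list.h>
#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/sched.h>

#define FUSE_SPIN_LIMIT	1000	/* made-up bound for illustration */

static bool fuse_request_pending(struct fuse_iqueue *fiq)
{
	/* simplified; see request_pending() in fs/fuse/dev.c */
	return !list_empty(&fiq->pending);
}

static int fuse_dev_wait_for_request(struct fuse_iqueue *fiq)
{
	unsigned int spins = 0;

	/*
	 * Let only one reader spin; the others sleep right away.
	 * This was the hard part: without it, several libfuse
	 * threads spin for a single submitter and burn CPU.
	 * fiq->spinner would be a new atomic_t field here.
	 */
	if (atomic_cmpxchg(&fiq->spinner, 0, 1) == 0) {
		while (!fuse_request_pending(fiq) &&
		       spins++ < FUSE_SPIN_LIMIT)
			cpu_relax();
		atomic_set(&fiq->spinner, 0);
	}

	if (fuse_request_pending(fiq))
		return 0;

	/* no luck spinning - sleep as before */
	return wait_event_interruptible(fiq->waitq,
					fuse_request_pending(fiq));
}
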

Thanks,
Bernd



Thread overview: 29+ messages
2023-03-21  1:10 [RFC PATCH 00/13] fuse uring communication Bernd Schubert
2023-03-21  1:10 ` [PATCH 01/13] fuse: Add uring data structures and documentation Bernd Schubert
2023-03-21  1:10 ` [PATCH 02/13] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2023-03-21  1:10 ` [PATCH 03/13] fuse: Move fuse_get_dev to header file Bernd Schubert
2023-03-21  1:10 ` [PATCH 04/13] Add a vmalloc_node_user function Bernd Schubert
2023-03-21 21:21   ` Andrew Morton
2023-03-21  1:10 ` [PATCH 05/13] fuse: Add a uring config ioctl and ring destruction Bernd Schubert
2023-03-21  1:10 ` [PATCH 06/13] fuse: Add an interval ring stop worker/monitor Bernd Schubert
2023-03-23 10:27   ` Miklos Szeredi
2023-03-23 11:04     ` Bernd Schubert
2023-03-23 12:35       ` Miklos Szeredi
2023-03-23 13:18         ` Bernd Schubert
2023-03-23 20:51           ` Bernd Schubert
2023-03-27 13:22             ` Pavel Begunkov
2023-03-27 14:02               ` Bernd Schubert
2023-03-23 13:26         ` Ming Lei
2023-03-21  1:10 ` [PATCH 07/13] fuse: Add uring mmap method Bernd Schubert
2023-03-21  1:10 ` [PATCH 08/13] fuse: Move request bits Bernd Schubert
2023-03-21  1:10 ` [PATCH 09/13] fuse: Add wait stop ioctl support to the ring Bernd Schubert
2023-03-21  1:10 ` [PATCH 10/13] fuse: Handle SQEs - register commands Bernd Schubert
2023-03-21  1:10 ` [PATCH 11/13] fuse: Add support to copy from/to the ring buffer Bernd Schubert
2023-03-21  1:10 ` [PATCH 12/13] fuse: Add uring sqe commit and fetch support Bernd Schubert
2023-03-21  1:10 ` [PATCH 13/13] fuse: Allow to queue to the ring Bernd Schubert
2023-03-21  1:26 ` [RFC PATCH 00/13] fuse uring communication Bernd Schubert
2023-03-21  9:35 ` Amir Goldstein
2023-03-23 11:18   ` Bernd Schubert [this message]
2023-03-23 11:55     ` Amir Goldstein
2023-06-07 14:20       ` Miklos Szeredi
2023-06-08 21:31         ` Bernd Schubert
