From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
Date: Tue, 10 Mar 2020 16:10:41 +0100
Message-ID: <1651612.bE1MAdl4rA@silver>
In-Reply-To: <20200309164259.2affe4bc@bahia.home>

On Montag, 9. März 2020 16:42:59 CET Greg Kurz wrote:
> The main issue I see with this patch is that it is too big. At
> least from my POV, with the little time I have for 9p, that's
> the first thing that pushes me back... So if you could split
> this patch down to some reviewable chunks, this may help
> going forward.

This patch is also too big for my preference, but I don't see a viable way to 
split it further into separate patches. I already separated all the patches I 
could. If you have suggestions, very much appreciated!

The reason for this is that fixing these issues in the current T_readdir 
implementation requires splitting what is currently one task (i.e. one 
function) into two separate tasks (i.e. two functions), and there is no 
sensible way to break that conversion down into smaller, separately 
reviewable pieces.
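
To illustrate what that split means (the signatures are the ones quoted from 
the patch further below): the existing single-entry coroutine stays as it is, 
and a second, batched coroutine is added next to it:

    /* old task: one worker thread dispatch for every single dir entry */
    int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
                                     struct dirent **dent);

    /* new task: fetches a whole batch of entries with a single dispatch */
    int coroutine_fn v9fs_co_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
                                            struct V9fsDirEnt **entries,
                                            int32_t maxsize, bool dostat);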

But don't be too scared about this patch: if you look just at the diff 
directly then yes, the diff is not very human readable. However, if you apply 
the patch and look at the resulting code, you'll notice that there are 
actually only 2 functions (v9fs_do_readdir() in 9p.c and do_readdir_lowlat() 
in codir.c) which require careful review, and that their resulting code is 
actually very straightforward, and not long either.

Current code on master is much more tricky and error prone due to the huge 
number of potential branches, individual error/cleanup handlers, the high 
amount of thread dispatching, and more. In the patched version all these code 
complexities and error sources are removed.

> > > diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
> > > index 73f9a751e1..6ce7dc8cde 100644
> > > --- a/hw/9pfs/codir.c
> > > +++ b/hw/9pfs/codir.c
> > > @@ -18,28 +18,189 @@
> > > 
> > >  #include "qemu/main-loop.h"
> > >  #include "coth.h"
> > > 
> > > +/*
> > > + * This is solely executed on a background IO thread.
> > > + */
> > > +static int do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, struct dirent **dent)
> > > +{
> > > +    int err = 0;
> > > +    V9fsState *s = pdu->s;
> > > +    struct dirent *entry;
> > > +
> > > +    errno = 0;
> > > +    entry = s->ops->readdir(&s->ctx, &fidp->fs);
> > > +    if (!entry && errno) {
> > > +        *dent = NULL;
> > > +        err = -errno;
> > > +    } else {
> > > +        *dent = entry;
> > > +    }
> > > +    return err;
> > > +}
> > > +
> > > +/*
> > > + * TODO: This will be removed for performance reasons.
> > > + * Use v9fs_co_readdir_lowlat() instead.
> 
> Oh, so we'd still have the current implementation being used, even
> with this patch applied... This would be okay for a RFC patch but
> in the end I'd really like to merge something that also converts
> v9fs_do_readdir_with_stat().

Yes, I know, but I would not recommend mixing these things at this point, 
because it would be an entire effort on its own.

v9fs_do_readdir_with_stat() is used for 9P2000.u, while v9fs_do_readdir() is 
used for 9P2000.L. They behave very differently, so it would not only require 
me to update v9fs_do_readdir_with_stat() and v9fs_read(); I would also need 
to write their own test cases (plural, since there are none at all yet) and 
benchmarks, and of course somebody would need to review all that additional 
amount of code, and who would actually test it? I am actively testing my 
9P2000.L changes, but I am not actually using 9P2000.u, so I could not 
guarantee the same quality.

Considering the current commitment situation regarding 9pfs, we can only 
bring things forward with compromises. Hence I suggest I concentrate on 
fixing the worst performance issues on the 9P2000.L side first; later on, if 
there really are people interested in seeing these improvements on the 
9P2000.u side and willing to at least help out with testing, then I can 
definitely adjust the 9P2000.u side accordingly as well.

But to also make this clear: patch 10 does not introduce any behaviour change 
on the 9P2000.u side.

> > >  int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
> > >                                   struct dirent **dent)
> > >  {
> > >      int err;
> > > -    V9fsState *s = pdu->s;
> > > 
> > >      if (v9fs_request_cancelled(pdu)) {
> > >          return -EINTR;
> > >      }
> > > -    v9fs_co_run_in_worker(
> > > -        {
> > > -            struct dirent *entry;
> > > +    v9fs_co_run_in_worker({
> > > +        err = do_readdir(pdu, fidp, dent);
> > > +    });
> > > +    return err;
> > > +}
> > > +
> > > +/*
> > > + * This is solely executed on a background IO thread.
> > > + *
> > > + * See v9fs_co_readdir_lowlat() (as its only user) below for details.
> 
> Since the only user is quite small, maybe just open-code this.

Mmm... maybe later on, as a cleanup patch or something. The current version 
is tested. I would like to avoid making things more complicated than they 
already are (i.e. accidentally introducing some bug just due to this).

> 
> > > + */
> > > +static int do_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
> > > +                             struct V9fsDirEnt **entries,
> > > +                             int32_t maxsize, bool dostat)
> > > +{
> > > +    V9fsState *s = pdu->s;
> > > +    V9fsString name;
> > > +    int len, err = 0;
> > > +    int32_t size = 0;
> > > +    off_t saved_dir_pos;
> > > +    struct dirent *dent;
> > > +    struct V9fsDirEnt *e = NULL;
> > > +    V9fsPath path;
> > > +    struct stat stbuf;
> > > +
> > > +    *entries = NULL;
> > > +    v9fs_path_init(&path);
> > > +
> > > +    /*
> > > +     * TODO: Here should be a warn_report_once() if lock failed.
> > > +     *
> > > +     * With a good 9p client we should not get into concurrency here,
> > > +     * because a good client would not use the same fid for concurrent
> > > +     * requests. We do the lock here for safety reasons though. However
> > > +     * the client would then suffer performance issues, so better log that
> > > +     * issue here.
> > > +     */
> 
> This should probably be done with qemu_mutex_trylock() in a loop directly
> in v9fs_readdir_lock().

No, definitely not in the loop. That's intentionally *one* lock *outside* of 
the loop. The normal case is not requiring the lock at all, like the comment 
describes. Taking the lock multiple times inside the loop would unnecessarily 
reduce performance.
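
Roughly sketched (simplified, not the literal patch code), the intended 
pattern is: take the lock once before the loop and release it once after it:

    v9fs_readdir_lock(&fidp->fs.dir);        /* one lock for the whole batch */
    while (size < maxsize) {
        err = do_readdir(pdu, fidp, &dent);  /* no additional locking per entry */
        if (err < 0 || !dent) {
            break;
        }
        /* ... stat/marshal the entry, accumulate size ... */
    }
    v9fs_readdir_unlock(&fidp->fs.dir);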

About *_trylock() instead of v9fs_readdir_lock(): that might be a candidate 
to address the TODO comment, but I would like to postpone it for the same 
reasons: encountering concurrency here is an odd/ill-behaved case. For all 
well implemented clients this is not an issue at all, and safety (concerning 
malicious clients) is already provided by this lock. Addressing this TODO now 
would potentially introduce bugs due to the added complexity, so really I 
would postpone that.
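
Just so it's written down somewhere, what I have in mind for that TODO later 
on is roughly something like this (purely hypothetical sketch, not part of 
this patch; the exact mutex field and message are made up for illustration):

    if (qemu_mutex_trylock(&fidp->fs.dir.readdir_mutex)) {
        /* lock is contended -> client reused the fid for concurrent readdir */
        warn_report_once("9p: T_readdir: client is using the same fid for "
                         "concurrent requests, expect degraded performance");
        qemu_mutex_lock(&fidp->fs.dir.readdir_mutex);
    }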

> > > +/**
> > > + * @brief Low latency variant of fs driver readdir handling.
> > > + *
> 
> Not sure it brings much to mention latency here... telling what this
> function does actually ie. "read multiple directory entries in a row"
> would be more informative.

NP

> > > + * Retrieves the requested (max. amount of) directory entries from the fs
> > > + * driver. This function must only be called by the main IO thread (top
> > > + * half). Internally this function call will be dispatched to a background
> > > + * IO thread (bottom half) where it is eventually executed by the fs
> > > + * driver.
> > > + *
> > > + * The old readdir implementation above just retrieves always one dir
> > > + * entry per call. The problem of that implementation above is that
> > > + * latency is added for (retrieval of) each directory entry, which in
> > > + * practice lead to latencies of several hundred ms for readdir of only
> > > + * one directory.
> > > + *
> > > + * This is avoided in this function by letting the fs driver retrieve all
> > > + * required directory entries with only call of this function and hence
> > > + * with only a single fs driver request.
> 
> True but these kind of considerations rather belong to the changelog...
> otherwise, we'll have to not forget to kill these lines when the "old
> readdir implementation" is no more.

Mmm... I do think this overall latency issue should be clarified somewhere as 
a comment. Where exactly is not that important to me. For instance it could 
also be described at the v9fs_co_run_in_worker() macro definition in coth.h 
instead, as a general rule of thumb when dispatching stuff.

The thing is: for >10 years several developers obviously did not realize the 
severe negative performance impact of frivolously dispatching tasks to 
different threads too often. It looks like linear code, right? But no, it is 
not.
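
To make the difference concrete, a heavily simplified sketch (not the literal 
code): the old path pays one worker thread round trip per directory entry, 
whereas the new path pays it only once per T_readdir request:

    /* old: one v9fs_co_run_in_worker() hop for every single entry */
    for (;;) {
        err = v9fs_co_readdir(pdu, fidp, &dent);   /* dispatch + latency */
        if (err || !dent) {
            break;
        }
        /* ... marshal this one entry ... */
    }

    /* new: a single hop retrieves the whole batch of entries */
    err = v9fs_co_readdir_lowlat(pdu, fidp, &entries, maxsize, dostat);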

> > > + *
> > > + * @param pdu - the causing 9p (T_readdir) client request
> > > + * @param fidp - already opened directory where readdir shall be performed on
> > > + * @param entries - output for directory entries (must not be NULL)
> > > + * @param maxsize - maximum result message body size (in bytes)
> > > + * @param dostat - whether a stat() should be performed and returned for
> > > + *                 each directory entry
> > > + * @returns resulting response message body size (in bytes) on success,
> > > + *          negative error code otherwise
> > > + */
> > > +int coroutine_fn v9fs_co_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
> 
> Not really a fan of the _lowlat (low fat ?) wording.
> 
> v9fs_co_readdir_many() or anything else that suggests we're going to
> read several entries in a row.

NP

> 
> > > +                                        struct V9fsDirEnt **entries,
> > > +                                        int32_t maxsize, bool dostat)
> 
> s/maxsize/max_count since this is the wording use in the caller.

I wouldn't do that. This is the driver side, not the 9p protocol request 
handler. As you might have realized before, "max_count" is semantically 
completely wrong here: this variable does not count integral entries, it is 
rather the maximum size (in bytes) of the destination buffer being filled.

On the 9p request handler side it is okay to use "max_count" instead, because 
the protocol definition uses exactly this term for the request parameter, but 
on the driver side it is IMO inappropriate.
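
In other words, maxsize acts as a byte budget for the response body, not as 
an entry counter. Simplified sketch of the accounting (not the literal patch 
code; v9fs_readdir_response_size() is the per-entry wire size helper already 
used by v9fs_do_readdir() in 9p.c):

    size = 0;
    while (size < maxsize) {
        /* ... fetch the next entry into dent ... */
        v9fs_string_sprintf(&name, "%s", dent->d_name);
        if (size + v9fs_readdir_response_size(&name) > maxsize) {
            /* this entry would no longer fit into the client's reply buffer */
            v9fs_string_free(&name);
            break;
        }
        size += v9fs_readdir_response_size(&name);
        /* ... append the entry to the result list ... */
    }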

Best regards,
Christian Schoenebeck



