From: Stefan Hajnoczi <stefanha@redhat.com>
To: Christian Schoenebeck <qemu_oss@crudebyte.com>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-block@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>, "Amit Shah" <amit@kernel.org>,
	"David Hildenbrand" <david@redhat.com>,
	qemu-devel@nongnu.org, "Greg Kurz" <groug@kaod.org>,
	virtio-fs@redhat.com, "Eric Auger" <eric.auger@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Fam Zheng" <fam@euphon.net>,
	"Raphael Norwitz" <raphael.norwitz@nutanix.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
Date: Wed, 10 Nov 2021 10:05:50 +0000	[thread overview]
Message-ID: <YYuZfkfbxcX0JDRN@stefanha-x1.localdomain> (raw)
In-Reply-To: <25571471.tMsSMU6axZ@silver>


On Tue, Nov 09, 2021 at 02:09:59PM +0100, Christian Schoenebeck wrote:
> On Dienstag, 9. November 2021 11:56:35 CET Stefan Hajnoczi wrote:
> > On Thu, Nov 04, 2021 at 03:41:23PM +0100, Christian Schoenebeck wrote:
> > > On Mittwoch, 3. November 2021 12:33:33 CET Stefan Hajnoczi wrote:
> > > > On Mon, Nov 01, 2021 at 09:29:26PM +0100, Christian Schoenebeck wrote:
> > > > > On Donnerstag, 28. Oktober 2021 11:00:48 CET Stefan Hajnoczi wrote:
> > > > > > On Mon, Oct 25, 2021 at 05:03:25PM +0200, Christian Schoenebeck 
> wrote:
> > > > > > > On Montag, 25. Oktober 2021 12:30:41 CEST Stefan Hajnoczi wrote:
> > > > > > > > On Thu, Oct 21, 2021 at 05:39:28PM +0200, Christian Schoenebeck 
> wrote:
> > > > > > > > > On Freitag, 8. Oktober 2021 18:08:48 CEST Christian 
> Schoenebeck wrote:
> > > > > > > > > > On Freitag, 8. Oktober 2021 16:24:42 CEST Christian 
> Schoenebeck wrote:
> > > > > > > > > > > On Freitag, 8. Oktober 2021 09:25:33 CEST Greg Kurz wrote:
> > > > > > > > > > > > On Thu, 7 Oct 2021 16:42:49 +0100
> > > > > > > > > > > > 
> > > > > > > > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > > > > > > > > > On Thu, Oct 07, 2021 at 02:51:55PM +0200, Christian 
> Schoenebeck wrote:
> > > > > > > > > > > > > > On Donnerstag, 7. Oktober 2021 07:23:59 CEST Stefan 
> Hajnoczi wrote:
> > > > > > > > > > > > > > > On Mon, Oct 04, 2021 at 09:38:00PM +0200,
> > > > > > > > > > > > > > > Christian
> > > > > > > > > > > > > > > Schoenebeck
> > > > > > > > > > 
> > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > At the moment the maximum transfer size with
> > > > > > > > > > > > > > > > virtio
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > limited
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > 4M
> > > > > > > > > > > > > > > > (1024 * PAGE_SIZE). This series raises this
> > > > > > > > > > > > > > > > limit to
> > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > maximum
> > > > > > > > > > > > > > > > theoretical possible transfer size of 128M (32k
> > > > > > > > > > > > > > > > pages)
> > > > > > > > > > > > > > > > according
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > virtio specs:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Hi Christian,
> > > > > > > > > > > > 
> > > > > > > > > > > > > > > I took a quick look at the code:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks Stefan for sharing virtio expertise and helping
> > > > > > > > > > > > Christian
> > > > > > > > > > > > !
> > > > > > > > > > > > 
> > > > > > > > > > > > > > > - The Linux 9p driver restricts descriptor chains
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > 128
> > > > > > > > > > > > > > > elements
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Yes, that's the limitation that I am about to remove
> > > > > > > > > > > > > > (WIP);
> > > > > > > > > > > > > > current
> > > > > > > > > > > > > > kernel
> > > > > > > > > > > > > > patches:
> > > > > > > > > > > > > > https://lore.kernel.org/netdev/cover.1632327421.git.linux_oss@crudebyte.com/>
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I haven't read the patches yet but I'm concerned that
> > > > > > > > > > > > > today
> > > > > > > > > > > > > the
> > > > > > > > > > > > > driver
> > > > > > > > > > > > > is pretty well-behaved and this new patch series
> > > > > > > > > > > > > introduces a
> > > > > > > > > > > > > spec
> > > > > > > > > > > > > violation. Not fixing existing spec violations is
> > > > > > > > > > > > > okay,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > adding
> > > > > > > > > > > > > new
> > > > > > > > > > > > > ones is a red flag. I think we need to figure out a
> > > > > > > > > > > > > clean
> > > > > > > > > > > > > solution.
> > > > > > > > > > > 
> > > > > > > > > > > Nobody has reviewed the kernel patches yet. My main
> > > > > > > > > > > concern
> > > > > > > > > > > therefore
> > > > > > > > > > > actually is that the kernel patches are already too
> > > > > > > > > > > complex,
> > > > > > > > > > > because
> > > > > > > > > > > the
> > > > > > > > > > > current situation is that only Dominique is handling 9p
> > > > > > > > > > > patches on
> > > > > > > > > > > kernel
> > > > > > > > > > > side, and he barely has time for 9p anymore.
> > > > > > > > > > > 
> > > > > > > > > > > Another reason for me to catch up on reading current
> > > > > > > > > > > kernel
> > > > > > > > > > > code
> > > > > > > > > > > and
> > > > > > > > > > > stepping in as reviewer of 9p on kernel side ASAP,
> > > > > > > > > > > independent
> > > > > > > > > > > of
> > > > > > > > > > > this
> > > > > > > > > > > issue.
> > > > > > > > > > > 
> > > > > > > > > > > As for current kernel patches' complexity: I can certainly
> > > > > > > > > > > drop
> > > > > > > > > > > patch
> > > > > > > > > > > 7
> > > > > > > > > > > entirely as it is probably just overkill. Patch 4 is then
> > > > > > > > > > > the
> > > > > > > > > > > biggest
> > > > > > > > > > > chunk, I have to see if I can simplify it, and whether it
> > > > > > > > > > > would
> > > > > > > > > > > make
> > > > > > > > > > > sense to squash with patch 3.
> > > > > > > > > > > 
> > > > > > > > > > > > > > > - The QEMU 9pfs code passes iovecs directly to
> > > > > > > > > > > > > > > preadv(2)
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >   with EINVAL when called with more than IOV_MAX
> > > > > > > > > > > > > > >   iovecs
> > > > > > > > > > > > > > >   (hw/9pfs/9p.c:v9fs_read())
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hmm, which makes me wonder why I never encountered
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > error
> > > > > > > > > > > > > > during
> > > > > > > > > > > > > > testing.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Most people will use the 9p qemu 'local' fs driver
> > > > > > > > > > > > > > backend
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > practice,
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > that v9fs_read() call would translate for most
> > > > > > > > > > > > > > people to
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > implementation on QEMU side (hw/9p/9p-local.c):
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > static ssize_t local_preadv(FsContext *ctx,
> > > > > > > > > > > > > > V9fsFidOpenState
> > > > > > > > > > > > > > *fs,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >                             const struct iovec *iov,
> > > > > > > > > > > > > >                             int iovcnt, off_t
> > > > > > > > > > > > > >                             offset)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > #ifdef CONFIG_PREADV
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     return preadv(fs->fd, iov, iovcnt, offset);
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > #else
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     int err = lseek(fs->fd, offset, SEEK_SET);
> > > > > > > > > > > > > >     if (err == -1) {
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >     } else {
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >         return readv(fs->fd, iov, iovcnt);
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > #endif
> > > > > > > > > > > > > > }
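
For illustration, a minimal user-space sketch (hypothetical helper, not
existing QEMU code) of staying below IOV_MAX by issuing preadv(2) in
chunks; short reads are only handled approximately here:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: call preadv(2) with at most IOV_MAX iovecs per
 * system call.  A short read ends the loop; a full implementation
 * would have to resume inside the partially filled iovec instead. */
static ssize_t preadv_chunked(int fd, const struct iovec *iov,
                              int iovcnt, off_t offset)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        int n = iovcnt < IOV_MAX ? iovcnt : IOV_MAX;
        size_t chunk = 0;
        ssize_t len;

        for (int i = 0; i < n; i++) {
            chunk += iov[i].iov_len;
        }
        len = preadv(fd, iov, n, offset);
        if (len < 0) {
            return total ? total : len;
        }
        total += len;
        offset += len;
        if ((size_t)len < chunk) {
            break;              /* EOF or short read: stop here */
        }
        iov += n;
        iovcnt -= n;
    }
    return total;
}
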
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Unless I misunderstood the code, neither side can
> > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > advantage
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > new 32k descriptor chain limit?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Stefan
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I need to check that when I have some more time. One
> > > > > > > > > > > > > > possible
> > > > > > > > > > > > > > explanation
> > > > > > > > > > > > > > might be that preadv() already has this wrapped into
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > loop
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > its
> > > > > > > > > > > > > > implementation to circumvent a limit like IOV_MAX.
> > > > > > > > > > > > > > It
> > > > > > > > > > > > > > might
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > another
> > > > > > > > > > > > > > "it
> > > > > > > > > > > > > > works, but not portable" issue, but not sure.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > There are still a bunch of other issues I have to
> > > > > > > > > > > > > > resolve.
> > > > > > > > > > > > > > If
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > net/9p/client.c on kernel side, you'll notice that
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > basically
> > > > > > > > > > > > > > does
> > > > > > > > > > > > > > this ATM> >
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     kmalloc(msize);
> > > > > > > > > > > > 
> > > > > > > > > > > > Note that this is done twice : once for the T message
> > > > > > > > > > > > (client
> > > > > > > > > > > > request)
> > > > > > > > > > > > and
> > > > > > > > > > > > once for the R message (server answer). The 9p driver
> > > > > > > > > > > > could
> > > > > > > > > > > > adjust
> > > > > > > > > > > > the
> > > > > > > > > > > > size
> > > > > > > > > > > > of the T message to what's really needed instead of
> > > > > > > > > > > > allocating
> > > > > > > > > > > > the
> > > > > > > > > > > > full
> > > > > > > > > > > > msize. R message size is not known though.
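
A rough sketch of that idea (names illustrative, not the real net/9p
API; it assumes helpers like p9_fcall_init()/p9_fcall_fini() from
net/9p/client.c are reachable): size the T buffer from what the request
actually needs and keep only the R buffer at msize.

#include <net/9p/9p.h>
#include <net/9p/client.h>

/* Hypothetical sketch: exact-sized T (request) buffer, worst-case R
 * (reply) buffer, since the reply size is not known in advance. */
static int p9_alloc_req_buffers_sketch(struct p9_client *c,
                                       struct p9_fcall *tc,
                                       struct p9_fcall *rc,
                                       int t_size_needed)
{
        int err;

        err = p9_fcall_init(c, tc, t_size_needed);  /* small, exact */
        if (err)
                return err;

        err = p9_fcall_init(c, rc, c->msize);       /* worst case */
        if (err)
                p9_fcall_fini(tc);
        return err;
}
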
> > > > > > > > > > > 
> > > > > > > > > > > Would it make sense adding a second virtio ring, dedicated
> > > > > > > > > > > to
> > > > > > > > > > > server
> > > > > > > > > > > responses to solve this? IIRC 9p server already calculates
> > > > > > > > > > > appropriate
> > > > > > > > > > > exact sizes for each response type. So server could just
> > > > > > > > > > > push
> > > > > > > > > > > space
> > > > > > > > > > > that's
> > > > > > > > > > > really needed for its responses.
> > > > > > > > > > > 
> > > > > > > > > > > > > > for every 9p request. So not only does it allocate
> > > > > > > > > > > > > > much
> > > > > > > > > > > > > > more
> > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > every request than actually required (i.e. say 9pfs
> > > > > > > > > > > > > > was
> > > > > > > > > > > > > > mounted
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > msize=8M, then a 9p request that actually would just
> > > > > > > > > > > > > > need 1k
> > > > > > > > > > > > > > would
> > > > > > > > > > > > > > nevertheless allocate 8M), but also it allocates >
> > > > > > > > > > > > > > PAGE_SIZE,
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > obviously may fail at any time.>
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The PAGE_SIZE limitation sounds like a kmalloc() vs
> > > > > > > > > > > > > vmalloc()
> > > > > > > > > > > > > situation.
> > > > > > > > > > > 
> > > > > > > > > > > Hu, I didn't even consider vmalloc(). I just tried the
> > > > > > > > > > > kvmalloc()
> > > > > > > > > > > wrapper
> > > > > > > > > > > as a quick & dirty test, but it crashed in the same way as
> > > > > > > > > > > kmalloc()
> > > > > > > > > > > with
> > > > > > > > > > > large msize values immediately on mounting:
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/net/9p/client.c b/net/9p/client.c
> > > > > > > > > > > index a75034fa249b..cfe300a4b6ca 100644
> > > > > > > > > > > --- a/net/9p/client.c
> > > > > > > > > > > +++ b/net/9p/client.c
> > > > > > > > > > > @@ -227,15 +227,18 @@ static int parse_opts(char *opts,
> > > > > > > > > > > struct
> > > > > > > > > > > p9_client
> > > > > > > > > > > *clnt)
> > > > > > > > > > > 
> > > > > > > > > > >  static int p9_fcall_init(struct p9_client *c, struct
> > > > > > > > > > >  p9_fcall
> > > > > > > > > > >  *fc,
> > > > > > > > > > >  
> > > > > > > > > > >                          int alloc_msize)
> > > > > > > > > > >  
> > > > > > > > > > >  {
> > > > > > > > > > > 
> > > > > > > > > > > -       if (likely(c->fcall_cache) && alloc_msize ==
> > > > > > > > > > > c->msize)
> > > > > > > > > > > {
> > > > > > > > > > > +       //if (likely(c->fcall_cache) && alloc_msize ==
> > > > > > > > > > > c->msize) {
> > > > > > > > > > > +       if (false) {
> > > > > > > > > > > 
> > > > > > > > > > >                 fc->sdata =
> > > > > > > > > > >                 kmem_cache_alloc(c->fcall_cache,
> > > > > > > > > > >                 GFP_NOFS);
> > > > > > > > > > >                 fc->cache = c->fcall_cache;
> > > > > > > > > > >         
> > > > > > > > > > >         } else {
> > > > > > > > > > > 
> > > > > > > > > > > -               fc->sdata = kmalloc(alloc_msize,
> > > > > > > > > > > GFP_NOFS);
> > > > > > > > > > > +               fc->sdata = kvmalloc(alloc_msize,
> > > > > > > > > > > GFP_NOFS);
> > > > > > > > > > 
> > > > > > > > > > Ok, GFP_NOFS -> GFP_KERNEL did the trick.
> > > > > > > > > > 
> > > > > > > > > > Now I get:
> > > > > > > > > >    virtio: bogus descriptor or out of resources
> > > > > > > > > > 
> > > > > > > > > > So, still some work ahead on both ends.
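
As a side note, a hedged sketch (not the actual 9p patch, function name
hypothetical): if NOFS semantics are still needed, the usual kernel
idiom is the NOFS scope API around a GFP_KERNEL allocation, which lets
kvmalloc() fall back to vmalloc():

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/sched/mm.h>
#include <net/9p/9p.h>

/* Hypothetical sketch, not the real p9_fcall_init(): allocate the 9p
 * buffer with kvmalloc(GFP_KERNEL) inside a NOFS scope, so large
 * allocations may use the vmalloc() fallback while filesystem reclaim
 * is still excluded for this task. */
static int p9_fcall_init_sketch(struct p9_fcall *fc, int alloc_msize)
{
        unsigned int nofs_flags = memalloc_nofs_save();

        fc->sdata = kvmalloc(alloc_msize, GFP_KERNEL);
        memalloc_nofs_restore(nofs_flags);
        if (!fc->sdata)
                return -ENOMEM;
        fc->cache = NULL;       /* not from the fcall_cache slab */
        return 0;
}
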
> > > > > > > > > 
> > > > > > > > > Few hacks later (only changes on 9p client side) I got this
> > > > > > > > > running
> > > > > > > > > stable
> > > > > > > > > now. The reason for the virtio error above was that kvmalloc()
> > > > > > > > > returns
> > > > > > > > > a
> > > > > > > > > non-logical kernel address for any kvmalloc(>4M), i.e. an
> > > > > > > > > address
> > > > > > > > > that
> > > > > > > > > is
> > > > > > > > > inaccessible from host side, hence that "bogus descriptor"
> > > > > > > > > message
> > > > > > > > > by
> > > > > > > > > QEMU.
> > > > > > > > > So I had to split those linear 9p client buffers into sparse
> > > > > > > > > ones
> > > > > > > > > (set
> > > > > > > > > of
> > > > > > > > > individual pages).
> > > > > > > > > 
> > > > > > > > > I tested this for some days with various virtio transmission
> > > > > > > > > sizes
> > > > > > > > > and
> > > > > > > > > it
> > > > > > > > > works as expected up to 128 MB (more precisely: 128 MB read
> > > > > > > > > space
> > > > > > > > > +
> > > > > > > > > 128 MB
> > > > > > > > > write space per virtio round trip message).
> > > > > > > > > 
> > > > > > > > > I did not encounter a show stopper for large virtio
> > > > > > > > > transmission
> > > > > > > > > sizes
> > > > > > > > > (4 MB ... 128 MB) on virtio level, neither as a result of
> > > > > > > > > testing,
> > > > > > > > > nor
> > > > > > > > > after reviewing the existing code.
> > > > > > > > > 
> > > > > > > > > About IOV_MAX: that's apparently not an issue on virtio level.
> > > > > > > > > Most of
> > > > > > > > > the
> > > > > > > > > iovec code, both on Linux kernel side and on QEMU side, does not
> > > > > > > > > have
> > > > > > > > > this
> > > > > > > > > limitation. It is apparently however indeed a limitation for
> > > > > > > > > userland
> > > > > > > > > apps
> > > > > > > > > calling the Linux kernel's syscalls, though.
> > > > > > > > > 
> > > > > > > > > Stefan, as it stands now, I am even more convinced that the
> > > > > > > > > upper
> > > > > > > > > virtio
> > > > > > > > > transmission size limit should not be squeezed into the queue
> > > > > > > > > size
> > > > > > > > > argument of virtio_add_queue(). Not because of the previous
> > > > > > > > > argument
> > > > > > > > > that
> > > > > > > > > it would waste space (~1MB), but rather because they are two
> > > > > > > > > different
> > > > > > > > > things. To outline this, just a quick recap of what happens
> > > > > > > > > exactly
> > > > > > > > > when
> > > > > > > > > a bulk message is pushed over the virtio wire (assuming virtio
> > > > > > > > > "split"
> > > > > > > > > layout here):
> > > > > > > > > 
> > > > > > > > > ---------- [recap-start] ----------
> > > > > > > > > 
> > > > > > > > > For each bulk message sent guest <-> host, exactly *one* of
> > > > > > > > > the
> > > > > > > > > pre-allocated descriptors is taken and placed (subsequently)
> > > > > > > > > into
> > > > > > > > > exactly
> > > > > > > > > *one* position of the two available/used ring buffers. The
> > > > > > > > > actual
> > > > > > > > > descriptor table though, containing all the DMA addresses of
> > > > > > > > > the
> > > > > > > > > message
> > > > > > > > > bulk data, is allocated just in time for each round trip
> > > > > > > > > message.
> > > > > > > > > Say,
> > > > > > > > > it
> > > > > > > > > is the first message sent, it yields the following
> > > > > > > > > structure:
> > > > > > > > > 
> > > > > > > > > Ring Buffer   Descriptor Table      Bulk Data Pages
> > > > > > > > > 
> > > > > > > > >    +-+              +-+           +-----------------+
> > > > > > > > >    |D|------------->|d|---------->| Bulk data block |
> > > > > > > > >    +-+              |d|--------+  +-----------------+
> > > > > > > > >    | |              |d|------+ |
> > > > > > > > >    +-+               .       | |  +-----------------+
> > > > > > > > >    | |               .       | +->| Bulk data block |
> > > > > > > > >     .                .       |    +-----------------+
> > > > > > > > >     .               |d|-+    |
> > > > > > > > >     .               +-+ |    |    +-----------------+
> > > > > > > > >    | |                  |    +--->| Bulk data block |
> > > > > > > > >    +-+                  |         +-----------------+
> > > > > > > > >    | |                  |                 .
> > > > > > > > >    +-+                  |                 .
> > > > > > > > >                         |                 .
> > > > > > > > >                         |
> > > > > > > > >                         |         +-----------------+
> > > > > > > > >                         +-------->| Bulk data block |
> > > > > > > > >                                   +-----------------+
> > > > > > > > > 
> > > > > > > > > Legend:
> > > > > > > > > D: pre-allocated descriptor
> > > > > > > > > d: just in time allocated descriptor
> > > > > > > > > -->: memory pointer (DMA)
> > > > > > > > > 
> > > > > > > > > The bulk data blocks are allocated by the respective device
> > > > > > > > > driver
> > > > > > > > > above
> > > > > > > > > virtio subsystem level (guest side).
> > > > > > > > > 
> > > > > > > > > There are exactly as many descriptors pre-allocated (D) as the
> > > > > > > > > size of
> > > > > > > > > a
> > > > > > > > > ring buffer.
> > > > > > > > > 
> > > > > > > > > A "descriptor" is more or less just a chainable DMA memory
> > > > > > > > > pointer;
> > > > > > > > > defined
> > > > > > > > > as:
> > > > > > > > > 
> > > > > > > > > /* Virtio ring descriptors: 16 bytes.  These can chain
> > > > > > > > > together
> > > > > > > > > via
> > > > > > > > > "next". */ struct vring_desc {
> > > > > > > > > 
> > > > > > > > > 	/* Address (guest-physical). */
> > > > > > > > > 	__virtio64 addr;
> > > > > > > > > 	/* Length. */
> > > > > > > > > 	__virtio32 len;
> > > > > > > > > 	/* The flags as indicated above. */
> > > > > > > > > 	__virtio16 flags;
> > > > > > > > > 	/* We chain unused descriptors via this, too */
> > > > > > > > > 	__virtio16 next;
> > > > > > > > > 
> > > > > > > > > };
> > > > > > > > > 
> > > > > > > > > There are 2 ring buffers; the "available" ring buffer is for
> > > > > > > > > sending a
> > > > > > > > > message guest->host (which will transmit DMA addresses of
> > > > > > > > > guest
> > > > > > > > > allocated
> > > > > > > > > bulk data blocks that are used for data sent to device, and
> > > > > > > > > separate
> > > > > > > > > guest allocated bulk data blocks that will be used by host
> > > > > > > > > side to
> > > > > > > > > place
> > > > > > > > > its response bulk data), and the "used" ring buffer is for
> > > > > > > > > sending
> > > > > > > > > host->guest to let guest know about host's response and that
> > > > > > > > > it
> > > > > > > > > could
> > > > > > > > > now
> > > > > > > > > safely consume and then deallocate the bulk data blocks
> > > > > > > > > subsequently.
> > > > > > > > > 
> > > > > > > > > ---------- [recap-end] ----------
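
To make the recap above concrete, here is a rough sketch (illustrative
names, loosely modelled on net/9p/trans_virtio.c) of how a guest driver
queues one such bulk message; each scatterlist element becomes one
just-in-time descriptor (d), while the message as a whole occupies a
single slot in the available ring (D):

#include <linux/gfp.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int queue_one_message(struct virtqueue *vq,
                             struct scatterlist *out_sg, /* driver -> device */
                             struct scatterlist *in_sg,  /* device -> driver */
                             void *cookie)
{
        struct scatterlist *sgs[2] = { out_sg, in_sg };
        int err;

        err = virtqueue_add_sgs(vq, sgs, 1, 1, cookie, GFP_ATOMIC);
        if (err < 0)
                return err;     /* e.g. -ENOSPC: no free ring slot/descriptors */

        virtqueue_kick(vq);     /* notify the device */
        return 0;
}
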
> > > > > > > > > 
> > > > > > > > > So the "queue size" actually defines the ringbuffer size. It
> > > > > > > > > does
> > > > > > > > > not
> > > > > > > > > define the maximum amount of descriptors. The "queue size"
> > > > > > > > > rather
> > > > > > > > > defines
> > > > > > > > > how many pending messages can be pushed into either one
> > > > > > > > > ringbuffer
> > > > > > > > > before
> > > > > > > > > the other side would need to wait until the counter side would
> > > > > > > > > step up
> > > > > > > > > (i.e. ring buffer full).
> > > > > > > > > 
> > > > > > > > > The maximum amount of descriptors (what VIRTQUEUE_MAX_SIZE
> > > > > > > > > actually
> > > > > > > > > is)
> > > > > > > > > OTOH defines the max. bulk data size that could be transmitted
> > > > > > > > > with
> > > > > > > > > each
> > > > > > > > > virtio round trip message.
> > > > > > > > > 
> > > > > > > > > And in fact, 9p currently handles the virtio "queue size" as
> > > > > > > > > directly
> > > > > > > > > associated with its maximum amount of active 9p requests the
> > > > > > > > > server
> > > > > > > > > could
> > > > > > > > > 
> > > > > > > > > handle simultaneously:
> > > > > > > > >   hw/9pfs/9p.h:#define MAX_REQ         128
> > > > > > > > >   hw/9pfs/9p.h:    V9fsPDU pdus[MAX_REQ];
> > > > > > > > >   hw/9pfs/virtio-9p-device.c:    v->vq =
> > > > > > > > >   virtio_add_queue(vdev,
> > > > > > > > >   MAX_REQ,
> > > > > > > > >   
> > > > > > > > >                                  handle_9p_output);
> > > > > > > > > 
> > > > > > > > > So if I changed it like this, just for the purpose of
> > > > > > > > > increasing
> > > > > > > > > the
> > > > > > > > > max. virtio transmission size:
> > > > > > > > > 
> > > > > > > > > --- a/hw/9pfs/virtio-9p-device.c
> > > > > > > > > +++ b/hw/9pfs/virtio-9p-device.c
> > > > > > > > > @@ -218,7 +218,7 @@ static void
> > > > > > > > > virtio_9p_device_realize(DeviceState
> > > > > > > > > *dev,
> > > > > > > > > Error **errp)>
> > > > > > > > > 
> > > > > > > > >      v->config_size = sizeof(struct virtio_9p_config) +
> > > > > > > > >      strlen(s->fsconf.tag);
> > > > > > > > >      virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P,
> > > > > > > > >      v->config_size,
> > > > > > > > >      
> > > > > > > > >                  VIRTQUEUE_MAX_SIZE);
> > > > > > > > > 
> > > > > > > > > -    v->vq = virtio_add_queue(vdev, MAX_REQ,
> > > > > > > > > handle_9p_output);
> > > > > > > > > +    v->vq = virtio_add_queue(vdev, 32*1024,
> > > > > > > > > handle_9p_output);
> > > > > > > > > 
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > > Then it would require additional synchronization code on both
> > > > > > > > > ends
> > > > > > > > > and
> > > > > > > > > therefore unnecessary complexity, because it would now be
> > > > > > > > > possible
> > > > > > > > > that
> > > > > > > > > more requests are pushed into the ringbuffer than the server could
> > > > > > > > > handle.
> > > > > > > > > 
> > > > > > > > > There is one potential issue though that probably did justify
> > > > > > > > > the
> > > > > > > > > "don't
> > > > > > > > > exceed the queue size" rule:
> > > > > > > > > 
> > > > > > > > > ATM the descriptor table is allocated (just in time) as *one*
> > > > > > > > > continuous
> > > > > > > > > buffer via kmalloc_array():
> > > > > > > > > https://github.com/torvalds/linux/blob/2f111a6fd5b5297b4e92f53798ca086f7c7d33a4/drivers/virtio/virtio_ring.c#L440
> > > > > > > > > 
> > > > > > > > > So assuming transmission size of 2 * 128 MB that
> > > > > > > > > kmalloc_array()
> > > > > > > > > call
> > > > > > > > > would
> > > > > > > > > result in a kmalloc(1M) and the latter might fail if guest had
> > > > > > > > > highly
> > > > > > > > > fragmented physical memory. For such kind of error case there
> > > > > > > > > is
> > > > > > > > > currently a fallback path in virtqueue_add_split() that would
> > > > > > > > > then
> > > > > > > > > use
> > > > > > > > > the required amount of pre-allocated descriptors instead:
> > > > > > > > > https://github.com/torvalds/linux/blob/2f111a6fd5b5297b4e92f53798ca086f7c7d33a4/drivers/virtio/virtio_ring.c#L525
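
As a quick sanity check of the kmalloc(1M) figure above, a trivial
user-space calculation (assuming 4 KiB pages and 16-byte split-ring
descriptors):

#include <stdio.h>

int main(void)
{
    const unsigned long bulk = 128UL << 20;  /* 128 MiB per direction */
    const unsigned long page = 4096;         /* PAGE_SIZE */
    const unsigned long desc_size = 16;      /* sizeof(struct vring_desc) */
    unsigned long descs = bulk / page;       /* 32768 descriptors */
    unsigned long table = 2 * descs * desc_size;

    printf("descriptors per direction: %lu\n", descs);
    printf("descriptor table size: %lu KiB\n", table / 1024); /* 1024 KiB = 1 MiB */
    return 0;
}
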
> > > > > > > > > 
> > > > > > > > > That fallback recovery path would no longer be viable if the
> > > > > > > > > queue
> > > > > > > > > size
> > > > > > > > > was
> > > > > > > > > exceeded. There would be alternatives though, e.g. allowing indirect
> > > > > > > > > descriptor tables to be chained (currently prohibited by the virtio
> > > > > > > > > specs).
> > > > > > > > 
> > > > > > > > Making the maximum number of descriptors independent of the
> > > > > > > > queue
> > > > > > > > size
> > > > > > > > requires a change to the VIRTIO spec since the two values are
> > > > > > > > currently
> > > > > > > > explicitly tied together by the spec.
> > > > > > > 
> > > > > > > Yes, that's what the virtio specs say. But they don't say why, nor
> > > > > > > did
> > > > > > > I
> > > > > > > hear a reason in this discussion.
> > > > > > > 
> > > > > > > That's why I invested time reviewing current virtio implementation
> > > > > > > and
> > > > > > > specs, as well as actually testing exceeding that limit. And as I
> > > > > > > outlined in detail in my previous email, I only found one
> > > > > > > theoretical
> > > > > > > issue that could be addressed though.
> > > > > > 
> > > > > > I agree that there is a limitation in the VIRTIO spec, but violating
> > > > > > the
> > > > > > spec isn't an acceptable solution:
> > > > > > 
> > > > > > 1. QEMU and Linux aren't the only components that implement VIRTIO.
> > > > > > You
> > > > > > 
> > > > > >    cannot make assumptions about their implementations because it
> > > > > >    may
> > > > > >    break spec-compliant implementations that you haven't looked at.
> > > > > >    
> > > > > >    Your patches weren't able to increase Queue Size because some
> > > > > >    device
> > > > > >    implementations break when descriptor chains are too long. This
> > > > > >    shows
> > > > > >    there is a practical issue even in QEMU.
> > > > > > 
> > > > > > 2. The specific spec violation that we discussed creates the problem
> > > > > > 
> > > > > >    that drivers can no longer determine the maximum descriptor
> > > > > >    chain
> > > > > >    length. This in turn will lead to more implementation-specific
> > > > > >    assumptions being baked into drivers and cause problems with
> > > > > >    interoperability and future changes.
> > > > > > 
> > > > > > The spec needs to be extended instead. I included an idea for how to
> > > > > > do
> > > > > > that below.
> > > > > 
> > > > > Sure, I just wanted to see if there was a non-negligible "hard" show
> > > > > stopper per se that I probably haven't seen yet. I have not questioned
> > > > > aiming for a clean solution.
> > > > > 
> > > > > Thanks for the clarification!
> > > > > 
> > > > > > > > Before doing that, are there benchmark results showing that 1 MB
> > > > > > > > vs
> > > > > > > > 128
> > > > > > > > MB produces a performance improvement? I'm asking because if
> > > > > > > > performance
> > > > > > > > with 1 MB is good then you can probably do that without having
> > > > > > > > to
> > > > > > > > change
> > > > > > > > VIRTIO and also because it's counter-intuitive that 9p needs 128
> > > > > > > > MB
> > > > > > > > for
> > > > > > > > good performance when it's ultimately implemented on top of disk
> > > > > > > > and
> > > > > > > > network I/O that have lower size limits.
> > > > > > > 
> > > > > > > First some numbers, linear reading a 12 GB file:
> > > > > > > 
> > > > > > > msize    average      notes
> > > > > > > 
> > > > > > > 8 kB     52.0 MB/s    default msize of Linux kernel <v5.15
> > > > > > > 128 kB   624.8 MB/s   default msize of Linux kernel >=v5.15
> > > > > > > 512 kB   1961 MB/s    current max. msize with any Linux kernel <=v5.15
> > > > > > > 1 MB     2551 MB/s    this msize would already violate virtio specs
> > > > > > > 2 MB     2521 MB/s    this msize would already violate virtio specs
> > > > > > > 4 MB     2628 MB/s    planned max. msize of my current kernel patches [1]
> > > > > > 
> > > > > > How many descriptors are used? 4 MB can be covered by a single
> > > > > > descriptor if the data is physically contiguous in memory, so this
> > > > > > data
> > > > > > doesn't demonstrate a need for more descriptors.
> > > > > 
> > > > > No, in the last couple years there was apparently no kernel version
> > > > > that
> > > > > used just one descriptor, nor did my benchmarked version. Even though
> > > > > the
> > > > > Linux 9p client (still) uses simple linear buffers (contiguous physical
> > > > > memory) on 9p client level, these are however split into PAGE_SIZE
> > > > > chunks
> > > > > by function pack_sg_list() [1] before being fed to virtio level:
> > > > > 
> > > > > static unsigned int rest_of_page(void *data)
> > > > > {
> > > > > 
> > > > > 	return PAGE_SIZE - offset_in_page(data);
> > > > > 
> > > > > }
> > > > > ...
> > > > > static int pack_sg_list(struct scatterlist *sg, int start,
> > > > > 
> > > > > 			int limit, char *data, int count)
> > > > > 
> > > > > {
> > > > > 
> > > > > 	int s;
> > > > > 	int index = start;
> > > > > 	
> > > > > 	while (count) {
> > > > > 	
> > > > > 		s = rest_of_page(data);
> > > > > 		...
> > > > > 		sg_set_buf(&sg[index++], data, s);
> > > > > 		count -= s;
> > > > > 		data += s;
> > > > > 	
> > > > > 	}
> > > > > 	...
> > > > > 
> > > > > }
> > > > > 
> > > > > [1] https://github.com/torvalds/linux/blob/19901165d90fdca1e57c9baa0d5b4c63d15c476a/net/9p/trans_virtio.c#L171
> > > > > 
> > > > > So when sending 4MB over virtio wire, it would result in 1k descriptors
> > > > > ATM.
> > > > > 
> > > > > I have wondered about this before, but did not question it, because
> > > > > due to
> > > > > the cross-platform nature I couldn't say for certain whether that's
> > > > > probably needed somewhere. I mean for the case virtio-PCI I know for
> > > > > sure
> > > > > that one descriptor (i.e. >PAGE_SIZE) would be fine, but I don't know
> > > > > if
> > > > > that applies to all buses and architectures.
> > > > 
> > > > VIRTIO does not limit the descriptor len field to PAGE_SIZE,
> > > > so I don't think there is a limit at the VIRTIO level.
> > > 
> > > So you are viewing this purely from virtio specs PoV: in the sense, if it
> > > is not prohibited by the virtio specs, then it should work. Maybe.
> > 
> > Limitations must be specified either in the 9P protocol or the VIRTIO
> > specification. Drivers and devices will not be able to operate correctly
> > if there are limitations that aren't covered by the specs.
> > 
> > Do you have something in mind that isn't covered by the specs?
> 
> Not sure whether that's something that should be specified by the virtio 
> specs, probably not. I simply do not know whether there is any bus or architecture 
> that has a limitation on the max. size of a memory block passed per one 
> DMA address.

Host-side limitations like that can exist, for example when a physical
storage device on the host has limits that the VIRTIO device does not
have. In this case both virtio-scsi and virtio-blk report those limits
to the guest so that the guest won't submit requests that the physical
device would reject. I guess networking MTU is kind of similar too. What
they have in common is that the limit needs to be reported to the guest,
typically using a VIRTIO Configuration Space field. It is an explicit
limit that is part of the host<->guest interface (VIRTIO spec, SCSI,
etc).
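
For reference, an abbreviated sketch of how virtio-blk exposes such
limits through its configuration space (fields as defined by the VIRTIO
spec and include/uapi/linux/virtio_blk.h; the struct is renamed and
trimmed here to mark it as an illustration):

#include <linux/virtio_types.h>

/* Abbreviated excerpt, not a complete definition. */
struct virtio_blk_config_excerpt {
        __virtio64 capacity;   /* device capacity in 512-byte sectors */
        __virtio32 size_max;   /* max. bytes per segment (VIRTIO_BLK_F_SIZE_MAX) */
        __virtio32 seg_max;    /* max. segments per request (VIRTIO_BLK_F_SEG_MAX) */
        /* ... remaining fields omitted ... */
};
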

> > > > If this function coalesces adjacent pages then the descriptor chain
> > > > length issues could be reduced.
> > > > 
> > > > > > > But again, this is not just about performance. My conclusion as
> > > > > > > described
> > > > > > > in my previous email is that virtio currently squeezes
> > > > > > > 
> > > > > > > 	"max. simultaneous amount of bulk messages"
> > > > > > > 
> > > > > > > vs.
> > > > > > > 
> > > > > > > 	"max. bulk data transmission size per bulk message"
> > > > > > > 
> > > > > > > into the same configuration parameter, which is IMO inappropriate
> > > > > > > and
> > > > > > > hence
> > > > > > > splitting them into 2 separate parameters when creating a queue
> > > > > > > makes
> > > > > > > sense, independent of the performance benchmarks.
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/netdev/cover.1632327421.git.linux_oss@crudebyte.com/
> > > > > > 
> > > > > > Some devices effectively already have this because the device
> > > > > > advertises
> > > > > > a maximum number of descriptors via device-specific mechanisms like
> > > > > > the
> > > > > > struct virtio_blk_config seg_max field. But today these fields can
> > > > > > only
> > > > > > reduce the maximum descriptor chain length because the spec still
> > > > > > limits
> > > > > > the length to Queue Size.
> > > > > > 
> > > > > > We can build on this approach to raise the length above Queue Size.
> > > > > > This
> > > > > > approach has the advantage that the maximum number of segments isn't
> > > > > > per
> > > > > > device or per virtqueue, it's fine-grained. If the device supports
> > > > > > two
> > > > > > requests types then different max descriptor chain limits could be
> > > > > > given
> > > > > > for them by introducing two separate configuration space fields.
> > > > > > 
> > > > > > Here are the corresponding spec changes:
> > > > > > 
> > > > > > 1. A new feature bit called VIRTIO_RING_F_LARGE_INDIRECT_DESC is
> > > > > > added
> > > > > > 
> > > > > >    to indicate that indirect descriptor table size and maximum
> > > > > >    descriptor chain length are not limited by Queue Size value.
> > > > > >    (Maybe
> > > > > >    there still needs to be a limit like 2^15?)
> > > > > 
> > > > > Sounds good to me!
> > > > > 
> > > > > AFAIK it is effectively limited to 2^16 because of vring_desc->next:
> > > > > 
> > > > > /* Virtio ring descriptors: 16 bytes.  These can chain together via
> > > > > "next". */ struct vring_desc {
> > > > > 
> > > > >         /* Address (guest-physical). */
> > > > >         __virtio64 addr;
> > > > >         /* Length. */
> > > > >         __virtio32 len;
> > > > >         /* The flags as indicated above. */
> > > > >         __virtio16 flags;
> > > > >         /* We chain unused descriptors via this, too */
> > > > >         __virtio16 next;
> > > > > 
> > > > > };
> > > > 
> > > > Yes, Split Virtqueues have a fundamental limit on indirect table size
> > > > due to the "next" field. Packed Virtqueue descriptors don't have a
> > > > "next" field so descriptor chains could be longer in theory (currently
> > > > forbidden by the spec).
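
For reference, the packed-ring descriptor layout mentioned here, as
defined by VIRTIO 1.1 and include/uapi/linux/virtio_ring.h:

#include <linux/types.h>

/* Packed virtqueue descriptor: note there is no "next" field; chained
 * descriptors are simply placed consecutively in the ring. */
struct vring_packed_desc {
        __le64 addr;   /* buffer address (guest-physical) */
        __le32 len;    /* buffer length */
        __le16 id;     /* buffer ID */
        __le16 flags;  /* NEXT/WRITE/INDIRECT plus the packed-ring AVAIL/USED bits */
};
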
> > > > 
> > > > > > One thing that's messy is that we've been discussing the maximum
> > > > > > descriptor chain length but 9p has the "msize" concept, which isn't
> > > > > > aware of contiguous memory. It may be necessary to extend the 9p
> > > > > > driver
> > > > > > code to size requests not just according to their length in bytes
> > > > > > but
> > > > > > also according to the descriptor chain length. That's how the Linux
> > > > > > block layer deals with queue limits (struct queue_limits
> > > > > > max_segments vs
> > > > > > max_hw_sectors).
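
For comparison, the block layer limits referred to above, as an
abbreviated excerpt of struct queue_limits from include/linux/blkdev.h
(most fields omitted; renamed here as an illustration):

/* The two caps the Linux block layer enforces when splitting requests. */
struct queue_limits_excerpt {
        unsigned int   max_hw_sectors;  /* request size cap, in 512-byte sectors */
        unsigned short max_segments;    /* scatter-gather segment count cap */
};
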
> > > > > 
> > > > > Hmm, can't follow on that one. Why would that be needed in the case of
> > > > > 9p? My plan was to limit msize by 9p client simply at session start to
> > > > > whatever is the max. amount of virtio descriptors supported by the host and
> > > > > using PAGE_SIZE as size per descriptor, because that's what 9p client
> > > > > actually does ATM (see above). So you think that should be changed to
> > > > > e.g. just one descriptor for 4MB, right?
> > > > 
> > > > Limiting msize to the 9p transport device's maximum number of
> > > > descriptors is conservative (i.e. 128 descriptors = 512 KB msize)
> > > > because it doesn't take advantage of contiguous memory. I suggest
> > > > leaving msize alone, adding a separate limit at which requests are split
> > > > according to the maximum descriptor chain length, and tweaking
> > > > pack_sg_list() to coalesce adjacent pages.
> > > > 
> > > > That way msize can be large without necessarily using lots of
> > > > descriptors (depending on the memory layout).
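
A minimal sketch of that tweak (hypothetical, not a real patch; it
relies on the linear lowmem buffer that pack_sg_list() is given today
being both virtually and physically contiguous):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Coalescing variant of pack_sg_list(): still walks the buffer in
 * page-sized steps, but grows the previous scatterlist entry whenever
 * the next chunk is contiguous with it, instead of always emitting one
 * entry per page. */
static int pack_sg_list_coalesced(struct scatterlist *sg, int start,
                                  int limit, char *data, int count)
{
        int index = start;

        while (count) {
                int s = min_t(int, count, PAGE_SIZE - offset_in_page(data));

                if (index > start &&
                    (char *)sg_virt(&sg[index - 1]) + sg[index - 1].length == data) {
                        /* contiguous with the previous chunk: grow it */
                        sg[index - 1].length += s;
                } else {
                        BUG_ON(index >= limit);
                        sg_set_buf(&sg[index++], data, s);
                }
                data += s;
                count -= s;
        }
        if (index - start)
                sg_mark_end(&sg[index - 1]);
        return index - start;
}
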
> > > 
> > > That was actually a tempting solution, because it would not require
> > > changes to the virtio specs (at least for a while) and it would also work
> > > with older QEMU versions. And for that pack_sg_list() portion of the code
> > > it would work well and easy as the buffer passed to pack_sg_list() is
> > > contiguous already.
> > > 
> > > However I just realized for the zero-copy version of the code that would
> > > be
> > > more tricky. The ZC version already uses individual pages (struct page,
> > > hence PAGE_SIZE each) which are pinned, i.e. it uses pack_sg_list_p() [1]
> > > in combination with p9_get_mapped_pages() [2]
> > > 
> > > [1] https://github.com/torvalds/linux/blob/7ddb58cb0ecae8e8b6181d736a87667cc9ab8389/net/9p/trans_virtio.c#L218
> > > [2] https://github.com/torvalds/linux/blob/7ddb58cb0ecae8e8b6181d736a87667cc9ab8389/net/9p/trans_virtio.c#L309
> > > 
> > > So that would require much more work and code trying to sort and coalesce
> > > individual pages to contiguous physical memory for the sake of reducing
> > > virtio descriptors. And there is no guarantee that this is even possible.
> > > The kernel may simply return a non-contiguous set of pages which would
> > > eventually end up exceeding the virtio descriptor limit again.
> > 
> > Order must be preserved so pages cannot be sorted by physical address.
> > How about simply coalescing when pages are adjacent?
> 
> It would help, but not solve the issue we are talking about here: if 99% of 
> the cases could successfully merge descriptors to stay below the descriptor 
> count limit, but in 1% of the cases it could not, then this still constitutes a 
> severe runtime issue that could trigger at any time.
> 
> > > So looks like it was probably still easier and realistic to just add
> > > virtio
> > > capabilities for now to allow exceeding the current descriptor limit.
> > 
> > I'm still not sure why virtio-net, virtio-blk, virtio-fs, etc perform
> > fine under today's limits while virtio-9p needs a much higher limit to
> > achieve good performance. Maybe there is an issue in a layer above the
> > vring that's causing the virtio-9p performance you've observed?
> 
> Are you referring to (somewhat) recent benchmarks when saying those would all 
> still perform fine today?

I'm not referring to specific benchmark results. Just that none of those
devices needed to raise the descriptor chain length, so I'm surprised
that virtio-9p needs it because it's conceptually similar to these
devices.

> Vivek was running detailed benchmarks for virtiofs vs. 9p:
> https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg02704.html
> 
> For the virtio aspect discussed here, only the benchmark configurations 
> without cache are relevant (9p-none, vtfs-none) and under this aspect the 
> situation seems to be quite similar between 9p and virtio-fs. You'll also note 
> that once DAX is enabled (vtfs-none-dax) it apparently boosts virtio-fs 
> performance significantly, which however seems to correlate with the numbers I 
> get when running 9p with msize > 300k. Note: Vivek was presumably running 9p 
> effectively with msize=300k, as this was the kernel limitation at that time.

Agreed, virtio-9p and virtiofs are similar without caching.

I think we shouldn't consider DAX here since it bypasses the virtqueue.

> To bring things into relation: there are known performance aspects in 9p that 
> can be improved, yes, both on Linux kernel side and on 9p server side in QEMU. 
> For instance 9p server uses coroutines [1] and currently dispatches between 
> worker thread(s) and main thread too often per request (partly addressed 
> already [2], but still WIP), which adds to overall latency. But Vivek 
> was actually using a 9p patch here which disabled coroutines entirely, which 
> suggests that the virtio transmission size limit still represents a 
> bottleneck.

These results were collected with 4k block size. Neither msize nor the
descriptor chain length limits will be stressed, so I don't think these
results are relevant here.

Maybe a more relevant comparison would be virtio-9p, virtiofs, and
virtio-blk when block size is large (e.g. 1M). The Linux block layer in
the guest will split virtio-blk requests when they exceed the block
queue limits.

Stefan

> 
> [1] https://wiki.qemu.org/Documentation/9p#Coroutines
> [2] https://wiki.qemu.org/Documentation/9p#Implementation_Plans
> 
> Best regards,
> Christian Schoenebeck
> 
> 


WARNING: multiple messages have this Message-ID (diff)
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Christian Schoenebeck <qemu_oss@crudebyte.com>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-block@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>, "Amit Shah" <amit@kernel.org>,
	"David Hildenbrand" <david@redhat.com>,
	qemu-devel@nongnu.org, virtio-fs@redhat.com,
	"Eric Auger" <eric.auger@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Fam Zheng" <fam@euphon.net>,
	"Raphael Norwitz" <raphael.norwitz@nutanix.com>
Subject: Re: [Virtio-fs] [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
Date: Wed, 10 Nov 2021 10:05:50 +0000	[thread overview]
Message-ID: <YYuZfkfbxcX0JDRN@stefanha-x1.localdomain> (raw)
In-Reply-To: <25571471.tMsSMU6axZ@silver>

[-- Attachment #1: Type: text/plain, Size: 43910 bytes --]

On Tue, Nov 09, 2021 at 02:09:59PM +0100, Christian Schoenebeck wrote:
> On Dienstag, 9. November 2021 11:56:35 CET Stefan Hajnoczi wrote:
> > On Thu, Nov 04, 2021 at 03:41:23PM +0100, Christian Schoenebeck wrote:
> > > On Mittwoch, 3. November 2021 12:33:33 CET Stefan Hajnoczi wrote:
> > > > On Mon, Nov 01, 2021 at 09:29:26PM +0100, Christian Schoenebeck wrote:
> > > > > On Donnerstag, 28. Oktober 2021 11:00:48 CET Stefan Hajnoczi wrote:
> > > > > > On Mon, Oct 25, 2021 at 05:03:25PM +0200, Christian Schoenebeck 
> wrote:
> > > > > > > On Montag, 25. Oktober 2021 12:30:41 CEST Stefan Hajnoczi wrote:
> > > > > > > > On Thu, Oct 21, 2021 at 05:39:28PM +0200, Christian Schoenebeck 
> wrote:
> > > > > > > > > On Freitag, 8. Oktober 2021 18:08:48 CEST Christian 
> Schoenebeck wrote:
> > > > > > > > > > On Freitag, 8. Oktober 2021 16:24:42 CEST Christian 
> Schoenebeck wrote:
> > > > > > > > > > > On Freitag, 8. Oktober 2021 09:25:33 CEST Greg Kurz wrote:
> > > > > > > > > > > > On Thu, 7 Oct 2021 16:42:49 +0100
> > > > > > > > > > > > 
> > > > > > > > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > > > > > > > > > On Thu, Oct 07, 2021 at 02:51:55PM +0200, Christian 
> Schoenebeck wrote:
> > > > > > > > > > > > > > On Donnerstag, 7. Oktober 2021 07:23:59 CEST Stefan 
> Hajnoczi wrote:
> > > > > > > > > > > > > > > On Mon, Oct 04, 2021 at 09:38:00PM +0200,
> > > > > > > > > > > > > > > Christian
> > > > > > > > > > > > > > > Schoenebeck
> > > > > > > > > > 
> > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > At the moment the maximum transfer size with
> > > > > > > > > > > > > > > > virtio
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > limited
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > 4M
> > > > > > > > > > > > > > > > (1024 * PAGE_SIZE). This series raises this
> > > > > > > > > > > > > > > > limit to
> > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > maximum
> > > > > > > > > > > > > > > > theoretical possible transfer size of 128M (32k
> > > > > > > > > > > > > > > > pages)
> > > > > > > > > > > > > > > > according
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > virtio specs:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/c
> > > > > > > > > > > > > > > > s01/
> > > > > > > > > > > > > > > > virt
> > > > > > > > > > > > > > > > io-v
> > > > > > > > > > > > > > > > 1.1-
> > > > > > > > > > > > > > > > cs
> > > > > > > > > > > > > > > > 01
> > > > > > > > > > > > > > > > .html#
> > > > > > > > > > > > > > > > x1-240006
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Hi Christian,
> > > > > > > > > > > > 
> > > > > > > > > > > > > > > I took a quick look at the code:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks Stefan for sharing virtio expertise and helping
> > > > > > > > > > > > Christian
> > > > > > > > > > > > !
> > > > > > > > > > > > 
> > > > > > > > > > > > > > > - The Linux 9p driver restricts descriptor chains
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > 128
> > > > > > > > > > > > > > > elements
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Yes, that's the limitation that I am about to remove
> > > > > > > > > > > > > > (WIP);
> > > > > > > > > > > > > > current
> > > > > > > > > > > > > > kernel
> > > > > > > > > > > > > > patches:
> > > > > > > > > > > > > > https://lore.kernel.org/netdev/cover.1632327421.git.
> > > > > > > > > > > > > > linu
> > > > > > > > > > > > > > x_os
> > > > > > > > > > > > > > s@cr
> > > > > > > > > > > > > > udeb
> > > > > > > > > > > > > > yt
> > > > > > > > > > > > > > e.
> > > > > > > > > > > > > > com/>
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I haven't read the patches yet but I'm concerned that
> > > > > > > > > > > > > today
> > > > > > > > > > > > > the
> > > > > > > > > > > > > driver
> > > > > > > > > > > > > is pretty well-behaved and this new patch series
> > > > > > > > > > > > > introduces a
> > > > > > > > > > > > > spec
> > > > > > > > > > > > > violation. Not fixing existing spec violations is
> > > > > > > > > > > > > okay,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > adding
> > > > > > > > > > > > > new
> > > > > > > > > > > > > ones is a red flag. I think we need to figure out a
> > > > > > > > > > > > > clean
> > > > > > > > > > > > > solution.
> > > > > > > > > > > 
> > > > > > > > > > > Nobody has reviewed the kernel patches yet. My main
> > > > > > > > > > > concern
> > > > > > > > > > > therefore
> > > > > > > > > > > actually is that the kernel patches are already too
> > > > > > > > > > > complex,
> > > > > > > > > > > because
> > > > > > > > > > > the
> > > > > > > > > > > current situation is that only Dominique is handling 9p
> > > > > > > > > > > patches on
> > > > > > > > > > > kernel
> > > > > > > > > > > side, and he barely has time for 9p anymore.
> > > > > > > > > > > 
> > > > > > > > > > > Another reason for me to catch up on reading current
> > > > > > > > > > > kernel
> > > > > > > > > > > code
> > > > > > > > > > > and
> > > > > > > > > > > stepping in as reviewer of 9p on kernel side ASAP,
> > > > > > > > > > > independent
> > > > > > > > > > > of
> > > > > > > > > > > this
> > > > > > > > > > > issue.
> > > > > > > > > > > 
> > > > > > > > > > > As for current kernel patches' complexity: I can certainly
> > > > > > > > > > > drop
> > > > > > > > > > > patch
> > > > > > > > > > > 7
> > > > > > > > > > > entirely as it is probably just overkill. Patch 4 is then
> > > > > > > > > > > the
> > > > > > > > > > > biggest
> > > > > > > > > > > chunk, I have to see if I can simplify it, and whether it
> > > > > > > > > > > would
> > > > > > > > > > > make
> > > > > > > > > > > sense to squash with patch 3.
> > > > > > > > > > > 
> > > > > > > > > > > > > > > - The QEMU 9pfs code passes iovecs directly to
> > > > > > > > > > > > > > > preadv(2)
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >   with EINVAL when called with more than IOV_MAX
> > > > > > > > > > > > > > >   iovecs
> > > > > > > > > > > > > > >   (hw/9pfs/9p.c:v9fs_read())
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hmm, which makes me wonder why I never encountered
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > error
> > > > > > > > > > > > > > during
> > > > > > > > > > > > > > testing.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Most people will use the 9p qemu 'local' fs driver
> > > > > > > > > > > > > > backend
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > practice,
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > that v9fs_read() call would translate for most
> > > > > > > > > > > > > > people to
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > implementation on QEMU side (hw/9p/9p-local.c):
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > static ssize_t local_preadv(FsContext *ctx,
> > > > > > > > > > > > > > V9fsFidOpenState
> > > > > > > > > > > > > > *fs,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >                             const struct iovec *iov,
> > > > > > > > > > > > > >                             int iovcnt, off_t
> > > > > > > > > > > > > >                             offset)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > #ifdef CONFIG_PREADV
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     return preadv(fs->fd, iov, iovcnt, offset);
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > #else
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     int err = lseek(fs->fd, offset, SEEK_SET);
> > > > > > > > > > > > > >     if (err == -1) {
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >     } else {
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >         return readv(fs->fd, iov, iovcnt);
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > #endif
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Unless I misunderstood the code, neither side can
> > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > advantage
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > new 32k descriptor chain limit?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Stefan
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I need to check that when I have some more time. One
> > > > > > > > > > > > > > possible
> > > > > > > > > > > > > > explanation
> > > > > > > > > > > > > > might be that preadv() already has this wrapped into
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > loop
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > its
> > > > > > > > > > > > > > implementation to circumvent a limit like IOV_MAX.
> > > > > > > > > > > > > > It
> > > > > > > > > > > > > > might
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > another
> > > > > > > > > > > > > > "it
> > > > > > > > > > > > > > works, but not portable" issue, but not sure.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > There are still a bunch of other issues I have to
> > > > > > > > > > > > > > resolve.
> > > > > > > > > > > > > > If
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > net/9p/client.c on kernel side, you'll notice that
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > basically
> > > > > > > > > > > > > > does
> > > > > > > > > > > > > > this ATM> >
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     kmalloc(msize);
> > > > > > > > > > > > 
> > > > > > > > > > > > Note that this is done twice : once for the T message
> > > > > > > > > > > > (client
> > > > > > > > > > > > request)
> > > > > > > > > > > > and
> > > > > > > > > > > > once for the R message (server answer). The 9p driver
> > > > > > > > > > > > could
> > > > > > > > > > > > adjust
> > > > > > > > > > > > the
> > > > > > > > > > > > size
> > > > > > > > > > > > of the T message to what's really needed instead of
> > > > > > > > > > > > allocating
> > > > > > > > > > > > the
> > > > > > > > > > > > full
> > > > > > > > > > > > msize. R message size is not known though.
> > > > > > > > > > > 
> > > > > > > > > > > Would it make sense adding a second virtio ring, dedicated
> > > > > > > > > > > to
> > > > > > > > > > > server
> > > > > > > > > > > responses to solve this? IIRC 9p server already calculates
> > > > > > > > > > > appropriate
> > > > > > > > > > > exact sizes for each response type. So server could just
> > > > > > > > > > > push
> > > > > > > > > > > space
> > > > > > > > > > > that's
> > > > > > > > > > > really needed for its responses.
> > > > > > > > > > > 
> > > > > > > > > > > > > > for every 9p request. So not only does it allocate much more memory
> > > > > > > > > > > > > > for every request than actually required (i.e. say 9pfs was mounted
> > > > > > > > > > > > > > with msize=8M, then a 9p request that actually would just need 1k
> > > > > > > > > > > > > > would nevertheless allocate 8M), but also it allocates > PAGE_SIZE,
> > > > > > > > > > > > > > which obviously may fail at any time.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The PAGE_SIZE limitation sounds like a kmalloc() vs vmalloc()
> > > > > > > > > > > > > situation.
> > > > > > > > > > > 
> > > > > > > > > > > Hu, I didn't even consider vmalloc(). I just tried the kvmalloc()
> > > > > > > > > > > wrapper as a quick & dirty test, but it crashed in the same way as
> > > > > > > > > > > kmalloc() with large msize values immediately on mounting:
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/net/9p/client.c b/net/9p/client.c
> > > > > > > > > > > index a75034fa249b..cfe300a4b6ca 100644
> > > > > > > > > > > --- a/net/9p/client.c
> > > > > > > > > > > +++ b/net/9p/client.c
> > > > > > > > > > > @@ -227,15 +227,18 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> > > > > > > > > > >  static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> > > > > > > > > > >                          int alloc_msize)
> > > > > > > > > > >  {
> > > > > > > > > > > -       if (likely(c->fcall_cache) && alloc_msize == c->msize) {
> > > > > > > > > > > +       //if (likely(c->fcall_cache) && alloc_msize == c->msize) {
> > > > > > > > > > > +       if (false) {
> > > > > > > > > > >                 fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> > > > > > > > > > >                 fc->cache = c->fcall_cache;
> > > > > > > > > > >         } else {
> > > > > > > > > > > -               fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> > > > > > > > > > > +               fc->sdata = kvmalloc(alloc_msize, GFP_NOFS);
> > > > > > > > > > 
> > > > > > > > > > Ok, GFP_NOFS -> GFP_KERNEL did the trick.
> > > > > > > > > > 
> > > > > > > > > > Now I get:
> > > > > > > > > >    virtio: bogus descriptor or out of resources
> > > > > > > > > > 
> > > > > > > > > > So, still some work ahead on both ends.
> > > > > > > > > 
> > > > > > > > > Few hacks later (only changes on 9p client side) I got this running
> > > > > > > > > stable now. The reason for the virtio error above was that kvmalloc()
> > > > > > > > > returns a non-logical kernel address for any kvmalloc(>4M), i.e. an
> > > > > > > > > address that is inaccessible from host side, hence that "bogus
> > > > > > > > > descriptor" message by QEMU. So I had to split those linear 9p client
> > > > > > > > > buffers into sparse ones (set of individual pages).
> > > > > > > > > 
> > > > > > > > > I tested this for some days with various virtio transmission sizes and
> > > > > > > > > it works as expected up to 128 MB (more precisely: 128 MB read space +
> > > > > > > > > 128 MB write space per virtio round trip message).
> > > > > > > > > 
> > > > > > > > > I did not encounter a show stopper for large virtio transmission sizes
> > > > > > > > > (4 MB ... 128 MB) on virtio level, neither as a result of testing, nor
> > > > > > > > > after reviewing the existing code.
> > > > > > > > > 
> > > > > > > > > About IOV_MAX: that's apparently not an issue on virtio level. Most of
> > > > > > > > > the iovec code, both on Linux kernel side and on QEMU side, does not
> > > > > > > > > have this limitation. It is however apparently still a limitation for
> > > > > > > > > userland apps calling the Linux kernel's syscalls.
> > > > > > > > > 
> > > > > > > > > Stefan, as it stands now, I am even more convinced that the upper
> > > > > > > > > virtio transmission size limit should not be squeezed into the queue
> > > > > > > > > size argument of virtio_add_queue(). Not because of the previous
> > > > > > > > > argument that it would waste space (~1MB), but rather because they are
> > > > > > > > > two different things. To outline this, just a quick recap of what
> > > > > > > > > happens exactly when a bulk message is pushed over the virtio wire
> > > > > > > > > (assuming virtio "split" layout here):
> > > > > > > > > 
> > > > > > > > > ---------- [recap-start] ----------
> > > > > > > > > 
> > > > > > > > > For each bulk message sent guest <-> host, exactly *one* of the
> > > > > > > > > pre-allocated descriptors is taken and placed (subsequently) into
> > > > > > > > > exactly *one* position of the two available/used ring buffers. The
> > > > > > > > > actual descriptor table though, containing all the DMA addresses of
> > > > > > > > > the message bulk data, is allocated just in time for each round trip
> > > > > > > > > message. Say, it is the first message sent, it yields in the following
> > > > > > > > > structure:
> > > > > > > > > 
> > > > > > > > > Ring Buffer   Descriptor Table      Bulk Data Pages
> > > > > > > > >
> > > > > > > > >    +-+              +-+           +-----------------+
> > > > > > > > >    |D|------------->|d|---------->| Bulk data block |
> > > > > > > > >    +-+              |d|--------+  +-----------------+
> > > > > > > > >    | |              |d|------+ |
> > > > > > > > >    +-+               .       | |  +-----------------+
> > > > > > > > >    | |               .       | +->| Bulk data block |
> > > > > > > > >     .                .       |    +-----------------+
> > > > > > > > >     .               |d|-+    |
> > > > > > > > >     .               +-+ |    |    +-----------------+
> > > > > > > > >    | |                  |    +--->| Bulk data block |
> > > > > > > > >    +-+                  |         +-----------------+
> > > > > > > > >    | |                  |                 .
> > > > > > > > >    +-+                  |                 .
> > > > > > > > >                         |                 .
> > > > > > > > >                         |         +-----------------+
> > > > > > > > >                         +-------->| Bulk data block |
> > > > > > > > >                                   +-----------------+
> > > > > > > > > 
> > > > > > > > > Legend:
> > > > > > > > > D: pre-allocated descriptor
> > > > > > > > > d: just in time allocated descriptor
> > > > > > > > > -->: memory pointer (DMA)
> > > > > > > > > 
> > > > > > > > > The bulk data blocks are allocated by the respective device driver
> > > > > > > > > above virtio subsystem level (guest side).
> > > > > > > > >
> > > > > > > > > There are exactly as many descriptors pre-allocated (D) as the size of
> > > > > > > > > a ring buffer.
> > > > > > > > > 
> > > > > > > > > A "descriptor" is more or less just a chainable DMA memory pointer;
> > > > > > > > > defined as:
> > > > > > > > >
> > > > > > > > > /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
> > > > > > > > > struct vring_desc {
> > > > > > > > > 	/* Address (guest-physical). */
> > > > > > > > > 	__virtio64 addr;
> > > > > > > > > 	/* Length. */
> > > > > > > > > 	__virtio32 len;
> > > > > > > > > 	/* The flags as indicated above. */
> > > > > > > > > 	__virtio16 flags;
> > > > > > > > > 	/* We chain unused descriptors via this, too */
> > > > > > > > > 	__virtio16 next;
> > > > > > > > > 
> > > > > > > > > };
> > > > > > > > > 
> > > > > > > > > There are 2 ring buffers; the "available" ring buffer is for sending a
> > > > > > > > > message guest->host (which will transmit DMA addresses of guest
> > > > > > > > > allocated bulk data blocks that are used for data sent to device, and
> > > > > > > > > separate guest allocated bulk data blocks that will be used by host
> > > > > > > > > side to place its response bulk data), and the "used" ring buffer is
> > > > > > > > > for sending host->guest to let guest know about host's response and
> > > > > > > > > that it could now safely consume and then deallocate the bulk data
> > > > > > > > > blocks subsequently.
> > > > > > > > > 
> > > > > > > > > ---------- [recap-end] ----------
> > > > > > > > > 
> > > > > > > > > So the "queue size" actually defines the ringbuffer size. It does not
> > > > > > > > > define the maximum amount of descriptors. The "queue size" rather
> > > > > > > > > defines how many pending messages can be pushed into either one
> > > > > > > > > ringbuffer before the other side would need to wait until the counter
> > > > > > > > > side would step up (i.e. ring buffer full).
> > > > > > > > > 
> > > > > > > > > The maximum amount of descriptors (what VIRTQUEUE_MAX_SIZE actually is)
> > > > > > > > > OTOH defines the max. bulk data size that could be transmitted with
> > > > > > > > > each virtio round trip message.
> > > > > > > > > 
> > > > > > > > > And in fact, 9p currently handles the virtio "queue size" as directly
> > > > > > > > > associative with its maximum amount of active 9p requests the server
> > > > > > > > > could handle simultaneously:
> > > > > > > > >   hw/9pfs/9p.h:#define MAX_REQ         128
> > > > > > > > >   hw/9pfs/9p.h:    V9fsPDU pdus[MAX_REQ];
> > > > > > > > >   hw/9pfs/virtio-9p-device.c:    v->vq = virtio_add_queue(vdev, MAX_REQ,
> > > > > > > > >                                  handle_9p_output);
> > > > > > > > > 
> > > > > > > > > So if I would change it like this, just for the purpose to increase
> > > > > > > > > the max. virtio transmission size:
> > > > > > > > > 
> > > > > > > > > --- a/hw/9pfs/virtio-9p-device.c
> > > > > > > > > +++ b/hw/9pfs/virtio-9p-device.c
> > > > > > > > > @@ -218,7 +218,7 @@ static void virtio_9p_device_realize(DeviceState *dev, Error **errp)
> > > > > > > > >      v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag);
> > > > > > > > >      virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size,
> > > > > > > > >                  VIRTQUEUE_MAX_SIZE);
> > > > > > > > > -    v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output);
> > > > > > > > > +    v->vq = virtio_add_queue(vdev, 32*1024, handle_9p_output);
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > > Then it would require additional synchronization code on both ends and
> > > > > > > > > therefore unnecessary complexity, because it would now be possible that
> > > > > > > > > more requests are pushed into the ringbuffer than server could handle.
> > > > > > > > > 
> > > > > > > > > There is one potential issue though that probably did justify the
> > > > > > > > > "don't exceed the queue size" rule:
> > > > > > > > >
> > > > > > > > > ATM the descriptor table is allocated (just in time) as *one*
> > > > > > > > > continuous buffer via kmalloc_array():
> > > > > > > > > https://github.com/torvalds/linux/blob/2f111a6fd5b5297b4e92f53798ca086f7c7d33a4/drivers/virtio/virtio_ring.c#L440
> > > > > > > > > 
> > > > > > > > > So assuming transmission size of 2 * 128 MB that kmalloc_array() call
> > > > > > > > > would yield in kmalloc(1M) and the latter might fail if guest had
> > > > > > > > > highly fragmented physical memory. For such kind of error case there
> > > > > > > > > is currently a fallback path in virtqueue_add_split() that would then
> > > > > > > > > use the required amount of pre-allocated descriptors instead:
> > > > > > > > > https://github.com/torvalds/linux/blob/2f111a6fd5b5297b4e92f53798ca086f7c7d33a4/drivers/virtio/virtio_ring.c#L525
> > > > > > > > > 
> > > > > > > > > That fallback recovery path would no longer be viable if the queue
> > > > > > > > > size was exceeded. There would be alternatives though, e.g. by
> > > > > > > > > allowing to chain indirect descriptor tables (currently prohibited by
> > > > > > > > > the virtio specs).
> > > > > > > > 
> > > > > > > > Making the maximum number of descriptors independent of the queue size
> > > > > > > > requires a change to the VIRTIO spec since the two values are currently
> > > > > > > > explicitly tied together by the spec.
> > > > > > > 
> > > > > > > Yes, that's what the virtio specs say. But they don't say why, nor did I
> > > > > > > hear a reason in this discussion.
> > > > > > >
> > > > > > > That's why I invested time reviewing current virtio implementation and
> > > > > > > specs, as well as actually testing exceeding that limit. And as I
> > > > > > > outlined in detail in my previous email, I only found one theoretical
> > > > > > > issue that could be addressed though.
> > > > > > 
> > > > > > I agree that there is a limitation in the VIRTIO spec, but violating the
> > > > > > spec isn't an acceptable solution:
> > > > > >
> > > > > > 1. QEMU and Linux aren't the only components that implement VIRTIO. You
> > > > > >    cannot make assumptions about their implementations because it may
> > > > > >    break spec-compliant implementations that you haven't looked at.
> > > > > >
> > > > > >    Your patches weren't able to increase Queue Size because some device
> > > > > >    implementations break when descriptor chains are too long. This shows
> > > > > >    there is a practical issue even in QEMU.
> > > > > >
> > > > > > 2. The specific spec violation that we discussed creates the problem
> > > > > >    that drivers can no longer determine the maximum descriptor chain
> > > > > >    length. This in turn will lead to more implementation-specific
> > > > > >    assumptions being baked into drivers and cause problems with
> > > > > >    interoperability and future changes.
> > > > > >
> > > > > > The spec needs to be extended instead. I included an idea for how to do
> > > > > > that below.
> > > > > 
> > > > > Sure, I just wanted to see if there was a non-negligible "hard" show
> > > > > stopper per se that I probably haven't seen yet. I have not questioned
> > > > > aiming for a clean solution.
> > > > > 
> > > > > Thanks for the clarification!
> > > > > 
> > > > > > > > Before doing that, are there benchmark results showing that 1 MB vs
> > > > > > > > 128 MB produces a performance improvement? I'm asking because if
> > > > > > > > performance with 1 MB is good then you can probably do that without
> > > > > > > > having to change VIRTIO and also because it's counter-intuitive that
> > > > > > > > 9p needs 128 MB for good performance when it's ultimately implemented
> > > > > > > > on top of disk and network I/O that have lower size limits.
> > > > > > > 
> > > > > > > First some numbers, linear reading a 12 GB file:
> > > > > > >
> > > > > > > msize    average      notes
> > > > > > >
> > > > > > > 8 kB     52.0 MB/s    default msize of Linux kernel <v5.15
> > > > > > > 128 kB   624.8 MB/s   default msize of Linux kernel >=v5.15
> > > > > > > 512 kB   1961 MB/s    current max. msize with any Linux kernel <=v5.15
> > > > > > > 1 MB     2551 MB/s    this msize would already violate virtio specs
> > > > > > > 2 MB     2521 MB/s    this msize would already violate virtio specs
> > > > > > > 4 MB     2628 MB/s    planned max. msize of my current kernel patches [1]
> > > > > > 
> > > > > > How many descriptors are used? 4 MB can be covered by a single descriptor
> > > > > > if the data is physically contiguous in memory, so this data doesn't
> > > > > > demonstrate a need for more descriptors.
> > > > > 
> > > > > No, in the last couple years there was apparently no kernel version that
> > > > > used just one descriptor, nor did my benchmarked version. Even though the
> > > > > Linux 9p client uses (yet) simple linear buffers (contiguous physical
> > > > > memory) on 9p client level, these are however split into PAGE_SIZE chunks
> > > > > by function pack_sg_list() [1] before being fed to virtio level:
> > > > > 
> > > > > static unsigned int rest_of_page(void *data)
> > > > > {
> > > > > 
> > > > > 	return PAGE_SIZE - offset_in_page(data);
> > > > > 
> > > > > }
> > > > > ...
> > > > > static int pack_sg_list(struct scatterlist *sg, int start,
> > > > > 
> > > > > 			int limit, char *data, int count)
> > > > > 
> > > > > {
> > > > > 
> > > > > 	int s;
> > > > > 	int index = start;
> > > > > 	
> > > > > 	while (count) {
> > > > > 	
> > > > > 		s = rest_of_page(data);
> > > > > 		...
> > > > > 		sg_set_buf(&sg[index++], data, s);
> > > > > 		count -= s;
> > > > > 		data += s;
> > > > > 	
> > > > > 	}
> > > > > 	...
> > > > > 
> > > > > }
> > > > > 
> > > > > [1] https://github.com/torvalds/linux/blob/19901165d90fdca1e57c9baa0d5b4c63d15c476a/net/9p/trans_virtio.c#L171
> > > > >
> > > > > So when sending 4MB over virtio wire, it would yield in 1k descriptors ATM.
> > > > > 
> > > > > I have wondered about this before, but did not question it, because due to
> > > > > the cross-platform nature I couldn't say for certain whether that's
> > > > > probably needed somewhere. I mean for the case virtio-PCI I know for sure
> > > > > that one descriptor (i.e. >PAGE_SIZE) would be fine, but I don't know if
> > > > > that applies to all buses and architectures.
> > > > 
> > > > VIRTIO does not limit the descriptor len field to PAGE_SIZE, so I don't
> > > > think there is a limit at the VIRTIO level.
> > > 
> > > So you are viewing this purely from virtio specs PoV: in the sense, if it
> > > is not prohibited by the virtio specs, then it should work. Maybe.
> > 
> > Limitations must be specified either in the 9P protocol or the VIRTIO
> > specification. Drivers and devices will not be able to operate correctly
> > if there are limitations that aren't covered by the specs.
> > 
> > Do you have something in mind that isn't covered by the specs?
> 
> Not sure whether that's something that should be specified by the virtio 
> specs, probably not. I simply do not know if there was any bus or architecture 
> that would have a limitation for max. size for a memory block passed per one 
> DMA address.

Host-side limitations like that can exist. For example when a physical
storage device on the host has limits that the VIRTIO device does not
have. In this case both virtio-scsi and virtio-blk report those limits
to the guest so that the guest won't submit requests that the physical
device would reject. I guess networking MTU is kind of similar too. What
they have in common is that the limit needs to be reported to the guest,
typically using a VIRTIO Configuration Space field. It is an explicit
limit that is part of the host<->guest interface (VIRTIO spec, SCSI,
etc).
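
To make that concrete, this is roughly how a guest driver picks up such a
device-advertised limit today, using virtio-blk's seg_max as the example.
Just an illustrative sketch, not a proposal for 9p:

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/virtio_blk.h>

/* Sketch: read a device-advertised limit from VIRTIO configuration space. */
static u32 read_seg_max(struct virtio_device *vdev)
{
	u32 seg_max = 1;	/* conservative fallback if the feature is absent */

	if (virtio_has_feature(vdev, VIRTIO_BLK_F_SEG_MAX))
		virtio_cread(vdev, struct virtio_blk_config, seg_max, &seg_max);

	return seg_max;
}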

> > > > If this function coalesces adjacent pages then the descriptor chain
> > > > length issues could be reduced.
> > > > 
> > > > > > > But again, this is not just about performance. My conclusion as described
> > > > > > > in my previous email is that virtio currently squeezes
> > > > > > >
> > > > > > > 	"max. simultaneous amount of bulk messages"
> > > > > > >
> > > > > > > vs.
> > > > > > >
> > > > > > > 	"max. bulk data transmission size per bulk message"
> > > > > > >
> > > > > > > into the same configuration parameter, which is IMO inappropriate and
> > > > > > > hence splitting them into 2 separate parameters when creating a queue
> > > > > > > makes sense, independent of the performance benchmarks.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/netdev/cover.1632327421.git.linux_oss@crudebyte.com/
> > > > > > 
> > > > > > Some devices effectively already have this because the device advertises a
> > > > > > maximum number of descriptors via device-specific mechanisms like the
> > > > > > struct virtio_blk_config seg_max field. But today these fields can only
> > > > > > reduce the maximum descriptor chain length because the spec still limits
> > > > > > the length to Queue Size.
> > > > > >
> > > > > > We can build on this approach to raise the length above Queue Size. This
> > > > > > approach has the advantage that the maximum number of segments isn't per
> > > > > > device or per virtqueue, it's fine-grained. If the device supports two
> > > > > > request types then different max descriptor chain limits could be given
> > > > > > for them by introducing two separate configuration space fields.
> > > > > > 
> > > > > > Here are the corresponding spec changes:
> > > > > >
> > > > > > 1. A new feature bit called VIRTIO_RING_F_LARGE_INDIRECT_DESC is added
> > > > > >    to indicate that indirect descriptor table size and maximum
> > > > > >    descriptor chain length are not limited by Queue Size value. (Maybe
> > > > > >    there still needs to be a limit like 2^15?)
> > > > > 
> > > > > Sounds good to me!
> > > > > 
> > > > > AFAIK it is effectively limited to 2^16 because of vring_desc->next:
> > > > > 
> > > > > /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
> > > > > struct vring_desc {
> > > > > 
> > > > >         /* Address (guest-physical). */
> > > > >         __virtio64 addr;
> > > > >         /* Length. */
> > > > >         __virtio32 len;
> > > > >         /* The flags as indicated above. */
> > > > >         __virtio16 flags;
> > > > >         /* We chain unused descriptors via this, too */
> > > > >         __virtio16 next;
> > > > > 
> > > > > };
> > > > 
> > > > Yes, Split Virtqueues have a fundamental limit on indirect table size
> > > > due to the "next" field. Packed Virtqueue descriptors don't have a
> > > > "next" field so descriptor chains could be longer in theory (currently
> > > > forbidden by the spec).
> > > > 
> > > > > > One thing that's messy is that we've been discussing the maximum
> > > > > > descriptor chain length but 9p has the "msize" concept, which isn't aware
> > > > > > of contiguous memory. It may be necessary to extend the 9p driver code to
> > > > > > size requests not just according to their length in bytes but also
> > > > > > according to the descriptor chain length. That's how the Linux block
> > > > > > layer deals with queue limits (struct queue_limits max_segments vs
> > > > > > max_hw_sectors).
> > > > > 
> > > > > Hmm, can't follow on that one. For what should that be needed in case of
> > > > > 9p? My plan was to limit msize by 9p client simply at session start to
> > > > > whatever is the max. amount virtio descriptors supported by host and
> > > > > using PAGE_SIZE as size per descriptor, because that's what 9p client
> > > > > actually does ATM (see above). So you think that should be changed to
> > > > > e.g. just one descriptor for 4MB, right?
> > > > 
> > > > Limiting msize to the 9p transport device's maximum number of
> > > > descriptors is conservative (i.e. 128 descriptors = 512 KB msize)
> > > > because it doesn't take advantage of contiguous memory. I suggest
> > > > leaving msize alone, adding a separate limit at which requests are split
> > > > according to the maximum descriptor chain length, and tweaking
> > > > pack_sg_list() to coalesce adjacent pages.
> > > > 
> > > > That way msize can be large without necessarily using lots of
> > > > descriptors (depending on the memory layout).
> > > 
> > > That was actually a tempting solution. Because it would neither require
> > > changes to the virtio specs (at least for a while) and it would also work
> > > with older QEMU versions. And for that pack_sg_list() portion of the code
> > > it would work well and easy as the buffer passed to pack_sg_list() is
> > > contiguous already.
> > > 
> > > However I just realized for the zero-copy version of the code that would be
> > > more tricky. The ZC version already uses individual pages (struct page,
> > > hence PAGE_SIZE each) which are pinned, i.e. it uses pack_sg_list_p() [1]
> > > in combination with p9_get_mapped_pages() [2]
> > > 
> > > [1] https://github.com/torvalds/linux/blob/7ddb58cb0ecae8e8b6181d736a87667cc9ab8389/net/9p/trans_virtio.c#L218
> > > [2] https://github.com/torvalds/linux/blob/7ddb58cb0ecae8e8b6181d736a87667cc9ab8389/net/9p/trans_virtio.c#L309
> > > 
> > > So that would require much more work and code trying to sort and coalesce
> > > individual pages to contiguous physical memory for the sake of reducing
> > > virtio descriptors. And there is no guarantee that this is even possible.
> > > The kernel may simply return a non-contiguous set of pages which would
> > > eventually end up exceeding the virtio descriptor limit again.
> > 
> > Order must be preserved so pages cannot be sorted by physical address.
> > How about simply coalescing when pages are adjacent?
> 
> It would help, but not solve the issue we are talking about here: if 99% of 
> the cases could successfully merge descriptors to stay below the descriptor 
> count limit, but in 1% of the cases it could not, then this still constitutes a
> severe runtime issue that could trigger at any time.
> 
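
Just to illustrate what the coalescing itself could look like in
pack_sg_list() - an untested sketch along the lines of the existing code,
assuming the buffer comes from the kernel linear mapping so virt_to_phys()
is meaningful; it does not address the 1% worst case you describe above:

/* Sketch: like pack_sg_list(), but extend the previous sg entry when the
 * next chunk is physically contiguous with it, so fewer virtio descriptors
 * are consumed.
 */
static int pack_sg_list_coalesced(struct scatterlist *sg, int start,
				  int limit, char *data, int count)
{
	int index = start;

	while (count) {
		int s = min_t(int, count, rest_of_page(data));

		if (index > start &&
		    sg_phys(&sg[index - 1]) + sg[index - 1].length ==
		    virt_to_phys(data)) {
			/* physically contiguous with the previous entry: merge */
			sg[index - 1].length += s;
		} else {
			BUG_ON(index >= limit);
			sg_unmark_end(&sg[index]);
			sg_set_buf(&sg[index++], data, s);
		}
		data += s;
		count -= s;
	}
	if (index - start)
		sg_mark_end(&sg[index - 1]);
	return index - start;
}
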
> > > So looks like it was probably still easier and realistic to just add
> > > virtio
> > > capabilities for now for allowing to exceed current descriptor limit.
> > 
> > I'm still not sure why virtio-net, virtio-blk, virtio-fs, etc perform
> > fine under today's limits while virtio-9p needs a much higher limit to
> > achieve good performance. Maybe there is an issue in a layer above the
> > vring that's causing the virtio-9p performance you've observed?
> 
> Are you referring to (somewhat) recent benchmarks when saying those would all 
> still perform fine today?

I'm not referring to specific benchmark results. Just that none of those
devices needed to raise the descriptor chain length, so I'm surprised
that virtio-9p needs it because it's conceptually similar to these
devices.

> Vivek was running detailed benchmarks for virtiofs vs. 9p:
> https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg02704.html
> 
> For the virtio aspect discussed here, only the benchmark configurations 
> without cache are relevant (9p-none, vtfs-none) and under this aspect the 
> situation seems to be quite similar between 9p and virtio-fs. You'll also note 
> once DAX is enabled (vtfs-none-dax) that apparently boosts virtio-fs 
> performance significantly, which however seems to correlate to numbers when I 
> am running 9p with msize > 300k. Note: Vivek was presumably running 9p 
> effectively with msize=300k, as this was the kernel limitation at that time.

Agreed, virtio-9p and virtiofs are similar without caching.

I think we shouldn't consider DAX here since it bypasses the virtqueue.

> To bring things into relation: there are known performance aspects in 9p that 
> can be improved, yes, both on Linux kernel side and on 9p server side in QEMU. 
> For instance 9p server uses coroutines [1] and currently dispatches between 
> worker thread(s) and main thread too often per request (partly addressed 
> already [2], but still WIP), which accumulates to overall latency. But Vivek 
> was actually using a 9p patch here which disabled coroutines entirely, which 
> suggests that the virtio transmission size limit still represents a 
> bottleneck.

These results were collected with 4k block size. Neither msize nor the
descriptor chain length limits will be stressed, so I don't think these
results are relevant here.

Maybe a more relevant comparison would be virtio-9p, virtiofs, and
virtio-blk when block size is large (e.g. 1M). The Linux block layer in
the guest will split virtio-blk requests when they exceed the block
queue limits.
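
For reference, this is roughly how a virtio-blk-style driver hands such
limits to the guest block layer so that oversized requests get split before
they reach the virtqueue. A simplified sketch with a hypothetical helper,
not the actual virtio_blk.c code:

#include <linux/blkdev.h>

/* Sketch: register device limits with the block layer; the block layer
 * then splits bios/requests that exceed them.
 */
static void apply_vq_limits(struct request_queue *q, u32 seg_max, u32 size_max)
{
	blk_queue_max_segments(q, seg_max);      /* max sg entries per request */
	blk_queue_max_segment_size(q, size_max); /* max bytes per sg entry */
	blk_queue_max_hw_sectors(q, seg_max * (size_max >> 9)); /* 512-byte sectors */
}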

Stefan

> 
> [1] https://wiki.qemu.org/Documentation/9p#Coroutines
> [2] https://wiki.qemu.org/Documentation/9p#Implementation_Plans
> 
> Best regards,
> Christian Schoenebeck
> 
> 
