From: Greg Kurz <groug@kaod.org>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, "Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Christian Schoenebeck <qemu_oss@crudebyte.com>,
	qemu-devel@nongnu.org, Gerd Hoffmann <kraxel@redhat.com>,
	virtio-fs@redhat.com, qemu-block@nongnu.org,
	"David Hildenbrand" <david@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Amit Shah" <amit@kernel.org>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Virtio-fs] [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
Date: Fri, 8 Oct 2021 10:27:01 +0200
Message-ID: <20211008102701.59f7d8cd@bahia.huguette>
In-Reply-To: <20211008092533.376b568b@bahia.huguette>

On Fri, 8 Oct 2021 09:25:33 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Thu, 7 Oct 2021 16:42:49 +0100
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Thu, Oct 07, 2021 at 02:51:55PM +0200, Christian Schoenebeck wrote:
> > > On Donnerstag, 7. Oktober 2021 07:23:59 CEST Stefan Hajnoczi wrote:
> > > > On Mon, Oct 04, 2021 at 09:38:00PM +0200, Christian Schoenebeck wrote:
> > > > > At the moment the maximum transfer size with virtio is limited to 4M
> > > > > (1024 * PAGE_SIZE). This series raises this limit to its maximum
> > > > > theoretical possible transfer size of 128M (32k pages) according to the
> > > > > virtio specs:
> > > > > 
> > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
> > > > Hi Christian,
> > > > I took a quick look at the code:
> > > > 
> 
> 
> Hi,
> 
> Thanks, Stefan, for sharing your virtio expertise and helping Christian!
> 
> > > > - The Linux 9p driver restricts descriptor chains to 128 elements
> > > >   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)
> > > 
> > > Yes, that's the limitation that I am about to remove (WIP); current kernel 
> > > patches:
> > > https://lore.kernel.org/netdev/cover.1632327421.git.linux_oss@crudebyte.com/
> > 
> > I haven't read the patches yet but I'm concerned that today the driver
> > is pretty well-behaved and this new patch series introduces a spec
> > violation. Not fixing existing spec violations is okay, but adding new
> > ones is a red flag. I think we need to figure out a clean solution.
> > 
> > > > - The QEMU 9pfs code passes iovecs directly to preadv(2) and will fail
> > > >   with EINVAL when called with more than IOV_MAX iovecs
> > > >   (hw/9pfs/9p.c:v9fs_read())
> > > 
> > > Hmm, which makes me wonder why I never encountered this error during testing.
> > > 
> > > Most people will use the 9p qemu 'local' fs driver backend in practice, so 
> > > that v9fs_read() call would translate for most people to this implementation 
> > > > on QEMU side (hw/9pfs/9p-local.c):
> > > 
> > > static ssize_t local_preadv(FsContext *ctx, V9fsFidOpenState *fs,
> > >                             const struct iovec *iov,
> > >                             int iovcnt, off_t offset)
> > > {
> > > #ifdef CONFIG_PREADV
> > >     return preadv(fs->fd, iov, iovcnt, offset);
> > > #else
> > >     int err = lseek(fs->fd, offset, SEEK_SET);
> > >     if (err == -1) {
> > >         return err;
> > >     } else {
> > >         return readv(fs->fd, iov, iovcnt);
> > >     }
> > > #endif
> > > }
> > > 
> > > > Unless I misunderstood the code, neither side can take advantage of the
> > > > new 32k descriptor chain limit?
> > > > 
> > > > Thanks,
> > > > Stefan
> > > 
> > > I need to check that when I have some more time. One possible explanation 
> > > might be that preadv() already has this wrapped into a loop in its 
> > > implementation to circumvent a limit like IOV_MAX. It might be another "it 
> > > works, but not portable" issue, but not sure.
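
For illustration only: one way a caller could stay within IOV_MAX is to
split the vector into batches, as in the sketch below. preadv_chunked()
is a hypothetical helper, not something taken from QEMU or libc, and the
1024 fallback for IOV_MAX is an assumption for platforms whose <limits.h>
does not expose it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for preadv() on glibc */
#endif
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef IOV_MAX
#define IOV_MAX 1024 /* assumption: fallback when <limits.h> lacks it */
#endif

/*
 * Hypothetical helper: read into iov[0..iovcnt) starting at 'offset',
 * never handing more than IOV_MAX iovecs to a single preadv() call.
 * Returns the total number of bytes read, or -1 if the very first
 * preadv() call fails.
 */
static ssize_t preadv_chunked(int fd, const struct iovec *iov,
                              int iovcnt, off_t offset)
{
    ssize_t total = 0;

    assert(iovcnt >= 0);
    while (iovcnt > 0) {
        int n = iovcnt < IOV_MAX ? iovcnt : IOV_MAX;
        size_t want = 0;
        ssize_t got;

        /* Bytes this batch can hold, to detect short reads. */
        for (int i = 0; i < n; i++) {
            want += iov[i].iov_len;
        }

        got = preadv(fd, iov, n, offset);
        if (got < 0) {
            /* Report partial progress if earlier batches succeeded. */
            return total > 0 ? total : -1;
        }
        total += got;
        if ((size_t)got < want) {
            break; /* short read or EOF: stop here */
        }
        offset += got;
        iov += n;
        iovcnt -= n;
    }
    return total;
}
```

Note that batching gives up the single-syscall atomicity of one preadv()
call, which may or may not matter for a given backend.
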
> > >
> > > There are still a bunch of other issues I have to resolve. If you look at
> > > net/9p/client.c on kernel side, you'll notice that it basically does this ATM
> > > 
> > >     kmalloc(msize);
> > > 
> 
> Note that this is done twice: once for the T message (client request) and once
> for the R message (server answer). The 9p driver could adjust the size of the T
> message to what's really needed instead of allocating the full msize. The size
> of the R message is not known in advance, though.
> 
> > > for every 9p request. So not only does it allocate much more memory for every 
> > > request than actually required (i.e. say 9pfs was mounted with msize=8M, then 
> > > a 9p request that actually would just need 1k would nevertheless allocate 8M), 
> > > but also it allocates > PAGE_SIZE, which obviously may fail at any time.
> > 
> > The PAGE_SIZE limitation sounds like a kmalloc() vs vmalloc() situation.
> > 
> > I saw zerocopy code in the 9p guest driver but didn't investigate when
> > it's used. Maybe that should be used for large requests (file
> > reads/writes)?
> 
> This is already the case: zero-copy is only used for reads/writes/readdir
> if the requested size is 1k or more.
> 
> Also you'll note that in this case, the 9p driver doesn't allocate msize
> for the T/R messages but only 4k, which is more than enough to hold the
> header.
> 
> 	/*
> 	 * We allocate a inline protocol data of only 4k bytes.
> 	 * The actual content is passed in zero-copy fashion.
> 	 */
> 	req = p9_client_prepare_req(c, type, P9_ZC_HDR_SZ, fmt, ap);
> 
> and
> 
> /* size of header for zero copy read/write */
> #define P9_ZC_HDR_SZ 4096
> 
> A huge msize only makes sense for Twrite, Rread and Rreaddir because
> of the amount of data they convey. All other messages certainly fit
> in a couple of kilobytes only (sorry, don't remember the numbers).
> 
> A first change should be to allocate MIN(XXX, msize) for the
> regular non-zc case, where XXX could be a reasonable fixed
> value (8k?). 


Note that this would violate the 9p spec, since the server
can legitimately use the negotiated msize for all R messages,
even if all of them only need a couple of bytes in practice,
or at worst a couple of kilobytes if a path is involved.

In an ideal world, this would call for a spec refinement to
special-case Rread and Rreaddir, which are the only ones
where a high msize is useful AFAICT.
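
To make the sizing discussed above concrete, here is a minimal sketch in
plain C. p9_buf_size(), enum p9_req_kind and P9_NONZC_CAP are hypothetical
names; 8k is only the value floated above, and P9_ZC_HDR_SZ is copied from
the kernel define quoted earlier. As said, capping R buffers this way would
not be spec-compliant today, so take it as an illustration of the
arithmetic, not as a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

#define P9_ZC_HDR_SZ 4096 /* from the kernel define quoted above */
#define P9_NONZC_CAP 8192 /* hypothetical "XXX" cap, 8k as floated above */

/* Hypothetical classification of a request. */
enum p9_req_kind {
    P9_REQ_REGULAR,  /* payload is copied into the message buffer */
    P9_REQ_ZEROCOPY, /* payload is passed separately, header only */
};

/* Buffer size to allocate for one T or R message (sketch only). */
static size_t p9_buf_size(enum p9_req_kind kind, size_t msize)
{
    size_t cap = (kind == P9_REQ_ZEROCOPY) ? P9_ZC_HDR_SZ : P9_NONZC_CAP;

    assert(msize > 0);
    /* MIN(cap, msize): never allocate more than the negotiated msize. */
    return msize < cap ? msize : cap;
}
```

With msize=8M, a regular request would get an 8k buffer instead of 8M,
and a zero-copy request a 4k one.
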

> In the case of T messages, it is even possible
> to adjust the size to exactly what's needed, à la snprintf(NULL).
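
For readers unfamiliar with the snprintf(NULL) idiom mentioned above, a
minimal sketch with C stdio: a first pass with a NULL buffer and size 0
returns the length that would be written, and a second pass fills a buffer
of exactly that size. format_exact() is a hypothetical helper; the kernel
9p client has its own marshalling routines and does not use printf
formats, so this only shows the two-pass pattern.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed string holding the formatted
 * output, sized exactly. On success *out_len is the length without the
 * terminating NUL. Returns NULL on formatting or allocation failure.
 */
static char *format_exact(size_t *out_len, const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int need;

    assert(out_len != NULL && fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap2, ap); /* the argument list is consumed once per pass */

    /* Pass 1: measure only, nothing is written. */
    need = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (need < 0) {
        va_end(ap2);
        return NULL;
    }

    /* Pass 2: allocate exactly what is needed and format for real. */
    buf = malloc((size_t)need + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)need + 1, fmt, ap2);
        *out_len = (size_t)need;
    }
    va_end(ap2);
    return buf;
}
```

The cost is that the arguments are walked twice, which is cheap compared
to allocating a full msize buffer for a message of a few bytes.
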
> 
> > virtio-blk/scsi don't memcpy data into a new buffer, they
> > directly access page cache or O_DIRECT pinned pages.
> > 
> > Stefan
> 
> Cheers,
> 
> --
> Greg

