netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dominique Martinet <asmadeus@codewreck.org>
To: Nikolay Kichukov <nikolay@oldum.net>
Cc: Christian Schoenebeck <linux_oss@crudebyte.com>,
	v9fs-developer@lists.sourceforge.net, netdev@vger.kernel.org,
	Eric Van Hensbergen <ericvh@gmail.com>,
	Latchesar Ionkov <lucho@ionkov.net>, Greg Kurz <groug@kaod.org>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH v3 6/7] 9p/trans_virtio: support larger msize values
Date: Sat, 20 Nov 2021 20:45:09 +0900	[thread overview]
Message-ID: <YZjfxT24by0Cse6q@codewreck.org> (raw)
In-Reply-To: <cef6a6c6f8313a609ef104cc64ee6cf9d0ed6adb.camel@oldum.net>

(Thanks for restarting this thread, this had totally slipped out of my
mind...)

Nikolay Kichukov wrote on Sat, Nov 20, 2021 at 12:20:35PM +0100:
> When the client mounts the share via virtio, requested msize is:
> 10485760 or 104857600
> 
> however the mount succeeds with:
> msize=507904 in the end as per the /proc filesystem. This is less than
> the previous maximum value.

(Not sure about this, I'll test these patches tomorrow, but since
something failed I'm not surprised you have less than what you could
have here: what do you get with a more reasonable value like 1M first?)

> In addition to the above, when the kernel on the guest boots and loads
> 9pfs support, the attached memory allocation failure trace is generated.
> 
> Is anyone else seeing similar and was anybody able to get msize set to
> 10MB via virtio protocol with these patches?

I don't think the kernel would ever allow this with the given code, as
the "common" part of 9p is not smart enough to use scatter-gather and
tries to do a big allocation (almost) the size of the msize:

---
        clnt->fcall_cache =
                kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize,
                                           0, 0, P9_HDRSZ + 4,
                                           clnt->msize - (P9_HDRSZ + 4),
                                           NULL);

...
		fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
---
So in practice, you will be capped at 2MB as that is the biggest the
slab will be able to hand over in a single chunk.

It'll also make allocation failures quite likely as soon as the system
has had some uptime (depending on your workload, look at /proc/buddyinfo
if your machines normally have 2MB chunks available), so I would really
not recommend running with buffers bigger than e.g. 512k on real
workloads -- it looks great on benchmarks, especially as it's on its own
slab so as long as you're doing a lot of requests it will dish out
buffers fast, but it'll likely be unreliable over time.
(oh, and we allocate two of these per request...)


The next step to support large buffers really would be splitting htat
buffer to:
 1/ not allocate huge buffers for small metadata ops, e.g. an open call
certainly doesn't need to allocate 10MB of memory
 2/ support splitting the buffer in some scattergather array

Ideally we'd only allocate on an as-need basis, most of the protocol
calls bound how much data is supposed to come back and we know how much
we want to send (it's a format string actually, but we can majorate it
quite easily), so one would need to adjust all protocol calls to pass
this info to p9_client_rpc/p9_client_zc_rpc so it only allocates buffers
as required, if necessary in multiple reasonably-sized segments (I'd
love 2MB hugepages-backed folios...), and have all transports use these
buffers.

I have a rough idea on how to do all this but honestly less than 0 time
for that, so happy to give advices or review any patch, but it's going
to be a lot of work that stand in the way of really big IOs.



> [    1.527981] 9p: Installing v9fs 9p2000 file system support
> [    1.528173] ------------[ cut here ]------------
> [    1.528174] WARNING: CPU: 1 PID: 791 at mm/page_alloc.c:5356 __alloc_pages+0x1ed/0x290


This warning is exactly what I was saying about the allocation cap:
you've requested an allocation that was bigger than the max __alloc_page
is willing to provide (MAX_ORDER, 11, so 2MB as I was saying)

-- 
Dominique

  reply	other threads:[~2021-11-20 11:45 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22 16:00 [PATCH v3 0/7] net/9p: remove msize limit in virtio transport Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 1/7] net/9p: show error message if user 'msize' cannot be satisfied Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 2/7] 9p/trans_virtio: separate allocation of scatter gather list Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 3/7] 9p/trans_virtio: turn amount of sg lists into runtime info Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 4/7] 9p/trans_virtio: introduce struct virtqueue_sg Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 5/7] net/9p: add trans_maxsize to struct p9_client Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 6/7] 9p/trans_virtio: support larger msize values Christian Schoenebeck
2021-11-20 11:20   ` Nikolay Kichukov
2021-11-20 11:45     ` Dominique Martinet [this message]
2021-11-20 14:46       ` Christian Schoenebeck
2021-11-20 21:28         ` Nikolay Kichukov
2021-11-20 23:02         ` Dominique Martinet
2021-11-21 16:57           ` Christian Schoenebeck
2021-11-21 22:12             ` Dominique Martinet
2021-11-22 13:32               ` Christian Schoenebeck
2021-11-22 22:35                 ` Dominique Martinet
2021-11-24 12:22                   ` Christian Schoenebeck
2021-09-22 16:00 ` [PATCH v3 7/7] 9p/trans_virtio: resize sg lists to whatever is possible Christian Schoenebeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YZjfxT24by0Cse6q@codewreck.org \
    --to=asmadeus@codewreck.org \
    --cc=ericvh@gmail.com \
    --cc=groug@kaod.org \
    --cc=linux_oss@crudebyte.com \
    --cc=lucho@ionkov.net \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@oldum.net \
    --cc=v9fs-developer@lists.sourceforge.net \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).