From: Greg Kurz <firstname.lastname@example.org>
To: Dominique Martinet <email@example.com>
Cc: Matthew Wilcox <firstname.lastname@example.org>,
Latchesar Ionkov <email@example.com>,
Eric Van Hensbergen <firstname.lastname@example.org>,
Ron Minnich <email@example.com>,
Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests
Date: Mon, 23 Jul 2018 16:24:14 +0200
Message-ID: <20180723162414.2c119be2@bahia>
On Mon, 23 Jul 2018 14:25:31 +0200
Dominique Martinet <firstname.lastname@example.org> wrote:
> Greg Kurz wrote on Mon, Jul 23, 2018:
> > The patch is quite big and I'm not sure I can find time to review it
> > carefully, but I'll try to help anyway.
> No worry, thanks for this already.
> > > Sorry for coming back to this patch now, I just noticed something that's
> > > actually probably a fairly big hit on performance...
> > >
> > > While the slab is just as good as the array for the request itself, this
> > > makes every single request allocate "fcalls" every time instead of
> > > reusing a cached allocation.
> > > The default msize is 8k and these allocs probably are fairly efficient,
> > > but some transports like RDMA allow increasing this up to 1MB... And
> > It can be even bigger with virtio:
> > #define VIRTQUEUE_NUM 128
> > .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
> > On a typical ppc64 server class setup with 64KB pages, this is nearly 8MB.
> I don't think I'll be able to test 64KB pages, and it's "just" 500k with
> 4K pages so I'll go with IB.
> I just finished reinstalling my IB-enabled VMs, now to get some iops
> test running (dbench maybe) and I'll get some figures to be able to play
> with different models and evaluate the impact of these.
Sounds like a good plan.
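
For reference, the virtio figures quoted above can be checked with a small user-space sketch. The helper name virtio_maxsize() is hypothetical; only VIRTQUEUE_NUM and the formula come from the quoted transport code.

```c
#include <stddef.h>

/* Value quoted from the virtio transport in this thread. */
#define VIRTQUEUE_NUM 128

/*
 * Hypothetical helper (not a kernel function): largest msize the virtio
 * transport advertises for a given page size, following
 * .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3).
 */
static size_t virtio_maxsize(size_t page_size)
{
	return page_size * (VIRTQUEUE_NUM - 3);
}
```

With 4KB pages this gives 512000 bytes (500KB), and with 64KB pages 8192000 bytes, a little under 8MB.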
> > > One thing is that the buffers are all going to be the same size for a
> > > given client (.... except virtio zc buffers, I wonder what I'm missing
> > > or why that didn't blow up before?)
> > ZC allocates a 4KB buffer, which is more than enough to hold the 7-byte 9P
> > header and the "dqd" part of all messages that may use ZC, ie, 16 bytes.
> > So I'm not sure I see what could blow up.
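
The size argument above can be spelled out as a short sketch. It assumes "dqd" means a 32-bit, a 64-bit and a 32-bit field, as the 16-byte total suggests; the macro and function names are illustrative and are not taken from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, named here for the example only. */
#define MODEL_9P_HDR_SZ 7     /* size[4] + type[1] + tag[2] */
#define MODEL_ZC_BUF_SZ 4096  /* the 4KB buffer allocated for ZC requests */

/* Size of the "dqd" part: 32-bit, 64-bit and 32-bit fields. */
static size_t dqd_size(void)
{
	return sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t);
}

/* Bytes a ZC request needs in its header buffer. */
static size_t zc_hdr_needed(void)
{
	return MODEL_9P_HDR_SZ + dqd_size();
}
```

Under these assumptions a ZC request needs 23 bytes of header space, far below the 4KB buffer.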
> ZC requests won't blow up, but from what I can see with the current
> (old) request cache array, if a ZC request has a not-yet used tag it'll
> allocate a new 4k buffer, then if a normal request uses that tag it'll
> get the 4k buffer instead of an msize sized one.
> On the client side the request would be posted with req->rc->capacity
> which would correctly be 4k, but I'm not sure what would happen if qemu
> tries to write more than the given size to that request?
QEMU would detect that the sg list doesn't have enough capacity.
Old QEMUs used to return a RERROR or RLERROR message with ENOBUFS
in this case. It didn't make sense to hijack the 9P protocol, which
is transport agnostic, to report misconfigured buffers, especially since,
in the worst case, we might not even have enough space for the error
response... So, since QEMU 2.10, we put the virtio 9p device into
broken state instead, ie, inoperative until it gets reset.
I guess this situation was never hit because server responses mostly
need less than 4KB...
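
The tag-reuse scenario discussed above can be modelled in user space. This is a simplified sketch, not the kernel's request cache or QEMU's code: struct model_req, get_req() and server_reply() are hypothetical names, and malloc() stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define P9_ZC_HDR_SZ 4096  /* 4KB ZC header buffer, as described in the thread */
#define MODEL_MAX_TAGS 4   /* arbitrary small table for the example */

/* Hypothetical stand-in for a cached request and its reply buffer. */
struct model_req {
	char *buf;
	size_t capacity;
};

static struct model_req model_cache[MODEL_MAX_TAGS];

/*
 * Model of the old per-tag cache: the buffer is allocated the first time
 * a tag is used and reused afterwards, whatever size the caller wants.
 */
static struct model_req *get_req(unsigned int tag, size_t want)
{
	struct model_req *r;

	if (tag >= MODEL_MAX_TAGS)
		return NULL;
	r = &model_cache[tag];
	if (!r->buf) {
		r->buf = malloc(want);
		if (!r->buf)
			return NULL;
		r->capacity = want;
	}
	return r;
}

/*
 * Model of the server-side check: 0 if the reply fits in the posted
 * buffer, -1 if it does not (where QEMU >= 2.10 would mark the device
 * broken, per the explanation above).
 */
static int server_reply(const struct model_req *r, size_t reply_len)
{
	return reply_len <= r->capacity ? 0 : -1;
}
```

In this model, a tag first used by a ZC request keeps its 4KB buffer, so a later normal request on the same tag only fails if the server's reply exceeds 4KB.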
> > > It's a shame because I really like that patch, I'll try to find time to
> > > run some light benchmark with varying msizes eventually but I'm not sure
> > > when I'll find time for that... Hopefully before the 4.19 merge window!
> > >
> > Yeah, the open-coded cache we have now really obfuscates things.
> > Maybe have a per-client kmem_cache object for non-ZC requests with
> > size msize [*], and a global kmem_cache object for ZC requests with
> > fixed size P9_ZC_HDR_SZ.
> > [*] the server can require a smaller msize during version negotiation,
> > so maybe we should change the kmem_cache object in this case.
> Yeah, if we're going to want to accommodate non-power-of-two buffers, I
> think we'll need a separate kmem_cache for them.
> The ZC requests could be made into exactly 4k and these could come with
> regular kmalloc just fine, it looks like trying to create a cache of
> that size would just return the same cache used by kmalloc anyway so
> it's probably easier to fall back to kmalloc if requested alloc size
> doesn't match what we were hoping for.
You're right, ZC requests could rely on kmalloc() directly.
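
A rough user-space sketch of the selection logic being discussed: use the per-client cache only when the requested size matches the size the cache was created for, and fall back to kmalloc() otherwise. All names here are hypothetical and this is not the eventual patch; it only illustrates the decision.

```c
#include <stddef.h>

/* Hypothetical model of the client state relevant to the decision. */
struct model_client {
	size_t msize;         /* negotiated msize */
	size_t cache_objsize; /* object size the per-client cache was created with */
};

enum model_alloc_src {
	MODEL_FROM_CACHE,   /* would map to the per-client kmem_cache */
	MODEL_FROM_KMALLOC, /* would map to plain kmalloc() */
};

/*
 * Pick the allocator for a buffer of alloc_size bytes. ZC requests
 * (4KB) and any request whose size differs from the cache object size,
 * for instance after the server negotiated a smaller msize, take the
 * kmalloc path.
 */
static enum model_alloc_src pick_allocator(const struct model_client *c,
					   size_t alloc_size)
{
	return alloc_size == c->cache_objsize ? MODEL_FROM_CACHE
					      : MODEL_FROM_KMALLOC;
}
```

One consequence of this choice is that the cache does not need to be recreated when the server lowers msize; mismatched sizes simply bypass it.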
> I'll try to get figures for various approaches before the merge window
> for 4.19 starts, it's getting closer though...
Many thanks for your effort, but we've been living with this code
since the beginning. If this misses the 4.19 merge window, we'll have
more time to validate the approach and polish the fix for 4.20 :)