From: Dominique Martinet <asmadeus@codewreck.org>
To: Greg Kurz <groug@kaod.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	v9fs-developer@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests
Date: Mon, 30 Jul 2018 11:31:01 +0200
Message-ID: <20180730093101.GA7894@nautica>
In-Reply-To: <20180723122531.GA9773@nautica>

Dominique Martinet wrote on Mon, Jul 23, 2018:
> I'll try to get figures for various approaches before the merge window
> for 4.19 starts, it's getting closer though...

Here are some numbers, with v4.18-rc7 + the current test tree (my
9p-next) as a base.


For context, I'm running on VMs that bind their cores to CPUs on the
host (32 cores), and have a Connect-IB Mellanox card through SR-IOV.

The server is nfs-ganesha, serving a tmpfs filesystem from a second VM
(on a different host).

Mounting with msize=$((1024*1024)).
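For reference, the mount line looks roughly like this (a reconstruction;
the server address is a placeholder and I'm quoting the default rdma
port from memory):

  mount -t 9p -o trans=rdma,port=5640,msize=$((1024*1024)) 192.168.0.2 /mnt/9p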

My main problem with this test is that the client has way too much
memory and it's mostly pristine from a recent boot, so any kind of
memory pressure won't be seen here.
If someone knows how to fragment memory quickly, I'll take that and
rerun the tests :)


I've changed my mind from mdtest to a simple ior: since I'm testing on
trans=rdma there's no difference, and I'm more familiar with ior's options.

I ran two workloads:
 - 32 processes, file per process, 512k at a time, writing a total of
32GB (1GB per file), repeated 10 times
 - 32 processes, file per process, 32 bytes at a time, writing a total of
16MB (512k per file), repeated 10 times.

The first test gives a proper impression of the throughput the systems
can sustain, and the results are pretty much around what I was expecting
for the setup; the second test is purely a latency test (how long it
takes to send 512k, i.e. 524,288, RPCs). Plausible invocations for both
are sketched below.
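The runs correspond to ior command lines along these lines (a
reconstruction from the output columns, so the exact flags may have
differed):

  # big I/O run: 32 ranks, file per process, 512k transfers, 1GB per file
  mpirun -np 32 ior -a POSIX -F -w -r -i 10 -t 512k -b 1g
  # small I/O run: 32-byte transfers, 512k per file
  mpirun -np 32 ior -a POSIX -F -w -r -i 10 -t 32 -b 512k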


I first ran almost all of these tests with KASAN enabled in the VMs, so
I'm leaving the KASAN results at the end for reference...


Overall I'm rather happy with the result: without KASAN the overhead of
the patch isn't negligible (~6%), but I'd say it's acceptable for
correctness, and with an extra two patches implementing the suggested
changes (rounding down the alloc size to not include the struct
overhead, and a separate kmem cache) it gets down to 0.5%, which is
quite good, I think.
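To illustrate the first of those changes (with a made-up header size,
purely for illustration, not the actual struct layout): as long as
allocations are served from power-of-two size classes, msize plus any
header spills into the next class, while embedding the header in the
request keeps the buffer allocation "round":

  $ echo $((1024*1024 + 24))    # msize + hypothetical 24-byte header
  1048600                       # spills past the 1 MiB class
  $ echo $((1024*1024))         # header moved into the req struct
  1048576                       # exactly 1 MiB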

I'll send the two patches to the list shortly. The first one is rather
huge even if it's a trivial change logically, so part of me wants to get
it merged quickly to not have to deal with rebases... ;)


With KASAN, well, it certainly does more things, but I hope
performance-critical systems don't have it enabled in the first place.



Raw results:

 * Base = 4.18-rc7 + queued patches, without request cache rework
- "Big" I/Os:
Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5842.40    5751.58    5793.53      23.93    5.65606 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6098.92    6018.63    6064.30      20.00    5.40348 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           2.10       1.91       2.00       0.05    8.01074 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.27       1.07       1.15       0.06   13.93901 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> 512k / 8.01074 = 65.4k req/s


 * Base + patch as submitted
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5844.84    5665.32    5787.15      48.94    5.66261 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6082.24    6039.62    6057.14      12.50    5.40983 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
                             
- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           1.95       1.82       1.88       0.04    8.50453 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.18       1.07       1.14       0.03   14.04634 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> 512k / 8.50453 = 61.6k req/s


 * Base + patch as submitted + moving the header into req so the
allocation is "round" as suggested by Matthew
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5861.79    5680.99    5795.71      48.84    5.65424 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6098.54    6037.55    6067.80      19.39    5.40036 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           1.98       1.81       1.90       0.06    8.43521 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.19       1.08       1.13       0.03   14.11709 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> 62.2k req/s

 * Base + patches submitted + round alloc + kmem cache in the client
struct
- "Big" I/Os
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5859.51    5747.64    5808.22      34.81    5.64186 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6087.90    6037.03    6063.98      15.14    5.40374 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           2.07       1.95       1.99       0.03    8.05362 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.22       1.11       1.16       0.04   13.75312 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> 65.1k req/s

 * Base + patches submitted + kmem cache in the client struct (kind of
similar to testing an 'odd' msize like 1.001MB)
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5883.03    5725.30    5811.58      45.22    5.63874 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6090.29    6015.23    6062.49      25.93    5.40514 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           2.07       1.89       1.98       0.05    8.10028 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.23       1.05       1.12       0.05   14.25607 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> 64.7k req/s




Raw results with KASAN:
 * Base = 4.18-rc7 + queued patches, without request cache rework
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5790.03    5705.32    5749.69      27.63    5.69922 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6095.11    6007.29    6066.50      26.26    5.40157 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           1.63       1.53       1.58       0.03   10.10286 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.43       1.19       1.31       0.07   12.27704 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


 * Base + patch as submitted
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5773.60    5673.92    5729.01      29.63    5.71982 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6097.96    6006.50    6059.40      26.74    5.40790 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           1.15       1.08       1.12       0.02   14.32230 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.18       1.06       1.10       0.04   14.51172 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


 * Base + patch as submitted + moving the header into req so the
allocation is "round" as suggested by Matthew
- "Big" I/Os:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5878.75    5709.74    5798.96      57.12    5.65122 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6089.83    6039.75    6072.64      14.78    5.39604 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           1.33       1.26       1.29       0.02   12.38185 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            1.18       1.08       1.15       0.03   13.90525 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


 * Base + patches submitted + round alloc + kmem cache in the client
struct
- "Big" I/Os
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        5816.89    5729.58    5775.02      26.71    5.67422 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read         6087.33    6032.62    6058.69      16.73    5.40847 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0


- "Small" I/Os
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write           0.87       0.85       0.86       0.01   18.59584 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read            0.89       0.86       0.88       0.01   18.26275 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
 -> I'm not sure why it's so different here, actually; the cache doesn't
turn up in /proc/slabinfo, so I'm figuring it got merged with
kmalloc-1024, in which case there should be no difference? And this
turned out fine without KASAN... One way to check the merging theory is
sketched below.
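(A suggestion for checking, not something I've run here: SLUB exposes
merged caches as symlinks to a shared alias under /sys/kernel/slab, and
booting with slub_nomerge keeps caches separate for comparison. Also, if
I read mm/slab_common.c right, SLAB_KASAN is on the never-merge list, so
the dedicated cache may only stay unmerged when KASAN is on, which could
account for the difference; that's an educated guess.)

  # a merged cache shows up as a symlink to an alias like ':t-0001024'
  $ ls -l /sys/kernel/slab/ | grep 9p
  # rerun with merging disabled to compare (kernel boot parameter):
  #   slub_nomerge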


-- 
Dominique
 
