From: Nikolay Borisov <nborisov@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] btrfs: Remove __extent_readpages
Date: Mon, 3 Dec 2018 16:25:13 +0200 [thread overview]
Message-ID: <51be8fe3-e319-6666-2511-342446297f61@suse.com> (raw)
In-Reply-To: <20181203102532.13945-1-nborisov@suse.com>
On 3.12.18 at 12:25, Nikolay Borisov wrote:
> When extent_readpages is called from the generic readahead code it first
> builds a batch of 16 pages (which might or might not be consecutive,
> depending on whether add_to_page_cache_lru failed) and submits them to
> __extent_readpages. The latter ensures that the range of pages (in the
> batch of 16) that is passed to __do_contiguous_readpages is consecutive.
>
> If add_to_page_cache_lru doesn't fail, then __extent_readpages will call
> __do_contiguous_readpages only once with the whole batch of 16. If, for
> example, add_to_page_cache_lru fails on the 8th page, then the contiguous
> page read code will be called twice.
>
> All of this can be simplified by exploiting the fact that all pages passed
> to extent_readpages are consecutive, thus when the batch is built in
> that function it is already consecutive (barring add_to_page_cache_lru
> failures) and is ready to be submitted directly to __do_contiguous_readpages.
> Also simplify the name of the function to contiguous_readpages.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>
> So this patch looks like a very nice cleanup. However, when doing performance
> measurements with fio, I was shocked to see that it is actually detrimental
> to performance. Here are the results:
>
> The command line used for fio:
> fio --name=/media/scratch/seqread --rw=read --direct=0 --ioengine=sync --bs=4k
> --numjobs=1 --size=1G --runtime=600 --group_reporting --loop 10
>
> This was tested on a VM with 4G of RAM, so the size of the test file is
> smaller than memory and the pages should have been nicely read ahead.
>
> PATCHED:
>
> Starting 1 process
> Jobs: 1 (f=1): [R(1)][100.0%][r=519MiB/s][r=133k IOPS][eta 00m:00s]
> /media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=3722: Mon Dec 3 09:57:17 2018
> read: IOPS=78.4k, BW=306MiB/s (321MB/s)(10.0GiB/33444msec)
> clat (nsec): min=1703, max=9042.7k, avg=5463.97, stdev=121068.28
> lat (usec): min=2, max=9043, avg= 6.00, stdev=121.07
> clat percentiles (nsec):
> | 1.00th=[ 1848], 5.00th=[ 1896], 10.00th=[ 1912],
> | 20.00th=[ 1960], 30.00th=[ 2024], 40.00th=[ 2160],
> | 50.00th=[ 2384], 60.00th=[ 2576], 70.00th=[ 2800],
> | 80.00th=[ 3120], 90.00th=[ 3824], 95.00th=[ 4768],
> | 99.00th=[ 7968], 99.50th=[ 14912], 99.90th=[ 50944],
> | 99.95th=[ 667648], 99.99th=[5931008]
> bw ( KiB/s): min= 2768, max=544542, per=100.00%, avg=409912.68, stdev=162333.72, samples=50
> iops : min= 692, max=136135, avg=102478.08, stdev=40583.47, samples=50
> lat (usec) : 2=25.93%, 4=65.58%, 10=7.69%, 20=0.57%, 50=0.13%
> lat (usec) : 100=0.04%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=0.01%, 4=0.01%, 10=0.05%
> cpu : usr=7.20%, sys=92.55%, ctx=396, majf=0, minf=9
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
> READ: bw=306MiB/s (321MB/s), 306MiB/s-306MiB/s (321MB/s-321MB/s), io=10.0GiB (10.7GB), run=33444-33444msec
>
>
> UNPATCHED:
>
> Starting 1 process
> Jobs: 1 (f=1): [R(1)][100.0%][r=568MiB/s][r=145k IOPS][eta 00m:00s]
> /media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=640: Mon Dec 3 10:07:38 2018
> read: IOPS=90.4k, BW=353MiB/s (370MB/s)(10.0GiB/29008msec)
> clat (nsec): min=1418, max=12374k, avg=4816.38, stdev=109448.00
> lat (nsec): min=1836, max=12374k, avg=5284.46, stdev=109451.36
> clat percentiles (nsec):
> | 1.00th=[ 1576], 5.00th=[ 1608], 10.00th=[ 1640],
> | 20.00th=[ 1672], 30.00th=[ 1720], 40.00th=[ 1832],
> | 50.00th=[ 2096], 60.00th=[ 2288], 70.00th=[ 2480],
> | 80.00th=[ 2736], 90.00th=[ 3248], 95.00th=[ 3952],
> | 99.00th=[ 6368], 99.50th=[ 12736], 99.90th=[ 43776],
> | 99.95th=[ 798720], 99.99th=[5341184]
> bw ( KiB/s): min=34144, max=606208, per=100.00%, avg=465737.56, stdev=177637.57, samples=45
> iops : min= 8536, max=151552, avg=116434.33, stdev=44409.46, samples=45
> lat (usec) : 2=45.74%, 4=49.50%, 10=4.13%, 20=0.45%, 50=0.08%
> lat (usec) : 100=0.03%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=0.01%, 4=0.01%, 10=0.05%, 20=0.01%
> cpu : usr=7.14%, sys=92.39%, ctx=1059, majf=0, minf=9
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
> READ: bw=353MiB/s (370MB/s), 353MiB/s-353MiB/s (370MB/s-370MB/s), io=10.0GiB (10.7GB), run=29008-29008msec
>
> Clearly both bandwidth and IOPS are worse. However, I'm puzzled as to why
> this is the case, given that I don't see how this patch affects the
> submission of readahead I/O.
>
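To make the mechanism concrete, here is a small userspace model of the
batch-splitting logic described in the quoted text (the names are made up for
illustration; this is not the actual kernel code). Unpatched,
__extent_readpages walks the batch that extent_readpages built and cuts it at
every gap in the page indexes, where a gap can only come from a failed
add_to_page_cache_lru. The patch drops that second walk, since
extent_readpages already knows where the gaps are at the moment it builds the
batch:

#include <stdio.h>
#include <stddef.h>

/* Stand-in for __do_contiguous_readpages: just report the run. */
static void do_contiguous_readpages(const unsigned long *idx, size_t nr)
{
	printf("submit %zu page(s): index %lu..%lu\n", nr, idx[0], idx[nr - 1]);
}

/* Model of __extent_readpages: split a batch into contiguous runs. */
static void split_batch(const unsigned long *idx, size_t nr)
{
	size_t first = 0;
	size_t i;

	for (i = 1; i <= nr; i++) {
		/* A run ends at the end of the batch or at an index gap. */
		if (i == nr || idx[i] != idx[i - 1] + 1) {
			do_contiguous_readpages(&idx[first], i - first);
			first = i;
		}
	}
}

int main(void)
{
	/* A batch of 16 where add_to_page_cache_lru "failed" on index 7,
	 * leaving a hole, so two runs are submitted. */
	unsigned long idx[] = { 0, 1, 2, 3, 4, 5, 6,
				8, 9, 10, 11, 12, 13, 14, 15 };

	split_batch(idx, sizeof(idx) / sizeof(idx[0]));
	return 0;
}

Running this prints two runs (0..6 and 8..15), i.e. the "called twice" case
from the quoted description; with no failure it would print a single run
of 16.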
I took slightly different measurements this time, namely: the call count of
contiguous_readpages/__do_contiguous_readpages, a latency histogram of the
calls to extent_readpages, and the output of cachetop (from the bcc tools).
I also switched qemu's cache setting to 'none' for the virtio device that fio
runs against. The VM still has 4G of RAM, and this time the fio command line
is:

fio --name=/media/scratch/seqread --rw=read --direct=0 --ioengine=sync \
    --bs=4k --numjobs=1 --size=2G --runtime=600 --group_reporting --loop 10

(I've increased the file size to 2G, rather than 1G as in the original
posting.)
So here are the results:
UNPATCHED CASE:
/media/scratch/seqread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=470MiB/s][r=120k IOPS][eta 00m:00s]
/media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=643: Mon Dec 3 13:55:09 2018
read: IOPS=82.5k, BW=322MiB/s (338MB/s)(20.0GiB/63559msec)
clat (nsec): min=1427, max=38694k, avg=5613.13, stdev=126477.18
lat (nsec): min=1819, max=38695k, avg=6061.95, stdev=126479.46
clat percentiles (nsec):
| 1.00th=[ 1480], 5.00th=[ 1512], 10.00th=[ 1528],
| 20.00th=[ 1560], 30.00th=[ 1576], 40.00th=[ 1592],
| 50.00th=[ 1624], 60.00th=[ 1656], 70.00th=[ 1720],
| 80.00th=[ 1896], 90.00th=[ 2640], 95.00th=[ 3024],
| 99.00th=[ 4768], 99.50th=[ 12352], 99.90th=[1003520],
| 99.95th=[3817472], 99.99th=[4685824]
bw ( KiB/s): min=45728, max=549616, per=100.00%, avg=460140.08, stdev=144985.90, samples=91
iops : min=11432, max=137404, avg=115035.09, stdev=36246.50, samples=91
lat (usec) : 2=82.15%, 4=16.25%, 10=1.01%, 20=0.34%, 50=0.10%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.02%
lat (msec) : 2=0.02%, 4=0.04%, 10=0.04%, 20=0.01%, 50=0.01%
cpu : usr=5.81%, sys=81.11%, ctx=5682, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
FUNC COUNT
__do_contiguous_readpages 327935
Here is a histogram of the extent_readpages latencies, in microseconds:
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 10 | |
64 -> 127 : 20 | |
128 -> 255 : 6 | |
256 -> 511 : 13 | |
512 -> 1023 : 11 | |
1024 -> 2047 : 10 | |
2048 -> 4095 : 2516 |****************************************|
4096 -> 8191 : 34 | |
Output from cachetop for the duration of the test:
TOTAL MISSES HITS DIRTIES BUFFERS_MB CACHED_MB
354 1 353 1 14 1063
543360 524288 19072 0 14 2232
379788 369092 10696 0 14 1956
329056 316960 12096 0 14 1147
377836 362181 15655 0 14 766
543360 524288 19072 0 14 2036
437191 423764 13427 0 14 2169
327289 317440 9849 0 14 1361
322175 307376 14799 0 14 513
543352 524284 19068 0 14 1680
481780 466852 14928 0 14 2338
343440 331528 11912 0 14 1585
344403 333824 10579 0 14 841
564139 440492 123647 0 14 2160
total hits: 294800
total misses: 5242370
PATCHED CASE:
/media/scratch/seqread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=529MiB/s][r=135k IOPS][eta 00m:00s]
/media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=620: Mon Dec 3 11:56:49 2018
read: IOPS=81.2k, BW=317MiB/s (332MB/s)(10.0GiB/32302msec)
clat (nsec): min=1428, max=152945k, avg=5927.07, stdev=246422.91
lat (nsec): min=1837, max=152946k, avg=6427.46, stdev=246426.89
clat percentiles (nsec):
| 1.00th=[ 1496], 5.00th=[ 1528], 10.00th=[ 1560],
| 20.00th=[ 1608], 30.00th=[ 1672], 40.00th=[ 1736],
| 50.00th=[ 1880], 60.00th=[ 2192], 70.00th=[ 2448],
| 80.00th=[ 2736], 90.00th=[ 3280], 95.00th=[ 4080],
| 99.00th=[ 7328], 99.50th=[ 14528], 99.90th=[ 242688],
| 99.95th=[3719168], 99.99th=[5013504]
bw ( KiB/s): min= 1432, max=549808, per=100.00%, avg=396769.62, stdev=175814.08, samples=52
iops : min= 358, max=137452, avg=99192.48, stdev=43953.67, samples=52
lat (usec) : 2=53.85%, 4=40.84%, 10=4.54%, 20=0.47%, 50=0.15%
lat (usec) : 100=0.04%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.02%, 10=0.04%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=6.53%, sys=83.13%, ctx=2746, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=317MiB/s (332MB/s), 317MiB/s-317MiB/s (332MB/s-332MB/s), io=10.0GiB (10.7GB), run=32302-32302msec
FUNC COUNT
contiguous_readpages 327935
So the total number of times we call the low-level submission function is
identical across the two versions. This implies that the patch doesn't result
in worse "merge" behavior, i.e. submitting more but smaller contiguous chunks
via contiguous_readpages.
Cachetop:
TOTAL MISSES HITS DIRTIES BUFFERS_MB CACHED_MB
475016 459690 15326 0 15 2313
316483 309138 7345 2 15 1472
326726 315393 11333 1 15 656
508038 487596 20442 0 15 1710
470487 456532 13955 0 15 2296
329494 318160 11334 0 15 1492
312000 300217 11783 0 15 617
518013 497629 20384 0 15 1552
518780 503636 15144 0 15 2480
338471 325632 12839 0 15 1705
325678 315328 10350 0 15 888
447023 428204 18819 0 15 1127
647182 524289 122893 0 15 2160
total hits: 291947      <--- hits are negligibly smaller
total misses: 5241444   <--- so are the misses
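In relative terms, that's a page cache hit rate of 294800 / (294800 + 5242370)
≈ 5.32% unpatched vs 291947 / (291947 + 5241444) ≈ 5.28% patched, i.e. the
difference is well within noise.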