Significantly dropped dm-cache performance in 4.13 compared to 4.11

* Significantly dropped dm-cache performance in 4.13 compared to 4.11
@ 2017-11-03 18:50 Stefan Ring
  2017-11-07 14:41 ` Joe Thornber
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Ring @ 2017-11-03 18:50 UTC (permalink / raw)
  To: dm-devel

Having just upgraded from a 4.11 kernel to a 4.13 one, I see a
significantly higher scrub time for a ZFS on Linux (=ZoL) pool that
lives on a dm-cache device consisting of a 800 GB partition on one
spinning 1TB disk and one partition on an SDD (something between 100
and 200 GB). ZFS scrubbing consists of reading everything stored in
the pool from start to finish, roughly in the order that it was
written. The data on the pool is for the most part more or less
linear, and the scrubbing used to achieve read rates from the spinning
disk in excess of 100MB/sec. With the old kernel, that is. These are
the scrub times for both kernels:

4.11.5-300.fc26: 1h56m
4.13.9-200.fc26: 4h32m

Nothing changed between those two runs except for the booted kernel.
ZoL is version 0.7.3 in both cases. Originally, I suspected ZoL 0.7.x
to be the culprit, which I upgraded simultaneously to the kernel, from
0.6.5.11. However, I built and installed it for both kernel versions
from the exact same sources, and scrub times are comparable to what
they were before on my home system which uses ZoL on four spinning
disks without an interposed dm-cache.

Typical output for iostat -dmx 3 with kernel 4.13 while scrub is going
on. Otherwise, there is no I/O activity on the system:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             300.67     0.00  462.67    0.00    68.16     0.00
301.69     2.63    5.61    5.61    0.00   2.16  99.90
sdb               0.00   194.67    6.00   83.33     0.38    14.01
329.82     0.20    2.22    0.50    2.34   1.58  14.13
dm-0              0.00     0.00    6.00  221.33     0.38    13.83
128.01     0.54    2.38    0.50    2.43   0.29   6.63
dm-1              0.00     0.00    0.00   53.67     0.00     0.17
6.31     0.12    2.28    0.00    2.28   2.06  11.07
dm-2              0.00     0.00  763.33    0.00    68.16     0.00
182.86     8.05   10.49   10.49    0.00   1.31  99.93
dm-3              0.00     0.00  440.00    0.00    54.70     0.00
254.60     1.98    4.41    4.41    0.00   2.27 100.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             468.00     1.00  519.67   20.00    82.39     0.24
313.60     2.93    5.38    5.49    2.50   1.83  98.63
sdb               0.00   356.00   18.67  109.33     1.00    25.80
428.73     0.15    1.20    1.20    1.20   1.04  13.33
dm-0              0.00     0.00   18.67  426.00     1.00    25.75
123.20     0.52    1.16    1.20    1.16   0.19   8.33
dm-1              0.00     0.00    0.00   39.67     0.00     0.13
6.66     0.06    1.52    0.00    1.52   1.43   5.67
dm-2              0.00     0.00  988.00   21.00    82.68     0.24
168.31     9.63    8.97    9.11    2.38   0.98  98.60
dm-3              0.00     0.00  485.00   19.33    57.84     0.24
235.88     2.14    4.29    4.41    1.41   1.98  99.87

dm-3 is the cached device which ZoL reads from. sda/dm-2 is the
spinning disk, sdb/dm-0 is the cache SDD.

It strikes me as odd that the amount read from the spinning disk is
actually more than what comes out of the combined device in the end.
It is exactly the other way around with the older kernel, which makes
much more sense to me. It looks like this with 4.11, where the
resulting amount of data is the sum of both reads:

Typical samples with kernel 4.11:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              87.67     0.00  618.33    0.00    62.53     0.00
207.12     1.58    2.56    2.56    0.00   1.36  84.37
sdb               0.67     0.00 1057.00    0.00    86.96     0.00
168.49     0.44    0.41    0.41    0.00   0.23  24.37
dm-0              0.00     0.00 1057.67    0.00    86.96     0.00
168.38     0.44    0.42    0.42    0.00   0.23  24.40
dm-1              0.00     0.00    0.00    0.00     0.00     0.00
0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  706.00    0.00    62.56     0.00
181.48     1.74    2.46    2.46    0.00   1.19  84.33
dm-3              0.00     0.00 1488.33    0.00   149.52     0.00
205.74     1.97    1.32    1.32    0.00   0.67 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             165.33     0.00  747.33    0.00    91.42     0.00
250.52     1.70    2.27    2.27    0.00   1.14  85.37
sdb               0.00     0.00  746.33    0.00    64.54     0.00
177.09     0.36    0.49    0.49    0.00   0.23  17.00
dm-0              0.00     0.00  746.33    0.00    64.54     0.00
177.09     0.37    0.49    0.49    0.00   0.23  17.07
dm-1              0.00     0.00    0.00    0.00     0.00     0.00
0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  912.67    0.00    91.39     0.00
205.07     2.02    2.21    2.21    0.00   0.94  85.37
dm-3              0.00     0.00 1363.00    0.00   155.92     0.00
234.28     2.02    1.48    1.48    0.00   0.73 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             161.00     0.00  684.67    0.00    84.63     0.00
253.14     1.93    2.83    2.83    0.00   1.45  99.27
sdb               0.00     0.00   62.67    0.00     6.05     0.00
197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-0              0.00     0.00   62.67    0.00     6.05     0.00
197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-1              0.00     0.00    0.00    0.00     0.00     0.00
0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  845.67    0.00    84.63     0.00
204.94     2.26    2.68    2.68    0.00   1.17  99.30
dm-3              0.00     0.00  727.67    0.00    90.67     0.00
255.19     1.97    2.70    2.70    0.00   1.37 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             160.67     0.00  738.67    0.00    89.74     0.00
248.81     1.83    2.48    2.48    0.00   1.29  95.23
sdb               0.33     0.00  303.33    0.00    28.02     0.00
189.17     0.14    0.47    0.47    0.00   0.25   7.73
dm-0              0.00     0.00  303.67    0.00    28.02     0.00
188.96     0.14    0.47    0.47    0.00   0.26   7.87
dm-1              0.00     0.00    0.00    0.00     0.00     0.00
0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  899.33    0.00    89.74     0.00
204.36     2.17    2.41    2.41    0.00   1.06  95.27
dm-3              0.00     0.00  978.67    0.00   117.76     0.00
246.42     1.96    2.00    2.00    0.00   1.02 100.00

$ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Nov  3 09:08 control
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cdata -> ../dm-0
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cmeta -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk_corig -> ../dm-2

$ sudo dmsetup ls --tree
vg_zfs-lv_zfsdisk (253:3)
 ├─vg_zfs-lv_zfsdisk_corig (253:2)
 │  └─ (8:6)
 ├─vg_zfs-lv_cachedata_cdata (253:0)
 │  └─ (8:21)
 └─vg_zfs-lv_cachedata_cmeta (253:1)
    └─ (8:21)

$ sudo dmsetup table vg_zfs-lv_zfsdisk
0 1876041728 cache 253:1 253:0 253:2 1024 1 writethrough smq 0

$ sudo dmsetup status /dev/mapper/vg_zfs-lv_zfsdisk
0 1876041728 cache 8 1296/54272 1024 430706/430720 91621106 163624489
32345201 16417931 307686 307668 0 1 writethrough 2 migration_threshold
2048 smq 0 rw -

Any ideas?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread