* Significantly dropped dm-cache performance in 4.13 compared to 4.11
@ 2017-11-03 18:50 Stefan Ring
  2017-11-07 14:41 ` Joe Thornber
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Ring @ 2017-11-03 18:50 UTC (permalink / raw)
  To: dm-devel

Having just upgraded from a 4.11 kernel to a 4.13 one, I see a
significantly higher scrub time for a ZFS on Linux (=ZoL) pool that
lives on a dm-cache device consisting of an 800 GB partition on a
spinning 1 TB disk and a partition on an SSD (somewhere between 100
and 200 GB). ZFS scrubbing consists of reading everything stored in
the pool from start to finish, roughly in the order that it was
written. The data on the pool is laid out more or less linearly, and
the scrubbing used to achieve read rates from the spinning disk in
excess of 100 MB/s. With the old kernel, that is. These are
the scrub times for both kernels:

4.11.5-300.fc26: 1h56m
4.13.9-200.fc26: 4h32m
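A quick arithmetic check of the size of the regression (my own
calculation, not part of the original report):

```shell
# Slowdown factor of the 4.13 scrub (4h32m) relative to 4.11 (1h56m)
awk 'BEGIN { printf "%.2fx\n", (4*60 + 32) / (1*60 + 56) }'
```

So the same scrub takes roughly 2.3 times as long on 4.13.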

Nothing changed between those two runs except for the booted kernel.
ZoL is version 0.7.3 in both cases. Originally I suspected ZoL 0.7.x
to be the culprit, since I had upgraded it from 0.6.5.11 at the same
time as the kernel. However, I built and installed it for both kernel versions
from the exact same sources, and scrub times are comparable to what
they were before on my home system which uses ZoL on four spinning
disks without an interposed dm-cache.

Typical output of iostat -dmx 3 with kernel 4.13 while a scrub is
running; otherwise there is no I/O activity on the system:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             300.67     0.00  462.67    0.00    68.16     0.00    301.69     2.63    5.61    5.61    0.00   2.16  99.90
sdb               0.00   194.67    6.00   83.33     0.38    14.01    329.82     0.20    2.22    0.50    2.34   1.58  14.13
dm-0              0.00     0.00    6.00  221.33     0.38    13.83    128.01     0.54    2.38    0.50    2.43   0.29   6.63
dm-1              0.00     0.00    0.00   53.67     0.00     0.17      6.31     0.12    2.28    0.00    2.28   2.06  11.07
dm-2              0.00     0.00  763.33    0.00    68.16     0.00    182.86     8.05   10.49   10.49    0.00   1.31  99.93
dm-3              0.00     0.00  440.00    0.00    54.70     0.00    254.60     1.98    4.41    4.41    0.00   2.27 100.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             468.00     1.00  519.67   20.00    82.39     0.24    313.60     2.93    5.38    5.49    2.50   1.83  98.63
sdb               0.00   356.00   18.67  109.33     1.00    25.80    428.73     0.15    1.20    1.20    1.20   1.04  13.33
dm-0              0.00     0.00   18.67  426.00     1.00    25.75    123.20     0.52    1.16    1.20    1.16   0.19   8.33
dm-1              0.00     0.00    0.00   39.67     0.00     0.13      6.66     0.06    1.52    0.00    1.52   1.43   5.67
dm-2              0.00     0.00  988.00   21.00    82.68     0.24    168.31     9.63    8.97    9.11    2.38   0.98  98.60
dm-3              0.00     0.00  485.00   19.33    57.84     0.24    235.88     2.14    4.29    4.41    1.41   1.98  99.87

dm-3 is the cached device which ZoL reads from. sda/dm-2 is the
spinning disk, sdb/dm-0 is the cache SSD.

It strikes me as odd that the amount read from the spinning disk is
actually more than what comes out of the combined device in the end.
It is exactly the other way around with the older kernel, which makes
much more sense to me. It looks like this with 4.11, where the
resulting amount of data is the sum of both reads:

Typical samples with kernel 4.11:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              87.67     0.00  618.33    0.00    62.53     0.00    207.12     1.58    2.56    2.56    0.00   1.36  84.37
sdb               0.67     0.00 1057.00    0.00    86.96     0.00    168.49     0.44    0.41    0.41    0.00   0.23  24.37
dm-0              0.00     0.00 1057.67    0.00    86.96     0.00    168.38     0.44    0.42    0.42    0.00   0.23  24.40
dm-1              0.00     0.00    0.00    0.00     0.00     0.00      0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  706.00    0.00    62.56     0.00    181.48     1.74    2.46    2.46    0.00   1.19  84.33
dm-3              0.00     0.00 1488.33    0.00   149.52     0.00    205.74     1.97    1.32    1.32    0.00   0.67 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             165.33     0.00  747.33    0.00    91.42     0.00    250.52     1.70    2.27    2.27    0.00   1.14  85.37
sdb               0.00     0.00  746.33    0.00    64.54     0.00    177.09     0.36    0.49    0.49    0.00   0.23  17.00
dm-0              0.00     0.00  746.33    0.00    64.54     0.00    177.09     0.37    0.49    0.49    0.00   0.23  17.07
dm-1              0.00     0.00    0.00    0.00     0.00     0.00      0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  912.67    0.00    91.39     0.00    205.07     2.02    2.21    2.21    0.00   0.94  85.37
dm-3              0.00     0.00 1363.00    0.00   155.92     0.00    234.28     2.02    1.48    1.48    0.00   0.73 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             161.00     0.00  684.67    0.00    84.63     0.00    253.14     1.93    2.83    2.83    0.00   1.45  99.27
sdb               0.00     0.00   62.67    0.00     6.05     0.00    197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-0              0.00     0.00   62.67    0.00     6.05     0.00    197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-1              0.00     0.00    0.00    0.00     0.00     0.00      0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  845.67    0.00    84.63     0.00    204.94     2.26    2.68    2.68    0.00   1.17  99.30
dm-3              0.00     0.00  727.67    0.00    90.67     0.00    255.19     1.97    2.70    2.70    0.00   1.37 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             160.67     0.00  738.67    0.00    89.74     0.00    248.81     1.83    2.48    2.48    0.00   1.29  95.23
sdb               0.33     0.00  303.33    0.00    28.02     0.00    189.17     0.14    0.47    0.47    0.00   0.25   7.73
dm-0              0.00     0.00  303.67    0.00    28.02     0.00    188.96     0.14    0.47    0.47    0.00   0.26   7.87
dm-1              0.00     0.00    0.00    0.00     0.00     0.00      0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00  899.33    0.00    89.74     0.00    204.36     2.17    2.41    2.41    0.00   1.06  95.27
dm-3              0.00     0.00  978.67    0.00   117.76     0.00    246.42     1.96    2.00    2.00    0.00   1.02 100.00
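The arithmetic behind that observation, using the rMB/s figures from
the first sample of each kernel above (my own check, not part of the
original mail):

```shell
# 4.13: the origin (dm-2) is read at 68.16 MB/s, yet the cached device
# (dm-3) only delivers 54.70 MB/s -- the origin is over-read.
# 4.11: origin (dm-2, 62.56) plus cache (dm-0, 86.96) adds up to what
# dm-3 delivers (149.52 MB/s), as expected for a read-mostly cache.
awk 'BEGIN {
    printf "4.13 origin excess:  %.2f MB/s\n", 68.16 - 54.70
    printf "4.11 origin + cache: %.2f MB/s\n", 62.56 + 86.96
}'
```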

$ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Nov  3 09:08 control
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cdata -> ../dm-0
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cmeta -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk_corig -> ../dm-2

$ sudo dmsetup ls --tree
vg_zfs-lv_zfsdisk (253:3)
 ├─vg_zfs-lv_zfsdisk_corig (253:2)
 │  └─ (8:6)
 ├─vg_zfs-lv_cachedata_cdata (253:0)
 │  └─ (8:21)
 └─vg_zfs-lv_cachedata_cmeta (253:1)
    └─ (8:21)

$ sudo dmsetup table vg_zfs-lv_zfsdisk
0 1876041728 cache 253:1 253:0 253:2 1024 1 writethrough smq 0

$ sudo dmsetup status /dev/mapper/vg_zfs-lv_zfsdisk
0 1876041728 cache 8 1296/54272 1024 430706/430720 91621106 163624489
32345201 16417931 307686 307668 0 1 writethrough 2 migration_threshold
2048 smq 0 rw -
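The status line can be decoded against the dm-cache status field
layout; the field positions below are my reading of the 4.x kernel's
Documentation/device-mapper/cache.txt, so treat them as an assumption:

```shell
# Split the status line quoted above into positional fields.
# Layout (assumed): $4 metadata block size, $5 used/total metadata
# blocks, $6 cache block size (sectors), $7 used/total cache blocks,
# $8 read hits, $9 read misses, ${10} write hits, ${11} write misses,
# ${12} demotions, ${13} promotions, ${14} dirty.
set -- 0 1876041728 cache 8 1296/54272 1024 430706/430720 \
    91621106 163624489 32345201 16417931 307686 307668 0 \
    1 writethrough 2 migration_threshold 2048 smq 0 rw -
read_hits=$8
read_misses=$9
echo "cache blocks used/total: $7"
echo "read hit rate: $(( 100 * read_hits / (read_hits + read_misses) ))%"
```

Read this way, the cache is essentially full (430706 of 430720
blocks), so any further promotion forces a matching demotion.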

Any ideas?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-03 18:50 Significantly dropped dm-cache performance in 4.13 compared to 4.11 Stefan Ring
@ 2017-11-07 14:41 ` Joe Thornber
  2017-11-09 15:15   ` Stefan Ring
  0 siblings, 1 reply; 9+ messages in thread
From: Joe Thornber @ 2017-11-07 14:41 UTC (permalink / raw)
  To: Stefan Ring; +Cc: dm-devel

On Fri, Nov 03, 2017 at 07:50:23PM +0100, Stefan Ring wrote:
> It strikes me as odd that the amount read from the spinning disk is
> actually more than what comes out of the combined device in the end.

This suggests dm-cache is trying to promote way too much.
I'll try to reproduce the issue; your setup sounds pretty straightforward.

- Joe

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-07 14:41 ` Joe Thornber
@ 2017-11-09 15:15   ` Stefan Ring
  2017-11-13 17:31     ` Stefan Ring
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Ring @ 2017-11-09 15:15 UTC (permalink / raw)
  To: dm-devel

On Tue, Nov 7, 2017 at 3:41 PM, Joe Thornber <thornber@redhat.com> wrote:
> On Fri, Nov 03, 2017 at 07:50:23PM +0100, Stefan Ring wrote:
>> It strikes me as odd that the amount read from the spinning disk is
>> actually more than what comes out of the combined device in the end.
>
> This suggests dm-cache is trying to promote way too much.
> I'll try to reproduce the issue; your setup sounds pretty straightforward.

I think it's actually the most straightforward setup you can get ;).

I've also tested kernel 4.12 in the meantime, which behaves just like
4.13. So the difference in behavior seems to have been introduced
somewhere between 4.11 and 4.12.

I've also done plain dd from the dm-cache disk to /dev/null a few
times, which wrote enormous amounts of data to the SSD. My poor SSD
has received the same amount of writes during the last week that it
has had to endure during the entire previous year.
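A back-of-the-envelope estimate of that write volume (my own
calculation, assuming the ~14 MB/s sdb write rate from the 4.13
iostat samples held for the whole 4h32m scrub):

```shell
# ~14.01 MB/s of cache-device writes sustained over a 4h32m scrub
awk 'BEGIN { printf "%.0f GiB written to the SSD per scrub\n",
             14.01 * (4*3600 + 32*60) / 1024 }'
```

That is on the order of 200 GiB of writes per scrub, more than the
entire cache device holds.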

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-09 15:15   ` Stefan Ring
@ 2017-11-13 17:31     ` Stefan Ring
  2017-11-13 19:01       ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Ring @ 2017-11-13 17:31 UTC (permalink / raw)
  To: dm-devel

On Thu, Nov 9, 2017 at 4:15 PM, Stefan Ring <stefanrin@gmail.com> wrote:
> On Tue, Nov 7, 2017 at 3:41 PM, Joe Thornber <thornber@redhat.com> wrote:
>> On Fri, Nov 03, 2017 at 07:50:23PM +0100, Stefan Ring wrote:
>>> It strikes me as odd that the amount read from the spinning disk is
>>> actually more than what comes out of the combined device in the end.
>>
>> This suggests dm-cache is trying to promote way too much.
>> I'll try to reproduce the issue; your setup sounds pretty straightforward.
>
> I think it's actually the most straight-forward you can get ;).
>
> I've also tested kernel 4.12 in the meantime, which behaves just like
> 4.13. So the difference in behavior seems to have been introduced
> somewhere between 4.11 and 4.12.
>
> I've also done plain dd from the dm-cache disk to /dev/null a few
> times, which wrote enormous amounts of data to the SSD. My poor SSD
> has received the same amount of writes during the last week that it
> has had to endure during the entire previous year.

Do you think it would make a difference if I removed and recreated the cache?

I don't want to fry my SSD any longer. I've just copied several large
files into the dm-cached zfs dataset, and while reading them back
immediately afterwards, the SSD started writing crazy amounts again.
In my understanding, linear reads should rarely end up on the cache
device, but that is absolutely not what I'm experiencing.

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-13 17:31     ` Stefan Ring
@ 2017-11-13 19:01       ` Mike Snitzer
  2017-11-14 11:00         ` Joe Thornber
  2017-12-15 16:03         ` Stefan Ring
  0 siblings, 2 replies; 9+ messages in thread
From: Mike Snitzer @ 2017-11-13 19:01 UTC (permalink / raw)
  To: Stefan Ring; +Cc: dm-devel, ejt

On Mon, Nov 13 2017 at 12:31pm -0500,
Stefan Ring <stefanrin@gmail.com> wrote:

> On Thu, Nov 9, 2017 at 4:15 PM, Stefan Ring <stefanrin@gmail.com> wrote:
> > On Tue, Nov 7, 2017 at 3:41 PM, Joe Thornber <thornber@redhat.com> wrote:
> >> On Fri, Nov 03, 2017 at 07:50:23PM +0100, Stefan Ring wrote:
> >>> It strikes me as odd that the amount read from the spinning disk is
> >>> actually more than what comes out of the combined device in the end.
> >>
> >> This suggests dm-cache is trying to promote way too much.
> >> I'll try to reproduce the issue; your setup sounds pretty straightforward.
> >
> > I think it's actually the most straight-forward you can get ;).
> >
> > I've also tested kernel 4.12 in the meantime, which behaves just like
> > 4.13. So the difference in behavior seems to have been introduced
> > somewhere between 4.11 and 4.12.
> >
> > I've also done plain dd from the dm-cache disk to /dev/null a few
> > times, which wrote enormous amounts of data to the SSD. My poor SSD
> > has received the same amount of writes during the last week that it
> > has had to endure during the entire previous year.
> 
> Do you think it would make a difference if I removed and recreated the cache?
> 
> I don't want to fry my SSD any longer. I've just copied several large
> files into the dm-cached zfs dataset, and while reading them back
> immediately afterwards, the SSD started writing crazy amounts again.
> In my understanding, linear reads should rarely end up on the cache
> device, but that is absolutely not what I'm experiencing.

Joe tried to reproduce your reported issue today and couldn't.

I think we need to better understand how you're triggering this
behaviour.  But we no longer have logic in place to make sequential
IO bypass the cache... that _could_ start to explain things, since
earlier versions of dm-cache definitely did avoid promoting
sequential IO.

But feel free to remove the cache for now.  Should be as simple as:
lvconvert --uncache VG/CacheLV

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-13 19:01       ` Mike Snitzer
@ 2017-11-14 11:00         ` Joe Thornber
  2017-11-14 14:53           ` Stefan Ring
  2017-11-14 18:41           ` Stefan Ring
  2017-12-15 16:03         ` Stefan Ring
  1 sibling, 2 replies; 9+ messages in thread
From: Joe Thornber @ 2017-11-14 11:00 UTC (permalink / raw)
  To: stefanrin; +Cc: dm-devel, ejt

On Mon, Nov 13, 2017 at 02:01:11PM -0500, Mike Snitzer wrote:
> On Mon, Nov 13 2017 at 12:31pm -0500,
> Stefan Ring <stefanrin@gmail.com> wrote:
> 
> > On Thu, Nov 9, 2017 at 4:15 PM, Stefan Ring <stefanrin@gmail.com> wrote:
> > > On Tue, Nov 7, 2017 at 3:41 PM, Joe Thornber <thornber@redhat.com> wrote:
> > >> On Fri, Nov 03, 2017 at 07:50:23PM +0100, Stefan Ring wrote:
> > >>> It strikes me as odd that the amount read from the spinning disk is
> > >>> actually more than what comes out of the combined device in the end.
> > >>
> > >> This suggests dm-cache is trying to promote way too much.
> > >> I'll try to reproduce the issue; your setup sounds pretty straightforward.
> > >
> > > I think it's actually the most straight-forward you can get ;).
> > >
> > > I've also tested kernel 4.12 in the meantime, which behaves just like
> > > 4.13. So the difference in behavior seems to have been introduced
> > > somewhere between 4.11 and 4.12.
> > >
> > > I've also done plain dd from the dm-cache disk to /dev/null a few
> > > times, which wrote enormous amounts of data to the SSD. My poor SSD
> > > has received the same amount of writes during the last week that it
> > > has had to endure during the entire previous year.
> > 
> > Do you think it would make a difference if I removed and recreated the cache?
> > 
> > I don't want to fry my SSD any longer. I've just copied several large
> > files into the dm-cached zfs dataset, and while reading them back
> > immediately afterwards, the SSD started writing crazy amounts again.
> > In my understanding, linear reads should rarely end up on the cache
> > device, but that is absolutely not what I'm experiencing.
> 
> Joe tried to reproduce your reported issue today and couldn't.

I'm not sure what's going on here.  Would you mind sending me the
metadata please?  Either a cache_dump of it, or a copy of the metadata
dev?

- Joe

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-14 11:00         ` Joe Thornber
@ 2017-11-14 14:53           ` Stefan Ring
  2017-11-14 18:41           ` Stefan Ring
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Ring @ 2017-11-14 14:53 UTC (permalink / raw)
  To: dm-devel, ejt

On Tue, Nov 14, 2017 at 12:00 PM, Joe Thornber <thornber@redhat.com> wrote:
>
> I'm not sure what's going on here.  Would you mind sending me the
> metadata please?  Either a cache_dump of it, or a copy of the metadata
> dev?

I'd like to create a cache dump, but I'm not very experienced with
this stuff. I do:

$ cache_dump /dev/vg_zfs/lv_zfsdisk |less
syscall 'open' failed: Device or resource busy
Note: you cannot run this tool with these options on live metadata.

Which is what I feared/expected. However, after doing

$ lvchange -an vg_zfs/lv_zfsdisk

the device disappears, and I cannot run cache_dump on it either. So
how do I make the metadata not live while keeping it visible?

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-14 11:00         ` Joe Thornber
  2017-11-14 14:53           ` Stefan Ring
@ 2017-11-14 18:41           ` Stefan Ring
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Ring @ 2017-11-14 18:41 UTC (permalink / raw)
  To: dm-devel, ejt

On Tue, Nov 14, 2017 at 12:00 PM, Joe Thornber <thornber@redhat.com> wrote:
> I'm not sure what's going on here.  Would you mind sending me the
> metadata please?  Either a cache_dump of it, or a copy of the metadata
> dev?

Ok, I've copied the device to a file and run cache_dump on it:
https://www.dropbox.com/s/y7fu723oybuxbz0/cmeta.xml.xz?dl=0

* Re: Significantly dropped dm-cache performance in 4.13 compared to 4.11
  2017-11-13 19:01       ` Mike Snitzer
  2017-11-14 11:00         ` Joe Thornber
@ 2017-12-15 16:03         ` Stefan Ring
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Ring @ 2017-12-15 16:03 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, ejt

On Mon, Nov 13, 2017 at 8:01 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>
> But feel free to remove the cache for now.  Should be as simple as:
> lvconvert --uncache VG/CacheLV

I did a --splitcache yesterday and ran a scrub again, which completed
in 3h. More than the ~2h of the cached version in 4.11, but
significantly less than the ~4.5h of the 4.13 cached variant. The data
on this volume has not changed much since last month, so the scrub
times should be very comparable, give or take a few minutes.

end of thread, other threads:[~2017-12-15 16:03 UTC | newest]

Thread overview: 9+ messages
2017-11-03 18:50 Significantly dropped dm-cache performance in 4.13 compared to 4.11 Stefan Ring
2017-11-07 14:41 ` Joe Thornber
2017-11-09 15:15   ` Stefan Ring
2017-11-13 17:31     ` Stefan Ring
2017-11-13 19:01       ` Mike Snitzer
2017-11-14 11:00         ` Joe Thornber
2017-11-14 14:53           ` Stefan Ring
2017-11-14 18:41           ` Stefan Ring
2017-12-15 16:03         ` Stefan Ring
