linux-lvm.redhat.com archive mirror
From: Scott Mcdermott <scott@smemsh.net>
To: linux-lvm@redhat.com
Subject: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
Date: Sun, 22 Mar 2020 10:57:35 -0700
Message-ID: <CACRKOwz_fOJiTVhbUqJR_EaxrKbf2bk+wb6LYOy0yWDkvt+buw@mail.gmail.com>

I have a 931.5 GiB SSD pair in raid1 (mdraid) as the cache LV for a
data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
these disks are hosted by a small 4GB big.LITTLE ARM system running
4.4.192-rk3399 (armbian 5.98 bionic).  parameters were set with:

  lvconvert --type cache --cachemode writeback --cachepolicy smq \
    --cachesettings migration_threshold=10000000
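As a rough sanity check on the migration_threshold value above: the kernel's dm-cache documentation describes migration_threshold as a number of 512-byte sectors (default 2048 sectors, i.e. 1 MiB) that caps how much data is being migrated between origin and cache at any one time. The sketch below converts the configured value to bytes, assuming "4GB" means 4 GiB of RAM. It only shows that the cap is larger than physical memory; whether dm-cache ever holds that much data in RAM during migration is an assumption not established here.

```python
# Sketch: convert the configured dm-cache migration_threshold to bytes and
# compare it with the machine's RAM. ASSUMPTION: "4GB" is read as 4 GiB.

SECTOR_BYTES = 512                # migration_threshold is counted in 512-byte sectors
DEFAULT_THRESHOLD_SECTORS = 2048  # documented kernel default (1 MiB)
configured_sectors = 10_000_000   # value passed via --cachesettings
ram_bytes = 4 * 1024**3           # assumed RAM size


def sectors_to_gib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_BYTES / 1024**3


configured_gib = sectors_to_gib(configured_sectors)
default_gib = sectors_to_gib(DEFAULT_THRESHOLD_SECTORS)
ratio_vs_default = configured_sectors / DEFAULT_THRESHOLD_SECTORS
exceeds_ram = configured_sectors * SECTOR_BYTES > ram_bytes

print(f"configured threshold: {configured_gib:.2f} GiB "
      f"({ratio_vs_default:.0f}x the default)")
print(f"default threshold:    {default_gib * 1024:.0f} MiB")
print(f"threshold larger than RAM: {exceeds_ram}")
```

The configured value works out to roughly 4.77 GiB, about 4,900 times the default.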

the system lost power recently.  after coming back up from the crash
and resyncing the raid pair (no issues there), I bring the disks online
and run "lvchange -ay" on the dm-cache device.  the system immediately
starts consuming all memory over the next few seconds: used memory
climbs to 3500M (the last value I can see) while cached drops to 163M
(also the last I can see), and then the kernel OOM-kills every process
(they are faulting in ext4_filemap_fault -> out_of_memory ->
oom_kill_process).

this happens every time I reboot and try it.  is 4G just not enough
system memory to run a dm-cache device?  it was in perfect working
order before the crash and was nicely speeding up my nightly rsyncs on
top of the cached device.  why does it go crazy when it comes online
(the third second in the "dstat 1" output below), crashing the system?
and how do I get it online so I can get my data off it?

usr sys idl wai stl| read  writ| int   csw | used  free  buff  cach
  2   3  95   0   0|   0     0 | 450   594 |93.9M  413M 10.3M 3184M
  0   0 100   0   0|   0     0 |  78   130 |93.4M  413M 10.3M 3184M
  2   6  91   1   0|8503k    0 |2447  4554 | 111M  393M 10.3M 3187M
  5   9  78   7   0|  11M   34k|7932    16k| 145M  358M 10.3M 3187M
  3  25  60  12   0| 277M 6818k|5385    10k| 602M 22.1M 10.4M 3068M
  9  64  21   5   0| 363M  124M|5509  9276 |1464M 39.9M 10.3M 2211M
  2  31  40  27   0| 342M  208M|5487  9495 |1671M 21.8M 10.3M 2027M
  1  10  63  26   0|  96M  128M|2197  4051 |1698M 23.9M 10.3M 1999M
  1  15  55  29   0| 138M  225M|2361  5007 |1730M 25.3M 10.3M 1966M
  3  16  54  28   0| 163M  234M|3118  5021 |1768M 23.8M 10.3M 1930M
  1   8  58  33   0|  85M  128M|1541  2860 |1795M 24.3M 10.3M 1904M
  2  10  53  35   0|  97M  161M|1949  3275 |1820M 24.1M 10.3M 1879M
  3  16  55  26   0| 148M  235M|2927  4733 |1865M 24.1M 10.3M 1835M
  1   9  65  25   0|  83M  137M|1764  3521 |1891M 23.9M 10.3M 1810M
  5  59  29   6   0| 340M   97M|4291  5530 |3569M 39.5M 10.3M  163M
  2  22  51  25   0| 339M  236M|5985  9526 |3500M  109M 10.3M  163M
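To quantify the memory growth in the dstat output above, the sketch below parses the "used" and "cach" columns copied from the 16 rows (one row per second with "dstat 1") and reports the peak, the largest one-second jump, and how far the page cache fell. It assumes dstat's "M" suffix means MiB; the helper names are hypothetical.

```python
# Sketch: parse the memory columns of the dstat rows quoted above and
# measure how fast "used" grew. ASSUMPTION: dstat's "M" suffix is MiB.

def to_mib(field: str) -> float:
    """Convert a dstat size such as '93.9M', '3184M' or '1.2G' to MiB."""
    units = {"k": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(field[:-1]) * units[field[-1]]


# "used" and "cach" columns, copied row by row from the table
USED = ["93.9M", "93.4M", "111M", "145M", "602M", "1464M", "1671M", "1698M",
        "1730M", "1768M", "1795M", "1820M", "1865M", "1891M", "3569M", "3500M"]
CACH = ["3184M", "3184M", "3187M", "3187M", "3068M", "2211M", "2027M", "1999M",
        "1966M", "1930M", "1904M", "1879M", "1835M", "1810M", "163M", "163M"]

used = [to_mib(f) for f in USED]
cach = [to_mib(f) for f in CACH]

# change in "used" between consecutive one-second samples
deltas = [b - a for a, b in zip(used, used[1:])]
worst = max(range(len(deltas)), key=deltas.__getitem__)

print(f"peak used: {max(used):.0f} MiB")
print(f"largest 1 s jump: +{deltas[worst]:.0f} MiB "
      f"(row {worst + 1} -> row {worst + 2})")
print(f"page cache dropped by {cach[0] - min(cach):.0f} MiB")
```

Most of the growth arrives in two bursts (rows 4 to 6 and rows 14 to 15), with slower steady growth in between.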

note: I tried adding some swap on an unrelated slow disk, which seems
to delay it by a few seconds, but the end result is always the same:
the OOM killer kills every process and the system reboots.

here is the output of "lvs -a" just before running "lvchange -ay
raidbak4/bakvol4", which brings the system down:

  LV                VG       Attr       LSize    Pool        Origin
  [bakcache4]       raidbak4 Cwi---C--- <931.38g
  [bakcache4_cdata] raidbak4 Cwi------- <931.38g
  [bakcache4_cmeta] raidbak4 ewi-------   48.00m
  bakvol4           raidbak4 Cwi---C---    1.75t [bakcache4] [bakvol4_corig]
  [bakvol4_corig]   raidbak4 owi---C---    1.75t
  [lvol0_pmspare]   raidbak4 ewi-------   48.00m
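To put the "is 4G enough" question in numbers: dm-cache tracks the cache in fixed-size blocks (the cache pool's chunk size), so, assuming its in-kernel bookkeeping grows with the number of blocks, the block count is the figure that matters. The chunk size is not shown in the lvs output above, so the chunk sizes in this sketch are hypothetical; it only divides the <931.38 GiB cache-data LV by each of them.

```python
# Sketch: count cache blocks in the <931.38 GiB cache-data LV for a few
# HYPOTHETICAL chunk sizes; the real chunk size is not in the lvs output.

CACHE_GIB = 931.38  # from "[bakcache4_cdata] <931.38g"; true size is slightly smaller


def cache_blocks(cache_gib: float, chunk_kib: int) -> int:
    """Return how many whole chunk_kib-KiB chunks fit in cache_gib GiB."""
    cache_kib = cache_gib * 1024 * 1024
    return int(cache_kib // chunk_kib)


for chunk_kib in (64, 512, 1024):  # hypothetical chunk sizes in KiB
    print(f"{chunk_kib:>5} KiB chunks -> "
          f"{cache_blocks(CACHE_GIB, chunk_kib):,} blocks")
```

At 1 MiB chunks this is roughly 950 thousand blocks; at 64 KiB it is over 15 million, so under the assumption above the chunk size the pool was created with would strongly affect memory use.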

the lvchange then brings the system down with an OOM death within 10 or
so seconds.  online access to the cached data seems to be
impossible...

Thread overview: 11+ messages
2020-03-22 17:57 Scott Mcdermott [this message]
2020-03-23  8:26 ` Joe Thornber
2020-03-23  9:57   ` Zdenek Kabelac
2020-03-23 16:26     ` John Stoffel
2020-03-23 22:02     ` Scott Mcdermott
2020-03-24  9:43       ` Zdenek Kabelac
2020-03-24 11:37         ` Gionatan Danti
2020-03-24 15:09           ` Zdenek Kabelac
2020-03-24 22:35             ` Gionatan Danti
2020-03-25  8:55               ` Zdenek Kabelac
2020-03-23 21:35   ` Scott Mcdermott
