linux-lvm.redhat.com archive mirror
* [linux-lvm] cache on SSD makes system unresponsive
@ 2017-10-19 17:54 Oleg Cherkasov
  2017-10-19 18:13 ` Xen
                   ` (4 more replies)
  0 siblings, 5 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-19 17:54 UTC
  To: linux-lvm

Hi,

Recently I decided to try out the LVM cache feature on one of our Dell
NX3100 servers running CentOS 7.4.1708 with a 110 TB disk array (hardware
RAID5 with Dell H710 and H830 adapters).  Two 256 GB SSDs are in hardware
RAID1 on the H710 adapter, with primary and extended partitions, so I
decided to make a ~240 GB LVM cache to see whether system I/O could be
improved.  The server runs the Bareos storage daemon and, besides sshd
and Dell OpenManage monitoring, has no other services.
Unfortunately, testing did not go as I expected; nonetheless, in the end
the system is up and running with no data corrupted.

Initially I tried the default writethrough mode, and after running a dd
read test with a 250 GB file the system became unresponsive for roughly
15 minutes, with cache allocation at around 50%.  For writes, the cache
does seem to speed up the system, but only marginally (around 10% in my
tests).  I did manage to pull more than 32 TB via backups from different
hosts, and during that the system once became unresponsive to ssh and
ICMP requests, though only for a very short time.
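The "cache allocation" figure can also be read straight from the device-mapper status of the cached LV (e.g. `dmsetup status <vg>-<lv>`, where `<vg>-<lv>` is a placeholder).  A minimal Python sketch, assuming the status line follows the layout described in the kernel's device-mapper cache documentation (metadata usage, cache block size, used/total cache blocks, read/write hits and misses, demotions, promotions, dirty blocks).  `CacheStatus` and `parse_cache_status` are hypothetical names, and the sample line used to check it is made up, not taken from this server:

```python
from dataclasses import dataclass


@dataclass
class CacheStatus:
    """Selected fields of a dm-cache status line (block sizes in 512-byte sectors)."""
    metadata_block_sectors: int
    metadata_used: int
    metadata_total: int
    cache_block_sectors: int
    cache_used: int
    cache_total: int
    read_hits: int
    read_misses: int
    write_hits: int
    write_misses: int
    demotions: int
    promotions: int
    dirty: int

    @property
    def used_percent(self) -> float:
        """Share of cache blocks currently in use, in percent."""
        return 100.0 * self.cache_used / self.cache_total if self.cache_total else 0.0

    @property
    def read_hit_ratio(self) -> float:
        """Read hits divided by all reads counted by the cache target."""
        reads = self.read_hits + self.read_misses
        return self.read_hits / reads if reads else 0.0


def parse_cache_status(line: str) -> CacheStatus:
    """Parse one `dmsetup status` line for a device using the `cache` target."""
    tokens = line.split()
    try:
        start = tokens.index("cache") + 1
    except ValueError:
        raise ValueError("not a dm-cache status line") from None
    # Assumed layout after the target name (kernel dm-cache docs):
    # <md block size> <md used>/<md total> <cache block size>
    # <cache used>/<cache total> <read hits> <read misses> <write hits>
    # <write misses> <demotions> <promotions> <dirty> ...
    fields = tokens[start:start + 11]
    if len(fields) < 11:
        raise ValueError("truncated dm-cache status line")
    meta_used, meta_total = (int(x) for x in fields[1].split("/"))
    cache_used, cache_total = (int(x) for x in fields[3].split("/"))
    counters = [int(x) for x in fields[4:11]]
    return CacheStatus(int(fields[0]), meta_used, meta_total,
                       int(fields[2]), cache_used, cache_total, *counters)
```

On a hypothetical line whose used/total cache blocks read `3500/7000`, `used_percent` gives `50.0`.  The cache block size is reported in 512-byte sectors, so a value of 128 means 64 KiB chunks; in writeback mode, `dirty` counts cache blocks not yet written back to the origin.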

I thought it might be something with the cache mode, so I switched to
writeback via lvconvert and ran the dd read test again with the 250 GB
file; however, this time things went completely unexpectedly.  The
system started to respond slowly to simple user interactions such as
listing files and running top, and then became completely unresponsive
for about half an hour.  Switching to the main console via iLO, I saw a
lot of OOM messages: the kernel, trying to survive, had randomly killed
almost all processes.  Eventually I managed to reboot and immediately
uncached the array.

My question is about this very strange behavior of LVM cache.  I could
accept no performance boost or even I/O degradation, but I do not
expect to run out of memory and then have the OOM killer kick in.  The
server has only 12 GB of RAM, but it runs only sshd, the Bareos SD
daemon and the Java-based OpenManage monitoring system, and no RAM
problems were noticed over the last few years of running without LVM
cache.
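To see where memory goes while repeating such a test, one option is to sample `/proc/meminfo` at intervals alongside the dd run and watch `MemAvailable`, `Dirty`, `Writeback` and `Slab`.  A minimal Python sketch, assuming a Linux host; `parse_meminfo`, `available_fraction` and `sample` are hypothetical helper names, and this is only a monitoring aid, not a diagnosis of the OOM behavior:

```python
import time
from typing import Dict, Tuple

# Fields that are usually worth watching while a large read/write test runs.
WATCHED: Tuple[str, ...] = ("MemTotal", "MemAvailable", "Dirty", "Writeback", "Slab")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(info: Dict[str, int]) -> float:
    """MemAvailable as a share of MemTotal (0.0 if either is missing)."""
    total = info.get("MemTotal", 0)
    return info.get("MemAvailable", 0) / total if total else 0.0


def sample(path: str = "/proc/meminfo", interval: float = 5.0,
           count: int = 12, fields: Tuple[str, ...] = WATCHED) -> None:
    """Print the watched fields `count` times, `interval` seconds apart."""
    for _ in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        summary = " ".join(f"{k}={info.get(k, 0)}" for k in fields)
        print(f"{time.strftime('%H:%M:%S')} {summary} "
              f"avail={available_fraction(info):.1%}")
        time.sleep(interval)
```

For example, `sample(interval=5, count=360)` covers about half an hour; its output can be redirected to a file and checked after the test.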

Any ideas what may be wrong?  I have a second NX3200 server with a
similar hardware setup, and it will be switched to FreeBSD 11.1 with
ZFS very soon; however, I may try installing CentOS 7.4 on it first to
see whether the problem can be reproduced.

The installed LVM2 version is lvm2-2.02.171-8.el7.x86_64.


Thank you!
Oleg


end of thread, other threads:[~2017-10-24 23:10 UTC | newest]

Thread overview: 37+ messages
2017-10-19 17:54 [linux-lvm] cache on SSD makes system unresponsive Oleg Cherkasov
2017-10-19 18:13 ` Xen
2017-10-20 10:21   ` Oleg Cherkasov
2017-10-20 10:38     ` Xen
2017-10-20 11:41       ` Oleg Cherkasov
2017-10-19 18:49 ` Mike Snitzer
2017-10-20 11:07   ` Joe Thornber
2017-10-19 19:09 ` John Stoffel
2017-10-19 19:46   ` Xen
2017-10-19 21:14     ` John Stoffel
2017-10-20  6:42       ` Xen
2017-10-19 21:59   ` Oleg Cherkasov
2017-10-20 19:35     ` John Stoffel
2017-10-21  3:05       ` Mike Snitzer
2017-10-21 14:33       ` Oleg Cherkasov
2017-10-23 10:58         ` Zdenek Kabelac
2017-10-21  2:55     ` Mike Snitzer
2017-10-21 14:10       ` Oleg Cherkasov
2017-10-23 20:45         ` John Stoffel
2017-10-20 16:20 ` lejeczek
2017-10-20 16:48   ` Xen
2017-10-20 17:02     ` Bernd Eckenfels
2017-10-24 14:51 ` lejeczek
     [not found] <640472762.2746512.1508882485777.ref@mail.yahoo.com>
2017-10-24 22:01 ` matthew patton
2017-10-24 23:10   ` Chris Friesen
     [not found] <1928541660.2031191.1508802005006.ref@mail.yahoo.com>
2017-10-23 23:40 ` matthew patton
2017-10-24 15:36   ` Xen
     [not found] <1714773615.1945146.1508792555922.ref@mail.yahoo.com>
2017-10-23 21:02 ` matthew patton
2017-10-23 21:54   ` Xen
2017-10-24  2:51   ` John Stoffel
     [not found] <1540708205.1077645.1508602122091.ref@mail.yahoo.com>
2017-10-21 16:08 ` matthew patton
     [not found] <1244564108.1073508.1508601932111.ref@mail.yahoo.com>
2017-10-21 16:05 ` matthew patton
2017-10-24 18:09   ` Oleg Cherkasov
     [not found] <541215543.377417.1508458336923.ref@mail.yahoo.com>
2017-10-20  0:12 ` matthew patton
2017-10-20  6:46   ` Xen
2017-10-20  9:59     ` Oleg Cherkasov
  -- strict thread matches above, loose matches on Subject: below --
2017-10-19 10:05 Oleg Cherkasov
