linux-lvm.redhat.com archive mirror
 help / color / mirror / Atom feed
From: "John Stoffel" <john@stoffel.org>
To: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] cache on SSD makes system unresponsive
Date: Mon, 23 Oct 2017 16:45:56 -0400	[thread overview]
Message-ID: <23022.21764.968838.764670@quad.stoffel.home> (raw)
In-Reply-To: <6897ab24-f558-33c6-511a-5d2bc3f4967b@member.fsf.org>

>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:

Oleg> On 21. okt. 2017 04:55, Mike Snitzer wrote:
>> On Thu, Oct 19 2017 at  5:59pm -0400,
>> Oleg Cherkasov <o1e9@member.fsf.org> wrote:
>> 
>>> On 19. okt. 2017 21:09, John Stoffel wrote:
>>>> 
>> 
>> So aside from SAR outout: you don't have any system logs?  Or a vmcore
>> of the system (assuming it crashed?) -- in it you could access the
>> kernel log (via 'log' command in crash utility.

Oleg> Unfortunately no logs.  I have tried to see if I may recover dmesg 
Oleg> however no luck.  All logs but the latest dmesg boot are zeroed.  Of 
Oleg> course there are messages, secure and others however I do not see any 
Oleg> valuable information there.

Oleg> System did not crash, OOM were going wind however I did manage to 
Oleg> Ctrl-Alt-Del from the main console via iLO so eventually it rebooted 
Oleg> with clean disk umount.

Bummers.  Maybe you can setup a syslog server to use to log verbose
kernel logs elsewhere, including the OOM messages?  

>> 
>> More specifics on the workload would be useful.  Also, more details on
>> the LVM cache configuration (block size?  writethrough or writeback?
>> etc).

Oleg> No extra params but specifying mode writethrough initially.
Oleg> Hardware RAID1 on cache disk is 64k and on main array hardware
Oleg> RAID5 128k.

Oleg> I had followed precisely documentation from RHEL doc site so lvcreate, 
Oleg> lvconvert to update type and then lvconvert to add cache.

Oleg> I have decided to try writeback after and shifted cachemode to it with 
Oleg> lvcache.

>> I'll be looking very closely for any sign of memory leaks (both with
>> code inspection and testing while kemmleak is enabled).
>> 
>> But the more info you can provide on the workload the better.

Oleg> According to SAR there are no records about 20min before I reboot, so I 
Oleg> suspect SAR daemon failed a victim of OOM.

Maybe if you could take a snapshot of all the processes on the system
before you run the test, and then also run 'vmstat 1' to a log file
while running the test?

As a wierd thought... maybe it's because you have a 1gb meta data LV
that's causing problems?  Maybe you need to just accept the default
size?

It might also be instructive to make the cache be just half the SSD in
size and see if that helps.  It *might* be that as other people have
mentioned, that your SSD's performance drops off a cliff when it's
mostly full.  So reducing the cache size, even to only 80% of the size
of the disk, might give it enough spare empty blocks to stay
performant?

John

  reply	other threads:[~2017-10-23 20:54 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-19 17:54 [linux-lvm] cache on SSD makes system unresponsive Oleg Cherkasov
2017-10-19 18:13 ` Xen
2017-10-20 10:21   ` Oleg Cherkasov
2017-10-20 10:38     ` Xen
2017-10-20 11:41       ` Oleg Cherkasov
2017-10-19 18:49 ` Mike Snitzer
2017-10-20 11:07   ` Joe Thornber
2017-10-19 19:09 ` John Stoffel
2017-10-19 19:46   ` Xen
2017-10-19 21:14     ` John Stoffel
2017-10-20  6:42       ` Xen
2017-10-19 21:59   ` Oleg Cherkasov
2017-10-20 19:35     ` John Stoffel
2017-10-21  3:05       ` Mike Snitzer
2017-10-21 14:33       ` Oleg Cherkasov
2017-10-23 10:58         ` Zdenek Kabelac
2017-10-21  2:55     ` Mike Snitzer
2017-10-21 14:10       ` Oleg Cherkasov
2017-10-23 20:45         ` John Stoffel [this message]
2017-10-20 16:20 ` lejeczek
2017-10-20 16:48   ` Xen
2017-10-20 17:02     ` Bernd Eckenfels
2017-10-24 14:51 ` lejeczek
     [not found] <640472762.2746512.1508882485777.ref@mail.yahoo.com>
2017-10-24 22:01 ` matthew patton
2017-10-24 23:10   ` Chris Friesen
     [not found] <1928541660.2031191.1508802005006.ref@mail.yahoo.com>
2017-10-23 23:40 ` matthew patton
2017-10-24 15:36   ` Xen
     [not found] <1714773615.1945146.1508792555922.ref@mail.yahoo.com>
2017-10-23 21:02 ` matthew patton
2017-10-23 21:54   ` Xen
2017-10-24  2:51   ` John Stoffel
     [not found] <1540708205.1077645.1508602122091.ref@mail.yahoo.com>
2017-10-21 16:08 ` matthew patton
     [not found] <1244564108.1073508.1508601932111.ref@mail.yahoo.com>
2017-10-21 16:05 ` matthew patton
2017-10-24 18:09   ` Oleg Cherkasov
     [not found] <541215543.377417.1508458336923.ref@mail.yahoo.com>
2017-10-20  0:12 ` matthew patton
2017-10-20  6:46   ` Xen
2017-10-20  9:59     ` Oleg Cherkasov
  -- strict thread matches above, loose matches on Subject: below --
2017-10-19 10:05 Oleg Cherkasov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23022.21764.968838.764670@quad.stoffel.home \
    --to=john@stoffel.org \
    --cc=linux-lvm@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).