linux-lvm.redhat.com archive mirror
* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <1244564108.1073508.1508601932111.ref@mail.yahoo.com>
@ 2017-10-21 16:05 ` matthew patton
  2017-10-24 18:09   ` Oleg Cherkasov
  0 siblings, 1 reply; 37+ messages in thread
From: matthew patton @ 2017-10-21 16:05 UTC (permalink / raw)
  To: LVM general discussion and development

0) what is the full DD command you are issuing? (I think we have this)

1) does your DD command work when LVM is not using caching of any kind.

2) does your DD command work if using 'direct' mode

3) are you able to write smaller chunks from NON-cached LVM volume to SSD vdev? Is there an inflection point in size where it goes haywire?

4) what is your IO elevator/scheduler set to?

5) what is value of
vm.dirty_background_ratio
vm.dirty_ratio
vm.dirty_background_bytes
vm.dirty_bytes

What do you observe in /proc/vmstat during DD?

6) run DD via strace
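
By way of illustration, items (2), (5) and (6) plus the /proc/vmstat check
could look roughly like this (the file name is just the placeholder used
earlier in the thread; the block size and strace filter are arbitrary choices):

    # (2) re-run the read bypassing the page cache
    dd if=file_250G of=/dev/null bs=1M iflag=direct status=progress

    # (5) dump the writeback tunables in one go
    sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_background_bytes vm.dirty_bytes

    # watch dirty/writeback page counts while dd runs
    watch -n1 'grep -E "nr_dirty |nr_writeback " /proc/vmstat'

    # (6) trace only the interesting syscalls, keep the output off the slow array
    strace -f -tt -e trace=open,openat,read,close -o /tmp/dd.strace \
        dd if=file_250G of=/dev/null bs=1M status=progress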


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-21 16:05 ` [linux-lvm] cache on SSD makes system unresponsive matthew patton
@ 2017-10-24 18:09   ` Oleg Cherkasov
  0 siblings, 0 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-24 18:09 UTC (permalink / raw)
  To: matthew patton, LVM general discussion and development

Some of your questions are answered in the thread ...

On 21. okt. 2017 18:05, matthew patton wrote:
> 0) what is the full DD command you are issuing? (I think we have this)

dd if=file_250G of=/dev/null status=progress

> 
> 1) does your DD command work when LVM is not using caching of any kind.

Just dd had been running.

> 
> 2) does your DD command work if using 'direct' mode

nope

> 
> 3) are you able to write smaller chunks from NON-cached LVM volume to SSD vdev? Is there an inflection point in size where it goes haywire?

Tried with a smaller file; the system became unresponsive for a few minutes 
with LVM cache at 51%, however the system survived with no reboot.

> 
> 4) what is your IO elevator/scheduler set to?

deadline for all disks in LV
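
For reference, that can be checked and changed per device like this (sdb is
just one of the array devices, used as an example):

    cat /sys/block/sdb/queue/scheduler                 # active scheduler shown in [brackets]
    echo deadline > /sys/block/sdb/queue/scheduler     # needs root; not persistent across reboots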

> 
> 5) what is value of
> vm.dirty_background_ratio
> vm.dirty_ratio
> vm.dirty_background_bytes
> vm.dirty_bytes

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500

> 
> What do you observe in /proc/vmstat during DD?
> 
> 6) run DD via strace

Once again, the system was not responding to ICMP, so checking vmstat does 
not make any sense given the denial of service to ssh and the terminal.

strace?  What are you planning to see there?  open() and continuous read() 
system calls?


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-24 22:01 ` matthew patton
@ 2017-10-24 23:10   ` Chris Friesen
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Friesen @ 2017-10-24 23:10 UTC (permalink / raw)
  To: linux-lvm

On 10/24/2017 04:01 PM, matthew patton wrote:

> How in the hell is the LVM cache being used at all? It has no business
> caching ANYTHING on streaming reads. Hmm, it turns out dm-cache/lvmcache
> really is retarded. It copies data to cache on first read and furthermore
> doesn't appear to detect streaming reads which have no value for caching
> purposes.

Technically it's not entirely true to say that streaming reads have no value for 
caching purposes.  It's conceivable to have a workload where the same file gets 
read over and over, in which case it might be useful to have it cached on an SSD.

As I understand it dm-cache is using smq, which essentially uses an LRU 
algorithm.  So yes, it'll read the streaming data into the cache, but the 
read-once/written-never data should also be the most likely to be evicted from 
the cache.

For what it's worth, the Linux kernel also copies data to the page cache on 
reads, which is why they introduced posix_fadvise(POSIX_FADV_DONTNEED) to allow 
the application to indicate that it's done with the data and it can be dropped 
from the page cache.
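
(As an aside, GNU dd exposes that same hint through its 'nocache' flag, so a
streaming read can drop its own page-cache footprint; a sketch, reusing the
file name from earlier in the thread:)

    dd if=file_250G of=/dev/null bs=1M iflag=nocache status=progress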

Chris


* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <640472762.2746512.1508882485777.ref@mail.yahoo.com>
@ 2017-10-24 22:01 ` matthew patton
  2017-10-24 23:10   ` Chris Friesen
  0 siblings, 1 reply; 37+ messages in thread
From: matthew patton @ 2017-10-24 22:01 UTC (permalink / raw)
  To: Oleg Cherkasov; +Cc: linux-lvm

Oleg wrote:

>> 0) what is the full DD command you are issuing? (I think we have this)
> dd if=file_250G of=/dev/null status=progress

You do realize this is copying data to virtual memory (i.e. it's buffering data) when that's pointless for both benchmark and backup/restore purposes. It also generates VM pressure and swapping until the kernel is forced to discard pages or resort to OOM.
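
(For completeness: the usual blunt way to limit that buffering damage when
O_DIRECT isn't used is to cap dirty memory in absolute bytes; the numbers
below are only an illustration, not a recommendation from this thread:)

    sysctl -w vm.dirty_background_bytes=268435456    # start background writeback at 256MB
    sysctl -w vm.dirty_bytes=1073741824               # block writers once 1GB is dirty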
 
>> 1) does your DD command work when LVM is not using caching of any kind.
> Just dd had been running.

I mean you degraded your LVM device holding the 250GB to not have any caching at all (lvconvert --splitcache VG/CacheLV) and otherwise removed any and all associations with the SSD virtual device?
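
Concretely, something along these lines (LV names taken from later in this
thread; a sketch, not verified against your layout):

    lvconvert --splitcache primary_backup_vg/primary_backup_lv    # detach the cache pool, keep origin data
    lvs -a -o name,segtype,devices primary_backup_vg              # confirm nothing references /dev/sda5 any more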
 
 >> 2) does your DD command work if using 'direct' mode
 > nope

What command modifiers did you use precisely? And was this failure also observed with straight-up NON-cached LVM too?
 
>> 3) are you able to write smaller chunks from NON-cached LVM volume to SSD vdev?
>> Is there an inflection point in size where it goes haywire?
 
> Tried with a smaller file; the system became unresponsive for a few minutes 
> with LVM cache at 51%, however the system survived with no reboot.

What was the size of this file that succeeded, if poorly?

How in the hell is the LVM cache being used at all? It has no business caching ANYTHING on streaming reads. Hmm, it turns out dm-cache/lvmcache really is retarded. It copies data to cache on first read and furthermore doesn't appear to detect streaming reads which have no value for caching purposes.

Somebody thought they were doing the world a favor when they clearly had insufficient real-world experience. Worse, you can't even tune away the not necessarily helpful assumptions.
https://www.mjmwired.net/kernel/Documentation/device-mapper/cache-policies.txt

If you guys over at RedHat would oblige with a Nerf clue-bat to the persons involved, being able to forcibly override the cache/promotion settings would be a very nice thing to have back. For most situations it may not have any real value, but for this pathological workload, a sysadmin should be able to intervene.

Much of what is below is beside the point now that dm-cache is stuck in permanent 'dummy mode'. I maintain that using SSD caching for your application (backup server, all streaming read/write) is a total waste of time anyway. If you still persist in wanting a modicum of caching intelligence, use BCache, (BTier?) or LSI CacheCade.

--------------------
what is output of
    lvs -o+cache_policy,cache_settings VG/CacheLV

Please remove LVM caching capability from everywhere including the origin volume and test writing to raw SSD virtual disk. ie. /dev/sdxx whatever the Dell VD is as recognized by the SCSI layer. I suspect your SSD is crap and/or the Perc+SSD combo is crap. Please test them independently of any confounding influences of your LVM origin. Test the raw block device, not anything (filesystem or lvm) layered on top.
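
For example (destructive on the target, so only against an SSD VD that carries
no data; /dev/sdX is a placeholder):

    dd if=/dev/sdX of=/dev/null bs=1M count=10240 iflag=direct status=progress                 # raw read
    dd if=/dev/zero of=/dev/sdX bs=1M count=10240 oflag=direct conv=fsync status=progress      # raw write, DESTROYS DATA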

What brand/type SSDs are we talking about?

Unless the rules have changed, for a 250GB cache data LV you need a metadata LV of at least 250MB. Somewhere I think someone said you had a whole lot less? Or did you alloc 1GB to the metadata and I'm mis-remembering?

What size did you set your cache_blocks to? 256k?

What is the output of dmsetup on your LVM origin in cached mode?

What did you set read_promote_adjustment and write_promote_adjustment to?
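
For what it's worth, those last two could be gathered roughly like this; note
that read/write_promote_adjustment were tunables of the older 'mq' policy and
current smq kernels ignore them (device and LV names are placeholders reusing
the names from this thread):

    dmsetup status primary_backup_vg-primary_backup_lv    # cache target line: metadata/cache block usage, hits, misses, dirty count
    dmsetup table  primary_backup_vg-primary_backup_lv    # shows the policy and any per-policy arguments
    # mq-era tuning, accepted only where the mq policy is still available:
    lvchange --cachesettings 'read_promote_adjustment=4 write_promote_adjustment=8' primary_backup_vg/primary_backup_lv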


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-23 23:40 ` matthew patton
@ 2017-10-24 15:36   ` Xen
  0 siblings, 0 replies; 37+ messages in thread
From: Xen @ 2017-10-24 15:36 UTC (permalink / raw)
  To: linux-lvm

matthew patton wrote on 24-10-2017 1:40:
>> Because whatever purpose you are using it for, it shouldn't OOM the  
>> system.
> 
> I posted a 6-point query to the list 2 days ago asking what the
> various settings being used are (not LVM related) and also pointed out
> that not using O_DIRECT was necessarily going to try to stuff the file
> into the Linux VM system, which was bound to cause all kinds of grief.
> 
> Maybe I'm missing responses, but I haven't seen any answers to those
> questions, which have nothing to do with LVM. I would be very surprised
> if this has anything to do with LVM.

LVM is a system that is meant to run flawlessly without extra 
configuration.

You seem to be invested in not having problems solved.

I don't know.


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 17:54 Oleg Cherkasov
                   ` (3 preceding siblings ...)
  2017-10-20 16:20 ` lejeczek
@ 2017-10-24 14:51 ` lejeczek
  4 siblings, 0 replies; 37+ messages in thread
From: lejeczek @ 2017-10-24 14:51 UTC (permalink / raw)
  To: LVM general discussion and development



On 19/10/17 18:54, Oleg Cherkasov wrote:
> Hi,
>
> Recently I have decided to try out LVM cache feature on 
> one of our Dell NX3100 servers running CentOS 7.4.1708 
> with 110Tb disk array (hardware RAID5 with H710 and H830 
> Dell adapters).  Two SSD disks each 256Gb are in hardware 
> RAID1 using H710 adapter with primary and extended 
> partitions so I decided to make ~240Gb LVM cache to see if 
> system I/O may be improved.  The server is running Bareos 
> storage daemon and beside sshd and Dell OpenManage 
> monitoring does not have any other services. Unfortunately 
> testing went not as I expected nonetheless at the end 
> system is up and running with no data corrupted.
>
> Initially I have tried the default writethrough mode and 
> after running dd reading test with 250Gb file got system 
> unresponsive for roughly 15min with cache allocation 
> around 50%.  Writing to disks it seems speed up the system 
> however marginally, so around 10% on my tests and I did 
> manage to pull more than 32Tb via backup from different 
> hosts and once system became unresponsive to ssh and icmp 
> requests however for a very short time.
>
> I thought it may be something with the cache mode so I switched 
> to writeback via lvconvert and ran the dd reading test again 
> with 250Gb file however that time everything went 
> completely unexpected. System started to slow responding 
> for simple user interactions like list files and run top. 
> And then became completely unresponsive for about half an 
> hours.  Switching to main console via iLO I saw a lot of 
> OOM messages and kernel tried to survive therefore 
> randomly killed almost all processes.  Eventually I did 
> manage to reboot and immediately uncached the array.
>
> My question is about very strange behavior of LVM cache.  
> Well, I may expect no performance boost or even I/O 
> degradation, however I do not expect to run out of memory and 
> then have OOM kick in.  That server has only 12Gb RAM, however 
> it does run only sshd, the bareos SD daemon and the OpenManage 
> java based monitoring system, so no RAM problems were 
> noticed over the last few years running without LVM cache.
>
> Any ideas what may be wrong?  I have a second NX3200 server 
> with a similar hardware setup and it will be switched to 
> FreeBSD 11.1 with ZFS very soon, however I may try to 
> install CentOS 7.4 first and see if the problem may be 
> reproduced.
>
> LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.
>
>
> Thank you!
> Oleg

I realized that, the same day I replied, mailman disabled my 
subscription, so in case it did not get through, here it is again:

hi

Not much of an explanation nor insight as to what might be 
going wrong with your setup/system, but instead I will share 
my own conclusions/suggestions drawn from bits of my 
experience...

I would - if the bigger part of the storage subsystem resides in 
the hardware - stick to the hardware, use CacheCade, let the 
hardware do the lot.

On LVM - similarly, stick to LVM, let LVM manage the whole 
lot (you will lose ~50% of a single average core (Opteron 
6376) with raid5). Use the simplest HBAs (Dell have such), no 
raid, not even JBOD. If disks are in the same enclosure, or 
simply under the same HBA (even though it's just an HBA) - do 
*not* mix SATA & SAS (it may work, but better not, from my 
experience).

Last one, keep that freaking firmware updated, everywhere 
possible, disks too (my latest experience with Seagate 2TB 
SAS, over a hundred of those in two enclosures - I cannot 
update, the update does not work - Seagate's off-the-website tech 
support => useless = stay away from Seagate.)

I'll keep my fingers crossed for you - On luck - never too 
much of it.



* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-23 21:02 ` matthew patton
  2017-10-23 21:54   ` Xen
@ 2017-10-24  2:51   ` John Stoffel
  1 sibling, 0 replies; 37+ messages in thread
From: John Stoffel @ 2017-10-24  2:51 UTC (permalink / raw)
  To: matthew patton, LVM general discussion and development

>>>>> "matthew" == matthew patton <pattonme@yahoo.com> writes:

>> On Mon, 10/23/17, John Stoffel <john@stoffel.org> wrote:

matthew> SSD pathologies aside, why are we concerned about the cache
matthew> layer on a streaming read?

matthew> By definition the cache shouldn't be involved at all.

Because his system is going into OOM when doing this?  Yes, the cache
probably won't do anything for a streaming read, it needs to be
primed.  But when the system craps out... it cries out to be figured
out.


* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <1928541660.2031191.1508802005006.ref@mail.yahoo.com>
@ 2017-10-23 23:40 ` matthew patton
  2017-10-24 15:36   ` Xen
  0 siblings, 1 reply; 37+ messages in thread
From: matthew patton @ 2017-10-23 23:40 UTC (permalink / raw)
  To: LVM general discussion and development

> Because whatever purpose you are using it for, it shouldn't OOM the  system.

I posted a 6-point query to the list 2 days ago asking what the various settings being used are (not LVM related) and also pointed out that not using O_DIRECT was necessarily going to try to stuff the file into the Linux VM system, which was bound to cause all kinds of grief.

Maybe I'm missing responses, but I haven't seen any answers to those questions, which have nothing to do with LVM. I would be very surprised if this has anything to do with LVM.


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-23 21:02 ` matthew patton
@ 2017-10-23 21:54   ` Xen
  2017-10-24  2:51   ` John Stoffel
  1 sibling, 0 replies; 37+ messages in thread
From: Xen @ 2017-10-23 21:54 UTC (permalink / raw)
  To: linux-lvm

matthew patton wrote on 23-10-2017 21:02:
>> On Mon, 10/23/17, John Stoffel <john@stoffel.org> wrote:
> 
> SSD pathologies aside, why are we concerned about the cache layer on a
> streaming read?
> 
> By definition the cache shouldn't be involved at all.

Because whatever purpose you are using it for, it shouldn't OOM the 
system.


* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <1714773615.1945146.1508792555922.ref@mail.yahoo.com>
@ 2017-10-23 21:02 ` matthew patton
  2017-10-23 21:54   ` Xen
  2017-10-24  2:51   ` John Stoffel
  0 siblings, 2 replies; 37+ messages in thread
From: matthew patton @ 2017-10-23 21:02 UTC (permalink / raw)
  To: LVM general discussion and development

>On Mon, 10/23/17, John Stoffel <john@stoffel.org> wrote:

SSD pathologies aside, why are we concerned about the cache layer on a streaming read?

By definition the cache shouldn't be involved at all.


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-21 14:10       ` Oleg Cherkasov
@ 2017-10-23 20:45         ` John Stoffel
  0 siblings, 0 replies; 37+ messages in thread
From: John Stoffel @ 2017-10-23 20:45 UTC (permalink / raw)
  To: LVM general discussion and development

>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:

Oleg> On 21. okt. 2017 04:55, Mike Snitzer wrote:
>> On Thu, Oct 19 2017 at  5:59pm -0400,
>> Oleg Cherkasov <o1e9@member.fsf.org> wrote:
>> 
>>> On 19. okt. 2017 21:09, John Stoffel wrote:
>>>> 
>> 
>> So aside from the SAR output: you don't have any system logs?  Or a vmcore
>> of the system (assuming it crashed)? -- in it you could access the
>> kernel log (via the 'log' command in the crash utility).

Oleg> Unfortunately no logs.  I have tried to see if I may recover dmesg 
Oleg> however no luck.  All logs but the latest dmesg boot are zeroed.  Of 
Oleg> course there are messages, secure and others however I do not see any 
Oleg> valuable information there.

Oleg> System did not crash; OOM was going wild, however I did manage to 
Oleg> Ctrl-Alt-Del from the main console via iLO so eventually it rebooted 
Oleg> with a clean disk umount.

Bummer.  Maybe you can set up a syslog server to log verbose
kernel logs elsewhere, including the OOM messages?
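
For example, on RHEL/CentOS a one-line rsyslog drop-in forwards kernel
messages to another box (the host name below is hypothetical):

    # /etc/rsyslog.d/remote-kern.conf
    kern.*  @@loghost.example.com:514    # @@ = TCP, single @ = UDP
    # then: systemctl restart rsyslog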

>> 
>> More specifics on the workload would be useful.  Also, more details on
>> the LVM cache configuration (block size?  writethrough or writeback?
>> etc).

Oleg> No extra params but specifying mode writethrough initially.
Oleg> Hardware RAID1 on cache disk is 64k and on main array hardware
Oleg> RAID5 128k.

Oleg> I had followed the documentation on the RHEL doc site precisely: lvcreate, 
Oleg> lvconvert to update the type and then lvconvert to add the cache.

Oleg> I have decided to try writeback after and shifted cachemode to it with 
Oleg> lvcache.

>> I'll be looking very closely for any sign of memory leaks (both with
>> code inspection and testing while kmemleak is enabled).
>> 
>> But the more info you can provide on the workload the better.

Oleg> According to SAR there are no records for about 20min before the reboot, so I 
Oleg> suspect the SAR daemon fell victim to OOM.

Maybe if you could take a snapshot of all the processes on the system
before you run the test, and then also run 'vmstat 1' to a log file
while running the test?
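
Something as simple as this, started before the dd, keeps logging to local
disk even if ssh drops (until the OOM killer reaches it; paths are arbitrary):

    ps auxww > /root/ps-before.txt
    nohup vmstat -t 1 > /root/vmstat.log 2>&1 &
    nohup iostat -xmt 1 > /root/iostat.log 2>&1 &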

As a weird thought... maybe it's because you have a 1GB metadata LV
that's causing problems?  Maybe you need to just accept the default
size?

It might also be instructive to make the cache just half the SSD in
size and see if that helps.  It *might* be, as other people have
mentioned, that your SSD's performance drops off a cliff when it's
mostly full.  So reducing the cache size, even to only 80% of the size
of the disk, might give it enough spare empty blocks to stay
performant?
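
In lvcreate terms that could just mean sizing the cache-data LV below the
partition size and leaving the rest of /dev/sda5 unallocated (sizes here are
only illustrative, reusing the LV names from earlier in the thread):

    lvcreate -L 1G   -n primary_backup_lv_cache_meta primary_backup_vg /dev/sda5
    lvcreate -L 190G -n primary_backup_lv_cache      primary_backup_vg /dev/sda5   # ~80% of the 242G partition
    # the remaining extents on /dev/sda5 stay free, which may help the SSD's garbage collection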

John


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-21 14:33       ` Oleg Cherkasov
@ 2017-10-23 10:58         ` Zdenek Kabelac
  0 siblings, 0 replies; 37+ messages in thread
From: Zdenek Kabelac @ 2017-10-23 10:58 UTC (permalink / raw)
  To: LVM general discussion and development, Oleg Cherkasov, John Stoffel

On 21.10.2017 at 16:33, Oleg Cherkasov wrote:
> On 20. okt. 2017 21:35, John Stoffel wrote:
>>>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:
>>
>> Oleg> On 19. okt. 2017 21:09, John Stoffel wrote:
>>>>
>>
>> Oleg> RAM 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1, the
>> Oleg> rest are RAID5.
>>
>> Interesting, it's all hardware RAID devices from what I can see.
> 
> It is exactly what I wrote initially in my first message!
> 
>>
>> Can you show the *exact* commands you used to make the cache?  Are
>> you using lvcache, or bcache?  They're two totally different beasts.
>> I looked into bcache in the past, but since you can't remove it from
>> an LV, I decided not to use it.  I use lvcache like this:
> 
> I have used lvcache of course and here are commands from bash history:
> 
> lvcreate -L 1G -n primary_backup_lv_cache_meta primary_backup_vg /dev/sda5
> 
> ### Allocate ~247G in /dev/sda5, what was left of the VG
> lvcreate -l 100%FREE -n primary_backup_lv_cache primary_backup_vg /dev/sda5
> 
> lvconvert --type cache-pool --cachemode writethrough --poolmetadata 
> primary_backup_vg/primary_backup_lv_cache_meta 
> primary_backup_vg/primary_backup_lv_cache
> 
> lvconvert --type cache --cachepool primary_backup_vg/primary_backup_lv_cache 
> primary_backup_vg/primary_backup_lv
> 
> ### lvconvert failed because it required some extra extents in the VG so I had to 
> reduce the cache LV and try again:
> 
> lvreduce -L 200M primary_backup_vg/primary_backup_lv_cache
> 


Hi

Without plans to interrupt the thoughts on topic here - the explanation is 
very simple.

A cache pool is made from a 'data' & a 'metadata' LV - so both need some space.
In the case of a 'cache pool' it's a pretty good plan to have both devices on the 
fast spindle (SSD).

So can you please provide output of:

lvs -a -o+devices

so it can be easily validated that both the _cdata & _cmeta LVs are hosted by an 
SSD device (it's not shown anywhere in the thread - so just to be sure we have 
them on the right disks).

Regards

Zdenek


* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <1540708205.1077645.1508602122091.ref@mail.yahoo.com>
@ 2017-10-21 16:08 ` matthew patton
  0 siblings, 0 replies; 37+ messages in thread
From: matthew patton @ 2017-10-21 16:08 UTC (permalink / raw)
  To: Mike Snitzer, Oleg Cherkasov, LVM general discussion and development; +Cc: ejt

>  But if it's something that only exhibits in writeback mode rather than writethrough, then I'd guess it's to do with the
> list of writeback work that the policy builds.  So check

OP is being coy about the DD command but I saw it mentioned off-hand earlier.

dd if=/250GB_file/on_LVM/H800_vdev of=/dev/null

It's a pure, streaming read. LVM cache should be doing absolutely nothing.


 


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 19:35     ` John Stoffel
  2017-10-21  3:05       ` Mike Snitzer
@ 2017-10-21 14:33       ` Oleg Cherkasov
  2017-10-23 10:58         ` Zdenek Kabelac
  1 sibling, 1 reply; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-21 14:33 UTC (permalink / raw)
  To: LVM general discussion and development, John Stoffel

On 20. okt. 2017 21:35, John Stoffel wrote:
>>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:
> 
> Oleg> On 19. okt. 2017 21:09, John Stoffel wrote:
>>>
> 
> Oleg> RAM 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1, the
> Oleg> rest are RAID5.
> 
> Interesting, it's all hardware RAID devices from what I can see.

It is exactly what I wrote initially in my first message!

> 
> Can you show the *exact* commands you used to make the cache?  Are
> you using lvcache, or bcache?  they're two totally different beasts.
> I looked into bcache in the past, but since you can't remove it from
> an LV, I decided not to use it.  I use lvcache like this:

I have used lvcache of course and here are commands from bash history:

lvcreate -L 1G -n primary_backup_lv_cache_meta primary_backup_vg /dev/sda5

### Allocate ~247G in /dev/sda5, what was left of the VG
lvcreate -l 100%FREE -n primary_backup_lv_cache primary_backup_vg /dev/sda5

lvconvert --type cache-pool --cachemode writethrough --poolmetadata 
primary_backup_vg/primary_backup_lv_cache_meta 
primary_backup_vg/primary_backup_lv_cache

lvconvert --type cache --cachepool 
primary_backup_vg/primary_backup_lv_cache 
primary_backup_vg/primary_backup_lv

### lvconvert failed because it required some extra extents in the VG so I had 
to reduce the cache LV and try again:

lvreduce -L 200M primary_backup_vg/primary_backup_lv_cache

### so this time it worked ok:

lvconvert --type cache-pool --cachemode writethrough --poolmetadata 
primary_backup_vg/primary_backup_lv_cache_meta 
primary_backup_vg/primary_backup_lv_cache
lvconvert --type cache --cachepool 
primary_backup_vg/primary_backup_lv_cache 
primary_backup_vg/primary_backup_lv

### The exact output of `lvs -a -o +devices` is gone of course because I 
have since uncached; however, it looked as in the docs so it did not raise any 
suspicions for me.

> How was the performance before your caching tests?  Are you looking
> for better compression of your backups?  I've used bacula (which
> Bareos is based on) for years, but recently gave up because the
> restores sucked to do.  Sorry for the side note.  :-)

The performance was good, no complaints about the aging hardware; however, having a 
spare SSD disk I wanted to test if it would improve anything, and I did not 
expect that a trivial dd would put the whole system on its knees.

> Any messages from the console?

Unfortunately nothing in the logs.  As I wrote before, I saw a lot of OOM messages 
on a killing spree.

> Oleg> User stat:
> Oleg> 02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal
> Oleg>   %idle
> Oleg> 02:10:01 PM     all      0.22      0.00      0.08      0.05      0.00
> Oleg>   99.64
> Oleg> 02:20:35 PM     all      0.21      0.00      5.23     20.58      0.00
> Oleg>   73.98
> Oleg> 02:30:51 PM     all      0.23      0.00      0.43     31.06      0.00
> Oleg>   68.27
> Oleg> 02:40:02 PM     all      0.06      0.00      0.15     18.55      0.00
> Oleg>   81.24
> Oleg> Average:        all      0.19      0.00      1.54     17.67      0.00
> Oleg>   80.61
> 
> That looks ok to me... nothing obvious there at all.

Same is here ...

> Are you writing to a spool disk, before you then write the data into
> bacula's backup system?

Well, Bareos SD was down at that time for testing, so it was:

dd if=some_250G_file of=/dev/null status=progress

Basically the first command after allocating the LV cache.

> 
> I think you're running into a RedHat bug at this point.  I'd probably
> move to Debian and run my own kernel with the latest patches for MD, etc.

I would have to stay with CentOS, and moving to Debian would not necessarily 
solve the problem.

> 
> You might even be running into problems with your HW RAID controllers
> and how Linux talks to them.
> 
> Any chance you could post more details?

The HW RAID controllers are PERC H710 and H810.  Posting extremely verbose 
MegaCli output would not help, I guess.  Firmware is up to date according 
to the BIOS maintenance monitor.


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-21  2:55     ` Mike Snitzer
@ 2017-10-21 14:10       ` Oleg Cherkasov
  2017-10-23 20:45         ` John Stoffel
  0 siblings, 1 reply; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-21 14:10 UTC (permalink / raw)
  To: linux-lvm

On 21. okt. 2017 04:55, Mike Snitzer wrote:
> On Thu, Oct 19 2017 at  5:59pm -0400,
> Oleg Cherkasov <o1e9@member.fsf.org> wrote:
> 
>> On 19. okt. 2017 21:09, John Stoffel wrote:
>>>
> 
> So aside from the SAR output: you don't have any system logs?  Or a vmcore
> of the system (assuming it crashed)? -- in it you could access the
> kernel log (via the 'log' command in the crash utility).

Unfortunately no logs.  I have tried to see if I may recover dmesg 
however no luck.  All logs but the latest dmesg boot are zeroed.  Of 
course there are messages, secure and others however I do not see any 
valuable information there.

System did not crash; OOM was going wild, however I did manage to 
Ctrl-Alt-Del from the main console via iLO so eventually it rebooted 
with a clean disk umount.

> 
> More specifics on the workload would be useful.  Also, more details on
> the LVM cache configuration (block size?  writethrough or writeback?
> etc).

No extra params but specifying mode writethrough initially.  Hardware 
RAID1 on cache disk is 64k and on main array hardware RAID5 128k.

I had followed the documentation on the RHEL doc site precisely: lvcreate, 
lvconvert to update the type and then lvconvert to add the cache.

I have decided to try writeback after and shifted cachemode to it with 
lvcache.

> 
> I'll be looking very closely for any sign of memory leaks (both with
> code inspection and testing while kmemleak is enabled).
> 
> But the more info you can provide on the workload the better.

According to SAR there are no records for about 20min before the reboot, so I 
suspect the SAR daemon fell victim to OOM.


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 19:35     ` John Stoffel
@ 2017-10-21  3:05       ` Mike Snitzer
  2017-10-21 14:33       ` Oleg Cherkasov
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2017-10-21  3:05 UTC (permalink / raw)
  To: John Stoffel; +Cc: LVM general discussion and development

On Fri, Oct 20 2017 at  3:35pm -0400,
John Stoffel <john@stoffel.org> wrote:

> >>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:
> 
> I think you're running into a RedHat bug at this point.  I'd probably
> move to Debian and run my own kernel with the latest patches for MD, etc.

There is no reason to think this is a "RedHat bug".. verdict is very much
still out (but yes the kernel core is very different in RHEL7 than
upstream Linux.. though we have no details to suggest _where_ the issue
lies.. if it is a pathological dm-cache code issue then RHEL7.4 and
upstream Linux should both see the problem).

Moving distros is a waste of time given that RHEL7.4 and Centos7.4 have
the latest dm-cache code.  The issue is likely DM-cache specific (not
RHEL7.4 specific).

In general: RHEL7 or Centos7 will provide the best support of DM-cache.
All developers invested in DM-cache work for Red Hat.

Mike


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 21:59   ` Oleg Cherkasov
  2017-10-20 19:35     ` John Stoffel
@ 2017-10-21  2:55     ` Mike Snitzer
  2017-10-21 14:10       ` Oleg Cherkasov
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2017-10-21  2:55 UTC (permalink / raw)
  To: Oleg Cherkasov; +Cc: linux-lvm

On Thu, Oct 19 2017 at  5:59pm -0400,
Oleg Cherkasov <o1e9@member.fsf.org> wrote:

> On 19. okt. 2017 21:09, John Stoffel wrote:
> >
> > Oleg> Recently I have decided to try out LVM cache feature on one of
> > Oleg> our Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk
> > Oleg> array (hardware RAID5 with H710 and H830 Dell adapters).  Two
> > Oleg> SSD disks each 256Gb are in hardware RAID1 using H710 adapter
> > Oleg> with primary and extended partitions so I decided to make ~240Gb
> > Oleg> LVM cache to see if system I/O may be improved.  The server is
> > Oleg> running Bareos storage daemon and beside sshd and Dell
> > Oleg> OpenManage monitoring does not have any other services.
> > Oleg> Unfortunately testing went not as I expected nonetheless at the
> > Oleg> end system is up and running with no data corrupted.
> >
> > Can you give more details about the system.  Is this providing storage
> > services (NFS) or is it just a backup server?
> 
> It is just a backup server, Bareos Storage Daemon + Dell OpenManage
> for LSI RAID cards (Dell's H7XX and H8XX are LSI based).  That host
> deliberately does not share any files or resources for security
> reasons, so no NFS or SMB.
> 
> Server has 2x SSD drives by 256Gb each and 10x 3Tb drives.  In
> addition there are two MD1200 disk arrays attached with 12x 4Tb
> disks each.  All disks exposed to CentOS as Virtual so there are 4
> disks in total:
> 
> NAME                                      MAJ:MIN RM   SIZE RO TYPE
> sda                                         8:0    0 278.9G  0 disk
> ├─sda1                                      8:1    0   500M  0 part /boot
> ├─sda2                                      8:2    0  36.1G  0 part
> │ ├─centos-swap                           253:0    0  11.7G  0 lvm  [SWAP]
> │ └─centos-root                           253:1    0  24.4G  0 lvm
> ├─sda3                                      8:3    0     1K  0 part
> └─sda5                                      8:5    0 242.3G  0 part
> sdb                                         8:16   0    30T  0 disk
> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
> sdc                                         8:32   0    40T  0 disk
> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
> sdd                                         8:48   0    40T  0 disk
> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
> 
> RAM 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1,
> the rest are RAID5.
> 
> I did make a cache and cache_meta on /dev/sda5.  It used to be a
> partition for Bareos spool for quite some time and because after
> upgrading to 10GbBASE network I do not need that spooler any more so
> I decided to try LVM cache.
> 
> > How did you setup your LVM config and your cache config?  Did you
> > mirror the two SSDs using MD, then add the device into your VG and use
> > that to setup the lvcache?
> All configs are stock CentOS 7.4 at the moment (incrementally
> upgraded from 7.0 of course), so I did not customize or tried to
> make any optimization on config.
> > I ask because I'm running lvcache at home on my main file/kvm server
> > and I've never seen this problem.  But!  I suspect you're running a
> > much older kernel, lvm config, etc.  Please post the full details of
> > your system if you can.
> 3.10.0-693.2.2.el7.x86_64
> 
> CentOS 7.4, as been pointed by Xen, released about a month ago and I
> had updated about a week ago while doing planned maintenance on
> network so had a good excuse to reboot it.
> 
> > Oleg> Initially I have tried the default writethrough mode and after
> > Oleg> running dd reading test with 250Gb file got system unresponsive
> > Oleg> for roughly 15min with cache allocation around 50%.  Writing to
> > Oleg> disks it seems speed up the system however marginally, so around
> > Oleg> 10% on my tests and I did manage to pull more than 32Tb via
> > Oleg> backup from different hosts and once system became unresponsive
> > Oleg> to ssh and icmp requests however for a very short time.
> >
> > Can you run 'top' or 'vmstat -admt 10' on the console while you're
> > running your tests to see what the system does?  How does memory look
> > on this system when you're NOT runnig lvcache?
> 
> Well, it is a production system and I am not planning to cache it
> again for test however if any patches would be available then try to
> run a similar system test on spare box before converting it to
> FreeBSD with ZFS.
> 
> Nonetheless I tried to run top during the dd reading test, however
> within the first few minutes I did not notice any issues with RAM.
> The system was using less than 2GB of 12GB and the rest was wired
> (cache/buffers). After a few minutes the system became unresponsive, even
> dropping ICMP ping requests; the ssh session froze and then dropped
> after a timeout, so there was no way to check top measurements.
> 
> I have recovered some of SAR records and I may see the last 20
> minutes SAR did not manage to log anything from 2:40pm to 3:00pm
> before system got rebooted and back online at 3:10pm:
> 
> User stat:
> 02:00:01 PM     CPU     %user     %nice   %system   %iowait
> %steal  %idle
> 02:10:01 PM     all      0.22      0.00      0.08      0.05
> 0.00  99.64
> 02:20:35 PM     all      0.21      0.00      5.23     20.58
> 0.00  73.98
> 02:30:51 PM     all      0.23      0.00      0.43     31.06
> 0.00  68.27
> 02:40:02 PM     all      0.06      0.00      0.15     18.55
> 0.00  81.24
> Average:        all      0.19      0.00      1.54     17.67
> 0.00  80.61
> 
> I/O stat:
> 02:00:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
> 02:10:01 PM      5.27      3.19      2.08    109.29    195.38
> 02:20:35 PM   4404.80   3841.22    563.58 971542.00 140195.66
> 02:30:51 PM   1110.49    586.67    523.83 148206.31 131721.52
> 02:40:02 PM    510.72    211.29    299.43  51321.12  76246.81
> Average:      1566.86   1214.43    352.43 306453.67  88356.03
> 
> DMs:
> 02:00:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz
> avgqu-sz    await     svctm     %util
> Average:       dev8-0    370.04    853.43  88355.91    241.08
> 85.32   230.56      1.61     59.54
> Average:      dev8-16      0.02      0.14      0.02      8.18
> 0.00     3.71      3.71      0.01
> Average:      dev8-32   1196.77 305599.78      0.04    255.35
> 4.26     3.56      0.09     11.28
> Average:      dev8-48      0.02      0.35      0.06     18.72
> 0.00    17.77     17.77      0.04
> Average:     dev253-0    151.59    118.15   1094.56      8.00
> 13.60    89.71      2.07     31.36
> Average:     dev253-1     15.01    722.81     53.73     51.73
> 3.08   204.85     28.35     42.56
> Average:     dev253-2   1259.48 218411.68      0.07    173.41
> 0.21     0.16      0.08      9.98
> Average:     dev253-3    681.29      1.27  87189.52    127.98
> 163.02   239.29      0.84     57.12
> Average:     dev253-4      3.83     11.09     18.09      7.61
> 0.09    22.59     10.72      4.11
> Average:     dev253-5   1940.54 305599.86      0.07    157.48
> 8.47     4.36      0.06     11.24
> 
> dev253:2 is the cache or actually was ...
> 
> Queue stat:
> 02:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
> 02:10:01 PM         1       302      0.09      0.05      0.05         0
> 02:20:35 PM         0       568      6.87      9.72      5.28         3
> 02:30:51 PM         1       569      5.46      6.83      5.83         2
> 02:40:02 PM         0       568      0.18      2.41      4.26         1
> Average:            0       502      3.15      4.75      3.85         2
> 
> RAM stat:
> 02:00:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached
> kbcommit  %commit  kbactive   kbinact   kbdirty
> 02:10:01 PM    256304  11866580     97.89     66860   9181100
> 2709288    11.10   5603576   5066808        32
> 02:20:35 PM    185160  11937724     98.47     56712     39104
> 2725476    11.17    299256    292604        16
> 02:30:51 PM    175220  11947664     98.55     56712     29640
> 2730732    11.19    113912    113552        24
> 02:40:02 PM  11195028    927856      7.65     57504     62416
> 2696248    11.05    119488    164076        16
> Average:      2952928   9169956     75.64     59447   2328065
> 2715436    11.12   1534058   1409260        22
> 
> SWAP stat:
> 02:00:01 PM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
> 02:10:01 PM  12010984    277012      2.25     71828     25.93
> 02:20:35 PM  11048040   1239956     10.09     88696      7.15
> 02:30:51 PM  10723456   1564540     12.73     38272      2.45
> 02:40:02 PM  10716884   1571112     12.79     77928      4.96
> Average:     11124841   1163155      9.47     69181      5.95

So aside from the SAR output: you don't have any system logs?  Or a vmcore
of the system (assuming it crashed)? -- in it you could access the
kernel log (via the 'log' command in the crash utility).

More specifics on the workload would be useful.  Also, more details on
the LVM cache configuration (block size?  writethrough or writeback?
etc).

I'll be looking very closely for any sign of memory leaks (both with
code inspection and testing while kmemleak is enabled).

But the more info you can provide on the workload the better.

Thanks,
Mike

p.s. RHEL7.4 has all of upstream's dm-cache code.
p.p.s.:
I've implemented parallel submission of write IO for writethrough mode.
It needs further testing and review but so far it seems to be working;
yet to see a huge improvement in writethrough mode throughput but
overall IO latencies on writes may be improved (at least closer to that
of the slow device in the cache).  Haven't looked at latency yet (will
test further with fio on Monday).
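
(For anyone wanting to reproduce the comparison, a hedged example of such an
fio run against a file on the cached LV - the mount point and sizes are
placeholders, not taken from this system:)

    fio --name=seq-write --directory=/backup --size=10G --bs=1M \
        --rw=write --direct=1 --ioengine=libaio --iodepth=16 --group_reporting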


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 21:59   ` Oleg Cherkasov
@ 2017-10-20 19:35     ` John Stoffel
  2017-10-21  3:05       ` Mike Snitzer
  2017-10-21 14:33       ` Oleg Cherkasov
  2017-10-21  2:55     ` Mike Snitzer
  1 sibling, 2 replies; 37+ messages in thread
From: John Stoffel @ 2017-10-20 19:35 UTC (permalink / raw)
  To: LVM general discussion and development

>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes:

Oleg> On 19. okt. 2017 21:09, John Stoffel wrote:
>> 
Oleg> Recently I have decided to try out LVM cache feature on one of
Oleg> our Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk
Oleg> array (hardware RAID5 with H710 and H830 Dell adapters).  Two
Oleg> SSD disks each 256Gb are in hardware RAID1 using H710 adapter
Oleg> with primary and extended partitions so I decided to make ~240Gb
Oleg> LVM cache to see if system I/O may be improved.  The server is
Oleg> running Bareos storage daemon and beside sshd and Dell
Oleg> OpenManage monitoring does not have any other services.
Oleg> Unfortunately testing went not as I expected nonetheless at the
Oleg> end system is up and running with no data corrupted.
>> 
>> Can you give more details about the system.  Is this providing storage
>> services (NFS) or is it just a backup server?

Oleg> It is just a backup server, Bareos Storage Daemon + Dell
Oleg> OpenManage for LSI RAID cards (Dell's H7XX and H8XX are LSI
Oleg> based).  That host deliberately does not share any files or
Oleg> resources for security reasons, so no NFS or SMB.

Well... if it's a backup server, then I suspect that using caching
won't help much because you're mostly doing streaming writes, with
very few reads.  The Cache is designed to help the *read* case more.
And for a backup server, you're writing one or just a couple of
streams at once, which is a fairly ideal state for RAID5.

Oleg> Server has 2x SSD drives by 256Gb each and 10x 3Tb drives.  In
Oleg> addition there are two MD1200 disk arrays attached with 12x 4Tb
Oleg> disks each.  All disks exposed to CentOS as Virtual so there are
Oleg> 4 disks in total:

Oleg> NAME                                      MAJ:MIN RM   SIZE RO TYPE
Oleg> sda                                         8:0    0 278.9G  0 disk
Oleg> ├─sda1                                      8:1    0   500M  0 part /boot
Oleg> ├─sda2                                      8:2    0  36.1G  0 part
Oleg> │ ├─centos-swap                           253:0    0  11.7G  0 lvm  [SWAP]
Oleg> │ └─centos-root                           253:1    0  24.4G  0 lvm
Oleg> ├─sda3                                      8:3    0     1K  0 part
Oleg> └─sda5                                      8:5    0 242.3G  0 part
Oleg> sdb                                         8:16   0    30T  0 disk
Oleg> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
Oleg> sdc                                         8:32   0    40T  0 disk
Oleg> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
Oleg> sdd                                         8:48   0    40T  0 disk
Oleg> └─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm

Oleg> RAM 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1, the 
Oleg> rest are RAID5.

Interesting, it's all hardware RAID devices from what I can see.  

Oleg> I did make a cache and cache_meta on /dev/sda5.  It used to be a
Oleg> partition for Bareos spool for quite some time and because after
Oleg> upgrading to 10GbBASE network I do not need that spooler any
Oleg> more so I decided to try LVM cache.

Can you show the *exact* commands you used to make the cache?  Are
you using lvcache, or bcache?  they're two totally different beasts.
I looked into bcache in the past, but since you can't remove it from
an LV, I decided not to use it.  I use lvcache like this:

> sudo lvs data
  LV          VG   Attr       LSize   Pool           Origin        Data%  Meta%  Move Log Cpy%Sync Convert
      home        data Cwi-aoC--- 650.00g home_cache     [home_corig]
      home_cache  data Cwi---C--- 130.00g
      local       data Cwi-aoC--- 335.00g [localcacheLV] [local_corig]
	    
so I'm wondering exactly which caching setup you're using. 

>> How did you setup your LVM config and your cache config?  Did you
>> mirror the two SSDs using MD, then add the device into your VG and use
>> that to setup the lvcache?

Oleg> All configs are stock CentOS 7.4 at the moment (incrementally upgraded 
Oleg> from 7.0 of course), so I did not customize or try to make any 
Oleg> optimizations to the config.

Ok, good to know.

>> I ask because I'm running lvcache at home on my main file/kvm server
>> and I've never seen this problem.  But!  I suspect you're running a
>> much older kernel, lvm config, etc.  Please post the full details of
>> your system if you can.
Oleg> 3.10.0-693.2.2.el7.x86_64

Oleg> CentOS 7.4, as been pointed by Xen, released about a month ago
Oleg> and I had updated about a week ago while doing planned
Oleg> maintenance on network so had a good excuse to reboot it.

Oleg> Initially I have tried the default writethrough mode and after
Oleg> running dd reading test with 250Gb file got system unresponsive
Oleg> for roughly 15min with cache allocation around 50%.  Writing to
Oleg> disks it seems speed up the system however marginally, so around
Oleg> 10% on my tests and I did manage to pull more than 32Tb via
Oleg> backup from different hosts and once system became unresponsive
Oleg> to ssh and icmp requests however for a very short time.

This isn't good.  Can you post more details about your LV setup please?  

>> Can you run 'top' or 'vmstat -admt 10' on the console while you're
>> running your tests to see what the system does?  How does memory look
>> on this system when you're NOT runnig lvcache?

Oleg> Well, it is a production system and I am not planning to cache
Oleg> it again for a test; however, if any patches become available then I will
Oleg> try to run a similar system test on a spare box before converting
Oleg> it to FreeBSD with ZFS.

How was the performance before your caching tests?  Are you looking
for better compression of your backups?  I've used bacula (which
Bareos is based on) for years, but recently gave up because the
restores sucked to do.  Sorry for the side note.  :-)

Oleg> Nonetheless I tried to run top during the dd reading test,
Oleg> however within the first few minutes I did not notice any issues
Oleg> with RAM.  The system was using less than 2GB of 12GB and the rest
Oleg> was wired (cache/buffers).  After a few minutes the system became
Oleg> unresponsive, even dropping ICMP ping requests; the ssh session
Oleg> froze and then dropped after a timeout, so there was no way to
Oleg> check top measurements.

Any messages from the console?  

Oleg> I have recovered some of SAR records and I may see the last 20 minutes 
Oleg> SAR did not manage to log anything from 2:40pm to 3:00pm before system 
Oleg> got rebooted and back online at 3:10pm:

Oleg> User stat:
Oleg> 02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal 
Oleg>   %idle
Oleg> 02:10:01 PM     all      0.22      0.00      0.08      0.05      0.00 
Oleg>   99.64
Oleg> 02:20:35 PM     all      0.21      0.00      5.23     20.58      0.00 
Oleg>   73.98
Oleg> 02:30:51 PM     all      0.23      0.00      0.43     31.06      0.00 
Oleg>   68.27
Oleg> 02:40:02 PM     all      0.06      0.00      0.15     18.55      0.00 
Oleg>   81.24
Oleg> Average:        all      0.19      0.00      1.54     17.67      0.00 
Oleg>   80.61

That looks ok to me... nothing obvious there at all.

Oleg> I/O stat:
Oleg> 02:00:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
Oleg> 02:10:01 PM      5.27      3.19      2.08    109.29    195.38
Oleg> 02:20:35 PM   4404.80   3841.22    563.58 971542.00 140195.66
Oleg> 02:30:51 PM   1110.49    586.67    523.83 148206.31 131721.52
Oleg> 02:40:02 PM    510.72    211.29    299.43  51321.12  76246.81
Oleg> Average:      1566.86   1214.43    352.43 306453.67  88356.03


Are you writing to a spool disk, before you then write the data into
bacula's backup system?


Oleg> DMs:
Oleg> 02:00:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz 
Oleg>     await     svctm     %util
Oleg> Average:       dev8-0    370.04    853.43  88355.91    241.08     85.32 
Oleg>    230.56      1.61     59.54
Oleg> Average:      dev8-16      0.02      0.14      0.02      8.18      0.00 
Oleg>      3.71      3.71      0.01
Oleg> Average:      dev8-32   1196.77 305599.78      0.04    255.35      4.26 
Oleg>      3.56      0.09     11.28
Oleg> Average:      dev8-48      0.02      0.35      0.06     18.72      0.00 
Oleg>     17.77     17.77      0.04
Oleg> Average:     dev253-0    151.59    118.15   1094.56      8.00     13.60 
Oleg>     89.71      2.07     31.36
Oleg> Average:     dev253-1     15.01    722.81     53.73     51.73      3.08 
Oleg>    204.85     28.35     42.56
Oleg> Average:     dev253-2   1259.48 218411.68      0.07    173.41      0.21 
Oleg>      0.16      0.08      9.98
Oleg> Average:     dev253-3    681.29      1.27  87189.52    127.98    163.02 
Oleg>    239.29      0.84     57.12
Oleg> Average:     dev253-4      3.83     11.09     18.09      7.61      0.09 
Oleg>     22.59     10.72      4.11
Oleg> Average:     dev253-5   1940.54 305599.86      0.07    157.48      8.47 
Oleg>      4.36      0.06     11.24


That's really bursty traffic... 


Oleg> dev253:2 is the cache or actually was ...

Oleg> Queue stat:
Oleg> 02:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
Oleg> 02:10:01 PM         1       302      0.09      0.05      0.05         0
Oleg> 02:20:35 PM         0       568      6.87      9.72      5.28         3
Oleg> 02:30:51 PM         1       569      5.46      6.83      5.83         2
Oleg> 02:40:02 PM         0       568      0.18      2.41      4.26         1
Oleg> Average:            0       502      3.15      4.75      3.85         2

Oleg> RAM stat:
Oleg> 02:00:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit 
Oleg>   %commit  kbactive   kbinact   kbdirty
Oleg> 02:10:01 PM    256304  11866580     97.89     66860   9181100   2709288 
Oleg>     11.10   5603576   5066808        32
Oleg> 02:20:35 PM    185160  11937724     98.47     56712     39104   2725476 
Oleg>     11.17    299256    292604        16
Oleg> 02:30:51 PM    175220  11947664     98.55     56712     29640   2730732 
Oleg>     11.19    113912    113552        24
Oleg> 02:40:02 PM  11195028    927856      7.65     57504     62416   2696248 
Oleg>     11.05    119488    164076        16
Oleg> Average:      2952928   9169956     75.64     59447   2328065   2715436 
Oleg>     11.12   1534058   1409260        22

Oleg> SWAP stat:
Oleg> 02:00:01 PM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
Oleg> 02:10:01 PM  12010984    277012      2.25     71828     25.93
Oleg> 02:20:35 PM  11048040   1239956     10.09     88696      7.15
Oleg> 02:30:51 PM  10723456   1564540     12.73     38272      2.45
Oleg> 02:40:02 PM  10716884   1571112     12.79     77928      4.96
Oleg> Average:     11124841   1163155      9.47     69181      5.95

I think you're running into a RedHat bug at this point.  I'd probably
move to Debian and run my own kernel with the latest patches for MD, etc.

You might even be running into problems with your HW RAID controllers
and how Linux talks to them.

Any chance you could post more details?

Good luck!
John


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 16:48   ` Xen
@ 2017-10-20 17:02     ` Bernd Eckenfels
  0 siblings, 0 replies; 37+ messages in thread
From: Bernd Eckenfels @ 2017-10-20 17:02 UTC (permalink / raw)
  To: linux-lvm

Not sure if this is on-topic, but there is a reason for software solutions. The days of super-proprietary and faulty hardware are over. You put software like LVM on top of mass-market stupid hardware exactly to reduce complexity.

Gruss
Bernd
--
http://bernd.eckenfels.net
________________________________
From: linux-lvm-bounces@redhat.com <linux-lvm-bounces@redhat.com> on behalf of Xen <list@xenhideout.nl>
Sent: Friday, October 20, 2017 6:48:31 PM
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] cache on SSD makes system unresponsive

lejeczek wrote on 20-10-2017 16:20:

> I would - if bigger part of a storage subsystem resides in the
> hardware - stick to the hardware, use CacheCade, let the hardware do
> the lot.

In other words -- keep it simple (smart person) ;-).

Complicatedness is really the biggest reason for failure everywhere....



* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 16:20 ` lejeczek
@ 2017-10-20 16:48   ` Xen
  2017-10-20 17:02     ` Bernd Eckenfels
  0 siblings, 1 reply; 37+ messages in thread
From: Xen @ 2017-10-20 16:48 UTC (permalink / raw)
  To: linux-lvm

lejeczek wrote on 20-10-2017 16:20:

> I would - if bigger part of a storage subsystem resides in the
> hardware - stick to the hardware, use CacheCade, let the hardware do
> the lot.

In other words -- keep it simple (smart person) ;-).

Complicatedness is really the biggest reason for failure everywhere....


* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 17:54 Oleg Cherkasov
                   ` (2 preceding siblings ...)
  2017-10-19 19:09 ` John Stoffel
@ 2017-10-20 16:20 ` lejeczek
  2017-10-20 16:48   ` Xen
  2017-10-24 14:51 ` lejeczek
  4 siblings, 1 reply; 37+ messages in thread
From: lejeczek @ 2017-10-20 16:20 UTC (permalink / raw)
  To: LVM general discussion and development



On 19/10/17 18:54, Oleg Cherkasov wrote:
> Hi,
>
> Recently I have decided to try out LVM cache feature on 
> one of our Dell NX3100 servers running CentOS 7.4.1708 
> with 110Tb disk array (hardware RAID5 with H710 and H830 
> Dell adapters).  Two SSD disks each 256Gb are in hardware 
> RAID1 using H710 adapter with primary and extended 
> partitions so I decided to make ~240Gb LVM cache to see if 
> system I/O may be improved.  The server is running Bareos 
> storage daemon and beside sshd and Dell OpenManage 
> monitoring does not have any other services. Unfortunately 
> testing went not as I expected nonetheless at the end 
> system is up and running with no data corrupted.
>
> Initially I have tried the default writethrough mode and 
> after running dd reading test with 250Gb file got system 
> unresponsive for roughly 15min with cache allocation 
> around 50%.  Writing to disks it seems speed up the system 
> however marginally, so around 10% on my tests and I did 
> manage to pull more than 32Tb via backup from different 
> hosts and once system became unresponsive to ssh and icmp 
> requests however for a very short time.
>
> I thought it may be something with the cache mode so I switched 
> to writeback via lvconvert and ran the dd reading test again 
> with 250Gb file however that time everything went 
> completely unexpected. System started to slow responding 
> for simple user interactions like list files and run top. 
> And then became completely unresponsive for about half an 
> hours.  Switching to main console via iLO I saw a lot of 
> OOM messages and kernel tried to survive therefore 
> randomly killed almost all processes.  Eventually I did 
> manage to reboot and immediately uncached the array.
>
> My question is about very strange behavior of LVM cache.  
> Well, I may expect no performance boost or even I/O 
> degradation, however I do not expect to run out of memory and 
> then have OOM kick in.  That server has only 12Gb RAM, however 
> it does run only sshd, the bareos SD daemon and the OpenManage 
> java based monitoring system, so no RAM problems were 
> noticed over the last few years running without LVM cache.
>
> Any ideas what may be wrong?  I have a second NX3200 server 
> with a similar hardware setup and it will be switched to 
> FreeBSD 11.1 with ZFS very soon, however I may try to 
> install CentOS 7.4 first and see if the problem may be 
> reproduced.
>
> LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.
>
>
> Thank you!
> Oleg
>
hi

Not much of an explanation nor insight as to what might be 
going wrong with your setup/system, but instead I will share 
my own conclusions/suggestions, drawn from bits of my 
experience...

I would - if the bigger part of the storage subsystem resides 
in the hardware - stick to the hardware: use CacheCade and let 
the hardware do the lot.

Or, if on LVM - similarly, stick to LVM and let LVM manage the 
whole lot (you will lose ~50% of a single average core (Opteron 
6376) with RAID5). Use the simplest HBAs (Dell has such), no 
RAID, not even JBOD. If the disks are in the same enclosure, or 
simply under the same HBA (even though it's just an HBA) - do 
*not* mix SATA & SAS (it may work, but better not, from my 
experience).
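
For illustration only, a rough sketch of what "let LVM manage the whole 
lot" could look like with plain HBA-attached disks; the device names, 
VG/LV names and sizes below are made up rather than taken from the 
setup discussed here:

  # label the raw disks as PVs and group them into one VG
  pvcreate /dev/sd[b-k]
  vgcreate backup_vg /dev/sd[b-k]

  # create an LVM-managed RAID5 LV across the disks
  # (-i is the number of data stripes; LVM adds the parity device on top)
  lvcreate --type raid5 -i 4 -L 10T -n backup_lv backup_vg

  # watch the initial sync and general health
  lvs -a -o name,segtype,sync_percent,devices backup_vg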

Last one: keep that freaking firmware updated, everywhere 
possible, disks too (my latest experience with Seagate 2TB 
SAS, over a hundred of those in two enclosures - I cannot, 
the update does not work - Seagate's off-the-website tech 
support => useless => stay away from Seagate).

I'll keep my fingers crossed for you - on luck: you can never 
have too much of it.

> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 10:38     ` Xen
@ 2017-10-20 11:41       ` Oleg Cherkasov
  0 siblings, 0 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-20 11:41 UTC (permalink / raw)
  To: linux-lvm

On 20. okt. 2017 12:38, Xen wrote:
> Oleg Cherkasov schreef op 20-10-2017 10:21:
>> On 19. okt. 2017 20:13, Xen wrote:
>>
>> Could it be TRIMing issue because those are from 2012?
> 
> You mean that the SATA version is too low to interleave TRIMs with data 
> access?

I think SSDs from different vendors handle trimming differently 
regardless of SAS or SATA.  It is just my hypothesis that trimming is a 
cause of the problem, of course.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 18:49 ` Mike Snitzer
@ 2017-10-20 11:07   ` Joe Thornber
  0 siblings, 0 replies; 37+ messages in thread
From: Joe Thornber @ 2017-10-20 11:07 UTC (permalink / raw)
  To: Mike Snitzer, Oleg Cherkasov; +Cc: ejt, linux-lvm


I can't look at this until Sunday.  But if it's something that only
exhibits in writeback mode rather than writethrough, then I'd guess it's to
do with the list of writeback work that the policy builds.  So check
whether the list is growing endlessly, and check the work object is being
freed once the copy has completed.
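
For anyone wanting to watch that from userspace while reproducing the 
problem, a couple of read-only checks may help; this is only a sketch, 
assuming the cached LV is called vg/cached_lv (so the dm device name is 
vg-cached_lv):

  # the dm-cache status line includes used/total cache blocks and the dirty count
  watch -n 5 'dmsetup status vg-cached_lv'

  # the same counters through LVM's reporting fields
  lvs -o name,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses vg/cached_lv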

On Thu, 19 Oct 2017 at 19:49 Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Oct 19 2017 at  1:54pm -0400,
> Oleg Cherkasov <o1e9@member.fsf.org> wrote:
>
> > Hi,
> >
> > Recently I have decided to try out LVM cache feature on one of our
> > Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk array
> > (hardware RAID5 with H710 and H830 Dell adapters).  Two SSD disks
> > each 256Gb are in hardware RAID1 using H710 adapter with primary and
> > extended partitions so I decided to make ~240Gb LVM cache to see if
> > system I/O may be improved.  The server is running Bareos storage
> > daemon and beside sshd and Dell OpenManage monitoring does not have
> > any other services. Unfortunately testing went not as I expected
> > nonetheless at the end system is up and running with no data
> > corrupted.
> >
> > Initially I have tried the default writethrough mode and after
> > running dd reading test with 250Gb file got system unresponsive for
> > roughly 15min with cache allocation around 50%.  Writing to disks it
> > seems speed up the system however marginally, so around 10% on my
> > tests and I did manage to pull more than 32Tb via backup from
> > different hosts and once system became unresponsive to ssh and icmp
> > requests however for a very short time.
> >
> > I though it may be something with cache mode so switched to
> > writeback via lvconvert and run dd reading test again with 250Gb
> > file however that time everything went completely unexpected.
> > System started to slow responding for simple user interactions like
> > list files and run top. And then became completely unresponsive for
> > about half an hours.  Switching to main console via iLO I saw a lot
> > of OOM messages and kernel tried to survive therefore randomly
> > killed almost all processes.  Eventually I did manage to reboot and
> > immediately uncached the array.
> >
> > My question is about very strange behavior of LVM cache.  Well, I
> > may expect no performance boost or even I/O degradation however I do
> > not expect run out of memory and than OOM kicks in.  That server has
> > only 12Gb RAM however it does run only sshd, bareos SD daemon and
> > OpenManange java based monitoring system so no RAM problems were
> > notices for last few years running with our LVM cache.
> >
> > Any ideas what may be wrong?  I have second NX3200 server with
> > similar hardware setup and it would be switch to FreeBSD 11.1 with
> > ZFS very time soon however I may try to install CentOS 7.4 first and
> > see if the problem may be reproduced.
> >
> > LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.
>
> Your experience is _not_ unique.  It is unfortunate but there would seem
> to be some systemic issues with dm-cache being too resoruce heavy.  Not
> aware of any particular issue(s) yet.
>
> I'm focusing on this now since we've had some internal reports that
> writeback is quite slow (and that tests don't complete).  That IO
> latencies are high.  Etc.
>
> I'll work through it and likely enlist Joe Thornber's help next week.
>
> I'll keep you posted as progress is made though.
>
> Thanks,
> Mike
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20 10:21   ` Oleg Cherkasov
@ 2017-10-20 10:38     ` Xen
  2017-10-20 11:41       ` Oleg Cherkasov
  0 siblings, 1 reply; 37+ messages in thread
From: Xen @ 2017-10-20 10:38 UTC (permalink / raw)
  To: linux-lvm

Oleg Cherkasov wrote on 20-10-2017 10:21:
> On 19. okt. 2017 20:13, Xen wrote:
>> 
>> The main cause was a way too slow SSD but at the same time... that 
>> sorta thing still shouldn't happen, locking up the entire system.
>> 
>> I haven't had a chance to try again with a faster SSD.
> 
> I have double checked with MegaRAID/CLI and all disks on that rig
> (including SSD ones of course) are SAS 6Gb/s both devices and links.
> My first thought about those SSDs was that those are slower than RAID5
> however it seems not the case.
> 
> Could it be TRIMing issue because those are from 2012?

You mean that the SATA version is too low to interleave TRIMs with data 
access?

Because I think that was the case with my mSATA SSD.

I don't currently remember which SATA version allowed interleaving, 
but that SSD didn't have it.

After trimming, performance would go up greatly.

So I don't know about SAS, but it might be similar, right?
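
A quick way to check whether the kernel thinks a device supports 
discard at all, and to trim manually, purely as an illustration (behind 
a hardware RAID virtual disk the discard columns usually come back as 
0, i.e. no TRIM passthrough):

  # non-zero DISC-GRAN / DISC-MAX means the device advertises discard support
  lsblk --discard /dev/sda

  # one-off trim of a mounted filesystem (only useful if discard is supported)
  fstrim -v /mountpoint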

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 18:13 ` Xen
@ 2017-10-20 10:21   ` Oleg Cherkasov
  2017-10-20 10:38     ` Xen
  0 siblings, 1 reply; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-20 10:21 UTC (permalink / raw)
  To: linux-lvm

On 19. okt. 2017 20:13, Xen wrote:
> 
> The main cause was a way too slow SSD but at the same time... that sorta 
> thing still shouldn't happen, locking up the entire system.
> 
> I haven't had a chance to try again with a faster SSD.

I have double checked with MegaRAID/CLI and all disks on that rig 
(including the SSDs of course) are SAS 6Gb/s, both devices and links.  My 
first thought about those SSDs was that they are slower than the RAID5, 
however it seems that is not the case.

Could it be a TRIM issue, because those drives are from 2012?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20  6:46   ` Xen
@ 2017-10-20  9:59     ` Oleg Cherkasov
  0 siblings, 0 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-20  9:59 UTC (permalink / raw)
  To: linux-lvm

On 20. okt. 2017 08:46, Xen wrote:
> matthew patton schreef op 20-10-2017 2:12:
>>> It is just a backup server,
>>
>> Then caching is pointless.
> 
> That's irrelevant and not up to another person to decide.
> 
>> Furthermore any half-wit caching solution
>> can detect streaming read/write and will deliberately bypass the
>> cache.
> 
> The problem was not performance, it was stability.
> 
>> Furthermore DD has never been a useful benchmark for anything.
>> And if you're not using 'odirect' it's even more pointless.
> 
> Performance was not the issue, stability was.
> 
>>> Server has 2x SSD drives by 256Gb each
>>
>> and for purposes of 'cache' should be individual VD and not waste
>> capacity on RAID1.
> 
> Is probably also going to be quite irrelevant to the problem at hand.
> 
>>> 10x 3Tb drives.  In addition  there are two
>>> MD1200 disk arrays attached with 12x 4Tb disks each.  All
>>
>> Raid5 for this size footprint is NUTs. Raid6 is the bare minimum.
> 
> That's also irrelevant to the problem at hand.

Hi Matthew,

I mostly agree with Xen about stability vs usability issues.  I have a 
stable system and an available SSD partition with 240Gb unused, so I 
decided to run tests with LVM caching using different cache modes.  The 
_test_ results are in my posts, so LVM caching does have stability 
issues, regardless of how I set it up.

I do agree I would need to make a separate hardware virtual disk for 
the cache and most likely not mirror it.  However, the performance of 
the system is defined by its weakest point, so it may indeed be the 
slow SSD of course.  I might expect performance degradation because of 
that, but not a whole-system lockdown, denial of all services and a 
forced reboot.

Your assumptions about streaming operations on _just a backup server_ 
are not quite right.  The Bareos Director configuration running on a 
separate server pushes that Storage Daemon to run multiple backups in 
parallel, and sometimes restores at the same time.  Therefore, even 
though there are just a few streams going in and out, the RAID is 
really doing random read and write operations.
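
If a closer-to-reality test is ever wanted, something like fio can 
mimic a few concurrent streams far better than dd; a purely 
hypothetical invocation (paths, sizes and job counts are made up):

  # four concurrent jobs doing a 70/30 read/write mix with direct I/O for five minutes
  fio --name=streams --directory=/backup/spool --size=20G \
      --rw=rw --rwmixread=70 --bs=1M --numjobs=4 \
      --ioengine=libaio --iodepth=8 --direct=1 \
      --runtime=300 --time_based --group_reporting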

DD is definitely not a good way to test any caching system, I do 
agree, however it is the first thing to try to see any good/bad/ugly 
results before running other tests like bonnie++.  In my case, the 
very next command after 'lvconvert' to cache and 'pvs' to check the 
status was 'dd if=some_250G_file of=/dev/null bs=8M status=progress', 
and that was the moment everything went completely wrong, ending in an 
unplanned reboot.

About RAID5 vs RAID6, well, as I mentioned in a separate message, the 
logical volume is built from 3 hardware RAID5 virtual disks, so it is 
not 30+ disks in one RAID5 or something.  Besides, that server is a 
front-end to an LTO-6 library, so even if the unexpected happens it 
would take 3-4 days to repopulate it from client hosts anyway.  And I 
have a few disks in stock, so replacing a disk and rebuilding the RAID5 
takes no more than 12 hours.  RAID5 vs RAID6 is a matter of operational 
efficiency: watch the system logs with Graylog2 and Dell 
OpenManage/MegaRAID, have a spare disk, and do everything on time.


Cheers,
Oleg

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-20  0:12 ` matthew patton
@ 2017-10-20  6:46   ` Xen
  2017-10-20  9:59     ` Oleg Cherkasov
  0 siblings, 1 reply; 37+ messages in thread
From: Xen @ 2017-10-20  6:46 UTC (permalink / raw)
  To: linux-lvm

matthew patton wrote on 20-10-2017 2:12:
>> It is just a backup server,
> 
> Then caching is pointless.

That's irrelevant and not up to another person to decide.

> Furthermore any half-wit caching solution
> can detect streaming read/write and will deliberately bypass the
> cache.

The problem was not performance, it was stability.

> Furthermore DD has never been a useful benchmark for anything.
> And if you're not using 'odirect' it's even more pointless.

Performance was not the issue, stability was.

>> Server has 2x SSD drives by 256Gb each
> 
> and for purposes of 'cache' should be individual VD and not waste
> capacity on RAID1.

That is probably also going to be quite irrelevant to the problem at hand.

>> 10x 3Tb drives.  In addition  there are two
>> MD1200 disk arrays attached with 12x 4Tb disks each.  All
> 
> Raid5 for this size footprint is NUTs. Raid6 is the bare minimum.

That's also irrelevant to the problem at hand.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 21:14     ` John Stoffel
@ 2017-10-20  6:42       ` Xen
  0 siblings, 0 replies; 37+ messages in thread
From: Xen @ 2017-10-20  6:42 UTC (permalink / raw)
  To: linux-lvm

John Stoffel wrote on 19-10-2017 23:14:

> And RHEL7.4/CentOS 7 is all based on kernel 3.14 (I think) with lots
> of RedHat specific backports.  So knowing the full details will only
> help us provide help to him.

Alright, I missed that, sorry.

Still, given that a Red Hat developer has stated awareness of the 
problem, it isn't likely that individual config (other than the 
kernel) is going to play a big role.

Also, it is likely that anyone in a position to really help would 
already recognise the problem.

I just mean to say that it is going to need a developer, and it is not 
very likely that individual config is at fault.

Although a different kernel would see different behaviour - you're right 
about that, my apologies.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
       [not found] <541215543.377417.1508458336923.ref@mail.yahoo.com>
@ 2017-10-20  0:12 ` matthew patton
  2017-10-20  6:46   ` Xen
  0 siblings, 1 reply; 37+ messages in thread
From: matthew patton @ 2017-10-20  0:12 UTC (permalink / raw)
  To: LVM general discussion and development

> It is just a backup server,

Then caching is pointless. Furthermore, any half-wit caching solution can detect streaming reads/writes and will deliberately bypass the cache. Furthermore, dd has never been a useful benchmark for anything. And if you're not using O_DIRECT it's even more pointless.
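
For what it's worth, if dd is used anyway, direct I/O keeps the page 
cache and the dirty-page machinery out of the picture; roughly (file 
names and sizes are placeholders):

  # read test that bypasses the page cache
  dd if=some_250G_file of=/dev/null bs=8M iflag=direct status=progress

  # write test, also bypassing the page cache (~250GB)
  dd if=/dev/zero of=/backup/ddtest bs=8M count=32000 oflag=direct status=progress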

> Server has 2x SSD drives by 256Gb each

and for purposes of 'cache' each should be an individual VD and not waste capacity on RAID1. Your controller's battery-backed RAM is there for write-back purposes if you want to play that game. Cache is disposable: you can yank the power cord out of the drive and the software will continue. Now if you were tiering, that's a different topic, and it depends on the implementation whether or not you can lose a device. The good ones make sure the SSD can disappear and nothing bad happens.

> 10x 3Tb drives.  In addition  there are two
> MD1200 disk arrays attached with 12x 4Tb disks each.  All

RAID5 for this size footprint is nuts. RAID6 is the bare minimum.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 19:09 ` John Stoffel
  2017-10-19 19:46   ` Xen
@ 2017-10-19 21:59   ` Oleg Cherkasov
  2017-10-20 19:35     ` John Stoffel
  2017-10-21  2:55     ` Mike Snitzer
  1 sibling, 2 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-19 21:59 UTC (permalink / raw)
  To: linux-lvm

On 19. okt. 2017 21:09, John Stoffel wrote:
 >
 > Oleg> Recently I have decided to try out LVM cache feature on one of
 > Oleg> our Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk
 > Oleg> array (hardware RAID5 with H710 and H830 Dell adapters).  Two
 > Oleg> SSD disks each 256Gb are in hardware RAID1 using H710 adapter
 > Oleg> with primary and extended partitions so I decided to make ~240Gb
 > Oleg> LVM cache to see if system I/O may be improved.  The server is
 > Oleg> running Bareos storage daemon and beside sshd and Dell
 > Oleg> OpenManage monitoring does not have any other services.
 > Oleg> Unfortunately testing went not as I expected nonetheless at the
 > Oleg> end system is up and running with no data corrupted.
 >
 > Can you give more details about the system.  Is this providing storage
 > services (NFS) or is it just a backup server?

It is just a backup server: Bareos Storage Daemon + Dell OpenManage for 
the LSI RAID cards (Dell's H7XX and H8XX are LSI based).  That host 
deliberately does not share any files or resources for security reasons, 
so no NFS or SMB.

The server has 2x 256Gb SSD drives and 10x 3Tb drives.  In addition 
there are two MD1200 disk arrays attached with 12x 4Tb disks each.  All 
disks are exposed to CentOS as virtual disks, so there are 4 disks in total:

NAME                                      MAJ:MIN RM   SIZE RO TYPE
sda                                         8:0    0 278.9G  0 disk
├─sda1                                      8:1    0   500M  0 part /boot
├─sda2                                      8:2    0  36.1G  0 part
│ ├─centos-swap                           253:0    0  11.7G  0 lvm  [SWAP]
│ └─centos-root                           253:1    0  24.4G  0 lvm
├─sda3                                      8:3    0     1K  0 part
└─sda5                                      8:5    0 242.3G  0 part
sdb                                         8:16   0    30T  0 disk
└─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
sdc                                         8:32   0    40T  0 disk
└─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm
sdd                                         8:48   0    40T  0 disk
└─primary_backup_vg-primary_backup_lv     253:5    0 110.1T  0 lvm

RAM is 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1, the 
rest are RAID5.

I made the cache and cache_meta on /dev/sda5.  It had been a partition 
for the Bareos spool for quite some time, and because after upgrading 
to a 10Gb network I do not need that spooler any more, I decided to try 
LVM cache.
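
For reference, the conversion probably looked roughly like the 
following; this is only a sketch, the pool sizes are guesses and only 
the VG/LV names are taken from the lsblk output above:

  # add the SSD partition to the existing VG
  pvcreate /dev/sda5
  vgextend primary_backup_vg /dev/sda5

  # create cache data and metadata LVs on the SSD and combine them into a pool
  lvcreate -L 239G -n cache_data primary_backup_vg /dev/sda5
  lvcreate -L 1G   -n cache_meta primary_backup_vg /dev/sda5
  lvconvert --type cache-pool --poolmetadata primary_backup_vg/cache_meta \
            primary_backup_vg/cache_data

  # attach the pool to the big LV (writethrough is the default mode)
  lvconvert --type cache --cachepool primary_backup_vg/cache_data \
            primary_backup_vg/primary_backup_lv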

 > How did you setup your LVM config and your cache config?  Did you
 > mirror the two SSDs using MD, then add the device into your VG and use
 > that to setup the lvcache?
All configs are stock CentOS 7.4 at the moment (incrementally upgraded 
from 7.0 of course), so I did not customize or try to make any 
optimizations to the config.
 > I ask because I'm running lvcache at home on my main file/kvm server
 > and I've never seen this problem.  But!  I suspect you're running a
 > much older kernel, lvm config, etc.  Please post the full details of
 > your system if you can.
3.10.0-693.2.2.el7.x86_64

CentOS 7.4, as pointed out by Xen, was released about a month ago, and I 
updated about a week ago while doing planned network maintenance, so I 
had a good excuse to reboot it.

 > Oleg> Initially I have tried the default writethrough mode and after
 > Oleg> running dd reading test with 250Gb file got system unresponsive
 > Oleg> for roughly 15min with cache allocation around 50%.  Writing to
 > Oleg> disks it seems speed up the system however marginally, so around
 > Oleg> 10% on my tests and I did manage to pull more than 32Tb via
 > Oleg> backup from different hosts and once system became unresponsive
 > Oleg> to ssh and icmp requests however for a very short time.
 >
 > Can you run 'top' or 'vmstat -admt 10' on the console while you're
 > running your tests to see what the system does?  How does memory look
 > on this system when you're NOT runnig lvcache?

Well, it is a production system and I am not planning to cache it again 
for testing; however, if any patches become available I will try to run 
a similar test on a spare box before converting it to FreeBSD with ZFS.

Nonetheless, I did try to run top during the dd read test, and within 
the first few minutes I did not notice any issues with RAM.  The system 
was using less than 2Gb of the 12Gb and the rest was tied up in 
cache/buffers.  After a few minutes the system became unresponsive, 
even dropping ICMP ping requests, and the ssh session froze and then 
dropped after a timeout, so there was no way to check the top 
measurements.

I have recovered some of the SAR records, and from the last 20 minutes 
I can see that SAR did not manage to log anything from 2:40pm to 
3:00pm, before the system got rebooted and came back online at 3:10pm:

User stat:
02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:10:01 PM     all      0.22      0.00      0.08      0.05      0.00     99.64
02:20:35 PM     all      0.21      0.00      5.23     20.58      0.00     73.98
02:30:51 PM     all      0.23      0.00      0.43     31.06      0.00     68.27
02:40:02 PM     all      0.06      0.00      0.15     18.55      0.00     81.24
Average:        all      0.19      0.00      1.54     17.67      0.00     80.61

I/O stat:
02:00:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
02:10:01 PM      5.27      3.19      2.08    109.29    195.38
02:20:35 PM   4404.80   3841.22    563.58 971542.00 140195.66
02:30:51 PM   1110.49    586.67    523.83 148206.31 131721.52
02:40:02 PM    510.72    211.29    299.43  51321.12  76246.81
Average:      1566.86   1214.43    352.43 306453.67  88356.03

DMs:
02:00:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
Average:       dev8-0    370.04    853.43  88355.91    241.08     85.32    230.56      1.61     59.54
Average:      dev8-16      0.02      0.14      0.02      8.18      0.00      3.71      3.71      0.01
Average:      dev8-32   1196.77 305599.78      0.04    255.35      4.26      3.56      0.09     11.28
Average:      dev8-48      0.02      0.35      0.06     18.72      0.00     17.77     17.77      0.04
Average:     dev253-0    151.59    118.15   1094.56      8.00     13.60     89.71      2.07     31.36
Average:     dev253-1     15.01    722.81     53.73     51.73      3.08    204.85     28.35     42.56
Average:     dev253-2   1259.48 218411.68      0.07    173.41      0.21      0.16      0.08      9.98
Average:     dev253-3    681.29      1.27  87189.52    127.98    163.02    239.29      0.84     57.12
Average:     dev253-4      3.83     11.09     18.09      7.61      0.09     22.59     10.72      4.11
Average:     dev253-5   1940.54 305599.86      0.07    157.48      8.47      4.36      0.06     11.24

dev253-2 is the cache, or rather was ...
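
To map those dev253-N entries back to LV names, the device-mapper minor 
numbers can be listed directly, for example:

  # device-mapper name -> (major, minor)
  dmsetup ls

  # or the same information with sizes and the device tree
  lsblk -o NAME,MAJ:MIN,SIZE,TYPE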

Queue stat:
02:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
02:10:01 PM         1       302      0.09      0.05      0.05         0
02:20:35 PM         0       568      6.87      9.72      5.28         3
02:30:51 PM         1       569      5.46      6.83      5.83         2
02:40:02 PM         0       568      0.18      2.41      4.26         1
Average:            0       502      3.15      4.75      3.85         2

RAM stat:
02:00:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
02:10:01 PM    256304  11866580     97.89     66860   9181100   2709288     11.10   5603576   5066808        32
02:20:35 PM    185160  11937724     98.47     56712     39104   2725476     11.17    299256    292604        16
02:30:51 PM    175220  11947664     98.55     56712     29640   2730732     11.19    113912    113552        24
02:40:02 PM  11195028    927856      7.65     57504     62416   2696248     11.05    119488    164076        16
Average:      2952928   9169956     75.64     59447   2328065   2715436     11.12   1534058   1409260        22

SWAP stat:
02:00:01 PM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
02:10:01 PM  12010984    277012      2.25     71828     25.93
02:20:35 PM  11048040   1239956     10.09     88696      7.15
02:30:51 PM  10723456   1564540     12.73     38272      2.45
02:40:02 PM  10716884   1571112     12.79     77928      4.96
Average:     11124841   1163155      9.47     69181      5.95
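
For completeness, these reports can be pulled out of the sysstat 
archives after the fact with something like the following, assuming the 
default /var/log/sa location and "sa19" for the 19th:

  # CPU, I/O, per-device, run queue, memory and swap reports for that day
  sar -u -f /var/log/sa/sa19
  sar -b -f /var/log/sa/sa19
  sar -d -p -f /var/log/sa/sa19
  sar -q -f /var/log/sa/sa19
  sar -r -f /var/log/sa/sa19
  sar -S -f /var/log/sa/sa19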



Cheers,
Oleg

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 19:46   ` Xen
@ 2017-10-19 21:14     ` John Stoffel
  2017-10-20  6:42       ` Xen
  0 siblings, 1 reply; 37+ messages in thread
From: John Stoffel @ 2017-10-19 21:14 UTC (permalink / raw)
  To: Xen; +Cc: LVM general discussion and development

>>>>> "Xen" == Xen  <list@xenhideout.nl> writes:

Xen> John Stoffel schreef op 19-10-2017 21:09:
>> How did you setup your LVM config and your cache config?  Did you
>> mirror the two SSDs using MD

Xen> He said he used hardware RAID to mirror the devices.

Ok, missed that.  But still we need the LVM config info and details on
the system config to address these issues.  I suspect he's not running
with any swap configured as well, and something is pushing the system
over the line.  But it's hard to know.

Any output from 'dmesg' you can share?  The more detailed the better.


>> I ask because I'm running lvcache at home on my main file/kvm server
>> and I've never seen this problem.  But!  I suspect you're running a
>> much older kernel, lvm config, etc.

Xen> lvm2-2.02.171-8.el7.x86_64

Xen> CentOS 7.4 was released a month ago.

And RHEL 7.4/CentOS 7.4 is based on kernel 3.10 with lots of Red Hat
specific backports.  So knowing the full details will only help us
help him.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 19:09 ` John Stoffel
@ 2017-10-19 19:46   ` Xen
  2017-10-19 21:14     ` John Stoffel
  2017-10-19 21:59   ` Oleg Cherkasov
  1 sibling, 1 reply; 37+ messages in thread
From: Xen @ 2017-10-19 19:46 UTC (permalink / raw)
  To: LVM general discussion and development

John Stoffel wrote on 19-10-2017 21:09:

> How did you setup your LVM config and your cache config?  Did you
> mirror the two SSDs using MD

He said he used hardware RAID to mirror the devices.

> I ask because I'm running lvcache at home on my main file/kvm server
> and I've never seen this problem.  But!  I suspect you're running a
> much older kernel, lvm config, etc.

lvm2-2.02.171-8.el7.x86_64

CentOS 7.4 was released a month ago.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 17:54 Oleg Cherkasov
  2017-10-19 18:13 ` Xen
  2017-10-19 18:49 ` Mike Snitzer
@ 2017-10-19 19:09 ` John Stoffel
  2017-10-19 19:46   ` Xen
  2017-10-19 21:59   ` Oleg Cherkasov
  2017-10-20 16:20 ` lejeczek
  2017-10-24 14:51 ` lejeczek
  4 siblings, 2 replies; 37+ messages in thread
From: John Stoffel @ 2017-10-19 19:09 UTC (permalink / raw)
  To: LVM general discussion and development


Oleg> Recently I have decided to try out LVM cache feature on one of
Oleg> our Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk
Oleg> array (hardware RAID5 with H710 and H830 Dell adapters).  Two
Oleg> SSD disks each 256Gb are in hardware RAID1 using H710 adapter
Oleg> with primary and extended partitions so I decided to make ~240Gb
Oleg> LVM cache to see if system I/O may be improved.  The server is
Oleg> running Bareos storage daemon and beside sshd and Dell
Oleg> OpenManage monitoring does not have any other services.
Oleg> Unfortunately testing went not as I expected nonetheless at the
Oleg> end system is up and running with no data corrupted.

Can you give more details about the system?  Is it providing storage
services (NFS) or is it just a backup server?

How did you setup your LVM config and your cache config?  Did you
mirror the two SSDs using MD, then add the device into your VG and use
that to setup the lvcache?

I ask because I'm running lvcache at home on my main file/kvm server
and I've never seen this problem.  But!  I suspect you're running a
much older kernel, lvm config, etc.  Please post the full details of
your system if you can. 

Oleg> Initially I have tried the default writethrough mode and after
Oleg> running dd reading test with 250Gb file got system unresponsive
Oleg> for roughly 15min with cache allocation around 50%.  Writing to
Oleg> disks it seems speed up the system however marginally, so around
Oleg> 10% on my tests and I did manage to pull more than 32Tb via
Oleg> backup from different hosts and once system became unresponsive
Oleg> to ssh and icmp requests however for a very short time.

Can you run 'top' or 'vmstat -admt 10' on the console while you're
running your tests to see what the system does?  How does memory look
on this system when you're NOT running lvcache?
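
One way to still get numbers even if the shell dies mid-test is to
start the samplers beforehand and log to disk, e.g. (paths are
arbitrary):

  # sample stats every 10 seconds into files that can be read after a reboot
  nohup vmstat -t 10 > /var/tmp/vmstat.log 2>&1 &
  nohup top -b -d 10 > /var/tmp/top.log 2>&1 &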

Do you have any swap space configured on the system?  It might make
sense to allocate 10-20gb of swap space.  
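
A quick way to add, say, 16GB of swap without repartitioning, purely as
an illustration:

  # create and enable a 16GB swap file
  fallocate -l 16G /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=16384
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile
  # add "/swapfile none swap defaults 0 0" to /etc/fstab to make it permanent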

Oleg> I though it may be something with cache mode so switched to writeback 
Oleg> via lvconvert and run dd reading test again with 250Gb file however that 
Oleg> time everything went completely unexpected.  System started to slow 
Oleg> responding for simple user interactions like list files and run top. And 
Oleg> then became completely unresponsive for about half an hours.  Switching 
Oleg> to main console via iLO I saw a lot of OOM messages and kernel tried to 
Oleg> survive therefore randomly killed almost all processes.  Eventually I 
Oleg> did manage to reboot and immediately uncached the array.

Oleg> My question is about very strange behavior of LVM cache.  Well, I may 
Oleg> expect no performance boost or even I/O degradation however I do not 
Oleg> expect run out of memory and than OOM kicks in.  That server has only 
Oleg> 12Gb RAM however it does run only sshd, bareos SD daemon and OpenManange 
Oleg> java based monitoring system so no RAM problems were notices for last 
Oleg> few years running with our LVM cache.

Oleg> Any ideas what may be wrong?  I have second NX3200 server with similar 
Oleg> hardware setup and it would be switch to FreeBSD 11.1 with ZFS very time 
Oleg> soon however I may try to install CentOS 7.4 first and see if the 
Oleg> problem may be reproduced.

Oleg> LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.


Oleg> Thank you!
Oleg> Oleg

Oleg> _______________________________________________
Oleg> linux-lvm mailing list
Oleg> linux-lvm@redhat.com
Oleg> https://www.redhat.com/mailman/listinfo/linux-lvm
Oleg> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 17:54 Oleg Cherkasov
  2017-10-19 18:13 ` Xen
@ 2017-10-19 18:49 ` Mike Snitzer
  2017-10-20 11:07   ` Joe Thornber
  2017-10-19 19:09 ` John Stoffel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2017-10-19 18:49 UTC (permalink / raw)
  To: Oleg Cherkasov; +Cc: ejt, linux-lvm

On Thu, Oct 19 2017 at  1:54pm -0400,
Oleg Cherkasov <o1e9@member.fsf.org> wrote:

> Hi,
> 
> Recently I have decided to try out LVM cache feature on one of our
> Dell NX3100 servers running CentOS 7.4.1708 with 110Tb disk array
> (hardware RAID5 with H710 and H830 Dell adapters).  Two SSD disks
> each 256Gb are in hardware RAID1 using H710 adapter with primary and
> extended partitions so I decided to make ~240Gb LVM cache to see if
> system I/O may be improved.  The server is running Bareos storage
> daemon and beside sshd and Dell OpenManage monitoring does not have
> any other services. Unfortunately testing went not as I expected
> nonetheless at the end system is up and running with no data
> corrupted.
> 
> Initially I have tried the default writethrough mode and after
> running dd reading test with 250Gb file got system unresponsive for
> roughly 15min with cache allocation around 50%.  Writing to disks it
> seems speed up the system however marginally, so around 10% on my
> tests and I did manage to pull more than 32Tb via backup from
> different hosts and once system became unresponsive to ssh and icmp
> requests however for a very short time.
> 
> I though it may be something with cache mode so switched to
> writeback via lvconvert and run dd reading test again with 250Gb
> file however that time everything went completely unexpected.
> System started to slow responding for simple user interactions like
> list files and run top. And then became completely unresponsive for
> about half an hours.  Switching to main console via iLO I saw a lot
> of OOM messages and kernel tried to survive therefore randomly
> killed almost all processes.  Eventually I did manage to reboot and
> immediately uncached the array.
> 
> My question is about very strange behavior of LVM cache.  Well, I
> may expect no performance boost or even I/O degradation however I do
> not expect run out of memory and than OOM kicks in.  That server has
> only 12Gb RAM however it does run only sshd, bareos SD daemon and
> OpenManange java based monitoring system so no RAM problems were
> notices for last few years running with our LVM cache.
> 
> Any ideas what may be wrong?  I have second NX3200 server with
> similar hardware setup and it would be switch to FreeBSD 11.1 with
> ZFS very time soon however I may try to install CentOS 7.4 first and
> see if the problem may be reproduced.
> 
> LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.

Your experience is _not_ unique.  It is unfortunate but there would seem
to be some systemic issues with dm-cache being too resource heavy.  Not
aware of any particular issue(s) yet.

I'm focusing on this now since we've had some internal reports that
writeback is quite slow (and that tests don't complete).  That IO
latencies are high.  Etc.

I'll work through it and likely enlist Joe Thornber's help next week.

I'll keep you posted as progress is made though.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] cache on SSD makes system unresponsive
  2017-10-19 17:54 Oleg Cherkasov
@ 2017-10-19 18:13 ` Xen
  2017-10-20 10:21   ` Oleg Cherkasov
  2017-10-19 18:49 ` Mike Snitzer
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Xen @ 2017-10-19 18:13 UTC (permalink / raw)
  To: linux-lvm

Oleg Cherkasov wrote on 19-10-2017 19:54:

> Any ideas what may be wrong?

All I know is that I myself have in the past tried to cache an embedded 
encrypted LVM in a regular home system.

The problem was probably caused by the SSD not clearing write caches 
fast enough but I too got some 2 minute "hanging process" outputs on the 
console.

So it was probably a queueing issue within the kernel and might not have 
been related to the cache,

but I'm still not sure if there wasn't an interplay at work.

The main cause was a way too slow SSD but at the same time... that sorta 
thing still shouldn't happen, locking up the entire system.

I haven't had a chance to try again with a faster SSD.

Regards...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [linux-lvm] cache on SSD makes system unresponsive
@ 2017-10-19 17:54 Oleg Cherkasov
  2017-10-19 18:13 ` Xen
                   ` (4 more replies)
  0 siblings, 5 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-19 17:54 UTC (permalink / raw)
  To: linux-lvm

Hi,

Recently I decided to try out the LVM cache feature on one of our Dell 
NX3100 servers running CentOS 7.4.1708 with a 110Tb disk array (hardware 
RAID5 with H710 and H830 Dell adapters).  Two 256Gb SSD disks are in 
hardware RAID1 on the H710 adapter with primary and extended partitions, 
so I decided to make a ~240Gb LVM cache to see if system I/O might be 
improved.  The server is running the Bareos storage daemon and, besides 
sshd and Dell OpenManage monitoring, does not have any other services. 
Unfortunately testing did not go as I expected; nonetheless, in the end 
the system is up and running with no data corrupted.

Initially I tried the default writethrough mode, and after running a dd 
read test with a 250Gb file the system became unresponsive for roughly 
15min with cache allocation around 50%.  Writing to the disks did seem 
to speed up, although only marginally, around 10% in my tests; I did 
manage to pull more than 32Tb via backup from different hosts, and once 
the system became unresponsive to ssh and ICMP requests, though only 
for a very short time.

I thought it might be something with the cache mode, so I switched to 
writeback via lvconvert and ran the dd read test again with the 250Gb 
file; that time everything went completely wrong.  The system started 
responding slowly to simple user interactions like listing files and 
running top, and then became completely unresponsive for about half an 
hour.  Switching to the main console via iLO I saw a lot of OOM 
messages; the kernel was trying to survive and therefore randomly 
killed almost all processes.  Eventually I did manage to reboot and 
immediately uncached the array.
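
For reference, the mode switch and the final removal described above 
are normally done with something along these lines; the VG/LV names 
here are placeholders:

  # switch an existing cached LV from writethrough to writeback
  lvconvert --cachemode writeback vg/cached_lv

  # detach the cache, flushing any dirty blocks back to the origin LV
  lvconvert --uncache vg/cached_lv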

My question is about the very strange behavior of LVM cache.  Well, I 
might expect no performance boost or even I/O degradation, but I did 
not expect to run out of memory and have the OOM killer kick in.  That 
server has only 12Gb RAM, but it runs only sshd, the Bareos SD daemon 
and the OpenManage Java based monitoring system, so no RAM problems 
were noticed over the last few years of running without LVM cache.

Any ideas what may be wrong?  I have a second NX3200 server with a 
similar hardware setup; it will be switched to FreeBSD 11.1 with ZFS 
very soon, but I may try to install CentOS 7.4 on it first and see if 
the problem can be reproduced.

LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.


Thank you!
Oleg

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [linux-lvm] cache on SSD makes system unresponsive
@ 2017-10-19 10:05 Oleg Cherkasov
  0 siblings, 0 replies; 37+ messages in thread
From: Oleg Cherkasov @ 2017-10-19 10:05 UTC (permalink / raw)
  To: linux-lvm


Hi,

Recently I decided to try out the LVM cache feature on one of our Dell 
NX3100 servers running CentOS 7.4.1708 with a 110Tb disk array (hardware 
RAID5 with H710 and H830 Dell adapters).  Two 256Gb SSD disks are in 
hardware RAID1 on the H710 adapter with primary and extended partitions, 
so I decided to make a ~240Gb LVM cache to see if system I/O might be 
improved.  The server is running the Bareos storage daemon and, besides 
sshd and Dell OpenManage monitoring, does not have any other services.  
Unfortunately testing did not go as I expected; nonetheless, in the end 
the system is up and running with no data corrupted.

Initially I tried the default writethrough mode, and after running a dd 
read test with a 250Gb file the system became unresponsive for roughly 
15min with cache allocation around 50%.  Writing to the disks did seem 
to speed up, although only marginally, around 10% in my tests; I did 
manage to pull more than 32Tb via backup from different hosts, and once 
the system became unresponsive to ssh and ICMP requests, though only 
for a very short time.

I thought it might be something with the cache mode, so I switched to 
writeback via lvconvert and ran the dd read test again with the 250Gb 
file; that time everything went completely wrong.  The system started 
responding slowly to simple user interactions like listing files and 
running top, and then became completely unresponsive for about half an 
hour.  Switching to the main console via iLO I saw a lot of OOM 
messages; the kernel was trying to survive and therefore randomly 
killed almost all processes.  Eventually I did manage to reboot and 
immediately uncached the array.

My question is about the very strange behavior of LVM cache.  Well, I 
might expect no performance boost or even I/O degradation, but I did 
not expect to run out of memory and have the OOM killer kick in.  That 
server has only 12Gb RAM, but it runs only sshd, the Bareos SD daemon 
and the OpenManage Java based monitoring system, so no RAM problems 
were noticed over the last few years of running without LVM cache.

Any ideas what may be wrong?  I have a second NX3200 server with a 
similar hardware setup; it will be switched to FreeBSD 11.1 with ZFS 
very soon, but I may try to install CentOS 7.4 on it first and see if 
the problem can be reproduced.

LVM2 installed is version lvm2-2.02.171-8.el7.x86_64.


Thank you!

Oleg



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-10-24 23:10 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1244564108.1073508.1508601932111.ref@mail.yahoo.com>
2017-10-21 16:05 ` [linux-lvm] cache on SSD makes system unresponsive matthew patton
2017-10-24 18:09   ` Oleg Cherkasov
     [not found] <640472762.2746512.1508882485777.ref@mail.yahoo.com>
2017-10-24 22:01 ` matthew patton
2017-10-24 23:10   ` Chris Friesen
     [not found] <1928541660.2031191.1508802005006.ref@mail.yahoo.com>
2017-10-23 23:40 ` matthew patton
2017-10-24 15:36   ` Xen
     [not found] <1714773615.1945146.1508792555922.ref@mail.yahoo.com>
2017-10-23 21:02 ` matthew patton
2017-10-23 21:54   ` Xen
2017-10-24  2:51   ` John Stoffel
     [not found] <1540708205.1077645.1508602122091.ref@mail.yahoo.com>
2017-10-21 16:08 ` matthew patton
     [not found] <541215543.377417.1508458336923.ref@mail.yahoo.com>
2017-10-20  0:12 ` matthew patton
2017-10-20  6:46   ` Xen
2017-10-20  9:59     ` Oleg Cherkasov
2017-10-19 17:54 Oleg Cherkasov
2017-10-19 18:13 ` Xen
2017-10-20 10:21   ` Oleg Cherkasov
2017-10-20 10:38     ` Xen
2017-10-20 11:41       ` Oleg Cherkasov
2017-10-19 18:49 ` Mike Snitzer
2017-10-20 11:07   ` Joe Thornber
2017-10-19 19:09 ` John Stoffel
2017-10-19 19:46   ` Xen
2017-10-19 21:14     ` John Stoffel
2017-10-20  6:42       ` Xen
2017-10-19 21:59   ` Oleg Cherkasov
2017-10-20 19:35     ` John Stoffel
2017-10-21  3:05       ` Mike Snitzer
2017-10-21 14:33       ` Oleg Cherkasov
2017-10-23 10:58         ` Zdenek Kabelac
2017-10-21  2:55     ` Mike Snitzer
2017-10-21 14:10       ` Oleg Cherkasov
2017-10-23 20:45         ` John Stoffel
2017-10-20 16:20 ` lejeczek
2017-10-20 16:48   ` Xen
2017-10-20 17:02     ` Bernd Eckenfels
2017-10-24 14:51 ` lejeczek
  -- strict thread matches above, loose matches on Subject: below --
2017-10-19 10:05 Oleg Cherkasov
