* dm-cache issue
From: Alexander Pashaliyski
Date: 2016-11-14 15:02 UTC
To: dm-devel

Hi guys,

I am in the process of evaluating dm-cache for our backup system.

Currently I have an issue when restarting the backup server: the server
takes hours to boot because of I/O load. It seems a flush is triggered
from the SSD (used as the cache device) to the RAID controllers (which
hold slow SATA disks).
I have 10 cached logical volumes in *writethrough mode*, each with 2T of
data, spread over 2 RAID controllers. I use a single SSD for the cache.
The backup system runs lvm2-2.02.164-1 & kernel 4.4.30.

Do you have any ideas why such a flush is triggered? In writethrough
cache mode we shouldn't have dirty blocks in the cache.

Thanks,
Alex
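One way to check whether a cache actually holds dirty blocks is to query
the reporting fields. A sketch, with vg/lv standing in for one of the
cached LVs; exact field availability depends on the lvm2 build:

    # Report dirty-block counters for a cached LV (vg/lv is a placeholder):
    lvs -o name,cache_mode,cache_dirty_blocks,cache_used_blocks,cache_total_blocks vg/lv

    # Or read the raw dm-cache status line; one of the trailing numeric
    # fields is the dirty-block count:
    dmsetup status vg-lv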
* Re: dm-cache issue
From: Zdenek Kabelac
Date: 2016-11-14 15:34 UTC
To: Alexander Pashaliyski, dm-devel

On 14.11.2016 16:02, Alexander Pashaliyski wrote:
> Hi guys,
>
> I am in the process of evaluating dm-cache for our backup system.
>
> Currently I have an issue when restarting the backup server: the server
> takes hours to boot because of I/O load. It seems a flush is triggered
> from the SSD (used as the cache device) to the RAID controllers (which
> hold slow SATA disks).
> I have 10 cached logical volumes in *writethrough mode*, each with 2T of
> data, spread over 2 RAID controllers. I use a single SSD for the cache.
> The backup system runs lvm2-2.02.164-1 & kernel 4.4.30.
>
> Do you have any ideas why such a flush is triggered? In writethrough
> cache mode we shouldn't have dirty blocks in the cache.

Hi

Have you ensured there was a proper shutdown?
The cache needs to be properly deactivated; if it's just turned off,
all metadata are marked dirty.

Regards

Zdenek
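An explicit clean deactivation before shutdown might look roughly like
this; a sketch, with "vg" as a placeholder volume group name:

    # Deactivate the VG (and its cached LVs) before poweroff, so that
    # dm-cache can commit clean metadata:
    vgchange -a n vg
    # or deactivate everything at once:
    lvm vgchange -a n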
* Re: dm-cache issue
From: Alexander Pashaliyski
Date: 2016-11-14 16:05 UTC
To: Zdenek Kabelac, dm-devel

On 11/14/2016 05:34 PM, Zdenek Kabelac wrote:
> On 14.11.2016 16:02, Alexander Pashaliyski wrote:
>> Hi guys,
>>
>> I am in the process of evaluating dm-cache for our backup system.
>>
>> Currently I have an issue when restarting the backup server: the server
>> takes hours to boot because of I/O load. It seems a flush is triggered
>> from the SSD (used as the cache device) to the RAID controllers (which
>> hold slow SATA disks).
>> I have 10 cached logical volumes in *writethrough mode*, each with 2T
>> of data, spread over 2 RAID controllers. I use a single SSD for the
>> cache.
>> The backup system runs lvm2-2.02.164-1 & kernel 4.4.30.
>>
>> Do you have any ideas why such a flush is triggered? In writethrough
>> cache mode we shouldn't have dirty blocks in the cache.
>
> Hi
>
> Have you ensured there was a proper shutdown?
> The cache needs to be properly deactivated; if it's just turned off,
> all metadata are marked dirty.
>
> Regards
>
> Zdenek

Hi Zdenek,

Thank you for your answer. Yes, I am sure. I have modified the lvm2 init
script to deactivate all logical volumes and then the volume groups, but
the issue still persists.

Regards,
Alex
* Re: dm-cache issue
From: Teodor Milkov
Date: 2016-11-15 12:38 UTC
To: dm-devel

On 14.11.2016 17:34, Zdenek Kabelac wrote:
> On 14.11.2016 16:02, Alexander Pashaliyski wrote:
>> The server takes hours to boot because of I/O load. It seems a flush
>> is triggered from the SSD (used as the cache device) to the RAID
>> controllers (which hold slow SATA disks).
>> I have 10 cached logical volumes in *writethrough mode*, each with 2T
>> of data, spread over 2 RAID controllers. I use a single SSD for the
>> cache.
>> The backup system runs lvm2-2.02.164-1 & kernel 4.4.30.
>>
>> Do you have any ideas why such a flush is triggered? In writethrough
>> cache mode we shouldn't have dirty blocks in the cache.
>
> Have you ensured there was a proper shutdown?
> The cache needs to be properly deactivated; if it's just turned off,
> all metadata are marked dirty.
>
> Zdenek

Hi,

I'm seeing the same behavior described by Alexander. Even if we assume
something is wrong with my shutdown scripts, how could dm-cache ever be
dirty in writethrough mode? And what about the case where the server
crashes for whatever reason (kernel bug, power outage, operator error,
etc.)? Waiting several hours, or, for a sufficiently large cache, even
days for the system to come back up is not practical.

I found this 2013 conversation, where Heinz Mauelshagen <heinzm redhat com>
states that "in writethrough mode the cache will always be coherent after
a crash": https://www.redhat.com/archives/dm-devel/2013-July/msg00117.html

I'm thinking of a way to --uncache and recreate the cache devices on every
boot, which should be safe in writethrough mode and takes a reasonable
and, more importantly, constant amount of time.

Best regards,
Teodor Milkov
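Such a boot-time uncache-and-recreate might look roughly like this; a
sketch assuming writethrough mode, with vg, lv, lv_cache, and /dev/sdX1
as placeholders:

    # Drop the cache entirely (safe in writethrough mode, since the
    # origin is already coherent), then build a fresh one:
    lvconvert --uncache vg/lv
    lvcreate --type cache-pool -L 80G -n lv_cache vg /dev/sdX1
    lvconvert --type cache --cachepool vg/lv_cache --cachemode writethrough vg/lv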
* Re: dm-cache issue
From: Zdenek Kabelac
Date: 2016-11-16 9:24 UTC
To: Teodor Milkov, dm-devel

On 15.11.2016 13:38, Teodor Milkov wrote:
> On 14.11.2016 17:34, Zdenek Kabelac wrote:
>> [...]
>
> I'm seeing the same behavior described by Alexander. Even if we assume
> something is wrong with my shutdown scripts, how could dm-cache ever be
> dirty in writethrough mode? And what about the case where the server
> crashes for whatever reason (kernel bug, power outage, operator error,
> etc.)? Waiting several hours, or, for a sufficiently large cache, even
> days for the system to come back up is not practical.
>
> I found this 2013 conversation, where Heinz Mauelshagen <heinzm redhat com>
> states that "in writethrough mode the cache will always be coherent
> after a crash":
> https://www.redhat.com/archives/dm-devel/2013-July/msg00117.html
>
> I'm thinking of a way to --uncache and recreate the cache devices on
> every boot, which should be safe in writethrough mode and takes a
> reasonable and, more importantly, constant amount of time.

My first 'guess' in this reported case is that the disk I/O traffic seen
is related to the 'reload' of cached chunks from disk back into the cache.

This will happen when there has been an unclean cache shutdown.

However, what is unclear is why it slows down boot by hours.
Is the cache too big?

Can you provide full logs from 'deactivation' and the following
activation?

Regards

Zdenek
* Re: dm-cache issue
From: Teodor Milkov
Date: 2016-11-16 13:45 UTC
To: Zdenek Kabelac, dm-devel

On 16.11.2016 11:24, Zdenek Kabelac wrote:
> On 15.11.2016 13:38, Teodor Milkov wrote:
>> [...]
>
> My first 'guess' in this reported case is that the disk I/O traffic
> seen is related to the 'reload' of cached chunks from disk back into
> the cache.
>
> This will happen when there has been an unclean cache shutdown.
>
> However, what is unclear is why it slows down boot by hours.
> Is the cache too big?

Indeed, the cache is quite big – an 800GB SSD, but I found experimentally
that this is the size at which I get good cache hit ratios with my >10TB
data volume.

As to 'reload' vs 'flush': I think it is flushing, because IIRC iostat
showed lots of SSD reading and HDD writing, but I'm not really sure and
need to confirm that.

So, are you saying that in the case of an unclean shutdown this 'reload'
is inevitable? How much time it takes obviously depends on the SSD
size/speed & HDD speed, but with an 800GB SSD it is reasonable to expect
very long boot times.

> Can you provide full logs from 'deactivation' and the following
> activation?

Any hints as to how to collect "full logs from 'deactivation' and the
following activation"? It happens early in the Debian boot process (I
think udev does the activation) and I'm not sure how to enable logging...
should I tweak /etc/lvm/lvm.conf?

Best regards,
Teodor Milkov
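One way to tell flush and reload apart is to watch the transfer direction
on each device; a sketch, with sdX as the SSD and sdY as one of the HDDs,
both placeholders:

    # Extended per-device stats every 2 seconds (needs sysstat):
    iostat -dxm 2 sdX sdY
    # flush  (dirty writeback): high rMB/s on the SSD, high wMB/s on the HDDs
    # reload (cache refresh):   high rMB/s on the HDDs, high wMB/s on the SSD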
* Re: dm-cache issue
From: Zdenek Kabelac
Date: 2016-11-16 14:06 UTC
To: Teodor Milkov, dm-devel

On 16.11.2016 14:45, Teodor Milkov wrote:
> On 16.11.2016 11:24, Zdenek Kabelac wrote:
>> [...]
>
> Indeed, the cache is quite big – an 800GB SSD, but I found
> experimentally that this is the size at which I get good cache hit
> ratios with my >10TB data volume.

Yep - that's the current trouble of the existing dm-cache target.
It gets inefficient when maintaining more than 1 million cache block
entries - recent versions of lvm2 do not even allow creating such a cache
without forcing it (so for 32k blocks that's ~30G of cache data).

> As to 'reload' vs 'flush': I think it is flushing, because IIRC iostat
> showed lots of SSD reading and HDD writing, but I'm not really sure and
> need to confirm that.
>
> So, are you saying that in the case of an unclean shutdown this
> 'reload' is inevitable?

Yes - a clean shutdown is mandatory; otherwise the cache can't know it is
consistent and has to refresh itself. The other option would probably be
to drop the cache and let it rebuild, but that way you lose the
'knowledge' already gained.

Anyway, AFAIK there is an ongoing development and upstreaming process for
a new cache target which addresses a couple of other shortcomings and
should perform much better.
lvm2 will supposedly handle the transition to the new format in some way
later.

> How much time it takes obviously depends on the SSD size/speed & HDD
> speed, but with an 800GB SSD it is reasonable to expect very long boot
> times.
>
>> Can you provide full logs from 'deactivation' and the following
>> activation?
>
> Any hints as to how to collect "full logs from 'deactivation' and the
> following activation"? It happens early in the Debian boot process (I
> think udev does the activation) and I'm not sure how to enable
> logging... should I tweak /etc/lvm/lvm.conf?

All you need to collect is basically the 'serial' console log from your
machine - so if you have another box to trap the serial console log,
that's the easiest option.

But since you already said you use a cache ~30 times bigger than the size
with 'reasonable' performance, I think it's already clear where your
problem is hidden.

Until the new target is deployed, please consider using a significantly
smaller cache so that the number of cache chunks does not exceed
1,000,000.

Regards

Zdenek
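To make the chunk arithmetic concrete, a sketch using the sizes from this
thread; the larger chunk size in the last line is an illustrative
assumption, not advice from the list:

    # chunks = cache size / chunk size (sizes below in KiB)
    echo $(( 800 * 1024 * 1024 / 32 ))    # 800 GiB cache, 32 KiB chunks -> 26,214,400 chunks
    echo $((  30 * 1024 * 1024 / 32 ))    #  30 GiB cache, 32 KiB chunks ->    983,040 chunks (< 1M)
    # alternatively, keep the size and raise the chunk size:
    echo $(( 800 * 1024 * 1024 / 1024 ))  # 800 GiB cache,  1 MiB chunks ->    819,200 chunks (< 1M)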
* Re: dm-cache issue
From: Teodor Milkov
Date: 2016-11-19 17:07 UTC
To: Zdenek Kabelac, dm-devel

On 16.11.2016 16:06, Zdenek Kabelac wrote:
> On 16.11.2016 14:45, Teodor Milkov wrote:
>> [...]
>>
>> Indeed, the cache is quite big – an 800GB SSD, but I found
>> experimentally that this is the size at which I get good cache hit
>> ratios with my >10TB data volume.
>
> Yep - that's the current trouble of the existing dm-cache target.
> It gets inefficient when maintaining more than 1 million cache block
> entries - recent versions of lvm2 do not even allow creating such a
> cache without forcing it (so for 32k blocks that's ~30G of cache data).

I'm sorry for not being clear: similarly to the OP, my SSD is split among
10 LVs, so each cache is around 80GB.

> [...]
>
> But since you already said you use a cache ~30 times bigger than the
> size with 'reasonable' performance, I think it's already clear where
> your problem is hidden.
>
> Until the new target is deployed, please consider using a significantly
> smaller cache so that the number of cache chunks does not exceed
> 1,000,000.

Thank you very much for your help! I'll give debugging the problem
another go.

I found that dm-writeboost in write-around mode (kind of write-through)
works well for me, so if I don't manage to get along with dm-cache I have
a plan B.

Best regards,
Teodor
* Re: dm-cache issue
From: Teodor Milkov
Date: 2016-12-02 19:49 UTC
To: Zdenek Kabelac, dm-devel

So, I ended up with the following setup:

* Modified /etc/init.d/lvm2 to explicitly deactivate logical volumes
  (lvm lvchange -a n --select name=~.*), and create /lvm2-clean-shutdown
  if the deactivation is successful.
* Turned off automatic LV activation by setting
  auto_activation_volume_list=[] in /etc/lvm/lvm.conf.
* Activate the LVs later in the boot only if /lvm2-clean-shutdown exists.

This way I have the chance to try manual activation of the volumes in the
case of an unclean shutdown, or eventually uncache and recreate the caches
if the flush/refresh appears to take too much time.

One strange thing: while testing, I forgot to set
auto_activation_volume_list=[], and the boot process got stuck at a very
early stage - probably at udev activating lvm (/sbin/lvm pvscan --cache
--activate ay from /lib/udev/rules.d/69-lvm-metad.rules). I waited for
about five minutes, but it didn't look like it was going to finish anytime
soon, and I had no tty (too early in the boot), so I rebooted with a
rescue CD and turned off LV auto activation in lvm.conf. This time the
system came up quickly, and the odd thing is that a manual lvchange -ay
was pretty fast - less than a minute for all 12 × 60GB caches. So, maybe
there's a bug in udev/pvscan --activate?

Anyway, the above setup seems to work well for me, so I'm sticking with it
for the time being.

Best regards,
Teodor
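The marker-file logic described above might look roughly like this inside
the init script; a sketch - the lvchange command and the
/lvm2-clean-shutdown path follow the message, the surrounding structure is
an assumption:

    # stop action of /etc/init.d/lvm2: deactivate everything, then record
    # success in a marker file
    rm -f /lvm2-clean-shutdown
    if lvm lvchange -a n --select 'name=~.*'; then
        touch /lvm2-clean-shutdown
    fi

    # later in the boot (with auto_activation_volume_list=[] so udev does
    # not activate anything): only auto-activate after a clean shutdown
    if [ -e /lvm2-clean-shutdown ]; then
        lvm lvchange -a y --select 'name=~.*'
        rm -f /lvm2-clean-shutdown
    fi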
* Re: dm-cache issue
From: John Stoffel
Date: 2016-11-15 19:57 UTC
To: Alexander Pashaliyski; Cc: dm-devel

Alexander> I am in the process of evaluating dm-cache for our backup
Alexander> system.

Why are you bothering to cache your backup destination? What filesystems
are you using for your destinations?

Alexander> Currently I have an issue when restarting the backup server:
Alexander> the server takes hours to boot because of I/O load. It seems a
Alexander> flush is triggered from the SSD (used as the cache device) to
Alexander> the RAID controllers (which hold slow SATA disks). I have 10
Alexander> cached logical volumes in writethrough mode, each with 2T of
Alexander> data, spread over 2 RAID controllers. I use a single SSD for
Alexander> the cache. The backup system runs lvm2-2.02.164-1 & kernel
Alexander> 4.4.30.

How big is the SSD cache? Do you have any output from the bootup (syslog,
dmesg, etc.) you can share?

Alexander> Do you have any ideas why such a flush is triggered? In
Alexander> writethrough cache mode we shouldn't have dirty blocks in the
Alexander> cache.

I wonder if it makes more sense to use 'lvcache' instead, since you can
add/remove cache from LVs at will. With bcache, once you set up a volume
with the cache, you're stuck with it. Also, did you mirror your SSD in
case it fails?

In my experience, using RAID6 on a bunch of SATA disks for backup is fine:
you're writing large chunks of data in a few sets of files, so
fragmentation and the read-modify-write cycle are much less painful, since
you tend to stream your writes to the storage, which is perfectly fine
with SATA drives.

Also, bcache skips sequential IO by default, which is what you want when
doing backups... so I'm wondering what your thinking was here for using
this?

John