* [linux-lvm] lvconvert --uncache takes hours
From: Roy Sigurd Karlsbakk @ 2023-03-01 22:44 UTC
To: linux-lvm; +Cc: Malin Bruland

Hi all

Working with a friend's machine, it has lvmcache turned on with writeback. This has worked well, but now it's uncaching and it takes *hours*. The amount of cache was chosen to 100GB on an SSD not used for much else and the dataset that is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly works with file serving, but also has some VMs that benefit from the caching quite a bit. But then - I wonder - how can it spend hours emptying the cache like this? Most write caching I know of last only seconds or perhaps in really worst case scenarios, minutes. Since this is taking hours, it looks to me something should have been flushed ages ago.

Have I (or we) done something very stupid here or is this really how it's supposed to work?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

* Re: [linux-lvm] lvconvert --uncache takes hours
From: Demi Marie Obenour @ 2023-03-01 22:55 UTC
To: LVM general discussion and development; +Cc: Malin Bruland

On Wed, Mar 01, 2023 at 11:44:00PM +0100, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> Working with a friend's machine, it has lvmcache turned on with writeback. This has worked well, but now it's uncaching and it takes *hours*. The amount of cache was chosen to 100GB on an SSD not used for much else and the dataset that is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly works with file serving, but also has some VMs that benefit from the caching quite a bit. But then - I wonder - how can it spend hours emptying the cache like this? Most write caching I know of last only seconds or perhaps in really worst case scenarios, minutes. Since this is taking hours, it looks to me something should have been flushed ages ago.
>
> Have I (or we) done something very stupid here or is this really how it's supposed to work?

It's likely normal. HDDs stink at small random writes and RAID-6 makes this even worse.

That said, I *strongly* recommend using three-disk RAID-1 for the cache, to match the redundancy of the RAID-6. With write-back caching, a failed cache will result in a corrupt and unrecoverable filesystem.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

* Re: [linux-lvm] lvconvert --uncache takes hours
From: Roger Heflin @ 2023-03-02 0:51 UTC
To: LVM general discussion and development; +Cc: Malin Bruland

On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>
> Hi all
>
> Working with a friend's machine, it has lvmcache turned on with writeback. This has worked well, but now it's uncaching and it takes *hours*. The amount of cache was chosen to 100GB on an SSD not used for much else and the dataset that is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly works with file serving, but also has some VMs that benefit from the caching quite a bit. But then - I wonder - how can it spend hours emptying the cache like this? Most write caching I know of last only seconds or perhaps in really worst case scenarios, minutes. Since this is taking hours, it looks to me something should have been flushed ages ago.
>
> Have I (or we) done something very stupid here or is this really how it's supposed to work?
>
> Best regards
>
> roy

A spinning raid6 array is slow on writes (see raid6 write penalty).
Because of that the array can only do about 100 write operations/sec.

If the disk is doing other work then it only has the extra capacity so
it could destage slower.

A lot depends on how big each chunk is. The lvmcache indicates the
smallest chunksize is 32k.

100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an hour.

Lvm bookkeeping has to also be written to the spinning disks I would
think, so 2 hours if the array were idle.

Throw in a 50% baseload on the disks and you get 4 hours.

Hours is reasonable.

* Re: [linux-lvm] lvconvert --uncache takes hours
From: Roy Sigurd Karlsbakk @ 2023-03-02 8:33 UTC
To: linux-lvm; +Cc: Malin Bruland

----- Original Message -----
> From: "Roger Heflin" <rogerheflin@gmail.com>
> To: "linux-lvm" <linux-lvm@redhat.com>
> Cc: "Malin Bruland" <malin.bruland@pm.me>
> Sent: Thursday, 2 March, 2023 01:51:08
> Subject: Re: [linux-lvm] lvconvert --uncache takes hours

> On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>>
>> Hi all
>>
>> Working with a friend's machine, it has lvmcache turned on with writeback. This
>> has worked well, but now it's uncaching and it takes *hours*. The amount of
>> cache was chosen to 100GB on an SSD not used for much else and the dataset that
>> is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly
>> works with file serving, but also has some VMs that benefit from the caching
>> quite a bit. But then - I wonder - how can it spend hours emptying the cache
>> like this? Most write caching I know of last only seconds or perhaps in really
>> worst case scenarios, minutes. Since this is taking hours, it looks to me
>> something should have been flushed ages ago.
>>
>> Have I (or we) done something very stupid here or is this really how it's
>> supposed to work?
>>
>> Best regards
>>
>> roy
>
> A spinning raid6 array is slow on writes (see raid6 write penalty).
> Because of that the array can only do about 100 write operations/sec.

About 100 writes/second per data drive, that is. md parallelises I/O well.

> If the disk is doing other work then it only has the extra capacity so
> it could destage slower.

The system was mostly idle.

> A lot depends on how big each chunk is. The lvmcache indicates the
> smallest chunksize is 32k.
>
> 100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an hour.

Those 100GB were on SSD, not spinning rust. Last I checked, that was the whole point of caching.

> Lvm bookkeeping has to also be written to the spinning disks I would
> think, so 2 hours if the array were idle.

erm - why on earth would you do writes to hdd if you're caching it?

> Throw in a 50% baseload on the disks and you get 4 hours.
>
> Hours is reasonable.

As I said, the system was idle.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356

* Re: [linux-lvm] lvconvert --uncache takes hours
From: Roger Heflin @ 2023-03-02 11:27 UTC
To: LVM general discussion and development; +Cc: Malin Bruland

On Thu, Mar 2, 2023 at 2:34 AM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>
> ----- Original Message -----
> > From: "Roger Heflin" <rogerheflin@gmail.com>
> > To: "linux-lvm" <linux-lvm@redhat.com>
> > Cc: "Malin Bruland" <malin.bruland@pm.me>
> > Sent: Thursday, 2 March, 2023 01:51:08
> > Subject: Re: [linux-lvm] lvconvert --uncache takes hours
>
> > On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
> >>
> >> Hi all
> >>
> >> Working with a friend's machine, it has lvmcache turned on with writeback. This
> >> has worked well, but now it's uncaching and it takes *hours*. The amount of
> >> cache was chosen to 100GB on an SSD not used for much else and the dataset that
> >> is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly
> >> works with file serving, but also has some VMs that benefit from the caching
> >> quite a bit. But then - I wonder - how can it spend hours emptying the cache
> >> like this? Most write caching I know of last only seconds or perhaps in really
> >> worst case scenarios, minutes. Since this is taking hours, it looks to me
> >> something should have been flushed ages ago.
> >>
> >> Have I (or we) done something very stupid here or is this really how it's
> >> supposed to work?
> >>
> >> Best regards
> >>
> >> roy
> >
> > A spinning raid6 array is slow on writes (see raid6 write penalty).
> > Because of that the array can only do about 100 write operations/sec.
>
> About 100 writes/second per data drive, that is. md parallelises I/O well.
>

No. On writes you get 100 writes to the raid6 in total. With reads you get 100 iops/disk. By the very nature of raid6, the writes cannot be parallelized.

Each write to md requires a lot of work. At minimum, you have to re-read the sector you are writing, read the parity you need to update, calculate the parity changes, and then adjust and re-write any parities that you need to change. Your other option is that you might be able to write an entire stripe, but that requires writes to all disks + parity calc + writes to parity. All options for writing data to raid5/6 break down to iops/disk == total write iops. The raid5/6 format requires the multiple reads and writes, and that really makes it slow on writes.

> > If the disk is doing other work then it only has the extra capacity so
> > it could destage slower.
>
> The system was mostly idle.
>
> > A lot depends on how big each chunk is. The lvmcache indicates the
> > smallest chunksize is 32k.
> >
> > 100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an hour.
>
> Those 100GB were on SSD, not spinning rust. Last I checked, that was the whole point of caching.

You are de-staging the SSD cache to spinning disks, correct? The writes to spinning disks are slow.

>
> > Lvm bookkeeping has to also be written to the spinning disks I would
> > think, so 2 hours if the array were idle.
>
> erm - why on earth would you do writes to hdd if you're caching it?

Once the cache is gone, all of the LVM metadata should be on the spinning disks.

>
> > Throw in a 50% baseload on the disks and you get 4 hours.
> >
> > Hours is reasonable.
>
> As I said, the system was idle.
>
> Best regards
>

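The read-modify-write path described above can be illustrated with a minimal sketch. It models only the XOR (P) parity of a single stripe; RAID-6 also maintains a second (Q) parity computed with Galois-field arithmetic, which is left out here, so a real RAID-6 small write costs more device I/Os than this count shows. The function names and the tiny 4-byte chunks are hypothetical and are not taken from md.

```python
# Sketch of a read-modify-write parity update for one stripe.
# Only the XOR (P) parity is modelled; RAID-6's Q parity is omitted.
# All names are illustrative and do not correspond to md internals.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_chunks: list[bytes]) -> bytes:
    """P parity of a stripe: the XOR of all its data chunks."""
    parity = bytes(len(data_chunks[0]))
    for chunk in data_chunks:
        parity = xor_bytes(parity, chunk)
    return parity


def small_write(stripe: list[bytes], parity: bytes, index: int, new_data: bytes):
    """Overwrite one chunk; return (new_stripe, new_parity, device_ios)."""
    ios = 0
    old_data = stripe[index]        # read the old data chunk
    ios += 1
    old_parity = parity             # read the old parity chunk
    ios += 1
    # new P = old P xor old data xor new data
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    new_stripe = stripe.copy()
    new_stripe[index] = new_data    # write the new data chunk
    ios += 1
    ios += 1                        # write the new parity chunk
    return new_stripe, new_parity, ios


# Eight data chunks, matching the data disks of a 10-disk RAID-6.
stripe = [bytes([i] * 4) for i in range(1, 9)]
parity = full_parity(stripe)
stripe, parity, ios = small_write(stripe, parity, 3, b"\xff\xff\xff\xff")
print("device I/Os for one small write (P parity only):", ios)
```

Even in this simplified form, one logical write turns into two reads and two writes on the member disks, which is the reason small random writes scale so poorly on parity RAID.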
* Re: [linux-lvm] lvconvert --uncache takes hours
From: Gionatan Danti @ 2023-03-02 17:34 UTC
To: LVM general discussion and development; +Cc: Roger Heflin, Malin Bruland

On 2023-03-02 01:51, Roger Heflin wrote:
> A spinning raid6 array is slow on writes (see raid6 write penalty).
> Because of that the array can only do about 100 write operations/sec.

True. But does flushing cached data really proceed in random LBA order (as seen by the HDDs), rather than trying to coalesce writes in a linear fashion?

> If the disk is doing other work then it only has the extra capacity so
> it could destage slower.
>
> A lot depends on how big each chunk is. The lvmcache indicates the
> smallest chunksize is 32k.
>
> 100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an
> hour.

You are off by an order of magnitude: 3 million I/Os at 100 IOPS means ~30000s, so about 9 hours.

> Lvm bookkeeping has to also be written to the spinning disks I would
> think, so 2 hours if the array were idle.
>
> Throw in a 50% baseload on the disks and you get 4 hours.
>
> Hours is reasonable.

If flushing happens in random disk order, then yes, you are bound to wait several hours indeed.

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

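The corrected estimate is easy to re-check. The sketch below redoes the arithmetic with the figures quoted in the thread (100G of cache, 32k chunks, about 100 write operations per second). It assumes binary units, that every chunk in the cache is dirty, and that each chunk costs exactly one array write; none of these assumptions is confirmed by the thread, so treat the result as a rough upper-range figure.

```python
# Back-of-the-envelope destage time, using the numbers from the thread.
# Assumptions (not confirmed by the thread): binary units, the whole
# cache is dirty, and one array write operation per chunk.

GIB = 1024 ** 3
KIB = 1024


def destage_seconds(cache_bytes: int, chunk_bytes: int, write_iops: float) -> float:
    """Seconds needed to write back every chunk at a fixed write rate."""
    chunks = cache_bytes // chunk_bytes
    return chunks / write_iops


chunks = (100 * GIB) // (32 * KIB)
seconds = destage_seconds(100 * GIB, 32 * KIB, 100)
hours = seconds / 3600

print(f"{chunks} chunks, {seconds:.0f} s, {hours:.1f} h")
```

With these inputs the result is a little over nine hours, in line with the "about 9 hours" figure above. A lower dirty fraction or a larger chunk size shortens it proportionally.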
* Re: [linux-lvm] lvconvert --uncache takes hours
From: Roger Heflin @ 2023-03-02 18:33 UTC
To: LVM general discussion and development; +Cc: Malin Bruland

On Thu, Mar 2, 2023 at 11:44 AM Gionatan Danti <g.danti@assyoma.it> wrote:
>
> On 2023-03-02 01:51, Roger Heflin wrote:
> > A spinning raid6 array is slow on writes (see raid6 write penalty).
> > Because of that the array can only do about 100 write operations/sec.
>
> True. But does flushing cached data really proceed in random LBA order
> (as seen by the HDDs), rather than trying to coalesce writes in a linear
> fashion?
>

It is a 100G cache over 16TB, so even if it flushes in order, the chunks may not be that close to each other (1 in 160).

Also, if pieces are selected and added to the cache over time, then the cache is not in order on the SSD, and proper coalescing would require reading the entire cache and sorting the 3,000,000 location entries before starting the de-stage. That complication has likely not been coded yet; if I were guessing, the de-stage starts at the beginning and continues to the end of the cache.

Even if it were coded, though, if you have enough blocks cached and the blocks are spread, say, one or two on each track, it would break down to writing a tiny bit on each track with seeks in between, which mostly comes down to the time required to simply read/write the HD end to end. At 150MB/sec (should be about the platter speed) that would take 3.5 hours.

> > If the disk is doing other work then it only has the extra capacity so
> > it could destage slower.
> >
> > A lot depends on how big each chunk is. The lvmcache indicates the
> > smallest chunksize is 32k.
> >
> > 100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an
> > hour.
>
> You are off by an order of magnitude: 3 million I/Os at 100 IOPS means
> ~30000s, so about 9 hours.

Right, I did the calc in my head and screwed it up. I thought it should have been higher but did not re-check it.

>
> > Lvm bookkeeping has to also be written to the spinning disks I would
> > think, so 2 hours if the array were idle.
> >
> > Throw in a 50% baseload on the disks and you get 4 hours.
> >
> > Hours is reasonable.
>
> If flushing happens in random disk order, then yes, you are bound to
> wait several hours indeed.
>

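The two figures in this message (1 in 160, and roughly 3.5 hours for an end-to-end pass) can be reproduced with a short calculation. The sketch assumes decimal units, 8 data disks of 2 TB each (10 disks minus two parity), and that all member disks are swept in parallel so the pass takes as long as one disk; these assumptions are read into the "10x2TB" description and are not stated explicitly in the thread.

```python
# Rough check of the "1 in 160" density and the end-to-end sweep time.
# Assumptions (inferred, not stated in the thread): decimal units,
# 8 data disks of 2 TB each, and all disks swept in parallel.

TB = 10 ** 12
GB = 10 ** 9
MB = 10 ** 6

array_data_bytes = 8 * 2 * TB       # usable capacity of a 10x2TB RAID-6
cache_bytes = 100 * GB
density = array_data_bytes // cache_bytes   # one cached chunk per N origin chunks

disk_bytes = 2 * TB
platter_rate = 150 * MB             # bytes per second, figure from the thread
sweep_hours = disk_bytes / platter_rate / 3600

print(f"1 in {density}, full sweep about {sweep_hours:.1f} h")
```

The sweep comes out slightly above the 3.5 hours quoted, which is within the precision of an estimate like this. It acts as a ceiling for a well-ordered flush: once the dirty chunks are dense enough, seeking between them is no faster than streaming across the whole disk.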
* Re: [linux-lvm] lvconvert --uncache takes hours
From: Gionatan Danti @ 2023-03-02 20:47 UTC
To: LVM general discussion and development; +Cc: Roger Heflin, Malin Bruland

On 2023-03-02 19:33, Roger Heflin wrote:
> On Thu, Mar 2, 2023 at 11:44 AM Gionatan Danti <g.danti@assyoma.it>
> wrote:
> It is a 100G cache over 16TB, so even if it flushes in order, the chunks
> may not be that close to each other (1 in 160).

Yes, but destaging in LBA order (albeit far apart) is much better than in random order.

> Also, if pieces are selected and added to the cache over time, then the
> cache is not in order on the SSD, and proper coalescing would require
> reading the entire cache and sorting the 3,000,000 location entries
> before starting the de-stage. That complication has likely not been
> coded yet; if I were guessing, the de-stage starts at the beginning and
> continues to the end of the cache.

I would expect reordering and coalescing to happen in a reasonably sized window (i.e. collect 64 MB of data, reorder and flush it). At the same time, considering how lvmcache works, you are probably right: cached chunks are going to be flushed as discovered (random order).

> Even if it were coded, though, if you have enough blocks cached and the
> blocks are spread, say, one or two on each track, it would break down to
> writing a tiny bit on each track with seeks in between, which mostly
> comes down to the time required to simply read/write the HD end to end.
> At 150MB/sec (should be about the platter speed) that would take 3.5
> hours.

Which (apart from being the absolute worst case) would be way better than what is required for totally random IO.

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
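The windowed reordering suggested above might look like the following sketch. It does not describe how dm-cache or lvmcache actually writes back dirty blocks (the thread itself only guesses at that). The function names, the window size expressed as a number of chunks, and the sample offsets are all hypothetical, and summed head travel is used only as a crude stand-in for seek cost.

```python
# Sketch of windowed write-back ordering: gather a bounded batch of
# dirty chunks in discovery order, sort the batch by origin offset,
# flush it, and repeat. Hypothetical illustration only; this is not
# the dm-cache write-back algorithm.
from typing import Iterable, Iterator


def windowed_flush_order(dirty_lbas: Iterable[int], window: int) -> Iterator[int]:
    """Yield origin offsets, sorted within each batch of `window` entries."""
    batch: list[int] = []
    for lba in dirty_lbas:
        batch.append(lba)
        if len(batch) == window:
            yield from sorted(batch)
            batch.clear()
    yield from sorted(batch)        # flush the final, partial batch


def total_seek_distance(order: Iterable[int]) -> int:
    """Sum of absolute head movements between consecutive writes."""
    seq = list(order)
    return sum(abs(b - a) for a, b in zip(seq, seq[1:]))


# Dirty chunk offsets in the (random) order they are discovered.
discovered = [900, 10, 500, 20, 880, 30, 510, 40]

random_cost = total_seek_distance(discovered)
windowed_cost = total_seek_distance(windowed_flush_order(discovered, 4))
print("as discovered:", random_cost, "windowed:", windowed_cost)
```

A small window already shortens head travel noticeably in this toy input, and a window covering the whole cache reduces to the full sort discussed earlier in the thread, at the price of holding every location entry in memory before the first write is issued.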