linux-lvm.redhat.com archive mirror
* [linux-lvm] lvconvert --uncache takes hours
@ 2023-03-01 22:44 Roy Sigurd Karlsbakk
  2023-03-01 22:55 ` Demi Marie Obenour
  2023-03-02  0:51 ` Roger Heflin
  0 siblings, 2 replies; 8+ messages in thread
From: Roy Sigurd Karlsbakk @ 2023-03-01 22:44 UTC (permalink / raw)
  To: linux-lvm; +Cc: Malin Bruland

Hi all

I'm working on a friend's machine that has lvmcache enabled in writeback mode. This has worked well, but now it's uncaching and that is taking *hours*. The cache is 100GB on an SSD not used for much else, and the cached dataset is a RAID-6 set of 10x2TB with XFS on top. The system mainly does file serving, but also runs some VMs that benefit quite a bit from the caching. But then, I wonder, how can it spend hours emptying the cache like this? Most write caches I know of hold data for only seconds or, in really worst-case scenarios, minutes. Since this is taking hours, it looks to me like something should have been flushed ages ago.

Have I (or we) done something very stupid here or is this really how it's supposed to work?

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms with xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-01 22:44 [linux-lvm] lvconvert --uncache takes hours Roy Sigurd Karlsbakk
@ 2023-03-01 22:55 ` Demi Marie Obenour
  2023-03-02  0:51 ` Roger Heflin
  1 sibling, 0 replies; 8+ messages in thread
From: Demi Marie Obenour @ 2023-03-01 22:55 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Malin Bruland


On Wed, Mar 01, 2023 at 11:44:00PM +0100, Roy Sigurd Karlsbakk wrote:
> Hi all
> 
> Working with a friend's machine, it has lvmcache turned on with writeback. This has worked well, but now it's uncaching and it takes *hours*. The amount of cache was chosen to 100GB on an SSD not used for much else and the dataset that is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly works with file serving, but also has some VMs that benefit from the caching quite a bit. But then - I wonder - how can it spend hours emptying the cache like this? Most write caching I know of last only seconds or perhaps in really worst case scenarios, minutes. Since this is taking hours, it looks to me something should have been flushed ages ago.
> 
> Have I (or we) done something very stupid here or is this really how it's supposed to work?

It’s likely normal.  HDDs stink at small random writes and RAID-6 makes
this even worse.  That said, I *strongly* recommend using three-disk
RAID-1 for the cache, to match the redundancy of the RAID-6.  With
write-back caching, a failed cache will result in a corrupt and
unrecoverable filesystem.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-01 22:44 [linux-lvm] lvconvert --uncache takes hours Roy Sigurd Karlsbakk
  2023-03-01 22:55 ` Demi Marie Obenour
@ 2023-03-02  0:51 ` Roger Heflin
  2023-03-02  8:33   ` Roy Sigurd Karlsbakk
  2023-03-02 17:34   ` Gionatan Danti
  1 sibling, 2 replies; 8+ messages in thread
From: Roger Heflin @ 2023-03-02  0:51 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Malin Bruland

On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>
> Hi all
>
> Working with a friend's machine, it has lvmcache turned on with writeback. This has worked well, but now it's uncaching and it takes *hours*. The amount of cache was chosen to 100GB on an SSD not used for much else and the dataset that is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly works with file serving, but also has some VMs that benefit from the caching quite a bit. But then - I wonder - how can it spend hours emptying the cache like this? Most write caching I know of last only seconds or perhaps in really worst case scenarios, minutes. Since this is taking hours, it looks to me something should have been flushed ages ago.
>
> Have I (or we) done something very stupid here or is this really how it's supposed to work?
>
> Kind regards
>
> roy

A spinning RAID-6 array is slow on writes (see the RAID-6 write penalty).
Because of that, the array can only do about 100 write operations/sec.

If the disks are doing other work, only the spare capacity is left for
destaging, so it could go even slower.

A lot depends on how big each chunk is.  The lvmcache documentation
indicates that the smallest chunk size is 32k.

100G / 32k = 3 million, and at 100 seeks/sec that comes to at least an hour.

LVM bookkeeping also has to be written to the spinning disks, I would
think, so 2 hours if the array were idle.

Throw in a 50% base load on the disks and you get 4 hours.

Hours is reasonable.


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-02  0:51 ` Roger Heflin
@ 2023-03-02  8:33   ` Roy Sigurd Karlsbakk
  2023-03-02 11:27     ` Roger Heflin
  2023-03-02 17:34   ` Gionatan Danti
  1 sibling, 1 reply; 8+ messages in thread
From: Roy Sigurd Karlsbakk @ 2023-03-02  8:33 UTC (permalink / raw)
  To: linux-lvm; +Cc: Malin Bruland


----- Original Message -----
> From: "Roger Heflin" <rogerheflin@gmail.com>
> To: "linux-lvm" <linux-lvm@redhat.com>
> Cc: "Malin Bruland" <malin.bruland@pm.me>
> Sent: Thursday, 2 March, 2023 01:51:08
> Subject: Re: [linux-lvm] lvconvert --uncache takes hours

> On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>>
>> Hi all
>>
>> Working with a friend's machine, it has lvmcache turned on with writeback. This
>> has worked well, but now it's uncaching and it takes *hours*. The amount of
>> cache was chosen to 100GB on an SSD not used for much else and the dataset that
>> is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly
>> works with file serving, but also has some VMs that benefit from the caching
>> quite a bit. But then - I wonder - how can it spend hours emptying the cache
>> like this? Most write caching I know of last only seconds or perhaps in really
>> worst case scenarios, minutes. Since this is taking hours, it looks to me
>> something should have been flushed ages ago.
>>
>> Have I (or we) done something very stupid here or is this really how it's
>> supposed to work?
>>
>> Kind regards
>>
>> roy
> 
> A spinning RAID-6 array is slow on writes (see the RAID-6 write penalty).
> Because of that, the array can only do about 100 write operations/sec.

About 100 writes/second per data drive, that is. md parallelises I/O well.

> If the disk is doing other work then it only has the extra capacity so
> it could destage slower.

The system was mostly idle.

> A lot depends on how big each chunk is.     The lvmcache indicates the
> smallest chunksize is 32k.
> 
> 100G / 32k = 3 million, and at 100seeks/sec that comes to at least an hour.

Those 100GB were on SSD, not spinning rust. Last I checked, that was the whole point of caching.

> Lvm bookkeeping has to also be written to the spinning disks I would
> think, so 2 hours if the array were idle.

erm - why on earth would you do writes to hdd if you're caching it?

> Throw in a 50% baseload on the disks and you get 4 hours.
> 
> Hours is reasonable.

As I said, the system was idle.

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms with xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-02  8:33   ` Roy Sigurd Karlsbakk
@ 2023-03-02 11:27     ` Roger Heflin
  0 siblings, 0 replies; 8+ messages in thread
From: Roger Heflin @ 2023-03-02 11:27 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Malin Bruland

On Thu, Mar 2, 2023 at 2:34 AM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>
>
> ----- Original Message -----
> > From: "Roger Heflin" <rogerheflin@gmail.com>
> > To: "linux-lvm" <linux-lvm@redhat.com>
> > Cc: "Malin Bruland" <malin.bruland@pm.me>
> > Sent: Thursday, 2 March, 2023 01:51:08
> > Subject: Re: [linux-lvm] lvconvert --uncache takes hours
>
> > On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
> >>
> >> Hi all
> >>
> >> Working with a friend's machine, it has lvmcache turned on with writeback. This
> >> has worked well, but now it's uncaching and it takes *hours*. The amount of
> >> cache was chosen to 100GB on an SSD not used for much else and the dataset that
> >> is being cached, is a RAID-6 set of 10x2TB with XFS on top. The system mainly
> >> works with file serving, but also has some VMs that benefit from the caching
> >> quite a bit. But then - I wonder - how can it spend hours emptying the cache
> >> like this? Most write caching I know of last only seconds or perhaps in really
> >> worst case scenarios, minutes. Since this is taking hours, it looks to me
> >> something should have been flushed ages ago.
> >>
> >> Have I (or we) done something very stupid here or is this really how it's
> >> supposed to work?
> >>
> >> Kind regards
> >>
> >> roy
> >
> > A spinning RAID-6 array is slow on writes (see the RAID-6 write penalty).
> > Because of that, the array can only do about 100 write operations/sec.
>
> About 100 writes/second per data drive, that is. md parallelises I/O well.
>

No.  On writes you get 100 writes to the RAID-6 in total.  With reads you
get 100 IOPS per disk.  Writes, by the very nature of RAID-6, cannot be
parallelized.

Each write to md requires a lot of work.  At a minimum, you have to re-read
the sector you are writing, read the parity you need to update,
calculate the parity changes, adjust the parity, and re-write any
parities that you need to change.  Your other option is that you might be
able to write an entire stripe, but that requires writes to all disks
+ parity calc + writes to parity.  All options for writing data to
RAID-5/6 break down to IOPS/disk == total write IOPS.
The RAID-5/6 format requires these multiple reads and writes, which really
makes it slow on writes.
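
As a rough sketch of the read-modify-write cost described above (the
six-I/O count is the textbook RAID-6 small-write penalty, and 100 IOPS
per disk is the figure assumed in this thread; neither was measured on
this system, and the function names are made up for illustration):

```python
def rmw_ios(parity_disks: int) -> int:
    """Disk I/Os needed for one small write using read-modify-write."""
    reads = 1 + parity_disks   # old data block + each old parity block
    writes = 1 + parity_disks  # new data block + each new parity block
    return reads + writes


def small_write_iops(disks: int, iops_per_disk: int, parity_disks: int) -> float:
    """Idealised upper bound: assumes the I/Os spread evenly over all disks."""
    return disks * iops_per_disk / rmw_ios(parity_disks)


print(rmw_ios(2))                                # 6 I/Os per small RAID-6 write
print(f"{small_write_iops(10, 100, 2):.0f}")     # 167 writes/sec at the very best
```

Even this optimistic bound stays in the low hundreds of writes per second
for the whole array, the same order of magnitude as a single disk.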

> > If the disk is doing other work then it only has the extra capacity so
> > it could destage slower.
>
> The system was mostly idle.
>
> > A lot depends on how big each chunk is.     The lvmcache indicates the
> > smallest chunksize is 32k.
> >
> > 100G / 32k = 3 million, and at 100seeks/sec that comes to at least an hour.
>
> Those 100GB was on SSD, not spinning rust. Last I checked, that was the whole point with caching.

You are de-staging the SSD cache to spinning disks, correct?  The
writes to spinning disks are slow.

>
> > Lvm bookkeeping has to also be written to the spinning disks I would
> > think, so 2 hours if the array were idle.
>
> erm - why on earth would you do writes to hdd if you're caching it?

Once the cache is gone all LVM should be on the spinning disks.

>
> > Throw in a 50% baseload on the disks and you get 4 hours.
> >
> > Hours is reasonable.
>
> As I said, the system was idle.
>
> Kind regards
>


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-02  0:51 ` Roger Heflin
  2023-03-02  8:33   ` Roy Sigurd Karlsbakk
@ 2023-03-02 17:34   ` Gionatan Danti
  2023-03-02 18:33     ` Roger Heflin
  1 sibling, 1 reply; 8+ messages in thread
From: Gionatan Danti @ 2023-03-02 17:34 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Roger Heflin, Malin Bruland

On 2023-03-02 01:51, Roger Heflin wrote:
> A spinning RAID-6 array is slow on writes (see the RAID-6 write penalty).
> Because of that, the array can only do about 100 write operations/sec.

True. But does flushing cached data really proceed in random LBA order 
(as seen by HDDs), rather than trying to coalesce writes in linear 
fashion?

> If the disk is doing other work then it only has the extra capacity so
> it could destage slower.
> 
> A lot depends on how big each chunk is.     The lvmcache indicates the
> smallest chunksize is 32k.
> 
> 100G / 32k = 3 million, and at 100seeks/sec that comes to at least an 
> hour.

You are off by an order of magnitude: 3 million I/Os at 100 IOPS means
~30000 s, so about 9 hours.
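
The arithmetic can be checked with a minimal sketch. It assumes binary
units (100 GiB, 32 KiB), that every chunk in the cache is dirty, and that
each chunk costs exactly one random write at 100 IOPS; the function name
is only illustrative:

```python
CACHE_BYTES = 100 * 1024**3   # 100G cache, taken as GiB
CHUNK_BYTES = 32 * 1024       # 32k chunk size quoted above
WRITE_IOPS = 100              # random-write rate assumed in this thread


def destage_seconds(cache_bytes: int, chunk_bytes: int, iops: int) -> float:
    """Time to flush a fully dirty cache at one random write per chunk."""
    chunks = cache_bytes // chunk_bytes
    return chunks / iops


chunks = CACHE_BYTES // CHUNK_BYTES
seconds = destage_seconds(CACHE_BYTES, CHUNK_BYTES, WRITE_IOPS)
print(f"{chunks} chunks, {seconds:.0f} s, {seconds / 3600:.1f} h")
# prints: 3276800 chunks, 32768 s, 9.1 h
```

A cache that is only partly dirty scales the result down proportionally.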

> Lvm bookkeeping has to also be written to the spinning disks I would
> think, so 2 hours if the array were idle.
> 
> Throw in a 50% baseload on the disks and you get 4 hours.
> 
> Hours is reasonable.

If flushing happens in random disk order, then yes, you are bound to
wait several hours indeed.

Regards.

-- 
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-02 17:34   ` Gionatan Danti
@ 2023-03-02 18:33     ` Roger Heflin
  2023-03-02 20:47       ` Gionatan Danti
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2023-03-02 18:33 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Malin Bruland

On Thu, Mar 2, 2023 at 11:44 AM Gionatan Danti <g.danti@assyoma.it> wrote:
>
> On 2023-03-02 01:51, Roger Heflin wrote:
> > A spinning RAID-6 array is slow on writes (see the RAID-6 write penalty).
> > Because of that, the array can only do about 100 write operations/sec.
>
> True. But does flushing cached data really proceed in random LBA order
> (as seen by HDDs), rather than trying to coalesce writes in linear
> fashion?
>
It is a 100G cache over 16TB, so even if it flushes in order, the chunks may
not be that close to each other (1 in 160).

Also, if chunks are selected and added to the cache over time, the cache is
not in LBA order on the SSD, and proper coalescing would require reading
the entire cache and sorting the 3,000,000 location entries before
starting the de-stage.  If I had to guess, that complication has likely
not been coded yet, and the de-stage simply starts at the
beginning of the cache and continues to the end.

Even if it were coded, though, if you have enough blocks cached and the
blocks are spread, say, one or two per track, it would break down to having
to write a tiny bit on each track with seeks in between, which mostly comes
down to the time required to simply read/write the HD end to end.  At
150MB/sec (should be about the platter speed) that would take 3.5
hours.
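
Both figures can be reproduced with a back-of-the-envelope sketch. It
assumes decimal units, 2TB member disks with 8 of the 10 holding data, and
a flat 150MB/sec sustained rate across the whole platter (real drives slow
down towards the inner tracks):

```python
DISK_BYTES = 2 * 10**12            # one 2TB member disk
PLATTER_BYTES_PER_S = 150 * 10**6  # sustained rate assumed above
DATA_DISKS = 8                     # 10-disk RAID-6 minus two parity disks
CACHE_BYTES = 100 * 10**9          # 100G cache

# How sparse the cached chunks are on the origin: one cached byte per N.
ratio = (DATA_DISKS * DISK_BYTES) // CACHE_BYTES

# Time for one end-to-end pass over a single member disk.
sweep_hours = DISK_BYTES / PLATTER_BYTES_PER_S / 3600

print(f"1 in {ratio}")             # 1 in 160
print(f"{sweep_hours:.1f} h")      # 3.7 h
```

That lands close to the 3.5 hours estimated above.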


> > If the disk is doing other work then it only has the extra capacity so
> > it could destage slower.
> >
> > A lot depends on how big each chunk is.     The lvmcache indicates the
> > smallest chunksize is 32k.
> >
> > 100G / 32k = 3 million, and at 100seeks/sec that comes to at least an
> > hour.
>
> You are off by an order of magnitude: 3 million I/Os at 100 IOPS means
> ~30000 s, so about 9 hours.

Right, I did the calc in my head and screwed it up.  I thought it
should have been higher but did not re-check it.
>
> > Lvm bookkeeping has to also be written to the spinning disks I would
> > think, so 2 hours if the array were idle.
> >
> > Throw in a 50% baseload on the disks and you get 4 hours.
> >
> > Hours is reasonable.
>
> If flushing happens in random disk order, then yes, you are bound to
> wait several hours indeed.
>


* Re: [linux-lvm] lvconvert --uncache takes hours
  2023-03-02 18:33     ` Roger Heflin
@ 2023-03-02 20:47       ` Gionatan Danti
  0 siblings, 0 replies; 8+ messages in thread
From: Gionatan Danti @ 2023-03-02 20:47 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Roger Heflin, Malin Bruland

On 2023-03-02 19:33, Roger Heflin wrote:
> On Thu, Mar 2, 2023 at 11:44 AM Gionatan Danti <g.danti@assyoma.it>
> wrote:
> It is a 100G cache over 16TB, so even if it flushes in order, the chunks may
> not be that close to each other (1 in 160).

Yes, but destaging in LBA order (albeit far apart) is much better than 
in random order.

> Also, if chunks are selected and added to the cache over time, the cache is
> not in LBA order on the SSD, and proper coalescing would require reading
> the entire cache and sorting the 3,000,000 location entries before
> starting the de-stage.  If I had to guess, that complication has likely
> not been coded yet, and the de-stage simply starts at the
> beginning of the cache and continues to the end.

I would expect reordering and coalescing to happen in a reasonably sized
window (i.e. collect 64 MB of data, reorder it and flush it). At the same
time, considering how lvmcache works, you are probably right: cached
chunks are going to be flushed as discovered (random order).
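
To illustrate the windowed reordering idea, here is a toy model. It is not
how lvmcache or dm-cache actually flushes; the dirty-chunk positions are
random sample data, head travel (the sum of LBA jumps) is only a crude
stand-in for seek time, and all names are hypothetical:

```python
import random


def head_travel(lbas: list[int]) -> int:
    """Sum of absolute LBA jumps when visiting chunks in the given order."""
    return sum(abs(b - a) for a, b in zip(lbas, lbas[1:]))


def windowed_order(lbas: list[int], window: int) -> list[int]:
    """Sort each consecutive window of chunks by LBA, keeping window order."""
    out: list[int] = []
    for start in range(0, len(lbas), window):
        out.extend(sorted(lbas[start:start + window]))
    return out


CHUNK_BYTES = 32 * 1024
ORIGIN_CHUNKS = 16 * 10**12 // CHUNK_BYTES  # 16TB origin, in 32k chunks
WINDOW = 64 * 1024**2 // CHUNK_BYTES        # 64 MB window = 2048 chunks

rng = random.Random(0)                      # fixed seed for repeatability
dirty = rng.sample(range(ORIGIN_CHUNKS), 20_000)  # chunks in discovery order
reordered = windowed_order(dirty, WINDOW)

print(head_travel(dirty) / head_travel(reordered))  # reduction factor
```

In this model each window becomes a single pass across the disk rather
than thousands of long seeks, which is the gain expected from LBA-ordered
destaging.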

> Even if it were coded, though, if you have enough blocks cached and the
> blocks are spread, say, one or two per track, it would break down to having
> to write a tiny bit on each track with seeks in between, which mostly comes
> down to the time required to simply read/write the HD end to end.  At
> 150MB/sec (should be about the platter speed) that would take 3.5
> hours.

Which (apart from being an absolute worst case) would be way better than
what is required for totally random IO.

Regards.

-- 
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

