* [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
@ 2020-03-22 17:57 Scott Mcdermott
  2020-03-23  8:26 ` Joe Thornber
  0 siblings, 1 reply; 11+ messages in thread
From: Scott Mcdermott @ 2020-03-22 17:57 UTC (permalink / raw)
  To: linux-lvm

have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
these disks are hosted by a small 4GB big.LITTLE ARM system
running 4.4.192-rk3399 (armbian 5.98 bionic).  parameters were set
with: lvconvert --type cache --cachemode writeback --cachepolicy smq
--cachesettings migration_threshold=10000000

the system lost power recently.  after coming back up from the crash
and resyncing the raid pair (no issues there), I get the disks online
and "lvchange -ay" the dm-cache device.  the system immediately starts
consuming all memory: over the next few seconds used memory climbs to
3500M (last value I can see), cached drops to 163M (last value I can
see), and the kernel kills every process (they are faulting in
ext4_filemap_fault -> out_of_memory -> oom_kill_process)

this happens every time I reboot and try it.  is 4G just not enough
system memory to run a dm-cache device? it was in perfect working
order before the crash, nicely optimizing my nightly rsyncs on top
of the cached device.  why does it go crazy when brought online (third
second in the "dstat 1" output below), crashing the system? how do I
get it online so I can move my data off it?

usr sys idl wai stl| read  writ| int   csw | used  free  buff  cach
  2   3  95   0   0|   0     0 | 450   594 |93.9M  413M 10.3M 3184M
  0   0 100   0   0|   0     0 |  78   130 |93.4M  413M 10.3M 3184M
  2   6  91   1   0|8503k    0 |2447  4554 | 111M  393M 10.3M 3187M
  5   9  78   7   0|  11M   34k|7932    16k| 145M  358M 10.3M 3187M
  3  25  60  12   0| 277M 6818k|5385    10k| 602M 22.1M 10.4M 3068M
  9  64  21   5   0| 363M  124M|5509  9276 |1464M 39.9M 10.3M 2211M
  2  31  40  27   0| 342M  208M|5487  9495 |1671M 21.8M 10.3M 2027M
  1  10  63  26   0|  96M  128M|2197  4051 |1698M 23.9M 10.3M 1999M
  1  15  55  29   0| 138M  225M|2361  5007 |1730M 25.3M 10.3M 1966M
  3  16  54  28   0| 163M  234M|3118  5021 |1768M 23.8M 10.3M 1930M
  1   8  58  33   0|  85M  128M|1541  2860 |1795M 24.3M 10.3M 1904M
  2  10  53  35   0|  97M  161M|1949  3275 |1820M 24.1M 10.3M 1879M
  3  16  55  26   0| 148M  235M|2927  4733 |1865M 24.1M 10.3M 1835M
  1   9  65  25   0|  83M  137M|1764  3521 |1891M 23.9M 10.3M 1810M
  5  59  29   6   0| 340M   97M|4291  5530 |3569M 39.5M 10.3M  163M
  2  22  51  25   0| 339M  236M|5985  9526 |3500M  109M 10.3M  163M

note: I tried adding some swap on an unrelated slow disk, which seems
to delay it by a few seconds, but the ultimate result is always the
same: OOM kills every process and the system reboots...

here is a paste from "lvs -a" just before the "lvchange -ay
raidbak4/bakvol4" which brings the system down:

  LV                VG       Attr       LSize    Pool        Origin
  [bakcache4]       raidbak4 Cwi---C--- <931.38g
  [bakcache4_cdata] raidbak4 Cwi------- <931.38g
  [bakcache4_cmeta] raidbak4 ewi-------   48.00m
  bakvol4           raidbak4 Cwi---C---    1.75t [bakcache4] [bakvol4_corig]
  [bakvol4_corig]   raidbak4 owi---C---    1.75t
  [lvol0_pmspare]   raidbak4 ewi-------   48.00m

this lvchange then brings the system down with OOM death within 10 or
so seconds.  online access to the cached data seems to be
impossible...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-22 17:57 [linux-lvm] when bringing dm-cache online, consumes all memory and reboots Scott Mcdermott
@ 2020-03-23  8:26 ` Joe Thornber
  2020-03-23  9:57   ` Zdenek Kabelac
  2020-03-23 21:35   ` Scott Mcdermott
  0 siblings, 2 replies; 11+ messages in thread
From: Joe Thornber @ 2020-03-23  8:26 UTC (permalink / raw)
  To: LVM general discussion and development

On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
> have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
> data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
> these disks are hosted by a small 4GB big.LITTLE ARM system
> running 4.4.192-rk3399 (armbian 5.98 bionic).  parameters were set
> with: lvconvert --type cache --cachemode writeback --cachepolicy smq
> --cachesettings migration_threshold=10000000

If you crash then the cache assumes all blocks are dirty and performs
a full writeback.  You have set the migration_threshold extremely high
so I think this writeback process is just submitting far too much io at once.

Bring it down to around 2048 and try again.
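
For instance (using the VG/LV names from your original message; untested
here, so double check against lvchange(8)):

  lvchange --cachesettings migration_threshold=2048 raidbak4/bakvol4

This stores the setting in the lvm2 metadata, so it should take effect
at activation time rather than only after the device is live.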

- Joe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-23  8:26 ` Joe Thornber
@ 2020-03-23  9:57   ` Zdenek Kabelac
  2020-03-23 16:26     ` John Stoffel
  2020-03-23 22:02     ` Scott Mcdermott
  2020-03-23 21:35   ` Scott Mcdermott
  1 sibling, 2 replies; 11+ messages in thread
From: Zdenek Kabelac @ 2020-03-23  9:57 UTC (permalink / raw)
  To: LVM general discussion and development

On 23. 03. 20 at 9:26, Joe Thornber wrote:
> On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
>> have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
>> data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
>> these disks are hosted by a small 4GB big.LITTLE ARM system
>> running 4.4.192-rk3399 (armbian 5.98 bionic).  parameters were set
>> with: lvconvert --type cache --cachemode writeback --cachepolicy smq
>> --cachesettings migration_threshold=10000000
> 
> If you crash then the cache assumes all blocks are dirty and performs
> a full writeback.  You have set the migration_threshold extremely high
> so I think this writeback process is just submitting far too much io at once.
> 
> Bring it down to around 2048 and try again.
> 

Hi

Users should be doing some benchmarking to find the 'useful' sizes of
hotspot areas - using nearly 1T of cache for 1.8T of origin doesn't look
like the right ratio for caching.
(i.e. as if your CPU cache were half the size of your DRAM)

Too big a 'cache size' usually leads to way too big caching chunks
(since we try to limit the number of 'chunks' in cache to 1 million - you
can raise this limit - but it will consume a lot of your RAM space as well)
So IMHO I'd recommend using at most 512K chunks - which gives you
about 256GiB of cache size - but still users should benchmark what is
the best for them...
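
For example, something along these lines (a sketch from memory - double
check lvconvert(8); IIRC the 1 million limit is the
allocation/cache_pool_max_chunks setting in lvm.conf):

  # detach the cache, keeping the fast LV around
  lvconvert --splitcache raidbak4/bakvol4
  # re-attach it as a cache pool with an explicit 512K chunk size
  lvconvert --type cache --cachepool raidbak4/bakcache4 \
            --chunksize 512k raidbak4/bakvol4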

Another hint - lvm2 introduced support for the new dm-writecache target
as well.  So if you intend to accelerate mainly 'write throughput',
dm-cache isn't the one with the highest performance here.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-23  9:57   ` Zdenek Kabelac
@ 2020-03-23 16:26     ` John Stoffel
  2020-03-23 22:02     ` Scott Mcdermott
  1 sibling, 0 replies; 11+ messages in thread
From: John Stoffel @ 2020-03-23 16:26 UTC (permalink / raw)
  To: LVM general discussion and development

>>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:

Zdenek> On 23. 03. 20 at 9:26, Joe Thornber wrote:
>> On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
>>> have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
>>> data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
>>> these disks are hosted by a small 4GB big.LITTLE ARM system
>>> running 4.4.192-rk3399 (armbian 5.98 bionic).  parameters were set
>>> with: lvconvert --type cache --cachemode writeback --cachepolicy smq
>>> --cachesettings migration_threshold=10000000
>> 
>> If you crash then the cache assumes all blocks are dirty and performs
>> a full writeback.  You have set the migration_threshold extremely high
>> so I think this writeback process is just submitting far too much io at once.
>> 
>> Bring it down to around 2048 and try again.
>> 

Zdenek> Hi

Zdenek> Users should be doing some benchmarking to find the 'useful' sizes of
Zdenek> hotspot areas - using nearly 1T of cache for 1.8T of origin doesn't look
Zdenek> like the right ratio for caching.
Zdenek> (i.e. as if your CPU cache were half the size of your DRAM)

Zdenek> Too big a 'cache size' usually leads to way too big caching
Zdenek> chunks (since we try to limit the number of 'chunks' in cache
Zdenek> to 1 million - you can raise this limit - but it will consume a
Zdenek> lot of your RAM space as well) So IMHO I'd recommend using at
Zdenek> most 512K chunks - which gives you about 256GiB of cache size
Zdenek> - but still users should benchmark what is the best for
Zdenek> them...

I think dm-cache should be smarter as well, and not let users bring the
system to its knees with outrageous numbers.  When a user sets a
migration_threshold that high, there needs to be a safety check so that
the system doesn't use too much memory, and it should respond to memory
pressure instead.

Also, can you change the migration_threshold without activating?  Or
while activated?

John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-23  8:26 ` Joe Thornber
  2020-03-23  9:57   ` Zdenek Kabelac
@ 2020-03-23 21:35   ` Scott Mcdermott
  1 sibling, 0 replies; 11+ messages in thread
From: Scott Mcdermott @ 2020-03-23 21:35 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 23, 2020 at 1:26 AM Joe Thornber <thornber@redhat.com> wrote:
> On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
> > [system crashed, uses all memory when brought online...]
> > parameters were set with: lvconvert --type cache
> > --cachemode writeback --cachepolicy smq
> > --cachesettings migration_threshold=10000000
>
> If you crash then the cache assumes all blocks are dirty and performs
> a full writeback.  You have set the migration_threshold extremely high
> so I think this writeback process is just submitting far too much io at once.
>
> Bring it down to around 2048 and try again.

the device wasn't visible in "dmsetup table" prior to activation, so I tried:

  lvchange -ay raidbak4/bakvol4; dmsetup message raidbak4-bakvol4 0 \
      migration_threshold 204800

but this continued to crash; apparently the value used at activation
time is enough to bring the system down.  instead, using:

  lvchange --cachesettings migration_threshold=204800 raidbak4/bakvol4
  lvchange -ay raidbak4/bakvol4

it worked, and the disk bandwidth used was much lower (which I don't
want it to be, but a functioning system is needed for the thing to
work at all).  after some time doing a lot of I/O, it went silent and
is presumably flushed; it seems to be in working order, thanks.
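
(to verify, I believe the effective value also shows up in the cache
status line - e.g.:

  dmsetup status raidbak4-bakvol4

should list "migration_threshold 204800" among the core args, if I read
the dm-cache kernel doc right)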

so I have to experiment to find the highest migration_threshold value
that won't crash my system with OOM? I don't want any cache bandwidth
restriction; it should saturate and use all available bandwidth to
aggressively promote (in my frequent case the working set would
actually fit entirely in cache, but it's ok if the cache learns this
slowly from usage).

seems like I should be able to use a value that means "use all
available bandwidth" without it taking down my system with OOM.  even
if I tune the value by hand, some pathological circumstance might push
beyond where I tested and crash my system again.  is there some safe
calculation I can use to determine the maximum amount?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-23  9:57   ` Zdenek Kabelac
  2020-03-23 16:26     ` John Stoffel
@ 2020-03-23 22:02     ` Scott Mcdermott
  2020-03-24  9:43       ` Zdenek Kabelac
  1 sibling, 1 reply; 11+ messages in thread
From: Scott Mcdermott @ 2020-03-23 22:02 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 23, 2020 at 2:57 AM Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 23. 03. 20 at 9:26, Joe Thornber wrote:
> > On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
> > > have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
> > > data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
>
> Users should be doing some benchmarking to find the 'useful' sizes of
> hotspot areas - using nearly 1T of cache for 1.8T of origin doesn't look
> like the right ratio for caching.
> (i.e. as if your CPU cache were half the size of your DRAM)

the 1.8T origin will be upgraded over time with larger/more spinning
disks, but the cache will remain as it is.  hopefully it can perform
well whether it is 1:2 cache:data as now, or 1:10+ later.

> Too big a 'cache size' usually leads to way too big caching chunks
> (since we try to limit the number of 'chunks' in cache to 1 million - you
> can raise this limit - but it will consume a lot of your RAM space as well)
> So IMHO I'd recommend using at most 512K chunks - which gives you
> about 256GiB of cache size - but still users should benchmark what is
> the best for them...

how do I raise this limit? since I'm low on RAM this is a problem, but
why are large chunks an issue besides memory usage? do they cause
unnecessary I/O through an amplification effect? if my system doesn't
have enough memory for this job I will have to find a host board with
more RAM.

> Another hint - lvm2 introduced support for the new dm-writecache target as well.

this won't work for me since a lot of my workload is reads, and I'm low
on memory with large numbers of files.  rsync of large trees is the
main workload; the existing algorithm is not working fantastically
well, but it nonetheless gives a nice boost to my rsync completion
times over the uncached times.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-23 22:02     ` Scott Mcdermott
@ 2020-03-24  9:43       ` Zdenek Kabelac
  2020-03-24 11:37         ` Gionatan Danti
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2020-03-24  9:43 UTC (permalink / raw)
  To: LVM general discussion and development, Scott Mcdermott

On 23. 03. 20 at 23:02, Scott Mcdermott wrote:
> On Mon, Mar 23, 2020 at 2:57 AM Zdenek Kabelac <zkabelac@redhat.com> wrote:
>> On 23. 03. 20 at 9:26, Joe Thornber wrote:
>>> On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
>>>> have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
>>>> data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
>>
>> Users should be doing some benchmarking to find the 'useful' sizes of
>> hotspot areas - using nearly 1T of cache for 1.8T of origin doesn't look
>> like the right ratio for caching.
>> (i.e. as if your CPU cache were half the size of your DRAM)
> 
> the 1.8T origin will be upgraded over time with larger/more spinning
> disks, but the cache will remain as it is.  hopefully it can perform
> well whether it is 1:2 cache:data as now or 1:10+ as later.

Hi

Here is my personal 'experience' - if you need that much data in 'fast'
cache, it's probably still better to use genuinely fast storage for the
whole volume.  I'd also question whether your hw is powerful enough to
handle that much data if you are already struggling with memory.
You should probably size your cache based on a realistic calculation of
how much data can effectively flow through your system.

Note - there is the 'dmstats' tool to analyze 'hotspot' areas on your
storage (the more detail you want, the more memory it will take).
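
E.g. something like (syntax from memory - see dmstats(8)):

  dmstats create --areas 100 /dev/raidbak4/bakvol4
  dmstats report /dev/raidbak4/bakvol4

splits the LV into 100 regions and reports per-region I/O counters, so
you can see where the hotspots really are.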

dm-cache hotspot caching is efficient for repeatedly accessed data.
If the workload is 'streaming' large data sets without some rather
focused working areas on your disk, the overall performance might
actually be degraded (that's why I'd recommend big fast storage for the
whole data set).


> 
>> Too big a 'cache size' usually leads to way too big caching chunks
>> (since we try to limit the number of 'chunks' in cache to 1 million - you
>> can raise this limit - but it will consume a lot of your RAM space as well)
>> So IMHO I'd recommend using at most 512K chunks - which gives you
>> about 256GiB of cache size - but still users should benchmark what is
>> the best for them...
> 
> how to raise this limit? since I'm low RAM this is a problem, but why
> are large chunks an issue, besides memory usage? is this causing
> unnecessary I/O by an amplification effect? if my system doesn't have
> enough memory for this job I will have to find a host board with more
> RAM.

The cache manages its counters in RAM - so the more 'chunks' the cache
has, the more memory is consumed (possibly seriously crippling the
performance of your system, stressing swap and running low on resources).
The number of cache chunks is very simple math: just divide the size of
your caching device by the size of a caching chunk.

Just like your CPU uses your RAM for page descriptors...

The smaller the cache chunks, the smaller the I/O load when a chunk is
'promoted'/'demoted' between the caching device and the origin device,
and the more 'efficiently/precisely' the disk area is cached.

And as you have figured out yourself, this load is BIG!

By default we require the migration threshold to be at least 8 chunks.
So with big chunks, like 2MiB in size, that gives you 16MiB of required
I/O threshold.

So if you e.g. read 4K from disk, it may cause the I/O load of a whole
2MiB chunk promotion into the cache - so you can see the math here...
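
Rough numbers for your setup (taking the ~931GiB cdata size from your
lvs output):

  931GiB / 512KiB chunks = ~1.9 million chunks (over the 1 million limit)
  931GiB /   1MiB chunks =  ~950k chunks (just under the limit)
  8 chunks * 1MiB        =  8MiB minimum migration threshold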


>> Another hint - lvm2 introduced support for the new dm-writecache target as well.
> 
> this won't work for me since a lot of my data is reads, and I'm low
> memory with large numbers of files.  rsync of large trees is the main
> workload; existing algorithm is not working fantastically well, but
> nonetheless giving a nice boost to my rsync completion times over the
> uncached times.

If the main workload is to read the whole device over & over again,
likely no caching will enhance your experience and you may simply need
fast storage for the whole data set.

dm-cache targets 'hotspot' caching;
dm-writecache is like an 'extension of your page cache'.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-24  9:43       ` Zdenek Kabelac
@ 2020-03-24 11:37         ` Gionatan Danti
  2020-03-24 15:09           ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Gionatan Danti @ 2020-03-24 11:37 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Scott Mcdermott

On 2020-03-24 10:43, Zdenek Kabelac wrote:
> By default we require migration threshold to be at least 8 chunks big.
> So with big chunks like 2MiB in size - gives you 16MiBof required I/O 
> threshold.
> 
> So if you do i.e. read 4K from disk - it may cause i/o load of 2MiB
> chunk block promotion into cache - so you can see the math here...

Hi Zdenek, I am not sure I follow your description of
migration_threshold. From the dm-cache kernel doc:

"Migrating data between the origin and cache device uses bandwidth.
The user can set a throttle to prevent more than a certain amount of
migration occurring at any one time.  Currently we're not taking any
account of normal io traffic going to the devices.  More work needs
doing here to avoid migrating during those peak io moments.
For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 2048 sectors (1MB)."

Can you better explain what migration_threshold really accomplishes? Is
it a "max bandwidth cap" setting, or something more?

> If the main workload is to read whole device over & over again likely
> no caching will enhance your experience and you may simply need fast
> whole
> storage.

From what I understand the OP wants to cache filesystem metadata to
speed up rsync directory traversal. So a cache device should definitely
be useful; albeit dm-cache is "blind" in regard to data vs metadata, the
latter should be a good candidate for hotspot promotion.

For reference, I have a ZFS system used for exactly such a workload
(backup with rsnapshot, which uses rsync and hardlinks to create
deduplicated backups), and setting cache=metadata (rather than "all",
i.e. data and metadata) gives a very noticeable boost to rsync traversal.
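
(if I remember the ZFS property names right, that is e.g.:

  zfs set secondarycache=metadata tank/backup

for an L2ARC cache device; primarycache controls the in-RAM ARC the
same way)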

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-24 11:37         ` Gionatan Danti
@ 2020-03-24 15:09           ` Zdenek Kabelac
  2020-03-24 22:35             ` Gionatan Danti
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2020-03-24 15:09 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti; +Cc: Scott Mcdermott

On 24. 03. 20 at 12:37, Gionatan Danti wrote:
> On 2020-03-24 10:43, Zdenek Kabelac wrote:
>> By default we require migration threshold to be at least 8 chunks big.
>> So with big chunks like 2MiB in size - gives you 16MiBof required I/O 
>> threshold.
>>
>> So if you do i.e. read 4K from disk - it may cause i/o load of 2MiB
>> chunk block promotion into cache - so you can see the math here...
> 
> Hi Zdenek, I am not sure I follow your description of
> migration_threshold. From the dm-cache kernel doc:
> 
> "Migrating data between the origin and cache device uses bandwidth.
> The user can set a throttle to prevent more than a certain amount of
> migration occurring at any one time.� Currently we're not taking any
> account of normal io traffic going to the devices.� More work needs
> doing here to avoid migrating during those peak io moments.
> For the time being, a message "migration_threshold <#sectors>"
> can be used to set the maximum number of sectors being migrated,
> the default being 2048 sectors (1MB)."
> 
> Can you better explain what migration_threshold really accomplishes? Is
> it a "max bandwidth cap" setting, or something more?
> 

In the past we had a problem where users with a huge chunk size and a
small 'migration_threshold' found the cache unable to demote chunks
from the cache to the origin device (the amount of data 'required' for
a demotion was bigger than what the threshold allowed).

So lvm2/libdm implemented a protection that always sets at least
8 chunks as the bare minimum.

Now we clearly face the problem from 'the other side' - users have way
too big chunks (we've seen users with 128M chunks) - so the threshold
gets set to 1G and users face a serious bottleneck on the cache side,
doing too many promotions/demotions.

We will likely fix this by setting the max chunk size somewhere around 2MiB.

But the threshold is not an ultimate protection against overloading
users' systems.  Nothing there monitors the actual 'bandwidth' of disk
throughput; it's rather just a simple limitation on how many bytes
(chunks) can be shifted between origin & cache...
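
E.g. with 2MiB chunks the lvm2 floor works out to:

  8 chunks * 2MiB = 16MiB = 32768 sectors

versus the kernel's own default of 2048 sectors (1MiB) mentioned in the
doc you quoted.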

>> If the main workload is to read the whole device over & over again,
>> likely no caching will enhance your experience and you may simply
>> need fast storage for the whole data set.
> 
> From what I understand the OP wants to cache filesystem metadata to
> speed up rsync directory traversal. So a cache device should definitely
> be useful; albeit dm-cache is "blind" in regard to data vs metadata,
> the latter should be a good candidate for hotspot promotion.
> 
> For reference, I have a ZFS system used for exactly such a workload
> (backup with rsnapshot, which uses rsync and hardlinks to create
> deduplicated backups), and setting cache=metadata (rather than "all",
> i.e. data and metadata) gives a very noticeable boost to rsync traversal.

Yeah - if you only read 'directory' metadata structures, it's perfectly
OK with caching; if you do a full data read of the whole storage it's
not going to help (which is what I meant).

The main message is: the cache is there to accelerate often-read disk
blocks.  If there is no hotspot and the disk is mostly read equally
over the whole address space, there will be no big benefit from cache
usage.

However, the main message should be: users should think about the sizes
of caching devices and their implications - there is no universal
setting that fits all users best.


Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-24 15:09           ` Zdenek Kabelac
@ 2020-03-24 22:35             ` Gionatan Danti
  2020-03-25  8:55               ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Gionatan Danti @ 2020-03-24 22:35 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Mcdermott, Scott, LVM general discussion and development

On 2020-03-24 16:09, Zdenek Kabelac wrote:
> In the past we had a problem where users with a huge chunk size and a
> small 'migration_threshold' found the cache unable to demote chunks
> from the cache to the origin device (the amount of data 'required' for
> a demotion was bigger than what the threshold allowed).
> 
> So lvm2/libdm implemented a protection that always sets at least
> 8 chunks as the bare minimum.
> 
> Now we clearly face the problem from 'the other side' - users have way
> too big chunks (we've seen users with 128M chunks) - so the threshold
> gets set to 1G and users face a serious bottleneck on the cache side,
> doing too many promotions/demotions.
> 
> We will likely fix this by setting the max chunk size somewhere around
> 2MiB.

Thanks for the explanation. Maybe it's a naive proposal, but can't you
simply set migration_threshold equal to a single chunk for chunks sized
>2M, and 8 chunks for smaller ones?

> Yeah - if you only read 'directory' metadata structures, it's
> perfectly OK with caching; if you do a full data read of the whole
> storage it's not going to help (which is what I meant).
> 
> The main message is: the cache is there to accelerate often-read disk
> blocks.
> If there is no hotspot and the disk is mostly read equally over the
> whole address space, there will be no big benefit from cache usage.
> 
> However, the main message should be: users should think about the
> sizes of caching devices and their implications - there is no
> universal setting that fits all users best.

Sure.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] when bringing dm-cache online, consumes all memory and reboots
  2020-03-24 22:35             ` Gionatan Danti
@ 2020-03-25  8:55               ` Zdenek Kabelac
  0 siblings, 0 replies; 11+ messages in thread
From: Zdenek Kabelac @ 2020-03-25  8:55 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Scott Mcdermott, LVM general discussion and development

On 24. 03. 20 at 23:35, Gionatan Danti wrote:
> On 2020-03-24 16:09, Zdenek Kabelac wrote:
>> In the past we had a problem where users with a huge chunk size and a
>> small 'migration_threshold' found the cache unable to demote chunks
>> from the cache to the origin device (the amount of data 'required' for
>> a demotion was bigger than what the threshold allowed).
>>
>> So lvm2/libdm implemented a protection that always sets at least
>> 8 chunks as the bare minimum.
>>
>> Now we clearly face the problem from 'the other side' - users have way
>> too big chunks (we've seen users with 128M chunks) - so the threshold
>> gets set to 1G and users face a serious bottleneck on the cache side,
>> doing too many promotions/demotions.
>>
>> We will likely fix this by setting the max chunk size somewhere around
>> 2MiB.
>
> Thanks for the explanation. Maybe it's a naive proposal, but can't you
> simply set migration_threshold equal to a single chunk for chunks sized
> >2M, and 8 chunks for smaller ones?

Using large cache chunks likely degrades the usefulness and purpose of
the cache, though we are missing some comparative tables showing what
the optimal layouts should look like.

So the idea is not to just 'let it somehow work' but rather to move
users towards more efficient usage of available resources.

Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-03-25  8:55 UTC | newest]

Thread overview: 11+ messages
2020-03-22 17:57 [linux-lvm] when bringing dm-cache online, consumes all memory and reboots Scott Mcdermott
2020-03-23  8:26 ` Joe Thornber
2020-03-23  9:57   ` Zdenek Kabelac
2020-03-23 16:26     ` John Stoffel
2020-03-23 22:02     ` Scott Mcdermott
2020-03-24  9:43       ` Zdenek Kabelac
2020-03-24 11:37         ` Gionatan Danti
2020-03-24 15:09           ` Zdenek Kabelac
2020-03-24 22:35             ` Gionatan Danti
2020-03-25  8:55               ` Zdenek Kabelac
2020-03-23 21:35   ` Scott Mcdermott
