* What kind of memory is DAMON RECLAIM able to free?
From: Grzegorz Uriasz @ 2023-04-28 14:15 UTC (permalink / raw)
  To: damon; +Cc: dutkahugo

Hi!

I'm running some experiments using DAMON RECLAIM on the 6.2 kernel. I've 
set up a VM with free page reporting enabled, 16 vcores, and 16GB of 
RAM, with very aggressive memory reclamation settings. My kernel boot 
line includes:
- transparent_hugepage=never
- page_reporting.page_reporting_order=0
- damon_reclaim.enabled=Y
- damon_reclaim.min_age=10000000
- damon_reclaim.wmarks_low=0
- damon_reclaim.wmarks_mid=999
- damon_reclaim.wmarks_high=1000
- damon_reclaim.quota_sz=1073741824
- damon_reclaim.quota_reset_interval_ms=1000

The memory usage of the VM starts at 800 MB. After running some 
workloads and ballooning the VM to 16 GB, DAMON RECLAIM was able to 
quickly bring the memory usage back down to 3GB, after which it just 
stopped doing anything. What concerns me is that 20% (3.2GB for that VM) 
is the default low watermark in the DAMON RECLAIM module. I've verified 
that the watermarks were properly set in sysfs to my custom values, but 
it doesn't seem to affect anything, as free -mh shows 400MB for apps but 
2.6GB for caches/buffers. Despite idling for a very long time, the VM 
isn't able to free the buffers. When I drop the caches manually using 
/proc/sys/vm/drop_caches, the memory usage returns to the starting 
value. The caches/buffers don't grow at all after dropping them, 
indicating that this memory was indeed idle.

My questions:
1. Are there types of freeable memory which DAMON is not allowed to touch?
2. What prevents DAMON from getting back the memory?
3. /sys/kernel/debug/damon/* seems separate from DAMON RECLAIM: 
/sys/module/damon_reclaim/parameters/kdamond_pid shows DAMON RECLAIM is 
running, but the DAMON debugfs neither shows it nor exposes any registered 
reclamation schemes.

Best Regards,
Grzegorz Uriasz



* Re: What kind of memory is DAMON RECLAIM able to free?
From: SeongJae Park @ 2023-05-02  1:27 UTC (permalink / raw)
  To: Grzegorz Uriasz; +Cc: damon, dutkahugo

Hi Grzegorz,

On Fri, 28 Apr 2023 16:15:12 +0200 Grzegorz Uriasz <gorbak25@gmail.com> wrote:

> Hi!
> 
> I'm running some experiments using DAMON RECLAIM on the 6.2 kernel. I've 
> set up an VM with free page reporting enabled with 16 vcores and 16GB of 
> ram with very aggressive memory reclamation settings, my kernel boot 
> line includes:
> - transparent_hugepage=never
> - page_reporting.page_reporting_order=0
> - damon_reclaim.enabled=Y
> - damon_reclaim.min_age=10000000
> - damon_reclaim.wmarks_low=0
> - damon_reclaim.wmarks_mid=999
> - damon_reclaim.wmarks_high=1000
> - damon_reclaim.quota_sz=1073741824
> - damon_reclaim.quota_reset_interval_ms=1000
> 
> The memory usage of the VM starts at 800 MB, after running some 
> workloads and ballooning the VM to 16 GB DAMON RECLAIM was able to 
> quickly bring the memory usage back down to 3GB, after which it just 
> stopped doing anything. What concerns me is that 20%(3.2GB for that VM) 
> is the default low watermark in the DAMON RECLAIM module. I've verified 
> that the watermarks were properly set in sysfs to my custom values, but 
> it doesn't seem to affect anything as free -mh shows 400Mb for apps but 
> 2.6GB for caches/buffers. The VM besides idling for a very long time 
> isn't able to free the buffers. When dropping the caches manually using 
> /proc/sys/vm/drop_caches the memory usage returns back to the starting 
> one. The cache/buffers don't increase at all after dropping them 
> indicating that this memory was indeed idling.

Thank you for sharing your great experience and questions!

> 
> My questions:
> 1. Are there types of freeable memory which DAMON is not allowed to touch?

Basically there is no such limitation.  We implemented a DAMOS filtering
feature based on page type and cgroups in v6.3, but as you're using v6.2, it
shouldn't be related to your use case.

One possible limitation for this case might be the monitoring region.  You can
specify the region to monitor and reclaim using the `monitor_region_{start,end}`
parameters.  By default, it's set to the biggest System RAM region.  If your
system has non-contiguous System RAM regions and the biggest one doesn't cover
the 3GiB region, the 3GiB region will not be monitored and therefore will not
be reclaimed.

Can you check if it is excluding the 3GiB region?  You may be able to find out
using files like `/proc/iomem`.  You could also refer to the DAMON user-space
tool to see how it uses the file[1].
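
As a rough illustration of that check, the sketch below parses text in the
`/proc/iomem` format, lists the top-level "System RAM" entries, and reports
which of them fall inside the biggest one.  The sample layout and the helper
names (`parse_system_ram`, `biggest_region`) are made up for illustration;
they are not taken from damo or from the VM in question.

```python
def parse_system_ram(iomem_text):
    """Return [(start, end), ...] for top-level "System RAM" entries."""
    regions = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            continue  # nested resource (e.g. "Kernel code"), not a top-level entry
        addr_range, _, name = line.partition(" : ")
        if name.strip() != "System RAM":
            continue
        start, end = (int(x, 16) for x in addr_range.split("-"))
        regions.append((start, end))
    return regions


def biggest_region(regions):
    """Pick the largest region, mirroring the default described above."""
    return max(regions, key=lambda r: r[1] - r[0])


# Hypothetical layout, loosely shaped like a 16GB VM; not real output.
SAMPLE = """\
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-bffd9fff : System RAM
  01000000-01ffffff : Kernel code
100000000-43fffffff : System RAM
"""

regions = parse_system_ram(SAMPLE)
big_start, big_end = biggest_region(regions)
for start, end in regions:
    size_gib = (end - start + 1) / 2**30
    covered = big_start <= start and end <= big_end
    print(f"{start:#x}-{end:#x}  {size_gib:6.2f} GiB  monitored by default: {covered}")
```

On a real system the text would come from reading `/proc/iomem` as root;
unprivileged reads generally show the addresses zeroed out.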

Also, you could get DAMON_RECLAIM internal statistics[2].  Checking those could
also provide some hints, or help rule out unlikely suspects.
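
A minimal sketch of reading those statistics follows.  The parameter names
are the ones listed in the DAMON_RECLAIM documentation[2]; the helper names
and the "success ratio" summary are only an illustrative assumption about
what might be worth looking at.

```python
from pathlib import Path

# Statistic names as documented for DAMON_RECLAIM module parameters.
STAT_NAMES = (
    "nr_reclaim_tried_regions",
    "bytes_reclaim_tried_regions",
    "nr_reclaimed_regions",
    "bytes_reclaimed_regions",
    "nr_quota_exceeds",
)


def read_reclaim_stats(param_dir="/sys/module/damon_reclaim/parameters"):
    """Read whichever of the statistic files exist under param_dir."""
    stats = {}
    for name in STAT_NAMES:
        path = Path(param_dir) / name
        if path.exists():
            stats[name] = int(path.read_text().strip())
    return stats


def reclaim_success_ratio(stats):
    """Fraction of the bytes DAMON_RECLAIM tried to reclaim that were reclaimed."""
    tried = stats.get("bytes_reclaim_tried_regions", 0)
    reclaimed = stats.get("bytes_reclaimed_regions", 0)
    return reclaimed / tried if tried else 0.0


# Made-up numbers, only to show the call shape.
example = {"bytes_reclaim_tried_regions": 4096000, "bytes_reclaimed_regions": 1024000}
print(reclaim_success_ratio(example))
```

If the tried counters stay near zero, the scheme is probably not finding
regions it considers cold; if `nr_quota_exceeds` keeps growing, the quota is
the more likely limit.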

> 2. What prevents DAMON from getting back the memory?

Other than quotas, watermarks and access pattern, there should be nothing
preventing DAMON_RECLAIM from reclaiming memory on the v6.2 kernel.  DAMOS
filters could also have some effect, but as mentioned above, they're available
only from v6.3.

> 3. /sys/kernel/debug/damon/* seems separate from DAMON RECLAIM, 
> /sys/module/damon_reclaim/parameters/kdamond_pid shows DAMON RECLAIM is 
> running but the DAMON debugfs doesn't show it nor exposes any registered 
> reclamation schemes.

You're correct.  DAMON provides two main user interfaces, via debugfs
(/sys/kernel/debug/damon/) and sysfs (/sys/kernel/mm/damon/).  Those are for
fine-grained control of all DAMON capabilities.  Btw, the debugfs interface is
deprecated now, so please use the sysfs interface.

DAMON modules like DAMON_RECLAIM and DAMON_LRU_SORT are for simpler control of
DAMON for specific system-wide purposes only, like proactive reclaim and LRU
list manipulation.  Hence those provide a simpler module parameters
interface.


[1] https://github.com/awslabs/damo/blob/next/_damo_paddr_layout.py
[2] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html#nr-reclaim-tried-regions


Thanks,
SJ

> 
> Best Regards,
> Grzegorz Uriasz


* Re: What kind of memory is DAMON RECLAIM able to free?
From: Grzegorz Uriasz @ 2023-05-04 13:47 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon, dutkahugo, Grzegorz Uriasz

Hi SeongJae,

/* I apologize for the duplicate email but damon@lists.linux.dev 
rejected my previous message due to embedded pictures, moved the 
screenshots to imgur ;) */

Thank you for your help and for the link to the DAMON user space 
tools; they are very helpful.
I've checked the memory regions, and indeed there was a 3GB RAM region 
besides the largest System RAM region (https://imgur.com/a/q8gdV8b).

After setting the start of the monitoring region to 0, DAMON_RECLAIM 
suddenly became more responsive, and RAM reclamation became more 
immediate and useful. Unfortunately, there is still something which holds 
DAMON_RECLAIM back.
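
For reference, a change like this can be made at runtime through the module 
parameters. The sketch below is only an illustration: the helper name is 
made up, and it assumes the documented `commit_inputs` parameter is what 
makes a running DAMON_RECLAIM re-read its inputs. The demo writes into a 
temporary directory, so it touches nothing on the system.

```python
import tempfile
from pathlib import Path


def apply_reclaim_params(values, param_dir="/sys/module/damon_reclaim/parameters"):
    """Write each parameter, then ask DAMON_RECLAIM to re-read its inputs."""
    base = Path(param_dir)
    for name, value in values.items():
        (base / name).write_text(f"{value}\n")
    # Per the DAMON_RECLAIM docs, updated inputs are not applied to a
    # running instance until commit_inputs is set to Y.
    (base / "commit_inputs").write_text("Y\n")


# Demo against a scratch directory instead of the real sysfs path.
with tempfile.TemporaryDirectory() as scratch:
    apply_reclaim_params({"monitor_region_start": 0}, param_dir=scratch)
    print(sorted(p.name for p in Path(scratch).iterdir()))
```

Against the real path this needs root, and the values should be checked 
against the memory layout first.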

I've changed the memory region to start from 0 and ran the same workload 
as before. DAMON was able to reclaim 1GB more RAM compared to my 
previous tests; unfortunately, this still leaves 2GB of unused RAM in 
the caches :( Like before, after dropping the caches the cache usage 
never grows, indicating the RAM was idle. I've also checked that after 
raising the number of monitoring regions in DAMON_RECLAIM from 10 to 100 
the reclamation became more effective: DAMON_RECLAIM was able to reclaim 
0.4GB more than in the last test, but this still left 1.6GB of 
reclaimable RAM overall (https://imgur.com/a/FHww4XA).

My questions:
1) What is holding DAMON_RECLAIM back?
2) Is it possible to explicitly specify multiple monitoring regions in 
DAMON_RECLAIM, or do I need to configure DAMOS manually from userspace 
for that?
3) How do I find the number of monitoring regions where DAMOS is most 
effective?

Best Regards,
Grzegorz Uriasz



* Re: What kind of memory is DAMON RECLAIM able to free?
From: SeongJae Park @ 2023-05-04 17:17 UTC (permalink / raw)
  To: Grzegorz Uriasz; +Cc: SeongJae Park, damon, dutkahugo

Hi Grzegorz,

On Thu, 4 May 2023 15:47:25 +0200 Grzegorz Uriasz <gorbak25@gmail.com> wrote:

> Hi SeongJae,
> 
> /* I apologize for the duplicate email but damon@lists.linux.dev 
> rejected my previous message due to embedded pictures, moved the 
> screenshots to imgur ;) */

No problem, thank you for patiently posting again :)

Seems the screenshots are showing only text outputs.  Maybe you could simply
copy-paste those into the mail body later if you want to avoid uploading it to
imgur separately.

> 
> Thank you for your help and providing us a link to the DAMON user space 
> tools, they are very helpful.
> I've checked the memory regions and indeed there was a 3GB RAM region 
> besides the largest system ram(https://imgur.com/a/q8gdV8b).
> 
> After setting the start of the monitoring region to 0 DAMOS RECLAIM 
> suddenly became more responsive and ram reclamation became more 
> immediate and useful. Unfortunately there is still something which holds 
> DAMOS RECLAIM back.
> 
> I've changed the memory region to start from 0 and ran the same workload 
> as before, Damon was able to reclaim 1GB more ram compared to my 
> previous tests, unfortunately this still leaves 2GB's of unused RAM in 
> the caches :( Like before after dropping the caches the cache usage 
> never grows indicating the ram was idling. I've also checked that after 
> rising the amount of monitoring regions in DAMON_RECLAIM from 10 to 100 
> the reclamation became more effective, DAMON_RECLAIM was able to reclaim 
> 0.4GB more than in the last test, but this still left 1.6GB of 
> reclaimable ram overall(https://imgur.com/a/FHww4XA).

Thank you for sharing your great experiment results!

> 
> My questions:
> 1) What is holding DAMOS RECLAIM back?

Currently, DAMON utilizes its own monitoring accuracy-overhead tradeoff
mechanisms, namely Region Based Sampling[1] and Adaptive Regions Adjustment[2].
I guess DAMON is not showing the remaining 1.6GB as cold enough to be
reclaimed, due to the traded accuracy.

You can increase the accuracy at the cost of increased monitoring overhead by
increasing the {min,max}_nr_regions.  I think this explains why you saw
DAMON_RECLAIM reclaiming 0.4GB more memory after you increased min_nr_regions
from 10 to 100.  This also explains why dropping caches reclaimed 1.6GB more
memory.  Cache dropping and page fault-driven memory population work at page
granularity, so they could be more accurate than DAMON in general.

In detail, not every page on the system would be idle.  Based on the second
screenshot, we can assume that at least the 97MB of buff/cache memory is
still accessed.  My hypothesis is that the pages for the 97MB of memory are
quite evenly spread across the 2GB of memory that DAMON_RECLAIM is not
reclaiming.  In that case, because DAMON works with region-based sampling[1],
it will occasionally pick one of the 97MB pages as the page to sample, and
conclude the 2GB region is accessed.

For more detail, let's assume the pages are really evenly distributed, and the
pages are accessed at least once per DAMON's sampling interval, which is
5ms by default.  Then, in about 1/20 (97MB / 2GB) of the samplings, one of
the pages is picked as the sample page.  DAMON aggregates the sampling results
over its aggregation interval, which is 100ms, so it does 20 repeated
samplings.  So for most aggregation intervals, DAMON typically sees the region
as having at least one sample saying it was accessed.  So it concludes the 2GB
region is accessed about once per 20 samples, and doesn't reclaim the entire
2GB region.
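
The arithmetic above can be made a bit more concrete.  The sketch below is a
simplified model, not DAMON's implementation: it assumes each sample picks a
page uniformly at random and independently, that the hot pages are always
seen as accessed when picked, and that reclaim needs the region to show zero
accesses for every aggregation interval within `min_age` (10 seconds in the
reported setup).

```python
def p_seen_accessed(hot_fraction, samples_per_aggregation):
    """Chance that at least one sample in an aggregation interval hits a hot page."""
    return 1 - (1 - hot_fraction) ** samples_per_aggregation


def p_idle_for_min_age(hot_fraction, samples_per_aggregation, nr_intervals):
    """Chance that every aggregation interval within min_age sees no access."""
    p_idle_once = 1 - p_seen_accessed(hot_fraction, samples_per_aggregation)
    return p_idle_once ** nr_intervals


hot_fraction = 97 / 2048               # 97MB of hot pages spread over a 2GB region
samples = 100 // 5                     # 100ms aggregation / 5ms sampling = 20
nr_intervals = 10_000_000 // 100_000   # min_age (us) / aggregation interval (us) = 100

print(f"P(region looks accessed in one interval) = "
      f"{p_seen_accessed(hot_fraction, samples):.2f}")
print(f"P(region looks idle for the whole min_age) = "
      f"{p_idle_for_min_age(hot_fraction, samples, nr_intervals):.1e}")
```

Under these assumptions the region looks accessed in well over half of the
aggregation intervals, so the chance that it stays idle-looking for the whole
`min_age` is negligible, which is consistent with the 2GB never being
reclaimed.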

You may further increase the {min,max}_nr_regions to increase DAMON accuracy
and hence reclaim more memory.  Note that it will also increase the monitoring
overhead.  If you even increase the numbers to 'your system memory / page
size', DAMON will do the monitoring at page granularity[3], so it may provide
the best accuracy, the same as that of cache dropping.
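
To give a feel for the numbers, here is a small sketch computing 'system
memory / page size' and the resulting number of access checks per second.  It
assumes 4KiB pages, the 5ms default sampling interval mentioned above, and
one access check per region per sampling interval; the function names are
only illustrative.

```python
def page_granularity_regions(memory_bytes, page_size=4096):
    """Number of regions needed so that each region is a single page."""
    return memory_bytes // page_size


def access_checks_per_second(nr_regions, sampling_interval_us=5000):
    """One sampled page per region per sampling interval."""
    return nr_regions * 1_000_000 // sampling_interval_us


nr_regions = page_granularity_regions(16 * 2**30)  # 16GiB VM -> 4194304 regions
print(f"regions for page granularity: {nr_regions}")
print(f"access checks per second:     {access_checks_per_second(nr_regions)}")
print(f"with 100 regions:             {access_checks_per_second(100)}")
```

The gap between the last two lines is the accuracy-overhead tradeoff in
concrete terms.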

One possible way of reclaiming more memory while keeping the overhead lower
would be finding the memory regions that DAMON still thinks are hot, setting
the monitoring region to only those regions, and repeating this in a
divide-and-conquer manner.

> 2) Is it possible to explicitly specify multiple monitoring regions in 
> DAMOS RECLAIM or do i need to configure DAMOS manually from userspace 
> for that?

At the moment, DAMON_RECLAIM does not provide a way for such fine control.  So
you should use DAMOS manually.  With the DAMON user space tool, damo, you may
use the '--regions' option.

> 3) How to find the number of monitoring regions where DAMOS is most 
> effective?

You may simply try different numbers and watch the progress.  Visualizing
the monitored access pattern could also be helpful.  For the DAMOS-specific
case, checking DAMOS stats and tried_regions could also be useful.

Note that these limitations exist only in the current implementation.  There
are TODO items for improving DAMON accuracy and automating tuning.  Hopefully
future versions of DAMON will provide better accuracy and easier tuning.
Nevertheless, this is where we are at the moment.

Sending questions, sharing experience/usages, and participating in discussions
like this help the DAMON community learn about unexpected requirements, get new
ideas, and prioritize specific items.  Thank you for inspiring me with this.
Please feel free to ask anything if you need.

[1] https://docs.kernel.org/mm/damon/design.html#region-based-sampling
[2] https://docs.kernel.org/mm/damon/design.html#adaptive-regions-adjustment
[3] https://docs.kernel.org/mm/damon/faq.html#can-i-simply-monitor-page-granularity


Thanks,
SJ

> 
> Best Regards,
> Grzegorz Uriasz
> 
