linux-mm.kvack.org archive mirror
* Question about the laziness of MADV_FREE
@ 2018-11-29 17:46 Niklas Hambüchen
  2018-11-29 18:00 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Niklas Hambüchen @ 2018-11-29 17:46 UTC
  To: linux-mm

Hello,

I'm trying to investigate the memory behaviour of a program that uses madvise(MADV_FREE) to tell the kernel that it no longer uses some pages.

I'm seeing some things I can't quite explain, concerning when freeing happens and how it is accounted for in /proc/pid/smaps.

`man madvise` shows:

       MADV_FREE (since Linux 4.5)
              The application no longer requires the pages in the range
              specified by addr and len.  The kernel can thus free these
              pages, but the freeing could be delayed until memory pressure
              occurs.
              ...
              On a swapless system, freeing
              pages in a given range happens instantly, regardless of memory
              pressure.

https://www.kernel.org/doc/Documentation/filesystems/proc.txt says:

    "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
    The memory isn't freed immediately with madvise(). It's freed in memory
    pressure if the memory is clean. Please note that the printed value might
    be lower than the real value due to optimizations used in the current
    implementation. If this is not desirable please file a bug report.

First, I am on a swapless system.
Nevertheless, I do not observe freeing happening instantly.
Instead, freeing happens only under memory pressure.

For example, on a 64 GB RAM machine I have a process taking 30 GB resident memory ("RES" in tools like htop). After I put on memory pressure (for example using `stress-ng --vm-bytes 1G --vm-keep -m 50` to allocate and touch 50 GB), RES for that process decreases to 10 GB.

At the same time, I can see the number in LazyFree decrease during this operation.
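
For reference, a minimal sketch of how the LazyFree number can be read, assuming the usual smaps layout in which each mapping carries a "LazyFree:   N kB" line. The helper names lazyfree_kib and lazyfree_of_pid are made up for illustration and are not part of any existing tool:

```python
import re

def lazyfree_kib(smaps_text):
    """Sum every 'LazyFree:' field (reported in kB) in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        m = re.match(r"LazyFree:\s+(\d+)\s+kB", line)
        if m:
            total += int(m.group(1))
    return total

def lazyfree_of_pid(pid="self"):
    """Read /proc/<pid>/smaps (Linux only) and return the summed LazyFree in kB."""
    with open(f"/proc/{pid}/smaps") as f:
        return lazyfree_kib(f.read())
```

Polling lazyfree_of_pid(pid) for the process in question while the stress-ng run is active is one way to watch the value shrink.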

According to the man page, I would not expect this "ballooning" to be necessary given that I have no swap.

Question 1:
Is `man madvise` outdated? Or am I measuring wrong?

Question 2:
Is the swap condition really binary? E.g. if the man page is accurate, would me adding 1 MB swap already make a difference in the behaviour, or are there more sophisticated rules at play?

Second, as you can see above, the proc-documentation of LazyFree does not mention any special swap rules.

Third, can anybody elaborate on "the printed value might be lower than the real value due to optimizations used in the current implementation"? How far off might the reported LazyFree be?
For my investigation it would be very useful if I could get accurate accounting.
How much work would the "If this is not desirable please file a bug report" bit entail?

Any answers would be very appreciated!
Niklas


* Re: Question about the laziness of MADV_FREE
  2018-11-29 17:46 Question about the laziness of MADV_FREE Niklas Hambüchen
@ 2018-11-29 18:00 ` Michal Hocko
  2018-11-29 19:21   ` Niklas Hambüchen
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2018-11-29 18:00 UTC
  To: Niklas Hambüchen; +Cc: linux-mm

On Thu 29-11-18 18:46:17, Niklas Hambüchen wrote:
> Hello,
> 
> I'm trying to investigate the memory behaviour of a program that uses madvise(MADV_FREE) to tell the kernel that it no longer uses some pages.
> 
> I'm seeing some things I can't quite explain, concerning when freeing happens and how it is accounted for in /proc/pid/smaps.
> 
> `man madvise` shows:
> 
>        MADV_FREE (since Linux 4.5)
>               The application no longer requires the pages in the range
>               specified by addr and len.  The kernel can thus free these
>               pages, but the freeing could be delayed until memory pressure
>               occurs.
>               ...
>               On a swapless system, freeing
>               pages in a given range happens instantly, regardless of memory
>               pressure.

This part is outdated since 93e06c7a6453 ("mm: enable MADV_FREE for
swapless system") since 4.12. Something to fix in the man page. I will
send a patch for that. Thanks for pointing it out.

> https://www.kernel.org/doc/Documentation/filesystems/proc.txt says:
> 
>     "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
>     The memory isn't freed immediately with madvise(). It's freed in memory
>     pressure if the memory is clean. Please note that the printed value might
>     be lower than the real value due to optimizations used in the current
>     implementation. If this is not desirable please file a bug report.
> 
> First, I am on a swapless system.
> Nevertheless, I do not observe freeing happening instantly.
> Instead, freeing happens only under memory pressure.

Yes this is how MADV_FREE is implemented.

> For example, on a 64 GB RAM machine I have a process taking 30 GB resident memory ("RES" in tools like htop). After I put on memory pressure (for example using `stress-ng --vm-bytes 1G --vm-keep -m 50` to allocate and touch 50 GB), RES for that process decreases to 10 GB.
> 
> At the same time, I can see the number in LazyFree decrease during this operation.

Those pages get reclaimed under memory pressure.

> According to the man page, I would not expect this "ballooning" to be
> necessary given that I have no swap.
> 
> Question 1:
> Is `man madvise` outdated? Or am I measuring wrong?

yep.

> Question 2:
> Is the swap condition really binary? E.g. if the man page is accurate, would me adding 1 MB swap already make a difference in the behaviour, or are there more sophisticated rules at play?

It used to be like that.

> Second, as you can see above, the proc-documentation of LazyFree does not mention any special swap rules.
> 
> Third, can anybody elaborate on "the printed value might be lower
> than the real value due to optimizations used in the current
> implementation"? How far off might the reported LazyFree be?

We batch multiple pages to become really lazyfree. This means that those
pages are sitting on a per-cpu list (see mark_page_lazyfree). So the
drift in the reported number depends on the number of CPUs.

> For my investigation it would be very useful if I could get accurate accounting.
> How much work would the "If this is not desirable please file a bug report" bit entail?

What would be the reason to get the exact number?
-- 
Michal Hocko
SUSE Labs


* Re: Question about the laziness of MADV_FREE
  2018-11-29 18:00 ` Michal Hocko
@ 2018-11-29 19:21   ` Niklas Hambüchen
  2018-11-29 20:54     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Niklas Hambüchen @ 2018-11-29 19:21 UTC
  To: Michal Hocko; +Cc: linux-mm

Hello Michal,

thanks for the swift reply and patch!

> We batch multiple pages to become really lazyfree. This means that those
> pages are sitting on a per-cpu list (see mark_page_lazyfree). So the
> drift in the reported number depends on the number of CPUs.

Is there an upper bound that I can rely on in order to judge how far off the accounting is (perhaps depending on the number of CPUs as you say)?
For example, if the drift is bounded to, say 10%, that would probably be fine, while if it could be off by 2x or so, that would make system inspection tough.

>> For my investigation it would be very useful if I could get accurate accounting.
>> How much work would the "If this is not desirable please file a bug report" bit entail?
> 
> What would be the reason to get the exact number?

Mainly to debug situations where programs run out of memory.
Quite similar to the third point on https://lore.kernel.org/patchwork/cover/755741/.

In such situations, the first thing people usually do is to look at RES and see if things are off.
The fact that RES may still be showing memory usage from before can already send one down the wrong investigation path very quickly.
For example, my process takes up to 50 GB when processing some data, and MADV_FREEs it all when it's done and idling.
Due to the lazy freeing, RES will continue to show up as 50 GB even when idle, which may make people suspect a memory leak when there really is none.
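
A minimal sketch of that effect, assuming Linux with /proc mounted, Python 3.8 or later (for mmap.madvise), and a kernel that accepts MADV_FREE on private anonymous memory. The function names are made up for illustration, and the numbers it returns depend on the kernel and on current memory pressure:

```python
import mmap

def rss_kib():
    """Resident set size of this process in KiB, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * mmap.PAGESIZE // 1024

def touch_then_madv_free(size):
    """Touch every page of a private anonymous mapping, then hint MADV_FREE.

    Returns (rss_before_kib, rss_after_kib), sampled around the madvise call.
    """
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    for off in range(0, size, mmap.PAGESIZE):
        buf[off] = 1              # writing makes the page resident
    before = rss_kib()
    buf.madvise(mmap.MADV_FREE)   # a hint only; reclaim may happen much later
    after = rss_kib()
    buf.close()
    return before, after
```

With lazy freeing and no memory pressure, the two values would be expected to be close to each other, which is exactly what makes RES look like a leak.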

In this specific case, one can at least consult proc's LazyFree to figure this out, but if you cannot rely on that number to be accurate either (and the docs do not say how inaccurate it is), it's easy to feel lost about it.

Thanks!


* Re: Question about the laziness of MADV_FREE
  2018-11-29 19:21   ` Niklas Hambüchen
@ 2018-11-29 20:54     ` Michal Hocko
  2018-11-29 23:00       ` Niklas Hambüchen
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2018-11-29 20:54 UTC
  To: Niklas Hambüchen; +Cc: linux-mm

On Thu 29-11-18 20:21:49, Niklas Hambüchen wrote:
> Hello Michal,
> 
> thanks for the swift reply and patch!
> 
> > We batch multiple pages to become really lazyfree. This means that those
> > pages are sitting on a per-cpu list (see mark_page_lazyfree). So the
> > drift in the reported number depends on the number of CPUs.
> 
> Is there an upper bound that I can rely on in order to judge how far off the accounting is (perhaps depending on the number of CPUs as you say)?
> For example, if the drift is bounded to, say 10%, that would probably be fine, while if it could be off by 2x or so, that would make system inspection tough.

From a quick look it should be 15*number_of_cpus unless I have missed
other caching. So this shouldn't be all that much unless you have a
giant machine with hundreds of cpus.
-- 
Michal Hocko
SUSE Labs


* Re: Question about the laziness of MADV_FREE
  2018-11-29 20:54     ` Michal Hocko
@ 2018-11-29 23:00       ` Niklas Hambüchen
  2018-11-30  8:19         ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Niklas Hambüchen @ 2018-11-29 23:00 UTC
  To: Michal Hocko; +Cc: linux-mm

On 2018-11-29 21:54, Michal Hocko wrote:
> From a quick look it should be 15*number_of_cpus unless I have missed
> other caching. So this shouldn't be all that much unless you have a
> giant machine with hundreds of cpus.

For clarification, is that 15*number_of_cpus many pages, or "factor" 15*number_of_cpus off what LazyFree reports?


* Re: Question about the laziness of MADV_FREE
  2018-11-29 23:00       ` Niklas Hambüchen
@ 2018-11-30  8:19         ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2018-11-30  8:19 UTC
  To: Niklas Hambüchen; +Cc: linux-mm

On Fri 30-11-18 00:00:26, Niklas Hambüchen wrote:
> On 2018-11-29 21:54, Michal Hocko wrote:
> > From a quick look it should be 15*number_of_cpus unless I have missed
> > other caching. So this shouldn't be all that much unless you have a
> > giant machine with hundreds of cpus.
> 
> For clarification, is that 15*number_of_cpus many pages, or "factor" 15*number_of_cpus off what LazyFree reports?

The number of pages.
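
As a rough worked example of that bound, assuming the 15 pages per CPU estimated above really is the only batching involved, and assuming 4 KiB pages. The constant and function names are made up for illustration:

```python
# Pages that may sit on each per-cpu list before becoming lazyfree,
# taken from the estimate in this thread (not verified against the source).
BATCH_PAGES_PER_CPU = 15

def lazyfree_drift_bytes(n_cpus, page_size=4096):
    """Upper bound, in bytes, on how far LazyFree may under-report."""
    return BATCH_PAGES_PER_CPU * n_cpus * page_size

# 64 CPUs with 4 KiB pages: 15 * 64 * 4096 bytes
print(lazyfree_drift_bytes(64))                 # 3932160
print(lazyfree_drift_bytes(64) / (1024 ** 2))   # 3.75 (MiB)
```

Against a LazyFree value in the tens of GB, a drift of a few MiB would be negligible.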

-- 
Michal Hocko
SUSE Labs

