linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH -mm] provide estimated available memory in /proc/meminfo
       [not found]     ` <20131025120752.GB2415@rh0004426.ams.redhat.com>
@ 2013-11-05 22:38       ` Rik van Riel
  2013-11-05 22:39         ` Kirill A. Shutemov
  2013-11-05 22:45         ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Rik van Riel @ 2013-11-05 22:38 UTC (permalink / raw)
  To: Ron van der Wees
  Cc: Erik Mouw, akpm, linux-kernel, torvalds, walken, Shaohua Li,
	Mel Gorman, Johannes Weiner

Many load balancing and workload placing programs check /proc/meminfo
to estimate how much free memory is available. They generally do this
by adding up "free" and "cached", which was fine ten years ago, but
is pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as
page cache, for example shared memory segments, tmpfs, and ramfs,
and it does not include reclaimable slab memory, which can take up
a large fraction of system memory on mostly idle systems with lots
of files.

Currently, the amount of memory that is available for a new workload,
without pushing the system into swap, can be estimated from MemFree,
Active(file), Inactive(file), and SReclaimable, as well as the "low"
watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should
not be expected to know kernel internals to come up with an estimate
for the amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo.
If things change in the future, we only have to change it in one
place.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
---
 fs/proc/meminfo.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 5aa847a..1c43db5 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -25,9 +25,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	struct sysinfo i;
 	unsigned long committed;
 	unsigned long allowed;
+	long available;
+	unsigned long pagecache, wmark_low = 0;
 	struct vmalloc_info vmi;
 	long cached;
 	unsigned long pages[NR_LRU_LISTS];
+	struct zone *zone;
 	int lru;
 
 /*
@@ -50,12 +53,44 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
 		pages[lru] = global_page_state(NR_LRU_BASE + lru);
 
+	for_each_zone(zone)
+		wmark_low += zone->watermark[WMARK_LOW];
+
+	/*
+	 * Estimate the amount of memory available for userspace allocations,
+	 * without causing swapping.
+	 *
+	 * Free memory cannot be taken below the low watermark, before the
+	 * system starts swapping.
+	 */
+	available = i.freeram - wmark_low;
+
+	/*
+	 * Not all the page cache can be freed, otherwise the system will
+	 * start swapping. Assume at least half of the page cache, or the
+	 * low watermark worth of cache, needs to stay.
+	 */
+	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
+	pagecache -= min(pagecache / 2, wmark_low);
+	available += pagecache;
+
+	/*
+	 * Part of the reclaimable slab consists of items that are in use,
+	 * and cannot be freed. Cap this estimate at the low watermark.
+	 */
+	available += global_page_state(NR_SLAB_RECLAIMABLE) -
+		     min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
+
+	if (available < 0)
+		available = 0;
+
 	/*
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	seq_printf(m,
 		"MemTotal:       %8lu kB\n"
 		"MemFree:        %8lu kB\n"
+		"MemAvailable:   %8lu kB\n"
 		"Buffers:        %8lu kB\n"
 		"Cached:         %8lu kB\n"
 		"SwapCached:     %8lu kB\n"
@@ -108,6 +143,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		,
 		K(i.totalram),
 		K(i.freeram),
+		K(available),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages()),
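
The arithmetic in the hunk above can be tried outside the kernel. What
follows is only a minimal userspace sketch under stated assumptions, not
the kernel code: estimate_available_pages() and min_ul() are hypothetical
names, and the plain integer parameters stand in for i.freeram, the summed
zone->watermark[WMARK_LOW] values, the two file LRU counts, and
NR_SLAB_RECLAIMABLE. All values are in pages.

```c
/* Hypothetical helper: the smaller of two unsigned values. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Userspace restatement of the estimate in the patch (names are
 * illustrative, not kernel symbols). The result is signed because
 * free memory can sit below the low watermark; it is clamped at 0.
 */
static long estimate_available_pages(unsigned long free_pages,
				     unsigned long wmark_low,
				     unsigned long file_lru_pages,
				     unsigned long slab_reclaimable)
{
	long available;

	/* Free memory above the low watermark. */
	available = (long)free_pages - (long)wmark_low;

	/* Keep half of the file cache, or one low watermark's worth. */
	available += (long)(file_lru_pages -
			    min_ul(file_lru_pages / 2, wmark_low));

	/* Same treatment for reclaimable slab. */
	available += (long)(slab_reclaimable -
			    min_ul(slab_reclaimable / 2, wmark_low));

	if (available < 0)
		available = 0;
	return available;
}
```

With 1000 free pages, a summed low watermark of 100, 400 file LRU pages
and 50 reclaimable slab pages, the sketch gives 900 + 300 + 25 = 1225
pages.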


* Re: [RFC PATCH -mm] provide estimated available memory in /proc/meminfo
  2013-11-05 22:38       ` [RFC PATCH -mm] provide estimated available memory in /proc/meminfo Rik van Riel
@ 2013-11-05 22:39         ` Kirill A. Shutemov
  2013-11-05 22:45         ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2013-11-05 22:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ron van der Wees, Erik Mouw, akpm, linux-kernel, torvalds,
	walken, Shaohua Li, Mel Gorman, Johannes Weiner

On Tue, Nov 05, 2013 at 05:38:52PM -0500, Rik van Riel wrote:
> Many load balancing and workload placing programs check /proc/meminfo
> to estimate how much free memory is available. They generally do this
> by adding up "free" and "cached", which was fine ten years ago, but
> is pretty much guaranteed to be wrong today.
> 
> It is wrong because Cached includes memory that is not freeable as
> page cache, for example shared memory segments, tmpfs, and ramfs,
> and it does not include reclaimable slab memory, which can take up
> a large fraction of system memory on mostly idle systems with lots
> of files.
> 
> Currently, the amount of memory that is available for a new workload,
> without pushing the system into swap, can be estimated from MemFree,
> Active(file), Inactive(file), and SReclaimable, as well as the "low"
> watermarks from /proc/zoneinfo.

ramfs pages first go to the (in)active lists and move to unevictable
later, so it's not really true already. ;)

-- 
 Kirill A. Shutemov


* Re: [RFC PATCH -mm] provide estimated available memory in /proc/meminfo
  2013-11-05 22:38       ` [RFC PATCH -mm] provide estimated available memory in /proc/meminfo Rik van Riel
  2013-11-05 22:39         ` Kirill A. Shutemov
@ 2013-11-05 22:45         ` Andrew Morton
  2013-11-07 15:13           ` [RFC PATCH v2 " Rik van Riel
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2013-11-05 22:45 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ron van der Wees, Erik Mouw, linux-kernel, torvalds, walken,
	Shaohua Li, Mel Gorman, Johannes Weiner

On Tue, 5 Nov 2013 17:38:52 -0500 Rik van Riel <riel@redhat.com> wrote:

> Many load balancing and workload placing programs check /proc/meminfo
> to estimate how much free memory is available. They generally do this
> by adding up "free" and "cached", which was fine ten years ago, but
> is pretty much guaranteed to be wrong today.
> 
> It is wrong because Cached includes memory that is not freeable as
> page cache, for example shared memory segments, tmpfs, and ramfs,
> and it does not include reclaimable slab memory, which can take up
> a large fraction of system memory on mostly idle systems with lots
> of files.
> 
> Currently, the amount of memory that is available for a new workload,
> without pushing the system into swap, can be estimated from MemFree,
> Active(file), Inactive(file), and SReclaimable, as well as the "low"
> watermarks from /proc/zoneinfo.
> 
> However, this may change in the future, and user space really should
> not be expected to know kernel internals to come up with an estimate
> for the amount of free memory.
> 
> It is more convenient to provide such an estimate in /proc/meminfo,
> if things change in the future, we only have to change it in one
> place.
> 

That's a good idea.

>  fs/proc/meminfo.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)

Documentation/filesystems/proc.txt told me it's feeling all offended.


* [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo
  2013-11-05 22:45         ` Andrew Morton
@ 2013-11-07 15:13           ` Rik van Riel
  2013-11-07 21:21             ` Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2013-11-07 15:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ron van der Wees, Erik Mouw, linux-kernel, torvalds, walken,
	Shaohua Li, Mel Gorman, Johannes Weiner


> >  fs/proc/meminfo.c | 36 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 36 insertions(+)
> 
> Documentation/filesystems/proc.txt told me it's feeling all offended.

You're right, of course.  Here is version 2 :)

---8<---

Subject: provide estimated available memory in /proc/meminfo

Many load balancing and workload placing programs check /proc/meminfo
to estimate how much free memory is available. They generally do this
by adding up "free" and "cached", which was fine ten years ago, but
is pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as
page cache, for example shared memory segments, tmpfs, and ramfs,
and it does not include reclaimable slab memory, which can take up
a large fraction of system memory on mostly idle systems with lots
of files.

Currently, the amount of memory that is available for a new workload,
without pushing the system into swap, can be estimated from MemFree,
Active(file), Inactive(file), and SReclaimable, as well as the "low"
watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should
not be expected to know kernel internals to come up with an estimate
for the amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo.
If things change in the future, we only have to change it in one place.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
---
 Documentation/filesystems/proc.txt |  9 +++++++++
 fs/proc/meminfo.c                  | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index fd8d0d5..f57bdd8 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -761,6 +761,7 @@ The "Locked" indicates whether the mapping is locked in memory or not.
 
 MemTotal:     16344972 kB
 MemFree:      13634064 kB
+MemAvailable: 14836172 kB
 Buffers:          3656 kB
 Cached:        1195708 kB
 SwapCached:          0 kB
@@ -793,6 +794,14 @@ AnonHugePages:   49152 kB
     MemTotal: Total usable ram (i.e. physical ram minus a few reserved
               bits and the kernel binary code)
      MemFree: The sum of LowFree+HighFree
+MemAvailable: An estimate of how much memory is available for starting new
+              applications, without swapping. Calculated from MemFree,
+              SReclaimable, the size of the file LRU lists, and the low
+              watermarks in each zone.
+              The estimate takes into account that the system needs some
+              page cache to function well, and that not all reclaimable
+              slab will be reclaimable, due to items being in use. The
+              impact of those factors will vary from system to system.
      Buffers: Relatively temporary storage for raw disk blocks
               shouldn't get tremendously large (20MB or so)
       Cached: in-memory cache for files read from the disk (the
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 5aa847a..1c43db5 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -25,9 +25,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	struct sysinfo i;
 	unsigned long committed;
 	unsigned long allowed;
+	long available;
+	unsigned long pagecache, wmark_low = 0;
 	struct vmalloc_info vmi;
 	long cached;
 	unsigned long pages[NR_LRU_LISTS];
+	struct zone *zone;
 	int lru;
 
 /*
@@ -50,12 +53,44 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
 		pages[lru] = global_page_state(NR_LRU_BASE + lru);
 
+	for_each_zone(zone)
+		wmark_low += zone->watermark[WMARK_LOW];
+
+	/*
+	 * Estimate the amount of memory available for userspace allocations,
+	 * without causing swapping.
+	 *
+	 * Free memory cannot be taken below the low watermark, before the
+	 * system starts swapping.
+	 */
+	available = i.freeram - wmark_low;
+
+	/*
+	 * Not all the page cache can be freed, otherwise the system will
+	 * start swapping. Assume at least half of the page cache, or the
+	 * low watermark worth of cache, needs to stay.
+	 */
+	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
+	pagecache -= min(pagecache / 2, wmark_low);
+	available += pagecache;
+
+	/*
+	 * Part of the reclaimable slab consists of items that are in use,
+	 * and cannot be freed. Cap this estimate at the low watermark.
+	 */
+	available += global_page_state(NR_SLAB_RECLAIMABLE) -
+		     min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
+
+	if (available < 0)
+		available = 0;
+
 	/*
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	seq_printf(m,
 		"MemTotal:       %8lu kB\n"
 		"MemFree:        %8lu kB\n"
+		"MemAvailable:   %8lu kB\n"
 		"Buffers:        %8lu kB\n"
 		"Cached:         %8lu kB\n"
 		"SwapCached:     %8lu kB\n"
@@ -108,6 +143,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		,
 		K(i.totalram),
 		K(i.freeram),
+		K(available),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages()),
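
A consumer of the new field might read it along these lines. This is a
sketch, assuming the "Key:   value kB" line layout shown in the proc.txt
hunk above; meminfo_field_kb() is a hypothetical helper, not an existing
API, and it works on a buffer that has already been read from
/proc/meminfo. A return of -1 models a kernel that does not export
MemAvailable, where the caller would have to fall back on its own
estimate.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key:   value kB" line in a /proc/meminfo-style buffer.
 * Returns the value in kB, or -1 if the key is not present.
 * (Hypothetical helper for illustration only.)
 */
static long meminfo_field_kb(const char *text, const char *key)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		unsigned long val;

		/* Match "key:" only at the start of a line. */
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':' &&
		    sscanf(line + keylen + 1, "%lu", &val) == 1)
			return (long)val;

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Matching only at the start of a line keeps a lookup for "Cached" from
picking up "SwapCached".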



* Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo
  2013-11-07 15:13           ` [RFC PATCH v2 " Rik van Riel
@ 2013-11-07 21:21             ` Johannes Weiner
  2013-11-07 22:27               ` Andrew Morton
  2013-11-08  0:27               ` Minchan Kim
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Weiner @ 2013-11-07 21:21 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Ron van der Wees, Erik Mouw, linux-kernel,
	torvalds, walken, Shaohua Li, Mel Gorman

On Thu, Nov 07, 2013 at 10:13:45AM -0500, Rik van Riel wrote:
> 
> > >  fs/proc/meminfo.c | 36 ++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 36 insertions(+)
> > 
> > Documentation/filesystems/proc.txt told me it's feeling all offended.
> 
> You're right, of course.  Here is version 2 :)
> 
> ---8<---
> 
> Subject: provide estimated available memory in /proc/meminfo
> 
> Many load balancing and workload placing programs check /proc/meminfo
> to estimate how much free memory is available. They generally do this
> by adding up "free" and "cached", which was fine ten years ago, but
> is pretty much guaranteed to be wrong today.
> 
> It is wrong because Cached includes memory that is not freeable as
> page cache, for example shared memory segments, tmpfs, and ramfs,
> and it does not include reclaimable slab memory, which can take up
> a large fraction of system memory on mostly idle systems with lots
> of files.
> 
> Currently, the amount of memory that is available for a new workload,
> without pushing the system into swap, can be estimated from MemFree,
> Active(file), Inactive(file), and SReclaimable, as well as the "low"
> watermarks from /proc/zoneinfo.
> 
> However, this may change in the future, and user space really should
> not be expected to know kernel internals to come up with an estimate
> for the amount of free memory.
> 
> It is more convenient to provide such an estimate in /proc/meminfo.
> If things change in the future, we only have to change it in one place.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Reported-by: Erik Mouw <erik.mouw_2@nxp.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

I have a suspicion that people will end up relying on this number to
start new workloads in situations where lots of the page cache is
actually heavily used.  We might not swap, but there will still be IO
from thrashing cache.

Maybe we'll have to subtract mapped cache pages in the future to
mitigate this risk somehow...

Anyway, we can defer this to when it's proven to be an actual problem.


* Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo
  2013-11-07 21:21             ` Johannes Weiner
@ 2013-11-07 22:27               ` Andrew Morton
  2013-11-08  0:27               ` Minchan Kim
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2013-11-07 22:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Rik van Riel, Ron van der Wees, Erik Mouw, linux-kernel,
	torvalds, walken, Shaohua Li, Mel Gorman

On Thu, 7 Nov 2013 16:21:32 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> > Subject: provide estimated available memory in /proc/meminfo
> > 
> > Many load balancing and workload placing programs check /proc/meminfo
> > to estimate how much free memory is available. They generally do this
> > by adding up "free" and "cached", which was fine ten years ago, but
> > is pretty much guaranteed to be wrong today.
> > 
> > It is wrong because Cached includes memory that is not freeable as
> > page cache, for example shared memory segments, tmpfs, and ramfs,
> > and it does not include reclaimable slab memory, which can take up
> > a large fraction of system memory on mostly idle systems with lots
> > of files.
> > 
> > Currently, the amount of memory that is available for a new workload,
> > without pushing the system into swap, can be estimated from MemFree,
> > Active(file), Inactive(file), and SReclaimable, as well as the "low"
> > watermarks from /proc/zoneinfo.
> > 
> > However, this may change in the future, and user space really should
> > not be expected to know kernel internals to come up with an estimate
> > for the amount of free memory.
> > 
> > It is more convenient to provide such an estimate in /proc/meminfo.
> > If things change in the future, we only have to change it in one place.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> I have a suspicion that people will end up relying on this number to
> start new workloads in situations where lots of the page cache is
> actually heavily used.  We might not swap, but there will still be IO
> from thrashing cache.
> 
> Maybe we'll have to subtract mapped cache pages in the future to
> mitigate this risk somehow...
> 
> Anyway, we can defer this to when it's proven to be an actual problem.

Well not really.  Once we release this thing with a particular
implementation, we are constrained in making any later changes.  If we
change it to produce larger numbers, someone's workload will start
swapping.  If we change it to produce smaller numbers, someone's
workload will refuse to start.

It all needs a bit of thought, and even some testing!  I labelled this
one for-3.14.


* Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo
  2013-11-07 21:21             ` Johannes Weiner
  2013-11-07 22:27               ` Andrew Morton
@ 2013-11-08  0:27               ` Minchan Kim
  1 sibling, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2013-11-08  0:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Rik van Riel, Andrew Morton, Ron van der Wees, Erik Mouw,
	linux-kernel, torvalds, walken, Shaohua Li, Mel Gorman

Hi Hannes, Rik.

On Thu, Nov 07, 2013 at 04:21:32PM -0500, Johannes Weiner wrote:
> On Thu, Nov 07, 2013 at 10:13:45AM -0500, Rik van Riel wrote:
> > 
> > > >  fs/proc/meminfo.c | 36 ++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 36 insertions(+)
> > > 
> > > Documentation/filesystems/proc.txt told me it's feeling all offended.
> > 
> > You're right, of course.  Here is version 2 :)
> > 
> > ---8<---
> > 
> > Subject: provide estimated available memory in /proc/meminfo
> > 
> > Many load balancing and workload placing programs check /proc/meminfo
> > to estimate how much free memory is available. They generally do this
> > by adding up "free" and "cached", which was fine ten years ago, but
> > is pretty much guaranteed to be wrong today.
> > 
> > It is wrong because Cached includes memory that is not freeable as
> > page cache, for example shared memory segments, tmpfs, and ramfs,
> > and it does not include reclaimable slab memory, which can take up
> > a large fraction of system memory on mostly idle systems with lots
> > of files.
> > 
> > Currently, the amount of memory that is available for a new workload,
> > without pushing the system into swap, can be estimated from MemFree,
> > Active(file), Inactive(file), and SReclaimable, as well as the "low"
> > watermarks from /proc/zoneinfo.
> > 
> > However, this may change in the future, and user space really should
> > not be expected to know kernel internals to come up with an estimate
> > for the amount of free memory.
> > 
> > It is more convenient to provide such an estimate in /proc/meminfo.
> > If things change in the future, we only have to change it in one place.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> I have a suspicion that people will end up relying on this number to
> start new workloads in situations where lots of the page cache is
> actually heavily used.  We might not swap, but there will still be IO
> from thrashing cache.
> 
> Maybe we'll have to subtract mapped cache pages in the future to
> mitigate this risk somehow...

That might give a huge false positive if there was an mmapped, used-once
stream, leading userlevel to free some objects or kill someone to get
free memory.

And shouldn't we consider dirty + writeback, too?

Anyway, this feature is very handy. Swapping/LMK/OOM is a very sensitive
subject for embedded people these days, so we have been using a metric
to get a ballpark estimate like

        "buffers + cached + Sreclaimable - (SHMEM + dirty + writeback +
         workingset)"

We included workingset to prevent thrashing page cache.
So, my point is that we could include a tunable value such as workingset
in the expression. The default might be half of the page cache, as in this
patch, but the admin can control it if the platform knows its workingset size.
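
The ballpark formula above could be written out as below. A sketch only:
struct meminfo_sample, embedded_estimate_kb() and default_workingset_kb()
are hypothetical names, all values are in kB, the default approximates
"half of page cache" by half of Cached, and clamping at zero is an
assumption borrowed from the patch rather than something stated here.

```c
/* Inputs in kB, as read from /proc/meminfo (hypothetical struct). */
struct meminfo_sample {
	unsigned long buffers;
	unsigned long cached;
	unsigned long sreclaimable;
	unsigned long shmem;
	unsigned long dirty;
	unsigned long writeback;
};

/* Assumed default for the tunable: half of Cached. */
static unsigned long default_workingset_kb(const struct meminfo_sample *s)
{
	return s->cached / 2;
}

/*
 * buffers + cached + sreclaimable
 *   - (shmem + dirty + writeback + workingset)
 * The clamp at zero is an assumption, mirroring the patch.
 */
static long embedded_estimate_kb(const struct meminfo_sample *s,
				 unsigned long workingset_kb)
{
	long reclaimable = (long)(s->buffers + s->cached + s->sreclaimable);
	long pinned = (long)(s->shmem + s->dirty + s->writeback +
			     workingset_kb);
	long estimate = reclaimable - pinned;

	return estimate < 0 ? 0 : estimate;
}
```

Passing workingset_kb separately is what would let a platform that knows
its working set override the default.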

> 
> Anyway, we can defer this to when it's proven to be an actual problem.

-- 
Kind regards,
Minchan Kim
