linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
@ 2019-04-12 19:14 Johannes Weiner
  2019-04-12 20:06 ` Roman Gushchin
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Weiner @ 2019-04-12 19:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, kernel-team

With the default overcommit==guess we occasionally run into mmap
rejections despite plenty of memory that would get dropped under
pressure but just isn't accounted reclaimable. One example of this is
dying cgroups pinned by some page cache. A previous case was auxiliary
path name memory associated with dentries; we have since annotated
those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
treat indirectly reclaimable memory as free in overcommit logic")).

But trying to classify all allocated memory reliably as reclaimable
and unreclaimable is a bit of a fool's errand. There could be a myriad
of dependencies that constantly change with kernel versions.

It becomes even more questionable of an effort when considering how
this estimate of available memory is used: it's not compared to the
system-wide allocated virtual memory in any way. It's not even
compared to the allocating process's address space. It's compared to
the single allocation request at hand!

So we have an elaborate left-hand side of the equation that tries to
assess the exact breathing room the system has available down to a
page - and then compare it to an isolated allocation request with no
additional context. We could fail an allocation of N bytes, but for
two allocations of N/2 bytes we'd do this elaborate dance twice in a
row and then still let N bytes of virtual memory through. This doesn't
make a whole lot of sense.

Let's take a step back and look at the actual goal of the
heuristic. From the documentation:

   Heuristic overcommit handling. Obvious overcommits of address
   space are refused. Used for a typical system. It ensures a
   seriously wild allocation fails while allowing overcommit to
   reduce swap usage.  root is allowed to allocate slightly more
   memory in this mode. This is the default.

If all we want to do is catch clearly bogus allocation requests
irrespective of the general virtual memory situation, the physical
memory counter-part doesn't need to be that complicated, either.

When in GUESS mode, catch wild allocations by comparing their request
size to total amount of ram and swap in the system.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/util.c | 51 +++++----------------------------------------------
 1 file changed, 5 insertions(+), 46 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 05a464929b3e..e2e4f8c3fa12 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -652,7 +652,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
  */
 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
-	long free, allowed, reserve;
+	long allowed;
 
 	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
 			-(s64)vm_committed_as_batch * num_online_cpus(),
@@ -667,51 +667,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		return 0;
 
 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
-		free = global_zone_page_state(NR_FREE_PAGES);
-		free += global_node_page_state(NR_FILE_PAGES);
-
-		/*
-		 * shmem pages shouldn't be counted as free in this
-		 * case, they can't be purged, only swapped out, and
-		 * that won't affect the overall amount of available
-		 * memory in the system.
-		 */
-		free -= global_node_page_state(NR_SHMEM);
-
-		free += get_nr_swap_pages();
-
-		/*
-		 * Any slabs which are created with the
-		 * SLAB_RECLAIM_ACCOUNT flag claim to have contents
-		 * which are reclaimable, under pressure.  The dentry
-		 * cache and most inode caches should fall into this
-		 */
-		free += global_node_page_state(NR_SLAB_RECLAIMABLE);
-
-		/*
-		 * Part of the kernel memory, which can be released
-		 * under memory pressure.
-		 */
-		free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
-
-		/*
-		 * Leave reserved pages. The pages are not for anonymous pages.
-		 */
-		if (free <= totalreserve_pages)
+		if (pages > totalram_pages() + total_swap_pages)
 			goto error;
-		else
-			free -= totalreserve_pages;
-
-		/*
-		 * Reserve some for root
-		 */
-		if (!cap_sys_admin)
-			free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);
-
-		if (free > pages)
-			return 0;
-
-		goto error;
+		return 0;
 	}
 
 	allowed = vm_commit_limit();
@@ -725,7 +683,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 	 * Don't let a single process grow so big a user can't recover
 	 */
 	if (mm) {
-		reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+		long reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+
 		allowed -= min_t(long, mm->total_vm / 32, reserve);
 	}
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
  2019-04-12 19:14 [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures Johannes Weiner
@ 2019-04-12 20:06 ` Roman Gushchin
  2019-04-17 12:04   ` Vlastimil Babka
  0 siblings, 1 reply; 4+ messages in thread
From: Roman Gushchin @ 2019-04-12 20:06 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team

On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
> With the default overcommit==guess we occasionally run into mmap
> rejections despite plenty of memory that would get dropped under
> pressure but just isn't accounted reclaimable. One example of this is
> dying cgroups pinned by some page cache. A previous case was auxiliary
> path name memory associated with dentries; we have since annotated
> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
> treat indirectly reclaimable memory as free in overcommit logic")).
> 
> But trying to classify all allocated memory reliably as reclaimable
> and unreclaimable is a bit of a fool's errand. There could be a myriad
> of dependencies that constantly change with kernel versions.
> 
> It becomes even more questionable of an effort when considering how
> this estimate of available memory is used: it's not compared to the
> system-wide allocated virtual memory in any way. It's not even
> compared to the allocating process's address space. It's compared to
> the single allocation request at hand!
> 
> So we have an elaborate left-hand side of the equation that tries to
> assess the exact breathing room the system has available down to a
> page - and then compare it to an isolated allocation request with no
> additional context. We could fail an allocation of N bytes, but for
> two allocations of N/2 bytes we'd do this elaborate dance twice in a
> row and then still let N bytes of virtual memory through. This doesn't
> make a whole lot of sense.
> 
> Let's take a step back and look at the actual goal of the
> heuristic. From the documentation:
> 
>    Heuristic overcommit handling. Obvious overcommits of address
>    space are refused. Used for a typical system. It ensures a
>    seriously wild allocation fails while allowing overcommit to
>    reduce swap usage.  root is allowed to allocate slightly more
>    memory in this mode. This is the default.
> 
> If all we want to do is catch clearly bogus allocation requests
> irrespective of the general virtual memory situation, the physical
> memory counter-part doesn't need to be that complicated, either.
> 
> When in GUESS mode, catch wild allocations by comparing their request
> size to total amount of ram and swap in the system.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

My 2c here: any kinds of percpu counters and percpu data is accounted
as unreclaimable and can alter the calculation significantly.

This is a special problem on hosts, which were idle for some time.
Without any memory pressure, kernel caches do occupy most of the memory,
so than a following attempt to start a workload fails.

With a big pleasure:
Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
  2019-04-12 20:06 ` Roman Gushchin
@ 2019-04-17 12:04   ` Vlastimil Babka
  2019-04-17 15:05     ` Johannes Weiner
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka @ 2019-04-17 12:04 UTC (permalink / raw)
  To: Roman Gushchin, Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team

On 4/12/19 10:06 PM, Roman Gushchin wrote:
> On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
>> With the default overcommit==guess we occasionally run into mmap
>> rejections despite plenty of memory that would get dropped under
>> pressure but just isn't accounted reclaimable. One example of this is
>> dying cgroups pinned by some page cache. A previous case was auxiliary
>> path name memory associated with dentries; we have since annotated
>> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
>> treat indirectly reclaimable memory as free in overcommit logic")).
>>
>> But trying to classify all allocated memory reliably as reclaimable
>> and unreclaimable is a bit of a fool's errand. There could be a myriad
>> of dependencies that constantly change with kernel versions.

Just wondering, did you find at least one another reclaimable case like
those path names?

>> It becomes even more questionable of an effort when considering how
>> this estimate of available memory is used: it's not compared to the
>> system-wide allocated virtual memory in any way. It's not even
>> compared to the allocating process's address space. It's compared to
>> the single allocation request at hand!
>>
>> So we have an elaborate left-hand side of the equation that tries to
>> assess the exact breathing room the system has available down to a
>> page - and then compare it to an isolated allocation request with no
>> additional context. We could fail an allocation of N bytes, but for
>> two allocations of N/2 bytes we'd do this elaborate dance twice in a
>> row and then still let N bytes of virtual memory through. This doesn't
>> make a whole lot of sense.
>>
>> Let's take a step back and look at the actual goal of the
>> heuristic. From the documentation:
>>
>>    Heuristic overcommit handling. Obvious overcommits of address
>>    space are refused. Used for a typical system. It ensures a
>>    seriously wild allocation fails while allowing overcommit to
>>    reduce swap usage.  root is allowed to allocate slightly more
>>    memory in this mode. This is the default.
>>
>> If all we want to do is catch clearly bogus allocation requests
>> irrespective of the general virtual memory situation, the physical
>> memory counter-part doesn't need to be that complicated, either.
>>
>> When in GUESS mode, catch wild allocations by comparing their request
>> size to total amount of ram and swap in the system.
>>
>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> My 2c here: any kinds of percpu counters and percpu data is accounted
> as unreclaimable and can alter the calculation significantly.
> 
> This is a special problem on hosts, which were idle for some time.
> Without any memory pressure, kernel caches do occupy most of the memory,
> so than a following attempt to start a workload fails.

So then we remove the kmalloc-reclaimable caches again as not worth the
trouble anymore (they might be useful for anti-fragmentation purposes,
but that's much harder to quantify), or what?

> With a big pleasure:
> Acked-by: Roman Gushchin <guro@fb.com>
> 
> Thanks!
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
  2019-04-17 12:04   ` Vlastimil Babka
@ 2019-04-17 15:05     ` Johannes Weiner
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2019-04-17 15:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Roman Gushchin, Andrew Morton, linux-mm, linux-kernel, Kernel Team

On Wed, Apr 17, 2019 at 02:04:17PM +0200, Vlastimil Babka wrote:
> On 4/12/19 10:06 PM, Roman Gushchin wrote:
> > On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
> >> With the default overcommit==guess we occasionally run into mmap
> >> rejections despite plenty of memory that would get dropped under
> >> pressure but just isn't accounted reclaimable. One example of this is
> >> dying cgroups pinned by some page cache. A previous case was auxiliary
> >> path name memory associated with dentries; we have since annotated
> >> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
> >> treat indirectly reclaimable memory as free in overcommit logic")).
> >>
> >> But trying to classify all allocated memory reliably as reclaimable
> >> and unreclaimable is a bit of a fool's errand. There could be a myriad
> >> of dependencies that constantly change with kernel versions.
> 
> Just wondering, did you find at least one another reclaimable case like
> those path names?

I'm only aware of the cgroup structures which can be pinned by a
dentry, inode, or page cache page. But they're an entire tree of
memory allocations, per-cpu memory regions etc. that would be
impossible to annotate correctly; it's also unreclaimable while the
cgroup is user-visible and only becomes reclaimable once rmdir'd.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-04-17 15:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-12 19:14 [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures Johannes Weiner
2019-04-12 20:06 ` Roman Gushchin
2019-04-17 12:04   ` Vlastimil Babka
2019-04-17 15:05     ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).