All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Alexander Sosna <alexander@sosna.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Prevent OOM casualties by enforcing memcg limits
Date: Tue, 27 Apr 2021 09:53:38 +0200	[thread overview]
Message-ID: <YIfDAvcjm9FKDAWP@dhcp22.suse.cz> (raw)
In-Reply-To: <ea6db5cc-f862-7c4b-d872-acb29c2d8193@sosna.de>

On Mon 26-04-21 22:04:56, Alexander Sosna wrote:
> Before this commit memory cgroup limits were not enforced during
> allocation.  If a process within a cgroup tries to allocates more
> memory than allowed, the kernel will not prevent the allocation even if
> OVERCOMMIT_NEVER is set.  Than the OOM killer is activated to kill
> processes in the corresponding cgroup.  This behavior is not to be expected
> when setting OVERCOMMIT_NEVER (vm.overcommit_memory = 2) and it is a huge
> problem for applications assuming that the kernel will deny an allocation
> if not enough memory is available, like PostgreSQL.

Memory cgroup controller is by design accounting physically allocated
memory while overcommit policy is a global control of the virtual memory
allocation. Memcg is not aware of the virtual memory commitment so it
cannot really evaluate OVERCOMMIT_NEVER heuristic.

> To prevent this a
> check is implemented to not allow a process to allocate more memory than
> limited by it's cgroup.  This means a process will not be killed while
> accessing pages but will receive errors on memory allocation as
> appropriate.  This gives programs a chance to handle memory allocation
> failures gracefully instead of being reaped.

I am afraid I have to nak this patch. It is changing a long term
semantic of a user interface which can break many existing applications.
So you would need to create a new overcommit mode which would be
explicitly memcg aware.

As mentioned above memcg would need to have some awareness of the
virtual memory committed for the memcg. Without that
OVERCOMMIT_NEVER_MEMCG would effectively turn into OVERCOMMIT_GUESS.
 
> Signed-off-by: Alexander Sosna <alexander@sosna.de>

Nacked-by: Michal Hocko <mhocko@suse.com>

> diff --git a/mm/util.c b/mm/util.c
> index a8bf17f18a81..c84b83c532c6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -853,6 +853,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
>   *
>   * Strict overcommit modes added 2002 Feb 26 by Alan Cox.
>   * Additional code 2002 Jul 20 by Robert Love.
> + * Code to enforce memory cgroup limits added 2021 by Alexander Sosna.
>   *
>   * cap_sys_admin is 1 if the process has admin privileges, 0 otherwise.
>   *
> @@ -891,6 +892,34 @@ int __vm_enough_memory(struct mm_struct *mm, long
> pages, int cap_sys_admin)
>  		long reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
> 
>  		allowed -= min_t(long, mm->total_vm / 32, reserve);
> +
> +#ifdef CONFIG_MEMCG
> +		/*
> +		 * If we are in a memory cgroup we also evaluate if the cgroup
> +		 * has enough memory to allocate a new virtual mapping.
> +		 * This is how we can keep processes from exceeding their
> +		 * limits and also prevent that the OOM killer must be
> +		 * awakened.  This gives programs a chance to handle memory
> +		 * allocation failures gracefully and not being reaped.
> +		 * In the current version mem_cgroup_get_max() is used which
> +		 * allows the processes to exceeded their memory limits if
> +		 * enough SWAP is available.  If this is not intended we could
> +		 * use READ_ONCE(memcg->memory.max) instead.
> +		 *
> +		 * This code is only reached if sysctl_overcommit_memory equals
> +		 * OVERCOMMIT_NEVER, both other options are handled above.
> +		 */
> +		{
> +			struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +
> +			if (memcg) {
> +				long available = mem_cgroup_get_max(memcg)
> +						- mem_cgroup_size(memcg);
> +
> +				allowed = min_t(long, available, allowed);
> +			}
> +		}
> +#endif
>  	}
> 
>  	if (percpu_counter_read_positive(&vm_committed_as) < allowed)

-- 
Michal Hocko
SUSE Labs

      parent reply	other threads:[~2021-04-27  7:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-26 20:04 [PATCH] Prevent OOM casualties by enforcing memcg limits Alexander Sosna
2021-04-27  0:09 ` Chris Down
2021-04-27  6:37   ` Alexander Sosna
2021-04-27  8:08     ` Michal Hocko
2021-04-27 11:01       ` Alexander Sosna
2021-04-27 12:11         ` Michal Hocko
2021-04-27 13:43           ` Alexander Sosna
2021-04-27 14:17             ` Michal Hocko
2021-04-27 12:26     ` Chris Down
2021-04-27  7:53 ` Michal Hocko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YIfDAvcjm9FKDAWP@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=alexander@sosna.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.