From: Michal Hocko <mhocko@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org,
	chris@chrisdown.name, cgroups@vger.kernel.org,
	shakeelb@google.com
Subject: Re: [PATCH mm v6 4/4] mm: automatically penalize tasks with high swap use
Date: Thu, 4 Jun 2020 17:57:00 +0200
Message-ID: <20200604155700.GD4362@dhcp22.suse.cz>
In-Reply-To: <20200527195846.102707-5-kuba@kernel.org>

On Wed 27-05-20 12:58:46, Jakub Kicinski wrote:
> Add a memory.swap.high knob, which can be used to protect the system
> from SWAP exhaustion. The mechanism used for penalizing is similar
> to memory.high penalty (sleep on return to user space).
> 
> That is not to say that the knob itself is equivalent to memory.high.
> The objective is more to protect the system from potentially buggy
> tasks consuming a lot of swap and impacting other tasks, or even
> bringing the whole system to a standstill with complete SWAP
> exhaustion. Hopefully without the need to find per-task hard
> limits.
> 
> Slowing misbehaving tasks down gradually allows user space oom
> killers or other protection mechanisms to react. oomd and earlyoom
> already do killing based on swap exhaustion, and memory.swap.high
> protection will help implement such userspace oom policies more
> reliably.
> 
> We can use one counter for number of pages allocated under
> pressure to save struct task space and avoid two separate
> hierarchy walks on the hot path. The exact overage is
> calculated on return to user space, anyway.
> 
> Take the new high limit into account when determining if swap
> is "full". Borrowing the explanation from Johannes:
> 
>   The idea behind "swap full" is that as long as the workload has plenty
>   of swap space available and it's not changing its memory contents, it
>   makes sense to generously hold on to copies of data in the swap
>   device, even after the swapin. A later reclaim cycle can drop the page
>   without any IO. Trading disk space for IO.
> 
>   But the only two ways to reclaim a swap slot are when it is faulted
>   in and the references go away, or by scanning the virtual address space
>   like swapoff does - which is very expensive (one could argue it's too
>   expensive even for swapoff, it's often more practical to just reboot).
> 
>   So at some point in the fill level, we have to start freeing up swap
>   slots on fault/swapin. Otherwise we could eventually run out of swap
>   slots while they're filled with copies of data that is also in RAM.
> 
>   We don't want to OOM a workload because its available swap space is
>   filled with redundant cache.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
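
The "swap full" decision described above can be sketched as a small
user-space model. This is only an illustration, not the code from the
patch: struct swap_group, swap_is_full() and the "half of the limit"
threshold are assumptions made for the example. It shows the point of the
quoted paragraph: the high limit is consulted alongside the hard limit,
at every level of the hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a cgroup's swap accounting. */
struct swap_group {
	unsigned long usage;             /* swap pages currently charged */
	unsigned long high;              /* models memory.swap.high, in pages */
	unsigned long max;               /* models memory.swap.max, in pages */
	const struct swap_group *parent; /* NULL at the top of the hierarchy */
};

/*
 * Report swap as "full" once usage reaches half of either limit in the
 * group or any ancestor. The one-half threshold is an assumption of this
 * sketch; the point is that both limits take part in the check.
 */
static bool swap_is_full(const struct swap_group *g)
{
	assert(g != NULL);

	for (; g != NULL; g = g->parent) {
		if (g->usage >= g->high / 2 || g->usage >= g->max / 2)
			return true;
	}
	return false;
}
```

Once swap_is_full() returns true, a caller would start freeing swap slots
on swapin instead of keeping the redundant copy on the swap device.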

I am sorry for being late here, but thanks for adding the clarifications,
which make the semantics much clearer now! Also, thanks for simplifying
the throttling implementation. If a different scaling is needed, it
can be added later on.
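
To make "scaling" concrete, here is a rough user-space sketch of a sleep
penalty that grows with how far swap usage is over the high limit. It is
not the implementation under review: the quadratic curve, SKETCH_HZ,
SKETCH_MAX_DELAY, SKETCH_MAX_OVERAGE and both function names are
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ          1000ULL           /* assumed ticks per second */
#define SKETCH_MAX_DELAY   (2 * SKETCH_HZ)   /* assumed cap: two seconds */
#define SKETCH_MAX_OVERAGE 4096ULL           /* 4x over, bounds the math */

/* Fraction of usage above the limit, in fixed point (1024 == 100% over). */
static uint64_t swap_overage(uint64_t usage, uint64_t high)
{
	uint64_t over;

	assert(high > 0);
	if (usage <= high)
		return 0;

	over = ((usage - high) << 10) / high;
	return over > SKETCH_MAX_OVERAGE ? SKETCH_MAX_OVERAGE : over;
}

/* Ticks to sleep on return to user space; quadratic in the overage. */
static uint64_t penalty_ticks(uint64_t usage, uint64_t high)
{
	uint64_t over = swap_overage(usage, high);
	uint64_t ticks = (over * over * SKETCH_HZ) >> 20;

	return ticks > SKETCH_MAX_DELAY ? SKETCH_MAX_DELAY : ticks;
}
```

With these assumed constants a task 10% over the limit sleeps for only a
few ticks, while a task at twice the limit sleeps for about a second,
which is the gradual slowdown the changelog describes.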

I do not see any other problems with the patch.

Thanks!
-- 
Michal Hocko
SUSE Labs

