From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jakub Kicinski <kuba@kernel.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	kernel-team@fb.com, tj@kernel.org, chris@chrisdown.name,
	cgroups@vger.kernel.org, shakeelb@google.com
Subject: Re: [PATCH mm v2 3/3] mm: automatically penalize tasks with high swap use
Date: Fri, 15 May 2020 09:14:58 +0200
Message-ID: <20200515071458.GE29153@dhcp22.suse.cz>
In-Reply-To: <20200514202130.GA591266@cmpxchg.org>

On Thu 14-05-20 16:21:30, Johannes Weiner wrote:
> On Thu, May 14, 2020 at 09:42:46AM +0200, Michal Hocko wrote:
> > On Wed 13-05-20 11:36:23, Jakub Kicinski wrote:
> > > On Wed, 13 May 2020 10:32:49 +0200 Michal Hocko wrote:
> > > > On Tue 12-05-20 10:55:36, Jakub Kicinski wrote:
> > > > > On Tue, 12 May 2020 09:26:34 +0200 Michal Hocko wrote:  
> > > > > > On Mon 11-05-20 15:55:16, Jakub Kicinski wrote:  
> > > > > > > Use swap.high when deciding if swap is full.    
> > > > > > 
> > > > > > Please be more specific why.  
> > > > > 
> > > > > How about:
> > > > > 
> > > > >     Use swap.high when deciding if swap is full to influence ongoing
> > > > >     swap reclaim in a best effort manner.  
> > > > 
> > > > This is still way too vague. The crux is why we should treat the hard and
> > > > high swap limits the same for the mem_cgroup_swap_full purpose. Please
> > > > note that I am not saying this is wrong. I am asking for a more
> > > > detailed explanation mostly because I would bet that somebody
> > > > stumbles over this sooner or later.
> > > 
> > > Stumbles in what way?
> > 
> > Reading the code and trying to understand why this particular decision
> > has been made. Because it might be surprising that the hard and high
> > limits are treated the same here.
> 
> I don't quite understand the controversy.

I do not think there is any controversy. All I am asking for is a
clarification because this is non-intuitive.
 
> The idea behind "swap full" is that as long as the workload has plenty
> of swap space available and it's not changing its memory contents, it
> makes sense to generously hold on to copies of data in the swap
> device, even after the swapin. A later reclaim cycle can drop the page
> without any IO. Trading disk space for IO.
> 
> But there are only two ways to reclaim a swap slot: when it is faulted
> in and the references go away, or by scanning the virtual address space
> like swapoff does, which is very expensive (one could argue it's too
> expensive even for swapoff; it's often more practical to just reboot).
> 
> So at some fill level, we have to start freeing up swap
> slots on fault/swapin. Otherwise we could eventually run out of swap
> slots while they're filled with copies of data that is also in RAM.
> 
> We don't want to OOM a workload because its available swap space is
> filled with redundant cache.

Thanks, this is a useful summary.
 
> That applies to physical swap limits, swap.max, and naturally also to
> swap.high, which is a limit used to implement userspace OOM for swap space
> exhaustion.
> 
> > > Isn't it expected for the kernel to take reasonable precautions to
> > > avoid hitting limits?
> > 
> > Isn't the throttling itself the precaution? How do the swap cache
> > and its control via mem_cgroup_swap_full interact here? See? This is
> > what I am asking to have explained in the changelog.
> 
> It sounds like we need better documentation of what vm_swap_full() and
> friends are there for. It should have been obvious why swap.high - a
> limit on available swap space - hooks into it.

Agreed. The primary source of confusion is the naming here, because
vm_swap_full doesn't really say that the swap is full. It merely
says that it is getting full and so duplicated data should be
dropped.

-- 
Michal Hocko
SUSE Labs


Thread overview:
2020-05-11 22:55 [PATCH mm v2 0/3] memcg: Slow down swap allocation as the available space gets depleted Jakub Kicinski
2020-05-11 22:55 ` [PATCH mm v2 1/3] mm: prepare for swap over-high accounting and penalty calculation Jakub Kicinski
2020-05-12  7:08   ` Michal Hocko
2020-05-12 17:28     ` Jakub Kicinski
2020-05-13  8:06       ` Michal Hocko
2020-05-11 22:55 ` [PATCH mm v2 2/3] mm: move penalty delay clamping out of calculate_high_delay() Jakub Kicinski
2020-05-11 22:55 ` [PATCH mm v2 3/3] mm: automatically penalize tasks with high swap use Jakub Kicinski
2020-05-12  7:26   ` Michal Hocko
2020-05-12 17:55     ` Jakub Kicinski
2020-05-13  8:32       ` Michal Hocko
2020-05-13 18:36         ` Jakub Kicinski
2020-05-14  7:42           ` Michal Hocko
2020-05-14 20:21             ` Johannes Weiner
2020-05-15  7:14               ` Michal Hocko [this message]
2020-05-13  8:38   ` Michal Hocko
