From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 May 2020 13:11:57 -0700
From: Jakub Kicinski
To: Johannes Weiner
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, kernel-team@fb.com,
 tj@kernel.org, chris@chrisdown.name, cgroups@vger.kernel.org,
 shakeelb@google.com, mhocko@kernel.org
Subject: Re: [PATCH mm v5 RESEND 4/4] mm: automatically penalize tasks with
 high swap use
Message-ID: <20200526131157.79c17940@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
In-Reply-To: <20200526153309.GD848026@cmpxchg.org>
References: <20200521002411.3963032-1-kuba@kernel.org>
 <20200521002411.3963032-5-kuba@kernel.org>
 <20200526153309.GD848026@cmpxchg.org>
On Tue, 26 May 2020 11:33:09 -0400 Johannes Weiner wrote:
> On Wed, May 20, 2020 at 05:24:11PM -0700, Jakub Kicinski wrote:
> > Add a memory.swap.high knob, which can be used to protect the system
> > from SWAP exhaustion. The mechanism used for penalizing is similar
> > to memory.high penalty (sleep on return to user space), but with
> > a less steep slope.
> 
> The last part is no longer true after incorporating Michal's feedback.
> 
> > +	/*
> > +	 * Make the swap curve more gradual, swap can be considered "cheaper",
> > +	 * and is allocated in larger chunks. We want the delays to be gradual.
> > +	 */
> 
> This comment is also out-of-date, as the same curve is being applied.

Indeed :S

> > +	penalty_jiffies += calculate_high_delay(memcg, nr_pages,
> > +					swap_find_max_overage(memcg));
> > +
> >  	/*
> >  	 * Clamp the max delay per usermode return so as to still keep the
> >  	 * application moving forwards and also permit diagnostics, albeit
> > @@ -2585,12 +2608,25 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	 * reclaim, the cost of mismatch is negligible.
> >  	 */
> >  	do {
> > -		if (page_counter_is_above_high(&memcg->memory)) {
> > -			/* Don't bother a random interrupted task */
> > -			if (in_interrupt()) {
> > +		bool mem_high, swap_high;
> > +
> > +		mem_high = page_counter_is_above_high(&memcg->memory);
> > +		swap_high = page_counter_is_above_high(&memcg->swap);
> 
> Please open-code these checks instead - we don't really do getters and
> predicates for these, and only have the setters because they are more
> complicated operations.

I added this helper because the calculation doesn't fit into 80 chars.
In particular reclaim_high will need a temporary variable or IMHO a
questionable line split.

static void reclaim_high(struct mem_cgroup *memcg, unsigned int nr_pages,
			 gfp_t gfp_mask)
{
	do {
		if (!page_counter_is_above_high(&memcg->memory))
			continue;
		memcg_memory_event(memcg, MEMCG_HIGH);
		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
	} while ((memcg = parent_mem_cgroup(memcg)) &&
		 !mem_cgroup_is_root(memcg));
}

What's your preference? Mine is a helper, but I'm probably not
sensitive enough to the ontology here :)

> > +		if (mem_high || swap_high) {
> > +			/* Use one counter for number of pages allocated
> > +			 * under pressure to save struct task space and
> > +			 * avoid two separate hierarchy walks.
> > +			 */
> >  			current->memcg_nr_pages_over_high += batch;
> 
> That comment style is leaking out of the networking code ;-) Please
> use the customary style in this code base, /*\n *...
> 
> As for one counter instead of two: I'm not sure that question arises
> in the reader. There have also been some questions recently what the
> counter actually means. How about the following:
> 
> 	/*
> 	 * The allocating tasks in this cgroup will need to do
> 	 * reclaim or be throttled to prevent further growth
> 	 * of the memory or swap footprints.
> 	 *
> 	 * Target some best-effort fairness between the tasks,
> 	 * and distribute reclaim work and delay penalties
> 	 * based on how much each task is actually allocating.
> 	 */

sounds good!

> Otherwise, the patch looks good to me.

Thanks!
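
To make the first point above concrete: after Michal's feedback the swap
overage feeds the exact same delay curve as the memory overage, and the
two penalties simply add up before being clamped. A rough sketch of the
resulting shape of mem_cgroup_handle_over_high() follows; it assumes a
mem_find_max_overage() counterpart to the swap helper from earlier in
the series, and the clamp constant shown is illustrative:

	/* Same quadratic curve for both counters; penalties accumulate. */
	penalty_jiffies = calculate_high_delay(memcg, nr_pages,
					       mem_find_max_overage(memcg));
	penalty_jiffies += calculate_high_delay(memcg, nr_pages,
						swap_find_max_overage(memcg));

	/*
	 * Clamp the delay per return to user space so the task still
	 * makes forward progress.
	 */
	penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);

Because the curve is shared, both the "less steep slope" wording in the
commit message and the "more gradual" comment in the quoted hunk are
stale.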
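
On the open helper-vs-open-coding question: a minimal sketch of the
open-coded checks, assuming the high field sits in struct page_counter
as the quoted page_counter_is_above_high(&memcg->memory) calls imply.
The line-wrapping is just one way to stay within 80 columns, not
necessarily what a respin would use:

	bool mem_high, swap_high;

	/* Compare raw usage against the high limit directly. */
	mem_high = page_counter_read(&memcg->memory) >
		READ_ONCE(memcg->memory.high);
	swap_high = page_counter_read(&memcg->swap) >
		READ_ONCE(memcg->swap.high);

reclaim_high() could likewise split the comparison across two lines in
place of the predicate helper.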