From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ukkY=HV=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D2317C433E0
	for <linux-mm@archiver.kernel.org>; Fri, 19 Feb 2021 09:39:25 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 4794264EBF
	for <linux-mm@archiver.kernel.org>; Fri, 19 Feb 2021 09:39:25 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4794264EBF
Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 8757F8D0007; Fri, 19 Feb 2021 04:16:11 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 826AB8D0001; Fri, 19 Feb 2021 04:16:11 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 73C8F8D0007; Fri, 19 Feb 2021 04:16:11 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0225.hostedemail.com [216.40.44.225])
	by kanga.kvack.org (Postfix) with ESMTP id 5ADE78D0001
	for <linux-mm@kvack.org>; Fri, 19 Feb 2021 04:16:11 -0500 (EST)
Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay02.hostedemail.com (Postfix) with ESMTP id 1F7073626
	for <linux-mm@kvack.org>; Fri, 19 Feb 2021 09:16:11 +0000 (UTC)
X-FDA: 77834460942.02.728FF32
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15])
	by imf06.hostedemail.com (Postfix) with ESMTP id 4859AC000C45
	for <linux-mm@kvack.org>; Fri, 19 Feb 2021 09:16:11 +0000 (UTC)
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1613726169; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=yYsAIWuqG5g0PtLS3vC3gPQPYDZPxGBMvNHr6Q07to4=;
	b=nlWS+saaxoksajbabAIYBwVBt+uZsQwz6nCOYG34Y1s6kCiPBQcqrzCDyycIUsjJ5+4hfk
	bDqBQs0nNzI8lq6UX+lrfdDaZdW5D4rc2SO5ABAN/7CpJ6Y6Y5+yItAIevIcXgkXRqIUHP
	lPdZSsse/w5ZJPF/PUShm0iC9ip8ODI=
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id 4A49EACF6;
	Fri, 19 Feb 2021 09:16:09 +0000 (UTC)
Date: Fri, 19 Feb 2021 10:16:08 +0100
From: Michal Hocko <mhocko@suse.com>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ying Huang <ying.huang@intel.com>, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 3/3] mm: Fix missing mem cgroup soft limit tree updates
Message-ID: <YC+B2KvJVSgfVDTe@dhcp22.suse.cz>
References: <cover.1613584277.git.tim.c.chen@linux.intel.com>
 <e269f5df3af1157232b01a9b0dae3edf4880d786.1613584277.git.tim.c.chen@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <e269f5df3af1157232b01a9b0dae3edf4880d786.1613584277.git.tim.c.chen@linux.intel.com>
X-Stat-Signature: 75zromfhijpkn6hayaa68en4z851qpzx
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: 4859AC000C45
Received-SPF: none (suse.com>: No applicable sender policy available) receiver=imf06; identity=mailfrom; envelope-from="<mhocko@suse.com>"; helo=mx2.suse.de; client-ip=195.135.220.15
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1613726171-181749
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed 17-02-21 12:41:36, Tim Chen wrote:
> On a per node basis, the mem cgroup soft limit tree on each node tracks
> how much a cgroup has exceeded its memory soft limit and sorts
> the cgroup by its excess usage.  On page release, the trees are not
> updated right away, until we have gathered a batch of pages belonging to
> the same cgroup. This reduces the frequency of updating the soft limit tree
> and locking of the tree and associated cgroup.
> 
> However, the batch of pages could contain pages from multiple nodes but
> only the soft limit tree from one node would get updated.  Change the
> logic so that we update the tree in batch of pages, with each batch of
> pages all in the same mem cgroup and memory node.  An update is issued for
> the batch of pages of a node collected till now whenever we encounter
> a page belonging to a different node.  Note that this batching for
> the same node logic is only relevant for v1 cgroup that has a memory
> soft limit.

Let me paste the discussion related to this patch from the other reply:
> >> For patch 3 regarding the uncharge_batch, it
> >> is more of an observation that we should uncharge in batch of same node
> >> and not prompted by actual workload.
> >> Thinking more about this, the worst that could happen
> >> is we could have some entries in the soft limit tree that overestimate
> >> the memory used.  The worst that could happen is a soft page reclaim
> >> on that cgroup.  The overhead from extra memcg event update could
> >> be more than a soft page reclaim pass.  So let's drop patch 3
> >> for now.
> >
> > I would still prefer to handle that in the soft limit reclaim path and
> > check each memcg for the soft limit reclaim excess before the reclaim.
> >
> 
> Something like this?
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8bddee75f5cb..b50cae3b2a1a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3472,6 +3472,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>                 if (!mz)
>                         break;
> 
> +               /*
> +                * Soft limit tree is updated based on memcg events sampling.
> +                * We could have missed some updates on page uncharge and
> +                * the cgroup is below soft limit.  Skip useless soft reclaim.
> +                */
> +               if (!soft_limit_excess(mz->memcg))
> +                       continue;
> +
>                 nr_scanned = 0;
>                 reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,

Yes, I meant something like this, but then I looked more closely and
this shouldn't be needed after all. __mem_cgroup_largest_soft_limit_node
already does all the work:
        if (!soft_limit_excess(mz->memcg) ||
            !css_tryget(&mz->memcg->css))
                goto retry;
so a memcg with no soft limit excess shouldn't really be picked for
soft reclaim in the first place.
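
To illustrate the check quoted above, here is a minimal userspace sketch,
not the kernel code. All toy_* names are hypothetical: the array stands in
for the per-node rbtree, the "alive" flag stands in for css_tryget()
succeeding, and dropping the candidate before testing it is an assumption
about how the retry loop makes progress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct toy_memcg {
	unsigned long usage;		/* pages currently charged */
	unsigned long soft_limit;	/* soft limit in pages */
	bool alive;			/* models css_tryget() succeeding */
};

/* Models soft_limit_excess(): pages above the soft limit, or 0. */
static unsigned long toy_soft_limit_excess(const struct toy_memcg *memcg)
{
	if (memcg->usage > memcg->soft_limit)
		return memcg->usage - memcg->soft_limit;
	return 0;
}

/*
 * Toy "tree": an array ordered by the excess recorded at the last
 * update, which may be stale. The last entry plays the rightmost node.
 */
struct toy_tree {
	struct toy_memcg **entries;
	size_t count;
};

/*
 * Take the rightmost candidate off the tree (assumption: it is removed
 * before being tested), re-check its current excess, and retry with the
 * next candidate if it is no longer over its limit or is going away.
 */
static struct toy_memcg *toy_largest_soft_limit_node(struct toy_tree *tree)
{
	while (tree->count > 0) {
		struct toy_memcg *memcg = tree->entries[--tree->count];

		if (toy_soft_limit_excess(memcg) && memcg->alive)
			return memcg;
		/* stale or dying entry: skip it and retry */
	}
	return NULL;
}
```

With a stale entry (usage below its soft limit) in the rightmost position,
the lookup skips it and returns the next memcg that is still over its
limit, so the caller never sees a candidate with zero excess.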
-- 
Michal Hocko
SUSE Labs