From: Michal Hocko <mhocko@suse.com>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ying Huang <ying.huang@intel.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on usage excess
Date: Fri, 19 Feb 2021 10:11:02 +0100
Message-ID: <YC+ApsntwnlVfCuK@dhcp22.suse.cz>
In-Reply-To: <06f1f92f1f7d4e57c4e20c97f435252c16c60a27.1613584277.git.tim.c.chen@linux.intel.com>

On Wed 17-02-21 12:41:35, Tim Chen wrote:
> To rate limit updates to the mem cgroup soft limit tree, we only perform
> updates every SOFTLIMIT_EVENTS_TARGET (defined as 1024) memory events.
> 
> However, these sampling-based updates may miss a critical update: i.e. when
> the mem cgroup first exceeded its limit but it was not on the soft limit tree.
> It should be on the tree at that point so it could be subjected to soft
> limit page reclaim. If the mem cgroup had few memory events compared with
> other mem cgroups, we may not update it and place it on the mem cgroup
> soft limit tree for many memory events.  And this mem cgroup excess
> usage could creep up and the mem cgroup could be hidden from the soft
> limit page reclaim for a long time.
> 
> Fix this issue by forcing an update to the mem cgroup soft limit tree if a
> mem cgroup has exceeded its memory soft limit but it is not on the mem
> cgroup soft limit tree.

Let me copy your clarification from the other reply (this should go to
the changelog btw.):
> The scenario I saw was we have multiple cgroups running pmbench.
> One cgroup exceeded the soft limit and soft reclaim is active on
> that cgroup.  So there are a whole bunch of memcg events associated
> with that cgroup.  Then another cgroup starts to exceed its
> soft limit.
>
> Memory is accessed at a much lower frequency
> for the second cgroup.  The memcg event update was not triggered for the
> second cgroup as the memcg event update didn't happen on the 1024th sample.
> The second cgroup was not placed on the soft limit tree and we didn't
> try to reclaim the excess pages.
>
> As time goes on, we saw that the first cgroup was kept close to its
> soft limit due to reclaim activities, while the second cgroup's memory
> usage slowly creeps up as it keeps getting missed from the soft limit tree
> update as the update didn't fall on the modulo 1024 sample.  As a result,
> the memory usage of the second cgroup keeps growing over the soft limit
> for a long time due to its relatively rare occurrence.

Soft limit is evaluated every THRESHOLDS_EVENTS_TARGET * SOFTLIMIT_EVENTS_TARGET
events. If all events correspond to newly charged memory and the last event
was just about at the soft limit boundary then we should be bound by 128k
pages (512M, and much more if these were huge pages), which is a lot!
I hadn't realized it was that much. Now I see the problem. This would
be useful information for the changelog.
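
To put numbers on that bound, here is a minimal user-space sketch of the
arithmetic. It takes the product above at face value and assumes
THRESHOLDS_EVENTS_TARGET is 128 (consistent with the 128k figure) and that
every event charges a fixed number of pages. The helper names
worst_case_pages() and worst_case_bytes() are made up for illustration and
are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; SOFTLIMIT_EVENTS_TARGET is quoted as 1024 in the changelog. */
#define THRESHOLDS_EVENTS_TARGET 128
#define SOFTLIMIT_EVENTS_TARGET  1024

/*
 * Hypothetical helper: upper bound on the pages charged between two
 * soft limit evaluations, assuming each event charges pages_per_event pages.
 */
static uint64_t worst_case_pages(uint64_t pages_per_event)
{
	assert(pages_per_event > 0);
	return (uint64_t)THRESHOLDS_EVENTS_TARGET * SOFTLIMIT_EVENTS_TARGET *
	       pages_per_event;
}

/* Hypothetical helper: the same bound in bytes for a given base page size. */
static uint64_t worst_case_bytes(uint64_t pages_per_event, uint64_t page_size)
{
	return worst_case_pages(pages_per_event) * page_size;
}
```

With 4K base pages and one page per event, worst_case_bytes(1, 4096) is
512M. If each event charged 512 base pages (a 2M huge page, which is an
assumption about the configuration), the same bound would be 256G.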

Your fix focuses on the over-the-limit boundary, which will solve the
problem, but wouldn't that lead to updates happening too often in a
pathological situation where a memcg gets reclaimed immediately?
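
A rough user-space model of that concern, under stated assumptions: each
event charges one page, the memcg starts exactly at its soft limit, and
soft reclaim pushes it back to the limit and off the tree right after
every tree update. struct model_memcg, model_excess() and
count_tree_updates() are hypothetical names for this sketch, and the
single "period" parameter is a simplification of the kernel's rate limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node memcg state used by the patch. */
struct model_memcg {
	unsigned long usage;      /* pages currently charged */
	unsigned long soft_limit; /* soft limit in pages */
	bool on_tree;             /* is it on the soft limit tree? */
};

static unsigned long model_excess(const struct model_memcg *m)
{
	return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
}

/*
 * Count tree updates over nr_events charges. "force" models the
 * force_update path from the patch; "period" models a plain rate limit.
 */
static unsigned long count_tree_updates(unsigned long nr_events,
					unsigned long period, bool force)
{
	struct model_memcg m = { .usage = 1000, .soft_limit = 1000,
				 .on_tree = false };
	unsigned long updates = 0;

	assert(period > 0);
	for (unsigned long ev = 1; ev <= nr_events; ev++) {
		m.usage++; /* one page charged per event */

		bool force_update = force && !m.on_tree && model_excess(&m) > 0;
		bool ratelimited = (ev % period) == 0;

		if (force_update || ratelimited) {
			updates++;
			m.on_tree = model_excess(&m) > 0;
		}

		/* Pathological assumption: reclaim undoes the excess at once. */
		if (m.on_tree) {
			m.usage = m.soft_limit;
			m.on_tree = false;
		}
	}
	return updates;
}
```

In this model, 4096 events with a period of 1024 give 4 tree updates
without forcing and 4096 with forcing, i.e. one update per event.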

One way around that would be to lower SOFTLIMIT_EVENTS_TARGET. Have
you tried that? Do we even need a separate threshold for the soft limit?
Why can't we simply update the tree on each MEM_CGROUP_TARGET_THRESH event?
 
> Reviewed-by: Ying Huang <ying.huang@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  mm/memcontrol.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a51bf90732cb..d72449eeb85a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -985,15 +985,22 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
>   */
>  static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
>  {
> +	struct mem_cgroup_per_node *mz;
> +	bool force_update = false;
> +
> +	mz = mem_cgroup_nodeinfo(memcg, page_to_nid(page));
> +	if (mz && !mz->on_tree && soft_limit_excess(mz->memcg) > 0)
> +		force_update = true;
> +
>  	/* threshold event is triggered in finer grain than soft limit */
> -	if (unlikely(mem_cgroup_event_ratelimit(memcg,
> +	if (unlikely((force_update) || mem_cgroup_event_ratelimit(memcg,
>  						MEM_CGROUP_TARGET_THRESH))) {
>  		bool do_softlimit;
>  
>  		do_softlimit = mem_cgroup_event_ratelimit(memcg,
>  						MEM_CGROUP_TARGET_SOFTLIMIT);
>  		mem_cgroup_threshold(memcg);
> -		if (unlikely(do_softlimit))
> +		if (unlikely((force_update) || do_softlimit))
>  			mem_cgroup_update_tree(memcg, page);
>  	}
>  }
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs
