From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 11 Feb 2022 12:13:45 +0000
From: Chris Down <chris@chrisdown.name>
To: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@suse.com>,
	Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] memcg: synchronously enforce memory.high for
 large overcharges
Message-ID: <YgZS+YijLo0/WmEd@chrisdown.name>
References: <20220211064917.2028469-1-shakeelb@google.com>
 <20220211064917.2028469-5-shakeelb@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To: <20220211064917.2028469-5-shakeelb@google.com>
User-Agent: Mutt/2.1.5 (31b18ae9) (2021-12-30)
List-ID: <linux-mm.kvack.org>

Shakeel Butt writes:
>The high limit is used to throttle the workload without invoking the
>oom-killer. Recently we tried to use the high limit to right size our
>internal workloads. More specifically dynamically adjusting the limits
>of the workload without letting the workload get oom-killed. However due
>to the limitation of the implementation of high limit enforcement, we
>observed the mechanism fails for some real workloads.
>
>The high limit is enforced on return-to-userspace i.e. the kernel let
>the usage goes over the limit and when the execution returns to
>userspace, the high reclaim is triggered and the process can get
>throttled as well. However this mechanism fails for workloads which do
>large allocations in a single kernel entry e.g. applications that
>mlock() a large chunk of memory in a single syscall. Such applications
>bypass the high limit and can trigger the oom-killer.
>
>To make high limit enforcement more robust, this patch makes the limit
>enforcement synchronous only if the accumulated overcharge becomes
>larger than MEMCG_CHARGE_BATCH. So, most of the allocations would still
>be throttled on the return-to-userspace path but only the extreme
>allocations which accumulates large amount of overcharge without
>returning to the userspace will be throttled synchronously. The value
>MEMCG_CHARGE_BATCH is a bit arbitrary but most of other places in the
>memcg codebase uses this constant therefore for now uses the same one.

Note that mem_cgroup_handle_over_high() has its own allocator throttling grace 
period: it bails out without sleeping if the penalty to apply is less than 
10ms. The reclaim will still happen, though. So an overcharge of roughly 
MEMCG_CHARGE_BATCH pages may be reclaimed synchronously but still not be 
throttled, depending on the overall size of the cgroup and its protection.
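
To make the grace period concrete, here is a minimal userspace sketch of the 
check, not the kernel code itself. SKETCH_HZ and sketch_should_throttle() are 
hypothetical names, the tick rate is assumed to be 1000, and how a penalty of 
exactly 10ms is treated is an assumption of the sketch.

```c
#include <stdbool.h>

/* Assumed tick rate for this sketch; the kernel's HZ is a build-time option. */
#define SKETCH_HZ 1000UL

/*
 * Hypothetical helper: returns true if the computed penalty is long enough
 * to be worth sleeping on. Penalties inside the 10ms grace period are
 * skipped. Reclaim is assumed to have run before this check either way.
 */
static bool sketch_should_throttle(unsigned long penalty_jiffies)
{
	/* SKETCH_HZ / 100 jiffies is 10ms at any tick rate. */
	return penalty_jiffies > SKETCH_HZ / 100;
}
```

With these assumptions, a 5-jiffy penalty is skipped and a 50-jiffy penalty 
results in a sleep.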

>Signed-off-by: Shakeel Butt <shakeelb@google.com>
>---
>Changes since v1:
>- Based on Roman's comment simply the sync enforcement and only target
>  the extreme cases.
>
> mm/memcontrol.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 292b0b99a2c7..0da4be4798e7 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -2703,6 +2703,11 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> 		}
> 	} while ((memcg = parent_mem_cgroup(memcg)));
>
>+	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
>+	    !(current->flags & PF_MEMALLOC) &&
>+	    gfpflags_allow_blocking(gfp_mask)) {
>+		mem_cgroup_handle_over_high();

Thanks. I was going to comment on v1 that I'd prefer to keep the implementation 
of mem_cgroup_handle_over_high() if possible, since we know that the mechanism 
has been safe in production over the past few years.

One question I have is about throttling. It looks like this new 
mem_cgroup_handle_over_high() callsite may mean that throttling is invoked more 
than once on a misbehaving workload that's failing to reclaim, since it could 
be invoked both here and on return to userspace, right? That might not be a 
problem, but we should think about the implications, especially in relation to 
MEMCG_MAX_HIGH_DELAY_JIFFIES.

Maybe we should record if throttling happened previously and avoid doing it 
again for this entry into kernelspace? Not certain that's the right answer, but 
we should think about what the new semantics should be.
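
A rough userspace model of what "record and skip" could look like, purely to 
illustrate the idea and not as a proposed patch. struct sketch_task, its 
fields, and both functions are hypothetical; nothing here exists in 
task_struct or mm/memcontrol.c.

```c
#include <stdbool.h>

/* Hypothetical per-task state; none of these fields exist in task_struct. */
struct sketch_task {
	unsigned int nr_pages_over_high;	/* accumulated overcharge */
	bool throttled_this_entry;		/* a sleep was already applied */
	unsigned int sleeps;			/* sleeps taken, for illustration */
};

/*
 * Stand-in for the over-high handler. Reclaim would always run here; the
 * throttling sleep is applied at most once per entry into kernelspace.
 */
static void sketch_handle_over_high(struct sketch_task *t)
{
	if (t->nr_pages_over_high == 0)
		return;

	/* (reclaim would happen here regardless of the flag) */

	if (!t->throttled_this_entry) {
		t->sleeps++;
		t->throttled_this_entry = true;
	}
	t->nr_pages_over_high = 0;
}

/* Return-to-userspace path: handle what is left, then reset the flag. */
static void sketch_return_to_user(struct sketch_task *t)
{
	sketch_handle_over_high(t);
	t->throttled_this_entry = false;
}
```

In this model, a task that was throttled synchronously in the charge path is 
not throttled a second time on the same return to userspace, but it is 
eligible again on its next entry.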

>+	}
> 	return 0;
> }
>
>-- 
>2.35.1.265.g69c8d7142f-goog
>