From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 13 Oct 2022 04:18:33 +0000
Subject: Re: [PATCH net-next] net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
From: Shakeel Butt
To: Wei Wang
Cc: Jakub Kicinski, Eric Dumazet, netdev@vger.kernel.org,
 "David S. Miller", cgroups@vger.kernel.org, linux-mm@kvack.org,
 Roman Gushchin
Message-ID: <20221013041833.rhifxw4gqwk4ofi2@google.com>
References: <20210817194003.2102381-1-weiwan@google.com>
 <20221012163300.795e7b86@kernel.org>
 <20221012173825.45d6fbf2@kernel.org>
 <20221013005431.wzjurocrdoozykl7@google.com>
 <20221012184050.5a7f3bde@kernel.org>
 <20221012201650.3e55331d@kernel.org>
 <20221012204941.3223d205@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
On Wed, Oct 12, 2022 at 09:04:59PM -0700, Wei Wang wrote:
> On Wed, Oct 12, 2022 at 8:49 PM Jakub Kicinski wrote:
> >
> > On Wed, 12 Oct 2022 20:34:00 -0700 Wei Wang wrote:
> > > > I pushed this little nugget to one affected machine via KLP:
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 03ffbb255e60..c1ca369a1b77 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -7121,6 +7121,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
> > > >  		return true;
> > > >  	}
> > > >
> > > > +	if (gfp_mask == GFP_NOWAIT) {
> > > > +		try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > > > +		refill_stock(memcg, nr_pages);
> > > > +	}
> > > >  	return false;
> > > >  }
> > > >
> > > AFAICT, if you force charge by passing __GFP_NOFAIL to try_charge(),
> > > you should return true to tell the caller that the nr_pages is
> > > actually being charged.
> >
> > Ack - not sure what the best thing to do is, tho. Always pass NOFAIL
> > in softirq?
> >
> > It's not clear to me yet why doing the charge/uncharge actually helps,
> > perhaps try_to_free_mem_cgroup_pages() does more when NOFAIL is passed?
> >
> I am curious to know as well.
>
> > I'll do more digging tomorrow.
> >
> > > Although I am not very sure what refill_stock() does. Does that
> > > "uncharge" those pages?
> >
> > I think so, I copied it from mem_cgroup_uncharge_skmem().

I think I understand why this issue started happening after this patch.
The memcg charging happens in batches of 32 (64 nowadays) pages even if
the charge request is for less. The remaining pre-charge is cached in
the per-cpu cache (or stock). With (GFP_NOWAIT | __GFP_NOFAIL), you let
the memcg go over the limit without triggering oom-kill, and then
refill_stock() just puts the pre-charge in the per-cpu cache.
So, the later allocations/charges succeed from the per-cpu cache even
though the memcg is over the limit. With this patch we no longer force
charge and then uncharge on failure, so the later allocations/charges
fail similarly.

Regarding the right thing to do: IMHO, use GFP_ATOMIC instead of
GFP_NOWAIT. If you look at the following comment in try_charge_memcg(),
we added this exception particularly for these kinds of situations:

	...
	/*
	 * Memcg doesn't have a dedicated reserve for atomic
	 * allocations. But like the global atomic pool, we need to
	 * put the burden of reclaim on regular allocation requests
	 * and let these go through as privileged allocations.
	 */
	if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
		return -ENOMEM;
	...

Shakeel