Date: Thu, 19 Aug 2021 17:01:38 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
In-Reply-To: <20210817180506.220056-1-hannes@cmpxchg.org>

On Tue 17-08-21 14:05:06, Johannes Weiner wrote:
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slightly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with diminished pressure,
> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang
> Signed-off-by: Johannes Weiner

Acked-by: Michal Hocko

Although I have to say that the code is quite tricky and it deserves
more comments. See below.

[...]
> @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  			 * hard protection.
>  			 */
>  			unsigned long cgroup_size = mem_cgroup_size(memcg);
> +			unsigned long protection;
> +
> +			/* memory.low scaling, make sure we retry before OOM */
> +			if (!sc->memcg_low_reclaim && low > min) {
> +				protection = low;
> +				sc->memcg_low_skipped = 1;
> +			} else {
> +				protection = min;
> +			}

Just by looking at this in isolation, one could really wonder how this
does not break the low memory protection altogether. The logic is spread
over 3 different places. Would something like the following be more
understandable?
			/*
			 * Low limit protected memcgs are already excluded at
			 * a higher level (shrink_node_memcgs) but scaling
			 * down the reclaim target can make reclaim hard and
			 * result in premature OOM. We do not have a full
			 * picture here so we cannot really judge this
			 * situation, but pro-actively flag this scenario
			 * and let do_try_to_free_pages retry if
			 * there is no progress.
			 */
>
>  			/* Avoid TOCTOU with earlier protection check */
>  			cgroup_size = max(cgroup_size, protection);

-- 
Michal Hocko
SUSE Labs
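The "3 different places" interplay discussed above (the scaling hunk in get_scan_count(), the retry in do_try_to_free_pages(), and the skipping in shrink_node_memcgs()) can be modeled roughly in userspace. This is a sketch under stated assumptions, not kernel code: memcg_model, scan_target() and reclaim() are hypothetical names, the proportional formula and the SWAP_CLUSTER_MAX floor are assumed to follow get_scan_count(), and the model uses a single cgroup, treats every scanned page as reclaimed, and leaves out reclaim priorities and shrink_node_memcgs() entirely.

```c
#include <stdbool.h>

/*
 * Userspace sketch, NOT kernel code. All type and function names here
 * are hypothetical; the logic is loosely modeled on get_scan_count() and
 * do_try_to_free_pages() in mm/vmscan.c after this patch.
 * Assumptions: one cgroup, every scanned page is reclaimable, and no
 * reclaim priority levels.
 */

/* Assumed value of the kernel's SWAP_CLUSTER_MAX (in pages). */
#define SWAP_CLUSTER_MAX 32UL

/* Reduced scan_control: only the two flags discussed in this thread. */
struct scan_control {
	bool memcg_low_reclaim;	/* set on the retry pass: ignore memory.low */
	bool memcg_low_skipped;	/* memory.low was honored during this pass */
};

/* Hypothetical model of one cgroup; all sizes are in pages. */
struct memcg_model {
	unsigned long usage;	/* stands in for mem_cgroup_size() */
	unsigned long min;	/* effective memory.min */
	unsigned long low;	/* effective memory.low */
};

/* Proportional scan target, mirroring the patched get_scan_count() hunk. */
unsigned long scan_target(struct scan_control *sc,
			  const struct memcg_model *cg,
			  unsigned long lruvec_size)
{
	unsigned long protection, cgroup_size = cg->usage, scan;

	if (!cg->min && !cg->low)
		return lruvec_size;	/* unprotected: full pressure */

	/* memory.low scaling: flag the pass so the caller can retry */
	if (!sc->memcg_low_reclaim && cg->low > cg->min) {
		protection = cg->low;
		sc->memcg_low_skipped = true;
	} else {
		protection = cg->min;
	}

	/* Avoid TOCTOU-style underflow: usage is at least the protection */
	if (cgroup_size < protection)
		cgroup_size = protection;

	/* Scale down by the protected share; +1 avoids dividing by zero. */
	scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);

	/* Keep a minimal target so reclaim keeps moving forward. */
	return scan > SWAP_CLUSTER_MAX ? scan : SWAP_CLUSTER_MAX;
}

/*
 * Model of the retry in do_try_to_free_pages(). retry_after_scaling = false
 * approximates the pre-patch behavior, where scaled-down cgroups did not
 * request a retry. Returns false where the kernel would head toward OOM.
 */
bool reclaim(const struct memcg_model *cg, unsigned long lruvec_size,
	     unsigned long nr_to_reclaim, bool retry_after_scaling)
{
	struct scan_control sc = { false, false };

	for (;;) {
		/* Assumption: every scanned page is reclaimed. */
		unsigned long reclaimed = scan_target(&sc, cg, lruvec_size);

		if (reclaimed >= nr_to_reclaim)
			return true;

		/* Untapped cgroup reserves? Don't OOM, retry at full force. */
		if (retry_after_scaling && sc.memcg_low_skipped) {
			sc.memcg_low_reclaim = true;
			sc.memcg_low_skipped = false;
			continue;
		}
		return false;
	}
}
```

With a cgroup of 1010 pages, memory.min of 0 and memory.low of 1000, and a request to reclaim 100 pages, the first pass targets only 32 pages (11 from the formula, raised to the SWAP_CLUSTER_MAX floor). So reclaim(&cg, 1010, 100, false) returns false (the pre-patch path toward OOM), while reclaim(&cg, 1010, 100, true) retries with protection = min = 0, scans all 1010 pages and returns true.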