Date: Fri, 23 Sep 2022 09:45:06 +0200
From: Michal Hocko
To: David Rientjes
Cc: Gang Li, Zefan Li, Tejun Heo, Johannes Weiner, Andrew Morton,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [RFC PATCH v1] mm: oom: introduce cpuset oom
References: <20220921064710.89663-1-ligang.bdlg@bytedance.com>
    <18621b07-256b-7da1-885a-c96dfc8244b6@google.com>
In-Reply-To: <18621b07-256b-7da1-885a-c96dfc8244b6@google.com>

On Thu 22-09-22 12:18:04, David Rientjes wrote:
> On Wed, 21 Sep 2022, Gang Li wrote:
>
> > cpuset confine processes to processor and memory node subsets.
> > When a process in cpuset triggers oom, it may kill a completely
> > irrelevant process on another numa node, which will not release any
> > memory for this cpuset.
> >
> > It seems that `CONSTRAINT_CPUSET` is not really doing much these
> > days. Using CONSTRAINT_CPUSET, we can easily achieve node aware oom
> > killing by selecting victim from the cpuset which triggers oom.
> >
> > Suggested-by: Michal Hocko
> > Signed-off-by: Gang Li
>
> Hmm, is this the right approach?
>
> If a cpuset results in a oom condition, is there a reason why we'd need to
> find a process from within that cpuset to kill?  I think the idea is to
> free memory on the oom set of nodes (cpuset.mems) and that can happen by
> killing a process that is not a member of this cpuset.

I would argue that the current cpuset should be considered first
because chances are that it will already have the biggest memory
consumption from the constrained NUMA nodes. At least that would be the
case when cpusets are used to partition the system into exclusive NUMA
domains.

The situation gets more complex with overlapping nodemasks in different
cpusets, but I believe our existing semantic already sucks for those
usecases because we just shoot a random process with an unknown amount
of memory allocated from the constrained nodemask. This new semantic is
not much worse. We could find a real oom victim under a different
cpuset, but the current semantic could just as well kill a large memory
consumer with a tiny footprint on the target node. With the cpuset view
the potential damage is more targeted in many cases.

> I understand the challenges of creating a NUMA aware oom killer to target
> memory that is actually resident on an oom node, but this approach doesn't
> seem right and could actually lead to pathological cases where a small
> process trying to fork in an otherwise empty cpuset is repeatedly oom
> killing when we'd actually prefer to kill a single large process.

Yeah, that is possible and something to consider. One way to go about
that is to make the selection from all cpusets with an overlap with the
requested nodemask (probably with a preference to more constrained
ones).

In any case let's keep in mind that this is a mere heuristic. We just
need to kill some process; it is not really feasible to aim for the
best selection. We should just try to reduce the harm. Our existing
cpuset based OOM is effectively random without any clear relation to
cpusets, so I would be open to experimenting in this area.

-- 
Michal Hocko
SUSE Labs
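
For illustration only, the overlap-based selection mentioned above
(consider every task whose cpuset overlaps the requested nodemask, with
some preference for more constrained ones) could be modelled in
userspace C roughly as follows. This is not kernel code and not what
the patch does. The names nodemask_model_t, struct task_model,
nodes_weight_model() and select_victim() are all hypothetical, and the
scoring rule (total footprint scaled by the fraction of allowed nodes
that are under OOM) is an assumption made for this sketch, since the
real per-node footprint is unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per NUMA node (up to 64 nodes). */
typedef uint64_t nodemask_model_t;

struct task_model {
	int pid;
	nodemask_model_t mems_allowed;	/* models the task's cpuset.mems */
	uint64_t rss_pages;		/* total footprint; per-node split unknown */
};

/* Count the nodes set in a mask. */
static unsigned int nodes_weight_model(nodemask_model_t m)
{
	unsigned int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/*
 * Return the index of the task with the highest estimated footprint on
 * the OOM nodes, or -1 if no task overlaps them.
 *
 * Assumed estimate: rss * |allowed & oom| / |allowed|. A task confined
 * to fewer nodes keeps more of its weight, which gives a soft
 * preference to more constrained cpusets without always picking a
 * tiny task over a large one.
 */
static int select_victim(const struct task_model *tasks, size_t n,
			 nodemask_model_t oom_nodes)
{
	int best = -1;
	uint64_t best_num = 0, best_den = 1;
	size_t i;

	assert(tasks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		nodemask_model_t overlap = tasks[i].mems_allowed & oom_nodes;
		uint64_t num, den;

		if (!overlap)
			continue;	/* killing this task frees nothing here */

		num = tasks[i].rss_pages * nodes_weight_model(overlap);
		den = nodes_weight_model(tasks[i].mems_allowed);

		/* Compare num/den against best_num/best_den without division. */
		if (best < 0 || num * best_den > best_num * den) {
			best = (int)i;
			best_num = num;
			best_den = den;
		}
	}
	return best;
}
```

With a 10 page task confined to node 0, a 1000 page task allowed on
nodes 0-1 and a 5000 page task confined to node 1, an OOM on node 0
selects the 1000 page task: the largest task is skipped because it has
no overlap, and the small one loses on estimated footprint.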