Date: Mon, 12 Dec 2022 09:36:49 +0100
From: Michal Hocko
To: Wei Xu
Cc: Mina Almasry, Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Huang Ying, Yang Shi, Yosry Ahmed, fvdl@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems

On Sat 10-12-22 00:01:28, Wei Xu wrote:
> On Fri, Dec 9, 2022 at 1:16 PM Michal Hocko wrote:
> >
> > On Fri 09-12-22 08:41:47, Wei Xu wrote:
> > > On Fri, Dec 9, 2022 at 12:08 AM Michal Hocko
> > > wrote:
> > > >
> > > > On Thu 08-12-22 16:59:36, Wei Xu wrote:
> > > > [...]
> > > > > > What I really mean is to add demotion nodes to the nodemask along with
> > > > > > the set of nodes you want to reclaim from. To me that sounds like a
> > > > > > more natural interface allowing for all sorts of usecases:
> > > > > > - free up demotion targets (only specify demotion nodes in the mask)
> > > > > > - control where to demote (e.g. select specific demotion target(s))
> > > > > > - do not demote at all (skip demotion nodes from the node mask)
> > > > >
> > > > > For clarification, do you mean to add another argument (e.g.
> > > > > demotion_nodes) in addition to the "nodes" argument?
> > > >
> > > > No, nodes=mask argument should control the domain where the memory
> > > > reclaim should happen. That includes both aging and the reclaim. If the
> > > > mask doesn't contain any lower tier node then no demotion will happen.
> > > > If only a subset of lower tiers are specified then only those could be
> > > > used for the demotion process. Or put it otherwise, the nodemask is not
> > > > only used to filter out zonelists during reclaim it also restricts
> > > > migration targets.
> > > >
> > > > Is this more clear now?
> > >
> > > In that case, how can we request demotion only from toptier nodes
> > > (without counting any reclaimed bytes from other nodes), which is our
> > > memory tiering use case?
> >
> > I am not sure I follow. Could you be more specific please?
>
> In our memory tiering use case, we would like to proactively free up
> memory on top-tier nodes by demoting cold pages to lower-tier nodes.
> This is to create enough free top-tier memory for new allocations and
> promotions. How many pages and how often to demote from top-tier
> nodes can depend on a number of factors (e.g. the amount of free
> top-tier memory, the amount of cold pages, the bandwidth pressure on
> lower-tier, the task tolerance of slower memory on performance) and
> are controlled by the userspace policies.
>
> Because the purpose of such proactive demotions is to free up top-tier
> memory, not to lower the amount of memory charged to the memcg, we'd
> like that memory.reclaim can demote the specified amount of bytes from
> the given top-tier nodes. If we have to also provide the lower-tier
> nodes to memory.reclaim to allow demotions, the kernel can reclaim
> from the lower-tier nodes in the same memory.reclaim request. We then
> won't be able to control the amount of bytes to be demoted from
> top-tier nodes.

I am not sure this is something to be handled by the reclaim interface,
because now you are creating an ambiguity about what the interface should
do and then starting to depend on it. Consider that we change the reclaim
algorithm in the future and the node you request to demote from will
simply reclaim rather than demote. This will break your use case, right?
-- 
Michal Hocko
SUSE Labs
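
For illustration only, the nodes=mask semantics proposed in the quoted part of the thread can be modeled with a small toy sketch. This is not kernel code: the node ids, the tier assignment in NODE_TIER, and the names ReclaimPlan, plan_reclaim and demotion_possible are all hypothetical. The sketch only encodes the stated rule that the mask bounds both the nodes reclaimed from and the permitted demotion targets.

```python
from dataclasses import dataclass

# Hypothetical topology for illustration: node id -> memory tier
# (0 is the top tier; larger numbers are slower, lower tiers).
NODE_TIER = {0: 0, 1: 0, 2: 1, 3: 1}


@dataclass(frozen=True)
class ReclaimPlan:
    sources: tuple          # nodes that may be aged and reclaimed from
    demotion_targets: dict  # source node -> lower-tier nodes allowed as targets


def plan_reclaim(nodemask, node_tier=NODE_TIER):
    """Toy model of the proposed rule: the mask bounds reclaim and demotion."""
    mask = set(nodemask)
    unknown = mask - set(node_tier)
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    # A node may only demote to lower-tier nodes that are also in the mask.
    targets = {
        src: [dst for dst in sorted(mask) if node_tier[dst] > node_tier[src]]
        for src in sorted(mask)
    }
    return ReclaimPlan(sources=tuple(sorted(mask)), demotion_targets=targets)


def demotion_possible(plan):
    """True if at least one source node has a demotion target in the mask."""
    return any(plan.demotion_targets.values())


if __name__ == "__main__":
    for mask in ({0, 1}, {0, 2}, {2, 3}):
        plan = plan_reclaim(mask)
        print(sorted(mask), plan.sources, plan.demotion_targets,
              demotion_possible(plan))
```

In this model a mask of {0, 2} allows node 0 to demote to node 2, but node 2 is itself a reclaim source in the same request. That is the situation described in the quoted text: bytes reclaimed from the lower tier would count toward the request, so the amount demoted from the top tier cannot be controlled separately.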