From: "Huang, Ying"
To: Yu Zhao
Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Michal Hocko, Andi Kleen,
 Aneesh Kumar, Barry Song <21cnbao@gmail.com>, Catalin Marinas, Dave Hansen,
 Hillf Danton, Jens Axboe, Jesse Barnes, Jonathan Corbet, Linus Torvalds,
 Matthew Wilcox, Michael Larabel, Mike Rapoport, Rik van Riel,
 Vlastimil Babka, Will Deacon, Linux ARM, "open list:DOCUMENTATION",
 linux-kernel, Linux-MM, Kernel Page Reclaim v2, "the arch/x86 maintainers",
 Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
 Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
 Konstantin Kharlamov, Shuang Zhai, Sofia Trinh
Subject: Re: [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation
References: <20220208081902.3550911-1-yuzhao@google.com>
 <20220208081902.3550911-6-yuzhao@google.com>
 <87bkyy56nv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y2213wrl.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87h78p3pp2.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 24 Feb 2022 13:27:33 +0800
In-Reply-To: (Yu Zhao's message of "Wed, 23 Feb 2022 21:09:56 -0700")
Message-ID: <87a6eg4ywq.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yu Zhao writes:

> On Wed, Feb 23, 2022 at 8:32 PM Huang, Ying wrote:
>>
>> Yu Zhao writes:
>>
>> > On Wed, Feb 23, 2022 at 5:59 PM Huang, Ying wrote:
>> >>
>> >> Yu Zhao writes:
>> >>
>> >> > On Wed, Feb 23, 2022 at 1:28 AM Huang, Ying wrote:
>> >> >>
>> >> >> Hi, Yu,
>> >> >>
>> >> >> Yu Zhao writes:
>> >> >>
>> >> >> > To avoid confusions, the terms "promotion" and "demotion" will be
>> >> >> > applied to the multigenerational LRU, as a new convention; the terms
>> >> >> > "activation" and "deactivation" will be applied to the active/inactive
>> >> >> > LRU, as usual.
>> >> >>
>> >> >> In the memory tiering related commits and patchset, for example as follows,
>> >> >>
>> >> >> commit 668e4147d8850df32ca41e28f52c146025ca45c6
>> >> >> Author: Yang Shi
>> >> >> Date:   Thu Sep 2 14:59:19 2021 -0700
>> >> >>
>> >> >>     mm/vmscan: add page demotion counter
>> >> >>
>> >> >> https://lore.kernel.org/linux-mm/20220221084529.1052339-1-ying.huang@intel.com/
>> >> >>
>> >> >> "demote" and "promote" is used for migrating pages between different
>> >> >> types of memory.  Is it better for us to avoid overloading these words
>> >> >> too much to avoid the possible confusion?
>> >> >
>> >> > Given that LRU and migration are usually different contexts, I think
>> >> > we'd be fine, unless we want a third pair of terms.
>> >>
>> >> This is true before memory tiering is introduced.  In systems with
>> >> multiple types memory (called memory tiering), LRU is used to identify
>> >> pages to be migrated to the slow memory node.  Please take a look at
>> >> can_demote(), which is called in shrink_page_list().
>> >
>> > This sounds clearly two contexts to me. Promotion/demotion (move
>> > between generations) while pages are on LRU; or promotion/demotion
>> > (migration between nodes) after pages are taken off LRU.
>> >
>> > Note that promotion/demotion are not used in function names. They are
>> > used to describe how MGLRU works, in comparison with the
>> > active/inactive LRU. Memory tiering is not within this context.
>>
>> Because we have used pgdemote_* in /proc/vmstat, "demotion_enabled" in
>> /sys/kernel/mm/numa, and will use pgpromote_* in /proc/vmstat.  It seems
>> better to avoid to use promote/demote directly for MGLRU in ABI.  A
>> possible solution is to use "mglru" and "promote/demote" together (such
>> as "mglru_promote_*" when it is needed?
>
> *If* it is needed. Currently there are no such plans.

OK.

>> >> >> > +static int get_swappiness(struct mem_cgroup *memcg)
>> >> >> > +{
>> >> >> > +	return mem_cgroup_get_nr_swap_pages(memcg) >= MIN_LRU_BATCH ?
>> >> >> > +		mem_cgroup_swappiness(memcg) : 0;
>> >> >> > +}
>> >> >>
>> >> >> After we introduced demotion support in Linux kernel.  The anonymous
>> >> >> pages in the fast memory node could be demoted to the slow memory node
>> >> >> via the page reclaiming mechanism as in the following commit.  Can you
>> >> >> consider that too?
>> >> >
>> >> > Sure.  How do I check whether there is still space on the slow node?
>> >>
>> >> You can always check the watermark of the slow node.  But now, we
>> >> actually don't check that (as in demote_page_list()), instead we will
>> >> wake up kswapd of the slow node.  The intended behavior is something
>> >> like,
>> >>
>> >>   DRAM -> PMEM -> disk
>> >
>> > I'll look into this later -- for now, it's a low priority because
>> > there isn't much demand. I'll bump it up if anybody is interested in
>> > giving it a try. Meanwhile, please feel free to cook up something if
>> > you are interested.
>>
>> When we introduce a new feature, we shouldn't break an existing one.
>> That is, not introducing regression.  I think that it is a rule?
>>
>> If my understanding were correct, MGLRU will ignore to scan anonymous
>> page list even if there's demotion target for the node.  This breaks the
>> demotion feature in the upstream kernel.  Right?
>
> I'm not saying this shouldn't be fixed. I'm saying it's a low priority
> until somebody is interested in using/testing it (or making it work).

We are interested in this feature and can help to test it.

> Regarding regressions, I'm sure MGLRU *will* regress many workloads.
> Its goal is to improve the majority of use cases, i.e., total net
> gain. Trying to improve everything is methodically wrong because the
> problem space is near infinite but the resource is limited. So we have
> to prioritize major use cases over minor ones. The bottom line is
> users have a choice not to use MGLRU.

This is a functionality regression, not a performance regression.
Without demotion support, some workloads will go OOM when DRAM is used
up (while PMEM isn't) if PMEM is onlined in the movable zone (as
recommended).

>> It's a new feature to check whether there is still space on the slow
>> node.  We can look at that later.
>
> SGTM.

Best Regards,
Huang, Ying