References: <20220309021230.721028-1-yuzhao@google.com>
 <20220309021230.721028-7-yuzhao@google.com>
 <87wnguwif3.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87wnguwif3.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Yu Zhao
Date: Wed, 16 Mar 2022 01:54:41 -0600
Subject: Re: [PATCH v9 06/14] mm: multi-gen LRU: minimal implementation
To: "Huang, Ying"
Cc: Andrew Morton, Linus Torvalds, Andi Kleen, Aneesh Kumar,
 Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes,
 Johannes Weiner, Jonathan Corbet, Matthew Wilcox, Mel Gorman,
 Michael Larabel,
 Michal Hocko, Mike Rapoport, Rik van Riel, Vlastimil Babka, Will Deacon,
 Linux ARM, "open list:DOCUMENTATION", linux-kernel, Linux-MM,
 Kernel Page Reclaim v2, "the arch/x86 maintainers", Brian Geffon,
 Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
 Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
 Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain

On Tue, Mar 15, 2022 at 11:55 PM Huang, Ying wrote:
>
> Hi, Yu,
>
> Yu Zhao writes:
>
> [snip]
>
> >
> > +static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
> > +{
> > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> > +
> > +	if (!can_demote(pgdat->node_id, sc) &&
> > +	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
> > +		return 0;
> > +
> > +	return mem_cgroup_swappiness(memcg);
> > +}
> > +
>
> We have tested v9 for memory tiering system, the demotion works now even
> without swap devices configured. Thanks!

Admittedly I didn't test it :) So thanks for testing -- I'm glad to
hear it didn't fall apart.

> And we found that the demotion (page reclaiming on DRAM nodes) speed is
> lower than the original implementation.

This sounds like an improvement to me, assuming the initial hot/cold
memory placements were similar for both the baseline and MGLRU.
Correct me if I'm wrong: since demotion is driven by promotion, lower
demotion speed means hot and cold pages were sorted out to DRAM and AEP
at a faster speed, hence an improvement.

                        MGLRU disabled  MGLRU enabled
  # promotion path:
  numa_hint_faults           498301236      494583387
  numa_pages_migrated        152650705       34165992

  # demotion path:
  pgsteal_anon               153798203       32701576
  pgsteal_file                      33             33

The hint faults are similar, but MGLRU migrated far fewer pages -- my
guess is it demoted far fewer hot/warm pages and therefore led to less
work on the promotion path.

> The workload itself is just a
> memory accessing micro-benchmark with Gauss distribution. It is run on
> a system with DRAM and PMEM. Initially, quite some hot pages are placed
> in PMEM and quite some cold pages are placed in DRAM. Then the page
> placement optimizing mechanism based on NUMA balancing will try to
> promote some hot pages from PMEM node to DRAM node.

My understanding seems to be correct?

> If the DRAM node
> near full (reach high watermark), kswapd of the DRAM node will be woke
> up to demote (reclaim) some cold DRAM pages to PMEM. Because quite some
> pages on DRAM is very cold (not accessed for at least several seconds),
> the benchmark performance will be better if demotion speed is faster.

I'm confused. It seems to me demotion speed is irrelevant. The time to
reach the equilibrium is what we want to measure.

> Some data comes from /proc/vmstat and perf-profile is as follows.
>
> From /proc/vmstat, it seems that the page scanned and page demoted is
> much less with MGLRU enabled. The pgdemote_kswapd / pgscan_kswapd is
> 5.22 times higher with MGLRU enabled than that with MGLRU disabled. I
> think this shows the value of direct page table scanning.

Can't disagree :)

> From perf-profile, the CPU cycles for kswapd is same. But less pages
> are demoted (reclaimed) with MGLRU.
> And it appears that the total page
> table scanning time of MGLRU is longer if we compare walk_page_range
> (1.97%, MGLRU enabled) and page_referenced (0.54%, MGLRU disabled)?

It's possible if the address space is very large and sparse. But once
MGLRU warms up, it should detect it and fall back to page_referenced().

> Because we only demote (reclaim) from DRAM nodes, but not demote
> (reclaim) from PMEM nodes and bloom filter doesn't work well enough?

The bloom filters are per lruvec. So this should affect them.

> One thing that may be not friendly for bloom filter is that some virtual
> pages may change their resident nodes because of demotion/promotion.

Yes, it's possible.

> Can you teach me to how interpret these data for MGLRU? Or can you
> point me to the other/better data for MGLRU?

You are the expert :)

My current understanding is that this is an improvement. IOW, with
MGLRU, DRAM (hot) <-> AEP (cold) reached equilibrium a lot faster.

> MGLRU disabled via: echo -n 0 > /sys/kernel/mm/lru_gen/enabled
> --------------------------------------------------------------
>
> /proc/vmstat:
>
> pgactivate 1767172340
> pgdeactivate 1740111896
> pglazyfree 0
> pgfault 583875828
> pgmajfault 0
> pglazyfreed 0
> pgrefill 1740111896
> pgreuse 22626572
> pgsteal_kswapd 153796237
> pgsteal_direct 1999
> pgdemote_kswapd 153796237
> pgdemote_direct 1999
> pgscan_kswapd 2055504891
> pgscan_direct 1999
> pgscan_direct_throttle 0
> pgscan_anon 2055356614
> pgscan_file 150276
> pgsteal_anon 153798203
> pgsteal_file 33
> zone_reclaim_failed 0
> pginodesteal 0
> slabs_scanned 82761
> kswapd_inodesteal 0
> kswapd_low_wmark_hit_quickly 2960
> kswapd_high_wmark_hit_quickly 17732
> pageoutrun 21583
> pgrotated 0
> drop_pagecache 0
> drop_slab 0
> oom_kill 0
> numa_pte_updates 515994024
> numa_huge_pte_updates 154
> numa_hint_faults 498301236
> numa_hint_faults_local 121109067
> numa_pages_migrated 152650705
> pgmigrate_success 307213704
> pgmigrate_fail 39
> thp_migration_success 93
> thp_migration_fail 0
> thp_migration_split 0
>
> perf-profile:
>
> kswapd.kthread.ret_from_fork: 2.86
> balance_pgdat.kswapd.kthread.ret_from_fork: 2.86
> shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork: 2.85
> shrink_lruvec.shrink_node.balance_pgdat.kswapd.kthread: 2.76
> shrink_inactive_list.shrink_lruvec.shrink_node.balance_pgdat.kswapd: 1.9
> shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node.balance_pgdat: 1.52
> shrink_active_list.shrink_lruvec.shrink_node.balance_pgdat.kswapd: 0.85
> migrate_pages.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node: 0.79
> page_referenced.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node: 0.54
>
>
> MGLRU enabled via: echo -n 7 > /sys/kernel/mm/lru_gen/enabled
> -------------------------------------------------------------
>
> /proc/vmstat:
>
> pgactivate 47212585
> pgdeactivate 0
> pglazyfree 0
> pgfault 580056521
> pgmajfault 0
> pglazyfreed 0
> pgrefill 6911868880
> pgreuse 25108929
> pgsteal_kswapd 32701609
> pgsteal_direct 0
> pgdemote_kswapd 32701609
> pgdemote_direct 0
> pgscan_kswapd 83582770
> pgscan_direct 0
> pgscan_direct_throttle 0
> pgscan_anon 83549777
> pgscan_file 32993
> pgsteal_anon 32701576
> pgsteal_file 33
> zone_reclaim_failed 0
> pginodesteal 0
> slabs_scanned 84829
> kswapd_inodesteal 0
> kswapd_low_wmark_hit_quickly 313
> kswapd_high_wmark_hit_quickly 5262
> pageoutrun 5895
> pgrotated 0
> drop_pagecache 0
> drop_slab 0
> oom_kill 0
> numa_pte_updates 512084786
> numa_huge_pte_updates 198
> numa_hint_faults 494583387
> numa_hint_faults_local 129411334
> numa_pages_migrated 34165992
> pgmigrate_success 67833977
> pgmigrate_fail 7
> thp_migration_success 135
> thp_migration_fail 0
> thp_migration_split 0
>
> perf-profile:
>
> kswapd.kthread.ret_from_fork: 2.86
> balance_pgdat.kswapd.kthread.ret_from_fork: 2.86
> lru_gen_age_node.balance_pgdat.kswapd.kthread.ret_from_fork: 1.97
> walk_page_range.try_to_inc_max_seq.lru_gen_age_node.balance_pgdat.kswapd: 1.97
> shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork: 0.89
> evict_folios.lru_gen_shrink_lruvec.shrink_lruvec.shrink_node.balance_pgdat: 0.89
> scan_folios.evict_folios.lru_gen_shrink_lruvec.shrink_lruvec.shrink_node: 0.66
>
> Best Regards,
> Huang, Ying
>
> [snip]
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel