From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D552C433F5 for ; Tue, 12 Apr 2022 02:16:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245077AbiDLCTA (ORCPT ); Mon, 11 Apr 2022 22:19:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244844AbiDLCSs (ORCPT ); Mon, 11 Apr 2022 22:18:48 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2C4733A1B; Mon, 11 Apr 2022 19:16:32 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6FA236168C; Tue, 12 Apr 2022 02:16:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5B932C385A3; Tue, 12 Apr 2022 02:16:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1649729791; bh=hxuKFNGBeqS7vQiUwSPiIBI/AqNtviIZC+add1y1ko8=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=mopX5Sy7AFPqL3cDVB4gG3iNilwP9Ot+R6JYewM4MOcsc6K2vGgX3GoToffvY9nwC LhzNDzD+Zczh3PgOiZj3/YGSl79ke2DibNlXBhvqkDKYGNN5fiUsbbTuRyc9Geqq1U thiQiWMkVd6JWhndwJJT4g769mJp5YGtwyCk+dAY= Date: Mon, 11 Apr 2022 19:16:27 -0700 From: Andrew Morton To: Yu Zhao Cc: Stephen Rothwell , linux-mm@kvack.org, Andi Kleen , Aneesh Kumar , Barry Song <21cnbao@gmail.com>, Catalin Marinas , Dave Hansen , Hillf Danton , Jens Axboe , Jesse Barnes , Johannes Weiner , Jonathan Corbet , Linus Torvalds , Matthew Wilcox , Mel Gorman , Michael Larabel , Michal Hocko , Mike Rapoport , Rik van Riel 
, Vlastimil Babka , Will Deacon , Ying Huang ,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, page-reclaim@google.com, x86@kernel.org,
 Brian Geffon , Jan Alexander Steffens , Oleksandr Natalenko ,
 Steven Barrett , Suleiman Souhlal , Daniel Byrne , Donald Carr ,
 "Holger Hoffstätte" , Konstantin Kharlamov , Shuang Zhai ,
 Sofia Trinh , Vaibhav Jain
Subject: Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch
Message-Id: <20220411191627.629f21de83cd0a520ef4a142@linux-foundation.org>
In-Reply-To: <20220407031525.2368067-11-yuzhao@google.com>
References: <20220407031525.2368067-1-yuzhao@google.com>
 <20220407031525.2368067-11-yuzhao@google.com>
X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 6 Apr 2022 21:15:22 -0600 Yu Zhao wrote:

> Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> can be disabled include:
>   0x0001: the multi-gen LRU core
>   0x0002: walking page table, when arch_has_hw_pte_young() returns true
>   0x0004: clearing the accessed bit in non-leaf PMD entries, when
>           CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
>   [yYnN]: apply to all the components above
> E.g.,
>   echo y >/sys/kernel/mm/lru_gen/enabled
>   cat /sys/kernel/mm/lru_gen/enabled
>   0x0007
>   echo 5 >/sys/kernel/mm/lru_gen/enabled
>   cat /sys/kernel/mm/lru_gen/enabled
>   0x0005

I'm shocked that this actually works. How does it work? Are existing
pages & folios drained over time or synchronously? Do supporting
structures remain allocated, available for re-enablement?

Why is it thought necessary to have this? Is it expected to be
permanent?

> NB: the page table walks happen on the scale of seconds under heavy
> memory pressure, in which case the mmap_lock contention is a lesser
> concern, compared with the LRU lock contention and the I/O congestion.
> So far the only well-known case of the mmap_lock contention happens on
> Android, due to Scudo [1] which allocates several thousand VMAs for
> merely a few hundred MBs. The SPF and the Maple Tree also have
> provided their own assessments [2][3]. However, if walking page tables
> does worsen the mmap_lock contention, the kill switch can be used to
> disable it. In this case the multi-gen LRU will suffer a minor
> performance degradation, as shown previously.
>
> Clearing the accessed bit in non-leaf PMD entries can also be
> disabled, since this behavior was not tested on x86 varieties other
> than Intel and AMD.
>
> ...
>
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -432,6 +432,18 @@ static inline void cgroup_put(struct cgroup *cgrp)
>  	css_put(&cgrp->self);
>  }
>
> +extern struct mutex cgroup_mutex;
> +
> +static inline void cgroup_lock(void)
> +{
> +	mutex_lock(&cgroup_mutex);
> +}
> +
> +static inline void cgroup_unlock(void)
> +{
> +	mutex_unlock(&cgroup_mutex);
> +}

It's a tad rude to export cgroup_mutex like this without (apparently)
informing its owner (Tejun). And if we're going to wrap its operations
via helper functions then

- presumably all cgroup_mutex operations should be wrapped and

- existing open-coded operations on this mutex should be converted.

>
> ...
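[Not part of the patch or the review -- a shell sketch of the bitmask
semantics described in the quoted patch text, for readers following
along. The `decode_caps` helper is made up for illustration; the
capability values 0x0001/0x0002/0x0004 are the ones listed in the
patch description.]

```shell
# Decode an lru_gen/enabled value into the components listed in the
# patch description: 0x0001 = MGLRU core, 0x0002 = page-table walking,
# 0x0004 = non-leaf PMD accessed-bit clearing.
decode_caps() {
    caps=$(( $1 ))
    [ $(( caps & 0x0001 )) -ne 0 ] && echo "multi-gen LRU core: on"
    [ $(( caps & 0x0002 )) -ne 0 ] && echo "page-table walking: on"
    [ $(( caps & 0x0004 )) -ne 0 ] && echo "non-leaf PMD clearing: on"
    return 0
}

decode_caps 0x0007   # all three components enabled
decode_caps 0x0005   # the 'echo 5' case: page-table walking disabled
```

On a kernel carrying this patch, the value to decode would come from
`cat /sys/kernel/mm/lru_gen/enabled`.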
>
> +static bool drain_evictable(struct lruvec *lruvec)
> +{
> +	int gen, type, zone;
> +	int remaining = MAX_LRU_BATCH;
> +
> +	for_each_gen_type_zone(gen, type, zone) {
> +		struct list_head *head = &lruvec->lrugen.lists[gen][type][zone];
> +
> +		while (!list_empty(head)) {
> +			bool success;
> +			struct folio *folio = lru_to_folio(head);
> +
> +			VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
> +			VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
> +			VM_BUG_ON_FOLIO(folio_is_file_lru(folio) != type, folio);
> +			VM_BUG_ON_FOLIO(folio_zonenum(folio) != zone, folio);

So many new BUG_ONs to upset Linus :(

> +			success = lru_gen_del_folio(lruvec, folio, false);
> +			VM_BUG_ON(!success);
> +			lruvec_add_folio(lruvec, folio);
> +
> +			if (!--remaining)
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
>
> ...
>
> +static ssize_t store_enable(struct kobject *kobj, struct kobj_attribute *attr,
> +			    const char *buf, size_t len)
> +{
> +	int i;
> +	unsigned int caps;
> +
> +	if (tolower(*buf) == 'n')
> +		caps = 0;
> +	else if (tolower(*buf) == 'y')
> +		caps = -1;
> +	else if (kstrtouint(buf, 0, &caps))
> +		return -EINVAL;

See kstrtobool()

> +	for (i = 0; i < NR_LRU_GEN_CAPS; i++) {
> +		bool enable = caps & BIT(i);
> +
> +		if (i == LRU_GEN_CORE)
> +			lru_gen_change_state(enable);
> +		else if (enable)
> +			static_branch_enable(&lru_gen_caps[i]);
> +		else
> +			static_branch_disable(&lru_gen_caps[i]);
> +	}
> +
> +	return len;
> +}
>
> ...
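[Not part of the thread -- a hedged userspace sketch of the parse step
in the quoted store_enable(). `parse_caps` is a made-up name, and
strtoul() stands in for kstrtouint(), which has no userspace
counterpart. Note that kstrtobool() accepts the y/n spellings but not
a numeric bitmask, so it could only replace the first half of this
logic.]

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model of the parse step in the quoted store_enable():
 * 'y'/'Y' enables every capability, 'n'/'N' disables them all, and
 * anything else is parsed as an unsigned bitmask, the way
 * kstrtouint(buf, 0, &caps) parses it in the kernel.
 */
static int parse_caps(const char *buf, unsigned int *caps)
{
	char *end;
	unsigned long val;

	if (tolower((unsigned char)*buf) == 'n') {
		*caps = 0;
		return 0;
	}
	if (tolower((unsigned char)*buf) == 'y') {
		*caps = ~0u;	/* all capability bits set */
		return 0;
	}

	errno = 0;
	val = strtoul(buf, &end, 0);	/* base 0: accepts 0x..., octal, decimal */
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	*caps = (unsigned int)val;
	return 0;
}
```

With this model, writing "5" yields caps 0x0005 (core plus non-leaf
PMD clearing), matching the sysfs example in the patch description.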