Date: Mon, 10 Jan 2022 11:54:42 +0100
From: Michal Hocko
To: Yu Zhao
Cc: Andrew Morton, Linus Torvalds, Andi Kleen, Catalin Marinas,
    Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes, Johannes Weiner,
    Jonathan Corbet, Matthew Wilcox, Mel Gorman, Michael Larabel,
    Rik van Riel, Vlastimil Babka, Will Deacon, Ying Huang,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    page-reclaim@google.com, x86@kernel.org, Konstantin Kharlamov
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging
References: <20220104202227.2903605-1-yuzhao@google.com>
    <20220104202227.2903605-7-yuzhao@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun 09-01-22 21:47:57, Yu Zhao wrote:
> On Fri, Jan 07, 2022 at 03:44:50PM +0100, Michal Hocko wrote:
> > On Tue 04-01-22 13:22:25, Yu Zhao wrote:
> > [...]
> > > +static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> > > +{
> > > +	static const struct mm_walk_ops mm_walk_ops = {
> > > +		.test_walk = should_skip_vma,
> > > +		.p4d_entry = walk_pud_range,
> > > +	};
> > > +
> > > +	int err;
> > > +#ifdef CONFIG_MEMCG
> > > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > > +#endif
> > > +
> > > +	walk->next_addr = FIRST_USER_ADDRESS;
> > > +
> > > +	do {
> > > +		unsigned long start = walk->next_addr;
> > > +		unsigned long end = mm->highest_vm_end;
> > > +
> > > +		err = -EBUSY;
> > > +
> > > +		rcu_read_lock();
> > > +#ifdef CONFIG_MEMCG
> > > +		if (memcg && atomic_read(&memcg->moving_account))
> > > +			goto contended;
> > > +#endif
> > > +		if (!mmap_read_trylock(mm))
> > > +			goto contended;
> >
> > Have you evaluated the behavior under mmap_sem contention? I mean what
> > would be an effect of some mms being excluded from the walk? This path
> > is called from direct reclaim and we do allocate with exclusive mmap_sem
> > IIRC and the trylock can fail in a presence of pending writer if I am
> > not mistaken so even the read lock holder (e.g. an allocation from the #PF)
> > can bypass the walk.
>
> You are right. Here it must be a trylock; otherwise it can deadlock.

Yeah, this is clear.

> I think there might be a misunderstanding: the aging doesn't
> exclusively rely on page table walks to gather the accessed bit. It
> prefers page table walks but it can also fallback to the rmap-based
> function, i.e., lru_gen_look_around(), which only gathers the accessed
> bit from at most 64 PTEs and therefore is less efficient. But it still
> retains about 80% of the performance gains.

I have to say that I really have a hard time understanding the runtime
behavior that results from that interaction. How does the reclaim behave
when the virtual scan is enabled, partially enabled and almost
completely disabled due to different constraints? I do not see any such
evaluation described in the changelogs and I consider this to be rather
important information for judging the overall behavior.

> > Or is this considered statistically insignificant thus a theoretical
> > problem?
>
> Yes. People who work on the maple tree and SPF at Google expressed the
> same concern during the design review meeting (all stakeholders on the
> mailing list were also invited). So we had a counter to monitor the
> contention in previous versions, i.e., MM_LOCK_CONTENTION in v4 here:
> https://lore.kernel.org/lkml/20210818063107.2696454-8-yuzhao@google.com/
>
> And we also combined this patchset with the SPF patchset to see if the
> latter makes any difference. Our conclusion was the contention is
> statistically insignificant to the performance under memory pressure.
>
> This can be explained by how often we create a new generation. (We
> only walk page tables when we create a new generation. And it's
> similar to the low inactive condition for the active/inactive lru.)
>
> Usually we only do so every few seconds. We'd run into problems with
> other parts of the kernel, e.g., lru lock contention, i/o congestion,
> etc. if we create more than a few generation every second.

This would be very good information to have in the changelogs, ideally
with some numbers and analysis.
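
For readers following the thread, the pattern under discussion (try the
lock, skip the page table walk on contention, count the miss, and fall
back to a bounded scan) can be sketched in userspace C. This is only an
illustration under stated assumptions, not the patch's code: toy_rwlock,
toy_mm, toy_walk_mm() and toy_age() are hypothetical names, the lock is
a toy built on C11 atomics, and the "walk" and "fallback" merely bump
counters. Unlike the kernel's mmap_read_trylock(), this toy trylock
fails only while a writer actually holds the lock, not when one is
merely pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reader/writer lock: -1 = held by a writer, N >= 0 = N readers. */
struct toy_rwlock {
	atomic_int state;
};

static bool toy_read_trylock(struct toy_rwlock *l)
{
	int s = atomic_load(&l->state);

	while (s >= 0) {
		/* On failure, s is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	}
	return false;	/* a writer holds the lock; never wait */
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	int prev = atomic_fetch_sub(&l->state, 1);

	assert(prev > 0);	/* must have been read-locked */
	(void)prev;
}

static bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	atomic_store(&l->state, 0);
}

/* Hypothetical stand-in for an mm plus illustrative counters. */
struct toy_mm {
	struct toy_rwlock mmap_lock;
	unsigned long walked;		/* full walks completed */
	unsigned long contended;	/* trylock failures */
	unsigned long fallbacks;	/* bounded fallback scans */
};

static void toy_mm_init(struct toy_mm *mm)
{
	atomic_init(&mm->mmap_lock.state, 0);
	mm->walked = mm->contended = mm->fallbacks = 0;
}

/* Returns true if the full walk ran, false if it was skipped. */
static bool toy_walk_mm(struct toy_mm *mm)
{
	if (!toy_read_trylock(&mm->mmap_lock)) {
		mm->contended++;
		return false;
	}
	mm->walked++;	/* a real walk would scan page tables here */
	toy_read_unlock(&mm->mmap_lock);
	return true;
}

/* One aging pass: prefer the full walk, else use the bounded fallback. */
static void toy_age(struct toy_mm *mm)
{
	if (!toy_walk_mm(mm))
		mm->fallbacks++;
}
```

Comparing contended against walked over a run is the kind of number
that a counter such as MM_LOCK_CONTENTION would provide, and it shows
how often aging had to rely on the fallback alone.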
-- 
Michal Hocko
SUSE Labs