Date: Tue, 8 Feb 2022 02:16:24 -0700
From: Yu Zhao
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton, Linus Torvalds, Andi Kleen, Catalin Marinas,
    Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes, Johannes Weiner,
    Jonathan Corbet, Matthew Wilcox, Mel Gorman, Michael Larabel,
    Michal Hocko, Rik van Riel, Vlastimil Babka, Will Deacon, Ying Huang,
    LAK, Linux Doc Mailing List, LKML, Linux-MM, page-reclaim@google.com,
    x86
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework
References: <20220104202227.2903605-1-yuzhao@google.com>

On Fri, Jan 28, 2022 at 09:54:09PM +1300, Barry Song wrote:
> On Tue, Jan 25, 2022 at 7:48 PM Yu Zhao wrote:
> >
> > On Sun, Jan 23, 2022 at 06:43:06PM +1300, Barry Song wrote:
> > > On Wed, Jan 5, 2022 at 7:17 PM Yu Zhao wrote:
> > > >
> > >
> > > > Large-scale deployments
> > > > -----------------------
> > > > We've rolled out MGLRU to tens of millions of Chrome OS users and
> > > > about a million Android users. Google's fleetwide profiling [13] shows
> > > > an overall 40% decrease in kswapd CPU usage, in addition to
> > >
> > > Hi Yu,
> > >
> > > Was the overall 40% decrease of kswapd CPU usage seen on x86 or arm64?
> > > And I am curious how much we are taking advantage of NONLEAF_PMD_YOUNG.
> > > Does it help a lot in decreasing the cpu usage?
> >
> > Hi Barry,
> >
> > The fleet-wide profiling data I shared was from x86. For arm64, I only
> > have data from synthetic benchmarks at the moment, and it also shows
> > similar improvements.
> >
> > For Chrome OS (individual users), walk_pte_range(), the function that
> > would benefit from ARCH_HAS_NONLEAF_PMD_YOUNG, only uses a small
> > portion (<4%) of kswapd CPU time. So ARCH_HAS_NONLEAF_PMD_YOUNG isn't
> > that helpful.
>
> Hi Yu,
> Thanks!
>
> In the current kernel, depending on reverse mapping, while memory is
> under pressure,
> the cpu usage of kswapd can be very very high especially while a lot of pages
> have large mapcount, thus a huge reverse mapping cost.

Agreed. I've posted v7, which includes kswapd profiles collected from an
arm64 v8.2 laptop under memory pressure.

> Regarding <4%, I guess the figure came from machines with NONLEAF_PMD_YOUNG?

No, it's from Snapdragon 7c. Please see the kswapd profiles in v7.

> In this case, we can skip many PTE scans while PMD has no accessed bit
> set. But for
> a machine without NONLEAF, will the figure of cpu usage be much larger?

So NONLEAF_PMD_YOUNG can save at most 4% of kswapd CPU usage. But this
can definitely vary, depending on the workload.

> > > If so, this might be
> > > a good proof that arm64 also needs this hardware feature?
> > > In short, I am curious how much the improvement in this patchset depends
> > > on the hardware ability of NONLEAF_PMD_YOUNG.
> >
> > For data centers, I do think ARCH_HAS_NONLEAF_PMD_YOUNG has some value.
> > In addition to cold/hot memory scanning, there are other use cases like
> > dirty tracking, which can benefit from the accessed bit on non-leaf
> > entries. I know some proprietary software uses this capability on x86
> > for different purposes than this patchset does. And AFAIK, x86 is the
> > only arch that supports this capability, e.g., risc-v and ppc can only
> > set the accessed bit in PTEs.
>
> Yep. NONLEAF is a nice feature.
>
> btw, page table should have a separate DIRTY bit, right?

Yes.

> wouldn't dirty page
> tracking depend on the DIRTY bit rather than the accessed bit?

It depends on the goal.

> so x86 also has
> NONLEAF dirty bit?

No.

> Or they are scanning accessed bit of PMD before
> scanning DIRTY bits of PTEs?

A mandatory sync to disk must use the dirty bit to ensure data integrity.
But for a voluntary sync to disk, it can use the accessed bit to narrow
the search of dirty pages. A mandatory sync is used to free specific
dirty pages. A voluntary sync is used to keep the number of dirty pages
low in general and it doesn't target any specific dirty pages.

> > In fact, I've discussed this with one of the arm maintainers Will. So
> > please check with him too if you are interested in moving forward with
> > the idea. I might be able to provide with additional data if you need
> > it to make a decision.
>
> I am interested in running it and have some data without NONLEAF
> especially while free memory is very limited and the system has memory
> thrashing.

The v7 has a switch to disable this feature on x86. If you can run your
workloads on x86, then it might be able to help you measure the
difference.
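
To make the mandatory vs. voluntary sync distinction above concrete, here
is a rough userspace toy model. It is not kernel code and not taken from
the patchset; every name in it (toy_pmd, toy_write, toy_scan, the table
sizes) is made up for this sketch. It assumes a write sets both the dirty
bit in the PTE and the accessed bit in the non-leaf entry above it, and
that an earlier pass may have cleared the accessed bit while the page
stayed dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PMDS      4
#define PTES_PER_PMD 8	/* toy size; a real x86-64 PTE table has 512 entries */

/* Hypothetical toy structure, not the kernel's pmd_t/pte_t. */
struct toy_pmd {
	bool accessed;			/* models the accessed bit in a non-leaf entry */
	bool dirty[PTES_PER_PMD];	/* models the dirty bits in its PTEs */
};

struct scan_stats {
	size_t ptes_scanned;
	size_t dirty_found;
};

/* Model a store: the leaf dirty bit and the non-leaf accessed bit both get set. */
static void toy_write(struct toy_pmd *pmds, size_t pmd, size_t pte)
{
	assert(pmd < NR_PMDS && pte < PTES_PER_PMD);
	pmds[pmd].accessed = true;
	pmds[pmd].dirty[pte] = true;
}

/*
 * Count dirty PTEs. If narrow is true (the voluntary case), skip every PTE
 * table whose PMD has the accessed bit clear. If narrow is false (the
 * mandatory case), check every dirty bit.
 */
static struct scan_stats toy_scan(const struct toy_pmd *pmds, bool narrow)
{
	struct scan_stats st = { 0, 0 };

	for (size_t i = 0; i < NR_PMDS; i++) {
		if (narrow && !pmds[i].accessed)
			continue;	/* whole PTE table skipped */
		for (size_t j = 0; j < PTES_PER_PMD; j++) {
			st.ptes_scanned++;
			if (pmds[i].dirty[j])
				st.dirty_found++;
		}
	}
	return st;
}
```

If two pages under different PMDs are written and one of those PMDs later
has its accessed bit cleared, toy_scan(pmds, true) looks at a single PTE
table and finds only one of the two dirty pages, while toy_scan(pmds,
false) walks all tables and finds both. Missing a dirty page is tolerable
when the goal is only to keep the number of dirty pages low, but not when
specific dirty pages have to be written back.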