From: Barry Song <21cnbao@gmail.com>
Date: Mon, 18 Apr 2022 21:58:26 +1200
Subject: Re: [PATCH v10 06/14] mm: multi-gen LRU: minimal implementation
To: Yu Zhao
Cc: Stephen Rothwell, Linux-MM, Andi Kleen, Andrew Morton, Aneesh Kumar,
 Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes,
 Johannes Weiner, Jonathan Corbet, Linus Torvalds, Matthew Wilcox,
 Mel Gorman, Michael Larabel, Michal Hocko, Mike Rapoport, Rik van Riel,
 Vlastimil Babka, Will Deacon, Ying Huang, LAK, Linux Doc Mailing List,
 LKML, Kernel Page Reclaim v2, x86, Brian Geffon, Jan Alexander Steffens,
 Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne,
 Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai,
 Sofia Trinh, Vaibhav Jain
References: <20220407031525.2368067-1-yuzhao@google.com>
 <20220407031525.2368067-7-yuzhao@google.com>
In-Reply-To: <20220407031525.2368067-7-yuzhao@google.com>

On Thu, Apr 7, 2022 at 3:16 PM Yu Zhao wrote:
>
> To avoid confusion, the terms "promotion" and "demotion" will be
> applied to the multi-gen LRU, as a new convention; the terms
> "activation" and "deactivation" will be applied to the active/inactive
> LRU, as usual.
>
> The aging produces young generations. Given an lruvec, it increments
> max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
> promotes hot pages to the youngest generation when it finds them
> accessed through page tables; the demotion of cold pages happens
> consequently when it increments max_seq. The aging has the complexity
> O(nr_hot_pages), since it is only interested in hot pages. Promotion
> in the aging path does not require any LRU list operations, only the
> updates of the gen counter and lrugen->nr_pages[]; demotion, unless as
> the result of the increment of max_seq, requires LRU list operations,
> e.g., lru_deactivate_fn().
>
> The eviction consumes old generations. Given an lruvec, it increments
> min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A
> feedback loop modeled after the PID controller monitors refaults over
> anon and file types and decides which type to evict when both types
> are available from the same generation.
>
> Each generation is divided into multiple tiers.
> Tiers represent
> different ranges of numbers of accesses through file descriptors. A
> page accessed N times through file descriptors is in tier
> order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
> bits in folio->flags. In contrast to moving across generations, which
> requires the LRU lock, moving across tiers only involves operations on
> folio->flags. The feedback loop also monitors refaults over all tiers
> and decides when to protect pages in which tiers (N>1), using the
> first tier (N=0,1) as a baseline. The first tier contains single-use
> unmapped clean pages, which are most likely the best choices. The
> eviction moves a page to the next generation, i.e., min_seq+1, if the
> feedback loop decides so. This approach has the following advantages:
> 1. It removes the cost of activation in the buffered access path by
>    inferring whether pages accessed multiple times through file
>    descriptors are statistically hot and thus worth protecting in the
>    eviction path.
> 2. It takes pages accessed through page tables into account and avoids
>    overprotecting pages accessed multiple times through file
>    descriptors. (Pages accessed through page tables are in the first
>    tier, since N=0.)
> 3. More tiers provide better protection for pages accessed more than
>    twice through file descriptors, when under heavy buffered I/O
>    workloads.
>

Hi Yu,

As I told you before, I tried changing the current LRU (not MGLRU) so
that unmapped file pages are promoted only to the head of the inactive
list, rather than to the head of the active list, on their second
access:
https://lore.kernel.org/lkml/CAGsJ_4y=TkCGoWWtWSAptW4RDFUEBeYXwfwu=fUFvV4Sa4VA4A@mail.gmail.com/

In testing, I have already seen very good results: the CPU consumption
of kswapd and direct reclaim decreased.

In MGLRU, it seems "twice" isn't a concern at all: an unmapped file page
accessed twice is not much different from one accessed once, since you
only begin to increase refs from the third access:

+static void folio_inc_refs(struct folio *folio)
+{
+	unsigned long refs;
+	unsigned long old_flags, new_flags;
+
+	if (folio_test_unevictable(folio))
+		return;
+
+	/* see the comment on MAX_NR_TIERS */
+	do {
+		new_flags = old_flags = READ_ONCE(folio->flags);
+
+		if (!(new_flags & BIT(PG_referenced))) {
+			new_flags |= BIT(PG_referenced);
+			continue;
+		}
+
+		if (!(new_flags & BIT(PG_workingset))) {
+			new_flags |= BIT(PG_workingset);
+			continue;
+		}
+
+		refs = new_flags & LRU_REFS_MASK;
+		refs = min(refs + BIT(LRU_REFS_PGOFF), LRU_REFS_MASK);
+
+		new_flags &= ~LRU_REFS_MASK;
+		new_flags |= refs;
+	} while (new_flags != old_flags &&
+		 cmpxchg(&folio->flags, old_flags, new_flags) != old_flags);
+}

So my question is: what makes you so confident that a second access
doesn't need any special treatment, while the vanilla kernel promotes
this kind of page to the head of the active list instead? I am asking
because I am considering reclaiming unmapped file pages that have been
accessed only twice when they reach the tail of the inactive list.

Thanks
Barry