MIME-Version: 1.0
References: <20220208081902.3550911-1-yuzhao@google.com> <20220208081902.3550911-5-yuzhao@google.com>
From: Barry Song <21cnbao@gmail.com>
Date: Sun, 13 Mar 2022 17:57:17 +1300
Subject: Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork
To: Yu Zhao
Cc: Johannes Weiner, Andrew Morton, Mel Gorman, Michal Hocko, Andi Kleen,
 Aneesh Kumar, Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe,
 Jesse Barnes, Jonathan Corbet, Linus Torvalds, Matthew Wilcox,
 Michael Larabel, Mike Rapoport, Rik van Riel, Vlastimil Babka, Will Deacon,
 Ying Huang, LAK, Linux Doc Mailing List, LKML, Linux-MM,
 Kernel Page Reclaim v2, x86, Brian Geffon, Jan Alexander Steffens,
 Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne,
 Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai,
 Sofia Trinh
Content-Type: text/plain; charset="UTF-8"

On Sun, Mar 13, 2022 at 10:12 AM Yu Zhao wrote:
>
> On Sat, Mar 12, 2022 at 3:37 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Mar 12, 2022 at 12:45 PM Yu Zhao wrote:
> > >
> > > On Fri, Mar 11, 2022 at 3:16 AM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 15, 2022 at 10:43 PM Yu Zhao wrote:
> > > > >
> > > > > On Thu, Feb 10, 2022 at 03:41:57PM -0500, Johannes Weiner wrote:
> > > > >
> > > > > Thanks for reviewing.
> > > > >
> > > > > > > +static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen)
> > > > > > > +{
> > > > > > > +	unsigned long max_seq = lruvec->lrugen.max_seq;
> > > > > > > +
> > > > > > > +	VM_BUG_ON(gen >= MAX_NR_GENS);
> > > > > > > +
> > > > > > > +	/* see the comment on MIN_NR_GENS */
> > > > > > > +	return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1);
> > > > > > > +}
> > > > > >
> > > > > > I'm still reading the series, so correct me if I'm wrong: the "active"
> > > > > > set is split into two generations for the sole purpose of the
> > > > > > second-chance policy for fresh faults, right?
> > > > >
> > > > > To be precise, the active/inactive notion on top of generations is
> > > > > just for ABI compatibility, e.g., the counters in /proc/vmstat.
> > > > > Otherwise, this function wouldn't be needed.
> > > >
> > > > Hi Yu,
> > > > I am still quite confused as i am seeing both active/inactive and lru_gen.
> > > > eg:
> > > >
> > > > root@ubuntu:~# cat /proc/vmstat | grep active
> > > > nr_zone_inactive_anon 22797
> > > > nr_zone_active_anon 578405
> > > > nr_zone_inactive_file 0
> > > > nr_zone_active_file 4156
> > > > nr_inactive_anon 22800
> > > > nr_active_anon 578574
> > > > nr_inactive_file 0
> > > > nr_active_file 4215
> > >
> > > Yes, this is expected. We have to maintain the ABI, i.e., the
> > > *_active/inactive_* counters.
> > >
> > > > and:
> > > >
> > > > root@ubuntu:~# cat /sys//kernel/debug/lru_gen
> > > >
> > > > ...
> > > > memcg 36 /user.slice/user-0.slice/user@0.service
> > > > node 0
> > > > 20 18820 22 0
> > > > 21 7452 0 0
> > > > 22 7448 0 0
> > > > memcg 33 /user.slice/user-0.slice/user@0.service/app.slice
> > > > node 0
> > > > 0 2171452 0 0
> > > > 1 2171452 0 0
> > > > 2 2171452 0 0
> > > > 3 2171452 0 0
> > > > memcg 37 /user.slice/user-0.slice/session-1.scope
> > > > node 0
> > > > 42 51804 102127 0
> > > > 43 18840 275622 0
> > > > 44 16104 216805 1
> > > >
> > > > Does it mean one page could be in both one of the generations and one
> > > > of the active/inactive lists?
> > >
> > > In terms of the data structure, evictable pages are either on
> > > lruvec->lists or lrugen->lists.
> > >
> > > > Do we have some mapping relationship between active/inactive lists
> > > > with generations?
> > >
> > > For the counters, yes -- pages in max_seq and max_seq-1 are counted as
> > > active, and the rest are inactive.
> > >
> > > > We used to put a faulted file page in inactive, if we access it a
> > > > second time, it can be promoted
> > > > to active. then in recent years, we have also applied this to anon
> > > > pages while kernel adds
> > > > workingset protection for anon pages. so basically both anon and file
> > > > pages go into the inactive
> > > > list for the 1st time, if we access it for the second time, they go to
> > > > the active list. if we don't access
> > > > it any more, they are likely to be reclaimed as they are inactive.
> > > > we do have some special fastpath for code section, executable file
> > > > pages are kept on active list
> > > > as long as they are accessed.
> > >
> > > Yes.
> > >
> > > > so all of the above concerns are actually not that correct?
> > >
> > > They are valid concerns but I don't know any popular workloads that
> > > care about them.
> >
> > Hi Yu,
> > here we can get a workload in Kim's patchset while he added workingset
> > protection for anon pages:
> > https://patchwork.kernel.org/project/linux-mm/cover/1581401993-20041-1-git-send-email-iamjoonsoo.kim@lge.com/
>
> Thanks. I wouldn't call that a workload because it's not a real
> application. By popular workloads, I mean applications that the
> majority of people actually run on phones, in cloud, etc.
>
> > anon pages used to go to active rather than inactive, but kim's patchset
> > moved to use inactive first. then only after the anon page is accessed
> > second time, it can move to active.
>
> Yes. To clarify, the A-bit doesn't really mean the first or second
> access. It can be many accesses each time it's set.
>
> > "In current implementation, newly created or swap-in anonymous page is
> > started on the active list. Growing the active list results in rebalancing
> > active/inactive list so old pages on the active list are demoted to the
> > inactive list. Hence, hot page on the active list isn't protected at all.
> >
> > Following is an example of this situation.
> >
> > Assume that 50 hot pages on active list and system can contain total
> > 100 pages. Numbers denote the number of pages on active/inactive
> > list (active | inactive). (h) stands for hot pages and (uo) stands for
> > used-once pages.
> >
> > 1. 50 hot pages on active list
> > 50(h) | 0
> >
> > 2. workload: 50 newly created (used-once) pages
> > 50(uo) | 50(h)
> >
> > 3. workload: another 50 newly created (used-once) pages
> > 50(uo) | 50(uo), swap-out 50(h)
> >
> > As we can see, hot pages are swapped-out and it would cause swap-in later."
> >
> > Is MGLRU able to avoid the swap-out of the 50 hot pages?
>
> I think the real question is why the 50 hot pages can be moved to the
> inactive list. If they are really hot, the A-bit should protect them.

This is a good question. I guess it is probably because the current LRU
tries to maintain a balance between the sizes of the active and inactive
lists. Thus, it can shrink the active list even though its pages might
still be "hot", just not the most recently accessed ones.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

The old kernel, without anon workingset protection, put the pages of
workload 2 on the active list, which pushed the 50 hot pages from active
to inactive. Workload 3 would then further contribute to evicting the
50 hot pages.

It seems MGLRU doesn't demote pages from the youngest generation to an
older generation merely to balance list sizes? If so, MGLRU is probably
safe in these cases. I will run some of the tests mentioned in Kim's
patchset and report the results to you afterwards.

> > since MGLRU
> > is putting faulted pages to the youngest generation directly, do we have the
> > risk mentioned in Kim's patchset?
>
> There are always risks :) I could imagine a thousand ways to make VM
> suffer, but all of them could be irrelevant to how it actually does in
> production. So a concrete use case of yours would be much appreciated
> for this discussion.

Thanks
Barry