From: Yu Zhao
Date: Tue, 26 Apr 2022 19:11:54 -0600
Subject: Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch
To: Andrew Morton
Cc: Tejun Heo, Stephen Rothwell, Linux-MM, Andi Kleen, Aneesh Kumar,
 Barry Song <21cnbao@gmail.com>, Catalin Marinas, Dave Hansen,
 Hillf Danton, Jens Axboe, Jesse Barnes, Johannes Weiner,
 Jonathan Corbet, Linus Torvalds, Matthew Wilcox, Mel Gorman,
 Michael Larabel, Michal Hocko, Mike Rapoport, Rik van Riel,
 Vlastimil Babka, Will Deacon, Ying Huang, Linux ARM,
 "open list:DOCUMENTATION", linux-kernel, Kernel Page Reclaim v2,
 "the arch/x86 maintainers", Brian Geffon, Jan Alexander Steffens,
 Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne,
 Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai,
 Sofia Trinh, Vaibhav Jain

On Tue, Apr 26, 2022 at 4:22 PM Andrew Morton wrote:
>
> On Tue, 26 Apr 2022 14:57:15 -0600 Yu Zhao wrote:
>
> > On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton wrote:
> > >
> > > On Wed, 6 Apr 2022 21:15:22 -0600 Yu Zhao wrote:
> > >
> > > > Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> > > > can be disabled include:
> > > >   0x0001: the multi-gen LRU core
> > > >   0x0002: walking page table, when arch_has_hw_pte_young() returns
> > > >           true
> > > >   0x0004: clearing the accessed bit in non-leaf PMD entries, when
> > > >           CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
> > > >   [yYnN]: apply to all the components above
> > > > E.g.,
> > > >   echo y >/sys/kernel/mm/lru_gen/enabled
> > > >   cat /sys/kernel/mm/lru_gen/enabled
> > > >   0x0007
> > > >   echo 5 >/sys/kernel/mm/lru_gen/enabled
> > > >   cat /sys/kernel/mm/lru_gen/enabled
> > > >   0x0005
> > >
> > > I'm shocked that this actually works. How does it work? Existing
> > > pages & folios are drained over time or synchronously?
> >
> > Basically we have a double-throw way, and once flipped, new (isolated)
> > pages can only be added to the lists of the current implementation.
> > Existing pages on the lists of the previous implementation are
> > synchronously drained (isolated and then re-added), with
> > cond_resched() of course.
> >
> > > Supporting
> > > structures remain allocated, available for reenablement?
> >
> > Correct.
> >
> > > Why is it thought necessary to have this? Is it expected to be
> > > permanent?
> >
> > This is almost a must for large scale deployments/experiments.
> >
> > For deployments, we need to keep fix rollout (high priority) and
> > feature enabling (low priority) separate. Rolling out multiple
> > binaries works but will make the process slower and more painful. So
> > generally for each release, there is only one binary to roll out, and
> > unless it's impossible, new features are disabled by default. Once a
> > rollout completes, i.e., reaches enough population and remains stable,
> > new features are turned on gradually. If something goes wrong with a
> > new feature, we turn off that feature rather than roll back the
> > kernel.
> >
> > Similarly, for A/B experiments, we don't want to use two binaries.
>
> Please let's spell out this sort of high-level thinking in the
> changelogging.

Will do.

> From what you're saying, this is a transient thing. It sounds that
> this enablement is only needed when mglru is at an early stage. Once
> it has matured more then successive rollouts will have essentially the
> same mglru implementation and being able to disable mglru at runtime
> will no longer be required?

I certainly hope so. But realistically this switch is here to stay,
just like anything else added after careful planning or on a whim.

> I guess the capability is reasonably simple/small and is livable with,
> but does it have a long-term future?

I see it as a necessary evil.

> I mean, when organizations such as google start adopting the mglru
> implementation which is present in Linus's tree we're, what, a year or
> more into the future? Will they still need a kill switch then?

There are two distinct possibilities:
1. Naturally the number of caps would grow. Old caps that have been
   proven keep the same values. New caps need to be flipped on/off
   for deployments/experiments.
2. The worst-case scenario: this file becomes something like
   /sys/kernel/mm/transparent_hugepage/enabled. For different
   workloads, it's set to different values. Otherwise we'd have to
   build multiple kernel binaries.
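
For readers following the thread, the capability mask quoted from the changelog above can be illustrated with a small userspace sketch. This is not the kernel's code: the constant and function names are hypothetical, and only the three bit values, the [yYnN] shorthand, and the 0x%04x-style output come from the quoted text. The sketch also assumes every component is available, whereas the changelog says 0x0002 and 0x0004 depend on arch_has_hw_pte_young() and CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.

```python
# Hypothetical names; only the bit values are taken from the quoted changelog.
CAP_CORE = 0x0001           # the multi-gen LRU core
CAP_MM_WALK = 0x0002        # walking page tables
CAP_NONLEAF_YOUNG = 0x0004  # clearing the accessed bit in non-leaf PMD entries
ALL_CAPS = CAP_CORE | CAP_MM_WALK | CAP_NONLEAF_YOUNG

CAP_NAMES = {
    CAP_CORE: "core",
    CAP_MM_WALK: "mm_walk",
    CAP_NONLEAF_YOUNG: "nonleaf_young",
}


def parse_enabled(text: str) -> int:
    """Map a string a user might echo into the file to a bitmask.

    Assumption of this sketch: bits outside the three known
    components are silently dropped.
    """
    text = text.strip()
    if text in ("y", "Y"):
        return ALL_CAPS
    if text in ("n", "N"):
        return 0
    # Base 0 accepts both decimal ("5") and hex ("0x0005") spellings.
    return int(text, 0) & ALL_CAPS


def format_enabled(mask: int) -> str:
    """Render a mask in the same style as the quoted cat output."""
    return f"0x{mask:04x}"


def describe(mask: int) -> list[str]:
    """List the component names whose bits are set in the mask."""
    return [name for bit, name in CAP_NAMES.items() if mask & bit]


if __name__ == "__main__":
    for written in ("y", "5", "n"):
        mask = parse_enabled(written)
        print(written, "->", format_enabled(mask), describe(mask))
```

Under these assumptions, writing y maps to 0x0007 and writing 5 maps to 0x0005, matching the two examples in the changelog. In the "caps would grow" scenario, a new component would simply take the next free bit.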