References: <20210304235949.7922C1C3@viggo.jf.intel.com> <20210305000009.EDF902E9@viggo.jf.intel.com>
In-Reply-To: <20210305000009.EDF902E9@viggo.jf.intel.com>
From: Yang Shi
Date: Mon, 8 Mar 2021 16:24:22 -0800
Subject: Re: [PATCH 10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration
To: Dave Hansen
Cc: Linux Kernel
 Mailing List, Linux MM, Yang Shi, David Rientjes, Huang Ying, Dan Williams, David Hildenbrand, Oscar Salvador

On Thu, Mar 4, 2021 at 4:01 PM Dave Hansen wrote:
>
>
> From: Dave Hansen
>
> Some method is obviously needed to enable reclaim-based migration.
>
> Just like traditional autonuma, there will be some workloads that
> will benefit, like workloads with more "static" configurations where
> hot pages stay hot and cold pages stay cold. If pages come and go
> from the hot and cold sets, the benefits of this approach will be
> more limited.
>
> The benefits are truly workload-based and *not* hardware-based.
> We do not believe that there is a viable threshold where certain
> hardware configurations should have this mechanism enabled while
> others do not.
>
> To be conservative, earlier work defaulted to disabling
> reclaim-based migration and did not include a mechanism to enable it.
> This proposes extending the existing "zone_reclaim_mode" (now
> really node_reclaim_mode) as a method to enable it.
>
> We are open to any alternative that allows end users to enable
> this mechanism or disable it if workload harm is detected (just
> like traditional autonuma).
>
> Once this is enabled, page demotion may move data to a NUMA node
> that does not fall into the cpuset of the allocating process.
> This could be construed to violate the guarantees of cpusets.
> However, since this is an opt-in mechanism, the assumption is
> that anyone enabling it is content to relax the guarantees.

I think we'd better have the cpuset violation paragraph alongside the new
zone reclaim mode documentation text so that users are aware of the
potential violation. I don't think the commit log is the go-to place for
ordinary users.

>
> Signed-off-by: Dave Hansen
> Cc: Yang Shi
> Cc: David Rientjes
> Cc: Huang Ying
> Cc: Dan Williams
> Cc: David Hildenbrand
> Cc: osalvador
>
> changes since 20200122:
>  * Changelog material about relaxing cpuset constraints
> ---
>
>  b/Documentation/admin-guide/sysctl/vm.rst |    9 +++++++++
>  b/include/linux/swap.h                    |    3 ++-
>  b/include/uapi/linux/mempolicy.h          |    1 +
>  b/mm/vmscan.c                             |    6 ++++--
>  4 files changed, 16 insertions(+), 3 deletions(-)
>
> diff -puN Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE	2021-03-04 15:36:26.078806355 -0800
> +++ b/Documentation/admin-guide/sysctl/vm.rst	2021-03-04 15:36:26.093806355 -0800
> @@ -976,6 +976,7 @@ This is value OR'ed together of
>  1	Zone reclaim on
>  2	Zone reclaim writes dirty pages out
>  4	Zone reclaim swaps pages
> +8	Zone reclaim migrates pages
>  =	===================================
>
>  zone_reclaim_mode is disabled by default. For file servers or workloads
> @@ -1000,3 +1001,11 @@ of other processes running on other node
>  Allowing regular swap effectively restricts allocations to the local
>  node unless explicitly overridden by memory policies or cpuset
>  configurations.
> +
> +Page migration during reclaim is intended for systems with tiered memory
> +configurations. These systems have multiple types of memory with varied
> +performance characteristics instead of plain NUMA systems where the same
> +kind of memory is found at varied distances. Allowing page migration
> +during reclaim enables these systems to migrate pages from fast tiers to
> +slow tiers when the fast tier is under pressure. This migration is
> +performed before swap.
> diff -puN include/linux/swap.h~RECLAIM_MIGRATE include/linux/swap.h
> --- a/include/linux/swap.h~RECLAIM_MIGRATE	2021-03-04 15:36:26.082806355 -0800
> +++ b/include/linux/swap.h	2021-03-04 15:36:26.093806355 -0800
> @@ -382,7 +382,8 @@ extern int sysctl_min_slab_ratio;
>  static inline bool node_reclaim_enabled(void)
>  {
>  	/* Is any node_reclaim_mode bit set? */
> -	return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
> +	return node_reclaim_mode & (RECLAIM_ZONE |RECLAIM_WRITE|
> +				    RECLAIM_UNMAP|RECLAIM_MIGRATE);
>  }
>
>  extern void check_move_unevictable_pages(struct pagevec *pvec);
> diff -puN include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE include/uapi/linux/mempolicy.h
> --- a/include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE	2021-03-04 15:36:26.084806355 -0800
> +++ b/include/uapi/linux/mempolicy.h	2021-03-04 15:36:26.094806355 -0800
> @@ -69,5 +69,6 @@ enum {
>  #define RECLAIM_ZONE	(1<<0)	/* Run shrink_inactive_list on the zone */
>  #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
>  #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
> +#define RECLAIM_MIGRATE	(1<<3)	/* Migrate to other nodes during reclaim */
>
>  #endif /* _UAPI_LINUX_MEMPOLICY_H */
> diff -puN mm/vmscan.c~RECLAIM_MIGRATE mm/vmscan.c
> --- a/mm/vmscan.c~RECLAIM_MIGRATE	2021-03-04 15:36:26.087806355 -0800
> +++ b/mm/vmscan.c	2021-03-04 15:36:26.096806355 -0800
> @@ -1073,6 +1073,9 @@ static bool migrate_demote_page_ok(struc
>  	VM_BUG_ON_PAGE(PageHuge(page), page);
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
>
> +	if (!(node_reclaim_mode & RECLAIM_MIGRATE))
> +		return false;
> +
>  	/* It is pointless to do demotion in memcg reclaim */
>  	if (cgroup_reclaim(sc))
>  		return false;
> @@ -1082,8 +1085,7 @@ static bool migrate_demote_page_ok(struc
>  	if (PageTransHuge(page) && !thp_migration_supported())
>  		return false;
>
> -	// FIXME: actually enable this later in the series
> -	return false;
> +	return true;
>  }
>
>  /* Check if a page is dirty or under writeback */
> _
>
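
For anyone following the bit arithmetic in the quoted hunks, here is a minimal
userspace sketch of how the mode value composes. It is not kernel code. The
RECLAIM_* values are copied from the quoted mempolicy.h hunk. The helpers
node_reclaim_enabled_for() and demotion_allowed_for() are hypothetical names
of mine: they take the mode as a parameter instead of reading the kernel's
global node_reclaim_mode, and they model only the mode-bit tests, not the
memcg or THP checks in migrate_demote_page_ok().

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Bit values copied from the quoted include/uapi/linux/mempolicy.h hunk. */
#define RECLAIM_ZONE    (1 << 0)  /* Run shrink_inactive_list on the zone */
#define RECLAIM_WRITE   (1 << 1)  /* Writeout pages during reclaim */
#define RECLAIM_UNMAP   (1 << 2)  /* Unmap pages during reclaim */
#define RECLAIM_MIGRATE (1 << 3)  /* Migrate to other nodes during reclaim */

/* The new bit corresponds to the "8" row added to the vm.rst table. */
static_assert(RECLAIM_MIGRATE == 8, "RECLAIM_MIGRATE must be sysctl value 8");

/*
 * Hypothetical helper: same test as the patched node_reclaim_enabled(),
 * but on an explicit mode value rather than the kernel global.
 */
static inline bool node_reclaim_enabled_for(int mode)
{
	return (mode & (RECLAIM_ZONE | RECLAIM_WRITE |
			RECLAIM_UNMAP | RECLAIM_MIGRATE)) != 0;
}

/*
 * Hypothetical helper: only the new mode-bit gate that the patch adds to
 * migrate_demote_page_ok(); the other checks there are left out.
 */
static inline bool demotion_allowed_for(int mode)
{
	return (mode & RECLAIM_MIGRATE) != 0;
}
```

Under these assumptions, a mode of 8 passes both tests, 9 (1|8) combines zone
reclaim with demotion, and 7 keeps node reclaim enabled while leaving demotion
off.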