linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	ben.widawsky@intel.com, rientjes@google.com, cl@linux.com,
	alex.shi@linux.alibaba.com, dwagner@suse.de, tobin@kernel.org,
	akpm@linux-foundation.org, ying.huang@intel.com,
	dan.j.williams@intel.com, cai@lca.pw, stable@vger.kernel.org
Subject: Re: [RFC][PATCH 01/13] mm/vmscan: restore zone_reclaim_mode ABI
Date: Wed, 10 Feb 2021 10:42:28 +0100	[thread overview]
Message-ID: <20210210094222.GA27173@linux> (raw)
In-Reply-To: <20210126003412.59594AA9@viggo.jf.intel.com>

On Mon, Jan 25, 2021 at 04:34:13PM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl.  Like a good kernel developer, I also went to go update the
> documentation.  I noticed that the bits in the documentation didn't
> match the bits in the #defines.
> 
> The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
> however implicitly checked when checking 'node_reclaim_mode==0'.
> The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
> is fine.
> 
> But, when the bit was removed (bit 0) the _other_ bit locations also
> got changed.  That's not OK because the bit values are documented to
> mean one specific thing and users surely rely on them meaning that one
> thing and not changing from kernel to kernel.  The end result is that
> if someone had a script that did:
> 
> 	sysctl vm.zone_reclaim_mode=1
> 
> This script would have gone from enalbing node reclaim for clean
> unmapped pages to writing out pages during node reclaim after the
> commit in question.  That's not great.
> 
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again.  Update the documentation to
> make it clear that the first bit is ignored.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> Cc: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: "Tobin C. Harding" <tobin@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: osalvador <osalvador@suse.de>
> Cc: stable@vger.kernel.org

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> 
> --
> 
> Changes from v2:
>  * Update description to indicate that bit0 was used for clean
>    unmapped page node reclaim.
> ---
> 
>  b/Documentation/admin-guide/sysctl/vm.rst |   10 +++++-----
>  b/mm/vmscan.c                             |    9 +++++++--
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-01-25 16:23:06.048866718 -0800
> +++ b/Documentation/admin-guide/sysctl/vm.rst	2021-01-25 16:23:06.056866718 -0800
> @@ -978,11 +978,11 @@ that benefit from having their data cach
>  left disabled as the caching effect is likely to be more important than
>  data locality.
>  
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction.  The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction.  The page allocator will take additional actions before
> +allocating off node pages.
>  
>  Allowing zone reclaim to write out pages stops processes that are
>  writing large amounts of data from dirtying pages on other nodes. Zone
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-01-25 16:23:06.052866718 -0800
> +++ b/mm/vmscan.c	2021-01-25 16:23:06.057866718 -0800
> @@ -4086,8 +4086,13 @@ module_init(kswapd_init)
>   */
>  int node_reclaim_mode __read_mostly;
>  
> -#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI.  New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_ZONE  (1<<0)   /* Run shrink_inactive_list on the zone */
> +#define RECLAIM_WRITE (1<<1)   /* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2)   /* Unmap pages during reclaim */
>  
>  /*
>   * Priority for NODE_RECLAIM. This determines the fraction of pages
> _
> 

-- 
Oscar Salvador
SUSE L3


  reply	other threads:[~2021-02-10  9:42 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26  0:34 [RFC][PATCH 00/13] [v5] Migrate Pages in lieu of discard Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 01/13] mm/vmscan: restore zone_reclaim_mode ABI Dave Hansen
2021-02-10  9:42   ` Oscar Salvador [this message]
2021-01-26  0:34 ` [RFC][PATCH 02/13] mm/vmscan: move RECLAIM* bits to uapi header Dave Hansen
2021-02-10  9:44   ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 03/13] mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks Dave Hansen
2021-01-31  1:10   ` David Rientjes
2021-02-10  9:54   ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 04/13] mm/numa: node demotion data structure and lookup Dave Hansen
2021-01-31  1:19   ` David Rientjes
2021-02-01 17:49     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 05/13] mm/numa: automatically generate node migration order Dave Hansen
2021-01-29 20:46   ` Yang Shi
2021-02-01 19:13     ` Dave Hansen
2021-02-02 11:43       ` Oscar Salvador
2021-02-02 17:46       ` Yang Shi
2021-02-03  0:43         ` Dave Hansen
2021-02-04  0:26           ` Yang Shi
2021-01-26  0:34 ` [RFC][PATCH 06/13] mm/migrate: update migration order during on hotplug events Dave Hansen
2021-01-29 20:59   ` Yang Shi
2021-02-02 11:42   ` Oscar Salvador
2021-02-09 23:45     ` Dave Hansen
2021-02-10  8:55       ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 07/13] mm/migrate: make migrate_pages() return nr_succeeded Dave Hansen
2021-01-29 21:04   ` Yang Shi
2021-02-09 23:41     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim Dave Hansen
2021-02-02 11:55   ` Oscar Salvador
2021-02-02 22:45     ` Yang Shi
2021-02-02 22:56       ` Dave Hansen
2021-02-02 18:22   ` Yang Shi
2021-02-02 18:34     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 09/13] mm/vmscan: add page demotion counter Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 10/13] mm/vmscan: add helper for querying ability to age anonymous pages Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap Dave Hansen
2021-02-02 18:56   ` Yang Shi
2021-02-02 21:35     ` Dave Hansen
2021-02-02 22:35       ` Yang Shi
2021-01-26  0:34 ` [RFC][PATCH 12/13] mm/vmscan: never demote for memcg reclaim Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 13/13] mm/migrate: new zone_reclaim_mode to enable reclaim migration Dave Hansen
2021-01-31  1:13 ` [RFC][PATCH 00/13] [v5] Migrate Pages in lieu of discard David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210210094222.GA27173@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=ben.widawsky@intel.com \
    --cc=cai@lca.pw \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwagner@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tobin@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).