stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI
       [not found] <20200701152621.D520E62B@viggo.jf.intel.com>
@ 2020-07-01 15:26 ` Dave Hansen
  2020-07-01 20:03   ` David Rientjes
  2020-07-02 11:28   ` Huang, Ying
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Hansen @ 2020-07-01 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, ben.widawsky, alex.shi, dwagner, tobin,
	cl, akpm, ying.huang, dan.j.williams, cai, stable


From: Dave Hansen <dave.hansen@linux.intel.com>

I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl.  Like a good kernel developer, I also went to go update the
documentation.  I noticed that the bits in the documentation didn't
match the bits in the #defines.

The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'.
The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
is fine.

But, when the bit was removed (bit 0) the _other_ bit locations also
got changed.  That's not OK because the bit values are documented to
mean one specific thing and users surely rely on them meaning that one
thing and not changing from kernel to kernel.  The end result is that
if someone had a script that did:

	sysctl vm.zone_reclaim_mode=1

That script went from doing nothing to writing out pages during
node reclaim after the commit in question.  That's not great.

Put the bits back the way they were and add a comment so something
like this is a bit harder to do again.  Update the documentation to
make it clear that the first bit is ignored.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: stable@vger.kernel.org
---

 b/Documentation/admin-guide/sysctl/vm.rst |   10 +++++-----
 b/mm/vmscan.c                             |    9 +++++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi	2020-07-01 08:22:11.354955336 -0700
+++ b/Documentation/admin-guide/sysctl/vm.rst	2020-07-01 08:22:11.360955336 -0700
@@ -948,11 +948,11 @@ that benefit from having their data cach
 left disabled as the caching effect is likely to be more important than
 data locality.
 
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction.  The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction.  The page allocator will take additional actions before
+allocating off node pages.
 
 Allowing zone reclaim to write out pages stops processes that are
 writing large amounts of data from dirtying pages on other nodes. Zone
diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi	2020-07-01 08:22:11.356955336 -0700
+++ b/mm/vmscan.c	2020-07-01 08:22:11.362955336 -0700
@@ -4090,8 +4090,13 @@ module_init(kswapd_init)
  */
 int node_reclaim_mode __read_mostly;
 
-#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI.  New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE  (1<<0)	/* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2)	/* Unmap pages during reclaim */
 
 /*
  * Priority for NODE_RECLAIM. This determines the fraction of pages
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI
  2020-07-01 15:26 ` [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI Dave Hansen
@ 2020-07-01 20:03   ` David Rientjes
  2020-07-02 11:28   ` Huang, Ying
  1 sibling, 0 replies; 5+ messages in thread
From: David Rientjes @ 2020-07-01 20:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, ben.widawsky, alex.shi, dwagner, tobin,
	cl, akpm, ying.huang, dan.j.williams, cai, stable

On Wed, 1 Jul 2020, Dave Hansen wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl.  Like a good kernel developer, I also went to go update the
> documentation.  I noticed that the bits in the documentation didn't
> match the bits in the #defines.
> 
> The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
> however implicitly checked when checking 'node_reclaim_mode==0'.
> The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
> is fine.
> 
> But, when the bit was removed (bit 0) the _other_ bit locations also
> got changed.  That's not OK because the bit values are documented to
> mean one specific thing and users surely rely on them meaning that one
> thing and not changing from kernel to kernel.  The end result is that
> if someone had a script that did:
> 
> 	sysctl vm.zone_reclaim_mode=1
> 
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question.  That's not great.
> 
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again.  Update the documentation to
> make it clear that the first bit is ignored.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Cc: Ben Widawsky <ben.widawsky@intel.com>
> Cc: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: "Tobin C. Harding" <tobin@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: stable@vger.kernel.org

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI
  2020-07-01 15:26 ` [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI Dave Hansen
  2020-07-01 20:03   ` David Rientjes
@ 2020-07-02 11:28   ` Huang, Ying
  2020-07-02 14:36     ` Dave Hansen
  1 sibling, 1 reply; 5+ messages in thread
From: Huang, Ying @ 2020-07-02 11:28 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, ben.widawsky, alex.shi, dwagner, tobin,
	cl, akpm, dan.j.williams, cai, stable

Dave Hansen <dave.hansen@linux.intel.com> writes:

> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl.  Like a good kernel developer, I also went to go update the
> documentation.  I noticed that the bits in the documentation didn't
> match the bits in the #defines.
>
> The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
> however implicitly checked when checking 'node_reclaim_mode==0'.
> The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
> is fine.
>
> But, when the bit was removed (bit 0) the _other_ bit locations also
> got changed.  That's not OK because the bit values are documented to
> mean one specific thing and users surely rely on them meaning that one
> thing and not changing from kernel to kernel.  The end result is that
> if someone had a script that did:
>
> 	sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing

Per my understanding, this script would have enabled node reclaim for
clean unmapped pages before commit 648b5cf368e0 ("mm/vmscan: remove
unused RECLAIM_OFF/RECLAIM_ZONE").  So we should revise the description
here?

> to writing out pages during
> node reclaim after the commit in question.  That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again.  Update the documentation to
> make it clear that the first bit is ignored.
>

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI
  2020-07-02 11:28   ` Huang, Ying
@ 2020-07-02 14:36     ` Dave Hansen
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2020-07-02 14:36 UTC (permalink / raw)
  To: Huang, Ying, Dave Hansen
  Cc: linux-kernel, linux-mm, ben.widawsky, alex.shi, dwagner, tobin,
	cl, akpm, dan.j.williams, cai, stable

On 7/2/20 4:28 AM, Huang, Ying wrote:
>> But, when the bit was removed (bit 0) the _other_ bit locations also
>> got changed.  That's not OK because the bit values are documented to
>> mean one specific thing and users surely rely on them meaning that one
>> thing and not changing from kernel to kernel.  The end result is that
>> if someone had a script that did:
>>
>> 	sysctl vm.zone_reclaim_mode=1
>>
>> That script went from doing nothing
> Per my understanding, this script would have enabled node reclaim for
> clean unmapped pages before commit 648b5cf368e0 ("mm/vmscan: remove
> unused RECLAIM_OFF/RECLAIM_ZONE").  So we should revise the description
> here?

Yes, you're right.  I updated the patch with the updated understanding
about the implicit use of the bit but didn't update the changelog.  I'll
do that for v3.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI
       [not found] <20210219172553.B1E1A317@viggo.jf.intel.com>
@ 2021-02-19 17:25 ` Dave Hansen
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2021-02-19 17:25 UTC (permalink / raw)
  To: dave
  Cc: Dave Hansen, ben.widawsky, osalvador, rientjes, cl, alex.shi,
	dwagner, tobin, akpm, ying.huang, dan.j.williams, cai, stable


From: Dave Hansen <dave.hansen@linux.intel.com>

I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl.  Like a good kernel developer, I also went to go update the
documentation.  I noticed that the bits in the documentation didn't
match the bits in the #defines.

The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'.
The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
is fine.

But, when the bit was removed (bit 0) the _other_ bit locations also
got changed.  That's not OK because the bit values are documented to
mean one specific thing.  Users surely do not expect the meaning to
change from kernel to kernel.

The end result is that if someone had a script that did:

	sysctl vm.zone_reclaim_mode=1

it would have gone from enabling node reclaim for clean unmapped
pages to writing out pages during node reclaim after the commit in
question.  That's not great.

Put the bits back the way they were and add a comment so something
like this is a bit harder to do again.  Update the documentation to
make it clear that the first bit is ignored.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: stable@vger.kernel.org

--

Changes from v2:
 * Update description to indicate that bit0 was used for clean
   unmapped page node reclaim.
---

 b/Documentation/admin-guide/sysctl/vm.rst |   10 +++++-----
 b/mm/vmscan.c                             |    9 +++++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-02-19 09:25:26.656663105 -0800
+++ b/Documentation/admin-guide/sysctl/vm.rst	2021-02-19 09:25:26.662663105 -0800
@@ -983,11 +983,11 @@ that benefit from having their data cach
 left disabled as the caching effect is likely to be more important than
 data locality.
 
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction.  The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction.  The page allocator will take additional actions before
+allocating off node pages.
 
 Allowing zone reclaim to write out pages stops processes that are
 writing large amounts of data from dirtying pages on other nodes. Zone
diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-02-19 09:25:26.658663105 -0800
+++ b/mm/vmscan.c	2021-02-19 09:25:26.665663105 -0800
@@ -4095,8 +4095,13 @@ module_init(kswapd_init)
  */
 int node_reclaim_mode __read_mostly;
 
-#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI.  New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE  (1<<0)   /* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1)   /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2)   /* Unmap pages during reclaim */
 
 /*
  * Priority for NODE_RECLAIM. This determines the fraction of pages
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-02-19 17:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200701152621.D520E62B@viggo.jf.intel.com>
2020-07-01 15:26 ` [PATCH 1/3] mm/vmscan: restore zone_reclaim_mode ABI Dave Hansen
2020-07-01 20:03   ` David Rientjes
2020-07-02 11:28   ` Huang, Ying
2020-07-02 14:36     ` Dave Hansen
     [not found] <20210219172553.B1E1A317@viggo.jf.intel.com>
2021-02-19 17:25 ` Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).