* [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
@ 2017-02-08  5:40 Gavin Shan
  2017-02-08  9:28 ` Mel Gorman
  2017-02-08 10:08 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Gavin Shan @ 2017-02-08  5:40 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, akpm, anton, mpe, Gavin Shan, # v3 . 16+

When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones is below the low
watermark. On the Power platform, no NUMA node is scanned for page
reclaim because none matches the condition in zone_allows_reclaim().
Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
not even the preferred node is scanned for page reclaim.

   __alloc_pages_nodemask()
      get_page_from_freelist()
         zone_allows_reclaim()

Anton proposed the following test program:

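   # cat alloc.c
   #include <assert.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <time.h>
   #include <unistd.h>

   int main(int argc, char *argv[])
   {
	void *p;
	unsigned long size;
	time_t start, end;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <size-in-GB>\n", argv[0]);
		return 1;
	}

	start = time(NULL);
	/* Size to allocate, in GB, taken from the command line */
	size = strtoul(argv[1], NULL, 0);
	printf("To allocate %luGB memory\n", size);

	size <<= 30;
	p = malloc(size);
	assert(p);
	/* Touch every page so that it is actually faulted in */
	memset(p, 0, size);

	end = time(NULL);
	printf("Used time: %ld seconds\n", (long)(end - start));
	/* Hold the memory while the node's meminfo is inspected */
	sleep(3600);
	return 0;
   }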

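The system I use for testing has two NUMA nodes, each with 128GB of
memory. In the scenario below, the page cache on node#0 should be
reclaimed when the node comes under pressure to satisfy the allocation
request. Without the patch it isn't: FilePages on node#0 stays at about
32GB while MemFree drops to about 180MB.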

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619712 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619840 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:          186816 kB

With the patch applied, the page cache on node#0 is reclaimed as its
free memory runs out (FilePages drops from about 32GB to about 1.3GB),
which is the expected behaviour.

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33605568 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:        1379520 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:           317120 kB

Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3e0c69..1a5f665 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2877,7 +2877,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
 #ifdef CONFIG_NUMA
 static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 {
-	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <
+	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
 				RECLAIM_DISTANCE;
 }
 #else	/* CONFIG_NUMA */
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
  2017-02-08  5:40 [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path Gavin Shan
@ 2017-02-08  9:28 ` Mel Gorman
  2017-02-08 10:08 ` Michal Hocko
  1 sibling, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2017-02-08  9:28 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linux-mm, akpm, anton, mpe, # v3 . 16+

On Wed, Feb 08, 2017 at 04:40:55PM +1100, Gavin Shan wrote:
> When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
> pages if the amount of free memory in the zones is below the low
> watermark. On the Power platform, no NUMA node is scanned for page
> reclaim because none matches the condition in zone_allows_reclaim().
> Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
> itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
> not even the preferred node is scanned for page reclaim.
> 
> Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
> Cc: <stable@vger.kernel.org> # v3.16+
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
  2017-02-08  5:40 [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path Gavin Shan
  2017-02-08  9:28 ` Mel Gorman
@ 2017-02-08 10:08 ` Michal Hocko
  2017-02-08 23:06   ` Gavin Shan
  2017-02-08 23:07   ` Gavin Shan
  1 sibling, 2 replies; 6+ messages in thread
From: Michal Hocko @ 2017-02-08 10:08 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linux-mm, mgorman, akpm, anton, mpe, # v3 . 16+

On Wed 08-02-17 16:40:55, Gavin Shan wrote:
> When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
> pages if the amount of free memory in the zones is below the low
> watermark. On the Power platform, no NUMA node is scanned for page
> reclaim because none matches the condition in zone_allows_reclaim().
> Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
> itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
> not even the preferred node is scanned for page reclaim.

This is quite confusing. I can see 56608209d34b ("powerpc/numa: Set a
smaller value for RECLAIM_DISTANCE to enable zone reclaim") which
enforced the zone_reclaim by reducing the RECLAIM_DISTANCE, now you are
building on top of that. Having RECLAIM_DISTANCE == LOCAL_DISTANCE is
really confusing. What are the distances of other nodes (in other words,
what does numactl --hardware tell)? I am wondering whether we shouldn't
rather revert 56608209d34b as the node_reclaim (these days) is not
enabled by default anymore.

[...]

> Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
> Cc: <stable@vger.kernel.org> # v3.16+
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

anyway the patch looks OK as it brings the previous behavior back. Not
that I would be entirely happy about that behavior as it is quite nasty
- e.g. it will trigger direct reclaim from the allocator fast path way
too much and basically skip the kswapd wake up most of the time if there
is anything reclaimable... But this used to be there before as well.

Acked-by: Michal Hocko <mhocko@suse.com>

but I would really like to get rid of the ppc specific RECLAIM_DISTANCE
if possible as well.

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f3e0c69..1a5f665 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2877,7 +2877,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
>  #ifdef CONFIG_NUMA
>  static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
>  {
> -	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <
> +	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
>  				RECLAIM_DISTANCE;
>  }
>  #else	/* CONFIG_NUMA */
> -- 
> 2.7.4
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
  2017-02-08 10:08 ` Michal Hocko
@ 2017-02-08 23:06   ` Gavin Shan
  2017-02-09  8:57     ` Michal Hocko
  2017-02-08 23:07   ` Gavin Shan
  1 sibling, 1 reply; 6+ messages in thread
From: Gavin Shan @ 2017-02-08 23:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Gavin Shan, linux-mm, mgorman, akpm, anton, mpe, # v3 . 16+

On Wed, Feb 08, 2017 at 11:08:50AM +0100, Michal Hocko wrote:
>On Wed 08-02-17 16:40:55, Gavin Shan wrote:
>> When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
>> pages if the amount of free memory in the zones is below the low
>> watermark. On the Power platform, no NUMA node is scanned for page
>> reclaim because none matches the condition in zone_allows_reclaim().
>> Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
>> itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
>> not even the preferred node is scanned for page reclaim.
>
>This is quite confusing. I can see 56608209d34b ("powerpc/numa: Set a
>smaller value for RECLAIM_DISTANCE to enable zone reclaim") which
>enforced the zone_reclaim by reducing the RECLAIM_DISTANCE, now you are
>building on top of that. Having RECLAIM_DISTANCE == LOCAL_DISTANCE is
>really confusing. What are the distances of other nodes (in other words,
>what does numactl --hardware tell)? I am wondering whether we shouldn't
>rather revert 56608209d34b as the node_reclaim (these days) is not
>enabled by default anymore.
>

Michal, yeah, it's a bit confusing. Let me try to summarize the history:
the code that 56608209d34b (2.6.35) depends on, as described in its commit
log, was removed by 957f822a0ab9 (3.10). Since then, the change introduced
by 56608209d34b (2.6.35) has been obsolete. However, the local page cache
(with @node_reclaim_mode turned on manually) could still be shrunk from
that point (3.10) until 5f7a75acdb24 (3.16) was merged. This patch fixes
the issue introduced by 5f7a75acdb24 and needs to go to 3.16+. Hope this
makes things clearer, not more confusing :-)

Yes, I already plan to set the PowerPC-specific RECLAIM_DISTANCE to 30,
the same value as the generic one, as I said in my last reply on the
following thread:
https://patchwork.ozlabs.org/patch/718830/

>
>> Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
>> Cc: <stable@vger.kernel.org> # v3.16+
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>anyway the patch looks OK as it brings the previous behavior back. Not
>that I would be entirely happy about that behavior as it is quite nasty
>- e.g. it will trigger direct reclaim from the allocator fast path way
>too much and basically skip the kswapd wake up most of the time if there
>is anything reclaimable... But this used to be there before as well.
>
>Acked-by: Michal Hocko <mhocko@suse.com>
>
>but I would really like to get rid of the ppc specific RECLAIM_DISTANCE
>if possible as well.
>

Yes, I will post a patch for this and copy you.

Thanks,
Gavin


* Re: [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
  2017-02-08 10:08 ` Michal Hocko
  2017-02-08 23:06   ` Gavin Shan
@ 2017-02-08 23:07   ` Gavin Shan
  1 sibling, 0 replies; 6+ messages in thread
From: Gavin Shan @ 2017-02-08 23:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Gavin Shan, linux-mm, mgorman, akpm, anton, mpe, # v3 . 16+

On Wed, Feb 08, 2017 at 11:08:50AM +0100, Michal Hocko wrote:
>On Wed 08-02-17 16:40:55, Gavin Shan wrote:
>> When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
>> pages if the amount of free memory in the zones is below the low
>> watermark. On the Power platform, no NUMA node is scanned for page
>> reclaim because none matches the condition in zone_allows_reclaim().
>> Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
>> itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
>> not even the preferred node is scanned for page reclaim.
>
>This is quite confusing. I can see 56608209d34b ("powerpc/numa: Set a
>smaller value for RECLAIM_DISTANCE to enable zone reclaim") which
>enforced the zone_reclaim by reducing the RECLAIM_DISTANCE, now you are
>building on top of that. Having RECLAIM_DISTANCE == LOCAL_DISTANCE is
>really confusing. What are the distances of other nodes (in other words,
>what does numactl --hardware tell)?

Oops, I forgot to paste the output from numactl:

# numactl --hardware
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 0 size: 130703 MB
node 0 free: 127424 MB
node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 8 size: 130647 MB
node 8 free: 130038 MB
node distances:
node   0   8 
  0:  10  40 
  8:  40  10 

Thanks,
Gavin


* Re: [PATCH] mm/page_alloc: Fix nodes for reclaim in fast path
  2017-02-08 23:06   ` Gavin Shan
@ 2017-02-09  8:57     ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2017-02-09  8:57 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linux-mm, mgorman, akpm, anton, mpe, # v3 . 16+

On Thu 09-02-17 10:06:18, Gavin Shan wrote:
> On Wed, Feb 08, 2017 at 11:08:50AM +0100, Michal Hocko wrote:
> >On Wed 08-02-17 16:40:55, Gavin Shan wrote:
> >> When @node_reclaim_mode isn't 0, the page allocator tries to reclaim
> >> pages if the amount of free memory in the zones is below the low
> >> watermark. On the Power platform, no NUMA node is scanned for page
> >> reclaim because none matches the condition in zone_allows_reclaim().
> >> Power sets RECLAIM_DISTANCE to 10, which is the distance from a node to
> >> itself, and the check requires node_distance() < RECLAIM_DISTANCE, so
> >> not even the preferred node is scanned for page reclaim.
> >
> >This is quite confusing. I can see 56608209d34b ("powerpc/numa: Set a
> >smaller value for RECLAIM_DISTANCE to enable zone reclaim") which
> >enforced the zone_reclaim by reducing the RECLAIM_DISTANCE, now you are
> >building on top of that. Having RECLAIM_DISTANCE == LOCAL_DISTANCE is
> >really confusing. What are the distances of other nodes (in other words,
> >what does numactl --hardware tell)? I am wondering whether we shouldn't
> >rather revert 56608209d34b as the node_reclaim (these days) is not
> >enabled by default anymore.
> >
> 
> Michal, yeah, it's a bit confusing. Let me try to summarize the history:
> the code that 56608209d34b (2.6.35) depends on, as described in its commit
> log, was removed by 957f822a0ab9 (3.10). Since then, the change introduced
> by 56608209d34b (2.6.35) has been obsolete. However, the local page cache
> (with @node_reclaim_mode turned on manually) could still be shrunk from
> that point (3.10) until 5f7a75acdb24 (3.16) was merged. This patch fixes
> the issue introduced by 5f7a75acdb24 and needs to go to 3.16+. Hope this
> makes things clearer, not more confusing :-)

yeah, it is clear as mud ;)

> Yes, I already plan to set the PowerPC-specific RECLAIM_DISTANCE to 30,
> the same value as the generic one, as I said in my last reply on the
> following thread:
> https://patchwork.ozlabs.org/patch/718830/

just drop the ppc specific definition and use the generic one instead.
 
> >> Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
> >> Cc: <stable@vger.kernel.org> # v3.16+
> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >
> >anyway the patch looks OK as it brings the previous behavior back. Not
> >that I would be entirely happy about that behavior as it is quite nasty
> >- e.g. it will trigger direct reclaim from the allocator fast path way
> >too much and basically skip the kswapd wake up most of the time if there
> >is anything reclaimable... But this used to be there before as well.
> >
> >Acked-by: Michal Hocko <mhocko@suse.com>
> >
> >but I would really like to get rid of the ppc specific RECLAIM_DISTANCE
> >if possible as well.
> >
> 
> Yes, I will post a patch for this and copy you.

Thanks!

-- 
Michal Hocko
SUSE Labs
