Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] numa node stats alternative fix
@ 2017-01-02 15:30 Michal Hocko
  2017-01-02 15:30 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
  2017-01-02 15:30 ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2017-01-02 15:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML

Hi,
this is an alternative fix for [1] which is currently sitting in the mm
tree.  I believe that the patch 1 is better because it allows to get rid
of __GFP_OTHER_NODE (patch 2) and it uses less branches as well. Vlastimil
has also shown [2] that the patch from Jia He is not fully compatible with
the code before the patch it tries to fix. I do not think that the issue
is serious enough to warrant stable tree inclusion.

Can we have these patches merged instead?

[1] http://lkml.kernel.org/r/1481522347-20393-1-git-send-email-hejianet@gmail.com
[2] http://lkml.kernel.org/r/233ed490-afb9-4644-6d84-c9f888882da2@suse.cz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] mm: fix remote numa hits statistics
  2017-01-02 15:30 [PATCH 0/2] numa node stats alternative fix Michal Hocko
@ 2017-01-02 15:30 ` Michal Hocko
  2017-01-02 15:30 ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-01-02 15:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
in zone_statistics") has an unintentional side effect that remote node
allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
many of these potentially because the flag is used very rarely while
we have many users of __alloc_pages_node.

Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
follow up patch) and treat all allocations that were satisfied from the
preferred zone's node as NUMA_HITS because this is the same node we
requested the allocation from in most cases. If this is not the local
node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.

One downsize would be that an allocation request for a node which is
outside of the mempolicy nodemask would be reported as a hit which is a
bit weird but that was the case before b9f00e147f27 already.

Reported-by: Jia He <hejianet@gmail.com>
Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz> # with cbmc[1] superpowers
Signed-off-by: Michal Hocko <mhocko@suse.com>

[1] http://paulmck.livejournal.com/38997.html
---
 mm/page_alloc.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f6d5b73e1d7c..e2a44950a685 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2583,30 +2583,23 @@ int __isolate_free_page(struct page *page, unsigned int order)
  * Update NUMA hit/miss statistics
  *
  * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
  */
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 								gfp_t flags)
 {
 #ifdef CONFIG_NUMA
-	int local_nid = numa_node_id();
 	enum zone_stat_item local_stat = NUMA_LOCAL;
 
-	if (unlikely(flags & __GFP_OTHER_NODE)) {
+	if (z->node != numa_node_id())
 		local_stat = NUMA_OTHER;
-		local_nid = preferred_zone->node;
-	}
 
-	if (z->node == local_nid) {
+	if (z->node == preferred_zone->node)
 		__inc_zone_state(z, NUMA_HIT);
-		__inc_zone_state(z, local_stat);
-	} else {
+	else {
 		__inc_zone_state(z, NUMA_MISS);
 		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
 	}
+	__inc_zone_state(z, local_stat);
 #endif
 }
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE
  2017-01-02 15:30 [PATCH 0/2] numa node stats alternative fix Michal Hocko
  2017-01-02 15:30 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
@ 2017-01-02 15:30 ` Michal Hocko
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-01-02 15:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The flag has been introduced by 78afd5612deb ("mm: add __GFP_OTHER_NODE
flag") to allow proper accounting of remote node allocations done by
kernel daemons on behalf of a process - e.g. khugepaged.

After "mm: fix remote numa hits statistics" we do not need and actually
use the flag so we can safely remove it because all allocations which
are satisfied from their "home" node are accounted properly.

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/gfp.h            | 13 +++----------
 include/trace/events/mmflags.h |  1 -
 mm/huge_memory.c               |  3 +--
 mm/khugepaged.c                |  5 ++---
 mm/page_alloc.c                |  5 ++---
 tools/perf/builtin-kmem.c      |  1 -
 6 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..7806a8f80abc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -38,9 +38,8 @@ struct vm_area_struct;
 #define ___GFP_ACCOUNT		0x100000u
 #define ___GFP_NOTRACK		0x200000u
 #define ___GFP_DIRECT_RECLAIM	0x400000u
-#define ___GFP_OTHER_NODE	0x800000u
-#define ___GFP_WRITE		0x1000000u
-#define ___GFP_KSWAPD_RECLAIM	0x2000000u
+#define ___GFP_WRITE		0x800000u
+#define ___GFP_KSWAPD_RECLAIM	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -172,11 +171,6 @@ struct vm_area_struct;
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
- *
- * __GFP_OTHER_NODE is for allocations that are on a remote node but that
- *   should not be accounted for as a remote allocation in vmstat. A
- *   typical user would be khugepaged collapsing a huge page on a remote
- *   node.
  */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
@@ -184,10 +178,9 @@ struct vm_area_struct;
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
-#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 25
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5a81ab48a2fb..556a0efa8298 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -48,7 +48,6 @@
 	{(unsigned long)__GFP_RECLAIM,		"__GFP_RECLAIM"},	\
 	{(unsigned long)__GFP_DIRECT_RECLAIM,	"__GFP_DIRECT_RECLAIM"},\
 	{(unsigned long)__GFP_KSWAPD_RECLAIM,	"__GFP_KSWAPD_RECLAIM"},\
-	{(unsigned long)__GFP_OTHER_NODE,	"__GFP_OTHER_NODE"}	\
 
 #define show_gfp_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f3c2040edbb1..8206abf4ac03 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -918,8 +918,7 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
 	}
 
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
-		pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE |
-					       __GFP_OTHER_NODE, vma,
+		pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE, vma,
 					       vmf->address, page_to_nid(page));
 		if (unlikely(!pages[i] ||
 			     mem_cgroup_try_charge(pages[i], vma->vm_mm,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e32389a97030..211974a3992b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -943,7 +943,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_OTHER_NODE | __GFP_THISNODE;
+	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
 	/*
 	 * Before allocating the hugepage, release the mmap_sem read lock.
@@ -1326,8 +1326,7 @@ static void collapse_shmem(struct mm_struct *mm,
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() |
-		__GFP_OTHER_NODE | __GFP_THISNODE;
+	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
 	new_page = khugepaged_alloc_page(hpage, gfp, node);
 	if (!new_page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2a44950a685..ea60dc06d280 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2584,8 +2584,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
  *
  * Must be called with interrupts disabled.
  */
-static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
-								gfp_t flags)
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
 	enum zone_stat_item local_stat = NUMA_LOCAL;
@@ -2667,7 +2666,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 	}
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-	zone_statistics(preferred_zone, zone, gfp_flags);
+	zone_statistics(preferred_zone, zone);
 	local_irq_restore(flags);
 
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index d426dcb18ce9..33b959d47545 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -645,7 +645,6 @@ static const struct {
 	{ "__GFP_RECLAIM",		"R" },
 	{ "__GFP_DIRECT_RECLAIM",	"DR" },
 	{ "__GFP_KSWAPD_RECLAIM",	"KR" },
-	{ "__GFP_OTHER_NODE",		"ON" },
 };
 
 static size_t max_gfp_len;
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] mm: fix remote numa hits statistics
  2017-01-02 14:46     ` Michal Hocko
@ 2017-01-02 15:07       ` Vlastimil Babka
  0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2017-01-02 15:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Mel Gorman, Andrew Morton, Joonsoo Kim, Jia He,
	Taku Izumi, Johannes Weiner

On 01/02/2017 03:46 PM, Michal Hocko wrote:
> On Mon 02-01-17 15:16:23, Vlastimil Babka wrote:
>> On 12/21/2016 09:06 AM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@suse.com>
>>>
>>> Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
>>> in zone_statistics") has an unintentional side effect that remote node
>>> allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
>>> NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
>>> many of these potentially because the flag is used very rarely while
>>> we have many users of __alloc_pages_node.
>>>
>>> Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
>>> follow up patch) and treat all allocations that were satisfied from the
>>> preferred zone's node as NUMA_HITS because this is the same node we
>>> requested the allocation from in most cases. If this is not the local
>>> node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
>>>
>>> One downsize would be that an allocation request for a node which is
>>> outside of the mempolicy nodemask would be reported as a hit which is a
>>> bit weird but that was the case before b9f00e147f27 already.
>>>
>>> Reported-by: Jia He <hejianet@gmail.com>
>>> Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
>>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>>
>> cbmc tells me that this patch is not equal to pre-commit b9f00e147f27
>> (in situation where __GFP_OTHER_NODE is not passed, as that's the only
>> relevant scenario after your patch), which seems unintended.
>>
>> counter example:
>> numa_node_id() == 1
>> preferred_zone on node 0
>> allocated from zone on node 1
>>
>> pre-b9f00e147f27:
>> allocated zone (node 1) increased NUMA_MISS and NUMA_LOCAL
>> preferred zone (node 0) increased NUMA_FOREIGN
>>
>> (that looks correct to me)
>>
>> your patch:
>> allocated zone (node 1) increased NUMA_MISS
>> preferred zone (node 0) increased NUMA_FOREIGN
>>
>> i.e. it's missing NUMA_LOCAL on node 1, which is IMHO wrong as this was
>> an allocation local to the CPU (albeit a MISS wrt the preferred node).
> 
> I guess the following should fix that, right?

Yes, it does.

With that, you can add:

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

(which is a lie, since the computer did that ;)

> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 647e940e6921..ea60dc06d280 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2587,17 +2587,18 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
>  {
>  #ifdef CONFIG_NUMA
> -	if (z->node == preferred_zone->node) {
> -		enum zone_stat_item local_stat = NUMA_LOCAL;
> +	enum zone_stat_item local_stat = NUMA_LOCAL;
>  
> +	if (z->node != numa_node_id())
> +		local_stat = NUMA_OTHER;
> +
> +	if (z->node == preferred_zone->node)
>  		__inc_zone_state(z, NUMA_HIT);
> -		if (z->node != numa_node_id())
> -			local_stat = NUMA_OTHER;
> -		__inc_zone_state(z, local_stat);
> -	} else {
> +	else {
>  		__inc_zone_state(z, NUMA_MISS);
>  		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
>  	}
> +	__inc_zone_state(z, local_stat);
>  #endif
>  }
>  
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] mm: fix remote numa hits statistics
  2017-01-02 14:16   ` Vlastimil Babka
@ 2017-01-02 14:46     ` Michal Hocko
  2017-01-02 15:07       ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2017-01-02 14:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Mel Gorman, Andrew Morton, Joonsoo Kim, Jia He,
	Taku Izumi, Johannes Weiner

On Mon 02-01-17 15:16:23, Vlastimil Babka wrote:
> On 12/21/2016 09:06 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
> > in zone_statistics") has an unintentional side effect that remote node
> > allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
> > NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
> > many of these potentially because the flag is used very rarely while
> > we have many users of __alloc_pages_node.
> > 
> > Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
> > follow up patch) and treat all allocations that were satisfied from the
> > preferred zone's node as NUMA_HITS because this is the same node we
> > requested the allocation from in most cases. If this is not the local
> > node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
> > 
> > One downsize would be that an allocation request for a node which is
> > outside of the mempolicy nodemask would be reported as a hit which is a
> > bit weird but that was the case before b9f00e147f27 already.
> > 
> > Reported-by: Jia He <hejianet@gmail.com>
> > Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> cbmc tells me that this patch is not equal to pre-commit b9f00e147f27
> (in situation where __GFP_OTHER_NODE is not passed, as that's the only
> relevant scenario after your patch), which seems unintended.
> 
> counter example:
> numa_node_id() == 1
> preferred_zone on node 0
> allocated from zone on node 1
> 
> pre-b9f00e147f27:
> allocated zone (node 1) increased NUMA_MISS and NUMA_LOCAL
> preferred zone (node 0) increased NUMA_FOREIGN
> 
> (that looks correct to me)
> 
> your patch:
> allocated zone (node 1) increased NUMA_MISS
> preferred zone (node 0) increased NUMA_FOREIGN
> 
> i.e. it's missing NUMA_LOCAL on node 1, which is IMHO wrong as this was
> an allocation local to the CPU (albeit a MISS wrt the preferred node).

I guess the following should fix that, right?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 647e940e6921..ea60dc06d280 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2587,17 +2587,18 @@ int __isolate_free_page(struct page *page, unsigned int order)
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
-	if (z->node == preferred_zone->node) {
-		enum zone_stat_item local_stat = NUMA_LOCAL;
+	enum zone_stat_item local_stat = NUMA_LOCAL;
 
+	if (z->node != numa_node_id())
+		local_stat = NUMA_OTHER;
+
+	if (z->node == preferred_zone->node)
 		__inc_zone_state(z, NUMA_HIT);
-		if (z->node != numa_node_id())
-			local_stat = NUMA_OTHER;
-		__inc_zone_state(z, local_stat);
-	} else {
+	else {
 		__inc_zone_state(z, NUMA_MISS);
 		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
 	}
+	__inc_zone_state(z, local_stat);
 #endif
 }
 
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] mm: fix remote numa hits statistics
  2016-12-21  8:06 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
  2016-12-29 11:46   ` Mel Gorman
@ 2017-01-02 14:16   ` Vlastimil Babka
  2017-01-02 14:46     ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2017-01-02 14:16 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Mel Gorman, Andrew Morton, Joonsoo Kim, Jia He, Taku Izumi,
	Johannes Weiner, Michal Hocko

On 12/21/2016 09:06 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
> in zone_statistics") has an unintentional side effect that remote node
> allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
> NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
> many of these potentially because the flag is used very rarely while
> we have many users of __alloc_pages_node.
> 
> Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
> follow up patch) and treat all allocations that were satisfied from the
> preferred zone's node as NUMA_HITS because this is the same node we
> requested the allocation from in most cases. If this is not the local
> node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
> 
> One downsize would be that an allocation request for a node which is
> outside of the mempolicy nodemask would be reported as a hit which is a
> bit weird but that was the case before b9f00e147f27 already.
> 
> Reported-by: Jia He <hejianet@gmail.com>
> Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
> Signed-off-by: Michal Hocko <mhocko@suse.com>

cbmc tells me that this patch is not equal to pre-commit b9f00e147f27
(in situation where __GFP_OTHER_NODE is not passed, as that's the only
relevant scenario after your patch), which seems unintended.

counter example:
numa_node_id() == 1
preferred_zone on node 0
allocated from zone on node 1

pre-b9f00e147f27:
allocated zone (node 1) increased NUMA_MISS and NUMA_LOCAL
preferred zone (node 0) increased NUMA_FOREIGN

(that looks correct to me)

your patch:
allocated zone (node 1) increased NUMA_MISS
preferred zone (node 0) increased NUMA_FOREIGN

i.e. it's missing NUMA_LOCAL on node 1, which is IMHO wrong as this was
an allocation local to the CPU (albeit a MISS wrt the preferred node).

Vlastimil

> ---
>  mm/page_alloc.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f6d5b73e1d7c..506946a902c5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order)
>   * Update NUMA hit/miss statistics
>   *
>   * Must be called with interrupts disabled.
> - *
> - * When __GFP_OTHER_NODE is set assume the node of the preferred
> - * zone is the local node. This is useful for daemons who allocate
> - * memory on behalf of other processes.
>   */
>  static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
>  								gfp_t flags)
>  {
>  #ifdef CONFIG_NUMA
> -	int local_nid = numa_node_id();
> -	enum zone_stat_item local_stat = NUMA_LOCAL;
> -
> -	if (unlikely(flags & __GFP_OTHER_NODE)) {
> -		local_stat = NUMA_OTHER;
> -		local_nid = preferred_zone->node;
> -	}
> +	if (z->node == preferred_zone->node) {
> +		enum zone_stat_item local_stat = NUMA_LOCAL;
>  
> -	if (z->node == local_nid) {
>  		__inc_zone_state(z, NUMA_HIT);
> +		if (z->node != numa_node_id())
> +			local_stat = NUMA_OTHER;
>  		__inc_zone_state(z, local_stat);
>  	} else {
>  		__inc_zone_state(z, NUMA_MISS);
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] mm: fix remote numa hits statistics
  2016-12-29 11:46   ` Mel Gorman
@ 2016-12-29 12:28     ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2016-12-29 12:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Vlastimil Babka, Andrew Morton, Joonsoo Kim, Jia He,
	Taku Izumi, Johannes Weiner

On Thu 29-12-16 11:46:01, Mel Gorman wrote:
> On Wed, Dec 21, 2016 at 09:06:52AM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
> > in zone_statistics") has an unintentional side effect that remote node
> > allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
> > NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
> > many of these potentially because the flag is used very rarely while
> > we have many users of __alloc_pages_node.
> > 
> > Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
> > follow up patch) and treat all allocations that were satisfied from the
> > preferred zone's node as NUMA_HITS because this is the same node we
> > requested the allocation from in most cases. If this is not the local
> > node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
> > 
> > One downsize would be that an allocation request for a node which is
> > outside of the mempolicy nodemask would be reported as a hit which is a
> > bit weird but that was the case before b9f00e147f27 already.
> > 
> > Reported-by: Jia He <hejianet@gmail.com>
> > Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> For both patches;
> 
> Acked-by: Mel Gorman <mgorman@suse.de>

Thanks! I will give it a week for others to get back to it after holiday
and then resubmit.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] mm: fix remote numa hits statistics
  2016-12-21  8:06 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
@ 2016-12-29 11:46   ` Mel Gorman
  2016-12-29 12:28     ` Michal Hocko
  2017-01-02 14:16   ` Vlastimil Babka
  1 sibling, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2016-12-29 11:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Vlastimil Babka, Andrew Morton, Joonsoo Kim, Jia He,
	Taku Izumi, Johannes Weiner, Michal Hocko

On Wed, Dec 21, 2016 at 09:06:52AM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
> in zone_statistics") has an unintentional side effect that remote node
> allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
> NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
> many of these potentially because the flag is used very rarely while
> we have many users of __alloc_pages_node.
> 
> Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
> follow up patch) and treat all allocations that were satisfied from the
> preferred zone's node as NUMA_HITS because this is the same node we
> requested the allocation from in most cases. If this is not the local
> node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
> 
> One downsize would be that an allocation request for a node which is
> outside of the mempolicy nodemask would be reported as a hit which is a
> bit weird but that was the case before b9f00e147f27 already.
> 
> Reported-by: Jia He <hejianet@gmail.com>
> Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
> Signed-off-by: Michal Hocko <mhocko@suse.com>

For both patches;

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] mm: fix remote numa hits statistics
  2016-12-21  7:57 [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Michal Hocko
@ 2016-12-21  8:06 ` Michal Hocko
  2016-12-29 11:46   ` Mel Gorman
  2017-01-02 14:16   ` Vlastimil Babka
  0 siblings, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2016-12-21  8:06 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Vlastimil Babka, Andrew Morton, Joonsoo Kim, Jia He,
	Taku Izumi, Johannes Weiner, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
in zone_statistics") has an unintentional side effect that remote node
allocation requests are accounted as NUMA_MISS rathat than NUMA_HIT and
NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
many of these potentially because the flag is used very rarely while
we have many users of __alloc_pages_node.

Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
follow up patch) and treat all allocations that were satisfied from the
preferred zone's node as NUMA_HITS because this is the same node we
requested the allocation from in most cases. If this is not the local
node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.

One downsize would be that an allocation request for a node which is
outside of the mempolicy nodemask would be reported as a hit which is a
bit weird but that was the case before b9f00e147f27 already.

Reported-by: Jia He <hejianet@gmail.com>
Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f6d5b73e1d7c..506946a902c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order)
  * Update NUMA hit/miss statistics
  *
  * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
  */
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 								gfp_t flags)
 {
 #ifdef CONFIG_NUMA
-	int local_nid = numa_node_id();
-	enum zone_stat_item local_stat = NUMA_LOCAL;
-
-	if (unlikely(flags & __GFP_OTHER_NODE)) {
-		local_stat = NUMA_OTHER;
-		local_nid = preferred_zone->node;
-	}
+	if (z->node == preferred_zone->node) {
+		enum zone_stat_item local_stat = NUMA_LOCAL;
 
-	if (z->node == local_nid) {
 		__inc_zone_state(z, NUMA_HIT);
+		if (z->node != numa_node_id())
+			local_stat = NUMA_OTHER;
 		__inc_zone_state(z, local_stat);
 	} else {
 		__inc_zone_state(z, NUMA_MISS);
-- 
2.10.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-02 15:30 [PATCH 0/2] numa node stats alternative fix Michal Hocko
2017-01-02 15:30 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
2017-01-02 15:30 ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2016-12-21  7:57 [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Michal Hocko
2016-12-21  8:06 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
2016-12-29 11:46   ` Mel Gorman
2016-12-29 12:28     ` Michal Hocko
2017-01-02 14:16   ` Vlastimil Babka
2017-01-02 14:46     ` Michal Hocko
2017-01-02 15:07       ` Vlastimil Babka

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git