* [PATCH 0/2] numa node stats alternative fix
From: Michal Hocko @ 2017-01-02 15:30 UTC
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML

Hi,
this is an alternative fix for [1], which is currently sitting in the mm
tree. I believe patch 1 is the better fix because it allows us to get rid
of __GFP_OTHER_NODE (patch 2) and it uses fewer branches as well. Vlastimil
has also shown [2] that the patch from Jia He is not fully compatible with
the code before the patch it tries to fix. I do not think that the issue
is serious enough to warrant stable tree inclusion.

Can we have these patches merged instead?

[1] http://lkml.kernel.org/r/1481522347-20393-1-git-send-email-hejianet@gmail.com
[2] http://lkml.kernel.org/r/233ed490-afb9-4644-6d84-c9f888882da2@suse.cz


* [PATCH 1/2] mm: fix remote numa hits statistics
From: Michal Hocko @ 2017-01-02 15:30 UTC
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Jia He has noticed that b9f00e147f27 ("mm, page_alloc: reduce branches
in zone_statistics") has an unintentional side effect: remote node
allocation requests are accounted as NUMA_MISS rather than NUMA_HIT and
NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE. There are
potentially many of these because the flag is used very rarely while
we have many users of __alloc_pages_node.
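
The misaccounting can be reproduced in a small userspace model of the
logic this patch removes. This is only a sketch: numa_stats, MAX_NODES
and old_zone_statistics() are hypothetical stand-ins that take plain
node ids instead of struct zone pointers, and cpu_nid plays the role of
numa_node_id().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
enum numa_stat { NUMA_HIT, NUMA_MISS, NUMA_FOREIGN, NUMA_LOCAL, NUMA_OTHER,
		 NR_NUMA_STATS };
#define MAX_NODES 4

static unsigned long numa_stats[MAX_NODES][NR_NUMA_STATS];

/*
 * Mirrors the branches removed by this patch (the b9f00e147f27 logic).
 * other_node_flag models (flags & __GFP_OTHER_NODE).
 */
static void old_zone_statistics(int preferred_nid, int zone_nid, int cpu_nid,
				bool other_node_flag)
{
	int local_nid = cpu_nid;
	enum numa_stat local_stat = NUMA_LOCAL;

	assert(preferred_nid >= 0 && preferred_nid < MAX_NODES);
	assert(zone_nid >= 0 && zone_nid < MAX_NODES);

	if (other_node_flag) {
		local_stat = NUMA_OTHER;
		local_nid = preferred_nid;
	}

	if (zone_nid == local_nid) {
		numa_stats[zone_nid][NUMA_HIT]++;
		numa_stats[zone_nid][local_stat]++;
	} else {
		/* Without the flag, a satisfied remote request lands here. */
		numa_stats[zone_nid][NUMA_MISS]++;
		numa_stats[preferred_nid][NUMA_FOREIGN]++;
	}
}
```

In this model a CPU on node 0 requesting node 1 and getting node 1 is
counted as a miss unless the flag is passed, which is the side effect
described above.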

Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
follow-up patch) and treating all allocations that were satisfied from the
preferred zone's node as NUMA_HIT, because in most cases this is the same
node the allocation was requested from. If this is not the local
node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
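
For illustration, the resulting accounting can be modelled in userspace
roughly as follows. This is a sketch under assumptions, not the kernel
code itself: numa_stats, MAX_NODES and zone_statistics_model() are
hypothetical names, nodes are plain ints, and cpu_nid stands in for
numa_node_id().

```c
#include <assert.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
enum numa_stat { NUMA_HIT, NUMA_MISS, NUMA_FOREIGN, NUMA_LOCAL, NUMA_OTHER,
		 NR_NUMA_STATS };
#define MAX_NODES 4

static unsigned long numa_stats[MAX_NODES][NR_NUMA_STATS];

/* Follows the post-patch branches of zone_statistics() in the diff. */
static void zone_statistics_model(int preferred_nid, int zone_nid, int cpu_nid)
{
	enum numa_stat local_stat = NUMA_LOCAL;

	assert(preferred_nid >= 0 && preferred_nid < MAX_NODES);
	assert(zone_nid >= 0 && zone_nid < MAX_NODES);

	/* Local vs. other depends only on where the page actually came from. */
	if (zone_nid != cpu_nid)
		local_stat = NUMA_OTHER;

	/* Hit vs. miss depends only on the preferred zone's node. */
	if (zone_nid == preferred_nid) {
		numa_stats[zone_nid][NUMA_HIT]++;
	} else {
		numa_stats[zone_nid][NUMA_MISS]++;
		numa_stats[preferred_nid][NUMA_FOREIGN]++;
	}
	numa_stats[zone_nid][local_stat]++;
}
```

With cpu_nid 0, a request for node 1 served from node 1 is counted as
NUMA_HIT plus NUMA_OTHER, while a request for node 0 that falls back to
node 1 is counted as NUMA_MISS on node 1, NUMA_FOREIGN on node 0 and
NUMA_OTHER on node 1.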

One downside is that an allocation request for a node which is
outside of the mempolicy nodemask would be reported as a hit, which is a
bit weird, but that was already the case before b9f00e147f27.

Reported-by: Jia He <hejianet@gmail.com>
Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz> # with cbmc[1] superpowers
Signed-off-by: Michal Hocko <mhocko@suse.com>

[1] http://paulmck.livejournal.com/38997.html
---
 mm/page_alloc.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f6d5b73e1d7c..e2a44950a685 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2583,30 +2583,23 @@ int __isolate_free_page(struct page *page, unsigned int order)
  * Update NUMA hit/miss statistics
  *
  * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
  */
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 								gfp_t flags)
 {
 #ifdef CONFIG_NUMA
-	int local_nid = numa_node_id();
 	enum zone_stat_item local_stat = NUMA_LOCAL;
 
-	if (unlikely(flags & __GFP_OTHER_NODE)) {
+	if (z->node != numa_node_id())
 		local_stat = NUMA_OTHER;
-		local_nid = preferred_zone->node;
-	}
 
-	if (z->node == local_nid) {
+	if (z->node == preferred_zone->node)
 		__inc_zone_state(z, NUMA_HIT);
-		__inc_zone_state(z, local_stat);
-	} else {
+	else {
 		__inc_zone_state(z, NUMA_MISS);
 		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
 	}
+	__inc_zone_state(z, local_stat);
 #endif
 }
 
-- 
2.11.0


* [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE
From: Michal Hocko @ 2017-01-02 15:30 UTC
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Jia He, Andi Kleen, linux-mm, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The flag was introduced by 78afd5612deb ("mm: add __GFP_OTHER_NODE
flag") to allow proper accounting of remote node allocations done by
kernel daemons on behalf of a process, e.g. khugepaged.

After "mm: fix remote numa hits statistics" we neither need nor actually
use the flag, so we can safely remove it: all allocations which
are satisfied from their "home" node are accounted properly.
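
Dropping the flag frees one bit, so the gfp.h hunk renumbers the two
flags above it and lowers __GFP_BITS_SHIFT from 26 to 25. The arithmetic
can be checked with a small standalone sketch; the MODEL_* names and
bits_needed() are hypothetical and only copy the values from the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the post-patch gfp.h hunk; the names are illustrative. */
#define MODEL_GFP_WRITE			0x800000u
#define MODEL_GFP_KSWAPD_RECLAIM	0x1000000u
#define MODEL_GFP_BITS_SHIFT		25
#define MODEL_GFP_BITS_MASK		((1u << MODEL_GFP_BITS_SHIFT) - 1)

/* The highest remaining flag must still fit inside the mask. */
static_assert((MODEL_GFP_KSWAPD_RECLAIM & ~MODEL_GFP_BITS_MASK) == 0,
	      "highest flag must be covered by the bits mask");

/* Number of bits needed to represent the given flag value. */
static unsigned int bits_needed(uint32_t flag)
{
	unsigned int n = 0;

	while (flag) {
		n++;
		flag >>= 1;
	}
	return n;
}
```

Here bits_needed(0x1000000u) gives the new shift, and the same helper
applied to the old ___GFP_KSWAPD_RECLAIM value 0x2000000u gives the
previous one.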

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/gfp.h            | 13 +++----------
 include/trace/events/mmflags.h |  1 -
 mm/huge_memory.c               |  3 +--
 mm/khugepaged.c                |  5 ++---
 mm/page_alloc.c                |  5 ++---
 tools/perf/builtin-kmem.c      |  1 -
 6 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..7806a8f80abc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -38,9 +38,8 @@ struct vm_area_struct;
 #define ___GFP_ACCOUNT		0x100000u
 #define ___GFP_NOTRACK		0x200000u
 #define ___GFP_DIRECT_RECLAIM	0x400000u
-#define ___GFP_OTHER_NODE	0x800000u
-#define ___GFP_WRITE		0x1000000u
-#define ___GFP_KSWAPD_RECLAIM	0x2000000u
+#define ___GFP_WRITE		0x800000u
+#define ___GFP_KSWAPD_RECLAIM	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -172,11 +171,6 @@ struct vm_area_struct;
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
- *
- * __GFP_OTHER_NODE is for allocations that are on a remote node but that
- *   should not be accounted for as a remote allocation in vmstat. A
- *   typical user would be khugepaged collapsing a huge page on a remote
- *   node.
  */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
@@ -184,10 +178,9 @@ struct vm_area_struct;
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
-#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 25
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5a81ab48a2fb..556a0efa8298 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -48,7 +48,6 @@
 	{(unsigned long)__GFP_RECLAIM,		"__GFP_RECLAIM"},	\
 	{(unsigned long)__GFP_DIRECT_RECLAIM,	"__GFP_DIRECT_RECLAIM"},\
 	{(unsigned long)__GFP_KSWAPD_RECLAIM,	"__GFP_KSWAPD_RECLAIM"},\
-	{(unsigned long)__GFP_OTHER_NODE,	"__GFP_OTHER_NODE"}	\
 
 #define show_gfp_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f3c2040edbb1..8206abf4ac03 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -918,8 +918,7 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
 	}
 
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
-		pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE |
-					       __GFP_OTHER_NODE, vma,
+		pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE, vma,
 					       vmf->address, page_to_nid(page));
 		if (unlikely(!pages[i] ||
 			     mem_cgroup_try_charge(pages[i], vma->vm_mm,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e32389a97030..211974a3992b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -943,7 +943,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_OTHER_NODE | __GFP_THISNODE;
+	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
 	/*
 	 * Before allocating the hugepage, release the mmap_sem read lock.
@@ -1326,8 +1326,7 @@ static void collapse_shmem(struct mm_struct *mm,
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() |
-		__GFP_OTHER_NODE | __GFP_THISNODE;
+	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
 	new_page = khugepaged_alloc_page(hpage, gfp, node);
 	if (!new_page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2a44950a685..ea60dc06d280 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2584,8 +2584,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
  *
  * Must be called with interrupts disabled.
  */
-static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
-								gfp_t flags)
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
 	enum zone_stat_item local_stat = NUMA_LOCAL;
@@ -2667,7 +2666,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 	}
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-	zone_statistics(preferred_zone, zone, gfp_flags);
+	zone_statistics(preferred_zone, zone);
 	local_irq_restore(flags);
 
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index d426dcb18ce9..33b959d47545 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -645,7 +645,6 @@ static const struct {
 	{ "__GFP_RECLAIM",		"R" },
 	{ "__GFP_DIRECT_RECLAIM",	"DR" },
 	{ "__GFP_KSWAPD_RECLAIM",	"KR" },
-	{ "__GFP_OTHER_NODE",		"ON" },
 };
 
 static size_t max_gfp_len;
-- 
2.11.0
