linux-mm.kvack.org archive mirror
* [PATCH v1 0/3] mm/shuffle: fix and cleanups
@ 2020-06-16 11:52 David Hildenbrand
  2020-06-16 11:52 ` [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 11:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Alexander Duyck, Andrew Morton,
	Dan Williams, Dave Hansen, Huang Ying, Johannes Weiner,
	Kees Cook, Keith Busch, Mel Gorman, Michal Hocko, Minchan Kim,
	Wei Yang

Patch #1 is a fix for overlapping zones and offline sections. Patch #2
stops shuffling the complete zone when onlining a memory block. Patch #3
removes dynamic reconfiguration, which is currently dead code (and it is
unclear whether it will be needed in the near future).

David Hildenbrand (3):
  mm/shuffle: don't move pages between zones and don't read garbage
    memmaps
  mm/memory_hotplug: don't shuffle complete zone when onlining memory
  mm/shuffle: remove dynamic reconfiguration

 mm/memory_hotplug.c |  3 ---
 mm/shuffle.c        | 48 ++++++++++++---------------------------------
 mm/shuffle.h        | 29 ---------------------------
 3 files changed, 12 insertions(+), 68 deletions(-)

-- 
2.26.2




* [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-06-16 11:52 [PATCH v1 0/3] mm/shuffle: fix and cleanups David Hildenbrand
@ 2020-06-16 11:52 ` David Hildenbrand
  2020-06-16 12:43   ` Michal Hocko
  2020-06-16 11:52 ` [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory David Hildenbrand
  2020-06-16 11:52 ` [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration David Hildenbrand
  2 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 11:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, stable, Andrew Morton,
	Johannes Weiner, Michal Hocko, Minchan Kim, Huang Ying, Wei Yang,
	Mel Gorman

Especially with memory hotplug, we can have offline sections (with a
garbage memmap) and overlapping zones. We have to make sure to only
touch initialized memmaps (online sections managed by the buddy) and that
the zone matches, to not move pages between zones.

To test if this can actually happen, I added a simple
	BUG_ON(page_zone(page_i) != page_zone(page_j));
right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
onlining the first memory block "online_movable" and the second memory
block "online_kernel", it will trigger the BUG, as both zones (NORMAL
and MOVABLE) overlap.

This might result in all kinds of weird situations (e.g., double
allocations, list corruptions, unmovable allocations ending up in the
movable zone).
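
For illustration, a minimal sketch of where such a debug check could
live in __shuffle_zone() (hypothetical debug-only code, not part of
this patch):

	/*
	 * Hypothetical debug check: fires as soon as the shuffle would
	 * swap two free pages belonging to different zones.
	 */
	BUG_ON(page_zone(page_i) != page_zone(page_j));
	list_swap(&page_i->lru, &page_j->lru);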

Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
Cc: stable@vger.kernel.org # v5.2+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/shuffle.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 44406d9977c77..dd13ab851b3ee 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
  * For two pages to be swapped in the shuffle, they must be free (on a
  * 'free_area' lru), have the same order, and have the same migratetype.
  */
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(struct zone *zone,
+						  unsigned long pfn, int order)
 {
-	struct page *page;
+	struct page *page = pfn_to_online_page(pfn);
 
 	/*
 	 * Given we're dealing with randomly selected pfns in a zone we
 	 * need to ask questions like...
 	 */
 
-	/* ...is the pfn even in the memmap? */
-	if (!pfn_valid_within(pfn))
+	/* ... is the page managed by the buddy? */
+	if (!page)
 		return NULL;
 
-	/* ...is the pfn in a present section or a hole? */
-	if (!pfn_in_present_section(pfn))
+	/* ... is the page assigned to the same zone? */
+	if (page_zone(page) != zone)
 		return NULL;
 
 	/* ...is the page free and currently on a free_area list? */
-	page = pfn_to_page(pfn);
 	if (!PageBuddy(page))
 		return NULL;
 
@@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zone *z)
 		 * page_j randomly selected in the span @zone_start_pfn to
 		 * @spanned_pages.
 		 */
-		page_i = shuffle_valid_page(i, order);
+		page_i = shuffle_valid_page(z, i, order);
 		if (!page_i)
 			continue;
 
@@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zone *z)
 			j = z->zone_start_pfn +
 				ALIGN_DOWN(get_random_long() % z->spanned_pages,
 						order_pages);
-			page_j = shuffle_valid_page(j, order);
+			page_j = shuffle_valid_page(z, j, order);
 			if (page_j && page_j != page_i)
 				break;
 		}
-- 
2.26.2




* [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 11:52 [PATCH v1 0/3] mm/shuffle: fix and cleanups David Hildenbrand
  2020-06-16 11:52 ` [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
@ 2020-06-16 11:52 ` David Hildenbrand
  2020-06-16 12:50   ` Michal Hocko
  2020-06-16 11:52 ` [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration David Hildenbrand
  2 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 11:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Alexander Duyck,
	Dan Williams, Michal Hocko, Dave Hansen, Kees Cook, Mel Gorman

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") introduced shuffling of free pages
during system boot and whenever we online memory blocks.

However, whenever we online memory blocks, all pages that will be
exposed to the buddy end up getting freed via __free_one_page(). In the
general case, we free these pages in MAX_ORDER - 1 chunks, which
corresponds to the shuffle order.

Inside __free_one_page(), we will already shuffle the newly onlined pages
using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
memory hotplug.

Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
x86-64) will get onlined individually, resulting in a shuffle_zone() for
every memory block getting onlined.
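
For context, the run-time shuffling mentioned above boils down to a
per-chunk coin flip. A simplified sketch of shuffle_pick_tail()
(abridged from mm/shuffle.c; details may differ between kernel
versions):

	bool shuffle_pick_tail(void)
	{
		static u64 rand;
		static u8 rand_bits;
		bool ret;

		/* refill the cached entropy 64 bits at a time */
		if (rand_bits == 0) {
			rand_bits = 64;
			rand = get_random_u64();
		}

		ret = rand & 1;
		rand_bits--;
		rand >>= 1;

		return ret;
	}

__free_one_page() uses the returned bool to place a freed chunk at
either the head or the tail of the free list.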

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c |  3 ---
 mm/shuffle.c        |  2 +-
 mm/shuffle.h        | 12 ------------
 3 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9b34e03e730a4..845a517649c71 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -40,7 +40,6 @@
 #include <asm/tlbflush.h>
 
 #include "internal.h"
-#include "shuffle.h"
 
 /*
  * online_page_callback contains pointer to current page onlining function.
@@ -822,8 +821,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
-	shuffle_zone(zone);
-
 	node_states_set_node(nid, &arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
diff --git a/mm/shuffle.c b/mm/shuffle.c
index dd13ab851b3ee..609c26aa57db0 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -180,7 +180,7 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
 	struct zone *z;
 
 	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-		shuffle_zone(z);
+		__shuffle_zone(z);
 }
 
 bool shuffle_pick_tail(void)
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 4d79f03b6658f..657e2b9ec38dd 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -30,14 +30,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
 	__shuffle_free_memory(pgdat);
 }
 
-extern void __shuffle_zone(struct zone *z);
-static inline void shuffle_zone(struct zone *z)
-{
-	if (!static_branch_unlikely(&page_alloc_shuffle_key))
-		return;
-	__shuffle_zone(z);
-}
-
 static inline bool is_shuffle_order(int order)
 {
 	if (!static_branch_unlikely(&page_alloc_shuffle_key))
@@ -54,10 +46,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
 {
 }
 
-static inline void shuffle_zone(struct zone *z)
-{
-}
-
 static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
 {
 }
-- 
2.26.2




* [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration
  2020-06-16 11:52 [PATCH v1 0/3] mm/shuffle: fix and cleanups David Hildenbrand
  2020-06-16 11:52 ` [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
  2020-06-16 11:52 ` [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory David Hildenbrand
@ 2020-06-16 11:52 ` David Hildenbrand
  2020-06-16 12:41   ` Michal Hocko
  2 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 11:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Johannes Weiner,
	Michal Hocko, Minchan Kim, Huang Ying, Wei Yang, Keith Busch,
	Mel Gorman

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") promised "autodetection of a
memory-side-cache (to be added in a follow-on patch)" over a year ago.

The original series included such patches [1]; however, they were
dropped during review [2] to be followed up later.

Let's simplify for now and re-add when really (ever?) needed.

[1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/shuffle.c | 28 ++--------------------------
 mm/shuffle.h | 17 -----------------
 2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 609c26aa57db0..702c0d5cf276c 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -10,33 +10,11 @@
 #include "shuffle.h"
 
 DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-static unsigned long shuffle_state __ro_after_init;
-
-/*
- * Depending on the architecture, module parameter parsing may run
- * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents,
- * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE
- * attempts to turn on the implementation, but aborts if it finds
- * SHUFFLE_FORCE_DISABLE already set.
- */
-__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
-	if (ctl == SHUFFLE_FORCE_DISABLE)
-		set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state);
-
-	if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) {
-		if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state))
-			static_branch_disable(&page_alloc_shuffle_key);
-	} else if (ctl == SHUFFLE_ENABLE
-			&& !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state))
-		static_branch_enable(&page_alloc_shuffle_key);
-}
 
 static bool shuffle_param;
 static int shuffle_show(char *buffer, const struct kernel_param *kp)
 {
-	return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
-			? 'Y' : 'N');
+	return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N');
 }
 
 static __meminit int shuffle_store(const char *val,
@@ -47,9 +25,7 @@ static __meminit int shuffle_store(const char *val,
 	if (rc < 0)
 		return rc;
 	if (shuffle_param)
-		page_alloc_shuffle(SHUFFLE_ENABLE);
-	else
-		page_alloc_shuffle(SHUFFLE_FORCE_DISABLE);
+		static_branch_enable(&page_alloc_shuffle_key);
 	return 0;
 }
 module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 657e2b9ec38dd..5574cbd4611e3 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -4,23 +4,10 @@
 #define _MM_SHUFFLE_H
 #include <linux/jump_label.h>
 
-/*
- * SHUFFLE_ENABLE is called from the command line enabling path, or by
- * platform-firmware enabling that indicates the presence of a
- * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from
- * the command line path and overrides any previous or future
- * SHUFFLE_ENABLE.
- */
-enum mm_shuffle_ctl {
-	SHUFFLE_ENABLE,
-	SHUFFLE_FORCE_DISABLE,
-};
-
 #define SHUFFLE_ORDER (MAX_ORDER-1)
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
 extern bool shuffle_pick_tail(void);
 static inline void shuffle_free_memory(pg_data_t *pgdat)
@@ -46,10 +33,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
 {
 }
 
-static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
-}
-
 static inline bool is_shuffle_order(int order)
 {
 	return false;
-- 
2.26.2




* Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration
  2020-06-16 11:52 ` [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration David Hildenbrand
@ 2020-06-16 12:41   ` Michal Hocko
  2020-06-16 13:45     ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-06-16 12:41 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: linux-kernel, linux-mm, Andrew Morton, Johannes Weiner,
	Minchan Kim, Huang Ying, Wei Yang, Keith Busch, Mel Gorman

[Add Dan]

On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> memory-side-cache utilization") promised "autodetection of a
> memory-side-cache (to be added in a follow-on patch)" over a year ago.
> 
> The original series included such patches [1]; however, they were
> dropped during review [2] to be followed up later.
> 
> Let's simplify for now and re-add when really (ever?) needed.
> 
> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: David Hildenbrand <david@redhat.com>

While I am not against removing this unused code, I am really curious
what the future of the auto detection is. Has this just fallen through
the cracks, or are there more serious problems with making detection
possible/reliable?

> ---
>  mm/shuffle.c | 28 ++--------------------------
>  mm/shuffle.h | 17 -----------------
>  2 files changed, 2 insertions(+), 43 deletions(-)
> 
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 609c26aa57db0..702c0d5cf276c 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -10,33 +10,11 @@
>  #include "shuffle.h"
>  
>  DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
> -static unsigned long shuffle_state __ro_after_init;
> -
> -/*
> - * Depending on the architecture, module parameter parsing may run
> - * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents,
> - * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE
> - * attempts to turn on the implementation, but aborts if it finds
> - * SHUFFLE_FORCE_DISABLE already set.
> - */
> -__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
> -{
> -	if (ctl == SHUFFLE_FORCE_DISABLE)
> -		set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state);
> -
> -	if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) {
> -		if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state))
> -			static_branch_disable(&page_alloc_shuffle_key);
> -	} else if (ctl == SHUFFLE_ENABLE
> -			&& !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state))
> -		static_branch_enable(&page_alloc_shuffle_key);
> -}
>  
>  static bool shuffle_param;
>  static int shuffle_show(char *buffer, const struct kernel_param *kp)
>  {
> -	return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
> -			? 'Y' : 'N');
> +	return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N');
>  }
>  
>  static __meminit int shuffle_store(const char *val,
> @@ -47,9 +25,7 @@ static __meminit int shuffle_store(const char *val,
>  	if (rc < 0)
>  		return rc;
>  	if (shuffle_param)
> -		page_alloc_shuffle(SHUFFLE_ENABLE);
> -	else
> -		page_alloc_shuffle(SHUFFLE_FORCE_DISABLE);
> +		static_branch_enable(&page_alloc_shuffle_key);
>  	return 0;
>  }
>  module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 657e2b9ec38dd..5574cbd4611e3 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -4,23 +4,10 @@
>  #define _MM_SHUFFLE_H
>  #include <linux/jump_label.h>
>  
> -/*
> - * SHUFFLE_ENABLE is called from the command line enabling path, or by
> - * platform-firmware enabling that indicates the presence of a
> - * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from
> - * the command line path and overrides any previous or future
> - * SHUFFLE_ENABLE.
> - */
> -enum mm_shuffle_ctl {
> -	SHUFFLE_ENABLE,
> -	SHUFFLE_FORCE_DISABLE,
> -};
> -
>  #define SHUFFLE_ORDER (MAX_ORDER-1)
>  
>  #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
>  DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
> -extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
>  extern void __shuffle_free_memory(pg_data_t *pgdat);
>  extern bool shuffle_pick_tail(void);
>  static inline void shuffle_free_memory(pg_data_t *pgdat)
> @@ -46,10 +33,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
>  {
>  }
>  
> -static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
> -{
> -}
> -
>  static inline bool is_shuffle_order(int order)
>  {
>  	return false;
> -- 
> 2.26.2

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-06-16 11:52 ` [PATCH v1 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
@ 2020-06-16 12:43   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2020-06-16 12:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, stable, Andrew Morton, Johannes Weiner,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman

On Tue 16-06-20 13:52:11, David Hildenbrand wrote:
> Especially with memory hotplug, we can have offline sections (with a
> garbage memmap) and overlapping zones. We have to make sure to only
> touch initialized memmaps (online sections managed by the buddy) and that
> the zone matches, to not move pages between zones.
> 
> To test if this can actually happen, I added a simple
> 	BUG_ON(page_zone(page_i) != page_zone(page_j));
> right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
> onlining the first memory block "online_movable" and the second memory
> block "online_kernel", it will trigger the BUG, as both zones (NORMAL
> and MOVABLE) overlap.
> 
> This might result in all kinds of weird situations (e.g., double
> allocations, list corruptions, unmovable allocations ending up in the
> movable zone).
> 
> Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
> Cc: stable@vger.kernel.org # v5.2+
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/shuffle.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 44406d9977c77..dd13ab851b3ee 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
>   * For two pages to be swapped in the shuffle, they must be free (on a
>   * 'free_area' lru), have the same order, and have the same migratetype.
>   */
> -static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
> +static struct page * __meminit shuffle_valid_page(struct zone *zone,
> +						  unsigned long pfn, int order)
>  {
> -	struct page *page;
> +	struct page *page = pfn_to_online_page(pfn);
>  
>  	/*
>  	 * Given we're dealing with randomly selected pfns in a zone we
>  	 * need to ask questions like...
>  	 */
>  
> -	/* ...is the pfn even in the memmap? */
> -	if (!pfn_valid_within(pfn))
> +	/* ... is the page managed by the buddy? */
> +	if (!page)
>  		return NULL;
>  
> -	/* ...is the pfn in a present section or a hole? */
> -	if (!pfn_in_present_section(pfn))
> +	/* ... is the page assigned to the same zone? */
> +	if (page_zone(page) != zone)
>  		return NULL;
>  
>  	/* ...is the page free and currently on a free_area list? */
> -	page = pfn_to_page(pfn);
>  	if (!PageBuddy(page))
>  		return NULL;
>  
> @@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zone *z)
>  		 * page_j randomly selected in the span @zone_start_pfn to
>  		 * @spanned_pages.
>  		 */
> -		page_i = shuffle_valid_page(i, order);
> +		page_i = shuffle_valid_page(z, i, order);
>  		if (!page_i)
>  			continue;
>  
> @@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zone *z)
>  			j = z->zone_start_pfn +
>  				ALIGN_DOWN(get_random_long() % z->spanned_pages,
>  						order_pages);
> -			page_j = shuffle_valid_page(j, order);
> +			page_j = shuffle_valid_page(z, j, order);
>  			if (page_j && page_j != page_i)
>  				break;
>  		}
> -- 
> 2.26.2

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 11:52 ` [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory David Hildenbrand
@ 2020-06-16 12:50   ` Michal Hocko
  2020-06-16 17:00     ` Dan Williams
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-06-16 12:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Alexander Duyck,
	Dan Williams, Dave Hansen, Kees Cook, Mel Gorman

On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> memory-side-cache utilization") introduced shuffling of free pages
> during system boot and whenever we online memory blocks.
> 
> However, whenever we online memory blocks, all pages that will be
> exposed to the buddy end up getting freed via __free_one_page(). In the
> general case, we free these pages in MAX_ORDER - 1 chunks, which
> corresponds to the shuffle order.
> 
> Inside __free_one_page(), we will already shuffle the newly onlined pages
> using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> memory hotplug.
> 
> Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
> x86-64) will get onlined individually, resulting in a shuffle_zone() for
> every memory block getting onlined.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memory_hotplug.c |  3 ---
>  mm/shuffle.c        |  2 +-
>  mm/shuffle.h        | 12 ------------
>  3 files changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9b34e03e730a4..845a517649c71 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -40,7 +40,6 @@
>  #include <asm/tlbflush.h>
>  
>  #include "internal.h"
> -#include "shuffle.h"
>  
>  /*
>   * online_page_callback contains pointer to current page onlining function.
> @@ -822,8 +821,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  	zone->zone_pgdat->node_present_pages += onlined_pages;
>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>  
> -	shuffle_zone(zone);
> -
>  	node_states_set_node(nid, &arg);
>  	if (need_zonelists_rebuild)
>  		build_all_zonelists(NULL);
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index dd13ab851b3ee..609c26aa57db0 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -180,7 +180,7 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  	struct zone *z;
>  
>  	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
> -		shuffle_zone(z);
> +		__shuffle_zone(z);
>  }
>  
>  bool shuffle_pick_tail(void)
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 4d79f03b6658f..657e2b9ec38dd 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -30,14 +30,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
>  	__shuffle_free_memory(pgdat);
>  }
>  
> -extern void __shuffle_zone(struct zone *z);
> -static inline void shuffle_zone(struct zone *z)
> -{
> -	if (!static_branch_unlikely(&page_alloc_shuffle_key))
> -		return;
> -	__shuffle_zone(z);
> -}
> -
>  static inline bool is_shuffle_order(int order)
>  {
>  	if (!static_branch_unlikely(&page_alloc_shuffle_key))
> @@ -54,10 +46,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
>  {
>  }
>  
> -static inline void shuffle_zone(struct zone *z)
> -{
> -}
> -
>  static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
>  {
>  }
> -- 
> 2.26.2

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration
  2020-06-16 12:41   ` Michal Hocko
@ 2020-06-16 13:45     ` David Hildenbrand
  2020-06-16 16:59       ` Dan Williams
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 13:45 UTC (permalink / raw)
  To: Michal Hocko, Dan Williams
  Cc: linux-kernel, linux-mm, Andrew Morton, Johannes Weiner,
	Minchan Kim, Huang Ying, Wei Yang, Keith Busch, Mel Gorman

On 16.06.20 14:41, Michal Hocko wrote:
> [Add Dan]

Whoops, dropped by mistake. Thanks for adding.

> 
> On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
>> Commit e900a918b098 ("mm: shuffle initial free memory to improve
>> memory-side-cache utilization") promised "autodetection of a
>> memory-side-cache (to be added in a follow-on patch)" over a year ago.
>>
>> The original series included such patches [1]; however, they were
>> dropped during review [2] to be followed up later.
>>
>> Let's simplify for now and re-add when really (ever?) needed.
>>
>> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
>> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Keith Busch <keith.busch@intel.com>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> While I am not against removing this unused code, I am really curious
> what the future of the auto detection is. Has this just fallen through
> the cracks, or are there more serious problems with making detection
> possible/reliable?

From the bouncing mails I assume Keith - the author of the original
patches in [1] - is no longer working at Intel (or I messed up :); the
mails bounce with "#5.1.0 Address rejected"). Maybe Dan can clarify what
the future of this is.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration
  2020-06-16 13:45     ` David Hildenbrand
@ 2020-06-16 16:59       ` Dan Williams
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Williams @ 2020-06-16 16:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, Linux Kernel Mailing List, Linux MM, Andrew Morton,
	Johannes Weiner, Minchan Kim, Huang Ying, Wei Yang, Keith Busch,
	Mel Gorman

On Tue, Jun 16, 2020 at 6:45 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.06.20 14:41, Michal Hocko wrote:
> > [Add Dan]
>
> Whoops, dropped by mistake. Thanks for adding.
>
> >
> > On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
> >> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> >> memory-side-cache utilization") promised "autodetection of a
> >> memory-side-cache (to be added in a follow-on patch)" over a year ago.
> >>
> >> The original series included such patches [1]; however, they were
> >> dropped during review [2] to be followed up later.
> >>
> >> Let's simplify for now and re-add when really (ever?) needed.
> >>
> >> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
> >> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
> >>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Minchan Kim <minchan@kernel.org>
> >> Cc: Huang Ying <ying.huang@intel.com>
> >> Cc: Wei Yang <richard.weiyang@gmail.com>
> >> Cc: Keith Busch <keith.busch@intel.com>
> >> Cc: Mel Gorman <mgorman@techsingularity.net>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > While I am not against removing this unused code, I am really curious
> > what the future of the auto detection is. Has this just fallen through
> > the cracks, or are there more serious problems with making detection
> > possible/reliable?
>
> From the bouncing mails I assume Keith - the author of the original
> patches in [1] - is no longer working at Intel (or I messed up :); the
> mails bounce with "#5.1.0 Address rejected"). Maybe Dan can clarify what
> the future of this is.

People are actively using this via the command line option. In fact it
has saved some people who have deployed platforms with mismatched
DIMM populations where no HMAT autodetection would be possible. It's
also been useful for mitigating other memory interleaving performance
problems that are sensitive to a given address line.

The reason this is still manual is the dearth of platforms that
publish an HMAT, but again the command line option is actively
mitigating performance issues or otherwise allowing for testing a
"pre-aged" memory allocator free list.



* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 12:50   ` Michal Hocko
@ 2020-06-16 17:00     ` Dan Williams
  2020-06-16 17:03       ` Dan Williams
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Williams @ 2020-06-16 17:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM,
	Andrew Morton, Alexander Duyck, Dave Hansen, Kees Cook,
	Mel Gorman

On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > memory-side-cache utilization") introduced shuffling of free pages
> > during system boot and whenever we online memory blocks.
> >
> > However, whenever we online memory blocks, all pages that will be
> > exposed to the buddy end up getting freed via __free_one_page(). In the
> > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > corresponds to the shuffle order.
> >
> > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > memory hotplug.
> >
> > Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
> > x86-64) will get onlined individually, resulting in a shuffle_zone() for
> > every memory block getting onlined.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Nacked-by: Dan Williams <dan.j.williams@intel.com>

As explained in another thread this is in active use.



* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 17:00     ` Dan Williams
@ 2020-06-16 17:03       ` Dan Williams
  2020-06-16 18:24         ` David Hildenbrand
  2020-06-17  6:48         ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Dan Williams @ 2020-06-16 17:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM,
	Andrew Morton, Alexander Duyck, Dave Hansen, Kees Cook,
	Mel Gorman

On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > memory-side-cache utilization") introduced shuffling of free pages
> > > during system boot and whenever we online memory blocks.
> > >
> > > However, whenever we online memory blocks, all pages that will be
> > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > corresponds to the shuffle order.
> > >
> > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > memory hotplug.

This was already explained in the initial patch submission. The
shuffle_pick_tail() shuffling at run time is only sufficient for
maintaining the shuffle. It's not sufficient for effectively
randomizing the free list.

See:

e900a918b098 mm: shuffle initial free memory to improve
memory-side-cache utilization

    This initial randomization can be undone over time so a follow-on patch is
    introduced to inject entropy on page free decisions.  It is reasonable to
    ask if the page free entropy is sufficient, but it is not enough due to
    the in-order initial freeing of pages.  At the start of that process
    putting page1 in front or behind page0 still keeps them close together,
    page2 is still near page1 and has a high chance of being adjacent.  As
    more pages are added ordering diversity improves, but there is still high
    page locality for the low address pages and this leads to no significant
    impact to the cache conflict rate.
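
To make the argument concrete, here is a toy user-space program (an
illustration, not kernel code) that "frees" pfns 0..N-1 in order,
inserting each at the head or the tail of a list with a fair coin flip,
and then counts how many list neighbors are still physically adjacent:

	#include <stdio.h>
	#include <stdlib.h>

	#define N 1024

	int main(void)
	{
		int buf[2 * N], pfn, i, adjacent = 0;
		int lo = N, hi = N;

		srand(1);

		/* free pfns 0..N-1 in order; a coin flip picks head or tail */
		for (pfn = 0; pfn < N; pfn++) {
			if (rand() & 1)
				buf[--lo] = pfn;	/* head insertion */
			else
				buf[hi++] = pfn;	/* tail insertion */
		}

		/* count list neighbors that are still physically adjacent */
		for (i = lo; i < hi - 1; i++)
			if (abs(buf[i] - buf[i + 1]) == 1)
				adjacent++;

		printf("%d of %d neighbors still adjacent\n", adjacent, N - 1);
		return 0;
	}

A run typically reports roughly half of the neighbors still adjacent; a
full Fisher-Yates shuffle of the same list would leave only a couple of
adjacent pairs on average.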



* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 17:03       ` Dan Williams
@ 2020-06-16 18:24         ` David Hildenbrand
  2020-06-17  6:48         ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-06-16 18:24 UTC (permalink / raw)
  To: Dan Williams, Michal Hocko
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	Alexander Duyck, Dave Hansen, Kees Cook, Mel Gorman

On 16.06.20 19:03, Dan Williams wrote:
> On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>> On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
>>>> Commit e900a918b098 ("mm: shuffle initial free memory to improve
>>>> memory-side-cache utilization") introduced shuffling of free pages
>>>> during system boot and whenever we online memory blocks.
>>>>
>>>> However, whenever we online memory blocks, all pages that will be
>>>> exposed to the buddy end up getting freed via __free_one_page(). In the
>>>> general case, we free these pages in MAX_ORDER - 1 chunks, which
>>>> corresponds to the shuffle order.
>>>>
>>>> Inside __free_one_page(), we will already shuffle the newly onlined pages
>>>> using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
>>>> memory hotplug.
> 
> This was already explained in the initial patch submission. The
> shuffle_pick_tail() shuffling at run time is only sufficient for
> maintaining the shuffle. It's not sufficient for effectively
> randomizing the free list.

Thanks for pointing that out. Something like that should definitely
belong in the code in the form of a comment. I'll think of something.

Two things:

1. The "shuffle when onlining" is really sub-optimal. Assume you hotplug
a 64GB DIMM. With 128MB memory blocks you get 512 zone shuffles. I guess
something better would have been to schedule a reshuffle some ms in the
future.

2. You'll run into the same issue whenever you free a bigger,
consecutive range, like free_contig_range()'ing one or more gigantic
pages. We might want to reshuffle (or schedule a reshuffle) here as
well.
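
A hypothetical sketch of what (1) could look like, coalescing the
per-block shuffles with delayed work (the shuffle_work member of struct
zone is made up for illustration, it does not exist today):

	static void shuffle_zone_workfn(struct work_struct *work)
	{
		struct zone *zone = container_of(to_delayed_work(work),
						 struct zone, shuffle_work);

		__shuffle_zone(zone);
	}

	/* in online_pages(), instead of an immediate shuffle_zone(zone) */
	mod_delayed_work(system_unbound_wq, &zone->shuffle_work,
			 msecs_to_jiffies(100));

mod_delayed_work() pushes the deadline out on every onlined block, so
onlining a 64GB DIMM would result in a single deferred shuffle instead
of 512.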

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-16 17:03       ` Dan Williams
  2020-06-16 18:24         ` David Hildenbrand
@ 2020-06-17  6:48         ` Michal Hocko
  2020-06-17 18:13           ` Dan Williams
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-06-17  6:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM,
	Andrew Morton, Alexander Duyck, Dave Hansen, Kees Cook,
	Mel Gorman

On Tue 16-06-20 10:03:31, Dan Williams wrote:
> On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > > memory-side-cache utilization") introduced shuffling of free pages
> > > > during system boot and whenever we online memory blocks.
> > > >
> > > > However, whenever we online memory blocks, all pages that will be
> > > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > > corresponds to the shuffle order.
> > > >
> > > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > > memory hotplug.
> 
> This was already explained in the initial patch submission. The
> shuffle_pick_tail() shuffling at run time is only sufficient for
> maintaining the shuffle. It's not sufficient for effectively
> randomizing the free list.

Yes, the "randomness" of the added memory will be lower. But is this
observable for hotplug scenarios? Is memory hotplug for the normal
memory really a thing in setups which use RAM as a cache?

While I do agree that, code-wise, the shuffling per online operation
doesn't really have any overhead, it would be really great to know
whether it matters at all.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory
  2020-06-17  6:48         ` Michal Hocko
@ 2020-06-17 18:13           ` Dan Williams
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Williams @ 2020-06-17 18:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, Linux Kernel Mailing List, Linux MM,
	Andrew Morton, Alexander Duyck, Dave Hansen, Kees Cook,
	Mel Gorman

On Tue, Jun 16, 2020 at 11:48 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 16-06-20 10:03:31, Dan Williams wrote:
> > On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > > > memory-side-cache utilization") introduced shuffling of free pages
> > > > > during system boot and whenever we online memory blocks.
> > > > >
> > > > > However, whenever we online memory blocks, all pages that will be
> > > > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > > > corresponds to the shuffle order.
> > > > >
> > > > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > > > memory hotplug.
> >
> > This was already explained in the initial patch submission. The
> > shuffle_pick_tail() shuffling at run time is only sufficient for
> > maintaining the shuffle. It's not sufficient for effectively
> > randomizing the free list.
>
> Yes, the "randomness" of the added memory will be lower. But is this
> observable for hotplug scenarios?

I'm not sure of the intersection of platforms using memory hotplug and
shuffling in production.

> Is memory hotplug for the normal
> memory really a thing in setups which use RAM as a cache?

I would point out again though that the utility of shuffling goes
beyond RAM-as-cache. I have seen some cost-sensitive customer platform
configurations that asymmetrically populate memory controllers. Think
1 DIMM on controller0 and 2 DIMMs on controller1. In that case Linux
can get into pathological situations where an application is bandwidth
limited because it only accesses the single-DIMM backed memory range.
Shuffling balances accesses across all available memory memory
controllers restoring full memory bandwidth for that configuration. So
shuffling is used to solve problems that are otherwise invisible to
Linux, there's no indication from the platform that one memory range
has lower bandwidth than another.

> While I do agree that the code wise the shuffling per online operation
> doesn't really have any overhead really but it would be really great to
> know whether it matters at all.

I agree this is a good test case, especially considering the
"dax_kmem" solution where memory might be reserved from the initial
shuffling and onlined later. I assume there's a cross-over point where
not shuffling hotplugged memory starts to be noticeable. I just don't
have those numbers handy.


