* [PATCH v3 0/3] mm/shuffle: fix and cleanups
@ 2020-06-24  9:47 David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-06-24  9:47 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Alexander Duyck, Andrew Morton,
	Dan Williams, Huang Ying, Johannes Weiner, Mel Gorman,
	Michal Hocko, Minchan Kim, Wei Yang, Wei Yang

Patch #1 is a fix for overlapping zones and offline sections. Patch #2
documents why we have to shuffle the zone when onlining memory during
memory hotplug. Patch #3 removes dynamic reconfiguration, which is
currently dead code.

v2 -> v3:
- "mm/memory_hotplug: document why shuffle_zone() is relevant"
-- Fix spelling, reference introducing commit
- Added ACKs/RB's

v1 -> v2:
- Replace "mm/memory_hotplug: don't shuffle complete zone when onlining
  memory" by "mm/memory_hotplug: document why shuffle_zone() is relevant"
- "mm/shuffle: remove dynamic reconfiguration"
-- Add details why autodetection is not implemented

David Hildenbrand (3):
  mm/shuffle: don't move pages between zones and don't read garbage
    memmaps
  mm/memory_hotplug: document why shuffle_zone() is relevant
  mm/shuffle: remove dynamic reconfiguration

 mm/memory_hotplug.c |  8 ++++++++
 mm/shuffle.c        | 46 +++++++++++----------------------------------
 mm/shuffle.h        | 17 -----------------
 3 files changed, 19 insertions(+), 52 deletions(-)

-- 
2.26.2




* [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-06-24  9:47 [PATCH v3 0/3] mm/shuffle: fix and cleanups David Hildenbrand
@ 2020-06-24  9:47 ` David Hildenbrand
  2020-07-01 19:33   ` Sasha Levin
  2020-07-10 14:03   ` Sasha Levin
  2020-06-24  9:47 ` [PATCH v3 2/3] mm/memory_hotplug: document why shuffle_zone() is relevant David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 3/3] mm/shuffle: remove dynamic reconfiguration David Hildenbrand
  2 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-06-24  9:47 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Wei Yang, Michal Hocko,
	Dan Williams, stable, Andrew Morton, Johannes Weiner,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman

Especially with memory hotplug, we can have offline sections (with a
garbage memmap) and overlapping zones. We have to make sure to only
touch initialized memmaps (online sections managed by the buddy) and to
check that the zone matches, so that we don't move pages between zones.

To test if this can actually happen, I added a simple
	BUG_ON(page_zone(page_i) != page_zone(page_j));
right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
onlining the first memory block "online_movable" and the second memory
block "online_kernel", it will trigger the BUG, as both zones (NORMAL
and MOVABLE) overlap.

This might result in all kinds of weird situations (e.g., double
allocations, list corruptions, unmovable allocations ending up in the
movable zone).
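
The order of the checks can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: every toy_* name is
hypothetical and stands in for the kernel helper named in its comment,
the memmap is a plain array, and the order and migratetype checks are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ZONE_NORMAL and ZONE_MOVABLE. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

/* Toy memmap entry; the real struct page looks nothing like this. */
struct toy_page {
	bool online;		/* section is online, memmap is initialized */
	bool buddy;		/* page is free and on a free_area list */
	enum toy_zone zone;	/* zone the page was assigned to */
};

/* Models pfn_to_online_page(): NULL for out-of-range or offline pfns. */
static struct toy_page *toy_pfn_to_online_page(struct toy_page *memmap,
					       size_t nr_pages, size_t pfn)
{
	assert(memmap != NULL);
	if (pfn >= nr_pages || !memmap[pfn].online)
		return NULL;
	return &memmap[pfn];
}

/* Models the fixed shuffle_valid_page(): online, same zone, then free. */
static struct toy_page *toy_shuffle_valid_page(struct toy_page *memmap,
					       size_t nr_pages,
					       enum toy_zone zone, size_t pfn)
{
	struct toy_page *page = toy_pfn_to_online_page(memmap, nr_pages, pfn);

	/* Never look at the fields of an offline (garbage) memmap entry. */
	if (!page)
		return NULL;

	/* A pfn inside the zone span may belong to an overlapping zone. */
	if (page->zone != zone)
		return NULL;

	/* Only free pages on a freelist may be swapped. */
	if (!page->buddy)
		return NULL;

	return page;
}
```

In this model, a pfn that lies within the span of the NORMAL zone but was
onlined to MOVABLE is rejected by the second check, which is the case the
BUG_ON() above caught.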

Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org # v5.2+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/shuffle.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 44406d9977c77..dd13ab851b3ee 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
  * For two pages to be swapped in the shuffle, they must be free (on a
  * 'free_area' lru), have the same order, and have the same migratetype.
  */
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(struct zone *zone,
+						  unsigned long pfn, int order)
 {
-	struct page *page;
+	struct page *page = pfn_to_online_page(pfn);
 
 	/*
 	 * Given we're dealing with randomly selected pfns in a zone we
 	 * need to ask questions like...
 	 */
 
-	/* ...is the pfn even in the memmap? */
-	if (!pfn_valid_within(pfn))
+	/* ... is the page managed by the buddy? */
+	if (!page)
 		return NULL;
 
-	/* ...is the pfn in a present section or a hole? */
-	if (!pfn_in_present_section(pfn))
+	/* ... is the page assigned to the same zone? */
+	if (page_zone(page) != zone)
 		return NULL;
 
 	/* ...is the page free and currently on a free_area list? */
-	page = pfn_to_page(pfn);
 	if (!PageBuddy(page))
 		return NULL;
 
@@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zone *z)
 		 * page_j randomly selected in the span @zone_start_pfn to
 		 * @spanned_pages.
 		 */
-		page_i = shuffle_valid_page(i, order);
+		page_i = shuffle_valid_page(z, i, order);
 		if (!page_i)
 			continue;
 
@@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zone *z)
 			j = z->zone_start_pfn +
 				ALIGN_DOWN(get_random_long() % z->spanned_pages,
 						order_pages);
-			page_j = shuffle_valid_page(j, order);
+			page_j = shuffle_valid_page(z, j, order);
 			if (page_j && page_j != page_i)
 				break;
 		}
-- 
2.26.2




* [PATCH v3 2/3] mm/memory_hotplug: document why shuffle_zone() is relevant
  2020-06-24  9:47 [PATCH v3 0/3] mm/shuffle: fix and cleanups David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
@ 2020-06-24  9:47 ` David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 3/3] mm/shuffle: remove dynamic reconfiguration David Hildenbrand
  2 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-06-24  9:47 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Dan Williams, Michal Hocko,
	Andrew Morton, Alexander Duyck

It's not completely obvious why we have to shuffle the complete zone -
introduced in commit e900a918b098 ("mm: shuffle initial free memory to
improve memory-side-cache utilization") - because some sort of shuffling is
already performed when onlining pages via __free_one_page(), which places
MAX_ORDER-1 pages at either the head or the tail of the freelist. Let's
document why we have to shuffle the complete zone when exposing larger,
contiguous physical memory areas to the buddy.
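
A toy userspace model may help to see why head-or-tail placement alone
cannot create the initial shuffle. It is a sketch, not kernel code: the
toy_* names are hypothetical, the freelist is an array-backed deque of
page ids, and rand() stands in for the random head/tail decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_PAGES 64

/* Toy freelist: a deque of page ids that can grow at both ends. */
struct toy_freelist {
	int slot[2 * TOY_MAX_PAGES];
	size_t head;	/* index of the first entry */
	size_t tail;	/* one past the last entry */
};

static void toy_init(struct toy_freelist *fl)
{
	fl->head = fl->tail = TOY_MAX_PAGES;
}

/* Add one page id at the head or at the tail of the list. */
static void toy_add(struct toy_freelist *fl, int id, bool to_tail)
{
	assert(fl->tail - fl->head < TOY_MAX_PAGES);
	if (to_tail)
		fl->slot[fl->tail++] = id;
	else
		fl->slot[--fl->head] = id;
}

/* "Online" nr pages; each one goes to the head or the tail at random. */
static void toy_online(struct toy_freelist *fl, int first_id, int nr)
{
	for (int i = 0; i < nr; i++)
		toy_add(fl, first_id + i, (rand() & 1) != 0);
}

/* True if all entries with id < boundary form one unbroken run. */
static bool toy_old_pages_contiguous(const struct toy_freelist *fl,
				     int boundary)
{
	int runs = 0;
	bool in_run = false;

	for (size_t i = fl->head; i < fl->tail; i++) {
		bool old = fl->slot[i] < boundary;

		if (old && !in_run)
			runs++;
		in_run = old;
	}
	return runs <= 1;
}
```

Whatever sequence rand() produces, the pages that were on the list before
toy_online() stay in one unbroken run in the middle and the new pages pile
up at both ends. In this model the new pages are never interleaved with
the old ones, which is why a full shuffle of the zone is still needed.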

Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index be3c62e3fb95c..ac6961abaa103 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -831,6 +831,14 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
+	/*
+	 * When exposing larger, physically contiguous memory areas to the
+	 * buddy, shuffling in the buddy (when freeing onlined pages, putting
+	 * them either to the head or the tail of the freelist) is only helpful
+	 * for maintaining the shuffle, but not for creating the initial
+	 * shuffle. Shuffle the whole zone to make sure the just onlined pages
+	 * are properly distributed across the whole freelist.
+	 */
 	shuffle_zone(zone);
 
 	node_states_set_node(nid, &arg);
-- 
2.26.2




* [PATCH v3 3/3] mm/shuffle: remove dynamic reconfiguration
  2020-06-24  9:47 [PATCH v3 0/3] mm/shuffle: fix and cleanups David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
  2020-06-24  9:47 ` [PATCH v3 2/3] mm/memory_hotplug: document why shuffle_zone() is relevant David Hildenbrand
@ 2020-06-24  9:47 ` David Hildenbrand
  2 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-06-24  9:47 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Dan Williams, Michal Hocko,
	Wei Yang, Andrew Morton, Johannes Weiner, Minchan Kim,
	Huang Ying, Wei Yang, Mel Gorman

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") promised "autodetection of a
memory-side-cache (to be added in a follow-on patch)" over a year ago.

The original series included such patches [1]; however, they were dropped
during review [2], to be followed up later.

Due to lack of platforms that publish an HMAT, autodetection is currently
not implemented. However, manual activation is actively used [3]. Let's
simplify for now and re-add when really (ever?) needed.

[1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com
[2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com
[3] https://lkml.kernel.org/r/CAPcyv4irwGUU2x+c6b4L=KbB1dnasNKaaZd6oSpYjL9kfsnROQ@mail.gmail.com

Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/shuffle.c | 28 ++--------------------------
 mm/shuffle.h | 17 -----------------
 2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index dd13ab851b3ee..9b5cd4b004b0f 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -10,33 +10,11 @@
 #include "shuffle.h"
 
 DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-static unsigned long shuffle_state __ro_after_init;
-
-/*
- * Depending on the architecture, module parameter parsing may run
- * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents,
- * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE
- * attempts to turn on the implementation, but aborts if it finds
- * SHUFFLE_FORCE_DISABLE already set.
- */
-__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
-	if (ctl == SHUFFLE_FORCE_DISABLE)
-		set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state);
-
-	if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) {
-		if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state))
-			static_branch_disable(&page_alloc_shuffle_key);
-	} else if (ctl == SHUFFLE_ENABLE
-			&& !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state))
-		static_branch_enable(&page_alloc_shuffle_key);
-}
 
 static bool shuffle_param;
 static int shuffle_show(char *buffer, const struct kernel_param *kp)
 {
-	return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
-			? 'Y' : 'N');
+	return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N');
 }
 
 static __meminit int shuffle_store(const char *val,
@@ -47,9 +25,7 @@ static __meminit int shuffle_store(const char *val,
 	if (rc < 0)
 		return rc;
 	if (shuffle_param)
-		page_alloc_shuffle(SHUFFLE_ENABLE);
-	else
-		page_alloc_shuffle(SHUFFLE_FORCE_DISABLE);
+		static_branch_enable(&page_alloc_shuffle_key);
 	return 0;
 }
 module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 4d79f03b6658f..71b784f0b7c3e 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -4,23 +4,10 @@
 #define _MM_SHUFFLE_H
 #include <linux/jump_label.h>
 
-/*
- * SHUFFLE_ENABLE is called from the command line enabling path, or by
- * platform-firmware enabling that indicates the presence of a
- * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from
- * the command line path and overrides any previous or future
- * SHUFFLE_ENABLE.
- */
-enum mm_shuffle_ctl {
-	SHUFFLE_ENABLE,
-	SHUFFLE_FORCE_DISABLE,
-};
-
 #define SHUFFLE_ORDER (MAX_ORDER-1)
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
 extern bool shuffle_pick_tail(void);
 static inline void shuffle_free_memory(pg_data_t *pgdat)
@@ -58,10 +45,6 @@ static inline void shuffle_zone(struct zone *z)
 {
 }
 
-static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
-}
-
 static inline bool is_shuffle_order(int order)
 {
 	return false;
-- 
2.26.2




* Re: [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-06-24  9:47 ` [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
@ 2020-07-01 19:33   ` Sasha Levin
  2020-07-02  7:24     ` David Hildenbrand
  2020-07-06  8:11     ` David Hildenbrand
  2020-07-10 14:03   ` Sasha Levin
  1 sibling, 2 replies; 8+ messages in thread
From: Sasha Levin @ 2020-07-01 19:33 UTC
  To: Sasha Levin, David Hildenbrand, linux-kernel
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman, stable

Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization").

The bot has tested the following trees: v5.7.6, v5.4.49.

v5.7.6: Build OK!
v5.4.49: Failed to apply! Possible dependencies:
    e03d1f78341e8 ("mm/sparse: rename pfn_present() to pfn_in_present_section()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha



* Re: [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-07-01 19:33   ` Sasha Levin
@ 2020-07-02  7:24     ` David Hildenbrand
  2020-07-06  8:11     ` David Hildenbrand
  1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-07-02  7:24 UTC
  To: Sasha Levin, linux-kernel
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman, stable

On 01.07.20 21:33, Sasha Levin wrote:
> Hi
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag
> fixing commit: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization").
> 
> The bot has tested the following trees: v5.7.6, v5.4.49.
> 
> v5.7.6: Build OK!
> v5.4.49: Failed to apply! Possible dependencies:
>     e03d1f78341e8 ("mm/sparse: rename pfn_present() to pfn_in_present_section()")
> 
> 
> NOTE: The patch will not be queued to stable trees until it is upstream.
> 
> How should we proceed with this patch?
> 

Well, it contains "Cc: stable@vger.kernel.org # v5.2+" so yes, please queue.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-07-01 19:33   ` Sasha Levin
  2020-07-02  7:24     ` David Hildenbrand
@ 2020-07-06  8:11     ` David Hildenbrand
  1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2020-07-06  8:11 UTC
  To: Sasha Levin, linux-kernel
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman, stable

On 01.07.20 21:33, Sasha Levin wrote:
> Hi
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag
> fixing commit: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization").
> 
> The bot has tested the following trees: v5.7.6, v5.4.49.
> 
> v5.7.6: Build OK!
> v5.4.49: Failed to apply! Possible dependencies:
>     e03d1f78341e8 ("mm/sparse: rename pfn_present() to pfn_in_present_section()")
> 
> 
> NOTE: The patch will not be queued to stable trees until it is upstream.
> 
> How should we proceed with this patch?
> 

It contains
	Cc: <stable@vger.kernel.org>	[5.2+]
so a stable backport is desired once upstream. The v5.4.49 backport
should be fairly easy.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps
  2020-06-24  9:47 ` [PATCH v3 1/3] mm/shuffle: don't move pages between zones and don't read garbage memmaps David Hildenbrand
  2020-07-01 19:33   ` Sasha Levin
@ 2020-07-10 14:03   ` Sasha Levin
  1 sibling, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2020-07-10 14:03 UTC
  To: Sasha Levin, David Hildenbrand, linux-kernel
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Michal Hocko,
	Minchan Kim, Huang Ying, Wei Yang, Mel Gorman, stable

Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization").

The bot has tested the following trees: v5.7.6, v5.4.49.

v5.7.6: Build OK!
v5.4.49: Failed to apply! Possible dependencies:
    e03d1f78341e8 ("mm/sparse: rename pfn_present() to pfn_in_present_section()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


