linux-mm.kvack.org archive mirror
* [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline}
       [not found] <20210408121804.10440-1-osalvador@suse.de>
@ 2021-04-08 12:17 ` Oscar Salvador
  2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador

This is a preparatory patch that introduces two new functions:
memory_block_online() and memory_block_offline().

For now, these functions will only call online_pages() and offline_pages()
respectively, but later they will be in charge of preparing the vmemmap
pages, carrying out their initialization, and properly accounting for
such pages.

Since the memory_block structure contains all the required information,
pass this structure down the call chain to the functions at the end of it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f35298425575..f209925a5d4e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -169,30 +169,41 @@ int memory_notify(unsigned long val, void *v)
 	return blocking_notifier_call_chain(&memory_chain, val, v);
 }
 
+static int memory_block_online(struct memory_block *mem)
+{
+	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+
+	return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
+}
+
+static int memory_block_offline(struct memory_block *mem)
+{
+	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+
+	return offline_pages(start_pfn, nr_pages);
+}
+
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long start_section_nr, unsigned long action,
-		    int online_type, int nid)
+memory_block_action(struct memory_block *mem, unsigned long action)
 {
-	unsigned long start_pfn;
-	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	int ret;
 
-	start_pfn = section_nr_to_pfn(start_section_nr);
-
 	switch (action) {
 	case MEM_ONLINE:
-		ret = online_pages(start_pfn, nr_pages, online_type, nid);
+		ret = memory_block_online(mem);
 		break;
 	case MEM_OFFLINE:
-		ret = offline_pages(start_pfn, nr_pages);
+		ret = memory_block_offline(mem);
 		break;
 	default:
 		WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
-		     "%ld\n", __func__, start_section_nr, action, action);
+		     "%ld\n", __func__, mem->start_section_nr, action, action);
 		ret = -EINVAL;
 	}
 
@@ -210,9 +221,7 @@ static int memory_block_change_state(struct memory_block *mem,
 	if (to_state == MEM_OFFLINE)
 		mem->state = MEM_GOING_OFFLINE;
 
-	ret = memory_block_action(mem->start_section_nr, to_state,
-				  mem->online_type, mem->nid);
-
+	ret = memory_block_action(mem, to_state);
 	mem->state = ret ? from_state_req : to_state;
 
 	return ret;
-- 
2.16.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check
       [not found] <20210408121804.10440-1-osalvador@suse.de>
  2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
@ 2021-04-08 12:17 ` Oscar Salvador
  2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador

When using self-hosted vmemmap pages, the range of pages passed to
{online,offline}_pages might not fully span sections, but it always
fully spans pageblocks.
Relax the check to account for that case.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0cdbbfbc5757..25e59d5dc13c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -838,9 +838,14 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	int ret;
 	struct memory_notify arg;
 
-	/* We can only online full sections (e.g., SECTION_IS_ONLINE) */
+	/* We can only online full sections (e.g., SECTION_IS_ONLINE).
+	 * However, when using e.g. memmap_on_memory, some pages are initialized
+	 * prior to calling in here. The remaining number of pages must be
+	 * pageblock aligned.
+	 */
 	if (WARN_ON_ONCE(!nr_pages ||
-			 !IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION)))
+			 !IS_ALIGNED(pfn, pageblock_nr_pages) ||
+			 !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
 	mem_hotplug_begin();
@@ -1573,9 +1578,14 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	int ret, node;
 	char *reason;
 
-	/* We can only offline full sections (e.g., SECTION_IS_ONLINE) */
+	/* We can only offline full sections (e.g., SECTION_IS_ONLINE).
+	 * However, when using e.g. memmap_on_memory, some pages are initialized
+	 * prior to calling in here. The remaining number of pages must be
+	 * pageblock aligned.
+	 */
 	if (WARN_ON_ONCE(!nr_pages ||
-			 !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
+			 !IS_ALIGNED(start_pfn, pageblock_nr_pages) ||
+			 !IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
 		return -EINVAL;
 
 	mem_hotplug_begin();
-- 
2.16.3




* [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count()
       [not found] <20210408121804.10440-1-osalvador@suse.de>
  2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
  2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
@ 2021-04-08 12:17 ` Oscar Salvador
  2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador

From: David Hildenbrand <david@redhat.com>

Let's have a single place (inspired by adjust_managed_page_count()) where
we adjust present pages.
In contrast to adjust_managed_page_count(), only memory onlining/offlining
is allowed to modify the number of present pages.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory_hotplug.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 25e59d5dc13c..d05056b3c173 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -829,6 +829,16 @@ struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 	return default_zone_for_pfn(nid, start_pfn, nr_pages);
 }
 
+static void adjust_present_page_count(struct zone *zone, long nr_pages)
+{
+	unsigned long flags;
+
+	zone->present_pages += nr_pages;
+	pgdat_resize_lock(zone->zone_pgdat, &flags);
+	zone->zone_pgdat->node_present_pages += nr_pages;
+	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+}
+
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		       int online_type, int nid)
 {
@@ -882,11 +892,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	}
 
 	online_pages_range(pfn, nr_pages);
-	zone->present_pages += nr_pages;
-
-	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages += nr_pages;
-	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+	adjust_present_page_count(zone, nr_pages);
 
 	node_states_set_node(nid, &arg);
 	if (need_zonelists_rebuild)
@@ -1701,11 +1707,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
-	zone->present_pages -= nr_pages;
-
-	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages -= nr_pages;
-	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+	adjust_present_page_count(zone, -nr_pages);
 
 	init_per_zone_wmark_min();
 
-- 
2.16.3




* [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported
       [not found] <20210408121804.10440-1-osalvador@suse.de>
                   ` (2 preceding siblings ...)
  2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
  2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador, Michal Hocko

Let the caller check whether it can pass MHP_MEMMAP_ON_MEMORY by
calling mhp_supports_memmap_on_memory().
MHP_MEMMAP_ON_MEMORY can only be set when
ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE is enabled, the architecture supports
altmap, and the range to be added spans a single memory block.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 drivers/acpi/acpi_memhotplug.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index b02fd51e5589..8cc195c4c861 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -171,6 +171,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	acpi_handle handle = mem_device->device->handle;
 	int result, num_enabled = 0;
 	struct acpi_memory_info *info;
+	mhp_t mhp_flags = MHP_NONE;
 	int node;
 
 	node = acpi_get_node(handle);
@@ -194,8 +195,10 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
+		if (mhp_supports_memmap_on_memory(info->length))
+			mhp_flags |= MHP_MEMMAP_ON_MEMORY;
 		result = __add_memory(node, info->start_addr, info->length,
-				      MHP_NONE);
+				      mhp_flags);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
-- 
2.16.3




* [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
       [not found] <20210408121804.10440-1-osalvador@suse.de>
                   ` (3 preceding siblings ...)
  2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
  2021-04-08 13:36   ` Christoph Hellwig
  2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador

Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..9f0211df1746 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2433,6 +2433,9 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 	def_bool y
 	depends on MEMORY_HOTPLUG
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+	def_bool y
+
 config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA
-- 
2.16.3




* [PATCH v7 8/8] arm64/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
       [not found] <20210408121804.10440-1-osalvador@suse.de>
                   ` (4 preceding siblings ...)
  2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
  2021-04-15 10:38 ` [PATCH v7 0/8] Allocate memmap from hotadded memory (per device) Oscar Salvador
       [not found] ` <20210408121804.10440-5-osalvador@suse.de>
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
	Oscar Salvador

Enable arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 arch/arm64/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5656e7aacd69..0e23acd173f0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -309,6 +309,9 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
 config ARCH_ENABLE_MEMORY_HOTREMOVE
 	def_bool y
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+	def_bool y
+
 config SMP
 	def_bool y
 
-- 
2.16.3




* Re: [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
@ 2021-04-08 13:36   ` Christoph Hellwig
  2021-04-08 18:19     ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-04-08 13:36 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, Michal Hocko,
	Anshuman Khandual, Pavel Tatashin, Vlastimil Babka, linux-mm,
	linux-kernel

On Thu, Apr 08, 2021 at 02:18:03PM +0200, Oscar Salvador wrote:
> Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
> +config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> +	def_bool y

This needs to go into a common file, with the architectures just
selecting the symbol.
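
For reference, the pattern being suggested usually looks like the
following sketch (the file placement and the exact entries shown are
illustrative, not what was ultimately merged):

```kconfig
# Common file (e.g. mm/Kconfig): declare the symbol once, with no
# default, so it stays off unless an architecture opts in.
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
	bool

# arch/x86/Kconfig: the architecture opts in by selecting the symbol
# from its main config entry (other entries elided for brevity).
config X86
	def_bool y
	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if X86_64
```

With `select`, the per-architecture Kconfig carries a single line
instead of each architecture re-declaring an identical `def_bool y`
entry.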



* Re: [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  2021-04-08 13:36   ` Christoph Hellwig
@ 2021-04-08 18:19     ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2021-04-08 18:19 UTC (permalink / raw)
  To: Christoph Hellwig, Oscar Salvador
  Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
	Vlastimil Babka, linux-mm, linux-kernel

On 08.04.21 15:36, Christoph Hellwig wrote:
> On Thu, Apr 08, 2021 at 02:18:03PM +0200, Oscar Salvador wrote:
>> Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
>> +config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>> +	def_bool y
> 
> This needs to go into a common file, with the architectures just
> selecting the symbol.
> 

I'd like to point out that we have plenty of counter examples. Meaning: 
it's really hard to figure out what the correct way is. Unfortunately, 
the correct way also doesn't seem to be documented :(

-- 
Thanks,

David / dhildenb




* Re: [PATCH v7 0/8] Allocate memmap from hotadded memory (per device)
       [not found] <20210408121804.10440-1-osalvador@suse.de>
                   ` (5 preceding siblings ...)
  2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
@ 2021-04-15 10:38 ` Oscar Salvador
       [not found] ` <20210408121804.10440-5-osalvador@suse.de>
  7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-15 10:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel

On Thu, Apr 08, 2021 at 02:17:56PM +0200, Oscar Salvador wrote:
> Hi,
> 
> I decided to send another version with the fixups included as it seemed a bit
> awkward otherwise. It should ease the review.
> Sorry for the spam.

Gentle ping :-) 

hint: patch#4 is the one that needs some looking

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
       [not found] ` <20210408121804.10440-5-osalvador@suse.de>
@ 2021-04-15 11:19   ` David Hildenbrand
  2021-04-16  7:25     ` Oscar Salvador
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2021-04-15 11:19 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Michal Hocko, Anshuman Khandual, Pavel Tatashin, Vlastimil Babka,
	linux-mm, linux-kernel

On 08.04.21 14:18, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>   a) an existing memory is consumed for that purpose
>      (eg: ~2MB per 128MB memory section on x86_64)
>   b) if the whole node is movable then we have off-node struct pages
>      which has performance drawbacks.
>   c) It might be that there are no PMD_ALIGNED chunks, so the memmap
>      array gets populated with base pages.
> 
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemmap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
> 
> Implementation wise we will reuse vmem_altmap infrastructure to override
> the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on memory_block structure gaining
> a new field which specifies the number of vmemmap_pages at the beginning.
> This patch also introduces the following functions:
> 
>   - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> 		       calls kasan_add_zero_shadow() for the vmemmap range and marks
> 		       online as many sections as vmemmap pages fully span.
>   - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages to/from the node and zone
> 			 present_pages
>   - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
> 

This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.

It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.


> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   drivers/base/memory.c          |  64 ++++++++++++++--
>   include/linux/memory.h         |   8 +-
>   include/linux/memory_hotplug.h |  13 ++++
>   include/linux/memremap.h       |   2 +-
>   include/linux/mmzone.h         |   7 +-
>   mm/Kconfig                     |   5 ++
>   mm/memory_hotplug.c            | 162 ++++++++++++++++++++++++++++++++++++++++-
>   mm/sparse.c                    |   2 -
>   8 files changed, 247 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f209925a5d4e..a5e536a3e9a4 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
>   {
>   	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>   	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> +	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> +	int ret;
> +
> +	/*
> +	 * Although vmemmap pages have a different lifecycle than the pages
> +	 * they describe (they remain until the memory is unplugged), doing
> +	 * its initialization and accounting at hot-{online,offline} stage
> +	 * simplifies things a lot
> +	 */

I suggest detecting the zone in here and just passing it down to online_pages().

> +	if (nr_vmemmap_pages) {
> +		ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
> +					 mem->online_type);
> +		if (ret)
> +			return ret;
> +	}
>   
> -	return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
> +	ret = online_pages(start_pfn + nr_vmemmap_pages,
> +			   nr_pages - nr_vmemmap_pages, mem->online_type,
> +			   mem->nid);
> +
> +	/*
> +	 * Undo the work if online_pages() fails.
> +	 */
> +	if (ret && nr_vmemmap_pages) {
> +		vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
> +		vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
> +	}
> +
> +	return ret;
>   }

My take would be doing the present page adjustment after onlining succeeded:

static int memory_block_online(struct memory_block *mem)
{
	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
	struct zone *zone;
	int ret;

	zone = mhp_get_target_zone(mem->nid, mem->online_type);

	if (nr_vmemmap_pages) {
		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
		if (ret)
			return ret;
	}

	ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, zone);
	if (ret) {
		if (nr_vmemmap_pages)
			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
		return ret;
	}

	/*
	 * Account once onlining succeeded. If the zone was unpopulated,
	 * it is now already properly populated.
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(zone, nr_vmemmap_pages);
	return 0;		
}

And the opposite:

static int memory_block_offline(struct memory_block *mem)
{
	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
	struct zone *zone;
	int ret;

	zone = page_zone(pfn_to_page(start_pfn));


	/*
	 * Unaccount before offlining, such that unpopulated zones can
	 * properly be torn down in offline_pages().
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(zone, -nr_vmemmap_pages);

	ret = offline_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages);
	if (ret) {
		if (nr_vmemmap_pages)
			adjust_present_page_count(zone, +nr_vmemmap_pages);
		return ret;
	}

	if (nr_vmemmap_pages)
		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
	return 0;		
}

Having to do the present page adjustment manually is not completely nice,
but it's easier than manually having to mess with zones becoming populated/unpopulated
outside of online_pages()/offline_pages().


-- 
Thanks,

David / dhildenb




* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
  2021-04-15 11:19   ` [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range David Hildenbrand
@ 2021-04-16  7:25     ` Oscar Salvador
  2021-04-16  8:32       ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2021-04-16  7:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
	Vlastimil Babka, linux-mm, linux-kernel

On Thu, Apr 15, 2021 at 01:19:59PM +0200, David Hildenbrand wrote:
> > Implementation wise we will reuse vmem_altmap infrastructure to override
> > the default allocator used by __populate_section_memmap.
> > Part of the implementation also relies on memory_block structure gaining
> > a new field which specifies the number of vmemmap_pages at the beginning.
> > This patch also introduces the following functions:
> > 
> >   - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> > 		       calls kasan_add_zero_shadow() for the vmemmap range and marks
> > 		       online as many sections as vmemmap pages fully span.
> >   - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages to/from the node and zone
> > 			 present_pages
> >   - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
> > 
> 
> This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.

Yeah, was not happy either with the names but at that time I could not
come up with anything better.

> It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.

This should be explained by this:

"On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
offline_pages(), because offline_pages() performs the tearing-down of kthreads
and the rebuilding of the zonelists if the node/zone become empty."

Is not that clear?

> I suggest detecting the zone in here and just passing it down to online_pages().

Ok, makes sense.

> My take would be doing the present page adjustment after onlining succeeded:
> 
> static int memory_block_online(struct memory_block *mem)
> {
> 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> 	struct zone *zone;
> 	int ret;
> 
> 	zone = mhp_get_target_zone(mem->nid, mem->online_type);
> 
> 	if (nr_vmemmap_pages) {
> 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
> 		if (ret)
> 			return ret;
> 	}
> 
> 	ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, zone);
> 	if (ret) {
> 		if (nr_vmemmap_pages)
> 			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> 		return ret;
> 	}
> 
> 	/*
> 	 * Account once onlining succeeded. If the zone was unpopulated,
> 	 * it is now already properly populated.
> 	 */
> 	if (nr_vmemmap_pages)
> 		adjust_present_page_count(zone, nr_vmemmap_pages);
> 	return 0;		
> }
> 
> And the opposite:
> 
> static int memory_block_offline(struct memory_block *mem)
> {
> 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> 	struct zone *zone;
> 	int ret;
> 
> 	zone = page_zone(pfn_to_page(start_pfn));
> 
> 
> 	/*
> 	 * Unaccount before offlining, such that unpopulated zones can
> 	 * properly be torn down in offline_pages().
> 	 */
> 	if (nr_vmemmap_pages)
> 		adjust_present_page_count(zone, -nr_vmemmap_pages);
> 
> 	ret = offline_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages);
> 	if (ret) {
> 		if (nr_vmemmap_pages)
> 			adjust_present_page_count(zone, +nr_vmemmap_pages);
> 		return ret;
> 	}
> 
> 	if (nr_vmemmap_pages)
> 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> 	return 0;		
> }
> 
> Having to do the present page adjustment manually is not completely nice,
> but it's easier than manually having to mess with zones becoming populated/unpopulated
> outside of online_pages()/offline_pages().

Ok, I like this, and the naming is much better.
I will work on this shortly.

thanks David!

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
  2021-04-16  7:25     ` Oscar Salvador
@ 2021-04-16  8:32       ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2021-04-16  8:32 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
	Vlastimil Babka, linux-mm, linux-kernel

On 16.04.21 09:25, Oscar Salvador wrote:
> On Thu, Apr 15, 2021 at 01:19:59PM +0200, David Hildenbrand wrote:
>>> Implementation wise we will reuse vmem_altmap infrastructure to override
>>> the default allocator used by __populate_section_memmap.
>>> Part of the implementation also relies on memory_block structure gaining
>>> a new field which specifies the number of vmemmap_pages at the beginning.
>>> This patch also introduces the following functions:
>>>
>>>    - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
>>> 		       calls kasan_add_zero_shadow() for the vmemmap range and marks
>>> 		       online as many sections as vmemmap pages fully span.
>>>    - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages to/from the node and zone
>>> 			 present_pages
>>>    - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>>>
>>
>> This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.
> 
> Yeah, was not happy either with the names but at that time I could not
> come up with anything better.
> 
>> It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.
> 
> This should be explained by this:
> 
> "On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
> offline_pages(), because offline_pages() performs the tearing-down of kthreads
> and the rebuilding of the zonelists if the node/zone become empty."
> 
> Is not that clear?

Ehm, it is; for some reason my eyes ignored it -- sorry.

-- 
Thanks,

David / dhildenb




end of thread, other threads:[~2021-04-16  8:32 UTC | newest]

Thread overview: 12+ messages
     [not found] <20210408121804.10440-1-osalvador@suse.de>
2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-04-08 13:36   ` Christoph Hellwig
2021-04-08 18:19     ` David Hildenbrand
2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
2021-04-15 10:38 ` [PATCH v7 0/8] Allocate memmap from hotadded memory (per device) Oscar Salvador
     [not found] ` <20210408121804.10440-5-osalvador@suse.de>
2021-04-15 11:19   ` [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range David Hildenbrand
2021-04-16  7:25     ` Oscar Salvador
2021-04-16  8:32       ` David Hildenbrand
