* [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline}
[not found] <20210408121804.10440-1-osalvador@suse.de>
@ 2021-04-08 12:17 ` Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador
This is a preparatory patch that introduces two new functions:
memory_block_online() and memory_block_offline().
For now, these functions will only call online_pages() and offline_pages()
respectively, but later they will be in charge of preparing the vmemmap
pages, carrying out the initialization and proper accounting of such
pages.
Since the memory_block struct contains all the information, pass this struct
down the chain to the end functions.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
drivers/base/memory.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f35298425575..f209925a5d4e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -169,30 +169,41 @@ int memory_notify(unsigned long val, void *v)
return blocking_notifier_call_chain(&memory_chain, val, v);
}
+static int memory_block_online(struct memory_block *mem)
+{
+ unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+ unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+
+ return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
+}
+
+static int memory_block_offline(struct memory_block *mem)
+{
+ unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+ unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+
+ return offline_pages(start_pfn, nr_pages);
+}
+
/*
* MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
* OK to have direct references to sparsemem variables in here.
*/
static int
-memory_block_action(unsigned long start_section_nr, unsigned long action,
- int online_type, int nid)
+memory_block_action(struct memory_block *mem, unsigned long action)
{
- unsigned long start_pfn;
- unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
int ret;
- start_pfn = section_nr_to_pfn(start_section_nr);
-
switch (action) {
case MEM_ONLINE:
- ret = online_pages(start_pfn, nr_pages, online_type, nid);
+ ret = memory_block_online(mem);
break;
case MEM_OFFLINE:
- ret = offline_pages(start_pfn, nr_pages);
+ ret = memory_block_offline(mem);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
- "%ld\n", __func__, start_section_nr, action, action);
+ "%ld\n", __func__, mem->start_section_nr, action, action);
ret = -EINVAL;
}
@@ -210,9 +221,7 @@ static int memory_block_change_state(struct memory_block *mem,
if (to_state == MEM_OFFLINE)
mem->state = MEM_GOING_OFFLINE;
- ret = memory_block_action(mem->start_section_nr, to_state,
- mem->online_type, mem->nid);
-
+ ret = memory_block_action(mem, to_state);
mem->state = ret ? from_state_req : to_state;
return ret;
--
2.16.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check
[not found] <20210408121804.10440-1-osalvador@suse.de>
2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
@ 2021-04-08 12:17 ` Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador
When using self-hosted vmemmap pages, the range of pages passed to
{online,offline}_pages might not fully span sections, but it always
fully spans pageblocks.
Relax the check to account for that case.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
mm/memory_hotplug.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0cdbbfbc5757..25e59d5dc13c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -838,9 +838,14 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
int ret;
struct memory_notify arg;
- /* We can only online full sections (e.g., SECTION_IS_ONLINE) */
+ /* We can only online full sections (e.g., SECTION_IS_ONLINE).
+ * However, when using e.g., memmap_on_memory, some pages are initialized
+ * prior to calling in here. The remaining number of pages must be
+ * pageblock aligned.
+ */
if (WARN_ON_ONCE(!nr_pages ||
- !IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION)))
+ !IS_ALIGNED(pfn, pageblock_nr_pages) ||
+ !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;
mem_hotplug_begin();
@@ -1573,9 +1578,14 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
int ret, node;
char *reason;
- /* We can only offline full sections (e.g., SECTION_IS_ONLINE) */
+ /* We can only offline full sections (e.g., SECTION_IS_ONLINE).
+ * However, when using e.g., memmap_on_memory, some pages are initialized
+ * prior to calling in here. The remaining number of pages must be
+ * pageblock aligned.
+ */
if (WARN_ON_ONCE(!nr_pages ||
- !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
+ !IS_ALIGNED(start_pfn, pageblock_nr_pages) ||
+ !IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;
mem_hotplug_begin();
--
2.16.3
* [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count()
[not found] <20210408121804.10440-1-osalvador@suse.de>
2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
@ 2021-04-08 12:17 ` Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:17 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador
From: David Hildenbrand <david@redhat.com>
Let's have a single place (inspired by adjust_managed_page_count()) where
we adjust present pages.
In contrast to adjust_managed_page_count(), only memory onlining/offlining
is allowed to modify the number of present pages.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
mm/memory_hotplug.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 25e59d5dc13c..d05056b3c173 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -829,6 +829,16 @@ struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
return default_zone_for_pfn(nid, start_pfn, nr_pages);
}
+static void adjust_present_page_count(struct zone *zone, long nr_pages)
+{
+ unsigned long flags;
+
+ zone->present_pages += nr_pages;
+ pgdat_resize_lock(zone->zone_pgdat, &flags);
+ zone->zone_pgdat->node_present_pages += nr_pages;
+ pgdat_resize_unlock(zone->zone_pgdat, &flags);
+}
+
int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
int online_type, int nid)
{
@@ -882,11 +892,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
}
online_pages_range(pfn, nr_pages);
- zone->present_pages += nr_pages;
-
- pgdat_resize_lock(zone->zone_pgdat, &flags);
- zone->zone_pgdat->node_present_pages += nr_pages;
- pgdat_resize_unlock(zone->zone_pgdat, &flags);
+ adjust_present_page_count(zone, nr_pages);
node_states_set_node(nid, &arg);
if (need_zonelists_rebuild)
@@ -1701,11 +1707,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
/* removal success */
adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
- zone->present_pages -= nr_pages;
-
- pgdat_resize_lock(zone->zone_pgdat, &flags);
- zone->zone_pgdat->node_present_pages -= nr_pages;
- pgdat_resize_unlock(zone->zone_pgdat, &flags);
+ adjust_present_page_count(zone, -nr_pages);
init_per_zone_wmark_min();
--
2.16.3
* [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported
[not found] <20210408121804.10440-1-osalvador@suse.de>
` (2 preceding siblings ...)
2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador, Michal Hocko
Let the caller check whether it can pass MHP_MEMMAP_ON_MEMORY by
checking mhp_supports_memmap_on_memory().
MHP_MEMMAP_ON_MEMORY can only be set if
ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE is enabled, the architecture supports
altmap, and the range to be added spans a single memory block.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
drivers/acpi/acpi_memhotplug.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index b02fd51e5589..8cc195c4c861 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -171,6 +171,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
acpi_handle handle = mem_device->device->handle;
int result, num_enabled = 0;
struct acpi_memory_info *info;
+ mhp_t mhp_flags = MHP_NONE;
int node;
node = acpi_get_node(handle);
@@ -194,8 +195,10 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
+ if (mhp_supports_memmap_on_memory(info->length))
+ mhp_flags |= MHP_MEMMAP_ON_MEMORY;
result = __add_memory(node, info->start_addr, info->length,
- MHP_NONE);
+ mhp_flags);
/*
* If the memory block has been used by the kernel, add_memory()
--
2.16.3
* [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
[not found] <20210408121804.10440-1-osalvador@suse.de>
` (3 preceding siblings ...)
2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
2021-04-08 13:36 ` Christoph Hellwig
2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
` (2 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador
Enable the x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
arch/x86/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..9f0211df1746 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2433,6 +2433,9 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
depends on MEMORY_HOTPLUG
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+ def_bool y
+
config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA
--
2.16.3
* [PATCH v7 8/8] arm64/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
[not found] <20210408121804.10440-1-osalvador@suse.de>
` (4 preceding siblings ...)
2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
@ 2021-04-08 12:18 ` Oscar Salvador
2021-04-15 10:38 ` [PATCH v7 0/8] Allocate memmap from hotadded memory (per device) Oscar Salvador
[not found] ` <20210408121804.10440-5-osalvador@suse.de>
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-08 12:18 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel,
Oscar Salvador
Enable the arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
arch/arm64/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5656e7aacd69..0e23acd173f0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -309,6 +309,9 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+ def_bool y
+
config SMP
def_bool y
--
2.16.3
* Re: [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
@ 2021-04-08 13:36 ` Christoph Hellwig
2021-04-08 18:19 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-04-08 13:36 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, David Hildenbrand, Michal Hocko,
Anshuman Khandual, Pavel Tatashin, Vlastimil Babka, linux-mm,
linux-kernel
On Thu, Apr 08, 2021 at 02:18:03PM +0200, Oscar Salvador wrote:
> Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
> +config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
> + def_bool y
This needs to go into a common file, with the architectures just
selecting the symbol.
* Re: [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
2021-04-08 13:36 ` Christoph Hellwig
@ 2021-04-08 18:19 ` David Hildenbrand
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2021-04-08 18:19 UTC (permalink / raw)
To: Christoph Hellwig, Oscar Salvador
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
Vlastimil Babka, linux-mm, linux-kernel
On 08.04.21 15:36, Christoph Hellwig wrote:
> On Thu, Apr 08, 2021 at 02:18:03PM +0200, Oscar Salvador wrote:
>> Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
>> +config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>> + def_bool y
>
> This needs to go into a common file, with the architectures just
> selecting the symbol.
>
I'd like to point out that we have plenty of counter examples. Meaning:
it's really hard to figure out what the correct way is. Unfortunately,
the correct way also doesn't seem to be documented :(
--
Thanks,
David / dhildenb
* Re: [PATCH v7 0/8] Allocate memmap from hotadded memory (per device)
[not found] <20210408121804.10440-1-osalvador@suse.de>
` (5 preceding siblings ...)
2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
@ 2021-04-15 10:38 ` Oscar Salvador
[not found] ` <20210408121804.10440-5-osalvador@suse.de>
7 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2021-04-15 10:38 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
Pavel Tatashin, Vlastimil Babka, linux-mm, linux-kernel
On Thu, Apr 08, 2021 at 02:17:56PM +0200, Oscar Salvador wrote:
> Hi,
>
> I decided to send another version with the fixups included as it seemed a bit
> awkward otherwise. It should ease the review.
> Sorry for the spam.
Gentle ping :-)
hint: patch #4 is the one that needs a closer look
--
Oscar Salvador
SUSE L3
* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
[not found] ` <20210408121804.10440-5-osalvador@suse.de>
@ 2021-04-15 11:19 ` David Hildenbrand
2021-04-16 7:25 ` Oscar Salvador
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2021-04-15 11:19 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Michal Hocko, Anshuman Khandual, Pavel Tatashin, Vlastimil Babka,
linux-mm, linux-kernel
On 08.04.21 14:18, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
>
> This has some disadvantages:
> a) an existing memory is consumed for that purpose
> (eg: ~2MB per 128MB memory section on x86_64)
> b) if the whole node is movable then we have off-node struct pages
> which has performance drawbacks.
> c) It might be there are no PMD_ALIGNED chunks so memmap array gets
> populated with base pages.
>
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
>
> Vmemap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
>
> Implementation wise we will reuse vmem_altmap infrastructure to override
> the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on memory_block structure gaining
> a new field which specifies the number of vmemmap_pages at the beginning.
> This patch also introduces the following functions:
>
> - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> calls kasan_add_zero_shadow() for the vmemmap range and marks
> online as many sections as vmemmap pages fully span.
> - vmemmap_adjust_pages: Adds/subtracts vmemmap_pages to/from node and zone
> present_pages
> - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>
This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.
It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
> drivers/base/memory.c | 64 ++++++++++++++--
> include/linux/memory.h | 8 +-
> include/linux/memory_hotplug.h | 13 ++++
> include/linux/memremap.h | 2 +-
> include/linux/mmzone.h | 7 +-
> mm/Kconfig | 5 ++
> mm/memory_hotplug.c | 162 ++++++++++++++++++++++++++++++++++++++++-
> mm/sparse.c | 2 -
> 8 files changed, 247 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f209925a5d4e..a5e536a3e9a4 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
> {
> unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> + unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> + int ret;
> +
> + /*
> + * Although vmemmap pages have a different lifecycle than the pages
> + * they describe (they remain until the memory is unplugged), doing
> + * its initialization and accounting at hot-{online,offline} stage
> + * simplifies things a lot
> + */
I suggest detecting the zone in here and just passing it down to online_pages().
> + if (nr_vmemmap_pages) {
> + ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
> + mem->online_type);
> + if (ret)
> + return ret;
> + }
>
> - return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
> + ret = online_pages(start_pfn + nr_vmemmap_pages,
> + nr_pages - nr_vmemmap_pages, mem->online_type,
> + mem->nid);
> +
> + /*
> + * Undo the work if online_pages() fails.
> + */
> + if (ret && nr_vmemmap_pages) {
> + vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
> + vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
> + }
> +
> + return ret;
> }
My take would be doing the present page adjustment after onlining succeeded:
static int memory_block_online(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
struct zone *zone;
int ret;
zone = mhp_get_target_zone(mem->nid, mem->online_type);
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
return ret;
}
ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, zone);
if (ret) {
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
return ret;
}
/*
* Account once onlining succeeded. If the page was unpopulated,
* it is now already properly populated.
*/
if (nr_vmemmap_pages)
adjust_present_page_count(zone, nr_vmemmap_pages);
return 0;
}
And the opposite:
static int memory_block_offline(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
/*
* Unaccount before offlining, such that unpopulated zones can
* properly be torn down in offline_pages().
*/
if (nr_vmemmap_pages)
adjust_present_page_count(zone, -nr_vmemmap_pages);
ret = offline_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages);
if (ret) {
if (nr_vmemmap_pages)
adjust_present_page_count(zone, +nr_vmemmap_pages);
return ret;
}
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
return 0;
}
Having to do the present page adjustment manually is not completely nice,
but it's easier than manually having to mess with zones becoming populated/unpopulated
outside of online_pages()/offline_pages().
--
Thanks,
David / dhildenb
* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
2021-04-15 11:19 ` [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range David Hildenbrand
@ 2021-04-16 7:25 ` Oscar Salvador
2021-04-16 8:32 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2021-04-16 7:25 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
Vlastimil Babka, linux-mm, linux-kernel
On Thu, Apr 15, 2021 at 01:19:59PM +0200, David Hildenbrand wrote:
> > Implementation wise we will reuse vmem_altmap infrastructure to override
> > the default allocator used by __populate_section_memmap.
> > Part of the implementation also relies on memory_block structure gaining
> > a new field which specifies the number of vmemmap_pages at the beginning.
> > This patch also introduces the following functions:
> >
> > - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> > calls kasan_add_zero_shadow() for the vmemmap range and marks
> > online as many sections as vmemmap pages fully span.
> > - vmemmap_adjust_pages: Adds/subtracts vmemmap_pages to/from node and zone
> > present_pages
> > - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
> >
>
> This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.
Yeah, was not happy either with the names but at that time I could not
come up with anything better.
> It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.
This should be explained by this:
"On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
offline_pages(), because offline_pages() performs the tearing-down of kthreads
and the rebuilding of the zonelists if the node/zone become empty."
Is that not clear?
> I suggest detecting the zone in here and just passing it down to online_pages().
Ok, makes sense.
> My take would be doing the present page adjustment after onlining succeeded:
>
> static int memory_block_online(struct memory_block *mem)
> {
> unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> struct zone *zone;
> int ret;
>
> zone = mhp_get_target_zone(mem->nid, mem->online_type);
>
> if (nr_vmemmap_pages) {
> ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
> if (ret)
> return ret;
> }
>
> ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, zone);
> if (ret) {
> if (nr_vmemmap_pages)
> mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> return ret;
> }
>
> /*
> * Account once onlining succeeded. If the page was unpopulated,
> * it is now already properly populated.
> */
> if (nr_vmemmap_pages)
> adjust_present_page_count(zone, nr_vmemmap_pages);
> return 0;
> }
>
> And the opposite:
>
> static int memory_block_offline(struct memory_block *mem)
> {
> unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> struct zone *zone;
> int ret;
>
> zone = page_zone(pfn_to_page(start_pfn));
>
>
> /*
> * Unaccount before offlining, such that unpopulated zones can
> * properly be torn down in offline_pages().
> */
> if (nr_vmemmap_pages)
> adjust_present_page_count(zone, -nr_vmemmap_pages);
>
> ret = offline_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages);
> if (ret) {
> if (nr_vmemmap_pages)
> adjust_present_page_count(zone, +nr_vmemmap_pages);
> return ret;
> }
>
> if (nr_vmemmap_pages)
> mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> return 0;
> }
>
> Having to do the present page adjustment manually is not completely nice,
> but it's easier than manually having to mess with zones becoming populated/unpopulated
> outside of online_pages()/offline_pages().
Ok, I like this, and the naming is much better.
I will work on this shortly.
thanks David!
--
Oscar Salvador
SUSE L3
* Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
2021-04-16 7:25 ` Oscar Salvador
@ 2021-04-16 8:32 ` David Hildenbrand
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2021-04-16 8:32 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
Vlastimil Babka, linux-mm, linux-kernel
On 16.04.21 09:25, Oscar Salvador wrote:
> On Thu, Apr 15, 2021 at 01:19:59PM +0200, David Hildenbrand wrote:
>>> Implementation wise we will reuse vmem_altmap infrastructure to override
>>> the default allocator used by __populate_section_memmap.
>>> Part of the implementation also relies on memory_block structure gaining
>>> a new field which specifies the number of vmemmap_pages at the beginning.
>>> This patch also introduces the following functions:
>>>
>>> - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
>>> calls kasan_add_zero_shadow() for the vmemmap range and marks
>>> online as many sections as vmemmap pages fully span.
>>> - vmemmap_adjust_pages: Adds/subtracts vmemmap_pages to/from node and zone
>>> present_pages
>>> - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>>>
>>
>> This is a bit asynchronous; and the function names are not really expressing what is being done :) I'll try to come up with better names below.
>
> Yeah, was not happy either with the names but at that time I could not
> come up with anything better.
>
>> It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.
>
> This should be explained by this:
>
> "On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
> offline_pages(), because offline_pages() performs the tearing-down of kthreads
> and the rebuilding of the zonelists if the node/zone become empty."
>
> Is not that clear?
Ehm, it is; for some reason my eyes ignored it -- sorry.
--
Thanks,
David / dhildenb
end of thread [~2021-04-16 8:32 UTC | newest]
Thread overview: 12+ messages
[not found] <20210408121804.10440-1-osalvador@suse.de>
2021-04-08 12:17 ` [PATCH v7 1/8] drivers/base/memory: Introduce memory_block_{online,offline} Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 2/8] mm,memory_hotplug: Relax fully spanned sections check Oscar Salvador
2021-04-08 12:17 ` [PATCH v7 3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count() Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-04-08 12:18 ` [PATCH v7 7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-04-08 13:36 ` Christoph Hellwig
2021-04-08 18:19 ` David Hildenbrand
2021-04-08 12:18 ` [PATCH v7 8/8] arm64/Kconfig: " Oscar Salvador
2021-04-15 10:38 ` [PATCH v7 0/8] Allocate memmap from hotadded memory (per device) Oscar Salvador
[not found] ` <20210408121804.10440-5-osalvador@suse.de>
2021-04-15 11:19 ` [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range David Hildenbrand
2021-04-16 7:25 ` Oscar Salvador
2021-04-16 8:32 ` David Hildenbrand