linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Wed, 24 Mar 2021 15:42:12 +0100	[thread overview]
Message-ID: <YFtPxH0CT5QZsnR1@dhcp22.suse.cz> (raw)
In-Reply-To: <YFsqkY2Pd+UZ7vzD@dhcp22.suse.cz>

On Wed 24-03-21 13:03:29, Michal Hocko wrote:
> On Wed 24-03-21 11:12:59, Oscar Salvador wrote:
[...]
> > I kind of understand to be reluctant to use vmemmap_pages terminology here, but
> > unfortunately we need to know about it.
> > We could rename nr_vmemmap_pages to offset_buddy_pages or something like that.
> 
> I am not convinced. It seems you are justr trying to graft the new
> functionality in. But I still believe that {on,off}lining shouldn't care
> about where their vmemmaps come from at all. It should be a
> responsibility of the code which reserves that space to compansate for
> accounting. Otherwise we will end up with a hard to maintain code
> because expectations would be spread at way too many places. Not to
> mention different pfns that the code should care about.

The below is a quick hack on top of this patch to illustrate my
thinking. I have dug out all the vmemmap pieces out of the
{on,off}lining and hooked all the accounting when the space is reserved.
This just compiles without any deeper look so there are likely some
minor problems but I haven't really encountered any major problems or
hacks to introduce into the code. The separation seems to be possible.
The diffstat also looks promising. Am I missing something fundamental in
this?

--- 
 drivers/base/memory.c          |   8 +--
 include/linux/memory_hotplug.h |   6 +-
 mm/memory_hotplug.c            | 151 ++++++++++++++++++++---------------------
 3 files changed, 80 insertions(+), 85 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5ea2b3fbce02..9697acfe96eb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -181,15 +181,15 @@ memory_block_action(unsigned long start_section_nr, unsigned long action,
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	int ret;
 
-	start_pfn = section_nr_to_pfn(start_section_nr);
+	start_pfn = section_nr_to_pfn(start_section_nr) + nr_vmemmap_pages;
+	nr_pages -= nr_vmemmap_pages;
 
 	switch (action) {
 	case MEM_ONLINE:
-		ret = online_pages(start_pfn, nr_pages, nr_vmemmap_pages,
-				   online_type, nid);
+		ret = online_pages(start_pfn, nr_pages, online_type, nid);
 		break;
 	case MEM_OFFLINE:
-		ret = offline_pages(start_pfn, nr_pages, nr_vmemmap_pages);
+		ret = offline_pages(start_pfn, nr_pages);
 		break;
 	default:
 		WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index a85d4b7d15c2..673d2d4a8443 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -109,8 +109,7 @@ extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
-			unsigned long nr_vmemmap_pages, int online_type,
-			int nid);
+			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 					 unsigned long end_pfn);
 extern void __offline_isolated_pages(unsigned long start_pfn,
@@ -317,8 +316,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
 extern void try_offline_node(int nid);
-extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
-			 unsigned long nr_vmemmap_pages);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
 extern void __remove_memory(int nid, u64 start, u64 size);
 extern int offline_and_remove_memory(int nid, u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c3a98cb8cde..754026a9164d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -844,30 +844,19 @@ struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 }
 
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
-		       unsigned long nr_vmemmap_pages, int online_type, int nid)
+		       int online_type, int nid)
 {
-	unsigned long flags, buddy_start_pfn, buddy_nr_pages;
+	unsigned long flags;
 	struct zone *zone;
 	int need_zonelists_rebuild = 0;
 	int ret;
 	struct memory_notify arg;
 
-	/* We can only online full sections (e.g., SECTION_IS_ONLINE) */
-	if (WARN_ON_ONCE(!nr_pages ||
-			 !IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION)))
-		return -EINVAL;
-
-	buddy_start_pfn = pfn + nr_vmemmap_pages;
-	buddy_nr_pages = nr_pages - nr_vmemmap_pages;
-
 	mem_hotplug_begin();
 
 	/* associate pfn range with the zone */
 	zone = zone_for_pfn_range(online_type, nid, pfn, nr_pages);
-	if (nr_vmemmap_pages)
-		move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL,
-				       MIGRATE_UNMOVABLE);
-	move_pfn_range_to_zone(zone, buddy_start_pfn, buddy_nr_pages, NULL,
+	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL,
 			       MIGRATE_ISOLATE);
 
 	arg.start_pfn = pfn;
@@ -884,7 +873,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	 * onlining, such that undo_isolate_page_range() works correctly.
 	 */
 	spin_lock_irqsave(&zone->lock, flags);
-	zone->nr_isolate_pageblock += buddy_nr_pages / pageblock_nr_pages;
+	zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/*
@@ -897,7 +886,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		setup_zone_pageset(zone);
 	}
 
-	online_pages_range(pfn, nr_pages, buddy_start_pfn);
+	online_pages_range(pfn, nr_pages, pfn);
 	zone->present_pages += nr_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -910,9 +899,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	zone_pcp_update(zone);
 
 	/* Basic onlining is complete, allow allocation of onlined pages. */
-	undo_isolate_page_range(buddy_start_pfn,
-				buddy_start_pfn + buddy_nr_pages,
-				MIGRATE_MOVABLE);
+	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
 
 	/*
 	 * Freshly onlined pages aren't shuffled (e.g., all pages are placed to
@@ -1126,6 +1113,59 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
 }
 
+static void reserve_vmmemmap_space(int nid, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap)
+{
+	struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_NORMAL];
+
+	altmap->free = nr_pages;
+	altmap->base_pfn = pfn;
+
+	/* initialize struct pages and account for this space */
+	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
+}
+
+static void unaccount_vmemmap_space(int nid, unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_NORMAL];
+	unsigned long flags;
+
+	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
+	zone->present_pages -= nr_pages;
+
+	pgdat_resize_lock(zone->zone_pgdat, &flags);
+	zone->zone_pgdat->node_present_pages -= nr_pages;
+	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+
+	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
+}
+
+static int remove_memory_block_cb(struct memory_block *mem, void *arg)
+{
+	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap *altmap = NULL;
+	u64 start = PFN_PHYS(section_nr_to_pfn(mem->start_section_nr));
+	u64 size = memory_block_size_bytes();
+
+	if (!mem->nr_vmemmap_pages) {
+		arch_remove_memory(mem->nid, start, size, NULL);
+		return 0;
+	}
+
+	/*
+	 * Let remove_pmd_table->free_hugepage_table
+	 * do the right thing if we used vmem_altmap
+	 * when hot-adding the range.
+	 */
+	mhp_altmap.alloc = nr_vmemmap_pages;
+	altmap = &mhp_altmap;
+
+	unaccount_vmemmap_space(mem->nid, PHYS_PFN(start), nr_vmemmap_pages);
+	arch_remove_memory(mem->nid, start, size, altmap);
+
+	return 0;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1170,8 +1210,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 			ret = -EINVAL;
 			goto error;
 		}
-		mhp_altmap.free = PHYS_PFN(size);
-		mhp_altmap.base_pfn = PHYS_PFN(start);
+		reserve_vmmemmap_space(nid, PHYS_PFN(start), PHYS_PFN(size), &mhp_altmap);
 		params.altmap = &mhp_altmap;
 	}
 
@@ -1639,25 +1678,16 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 	return 0;
 }
 
-int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
-			unsigned long nr_vmemmap_pages)
+int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
-	unsigned long pfn, buddy_start_pfn, buddy_nr_pages, system_ram_pages = 0;
+	unsigned long pfn, system_ram_pages = 0;
 	unsigned long flags;
 	struct zone *zone;
 	struct memory_notify arg;
 	int ret, node;
 	char *reason;
 
-	/* We can only offline full sections (e.g., SECTION_IS_ONLINE) */
-	if (WARN_ON_ONCE(!nr_pages ||
-			 !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
-		return -EINVAL;
-
-	buddy_start_pfn = start_pfn + nr_vmemmap_pages;
-	buddy_nr_pages = nr_pages - nr_vmemmap_pages;
-
 	mem_hotplug_begin();
 
 	/*
@@ -1693,7 +1723,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	zone_pcp_disable(zone);
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(buddy_start_pfn, end_pfn,
+	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
 				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
@@ -1713,7 +1743,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	}
 
 	do {
-		pfn = buddy_start_pfn;
+		pfn = start_pfn;
 		do {
 			if (signal_pending(current)) {
 				ret = -EINTR;
@@ -1744,18 +1774,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		 * offlining actually in order to make hugetlbfs's object
 		 * counting consistent.
 		 */
-		ret = dissolve_free_huge_pages(buddy_start_pfn, end_pfn);
+		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
 		if (ret) {
 			reason = "failure to dissolve huge pages";
 			goto failed_removal_isolated;
 		}
 
-		ret = test_pages_isolated(buddy_start_pfn, end_pfn, MEMORY_OFFLINE);
+		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
 
 	} while (ret);
 
 	/* Mark all sections offline and remove free pages from the buddy. */
-	__offline_isolated_pages(start_pfn, end_pfn, buddy_start_pfn);
+	__offline_isolated_pages(start_pfn, end_pfn, start_pfn);
 	pr_debug("Offlined Pages %ld\n", nr_pages);
 
 	/*
@@ -1764,13 +1794,13 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * of isolated pageblocks, memory onlining will properly revert this.
 	 */
 	spin_lock_irqsave(&zone->lock, flags);
-	zone->nr_isolate_pageblock -= buddy_nr_pages / pageblock_nr_pages;
+	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	zone_pcp_enable(zone);
 
 	/* removal success */
-	adjust_managed_page_count(pfn_to_page(start_pfn), -buddy_nr_pages);
+	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
 	zone->present_pages -= nr_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -1799,7 +1829,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	return 0;
 
 failed_removal_isolated:
-	undo_isolate_page_range(buddy_start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 failed_removal_pcplists_disabled:
 	zone_pcp_enable(zone);
@@ -1830,14 +1860,6 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
-{
-	/*
-	 * If not set, continue with the next block.
-	 */
-	return mem->nr_vmemmap_pages;
-}
-
 static int check_cpu_on_node(pg_data_t *pgdat)
 {
 	int cpu;
@@ -1912,9 +1934,6 @@ EXPORT_SYMBOL(try_offline_node);
 static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages = 0;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -1927,31 +1946,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 	if (rc)
 		return rc;
 
-	/*
-	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
-	 * the same granularity it was added - a single memory block.
-	 */
-	if (memmap_on_memory) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
-			if (size != memory_block_size_bytes()) {
-				pr_warn("Refuse to remove %#llx - %#llx,"
-					"wrong granularity\n",
-					 start, start + size);
-				return -EINVAL;
-			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table
-			 * do the right thing if we used vmem_altmap
-			 * when hot-adding the range.
-			 */
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
-		}
-	}
-
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
 
@@ -1963,7 +1957,10 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 
 	mem_hotplug_begin();
 
-	arch_remove_memory(nid, start, size, altmap);
+	if (!memmap_on_memory)
+		arch_remove_memory(nid, start, size, NULL);
+	else
+		walk_memory_blocks(start, size, NULL, remove_memory_block_cb);
 
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_free(start, size);
-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2021-03-24 14:43 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19  9:26 [PATCH v5 0/5] Allocate memmap from hotadded memory (per device) Oscar Salvador
2021-03-19  9:26 ` [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2021-03-19 10:20   ` David Hildenbrand
2021-03-19 10:31     ` Oscar Salvador
2021-03-19 12:04       ` David Hildenbrand
2021-03-23 10:11   ` Michal Hocko
2021-03-24 10:12     ` Oscar Salvador
2021-03-24 12:03       ` Michal Hocko
2021-03-24 12:10         ` Michal Hocko
2021-03-24 12:23           ` David Hildenbrand
2021-03-24 12:37             ` Michal Hocko
2021-03-24 13:13               ` David Hildenbrand
2021-03-24 13:40                 ` Michal Hocko
2021-03-24 14:05                   ` David Hildenbrand
2021-03-24 13:27         ` Oscar Salvador
2021-03-24 14:42         ` Michal Hocko [this message]
2021-03-24 14:52           ` David Hildenbrand
2021-03-24 16:04             ` Michal Hocko
2021-03-24 19:16               ` David Hildenbrand
2021-03-25  8:07                 ` Oscar Salvador
2021-03-25  9:17                   ` Michal Hocko
2021-03-25 10:55                     ` Oscar Salvador
2021-03-25 11:08                       ` David Hildenbrand
2021-03-25 11:23                         ` Oscar Salvador
2021-03-25 12:35                         ` Michal Hocko
2021-03-25 12:40                           ` David Hildenbrand
2021-03-25 14:08                             ` Michal Hocko
2021-03-25 14:09                               ` David Hildenbrand
2021-03-25 14:34                                 ` Michal Hocko
2021-03-25 14:46                                   ` David Hildenbrand
2021-03-25 15:12                                     ` Michal Hocko
2021-03-25 15:19                                       ` David Hildenbrand
2021-03-25 15:35                                         ` Michal Hocko
2021-03-25 15:40                                           ` David Hildenbrand
2021-03-25 16:07                                           ` Michal Hocko
2021-03-25 16:20                                             ` David Hildenbrand
2021-03-25 16:36                                               ` Michal Hocko
2021-03-25 16:47                                                 ` Michal Hocko
2021-03-25 16:55                                                   ` David Hildenbrand
2021-03-25 22:06                                                   ` Oscar Salvador
2021-03-26  8:35                                                     ` Michal Hocko
2021-03-26  8:52                                                       ` David Hildenbrand
2021-03-26  8:57                                                         ` Oscar Salvador
2021-03-26 12:15                                                           ` Oscar Salvador
2021-03-26 13:36                                                             ` David Hildenbrand
2021-03-26 14:38                                                         ` Michal Hocko
2021-03-26 14:53                                                           ` David Hildenbrand
2021-03-26 15:31                                                             ` Michal Hocko
2021-03-26 16:03                                                               ` David Hildenbrand
2021-03-26  8:55                                                       ` Oscar Salvador
2021-03-26  9:11                                                         ` Michal Hocko
2021-03-25 18:08                                                 ` David Hildenbrand
2021-03-25 12:26                       ` Michal Hocko
2021-03-25 14:02                         ` Oscar Salvador
2021-03-25 14:40                           ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 2/5] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-03-23 10:40   ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 3/5] mm,memory_hotplug: Add kernel boot option to enable memmap_on_memory Oscar Salvador
2021-03-23 10:47   ` Michal Hocko
2021-03-24  8:45     ` Oscar Salvador
2021-03-24  9:02       ` Michal Hocko
2021-03-19  9:26 ` [PATCH v5 4/5] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-03-19  9:26 ` [PATCH v5 5/5] arm64/Kconfig: " Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFtPxH0CT5QZsnR1@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).