From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
	"Michael S . Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	Jason Wang <jasowang@redhat.com>,
	Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Michal Hocko <mhocko@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH v2 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
Date: Thu, 12 Nov 2020 14:38:13 +0100
Message-ID: <20201112133815.13332-28-david@redhat.com>
In-Reply-To: <20201112133815.13332-1-david@redhat.com>

virtio-mem soon wants to use offline_and_remove_memory() on memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.

Let's remember the old state and try to restore it if anything goes
wrong. While re-onlining can, in general, fail, that is highly unlikely
(it usually happens only when a notifier fails to allocate memory, and
such failures are rather rare).

This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128 MiB.
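
To make the sizes above concrete, here is a minimal user-space sketch of
the range check and block count. example_nr_memory_blocks() is a
hypothetical helper, not part of this patch; it only mirrors the
alignment checks and the "size / memory_block_size_bytes()" computation,
with the block size passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of Linux memory
 * blocks covered by [start, start + size), or -1 if the range is empty
 * or not aligned to block_size.
 */
static int64_t example_nr_memory_blocks(uint64_t start, uint64_t size,
					uint64_t block_size)
{
	assert(block_size != 0);

	/* Same shape as the checks in offline_and_remove_memory(). */
	if (!size || start % block_size || size % block_size)
		return -1;

	return (int64_t)(size / block_size);
}
```

With a 1 GiB device block and a 128 MiB memory block size, the helper
returns 8, so one device block spans eight Linux memory blocks.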

While we could compress the state into 2 bits, using 8 bits is much
easier.
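
As a rough illustration of that trade-off, the sketch below contrasts one
byte per block with a packed 2-bit encoding. All ST_* names and the
state2_*/state8_* helpers are made up for this example, and the numeric
state values are assumptions standing in for the MMOP_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for MMOP_*; the numeric values are assumptions. */
enum { ST_OFFLINE = 0, ST_ONLINE = 1, ST_ONLINE_KERNEL = 2, ST_ONLINE_MOVABLE = 3 };

/* 8 bits per block: plain indexing, and memset() can initialize it. */
static void state8_init(uint8_t *s, size_t n)
{
	memset(s, ST_OFFLINE, n);
}

static void state8_set(uint8_t *s, size_t i, uint8_t v)
{
	s[i] = v;
}

static uint8_t state8_get(const uint8_t *s, size_t i)
{
	return s[i];
}

/* 2 bits per block: four states per byte, needs shifting and masking. */
static void state2_set(uint8_t *s, size_t i, uint8_t v)
{
	const unsigned int shift = (unsigned int)(i % 4) * 2;

	assert(v <= 3);
	s[i / 4] = (uint8_t)((s[i / 4] & ~(3u << shift)) | ((unsigned int)v << shift));
}

static uint8_t state2_get(const uint8_t *s, size_t i)
{
	return (uint8_t)((s[i / 4] >> ((i % 4) * 2)) & 3u);
}
```

The packed variant saves three quarters of the array, but it cannot be
walked with a simple "pointer++" per memory block, which is what the
callbacks in this patch rely on.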

This handling is similar to, but different from, acpi_scan_try_to_offline():

a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.

b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore the previous online type. Let's
do that, so we won't overwrite what, e.g., user space configured.
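
The "offline everything or restore the previous type" idea can be
modelled in user space as follows. This is only a sketch under stated
assumptions: struct ex_block, the EX_* values, and both functions are
hypothetical, and the caller supplies the "saved" array, whereas the
patch allocates it internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for MMOP_*; the numeric values are assumptions. */
enum { EX_OFFLINE = 0, EX_ONLINE_KERNEL = 2, EX_ONLINE_MOVABLE = 3 };

struct ex_block {
	uint8_t type;	/* current state of this block */
	int busy;	/* nonzero: offlining this block fails */
};

/* Returns 0 on success, 1 if already offline, -EBUSY on failure. */
static int ex_offline(struct ex_block *b)
{
	if (b->type == EX_OFFLINE)
		return 1;
	if (b->busy)
		return -EBUSY;
	b->type = EX_OFFLINE;
	return 0;
}

static int ex_offline_all_or_rollback(struct ex_block *blocks, uint8_t *saved,
				      size_t n)
{
	size_t i;
	int rc = 0;

	assert(blocks && saved);

	/* Unprocessed and already-offline blocks stay EX_OFFLINE in "saved". */
	memset(saved, EX_OFFLINE, n);

	for (i = 0; i < n; i++) {
		const uint8_t old = blocks[i].type;

		rc = ex_offline(&blocks[i]);
		if (!rc)
			saved[i] = old;	/* remember only what we changed */
		if (rc < 0)
			break;
		rc = 0;			/* ignore "already offline" */
	}

	if (rc) {
		/* Restore the previous type of every block we offlined. */
		for (i = 0; i < n; i++)
			if (saved[i] != EX_OFFLINE)
				blocks[i].type = saved[i];
	}
	return rc;
}
```

Blocks that were already offline before the call are left offline on
rollback, and blocks that were onlined to a movable state get that state
back instead of a default one.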

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 89 insertions(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..217080ca93e5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t online_type = MMOP_ONLINE_KERNEL;
+	uint8_t **online_types = arg;
+	struct page *page;
+	int rc;
+
+	/*
+	 * Sense the online_type via the zone of the memory block. Offlining
+	 * with multiple zones within one memory block will be rejected
+	 * by offlining code ... so we don't care about that.
+	 */
+	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+		online_type = MMOP_ONLINE_MOVABLE;
+
+	rc = device_offline(&mem->dev);
+	/*
+	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+	 * so try_reonline_memory_block() can do the right thing.
+	 */
+	if (!rc)
+		**online_types = online_type;
+
+	(*online_types)++;
+	/* Ignore if already offline. */
+	return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t **online_types = arg;
+	int rc;
+
+	if (**online_types != MMOP_OFFLINE) {
+		mem->online_type = **online_types;
+		rc = device_online(&mem->dev);
+		if (rc < 0)
+			pr_warn("%s: Failed to re-online memory: %d",
+				__func__, rc);
+	}
+
+	/* Continue processing all remaining memory blocks. */
+	(*online_types)++;
+	return 0;
+}
+
 /*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
  */
 int offline_and_remove_memory(int nid, u64 start, u64 size)
 {
-	struct memory_block *mem;
-	int rc = -EINVAL;
+	const unsigned long mb_count = size / memory_block_size_bytes();
+	uint8_t *online_types, *tmp;
+	int rc;
 
 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
-	    size != memory_block_size_bytes())
-		return rc;
+	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+		return -EINVAL;
+
+	/*
+	 * We'll remember the old online type of each memory block, so we can
+	 * try to revert whatever we did when offlining one memory block fails
+	 * after offlining some others succeeded.
+	 */
+	online_types = kmalloc_array(mb_count, sizeof(*online_types),
+				     GFP_KERNEL);
+	if (!online_types)
+		return -ENOMEM;
+	/*
+	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
+	 * try_reonline_memory_block().
+	 */
+	memset(online_types, MMOP_OFFLINE, mb_count);
 
 	lock_device_hotplug();
-	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
-	if (mem)
-		rc = device_offline(&mem->dev);
-	/* Ignore if the device is already offline. */
-	if (rc > 0)
-		rc = 0;
+
+	tmp = online_types;
+	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
 
 	/*
-	 * In case we succeeded to offline the memory block, remove it.
+	 * In case we succeeded to offline all memory, remove it.
 	 * This cannot fail as it cannot get onlined in the meantime.
 	 */
 	if (!rc) {
 		rc = try_remove_memory(nid, start, size);
-		WARN_ON_ONCE(rc);
+		if (rc)
+			pr_err("%s: Failed to remove memory: %d", __func__, rc);
+	}
+
+	/*
+	 * Rollback what we did. While memory onlining might theoretically fail
+	 * (nacked by a notifier), it barely ever happens.
+	 */
+	if (rc) {
+		tmp = online_types;
+		walk_memory_blocks(start, size, &tmp,
+				   try_reonline_memory_block);
 	}
 	unlock_device_hotplug();
 
+	kfree(online_types);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
-- 
2.26.2

