LKML Archive on lore.kernel.org
* [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM)
@ 2020-10-12 12:52 David Hildenbrand
  2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
                   ` (30 more replies)
  0 siblings, 31 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Michal Hocko, Oscar Salvador,
	Pankaj Gupta, Wei Yang

virtio-mem currently only supports device block sizes that span at most
a single Linux memory block. For example, gigantic pages in the hypervisor
result in a device block size of 1 GiB on x86-64 - when the Linux memory
block size is 128 MiB, we cannot support such devices and fail loading the
driver. Of course, we want to support any device block size in any Linux
VM.

Bigger device block sizes will become especially important once QEMU
supports VFIO with virtio-mem - each device block has to be mapped
separately, and VFIO supports at most 64k mappings. So we usually want
device blocks in the gigabyte range when the VM is expected to grow large.

This series:
- Performs some cleanups
- Factors out existing Sub Block Mode (SBM)
- Implements memory hot(un)plug in Big Block Mode (BBM)

I need one core-mm change to make offline_and_remove_memory() eat bigger
chunks.

This series is based on "next-20201009" and can be found at:
	git@gitlab.com:virtio-mem/linux.git virtio-mem-dbm-v1

Once some virtio-mem patches that are pending in the -mm tree are upstream
(I guess they'll go in in 5.10), I'll resend based on Linus' tree.
I suggest taking this (including the MM patch - acks/reviews welcome) via the
vhost tree once the time has come. In the meantime, I'll do more testing.

David Hildenbrand (29):
  virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
  virtio-mem: simplify calculation in
    virtio_mem_mb_state_prepare_next_mb()
  virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
  virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
  virtio-mem: generalize check for added memory
  virtio-mem: generalize virtio_mem_owned_mb()
  virtio-mem: generalize virtio_mem_overlaps_range()
  virtio-mem: drop last_mb_id
  virtio-mem: don't always trigger the workqueue when offlining memory
  virtio-mem: generalize handling when memory is getting onlined
    deferred
  virtio-mem: use "unsigned long" for nr_pages when fake
    onlining/offlining
  virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
  virtio-mem: factor out handling of fake-offline pages in memory
    notifier
  virtio-mem: retry fake-offlining via alloc_contig_range() on
    ZONE_MOVABLE
  virtio-mem: document Sub Block Mode (SBM)
  virtio-mem: memory block states are specific to Sub Block Mode (SBM)
  virtio-mem: subblock states are specific to Sub Block Mode (SBM)
  virtio-mem: factor out calculation of the bit number within the
    sb_states bitmap
  virtio-mem: existing (un)plug functions are specific to Sub Block Mode
    (SBM)
  virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block
    Mode (SBM)
  virtio-mem: memory notifier callbacks are specific to Sub Block Mode
    (SBM)
  virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
  virtio-mem: factor out adding/removing memory from Linux
  virtio-mem: print debug messages from virtio_mem_send_*_request()
  virtio-mem: Big Block Mode (BBM) memory hotplug
  virtio-mem: allow to force Big Block Mode (BBM) and set the big block
    size
  mm/memory_hotplug: extend offline_and_remove_memory() to handle more
    than one memory block
  virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
  virtio-mem: Big Block Mode (BBM) - safe memory hotunplug

 drivers/virtio/virtio_mem.c | 1783 +++++++++++++++++++++++++----------
 mm/memory_hotplug.c         |  105 ++-
 2 files changed, 1373 insertions(+), 515 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
@ 2020-10-12 12:52 ` David Hildenbrand
  2020-10-15  3:56   ` Wei Yang
  2020-10-15 19:26   ` Pankaj Gupta
  2020-10-12 12:52 ` [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb() David Hildenbrand
                   ` (29 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's determine the target nid only once in case we have none specified -
usually, we'll end up with node 0 either way.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index ba4de598f663..a1f5bf7a571a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -70,7 +70,7 @@ struct virtio_mem {
 
 	/* The device block size (for communicating with the device). */
 	uint64_t device_block_size;
-	/* The translated node id. NUMA_NO_NODE in case not specified. */
+	/* The determined node id for all memory of the device. */
 	int nid;
 	/* Physical start address of the memory region. */
 	uint64_t addr;
@@ -406,10 +406,6 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
 static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	int nid = vm->nid;
-
-	if (nid == NUMA_NO_NODE)
-		nid = memory_add_physaddr_to_nid(addr);
 
 	/*
 	 * When force-unloading the driver and we still have memory added to
@@ -423,7 +419,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 	}
 
 	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
-	return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
+	return add_memory_driver_managed(vm->nid, addr,
+					 memory_block_size_bytes(),
 					 vm->resource_name,
 					 MEMHP_MERGE_RESOURCE);
 }
@@ -440,13 +437,9 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	int nid = vm->nid;
-
-	if (nid == NUMA_NO_NODE)
-		nid = memory_add_physaddr_to_nid(addr);
 
 	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
-	return remove_memory(nid, addr, memory_block_size_bytes());
+	return remove_memory(vm->nid, addr, memory_block_size_bytes());
 }
 
 /*
@@ -461,14 +454,11 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	int nid = vm->nid;
-
-	if (nid == NUMA_NO_NODE)
-		nid = memory_add_physaddr_to_nid(addr);
 
 	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
 		mb_id);
-	return offline_and_remove_memory(nid, addr, memory_block_size_bytes());
+	return offline_and_remove_memory(vm->nid, addr,
+					 memory_block_size_bytes());
 }
 
 /*
@@ -1659,6 +1649,10 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
 			&vm->region_size);
 
+	/* Determine the nid for the device based on the lowest address. */
+	if (vm->nid == NUMA_NO_NODE)
+		vm->nid = memory_add_physaddr_to_nid(vm->addr);
+
 	/*
 	 * We always hotplug memory in memory block granularity. This way,
 	 * we have to wait for exactly one memory block to online.
@@ -1707,7 +1701,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
 		 memory_block_size_bytes());
 	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
 		 (unsigned long long)vm->subblock_size);
-	if (vm->nid != NUMA_NO_NODE)
+	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
 
 	return 0;
-- 
2.26.2



* [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
  2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
@ 2020-10-12 12:52 ` David Hildenbrand
  2020-10-15  4:02   ` Wei Yang
  2020-10-15 20:24   ` Pankaj Gupta
  2020-10-12 12:52 ` [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling David Hildenbrand
                   ` (28 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

We actually need one byte less (next_mb_id is exclusive, first_mb_id is
inclusive). Simplify.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index a1f5bf7a571a..670b3faf412d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
  */
 static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
 {
-	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
-	unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
+	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
+	unsigned long new_bytes = old_bytes + 1;
 	int old_pages = PFN_UP(old_bytes);
 	int new_pages = PFN_UP(new_bytes);
 	uint8_t *new_mb_state;
-- 
2.26.2



* [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
  2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
  2020-10-12 12:52 ` [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb() David Hildenbrand
@ 2020-10-12 12:52 ` David Hildenbrand
  2020-10-15  7:06   ` Wei Yang
  2020-10-12 12:52 ` [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add() David Hildenbrand
                   ` (27 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
possible, so we don't have to deal with allocation orders.

Add a comment why we have that restriction for now.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 670b3faf412d..78c2fbcddcf8 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -755,14 +755,15 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  */
 static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
 {
-	const int order = MAX_ORDER - 1;
+	const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
 	int i;
 
 	/*
-	 * We are always called with subblock granularity, which is at least
-	 * aligned to MAX_ORDER - 1.
+	 * We are always called at least with MAX_ORDER_NR_PAGES
+	 * granularity/alignment (e.g., the way subblocks work). All pages
+	 * inside such a block are alike.
 	 */
-	for (i = 0; i < nr_pages; i += 1 << order) {
+	for (i = 0; i < nr_pages; i += max_nr_pages) {
 		struct page *page = pfn_to_page(pfn + i);
 
 		/*
@@ -772,14 +773,14 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
 		 * alike.
 		 */
 		if (PageDirty(page)) {
-			virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
 						      false);
-			generic_online_page(page, order);
+			generic_online_page(page, MAX_ORDER - 1);
 		} else {
-			virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
 						      true);
-			free_contig_range(pfn + i, 1 << order);
-			adjust_managed_page_count(page, 1 << order);
+			free_contig_range(pfn + i, max_nr_pages);
+			adjust_managed_page_count(page, max_nr_pages);
 		}
 	}
 }
@@ -792,7 +793,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 	int sb_id;
 
 	/*
-	 * We exploit here that subblocks have at least MAX_ORDER - 1
+	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
 	 * size/alignment and that this callback is is called with such a
 	 * size/alignment. So we cannot cross subblocks and therefore
 	 * also not memory blocks.
@@ -1675,13 +1676,15 @@ static int virtio_mem_init(struct virtio_mem *vm)
 			 "Some memory is not addressable. This can make some memory unusable.\n");
 
 	/*
-	 * Calculate the subblock size:
-	 * - At least MAX_ORDER - 1 / pageblock_order.
-	 * - At least the device block size.
-	 * In the worst case, a single subblock per memory block.
+	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
+	 * pageblock_nr_pages pages. This:
+	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
+	 *   and fake page onlining code (virtio_mem_fake_online).
+	 * - Is required for now for alloc_contig_range() to work reliably -
+	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	vm->subblock_size = PAGE_SIZE * 1ul << max_t(uint32_t, MAX_ORDER - 1,
-						     pageblock_order);
+	vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+				  pageblock_nr_pages) * PAGE_SIZE;
 	vm->subblock_size = max_t(uint64_t, vm->device_block_size,
 				  vm->subblock_size);
 	vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
-- 
2.26.2



* [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-10-12 12:52 ` [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling David Hildenbrand
@ 2020-10-12 12:52 ` David Hildenbrand
  2020-10-12 13:09   ` Pankaj Gupta
  2020-10-15  7:14   ` Wei Yang
  2020-10-12 12:52 ` [PATCH v1 05/29] virtio-mem: generalize check for added memory David Hildenbrand
                   ` (26 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

We can drop rc2, we don't actually need the value.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 78c2fbcddcf8..b3eebac7191f 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1072,7 +1072,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 				      uint64_t *nb_sb)
 {
 	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
-	int rc, rc2;
+	int rc;
 
 	if (WARN_ON_ONCE(!count))
 		return -EINVAL;
@@ -1103,13 +1103,12 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 
 		dev_err(&vm->vdev->dev,
 			"adding memory block %lu failed with %d\n", mb_id, rc);
-		rc2 = virtio_mem_mb_unplug_sb(vm, mb_id, 0, count);
 
 		/*
 		 * TODO: Linux MM does not properly clean up yet in all cases
 		 * where adding of memory failed - especially on -ENOMEM.
 		 */
-		if (rc2)
+		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
 			new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
 		virtio_mem_mb_set_state(vm, mb_id, new_state);
 		return rc;
-- 
2.26.2



* [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-10-12 12:52 ` [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add() David Hildenbrand
@ 2020-10-12 12:52 ` David Hildenbrand
  2020-10-15  8:28   ` Wei Yang
  2020-10-16 22:39   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb() David Hildenbrand
                   ` (25 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's check by traversing busy system RAM resources instead, to avoid
relying on memory block states.

Don't use walk_system_ram_range(), as that works on pages and we want to
use the bare addresses we have easily at hand.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index b3eebac7191f..6bbd1cfd10d3 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1749,6 +1749,20 @@ static void virtio_mem_delete_resource(struct virtio_mem *vm)
 	vm->parent_resource = NULL;
 }
 
+static int virtio_mem_range_has_system_ram(struct resource *res, void *arg)
+{
+	return 1;
+}
+
+static bool virtio_mem_has_memory_added(struct virtio_mem *vm)
+{
+	const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+	return walk_iomem_res_desc(IORES_DESC_NONE, flags, vm->addr,
+				   vm->addr + vm->region_size, NULL,
+				   virtio_mem_range_has_system_ram) == 1;
+}
+
 static int virtio_mem_probe(struct virtio_device *vdev)
 {
 	struct virtio_mem *vm;
@@ -1870,10 +1884,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	 * the system. And there is no way to stop the driver/device from going
 	 * away. Warn at least.
 	 */
-	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
+	if (virtio_mem_has_memory_added(vm)) {
 		dev_warn(&vdev->dev, "device still has system memory added\n");
 	} else {
 		virtio_mem_delete_resource(vm);
-- 
2.26.2



* [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-10-12 12:52 ` [PATCH v1 05/29] virtio-mem: generalize check for added memory David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-15  8:32   ` Wei Yang
  2020-10-15 20:30   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range() David Hildenbrand
                   ` (24 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Avoid using memory block ids. Rename it to virtio_mem_contains_range().

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 6bbd1cfd10d3..821143db14fe 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -500,12 +500,13 @@ static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
 }
 
 /*
- * Test if a virtio-mem device owns a memory block. Can be called from
+ * Test if a virtio-mem device contains a given range. Can be called from
  * (notifier) callbacks lockless.
  */
-static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
+static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
+				      uint64_t size)
 {
-	return mb_id >= vm->first_mb_id && mb_id <= vm->last_mb_id;
+	return start >= vm->addr && start + size <= vm->addr + vm->region_size;
 }
 
 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
@@ -800,7 +801,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
-		if (!virtio_mem_owned_mb(vm, mb_id))
+		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
 			continue;
 
 		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
-- 
2.26.2



* [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb() David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-20  9:22   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 08/29] virtio-mem: drop last_mb_id David Hildenbrand
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Avoid using memory block ids. While at it, use uint64_t for
address/size.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 821143db14fe..37a0e338ae4a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -489,14 +489,10 @@ static int virtio_mem_translate_node_id(struct virtio_mem *vm, uint16_t node_id)
  * Test if a virtio-mem device overlaps with the given range. Can be called
  * from (notifier) callbacks lockless.
  */
-static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
-				      unsigned long start, unsigned long size)
+static bool virtio_mem_overlaps_range(struct virtio_mem *vm, uint64_t start,
+				      uint64_t size)
 {
-	unsigned long dev_start = virtio_mem_mb_id_to_phys(vm->first_mb_id);
-	unsigned long dev_end = virtio_mem_mb_id_to_phys(vm->last_mb_id) +
-				memory_block_size_bytes();
-
-	return start < dev_end && dev_start < start + size;
+	return start < vm->addr + vm->region_size && vm->addr < start + size;
 }
 
 /*
-- 
2.26.2



* [PATCH v1 08/29] virtio-mem: drop last_mb_id
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range() David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-15  8:35   ` Wei Yang
  2020-10-15 20:32   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory David Hildenbrand
                   ` (22 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

No longer used, let's drop it.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 37a0e338ae4a..5c93f8a65eba 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -84,8 +84,6 @@ struct virtio_mem {
 
 	/* Id of the first memory block of this device. */
 	unsigned long first_mb_id;
-	/* Id of the last memory block of this device. */
-	unsigned long last_mb_id;
 	/* Id of the last usable memory block of this device. */
 	unsigned long last_usable_mb_id;
 	/* Id of the next memory bock to prepare when needed. */
@@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
 						   memory_block_size_bytes());
 	vm->next_mb_id = vm->first_mb_id;
-	vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
-			 vm->region_size) - 1;
 
 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
-- 
2.26.2



* [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 08/29] virtio-mem: drop last_mb_id David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  4:03   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 10/29] virtio-mem: generalize handling when memory is getting onlined deferred David Hildenbrand
                   ` (21 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's only trigger the workqueue from offlining code when we're not allowed
to touch online memory.

Handle the other case (memmap possibly freeing up another memory block)
when actually removing memory. When removing via virtio_mem_remove(),
virtio_mem_retry() is a NOP and safe to use.

While at it, move retry handling when offlining out of
virtio_mem_notify_offline(), to share it with Big Block Mode (BBM)
soon.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 40 ++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 5c93f8a65eba..8ea00f0b2ecd 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -158,6 +158,7 @@ static DEFINE_MUTEX(virtio_mem_mutex);
 static LIST_HEAD(virtio_mem_devices);
 
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
+static void virtio_mem_retry(struct virtio_mem *vm);
 
 /*
  * Register a virtio-mem device so it will be considered for the online_page
@@ -435,9 +436,17 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	int rc;
 
 	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
-	return remove_memory(vm->nid, addr, memory_block_size_bytes());
+	rc = remove_memory(vm->nid, addr, memory_block_size_bytes());
+	if (!rc)
+		/*
+		 * We might have freed up memory we can now unplug, retry
+		 * immediately instead of waiting.
+		 */
+		virtio_mem_retry(vm);
+	return rc;
 }
 
 /*
@@ -452,11 +461,19 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	int rc;
 
 	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
 		mb_id);
-	return offline_and_remove_memory(vm->nid, addr,
-					 memory_block_size_bytes());
+	rc = offline_and_remove_memory(vm->nid, addr,
+				       memory_block_size_bytes());
+	if (!rc)
+		/*
+		 * We might have freed up memory we can now unplug, retry
+		 * immediately instead of waiting.
+		 */
+		virtio_mem_retry(vm);
+	return rc;
 }
 
 /*
@@ -534,15 +551,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 		BUG();
 		break;
 	}
-
-	/*
-	 * Trigger the workqueue, maybe we can now unplug memory. Also,
-	 * when we offline and remove a memory block, this will re-trigger
-	 * us immediately - which is often nice because the removal of
-	 * the memory block (e.g., memmap) might have freed up memory
-	 * on other memory blocks we manage.
-	 */
-	virtio_mem_retry(vm);
 }
 
 static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
@@ -679,6 +687,14 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		break;
 	case MEM_OFFLINE:
 		virtio_mem_notify_offline(vm, mb_id);
+
+		/*
+		 * Trigger the workqueue. Now that we have some offline memory,
+		 * maybe we can handle pending unplug requests.
+		 */
+		if (!unplug_online)
+			virtio_mem_retry(vm);
+
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
-- 
2.26.2



* [PATCH v1 10/29] virtio-mem: generalize handling when memory is getting onlined deferred
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-12 12:53 ` [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining David Hildenbrand
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

We don't want to add too much memory when it's not getting onlined
immediately, to avoid running OOM. Generalize the handling, to avoid
making use of memory block states. Use a threshold of 1 GiB for now.

Properly adjust the offline size when adding/removing memory. As we are
not always protected by a lock when touching the offline size, use an
atomic64_t. We don't care about races (e.g., someone offlining memory
while we are adding more), only about consistent values.

(1 GiB needs a memmap of ~16 MiB - which sounds reasonable even for
 setups with little boot memory and (possibly) one virtio-mem device per
 node)

We don't want to retrigger when onlining is caused immediately by our
action (e.g., adding memory which immediately gets onlined), so use a
flag to indicate if the workqueue is active and use that as an
indicator whether to trigger a retry.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 95 ++++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 32 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 8ea00f0b2ecd..cb2e8f254650 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -51,6 +51,7 @@ struct virtio_mem {
 
 	/* Workqueue that processes the plug/unplug requests. */
 	struct work_struct wq;
+	atomic_t wq_active;
 	atomic_t config_changed;
 
 	/* Virtqueue for guest->host requests. */
@@ -99,7 +100,15 @@ struct virtio_mem {
 
 	/* Summary of all memory block states. */
 	unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
-#define VIRTIO_MEM_NB_OFFLINE_THRESHOLD		10
+
+	/*
+	 * We don't want to add too much memory if it's not getting onlined,
+	 * to avoid running OOM. Besides this threshold, we always allow at
+	 * least two offline blocks at a time (whichever is bigger).
+	 */
+#define VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD		(1024 * 1024 * 1024)
+	atomic64_t offline_size;
+	uint64_t offline_threshold;
 
 	/*
 	 * One byte state per memory block.
@@ -393,6 +402,18 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
 	return 0;
 }
 
+/*
+ * Test if we could add memory without creating too much offline memory -
+ * to avoid running OOM if memory is getting onlined deferred.
+ */
+static bool virtio_mem_could_add_memory(struct virtio_mem *vm, uint64_t size)
+{
+	if (WARN_ON_ONCE(size > vm->offline_threshold))
+		return false;
+
+	return atomic64_read(&vm->offline_size) + size <= vm->offline_threshold;
+}
+
 /*
  * Try to add a memory block to Linux. This will usually only fail
  * if out of memory.
@@ -405,6 +426,8 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
 static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
+	int rc;
 
 	/*
 	 * When force-unloading the driver and we still have memory added to
@@ -418,10 +441,13 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 	}
 
 	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
-	return add_memory_driver_managed(vm->nid, addr,
-					 memory_block_size_bytes(),
-					 vm->resource_name,
-					 MEMHP_MERGE_RESOURCE);
+	/* Memory might get onlined immediately. */
+	atomic64_add(size, &vm->offline_size);
+	rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
+				       MEMHP_MERGE_RESOURCE);
+	if (rc)
+		atomic64_sub(size, &vm->offline_size);
+	return rc;
 }
 
 /*
@@ -436,16 +462,19 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
 	int rc;
 
 	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
-	rc = remove_memory(vm->nid, addr, memory_block_size_bytes());
-	if (!rc)
+	rc = remove_memory(vm->nid, addr, size);
+	if (!rc) {
+		atomic64_sub(size, &vm->offline_size);
 		/*
 		 * We might have freed up memory we can now unplug, retry
 		 * immediately instead of waiting.
 		 */
 		virtio_mem_retry(vm);
+	}
 	return rc;
 }
 
@@ -461,18 +490,20 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
 	int rc;
 
 	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
 		mb_id);
-	rc = offline_and_remove_memory(vm->nid, addr,
-				       memory_block_size_bytes());
-	if (!rc)
+	rc = offline_and_remove_memory(vm->nid, addr, size);
+	if (!rc) {
+		atomic64_sub(size, &vm->offline_size);
 		/*
 		 * We might have freed up memory we can now unplug, retry
 		 * immediately instead of waiting.
 		 */
 		virtio_mem_retry(vm);
+	}
 	return rc;
 }
 
@@ -555,8 +586,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 
 static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 {
-	unsigned long nb_offline;
-
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
 		virtio_mem_mb_set_state(vm, mb_id,
@@ -569,12 +598,6 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 		BUG();
 		break;
 	}
-	nb_offline = vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] +
-		     vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL];
-
-	/* see if we can add new blocks now that we onlined one block */
-	if (nb_offline == VIRTIO_MEM_NB_OFFLINE_THRESHOLD - 1)
-		virtio_mem_retry(vm);
 }
 
 static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
@@ -688,6 +711,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	case MEM_OFFLINE:
 		virtio_mem_notify_offline(vm, mb_id);
 
+		atomic64_add(size, &vm->offline_size);
 		/*
 		 * Trigger the workqueue. Now that we have some offline memory,
 		 * maybe we can handle pending unplug requests.
@@ -700,6 +724,18 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		break;
 	case MEM_ONLINE:
 		virtio_mem_notify_online(vm, mb_id);
+
+		atomic64_sub(size, &vm->offline_size);
+		/*
+		 * Start adding more memory once we onlined half of our
+		 * threshold. Don't trigger if it's possibly due to our action
+		 * (e.g., us adding memory which gets onlined immediately from
+		 * the core).
+		 */
+		if (!atomic_read(&vm->wq_active) &&
+		    virtio_mem_could_add_memory(vm, vm->offline_threshold / 2))
+			virtio_mem_retry(vm);
+
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1060,18 +1096,6 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
 	return 0;
 }
 
-/*
- * Don't add too many blocks that are not onlined yet to avoid running OOM.
- */
-static bool virtio_mem_too_many_mb_offline(struct virtio_mem *vm)
-{
-	unsigned long nb_offline;
-
-	nb_offline = vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] +
-		     vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL];
-	return nb_offline >= VIRTIO_MEM_NB_OFFLINE_THRESHOLD;
-}
-
 /*
  * Try to plug the desired number of subblocks and add the memory block
  * to Linux.
@@ -1225,7 +1249,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 
 	/* Try to plug and add unused blocks */
 	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
-		if (virtio_mem_too_many_mb_offline(vm))
+		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
 			return -ENOSPC;
 
 		rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
@@ -1236,7 +1260,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 
 	/* Try to prepare, plug and add new blocks */
 	while (nb_sb) {
-		if (virtio_mem_too_many_mb_offline(vm))
+		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
 			return -ENOSPC;
 
 		rc = virtio_mem_prepare_next_mb(vm, &mb_id);
@@ -1536,6 +1560,7 @@ static void virtio_mem_run_wq(struct work_struct *work)
 	if (vm->broken)
 		return;
 
+	atomic_set(&vm->wq_active, 1);
 retry:
 	rc = 0;
 
@@ -1596,6 +1621,8 @@ static void virtio_mem_run_wq(struct work_struct *work)
 			"unknown error, marking device broken: %d\n", rc);
 		vm->broken = true;
 	}
+
+	atomic_set(&vm->wq_active, 0);
 }
 
 static enum hrtimer_restart virtio_mem_timer_expired(struct hrtimer *timer)
@@ -1704,6 +1731,10 @@ static int virtio_mem_init(struct virtio_mem *vm)
 						   memory_block_size_bytes());
 	vm->next_mb_id = vm->first_mb_id;
 
+	/* Prepare the offline threshold - make sure we can add two blocks. */
+	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
+				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
+
 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
 	dev_info(&vm->vdev->dev, "device block size: 0x%llx",
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 10/29] virtio-mem: generalize handling when memory is getting onlined deferred David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-15 20:31   ` Pankaj Gupta
  2020-10-16  6:11   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline() David Hildenbrand
                   ` (19 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

No harm done, but let's be consistent.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index cb2e8f254650..00d1cfca4713 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -766,7 +766,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
  * (via generic_online_page()) using PageDirty().
  */
 static void virtio_mem_set_fake_offline(unsigned long pfn,
-					unsigned int nr_pages, bool onlined)
+					unsigned long nr_pages, bool onlined)
 {
 	for (; nr_pages--; pfn++) {
 		struct page *page = pfn_to_page(pfn);
@@ -785,7 +785,7 @@ static void virtio_mem_set_fake_offline(unsigned long pfn,
  * (via generic_online_page()), clear PageDirty().
  */
 static void virtio_mem_clear_fake_offline(unsigned long pfn,
-					  unsigned int nr_pages, bool onlined)
+					  unsigned long nr_pages, bool onlined)
 {
 	for (; nr_pages--; pfn++) {
 		struct page *page = pfn_to_page(pfn);
@@ -800,10 +800,10 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  * Release a range of fake-offline pages to the buddy, effectively
  * fake-onlining them.
  */
-static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
+static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 {
 	const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
-	int i;
+	unsigned long i;
 
 	/*
 	 * We are always called at least with MAX_ORDER_NR_PAGES
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  6:24   ` Wei Yang
  2020-10-20  9:31   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier David Hildenbrand
                   ` (18 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

... which now matches virtio_mem_fake_online(). We'll reuse this
functionality soon.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 00d1cfca4713..d132bc54ef57 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -832,6 +832,27 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 	}
 }
 
+/*
+ * Try to allocate a range, marking pages fake-offline, effectively
+ * fake-offlining them.
+ */
+static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
+{
+	int rc;
+
+	rc = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE,
+				GFP_KERNEL);
+	if (rc == -ENOMEM)
+		/* whoops, out of memory */
+		return rc;
+	if (rc)
+		return -EBUSY;
+
+	virtio_mem_set_fake_offline(pfn, nr_pages, true);
+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
+	return 0;
+}
+
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 {
 	const unsigned long addr = page_to_phys(page);
@@ -1335,17 +1356,10 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
 
 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			     sb_id * vm->subblock_size);
-	rc = alloc_contig_range(start_pfn, start_pfn + nr_pages,
-				MIGRATE_MOVABLE, GFP_KERNEL);
-	if (rc == -ENOMEM)
-		/* whoops, out of memory */
-		return rc;
-	if (rc)
-		return -EBUSY;
 
-	/* Mark it as fake-offline before unplugging it */
-	virtio_mem_set_fake_offline(start_pfn, nr_pages, true);
-	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
+	rc = virtio_mem_fake_offline(start_pfn, nr_pages);
+	if (rc)
+		return rc;
 
 	/* Try to unplug the allocated memory */
 	rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline() David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  7:15   ` Wei Yang
  2020-10-18 12:38   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 14/29] virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE David Hildenbrand
                   ` (17 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's factor out the core pieces and place the implementation next to
virtio_mem_fake_offline(). We'll reuse this functionality soon.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 73 +++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 23 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index d132bc54ef57..a2124892e510 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -168,6 +168,10 @@ static LIST_HEAD(virtio_mem_devices);
 
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
 static void virtio_mem_retry(struct virtio_mem *vm);
+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
+						  unsigned long nr_pages);
+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
+						   unsigned long nr_pages);
 
 /*
  * Register a virtio-mem device so it will be considered for the online_page
@@ -604,27 +608,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
-	struct page *page;
 	unsigned long pfn;
-	int sb_id, i;
+	int sb_id;
 
 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
-		/*
-		 * Drop our reference to the pages so the memory can get
-		 * offlined and add the unplugged pages to the managed
-		 * page counters (so offlining code can correctly subtract
-		 * them again).
-		 */
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			       sb_id * vm->subblock_size);
-		adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
-		for (i = 0; i < nr_pages; i++) {
-			page = pfn_to_page(pfn + i);
-			if (WARN_ON(!page_ref_dec_and_test(page)))
-				dump_page(page, "unplugged page referenced");
-		}
+		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
 	}
 }
 
@@ -633,21 +625,14 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
 {
 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
 	unsigned long pfn;
-	int sb_id, i;
+	int sb_id;
 
 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
-		/*
-		 * Get the reference we dropped when going offline and
-		 * subtract the unplugged pages from the managed page
-		 * counters.
-		 */
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			       sb_id * vm->subblock_size);
-		adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
-		for (i = 0; i < nr_pages; i++)
-			page_ref_inc(pfn_to_page(pfn + i));
+		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
 	}
 }
 
@@ -853,6 +838,48 @@ static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
 	return 0;
 }
 
+/*
+ * Handle fake-offline pages when memory is going offline - such that the
+ * pages can be skipped by mm-core when offlining.
+ */
+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
+						  unsigned long nr_pages)
+{
+	struct page *page;
+	unsigned long i;
+
+	/*
+	 * Add the unplugged pages to the managed page counters, so
+	 * the offlining code can correctly subtract them again when
+	 * the memory goes offline.
+	 */
+	adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
+	/* Drop our reference to the pages so the memory can get offlined. */
+	for (i = 0; i < nr_pages; i++) {
+		page = pfn_to_page(pfn + i);
+		if (WARN_ON(!page_ref_dec_and_test(page)))
+			dump_page(page, "fake-offline page referenced");
+	}
+}
+
+/*
+ * Handle fake-offline pages when memory offlining is canceled - to undo
+ * what we did in virtio_mem_fake_offline_going_offline().
+ */
+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
+						   unsigned long nr_pages)
+{
+	unsigned long i;
+
+	/*
+	 * Get the reference we dropped when going offline and subtract the
+	 * unplugged pages from the managed page counters.
+	 */
+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
+	for (i = 0; i < nr_pages; i++)
+		page_ref_inc(pfn_to_page(pfn + i));
+}
+
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 {
 	const unsigned long addr = page_to_phys(page);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 14/29] virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-12 12:53 ` [PATCH v1 15/29] virtio-mem: document Sub Block Mode (SBM) David Hildenbrand
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

ZONE_MOVABLE is supposed to give some guarantees, yet
alloc_contig_range() isn't prepared to deal properly with some racy
cases (e.g., temporary page pinning when exiting processes, PCP).

Retry 5 times for now. There is certainly room for improvement in the
future.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index a2124892e510..faeb759687fe 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -823,19 +823,34 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
  */
 static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
 {
-	int rc;
+	const bool is_movable = zone_idx(page_zone(pfn_to_page(pfn))) ==
+				ZONE_MOVABLE;
+	int rc, retry_count;
 
-	rc = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE,
-				GFP_KERNEL);
-	if (rc == -ENOMEM)
-		/* whoops, out of memory */
-		return rc;
-	if (rc)
-		return -EBUSY;
+	/*
+	 * TODO: We want an alloc_contig_range() mode that tries to allocate
+	 * harder (e.g., dealing with temporarily pinned pages, PCP), especially
+	 * with ZONE_MOVABLE. So for now, retry a couple of times with
+	 * ZONE_MOVABLE before giving up - because that zone is supposed to give
+	 * some guarantees.
+	 */
+	for (retry_count = 0; retry_count < 5; retry_count++) {
+		rc = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE,
+					GFP_KERNEL);
+		if (rc == -ENOMEM)
+			/* whoops, out of memory */
+			return rc;
+		else if (rc && !is_movable)
+			break;
+		else if (rc)
+			continue;
 
-	virtio_mem_set_fake_offline(pfn, nr_pages, true);
-	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
-	return 0;
+		virtio_mem_set_fake_offline(pfn, nr_pages, true);
+		adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
+		return 0;
+	}
+
+	return -EBUSY;
 }
 
 /*
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 15/29] virtio-mem: document Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 14/29] virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-15  9:33   ` David Hildenbrand
  2020-10-16  8:03   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
                   ` (15 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's add some documentation for the current mode - Sub Block Mode (SBM) -
to prepare for a new mode - Big Block Mode (BBM).

Follow-up patches will properly factor out the existing Sub Block Mode
(SBM) and implement Big Block Mode (BBM).

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index faeb759687fe..fd8685673fe4 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -27,6 +27,21 @@ static bool unplug_online = true;
 module_param(unplug_online, bool, 0644);
 MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
 
+/*
+ * virtio-mem currently supports the following modes of operation:
+ *
+ * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
+ *   size of a Sub Block (SB) is determined based on the device block size, the
+ *   pageblock size, and the maximum allocation granularity of the buddy.
+ *   Subblocks within a Linux memory block might either be plugged or unplugged.
+ *   Memory is added/removed to Linux MM in Linux memory block granularity.
+ *
+ * User space / core MM (auto onlining) is responsible for onlining added
+ * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
+ * always onlined separately, and all memory within a Linux memory block is
+ * onlined to the same zone - virtio-mem relies on this behavior.
+ */
+
 enum virtio_mem_mb_state {
 	/* Unplugged, not added to Linux. Can be reused later. */
 	VIRTIO_MEM_MB_STATE_UNUSED = 0,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 16/29] virtio-mem: memory block states are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (14 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 15/29] virtio-mem: document Sub Block Mode (SBM) David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:40   ` Wei Yang
                     ` (2 more replies)
  2020-10-12 12:53 ` [PATCH v1 17/29] virtio-mem: subblock " David Hildenbrand
                   ` (14 subsequent siblings)
  30 siblings, 3 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
move applicable definitions, functions, and variables (related to
memory block states).

While at it:
- Drop the "_STATE" part from memory block states
- Rename "nb_mb_state" to "mb_count"
- "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
- Don't use lengthy "enum virtio_mem_sbm_mb_state", simply use "uint8_t"

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 215 ++++++++++++++++++------------------
 1 file changed, 109 insertions(+), 106 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index fd8685673fe4..e76d6f769aa5 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -42,20 +42,23 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
  * onlined to the same zone - virtio-mem relies on this behavior.
  */
 
-enum virtio_mem_mb_state {
+/*
+ * State of a Linux memory block in SBM.
+ */
+enum virtio_mem_sbm_mb_state {
 	/* Unplugged, not added to Linux. Can be reused later. */
-	VIRTIO_MEM_MB_STATE_UNUSED = 0,
+	VIRTIO_MEM_SBM_MB_UNUSED = 0,
 	/* (Partially) plugged, not added to Linux. Error on add_memory(). */
-	VIRTIO_MEM_MB_STATE_PLUGGED,
+	VIRTIO_MEM_SBM_MB_PLUGGED,
 	/* Fully plugged, fully added to Linux, offline. */
-	VIRTIO_MEM_MB_STATE_OFFLINE,
+	VIRTIO_MEM_SBM_MB_OFFLINE,
 	/* Partially plugged, fully added to Linux, offline. */
-	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
+	VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL,
 	/* Fully plugged, fully added to Linux, online. */
-	VIRTIO_MEM_MB_STATE_ONLINE,
+	VIRTIO_MEM_SBM_MB_ONLINE,
 	/* Partially plugged, fully added to Linux, online. */
-	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
-	VIRTIO_MEM_MB_STATE_COUNT
+	VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL,
+	VIRTIO_MEM_SBM_MB_COUNT
 };
 
 struct virtio_mem {
@@ -113,9 +116,6 @@ struct virtio_mem {
 	 */
 	const char *resource_name;
 
-	/* Summary of all memory block states. */
-	unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
-
 	/*
 	 * We don't want to add too much memory if it's not getting onlined,
 	 * to avoid running OOM. Besides this threshold, we allow to have at
@@ -125,27 +125,29 @@ struct virtio_mem {
 	atomic64_t offline_size;
 	uint64_t offline_threshold;
 
-	/*
-	 * One byte state per memory block.
-	 *
-	 * Allocated via vmalloc(). When preparing new blocks, resized
-	 * (alloc+copy+free) when needed (crossing pages with the next mb).
-	 * (when crossing pages).
-	 *
-	 * With 128MB memory blocks, we have states for 512GB of memory in one
-	 * page.
-	 */
-	uint8_t *mb_state;
+	struct {
+		/* Summary of all memory block states. */
+		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
+
+		/*
+		 * One byte state per memory block. Allocated via vmalloc().
+		 * Resized (alloc+copy+free) on demand.
+		 *
+		 * With 128 MiB memory blocks, we have states for 512 GiB of
+		 * memory in one 4 KiB page.
+		 */
+		uint8_t *mb_states;
+	} sbm;
 
 	/*
-	 * $nb_sb_per_mb bit per memory block. Handled similar to mb_state.
+	 * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
 	 *
 	 * With 4MB subblocks, we manage 128GB of memory in one page.
 	 */
 	unsigned long *sb_bitmap;
 
 	/*
-	 * Mutex that protects the nb_mb_state, mb_state, and sb_bitmap.
+	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
 	 *
 	 * When this lock is held the pointers can't change, ONLINE and
 	 * OFFLINE blocks can't change the state and no subblocks will get
@@ -254,70 +256,70 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
 /*
  * Set the state of a memory block, taking care of the state counter.
  */
-static void virtio_mem_mb_set_state(struct virtio_mem *vm, unsigned long mb_id,
-				    enum virtio_mem_mb_state state)
+static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
+					unsigned long mb_id, uint8_t state)
 {
 	const unsigned long idx = mb_id - vm->first_mb_id;
-	enum virtio_mem_mb_state old_state;
+	uint8_t old_state;
 
-	old_state = vm->mb_state[idx];
-	vm->mb_state[idx] = state;
+	old_state = vm->sbm.mb_states[idx];
+	vm->sbm.mb_states[idx] = state;
 
-	BUG_ON(vm->nb_mb_state[old_state] == 0);
-	vm->nb_mb_state[old_state]--;
-	vm->nb_mb_state[state]++;
+	BUG_ON(vm->sbm.mb_count[old_state] == 0);
+	vm->sbm.mb_count[old_state]--;
+	vm->sbm.mb_count[state]++;
 }
 
 /*
  * Get the state of a memory block.
  */
-static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
-							unsigned long mb_id)
+static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
+					   unsigned long mb_id)
 {
 	const unsigned long idx = mb_id - vm->first_mb_id;
 
-	return vm->mb_state[idx];
+	return vm->sbm.mb_states[idx];
 }
 
 /*
  * Prepare the state array for the next memory block.
  */
-static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
+static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 {
 	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
 	unsigned long new_bytes = old_bytes + 1;
 	int old_pages = PFN_UP(old_bytes);
 	int new_pages = PFN_UP(new_bytes);
-	uint8_t *new_mb_state;
+	uint8_t *new_array;
 
-	if (vm->mb_state && old_pages == new_pages)
+	if (vm->sbm.mb_states && old_pages == new_pages)
 		return 0;
 
-	new_mb_state = vzalloc(new_pages * PAGE_SIZE);
-	if (!new_mb_state)
+	new_array = vzalloc(new_pages * PAGE_SIZE);
+	if (!new_array)
 		return -ENOMEM;
 
 	mutex_lock(&vm->hotplug_mutex);
-	if (vm->mb_state)
-		memcpy(new_mb_state, vm->mb_state, old_pages * PAGE_SIZE);
-	vfree(vm->mb_state);
-	vm->mb_state = new_mb_state;
+	if (vm->sbm.mb_states)
+		memcpy(new_array, vm->sbm.mb_states, old_pages * PAGE_SIZE);
+	vfree(vm->sbm.mb_states);
+	vm->sbm.mb_states = new_array;
 	mutex_unlock(&vm->hotplug_mutex);
 
 	return 0;
 }
 
-#define virtio_mem_for_each_mb_state(_vm, _mb_id, _state) \
+#define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
 	for (_mb_id = _vm->first_mb_id; \
-	     _mb_id < _vm->next_mb_id && _vm->nb_mb_state[_state]; \
+	     _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
 	     _mb_id++) \
-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
 
-#define virtio_mem_for_each_mb_state_rev(_vm, _mb_id, _state) \
+#define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
 	for (_mb_id = _vm->next_mb_id - 1; \
-	     _mb_id >= _vm->first_mb_id && _vm->nb_mb_state[_state]; \
+	     _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
 	     _mb_id--) \
-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
 
 /*
  * Mark all selected subblocks plugged.
@@ -573,9 +575,9 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
 					  unsigned long mb_id)
 {
-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-	case VIRTIO_MEM_MB_STATE_OFFLINE:
+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
+	case VIRTIO_MEM_SBM_MB_OFFLINE:
 		return NOTIFY_OK;
 	default:
 		break;
@@ -588,14 +590,14 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
 static void virtio_mem_notify_offline(struct virtio_mem *vm,
 				      unsigned long mb_id)
 {
-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
-	case VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL:
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
+	case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 		break;
-	case VIRTIO_MEM_MB_STATE_ONLINE:
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_OFFLINE);
+	case VIRTIO_MEM_SBM_MB_ONLINE:
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_OFFLINE);
 		break;
 	default:
 		BUG();
@@ -605,13 +607,14 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 
 static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 {
-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
 		break;
-	case VIRTIO_MEM_MB_STATE_OFFLINE:
-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
+	case VIRTIO_MEM_SBM_MB_OFFLINE:
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_ONLINE);
 		break;
 	default:
 		BUG();
@@ -1160,7 +1163,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
 		return -ENOSPC;
 
 	/* Resize the state array if required. */
-	rc = virtio_mem_mb_state_prepare_next_mb(vm);
+	rc = virtio_mem_sbm_mb_states_prepare_next_mb(vm);
 	if (rc)
 		return rc;
 
@@ -1169,7 +1172,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
 	if (rc)
 		return rc;
 
-	vm->nb_mb_state[VIRTIO_MEM_MB_STATE_UNUSED]++;
+	vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
 	*mb_id = vm->next_mb_id++;
 	return 0;
 }
@@ -1203,16 +1206,16 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 	 * so the memory notifiers will find the block in the right state.
 	 */
 	if (count == vm->nb_sb_per_mb)
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_OFFLINE);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_OFFLINE);
 	else
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 
 	/* Add the memory block to linux - if that fails, try to unplug. */
 	rc = virtio_mem_mb_add(vm, mb_id);
 	if (rc) {
-		enum virtio_mem_mb_state new_state = VIRTIO_MEM_MB_STATE_UNUSED;
+		int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
 
 		dev_err(&vm->vdev->dev,
 			"adding memory block %lu failed with %d\n", mb_id, rc);
@@ -1222,8 +1225,8 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 		 * where adding of memory failed - especially on -ENOMEM.
 		 */
 		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
-			new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
-		virtio_mem_mb_set_state(vm, mb_id, new_state);
+			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
+		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
 		return rc;
 	}
 
@@ -1276,11 +1279,11 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
 
 	if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
 		if (online)
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_ONLINE);
+			virtio_mem_sbm_set_mb_state(vm, mb_id,
+						    VIRTIO_MEM_SBM_MB_ONLINE);
 		else
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_OFFLINE);
+			virtio_mem_sbm_set_mb_state(vm, mb_id,
+						    VIRTIO_MEM_SBM_MB_OFFLINE);
 	}
 
 	return 0;
@@ -1302,8 +1305,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	mutex_lock(&vm->hotplug_mutex);
 
 	/* Try to plug subblocks of partially plugged online blocks. */
-	virtio_mem_for_each_mb_state(vm, mb_id,
-				     VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
+	virtio_mem_sbm_for_each_mb(vm, mb_id,
+				   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
 		if (rc || !nb_sb)
 			goto out_unlock;
@@ -1311,8 +1314,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	}
 
 	/* Try to plug subblocks of partially plugged offline blocks. */
-	virtio_mem_for_each_mb_state(vm, mb_id,
-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
+	virtio_mem_sbm_for_each_mb(vm, mb_id,
+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
 		if (rc || !nb_sb)
 			goto out_unlock;
@@ -1326,7 +1329,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	mutex_unlock(&vm->hotplug_mutex);
 
 	/* Try to plug and add unused blocks */
-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_UNUSED) {
 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
 			return -ENOSPC;
 
@@ -1375,8 +1378,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
 
 	/* some subblocks might have been unplugged even on failure */
 	if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 	if (rc)
 		return rc;
 
@@ -1387,8 +1390,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
 		 * unplugged. Temporarily drop the mutex, so
 		 * any pending GOING_ONLINE requests can be serviced/rejected.
 		 */
-		virtio_mem_mb_set_state(vm, mb_id,
-					VIRTIO_MEM_MB_STATE_UNUSED);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_UNUSED);
 
 		mutex_unlock(&vm->hotplug_mutex);
 		rc = virtio_mem_mb_remove(vm, mb_id);
@@ -1426,8 +1429,8 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
 		return rc;
 	}
 
-	virtio_mem_mb_set_state(vm, mb_id,
-				VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
+	virtio_mem_sbm_set_mb_state(vm, mb_id,
+				    VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
 	return 0;
 }
 
@@ -1487,8 +1490,8 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
 		mutex_lock(&vm->hotplug_mutex);
 		if (!rc)
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_UNUSED);
+			virtio_mem_sbm_set_mb_state(vm, mb_id,
+						    VIRTIO_MEM_SBM_MB_UNUSED);
 	}
 
 	return 0;
@@ -1514,8 +1517,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	mutex_lock(&vm->hotplug_mutex);
 
 	/* Try to unplug subblocks of partially plugged offline blocks. */
-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
-					 VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
+				       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
 							 &nb_sb);
 		if (rc || !nb_sb)
@@ -1524,8 +1527,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	}
 
 	/* Try to unplug subblocks of plugged offline blocks. */
-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
-					 VIRTIO_MEM_MB_STATE_OFFLINE) {
+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
 							 &nb_sb);
 		if (rc || !nb_sb)
@@ -1539,8 +1541,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	}
 
 	/* Try to unplug subblocks of partially plugged online blocks. */
-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
-					 VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
+				       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
 							&nb_sb);
 		if (rc || !nb_sb)
@@ -1551,8 +1553,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	}
 
 	/* Try to unplug subblocks of plugged online blocks. */
-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
-					 VIRTIO_MEM_MB_STATE_ONLINE) {
+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
 							&nb_sb);
 		if (rc || !nb_sb)
@@ -1578,11 +1579,12 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
 	unsigned long mb_id;
 	int rc;
 
-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_PLUGGED) {
+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
 		rc = virtio_mem_mb_unplug(vm, mb_id);
 		if (rc)
 			return rc;
-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_UNUSED);
 	}
 
 	return 0;
@@ -1974,11 +1976,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	 * After we unregistered our callbacks, user space can online partially
 	 * plugged offline blocks. Make sure to remove them.
 	 */
-	virtio_mem_for_each_mb_state(vm, mb_id,
-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
+	virtio_mem_sbm_for_each_mb(vm, mb_id,
+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
 		rc = virtio_mem_mb_remove(vm, mb_id);
 		BUG_ON(rc);
-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
+		virtio_mem_sbm_set_mb_state(vm, mb_id,
+					    VIRTIO_MEM_SBM_MB_UNUSED);
 	}
 	/*
 	 * After we unregistered our callbacks, user space can no longer
@@ -2003,7 +2006,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	}
 
 	/* remove all tracking data - no locking needed */
-	vfree(vm->mb_state);
+	vfree(vm->sbm.mb_states);
 	vfree(vm->sb_bitmap);
 
 	/* reset the device and cleanup the queues */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 17/29] virtio-mem: subblock states are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (15 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:43   ` Wei Yang
  2020-10-20  9:54   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap David Hildenbrand
                   ` (13 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's rename and move accordingly. While at it, rename sb_bitmap to
"sb_states".

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 56 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e76d6f769aa5..2cc497ad8298 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -137,17 +137,23 @@ struct virtio_mem {
 		 * memory in one 4 KiB page.
 		 */
 		uint8_t *mb_states;
-	} sbm;
 
-	/*
-	 * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
-	 *
-	 * With 4MB subblocks, we manage 128GB of memory in one page.
-	 */
-	unsigned long *sb_bitmap;
+		/*
+		 * Bitmap: one bit per subblock. Allocated similar to
+		 * sbm.mb_states.
+		 *
+		 * A set bit means the corresponding subblock is plugged,
+		 * otherwise it's unplugged.
+		 *
+		 * With 4 MiB subblocks, we manage 128 GiB of memory in one
+		 * 4 KiB page.
+		 */
+		unsigned long *sb_states;
+	} sbm;
 
 	/*
-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
+	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
+	 * sbm.sb_states.
 	 *
 	 * When this lock is held the pointers can't change, ONLINE and
 	 * OFFLINE blocks can't change the state and no subblocks will get
@@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
  *
  * Will not modify the state of the memory block.
  */
-static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
-					 unsigned long mb_id, int sb_id,
-					 int count)
+static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
+					  unsigned long mb_id, int sb_id,
+					  int count)
 {
 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
 
-	__bitmap_set(vm->sb_bitmap, bit, count);
+	__bitmap_set(vm->sbm.sb_states, bit, count);
 }
 
 /*
@@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
  *
  * Will not modify the state of the memory block.
  */
-static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
-					   unsigned long mb_id, int sb_id,
-					   int count)
+static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
+					    unsigned long mb_id, int sb_id,
+					    int count)
 {
 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
 
-	__bitmap_clear(vm->sb_bitmap, bit, count);
+	__bitmap_clear(vm->sbm.sb_states, bit, count);
 }
 
 /*
  * Test if all selected subblocks are plugged.
  */
-static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
-					  unsigned long mb_id, int sb_id,
-					  int count)
+static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
+					   unsigned long mb_id, int sb_id,
+					   int count)
 {
 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
 
 	if (count == 1)
-		return test_bit(bit, vm->sb_bitmap);
+		return test_bit(bit, vm->sbm.sb_states);
 
 	/* TODO: Helper similar to bitmap_set() */
-	return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
+	return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
 	       bit + count;
 }
 
 /*
  * Test if all selected subblocks are unplugged.
  */
-static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
-					    unsigned long mb_id, int sb_id,
-					    int count)
+static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
+					     unsigned long mb_id, int sb_id,
+					     int count)
 {
 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
 
 	/* TODO: Helper similar to bitmap_set() */
-	return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
+	return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
+	       bit + count;
 }
 
 /*
  * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
  * none.
  */
-static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
+static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
 
-	return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
-	       bit;
+	return find_next_zero_bit(vm->sbm.sb_states,
+				  bit + vm->nb_sb_per_mb, bit) - bit;
 }
 
 /*
  * Prepare the subblock bitmap for the next memory block.
  */
-static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
+static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
 {
 	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
 	const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
 	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
 	int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
-	unsigned long *new_sb_bitmap, *old_sb_bitmap;
+	unsigned long *new_bitmap, *old_bitmap;
 
-	if (vm->sb_bitmap && old_pages == new_pages)
+	if (vm->sbm.sb_states && old_pages == new_pages)
 		return 0;
 
-	new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
-	if (!new_sb_bitmap)
+	new_bitmap = vzalloc(new_pages * PAGE_SIZE);
+	if (!new_bitmap)
 		return -ENOMEM;
 
 	mutex_lock(&vm->hotplug_mutex);
-	if (new_sb_bitmap)
-		memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
+	if (new_bitmap)
+		memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
 
-	old_sb_bitmap = vm->sb_bitmap;
-	vm->sb_bitmap = new_sb_bitmap;
+	old_bitmap = vm->sbm.sb_states;
+	vm->sbm.sb_states = new_bitmap;
 	mutex_unlock(&vm->hotplug_mutex);
 
-	vfree(old_sb_bitmap);
+	vfree(old_bitmap);
 	return 0;
 }
 
@@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
 	int sb_id;
 
 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			       sb_id * vm->subblock_size);
@@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
 	int sb_id;
 
 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			       sb_id * vm->subblock_size);
@@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 		 * If plugged, online the pages, otherwise, set them fake
 		 * offline (PageOffline).
 		 */
-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			generic_online_page(page, order);
 		else
 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
@@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
 
 	rc = virtio_mem_send_plug_request(vm, addr, size);
 	if (!rc)
-		virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
+		virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
 	return rc;
 }
 
@@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
 
 	rc = virtio_mem_send_unplug_request(vm, addr, size);
 	if (!rc)
-		virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
+		virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
 	return rc;
 }
 
@@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
 	while (*nb_sb) {
 		/* Find the next candidate subblock */
 		while (sb_id >= 0 &&
-		       virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
+		       virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
 			sb_id--;
 		if (sb_id < 0)
 			break;
 		/* Try to unplug multiple subblocks at a time */
 		count = 1;
 		while (count < *nb_sb && sb_id > 0 &&
-		       virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
+		       virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
 			count++;
 			sb_id--;
 		}
@@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
 		return rc;
 
 	/* Resize the subblock bitmap if required. */
-	rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
+	rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
 	if (rc)
 		return rc;
 
@@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
 		return -EINVAL;
 
 	while (*nb_sb) {
-		sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
+		sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
 		if (sb_id >= vm->nb_sb_per_mb)
 			break;
 		count = 1;
 		while (count < *nb_sb &&
 		       sb_id + count < vm->nb_sb_per_mb &&
-		       !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
-						      1))
+		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
 			count++;
 
 		rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
@@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
 		virtio_mem_fake_online(pfn, nr_pages);
 	}
 
-	if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
 		if (online)
 			virtio_mem_sbm_set_mb_state(vm, mb_id,
 						    VIRTIO_MEM_SBM_MB_ONLINE);
@@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
 	rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
 
 	/* some subblocks might have been unplugged even on failure */
-	if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
+	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
 		virtio_mem_sbm_set_mb_state(vm, mb_id,
 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 	if (rc)
 		return rc;
 
-	if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
 		/*
 		 * Remove the block from Linux - this should never fail.
 		 * Hinder the block from getting onlined by marking it
@@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
 
 	/* If possible, try to unplug the complete block in one shot. */
 	if (*nb_sb >= vm->nb_sb_per_mb &&
-	    virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
 		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
 						    vm->nb_sb_per_mb);
 		if (!rc) {
@@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
 	for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
 		/* Find the next candidate subblock */
 		while (sb_id >= 0 &&
-		       !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			sb_id--;
 		if (sb_id < 0)
 			break;
@@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
 	 * remove it. This will usually not fail, as no memory is in use
 	 * anymore - however some other notifiers might NACK the request.
 	 */
-	if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
 		mutex_unlock(&vm->hotplug_mutex);
 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
 		mutex_lock(&vm->hotplug_mutex);
@@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 
 	/* remove all tracking data - no locking needed */
 	vfree(vm->sbm.mb_states);
-	vfree(vm->sb_bitmap);
+	vfree(vm->sbm.sb_states);
 
 	/* reset the device and cleanup the queues */
 	vdev->config->reset(vdev);
-- 
2.26.2



* [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (16 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 17/29] virtio-mem: subblock " David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:46   ` Wei Yang
  2020-10-20  9:58   ` Pankaj Gupta
  2020-10-12 12:53 ` [PATCH v1 19/29] virtio-mem: existing (un)plug functions are specific to Sub Block Mode (SBM) David Hildenbrand
                   ` (12 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

The calculation is already complicated enough; let's limit it to one
location.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 2cc497ad8298..73ff6e9ba839 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 	     _mb_id--) \
 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
 
+/*
+ * Calculate the bit number in the sb_states bitmap for the given subblock
+ * inside the given memory block.
+ */
+static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
+					  unsigned long mb_id, int sb_id)
+{
+	return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+}
+
 /*
  * Mark all selected subblocks plugged.
  *
@@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
 					  unsigned long mb_id, int sb_id,
 					  int count)
 {
-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
 
 	__bitmap_set(vm->sbm.sb_states, bit, count);
 }
@@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
 					    unsigned long mb_id, int sb_id,
 					    int count)
 {
-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
 
 	__bitmap_clear(vm->sbm.sb_states, bit, count);
 }
@@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
 					   unsigned long mb_id, int sb_id,
 					   int count)
 {
-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
 
 	if (count == 1)
 		return test_bit(bit, vm->sbm.sb_states);
@@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
 					     unsigned long mb_id, int sb_id,
 					     int count)
 {
-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
 
 	/* TODO: Helper similar to bitmap_set() */
 	return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
@@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
 static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
 
 	return find_next_zero_bit(vm->sbm.sb_states,
 				  bit + vm->nb_sb_per_mb, bit) - bit;
-- 
2.26.2



* [PATCH v1 19/29] virtio-mem: existing (un)plug functions are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (17 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:49   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size " David Hildenbrand
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's rename them accordingly. virtio_mem_plug_request() and
virtio_mem_unplug_request() will be handled separately.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 90 ++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 47 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 73ff6e9ba839..fc2b1ff3beed 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1075,8 +1075,8 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
  * Plug selected subblocks. Updates the plugged state, but not the state
  * of the memory block.
  */
-static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
-				 int sb_id, int count)
+static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
+				  int sb_id, int count)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
 			      sb_id * vm->subblock_size;
@@ -1096,8 +1096,8 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
  * Unplug selected subblocks. Updates the plugged state, but not the state
  * of the memory block.
  */
-static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
-				   int sb_id, int count)
+static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
+				    int sb_id, int count)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
 			      sb_id * vm->subblock_size;
@@ -1122,8 +1122,8 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
  *
  * Note: can fail after some subblocks were unplugged.
  */
-static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
-				       unsigned long mb_id, uint64_t *nb_sb)
+static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
+					unsigned long mb_id, uint64_t *nb_sb)
 {
 	int sb_id, count;
 	int rc;
@@ -1144,7 +1144,7 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
 			sb_id--;
 		}
 
-		rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
+		rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
 		if (rc)
 			return rc;
 		*nb_sb -= count;
@@ -1161,18 +1161,18 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
  *
  * Note: can fail after some subblocks were unplugged.
  */
-static int virtio_mem_mb_unplug(struct virtio_mem *vm, unsigned long mb_id)
+static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
 {
 	uint64_t nb_sb = vm->nb_sb_per_mb;
 
-	return virtio_mem_mb_unplug_any_sb(vm, mb_id, &nb_sb);
+	return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
 }
 
 /*
  * Prepare tracking data for the next memory block.
  */
-static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
-				      unsigned long *mb_id)
+static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
+					  unsigned long *mb_id)
 {
 	int rc;
 
@@ -1200,9 +1200,8 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
  *
  * Will modify the state of the memory block.
  */
-static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
-				      unsigned long mb_id,
-				      uint64_t *nb_sb)
+static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
+					  unsigned long mb_id, uint64_t *nb_sb)
 {
 	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
 	int rc;
@@ -1214,7 +1213,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 	 * Plug the requested number of subblocks before adding it to linux,
 	 * so that onlining will directly online all plugged subblocks.
 	 */
-	rc = virtio_mem_mb_plug_sb(vm, mb_id, 0, count);
+	rc = virtio_mem_sbm_plug_sb(vm, mb_id, 0, count);
 	if (rc)
 		return rc;
 
@@ -1241,7 +1240,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
 		 * TODO: Linux MM does not properly clean up yet in all cases
 		 * where adding of memory failed - especially on -ENOMEM.
 		 */
-		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
+		if (virtio_mem_sbm_unplug_sb(vm, mb_id, 0, count))
 			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
 		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
 		return rc;
@@ -1259,8 +1258,9 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
  *
  * Note: Can fail after some subblocks were successfully plugged.
  */
-static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
-				     uint64_t *nb_sb, bool online)
+static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
+				      unsigned long mb_id, uint64_t *nb_sb,
+				      bool online)
 {
 	unsigned long pfn, nr_pages;
 	int sb_id, count;
@@ -1279,7 +1279,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
 			count++;
 
-		rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
+		rc = virtio_mem_sbm_plug_sb(vm, mb_id, sb_id, count);
 		if (rc)
 			return rc;
 		*nb_sb -= count;
@@ -1323,7 +1323,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	/* Try to plug subblocks of partially plugged online blocks. */
 	virtio_mem_sbm_for_each_mb(vm, mb_id,
 				   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
-		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
+		rc = virtio_mem_sbm_plug_any_sb(vm, mb_id, &nb_sb, true);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		cond_resched();
@@ -1332,7 +1332,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	/* Try to plug subblocks of partially plugged offline blocks. */
 	virtio_mem_sbm_for_each_mb(vm, mb_id,
 				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
-		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
+		rc = virtio_mem_sbm_plug_any_sb(vm, mb_id, &nb_sb, false);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		cond_resched();
@@ -1349,7 +1349,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
 			return -ENOSPC;
 
-		rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
+		rc = virtio_mem_sbm_plug_and_add_mb(vm, mb_id, &nb_sb);
 		if (rc || !nb_sb)
 			return rc;
 		cond_resched();
@@ -1360,10 +1360,10 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
 			return -ENOSPC;
 
-		rc = virtio_mem_prepare_next_mb(vm, &mb_id);
+		rc = virtio_mem_sbm_prepare_next_mb(vm, &mb_id);
 		if (rc)
 			return rc;
-		rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
+		rc = virtio_mem_sbm_plug_and_add_mb(vm, mb_id, &nb_sb);
 		if (rc)
 			return rc;
 		cond_resched();
@@ -1384,13 +1384,13 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
  *
  * Note: Can fail after some subblocks were successfully unplugged.
  */
-static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
-					       unsigned long mb_id,
-					       uint64_t *nb_sb)
+static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
+						unsigned long mb_id,
+						uint64_t *nb_sb)
 {
 	int rc;
 
-	rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
+	rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
 
 	/* some subblocks might have been unplugged even on failure */
 	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
@@ -1422,9 +1422,9 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
  *
  * Will modify the state of the memory block.
  */
-static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
-					  unsigned long mb_id, int sb_id,
-					  int count)
+static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
+					   unsigned long mb_id, int sb_id,
+					   int count)
 {
 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
 	unsigned long start_pfn;
@@ -1438,7 +1438,7 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
 		return rc;
 
 	/* Try to unplug the allocated memory */
-	rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
+	rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
 	if (rc) {
 		/* Return the memory to the buddy. */
 		virtio_mem_fake_online(start_pfn, nr_pages);
@@ -1460,17 +1460,17 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
  * Note: Can fail after some subblocks were successfully unplugged. Can
  *       return 0 even if subblocks were busy and could not get unplugged.
  */
-static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
-					      unsigned long mb_id,
-					      uint64_t *nb_sb)
+static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
+					       unsigned long mb_id,
+					       uint64_t *nb_sb)
 {
 	int rc, sb_id;
 
 	/* If possible, try to unplug the complete block in one shot. */
 	if (*nb_sb >= vm->nb_sb_per_mb &&
 	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
-		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
-						    vm->nb_sb_per_mb);
+		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
+						     vm->nb_sb_per_mb);
 		if (!rc) {
 			*nb_sb -= vm->nb_sb_per_mb;
 			goto unplugged;
@@ -1487,7 +1487,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
 		if (sb_id < 0)
 			break;
 
-		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, sb_id, 1);
+		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, sb_id, 1);
 		if (rc == -EBUSY)
 			continue;
 		else if (rc)
@@ -1535,8 +1535,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	/* Try to unplug subblocks of partially plugged offline blocks. */
 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
 				       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
-		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
-							 &nb_sb);
+		rc = virtio_mem_sbm_unplug_any_sb_offline(vm, mb_id, &nb_sb);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		cond_resched();
@@ -1544,8 +1543,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 
 	/* Try to unplug subblocks of plugged offline blocks. */
 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
-		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
-							 &nb_sb);
+		rc = virtio_mem_sbm_unplug_any_sb_offline(vm, mb_id, &nb_sb);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		cond_resched();
@@ -1559,8 +1557,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	/* Try to unplug subblocks of partially plugged online blocks. */
 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
 				       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
-		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
-							&nb_sb);
+		rc = virtio_mem_sbm_unplug_any_sb_online(vm, mb_id, &nb_sb);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		mutex_unlock(&vm->hotplug_mutex);
@@ -1570,8 +1567,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 
 	/* Try to unplug subblocks of plugged online blocks. */
 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
-		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
-							&nb_sb);
+		rc = virtio_mem_sbm_unplug_any_sb_online(vm, mb_id, &nb_sb);
 		if (rc || !nb_sb)
 			goto out_unlock;
 		mutex_unlock(&vm->hotplug_mutex);
@@ -1596,7 +1592,7 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
 	int rc;
 
 	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
-		rc = virtio_mem_mb_unplug(vm, mb_id);
+		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
 		if (rc)
 			return rc;
 		virtio_mem_sbm_set_mb_state(vm, mb_id,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (18 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 19/29] virito-mem: existing (un)plug functions are specific to Sub Block Mode (SBM) David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:51   ` Wei Yang
  2020-10-16  8:53   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 21/29] virtio-mem: memory notifier callbacks " David Hildenbrand
                   ` (10 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index fc2b1ff3beed..3a772714fec9 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -96,11 +96,6 @@ struct virtio_mem {
 	/* Maximum region size in bytes. */
 	uint64_t region_size;
 
-	/* The subblock size. */
-	uint64_t subblock_size;
-	/* The number of subblocks per memory block. */
-	uint32_t nb_sb_per_mb;
-
 	/* Id of the first memory block of this device. */
 	unsigned long first_mb_id;
 	/* Id of the last usable memory block of this device. */
@@ -126,6 +121,11 @@ struct virtio_mem {
 	uint64_t offline_threshold;
 
 	struct {
+		/* The subblock size. */
+		uint64_t sb_size;
+		/* The number of subblocks per Linux memory block. */
+		uint32_t sbs_per_mb;
+
 		/* Summary of all memory block states. */
 		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
 
@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
 	const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
 
-	return (addr - mb_addr) / vm->subblock_size;
+	return (addr - mb_addr) / vm->sbm.sb_size;
 }
 
 /*
@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
 					  unsigned long mb_id, int sb_id)
 {
-	return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+	return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
 }
 
 /*
@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
 }
 
 /*
- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
  * none.
  */
 static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
 	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
 
 	return find_next_zero_bit(vm->sbm.sb_states,
-				  bit + vm->nb_sb_per_mb, bit) - bit;
+				  bit + vm->sbm.sbs_per_mb, bit) - bit;
 }
 
 /*
@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
 static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
 {
 	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
-	const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
-	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
+	const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
+	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
 	int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
 	unsigned long *new_bitmap, *old_bitmap;
@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
 					    unsigned long mb_id)
 {
-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
 	unsigned long pfn;
 	int sb_id;
 
-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
-			       sb_id * vm->subblock_size);
+			       sb_id * vm->sbm.sb_size);
 		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
 	}
 }
@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
 static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
 					     unsigned long mb_id)
 {
-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
 	unsigned long pfn;
 	int sb_id;
 
-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
 			continue;
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
-			       sb_id * vm->subblock_size);
+			       sb_id * vm->sbm.sb_size);
 		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
 	}
 }
@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
 				  int sb_id, int count)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
-			      sb_id * vm->subblock_size;
-	const uint64_t size = count * vm->subblock_size;
+			      sb_id * vm->sbm.sb_size;
+	const uint64_t size = count * vm->sbm.sb_size;
 	int rc;
 
 	dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
 				    int sb_id, int count)
 {
 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
-			      sb_id * vm->subblock_size;
-	const uint64_t size = count * vm->subblock_size;
+			      sb_id * vm->sbm.sb_size;
+	const uint64_t size = count * vm->sbm.sb_size;
 	int rc;
 
 	dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
 	int sb_id, count;
 	int rc;
 
-	sb_id = vm->nb_sb_per_mb - 1;
+	sb_id = vm->sbm.sbs_per_mb - 1;
 	while (*nb_sb) {
 		/* Find the next candidate subblock */
 		while (sb_id >= 0 &&
@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
  */
 static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
 {
-	uint64_t nb_sb = vm->nb_sb_per_mb;
+	uint64_t nb_sb = vm->sbm.sbs_per_mb;
 
 	return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
 }
@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
 static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
 					  unsigned long mb_id, uint64_t *nb_sb)
 {
-	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
+	const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
 	int rc;
 
 	if (WARN_ON_ONCE(!count))
@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
 	 * Mark the block properly offline before adding it to Linux,
 	 * so the memory notifiers will find the block in the right state.
 	 */
-	if (count == vm->nb_sb_per_mb)
+	if (count == vm->sbm.sbs_per_mb)
 		virtio_mem_sbm_set_mb_state(vm, mb_id,
 					    VIRTIO_MEM_SBM_MB_OFFLINE);
 	else
@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
 
 	while (*nb_sb) {
 		sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
-		if (sb_id >= vm->nb_sb_per_mb)
+		if (sb_id >= vm->sbm.sbs_per_mb)
 			break;
 		count = 1;
 		while (count < *nb_sb &&
-		       sb_id + count < vm->nb_sb_per_mb &&
+		       sb_id + count < vm->sbm.sbs_per_mb &&
 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
 			count++;
 
@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
 
 		/* fake-online the pages if the memory block is online */
 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
-			       sb_id * vm->subblock_size);
-		nr_pages = PFN_DOWN(count * vm->subblock_size);
+			       sb_id * vm->sbm.sb_size);
+		nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
 		virtio_mem_fake_online(pfn, nr_pages);
 	}
 
-	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
 		if (online)
 			virtio_mem_sbm_set_mb_state(vm, mb_id,
 						    VIRTIO_MEM_SBM_MB_ONLINE);
@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
  */
 static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 {
-	uint64_t nb_sb = diff / vm->subblock_size;
+	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
 	int rc;
 
@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
 	rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
 
 	/* some subblocks might have been unplugged even on failure */
-	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
+	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
 		virtio_mem_sbm_set_mb_state(vm, mb_id,
 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 	if (rc)
 		return rc;
 
-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
 		/*
 		 * Remove the block from Linux - this should never fail.
 		 * Hinder the block from getting onlined by marking it
@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
 					   unsigned long mb_id, int sb_id,
 					   int count)
 {
-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
 	unsigned long start_pfn;
 	int rc;
 
 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
-			     sb_id * vm->subblock_size);
+			     sb_id * vm->sbm.sb_size);
 
 	rc = virtio_mem_fake_offline(start_pfn, nr_pages);
 	if (rc)
@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
 	int rc, sb_id;
 
 	/* If possible, try to unplug the complete block in one shot. */
-	if (*nb_sb >= vm->nb_sb_per_mb &&
-	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (*nb_sb >= vm->sbm.sbs_per_mb &&
+	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
 		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
-						     vm->nb_sb_per_mb);
+						     vm->sbm.sbs_per_mb);
 		if (!rc) {
-			*nb_sb -= vm->nb_sb_per_mb;
+			*nb_sb -= vm->sbm.sbs_per_mb;
 			goto unplugged;
 		} else if (rc != -EBUSY)
 			return rc;
 	}
 
 	/* Fallback to single subblocks. */
-	for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
+	for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
 		/* Find the next candidate subblock */
 		while (sb_id >= 0 &&
 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
 	 * remove it. This will usually not fail, as no memory is in use
 	 * anymore - however some other notifiers might NACK the request.
 	 */
-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
 		mutex_unlock(&vm->hotplug_mutex);
 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
 		mutex_lock(&vm->hotplug_mutex);
@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
  */
 static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 {
-	uint64_t nb_sb = diff / vm->subblock_size;
+	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
 	int rc;
 
@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-				  pageblock_nr_pages) * PAGE_SIZE;
-	vm->subblock_size = max_t(uint64_t, vm->device_block_size,
-				  vm->subblock_size);
-	vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
+	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+				pageblock_nr_pages) * PAGE_SIZE;
+	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
+				vm->sbm.sb_size);
+	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
 
 	/* Round up to the next full memory block */
 	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
 		 memory_block_size_bytes());
 	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
-		 (unsigned long long)vm->subblock_size);
+		 (unsigned long long)vm->sbm.sb_size);
 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
 
-- 
2.26.2



* [PATCH v1 21/29] virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (19 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size " David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-19  1:57   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 22/29] virtio-mem: memory block ids " David Hildenbrand
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's rename accordingly.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 3a772714fec9..d06c8760b337 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
 	return start >= vm->addr && start + size <= vm->addr + vm->region_size;
 }
 
-static int virtio_mem_notify_going_online(struct virtio_mem *vm,
-					  unsigned long mb_id)
+static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
+					      unsigned long mb_id)
 {
 	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
 	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
@@ -604,8 +604,8 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
 	return NOTIFY_BAD;
 }
 
-static void virtio_mem_notify_offline(struct virtio_mem *vm,
-				      unsigned long mb_id)
+static void virtio_mem_sbm_notify_offline(struct virtio_mem *vm,
+					  unsigned long mb_id)
 {
 	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
 	case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
@@ -622,7 +622,8 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 	}
 }
 
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
+static void virtio_mem_sbm_notify_online(struct virtio_mem *vm,
+					 unsigned long mb_id)
 {
 	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
 	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
@@ -639,8 +640,8 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 	}
 }
 
-static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
-					    unsigned long mb_id)
+static void virtio_mem_sbm_notify_going_offline(struct virtio_mem *vm,
+						unsigned long mb_id)
 {
 	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
 	unsigned long pfn;
@@ -655,8 +656,8 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
 	}
 }
 
-static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
-					     unsigned long mb_id)
+static void virtio_mem_sbm_notify_cancel_offline(struct virtio_mem *vm,
+						 unsigned long mb_id)
 {
 	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
 	unsigned long pfn;
@@ -716,7 +717,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		virtio_mem_notify_going_offline(vm, mb_id);
+		virtio_mem_sbm_notify_going_offline(vm, mb_id);
 		break;
 	case MEM_GOING_ONLINE:
 		mutex_lock(&vm->hotplug_mutex);
@@ -726,10 +727,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		rc = virtio_mem_notify_going_online(vm, mb_id);
+		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
 		break;
 	case MEM_OFFLINE:
-		virtio_mem_notify_offline(vm, mb_id);
+		virtio_mem_sbm_notify_offline(vm, mb_id);
 
 		atomic64_add(size, &vm->offline_size);
 		/*
@@ -743,7 +744,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		virtio_mem_notify_online(vm, mb_id);
+		virtio_mem_sbm_notify_online(vm, mb_id);
 
 		atomic64_sub(size, &vm->offline_size);
 		/*
@@ -762,7 +763,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	case MEM_CANCEL_OFFLINE:
 		if (!vm->hotplug_active)
 			break;
-		virtio_mem_notify_cancel_offline(vm, mb_id);
+		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
-- 
2.26.2



* [PATCH v1 22/29] virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (20 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 21/29] virtio-mem: memory notifier callbacks " David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:54   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux David Hildenbrand
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's move first_mb_id/next_mb_id/last_usable_mb_id accordingly.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 44 ++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index d06c8760b337..d3ab04f655ee 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -96,13 +96,6 @@ struct virtio_mem {
 	/* Maximum region size in bytes. */
 	uint64_t region_size;
 
-	/* Id of the first memory block of this device. */
-	unsigned long first_mb_id;
-	/* Id of the last usable memory block of this device. */
-	unsigned long last_usable_mb_id;
-	/* Id of the next memory bock to prepare when needed. */
-	unsigned long next_mb_id;
-
 	/* The parent resource for all memory added via this device. */
 	struct resource *parent_resource;
 	/*
@@ -121,6 +114,13 @@ struct virtio_mem {
 	uint64_t offline_threshold;
 
 	struct {
+		/* Id of the first memory block of this device. */
+		unsigned long first_mb_id;
+		/* Id of the last usable memory block of this device. */
+		unsigned long last_usable_mb_id;
+		/* Id of the next memory bock to prepare when needed. */
+		unsigned long next_mb_id;
+
 		/* The subblock size. */
 		uint64_t sb_size;
 		/* The number of subblocks per Linux memory block. */
@@ -265,7 +265,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
 static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
 					unsigned long mb_id, uint8_t state)
 {
-	const unsigned long idx = mb_id - vm->first_mb_id;
+	const unsigned long idx = mb_id - vm->sbm.first_mb_id;
 	uint8_t old_state;
 
 	old_state = vm->sbm.mb_states[idx];
@@ -282,7 +282,7 @@ static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
 static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
 					   unsigned long mb_id)
 {
-	const unsigned long idx = mb_id - vm->first_mb_id;
+	const unsigned long idx = mb_id - vm->sbm.first_mb_id;
 
 	return vm->sbm.mb_states[idx];
 }
@@ -292,7 +292,7 @@ static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
  */
 static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 {
-	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
+	unsigned long old_bytes = vm->sbm.next_mb_id - vm->sbm.first_mb_id;
 	unsigned long new_bytes = old_bytes + 1;
 	int old_pages = PFN_UP(old_bytes);
 	int new_pages = PFN_UP(new_bytes);
@@ -316,14 +316,14 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 }
 
 #define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
-	for (_mb_id = _vm->first_mb_id; \
-	     _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
+	for (_mb_id = _vm->sbm.first_mb_id; \
+	     _mb_id < _vm->sbm.next_mb_id && _vm->sbm.mb_count[_state]; \
 	     _mb_id++) \
 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
 
 #define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
-	for (_mb_id = _vm->next_mb_id - 1; \
-	     _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
+	for (_mb_id = _vm->sbm.next_mb_id - 1; \
+	     _mb_id >= _vm->sbm.first_mb_id && _vm->sbm.mb_count[_state]; \
 	     _mb_id--) \
 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
 
@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
 static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
 					  unsigned long mb_id, int sb_id)
 {
-	return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
+	return (mb_id - vm->sbm.first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
 }
 
 /*
@@ -414,7 +414,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
  */
 static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
 {
-	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
+	const unsigned long old_nb_mb = vm->sbm.next_mb_id - vm->sbm.first_mb_id;
 	const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
 	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
@@ -1177,7 +1177,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
 {
 	int rc;
 
-	if (vm->next_mb_id > vm->last_usable_mb_id)
+	if (vm->sbm.next_mb_id > vm->sbm.last_usable_mb_id)
 		return -ENOSPC;
 
 	/* Resize the state array if required. */
@@ -1191,7 +1191,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
 		return rc;
 
 	vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
-	*mb_id = vm->next_mb_id++;
+	*mb_id = vm->sbm.next_mb_id++;
 	return 0;
 }
 
@@ -1622,7 +1622,7 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
 			usable_region_size, &usable_region_size);
 	end_addr = vm->addr + usable_region_size;
 	end_addr = min(end_addr, phys_limit);
-	vm->last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
+	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
 
 	/* see if there is a request to change the size */
 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
@@ -1813,9 +1813,9 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
 
 	/* Round up to the next full memory block */
-	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
-						   memory_block_size_bytes());
-	vm->next_mb_id = vm->first_mb_id;
+	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
+						       memory_block_size_bytes());
+	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
 
 	/* Prepare the offline threshold - make sure we can add two blocks. */
 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (21 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 22/29] virtio-mem: memory block ids " David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  8:59   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request() David Hildenbrand
                   ` (7 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's use wrappers around low-level functions that dev_dbg()/dev_warn()
and work on addr + size, such that we can reuse the low-level functions
for adding/removing memory at other granularities.

We only warn when adding memory fails, because that's something to pay
attention to. We don't warn when removing fails; we'll soon reuse that
path in a racy context (and we do have proper BUG_ON() statements in the
current cases where it must never happen).

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 107 ++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 34 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index d3ab04f655ee..eb2ad31a8d8a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -453,18 +453,16 @@ static bool virtio_mem_could_add_memory(struct virtio_mem *vm, uint64_t size)
 }
 
 /*
- * Try to add a memory block to Linux. This will usually only fail
- * if out of memory.
+ * Try adding memory to Linux. Will usually only fail if out of memory.
  *
  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
  * onlining code).
  *
- * Will not modify the state of the memory block.
+ * Will not modify the state of memory blocks in virtio-mem.
  */
-static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
+static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr,
+				 uint64_t size)
 {
-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	const uint64_t size = memory_block_size_bytes();
 	int rc;
 
 	/*
@@ -478,32 +476,50 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 			return -ENOMEM;
 	}
 
-	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
+	dev_dbg(&vm->vdev->dev, "adding memory: 0x%llx - 0x%llx\n", addr,
+		addr + size - 1);
 	/* Memory might get onlined immediately. */
 	atomic64_add(size, &vm->offline_size);
 	rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
 				       MEMHP_MERGE_RESOURCE);
-	if (rc)
+	if (rc) {
 		atomic64_sub(size, &vm->offline_size);
+		dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
+		/*
+		 * TODO: Linux MM does not properly clean up yet in all cases
+		 * where adding of memory failed - especially on -ENOMEM.
+		 */
+	}
 	return rc;
 }
 
 /*
- * Try to remove a memory block from Linux. Will only fail if the memory block
- * is not offline.
+ * See virtio_mem_add_memory(): Try adding a single Linux memory block.
+ */
+static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
+{
+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
+
+	return virtio_mem_add_memory(vm, addr, size);
+}
+
+/*
+ * Try removing memory from Linux. Will only fail if memory blocks aren't
+ * offline.
  *
  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
  * onlining code).
  *
- * Will not modify the state of the memory block.
+ * Will not modify the state of memory blocks in virtio-mem.
  */
-static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
+static int virtio_mem_remove_memory(struct virtio_mem *vm, uint64_t addr,
+				    uint64_t size)
 {
-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	const uint64_t size = memory_block_size_bytes();
 	int rc;
 
-	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
+	dev_dbg(&vm->vdev->dev, "removing memory: 0x%llx - 0x%llx\n", addr,
+		addr + size - 1);
 	rc = remove_memory(vm->nid, addr, size);
 	if (!rc) {
 		atomic64_sub(size, &vm->offline_size);
@@ -512,27 +528,41 @@ static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
 		 * immediately instead of waiting.
 		 */
 		virtio_mem_retry(vm);
+	} else {
+		dev_dbg(&vm->vdev->dev, "removing memory failed: %d\n", rc);
 	}
 	return rc;
 }
 
 /*
- * Try to offline and remove a memory block from Linux.
+ * See virtio_mem_remove_memory(): Try removing a single Linux memory block.
+ */
+static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
+{
+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
+
+	return virtio_mem_remove_memory(vm, addr, size);
+}
+
+/*
+ * Try offlining and removing memory from Linux.
  *
  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
  * onlining code).
  *
- * Will not modify the state of the memory block.
+ * Will not modify the state of memory blocks in virtio-mem.
  */
-static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
-					    unsigned long mb_id)
+static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
+						uint64_t addr,
+						uint64_t size)
 {
-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
-	const uint64_t size = memory_block_size_bytes();
 	int rc;
 
-	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
-		mb_id);
+	dev_dbg(&vm->vdev->dev,
+		"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
+		addr + size - 1);
+
 	rc = offline_and_remove_memory(vm->nid, addr, size);
 	if (!rc) {
 		atomic64_sub(size, &vm->offline_size);
@@ -541,10 +571,26 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
 		 * immediately instead of waiting.
 		 */
 		virtio_mem_retry(vm);
+	} else {
+		dev_dbg(&vm->vdev->dev,
+			"offlining and removing memory failed: %d\n", rc);
 	}
 	return rc;
 }
 
+/*
+ * See virtio_mem_offline_and_remove_memory(): Try offlining and removing
+ * a single Linux memory block.
+ */
+static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
+						unsigned long mb_id)
+{
+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+	const uint64_t size = memory_block_size_bytes();
+
+	return virtio_mem_offline_and_remove_memory(vm, addr, size);
+}
+
 /*
  * Trigger the workqueue so the device can perform its magic.
  */
@@ -1230,17 +1276,10 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
 
 	/* Add the memory block to linux - if that fails, try to unplug. */
-	rc = virtio_mem_mb_add(vm, mb_id);
+	rc = virtio_mem_sbm_add_mb(vm, mb_id);
 	if (rc) {
 		int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
 
-		dev_err(&vm->vdev->dev,
-			"adding memory block %lu failed with %d\n", mb_id, rc);
-
-		/*
-		 * TODO: Linux MM does not properly clean up yet in all cases
-		 * where adding of memory failed - especially on -ENOMEM.
-		 */
 		if (virtio_mem_sbm_unplug_sb(vm, mb_id, 0, count))
 			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
 		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
@@ -1411,7 +1450,7 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
 					    VIRTIO_MEM_SBM_MB_UNUSED);
 
 		mutex_unlock(&vm->hotplug_mutex);
-		rc = virtio_mem_mb_remove(vm, mb_id);
+		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
 		BUG_ON(rc);
 		mutex_lock(&vm->hotplug_mutex);
 	}
@@ -1504,7 +1543,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
 	 */
 	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
 		mutex_unlock(&vm->hotplug_mutex);
-		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
+		rc = virtio_mem_sbm_offline_and_remove_mb(vm, mb_id);
 		mutex_lock(&vm->hotplug_mutex);
 		if (!rc)
 			virtio_mem_sbm_set_mb_state(vm, mb_id,
@@ -1991,7 +2030,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	 */
 	virtio_mem_sbm_for_each_mb(vm, mb_id,
 				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
-		rc = virtio_mem_mb_remove(vm, mb_id);
+		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
 		BUG_ON(rc);
 		virtio_mem_sbm_set_mb_state(vm, mb_id,
 					    VIRTIO_MEM_SBM_MB_UNUSED);
-- 
2.26.2



* [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request()
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (22 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  9:07   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug David Hildenbrand
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta

Let's move the existing dev_dbg() calls into the functions, print when
something goes wrong, and also print for virtio_mem_send_unplug_all_request().
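The resulting structure can be sketched outside the kernel like this (toy response codes and a plain fprintf() standing in for dev_dbg(); names are illustrative, not the driver's): all failure cases break to a single exit that prints the errno, while success returns early without printing.

```c
#include <errno.h>
#include <stdio.h>

/* Toy stand-ins for the virtio-mem response codes. */
enum {
	RESP_ACK = 0,
	RESP_NACK,
	RESP_BUSY,
	RESP_ERROR,
};

/*
 * Map a device response to an errno, printing only on failure.
 * Mirrors the break-to-common-exit shape of
 * virtio_mem_send_plug_request() after this patch.
 */
static int resp_to_errno(int resp)
{
	int rc = -ENOMEM; /* default for unknown responses */

	switch (resp) {
	case RESP_ACK:
		return 0;
	case RESP_NACK:
		rc = -EAGAIN;
		break;
	case RESP_BUSY:
		rc = -ETXTBSY;
		break;
	case RESP_ERROR:
		rc = -EINVAL;
		break;
	default:
		break;
	}

	fprintf(stderr, "plugging memory failed: %d\n", rc);
	return rc;
}
```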

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 50 ++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index eb2ad31a8d8a..e68d0d99590c 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1053,23 +1053,33 @@ static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
 		.u.plug.addr = cpu_to_virtio64(vm->vdev, addr),
 		.u.plug.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks),
 	};
+	int rc = -ENOMEM;
 
 	if (atomic_read(&vm->config_changed))
 		return -EAGAIN;
 
+	dev_dbg(&vm->vdev->dev, "plugging memory: 0x%llx - 0x%llx\n", addr,
+		addr + size - 1);
+
 	switch (virtio_mem_send_request(vm, &req)) {
 	case VIRTIO_MEM_RESP_ACK:
 		vm->plugged_size += size;
 		return 0;
 	case VIRTIO_MEM_RESP_NACK:
-		return -EAGAIN;
+		rc = -EAGAIN;
+		break;
 	case VIRTIO_MEM_RESP_BUSY:
-		return -ETXTBSY;
+		rc = -ETXTBSY;
+		break;
 	case VIRTIO_MEM_RESP_ERROR:
-		return -EINVAL;
+		rc = -EINVAL;
+		break;
 	default:
-		return -ENOMEM;
+		break;
 	}
+
+	dev_dbg(&vm->vdev->dev, "plugging memory failed: %d\n", rc);
+	return rc;
 }
 
 static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
@@ -1081,21 +1091,30 @@ static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
 		.u.unplug.addr = cpu_to_virtio64(vm->vdev, addr),
 		.u.unplug.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks),
 	};
+	int rc = -ENOMEM;
 
 	if (atomic_read(&vm->config_changed))
 		return -EAGAIN;
 
+	dev_dbg(&vm->vdev->dev, "unplugging memory: 0x%llx - 0x%llx\n", addr,
+		addr + size - 1);
+
 	switch (virtio_mem_send_request(vm, &req)) {
 	case VIRTIO_MEM_RESP_ACK:
 		vm->plugged_size -= size;
 		return 0;
 	case VIRTIO_MEM_RESP_BUSY:
-		return -ETXTBSY;
+		rc = -ETXTBSY;
+		break;
 	case VIRTIO_MEM_RESP_ERROR:
-		return -EINVAL;
+		rc = -EINVAL;
+		break;
 	default:
-		return -ENOMEM;
+		break;
 	}
+
+	dev_dbg(&vm->vdev->dev, "unplugging memory failed: %d\n", rc);
+	return rc;
 }
 
 static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
@@ -1103,6 +1122,9 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
 	const struct virtio_mem_req req = {
 		.type = cpu_to_virtio16(vm->vdev, VIRTIO_MEM_REQ_UNPLUG_ALL),
 	};
+	int rc = -ENOMEM;
+
+	dev_dbg(&vm->vdev->dev, "unplugging all memory");
 
 	switch (virtio_mem_send_request(vm, &req)) {
 	case VIRTIO_MEM_RESP_ACK:
@@ -1112,10 +1134,14 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
 		atomic_set(&vm->config_changed, 1);
 		return 0;
 	case VIRTIO_MEM_RESP_BUSY:
-		return -ETXTBSY;
+		rc = -ETXTBSY;
+		break;
 	default:
-		return -ENOMEM;
+		break;
 	}
+
+	dev_dbg(&vm->vdev->dev, "unplugging all memory failed: %d\n", rc);
+	return rc;
 }
 
 /*
@@ -1130,9 +1156,6 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
 	const uint64_t size = count * vm->sbm.sb_size;
 	int rc;
 
-	dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
-		sb_id, sb_id + count - 1);
-
 	rc = virtio_mem_send_plug_request(vm, addr, size);
 	if (!rc)
 		virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
@@ -1151,9 +1174,6 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
 	const uint64_t size = count * vm->sbm.sb_size;
 	int rc;
 
-	dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
-		mb_id, sb_id, sb_id + count - 1);
-
 	rc = virtio_mem_send_unplug_request(vm, addr, size);
 	if (!rc)
 		virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
-- 
2.26.2



* [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (23 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request() David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-16  9:38   ` Wei Yang
  2020-10-19  2:26   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 26/29] virtio-mem: allow to force Big Block Mode (BBM) and set the big block size David Hildenbrand
                   ` (5 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

Currently, we do not support device block sizes that exceed the Linux
memory block size. For example, having a device block size of 1 GiB (e.g.,
gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
blocks.

Let's implement Big Block Mode (BBM), whereby we add/remove at least
one Linux memory block at a time. With a 1 GiB device block size, a Big
Block (BB) will cover 8 Linux memory blocks.

We'll keep registering the online_page_callback machinery; it will be
used for safe memory hotunplug in BBM next.

Note: BBM is properly prepared for the variable-sized Linux memory
blocks that we might see in the future. We therefore don't care how many
Linux memory blocks a big block actually spans, or how the memory
notifier is called.
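The big-block arithmetic can be illustrated with a small userspace sketch (assumed sizes: 1 GiB device/big blocks over 128 MiB Linux memory blocks; the helpers mirror virtio_mem_phys_to_bb_id() / virtio_mem_bb_id_to_phys() but are simplified toys, not the driver code):

```c
#include <stdint.h>

/* Assumed sizes for illustration only. */
#define BB_SIZE           (1024ULL * 1024 * 1024) /* 1 GiB big block */
#define MEMORY_BLOCK_SIZE (128ULL * 1024 * 1024)  /* 128 MiB Linux block */

/* Address -> big block id (cf. virtio_mem_phys_to_bb_id()). */
static unsigned long phys_to_bb_id(uint64_t addr)
{
	return addr / BB_SIZE;
}

/* Big block id -> physical start address (cf. virtio_mem_bb_id_to_phys()). */
static uint64_t bb_id_to_phys(unsigned long bb_id)
{
	return bb_id * BB_SIZE;
}

/* How many Linux memory blocks one big block covers: 1 GiB / 128 MiB = 8. */
static unsigned long mbs_per_bb(void)
{
	return BB_SIZE / MEMORY_BLOCK_SIZE;
}
```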

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
 1 file changed, 402 insertions(+), 82 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e68d0d99590c..4d396ef98a92 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
 /*
  * virtio-mem currently supports the following modes of operation:
  *
- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
  *   size of a Sub Block (SB) is determined based on the device block size, the
  *   pageblock size, and the maximum allocation granularity of the buddy.
  *   Subblocks within a Linux memory block might either be plugged or unplugged.
  *   Memory is added/removed to Linux MM in Linux memory block granularity.
  *
+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
+ *   Memory is added/removed to Linux MM in Big Block granularity.
+ *
+ * The mode is determined automatically based on the Linux memory block size
+ * and the device block size.
+ *
  * User space / core MM (auto onlining) is responsible for onlining added
  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
  * always onlined separately, and all memory within a Linux memory block is
@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
 	VIRTIO_MEM_SBM_MB_COUNT
 };
 
+/*
+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
+ */
+enum virtio_mem_bbm_bb_state {
+	/* Unplugged, not added to Linux. Can be reused later. */
+	VIRTIO_MEM_BBM_BB_UNUSED = 0,
+	/* Plugged, not added to Linux. Error on add_memory(). */
+	VIRTIO_MEM_BBM_BB_PLUGGED,
+	/* Plugged and added to Linux. */
+	VIRTIO_MEM_BBM_BB_ADDED,
+	VIRTIO_MEM_BBM_BB_COUNT
+};
+
 struct virtio_mem {
 	struct virtio_device *vdev;
 
@@ -113,6 +132,9 @@ struct virtio_mem {
 	atomic64_t offline_size;
 	uint64_t offline_threshold;
 
+	/* If set, the driver is in SBM, otherwise in BBM. */
+	bool in_sbm;
+
 	struct {
 		/* Id of the first memory block of this device. */
 		unsigned long first_mb_id;
@@ -151,9 +173,27 @@ struct virtio_mem {
 		unsigned long *sb_states;
 	} sbm;
 
+	struct {
+		/* Id of the first big block of this device. */
+		unsigned long first_bb_id;
+		/* Id of the last usable big block of this device. */
+		unsigned long last_usable_bb_id;
+		/* Id of the next device block to prepare when needed. */
+		unsigned long next_bb_id;
+
+		/* Summary of all big block states. */
+		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
+
+		/* One byte state per big block. See sbm.mb_states. */
+		uint8_t *bb_states;
+
+		/* The block size used for (un)plugged, adding/removing. */
+		uint64_t bb_size;
+	} bbm;
+
 	/*
-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
-	 * sbm.sb_states.
+	 * Mutex that protects the sbm.mb_count, sbm.mb_states,
+	 * sbm.sb_states, bbm.bb_count, and bbm.bb_states.
 	 *
 	 * When this lock is held the pointers can't change, ONLINE and
 	 * OFFLINE blocks can't change the state and no subblocks will get
@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
 	return mb_id * memory_block_size_bytes();
 }
 
+/*
+ * Calculate the big block id of a given address.
+ */
+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
+					      uint64_t addr)
+{
+	return addr / vm->bbm.bb_size;
+}
+
+/*
+ * Calculate the physical start address of a given big block id.
+ */
+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
+					 unsigned long bb_id)
+{
+	return bb_id * vm->bbm.bb_size;
+}
+
 /*
  * Calculate the subblock id of a given address.
  */
@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
 	return (addr - mb_addr) / vm->sbm.sb_size;
 }
 
+/*
+ * Set the state of a big block, taking care of the state counter.
+ */
+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
+					unsigned long bb_id,
+					enum virtio_mem_bbm_bb_state state)
+{
+	const unsigned long idx = bb_id - vm->bbm.first_bb_id;
+	enum virtio_mem_bbm_bb_state old_state;
+
+	old_state = vm->bbm.bb_states[idx];
+	vm->bbm.bb_states[idx] = state;
+
+	BUG_ON(vm->bbm.bb_count[old_state] == 0);
+	vm->bbm.bb_count[old_state]--;
+	vm->bbm.bb_count[state]++;
+}
+
+/*
+ * Get the state of a big block.
+ */
+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
+								unsigned long bb_id)
+{
+	return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
+}
+
+/*
+ * Prepare the big block state array for the next big block.
+ */
+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
+{
+	unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
+	unsigned long new_bytes = old_bytes + 1;
+	int old_pages = PFN_UP(old_bytes);
+	int new_pages = PFN_UP(new_bytes);
+	uint8_t *new_array;
+
+	if (vm->bbm.bb_states && old_pages == new_pages)
+		return 0;
+
+	new_array = vzalloc(new_pages * PAGE_SIZE);
+	if (!new_array)
+		return -ENOMEM;
+
+	mutex_lock(&vm->hotplug_mutex);
+	if (vm->bbm.bb_states)
+		memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
+	vfree(vm->bbm.bb_states);
+	vm->bbm.bb_states = new_array;
+	mutex_unlock(&vm->hotplug_mutex);
+
+	return 0;
+}
+
+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
+	for (_bb_id = vm->bbm.first_bb_id; \
+	     _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
+	     _bb_id++) \
+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
+
 /*
  * Set the state of a memory block, taking care of the state counter.
  */
@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
 	return virtio_mem_add_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_add_memory(): Try adding a big block.
+ */
+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_add_memory(vm, addr, size);
+}
+
 /*
  * Try removing memory from Linux. Will only fail if memory blocks aren't
  * offline.
@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	struct memory_notify *mhp = arg;
 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
 	int rc = NOTIFY_OK;
+	unsigned long id;
 
 	if (!virtio_mem_overlaps_range(vm, start, size))
 		return NOTIFY_DONE;
 
-	/*
-	 * Memory is onlined/offlined in memory block granularity. We cannot
-	 * cross virtio-mem device boundaries and memory block boundaries. Bail
-	 * out if this ever changes.
-	 */
-	if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
-			 !IS_ALIGNED(start, memory_block_size_bytes())))
-		return NOTIFY_BAD;
+	if (vm->in_sbm) {
+		id = virtio_mem_phys_to_mb_id(start);
+		/*
+		 * In SBM, we add memory in separate memory blocks - we expect
+		 * it to be onlined/offlined in the same granularity. Bail out
+		 * if this ever changes.
+		 */
+		if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
+				 !IS_ALIGNED(start, memory_block_size_bytes())))
+			return NOTIFY_BAD;
+	} else {
+		id = virtio_mem_phys_to_bb_id(vm, start);
+		/*
+		 * In BBM, we only care about onlining/offlining happening
+		 * within a single big block, we don't care about the
+		 * actual granularity as we don't track individual Linux
+		 * memory blocks.
+		 */
+		if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
+			return NOTIFY_BAD;
+	}
 
 	/*
 	 * Avoid circular locking lockdep warnings. We lock the mutex
@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		virtio_mem_sbm_notify_going_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_going_offline(vm, id);
 		break;
 	case MEM_GOING_ONLINE:
 		mutex_lock(&vm->hotplug_mutex);
@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
+		if (vm->in_sbm)
+			rc = virtio_mem_sbm_notify_going_online(vm, id);
 		break;
 	case MEM_OFFLINE:
-		virtio_mem_sbm_notify_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_offline(vm, id);
 
 		atomic64_add(size, &vm->offline_size);
 		/*
@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		virtio_mem_sbm_notify_online(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_online(vm, id);
 
 		atomic64_sub(size, &vm->offline_size);
 		/*
@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	case MEM_CANCEL_OFFLINE:
 		if (!vm->hotplug_active)
 			break;
-		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_cancel_offline(vm, id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 {
 	const unsigned long addr = page_to_phys(page);
-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
+	unsigned long id, sb_id;
 	struct virtio_mem *vm;
-	int sb_id;
+	bool do_online;
 
-	/*
-	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
-	 * size/alignment and that this callback is is called with such a
-	 * size/alignment. So we cannot cross subblocks and therefore
-	 * also not memory blocks.
-	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
 		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
 			continue;
 
-		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
-		/*
-		 * If plugged, online the pages, otherwise, set them fake
-		 * offline (PageOffline).
-		 */
-		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
+		if (vm->in_sbm) {
+			/*
+			 * We exploit here that subblocks have at least
+			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
+			 * cross subblocks within one call.
+			 */
+			id = virtio_mem_phys_to_mb_id(addr);
+			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
+			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
+								   sb_id, 1);
+		} else {
+			do_online = true;
+		}
+		if (do_online)
 			generic_online_page(page, order);
 		else
 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
 	return rc;
 }
 
+/*
+ * Request to unplug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_send_unplug_request(vm, addr, size);
+}
+
+/*
+ * Request to plug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_send_plug_request(vm, addr, size);
+}
+
 /*
  * Unplug the desired number of plugged subblocks of a offline or not-added
  * memory block. Will fail if any subblock cannot get unplugged (instead of
@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
 	return 0;
 }
 
-/*
- * Try to plug the requested amount of memory.
- */
-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Plug a big block and add it to Linux.
+ *
+ * Will modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
+					  unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_UNUSED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_plug_bb(vm, bb_id);
+	if (rc)
+		return rc;
+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
+
+	rc = virtio_mem_bbm_add_bb(vm, bb_id);
+	if (rc) {
+		if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
+			virtio_mem_bbm_set_bb_state(vm, bb_id,
+						    VIRTIO_MEM_BBM_BB_UNUSED);
+		else
+			/* Retry from the main loop. */
+			virtio_mem_bbm_set_bb_state(vm, bb_id,
+						    VIRTIO_MEM_BBM_BB_PLUGGED);
+		return rc;
+	}
+	return 0;
+}
+
+/*
+ * Prepare tracking data for the next big block.
+ */
+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
+					  unsigned long *bb_id)
+{
+	int rc;
+
+	if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
+		return -ENOSPC;
+
+	/* Resize the big block state array if required. */
+	rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
+	if (rc)
+		return rc;
+
+	vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
+	*bb_id = vm->bbm.next_bb_id;
+	vm->bbm.next_bb_id++;
+	return 0;
+}
+
+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	uint64_t nb_bb = diff / vm->bbm.bb_size;
+	unsigned long bb_id;
+	int rc;
+
+	if (!nb_bb)
+		return 0;
+
+	/* Try to plug and add unused big blocks */
+	virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+			return -ENOSPC;
+
+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+		cond_resched();
+	}
+
+	/* Try to prepare, plug and add new big blocks */
+	while (nb_bb) {
+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+			return -ENOSPC;
+
+		rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
+		if (rc)
+			return rc;
+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+		if (!rc)
+			nb_bb--;
+		if (rc)
+			return rc;
+		cond_resched();
+	}
+
+	return 0;
+}
+
+/*
+ * Try to plug the requested amount of memory.
+ */
+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	if (vm->in_sbm)
+		return virtio_mem_sbm_plug_request(vm, diff);
+	return virtio_mem_bbm_plug_request(vm, diff);
+}
+
 /*
  * Unplug the desired number of plugged subblocks of an offline memory block.
  * Will fail if any subblock cannot get unplugged (instead of skipping it).
@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
 	return 0;
 }
 
-/*
- * Try to unplug the requested amount of memory.
- */
-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Try to unplug the requested amount of memory.
+ */
+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	if (vm->in_sbm)
+		return virtio_mem_sbm_unplug_request(vm, diff);
+	return -EBUSY;
+}
+
 /*
  * Try to unplug all blocks that couldn't be unplugged before, for example,
  * because the hypervisor was busy.
  */
 static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
 {
-	unsigned long mb_id;
+	unsigned long id;
 	int rc;
 
-	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
-		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
+	if (!vm->in_sbm) {
+		virtio_mem_bbm_for_each_bb(vm, id,
+					   VIRTIO_MEM_BBM_BB_PLUGGED) {
+			rc = virtio_mem_bbm_unplug_bb(vm, id);
+			if (rc)
+				return rc;
+			virtio_mem_bbm_set_bb_state(vm, id,
+						    VIRTIO_MEM_BBM_BB_UNUSED);
+		}
+		return 0;
+	}
+
+	virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
+		rc = virtio_mem_sbm_unplug_mb(vm, id);
 		if (rc)
 			return rc;
-		virtio_mem_sbm_set_mb_state(vm, mb_id,
+		virtio_mem_sbm_set_mb_state(vm, id,
 					    VIRTIO_MEM_SBM_MB_UNUSED);
 	}
 
@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
 			usable_region_size, &usable_region_size);
 	end_addr = vm->addr + usable_region_size;
 	end_addr = min(end_addr, phys_limit);
-	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
+
+	if (vm->in_sbm)
+		vm->sbm.last_usable_mb_id =
+					 virtio_mem_phys_to_mb_id(end_addr) - 1;
+	else
+		vm->bbm.last_usable_bb_id =
+				     virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
 
 	/* see if there is a request to change the size */
 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
 static int virtio_mem_init(struct virtio_mem *vm)
 {
 	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
+	uint64_t sb_size, addr;
 	uint16_t node_id;
 
 	if (!vm->vdev->config->get) {
@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	if (vm->nid == NUMA_NO_NODE)
 		vm->nid = memory_add_physaddr_to_nid(vm->addr);
 
-	/*
-	 * We always hotplug memory in memory block granularity. This way,
-	 * we have to wait for exactly one memory block to online.
-	 */
-	if (vm->device_block_size > memory_block_size_bytes()) {
-		dev_err(&vm->vdev->dev,
-			"The block size is not supported (too big).\n");
-		return -EINVAL;
-	}
-
 	/* bad device setup - warn only */
 	if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
 		dev_warn(&vm->vdev->dev,
@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-				pageblock_nr_pages) * PAGE_SIZE;
-	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
-				vm->sbm.sb_size);
-	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
+	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
+
+	if (sb_size < memory_block_size_bytes()) {
+		/* SBM: At least two subblocks per Linux memory block. */
+		vm->in_sbm = true;
+		vm->sbm.sb_size = sb_size;
+		vm->sbm.sbs_per_mb = memory_block_size_bytes() /
+				     vm->sbm.sb_size;
+
+		/* Round up to the next full memory block */
+		addr = vm->addr + memory_block_size_bytes() - 1;
+		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
+		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+	} else {
+		/* BBM: At least one Linux memory block. */
+		vm->bbm.bb_size = vm->device_block_size;
 
-	/* Round up to the next full memory block */
-	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
-						       memory_block_size_bytes());
-	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
+		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
+	}
 
 	/* Prepare the offline threshold - make sure we can add two blocks. */
 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
+	/* In BBM, we also want at least two big blocks. */
+	vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
+				      vm->offline_threshold);
 
 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
 		 (unsigned long long)vm->device_block_size);
 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
 		 memory_block_size_bytes());
-	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
-		 (unsigned long long)vm->sbm.sb_size);
+	if (vm->in_sbm)
+		dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
+			 (unsigned long long)vm->sbm.sb_size);
+	else
+		dev_info(&vm->vdev->dev, "big block size: 0x%llx",
+			 (unsigned long long)vm->bbm.bb_size);
 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
 
@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vm->wq);
 	hrtimer_cancel(&vm->retry_timer);
 
-	/*
-	 * After we unregistered our callbacks, user space can online partially
-	 * plugged offline blocks. Make sure to remove them.
-	 */
-	virtio_mem_sbm_for_each_mb(vm, mb_id,
-				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
-		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
-		BUG_ON(rc);
-		virtio_mem_sbm_set_mb_state(vm, mb_id,
-					    VIRTIO_MEM_SBM_MB_UNUSED);
+	if (vm->in_sbm) {
+		/*
+		 * After we unregistered our callbacks, user space can online
+		 * partially plugged offline blocks. Make sure to remove them.
+		 */
+		virtio_mem_sbm_for_each_mb(vm, mb_id,
+					   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
+			rc = virtio_mem_sbm_remove_mb(vm, mb_id);
+			BUG_ON(rc);
+			virtio_mem_sbm_set_mb_state(vm, mb_id,
+						    VIRTIO_MEM_SBM_MB_UNUSED);
+		}
+		/*
+		 * After we unregistered our callbacks, user space can no longer
+		 * offline partially plugged online memory blocks. No need to
+		 * worry about them.
+		 */
 	}
-	/*
-	 * After we unregistered our callbacks, user space can no longer
-	 * offline partially plugged online memory blocks. No need to worry
-	 * about them.
-	 */
 
 	/* unregister callbacks */
 	unregister_virtio_mem_device(vm);
@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	}
 
 	/* remove all tracking data - no locking needed */
-	vfree(vm->sbm.mb_states);
-	vfree(vm->sbm.sb_states);
+	if (vm->in_sbm) {
+		vfree(vm->sbm.mb_states);
+		vfree(vm->sbm.sb_states);
+	} else {
+		vfree(vm->bbm.bb_states);
+	}
 
 	/* reset the device and cleanup the queues */
 	vdev->config->reset(vdev);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH v1 26/29] virtio-mem: allow to force Big Block Mode (BBM) and set the big block size
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (24 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-12 12:53 ` [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block David Hildenbrand
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

Let's allow forcing BBM, even if subblocks would be possible. Take care
to properly calculate the first big block id, because the start
address might no longer be aligned to the big block size.

Also, allow manually configuring the size of big blocks.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 4d396ef98a92..94cf44b15cbf 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -27,6 +27,16 @@ static bool unplug_online = true;
 module_param(unplug_online, bool, 0644);
 MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
 
+static bool force_bbm;
+module_param(force_bbm, bool, 0444);
+MODULE_PARM_DESC(force_bbm,
+		"Force Big Block Mode. Default is 0 (auto-selection)");
+
+static unsigned long bbm_block_size;
+module_param(bbm_block_size, ulong, 0444);
+MODULE_PARM_DESC(bbm_block_size,
+		 "Big Block size in bytes. Default is 0 (auto-detection).");
+
 /*
  * virtio-mem currently supports the following modes of operation:
  *
@@ -2164,7 +2174,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
 			pageblock_nr_pages) * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
-	if (sb_size < memory_block_size_bytes()) {
+	if (sb_size < memory_block_size_bytes() && !force_bbm) {
 		/* SBM: At least two subblocks per Linux memory block. */
 		vm->in_sbm = true;
 		vm->sbm.sb_size = sb_size;
@@ -2177,9 +2187,24 @@ static int virtio_mem_init(struct virtio_mem *vm)
 		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
 	} else {
 		/* BBM: At least one Linux memory block. */
-		vm->bbm.bb_size = vm->device_block_size;
+		vm->bbm.bb_size = max_t(uint64_t, vm->device_block_size,
+					memory_block_size_bytes());
+
+		if (bbm_block_size) {
+			if (!is_power_of_2(bbm_block_size)) {
+				dev_warn(&vm->vdev->dev,
+					 "bbm_block_size is not a power of 2");
+			} else if (bbm_block_size < vm->bbm.bb_size) {
+				dev_warn(&vm->vdev->dev,
+					 "bbm_block_size is too small");
+			} else {
+				vm->bbm.bb_size = bbm_block_size;
+			}
+		}
 
-		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
+		/* Round up to the next aligned big block */
+		addr = vm->addr + vm->bbm.bb_size - 1;
+		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, addr);
 		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
 	}
 
-- 
2.26.2



* [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (25 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 26/29] virtio-mem: allow to force Big Block Mode (BBM) and set the big block size David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-15 13:08   ` Michael S. Tsirkin
  2020-10-19  3:22   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug David Hildenbrand
                   ` (3 subsequent siblings)
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

virtio-mem soon wants to use offline_and_remove_memory() on memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.

Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).

This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128 MiB.

While we could compress the state into 2 bits, using 8 bits is much
easier.

This handling is similar, but different to acpi_scan_try_to_offline():

a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.

b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 89 insertions(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..217080ca93e5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t online_type = MMOP_ONLINE_KERNEL;
+	uint8_t **online_types = arg;
+	struct page *page;
+	int rc;
+
+	/*
+	 * Sense the online_type via the zone of the memory block. Offlining
+	 * with multiple zones within one memory block will be rejected
+	 * by offlining code ... so we don't care about that.
+	 */
+	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+		online_type = MMOP_ONLINE_MOVABLE;
+
+	rc = device_offline(&mem->dev);
+	/*
+	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+	 * so try_reonline_memory_block() can do the right thing.
+	 */
+	if (!rc)
+		**online_types = online_type;
+
+	(*online_types)++;
+	/* Ignore if already offline. */
+	return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t **online_types = arg;
+	int rc;
+
+	if (**online_types != MMOP_OFFLINE) {
+		mem->online_type = **online_types;
+		rc = device_online(&mem->dev);
+		if (rc < 0)
+			pr_warn("%s: Failed to re-online memory: %d",
+				__func__, rc);
+	}
+
+	/* Continue processing all remaining memory blocks. */
+	(*online_types)++;
+	return 0;
+}
+
 /*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
  */
 int offline_and_remove_memory(int nid, u64 start, u64 size)
 {
-	struct memory_block *mem;
-	int rc = -EINVAL;
+	const unsigned long mb_count = size / memory_block_size_bytes();
+	uint8_t *online_types, *tmp;
+	int rc;
 
 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
-	    size != memory_block_size_bytes())
-		return rc;
+	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+		return -EINVAL;
+
+	/*
+	 * We'll remember the old online type of each memory block, so we can
+	 * try to revert whatever we did when offlining one memory block fails
+	 * after offlining some others succeeded.
+	 */
+	online_types = kmalloc_array(mb_count, sizeof(*online_types),
+				     GFP_KERNEL);
+	if (!online_types)
+		return -ENOMEM;
+	/*
+	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
+	 * try_reonline_memory_block().
+	 */
+	memset(online_types, MMOP_OFFLINE, mb_count);
 
 	lock_device_hotplug();
-	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
-	if (mem)
-		rc = device_offline(&mem->dev);
-	/* Ignore if the device is already offline. */
-	if (rc > 0)
-		rc = 0;
+
+	tmp = online_types;
+	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
 
 	/*
-	 * In case we succeeded to offline the memory block, remove it.
+	 * In case we succeeded to offline all memory, remove it.
 	 * This cannot fail as it cannot get onlined in the meantime.
 	 */
 	if (!rc) {
 		rc = try_remove_memory(nid, start, size);
-		WARN_ON_ONCE(rc);
+		if (rc)
+			pr_err("%s: Failed to remove memory: %d", __func__, rc);
+	}
+
+	/*
+	 * Rollback what we did. While memory onlining might theoretically fail
+	 * (nacked by a notifier), it barely ever happens.
+	 */
+	if (rc) {
+		tmp = online_types;
+		walk_memory_blocks(start, size, &tmp,
+				   try_reonline_memory_block);
 	}
 	unlock_device_hotplug();
 
+	kfree(online_types);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
-- 
2.26.2



* [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (26 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-19  3:48   ` Wei Yang
  2020-10-12 12:53 ` [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe " David Hildenbrand
                   ` (2 subsequent siblings)
  30 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

Let's try to unplug completely offline big blocks first. Then (if
enabled via unplug_online), try to offline and remove whole big blocks.

No locking necessary - we can deal with concurrent onlining/offlining
just fine.

Note1: This is sub-optimal and might be dangerous in some environments: we
could end up in an infinite loop when offlining (e.g., due to long-term
pinnings), similar to DIMMs. We'll introduce safe memory hotunplug via
fake-offlining next, and use this basic mode only when explicitly enabled.

Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
with bigger block sizes.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 156 +++++++++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 94cf44b15cbf..6bcd0acbff32 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -388,6 +388,12 @@ static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
 	     _bb_id++) \
 		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
 
+#define virtio_mem_bbm_for_each_bb_rev(_vm, _bb_id, _state) \
+	for (_bb_id = _vm->bbm.next_bb_id - 1; \
+	     _bb_id >= _vm->bbm.first_bb_id && _vm->bbm.bb_count[_state]; \
+	     _bb_id--) \
+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
+
 /*
  * Set the state of a memory block, taking care of the state counter.
  */
@@ -685,6 +691,18 @@ static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
 	return virtio_mem_remove_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_remove_memory(): Try to remove all Linux memory blocks covered
+ * by the big block.
+ */
+static int virtio_mem_bbm_remove_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_remove_memory(vm, addr, size);
+}
+
 /*
  * Try offlining and removing memory from Linux.
  *
@@ -731,6 +749,19 @@ static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
 	return virtio_mem_offline_and_remove_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_offline_and_remove_memory(): Try to offline and remove
+ * all Linux memory blocks covered by the big block.
+ */
+static int virtio_mem_bbm_offline_and_remove_bb(struct virtio_mem *vm,
+						unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_offline_and_remove_memory(vm, addr, size);
+}
+
 /*
  * Trigger the workqueue so the device can perform its magic.
  */
@@ -1928,6 +1959,129 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Try to offline and remove a big block from Linux and unplug it. Will fail
+ * with -EBUSY if some memory is busy and cannot get unplugged.
+ *
+ * Will modify the state of the memory block. Might temporarily drop the
+ * hotplug_mutex.
+ */
+static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
+						       unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_ADDED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
+	if (rc)
+		return rc;
+
+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
+	if (rc)
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_PLUGGED);
+	else
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_UNUSED);
+	return rc;
+}
+
+/*
+ * Try to remove a big block from Linux and unplug it. Will fail with
+ * -EBUSY if some memory is online.
+ *
+ * Will modify the state of the memory block.
+ */
+static int virtio_mem_bbm_remove_and_unplug_bb(struct virtio_mem *vm,
+					       unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_ADDED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_remove_bb(vm, bb_id);
+	if (rc)
+		return -EBUSY;
+
+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
+	if (rc)
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_PLUGGED);
+	else
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_UNUSED);
+	return rc;
+}
+
+/*
+ * Test if a big block is completely offline.
+ */
+static bool virtio_mem_bbm_bb_is_offline(struct virtio_mem *vm,
+					 unsigned long bb_id)
+{
+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages;
+	     pfn += PAGES_PER_SECTION) {
+		if (pfn_to_online_page(pfn))
+			return false;
+	}
+
+	return true;
+}
+
+static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	uint64_t nb_bb = diff / vm->bbm.bb_size;
+	uint64_t bb_id;
+	int rc;
+
+	if (!nb_bb)
+		return 0;
+
+	/* Try to unplug completely offline big blocks first. */
+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
+		cond_resched();
+		/*
+		 * As we're holding no locks, this check is racy as memory
+		 * can get onlined in the meantime - but we'll fail gracefully.
+		 */
+		if (!virtio_mem_bbm_bb_is_offline(vm, bb_id))
+			continue;
+		rc = virtio_mem_bbm_remove_and_unplug_bb(vm, bb_id);
+		if (rc == -EBUSY)
+			continue;
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+	}
+
+	if (!unplug_online)
+		return 0;
+
+	/* Try to unplug any big blocks. */
+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
+		cond_resched();
+		rc = virtio_mem_bbm_offline_remove_and_unplug_bb(vm, bb_id);
+		if (rc == -EBUSY)
+			continue;
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+	}
+
+	return nb_bb ? -EBUSY : 0;
+}
+
 /*
  * Try to unplug the requested amount of memory.
  */
@@ -1935,7 +2089,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	if (vm->in_sbm)
 		return virtio_mem_sbm_unplug_request(vm, diff);
-	return -EBUSY;
+	return virtio_mem_bbm_unplug_request(vm, diff);
 }
 
 /*
-- 
2.26.2



* [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (27 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug David Hildenbrand
@ 2020-10-12 12:53 ` David Hildenbrand
  2020-10-19  7:54   ` Wei Yang
  2020-10-20  0:24   ` Wei Yang
  2020-10-18 12:49 ` [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) Wei Yang
  2020-10-18 15:29 ` Michael S. Tsirkin
  30 siblings, 2 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-12 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

Let's add a safe mechanism to unplug memory, avoiding long/endless loops
when trying to offline memory - similar to SBM.

Fake-offline all memory (via alloc_contig_range()) before trying to
offline+remove it. Use this mode as the default, but allow enabling the
other mode explicitly (which could give better memory hotunplug
guarantees in some environments).

The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0
on the cmdline.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 97 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 6bcd0acbff32..09f11489be6f 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -37,6 +37,11 @@ module_param(bbm_block_size, ulong, 0444);
 MODULE_PARM_DESC(bbm_block_size,
 		 "Big Block size in bytes. Default is 0 (auto-detection).");
 
+static bool bbm_safe_unplug = true;
+module_param(bbm_safe_unplug, bool, 0444);
+MODULE_PARM_DESC(bbm_safe_unplug,
+	     "Use a safe unplug mechanism in BBM, avoiding long/endless loops");
+
 /*
  * virtio-mem currently supports the following modes of operation:
  *
@@ -87,6 +92,8 @@ enum virtio_mem_bbm_bb_state {
 	VIRTIO_MEM_BBM_BB_PLUGGED,
 	/* Plugged and added to Linux. */
 	VIRTIO_MEM_BBM_BB_ADDED,
+	/* All online parts are fake-offline, ready to remove. */
+	VIRTIO_MEM_BBM_BB_FAKE_OFFLINE,
 	VIRTIO_MEM_BBM_BB_COUNT
 };
 
@@ -889,6 +896,32 @@ static void virtio_mem_sbm_notify_cancel_offline(struct virtio_mem *vm,
 	}
 }
 
+static void virtio_mem_bbm_notify_going_offline(struct virtio_mem *vm,
+						unsigned long bb_id,
+						unsigned long pfn,
+						unsigned long nr_pages)
+{
+	/*
+	 * When marked as "fake-offline", all online memory of this device block
+	 * is allocated by us. Otherwise, we don't have any memory allocated.
+	 */
+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
+		return;
+	virtio_mem_fake_offline_going_offline(pfn, nr_pages);
+}
+
+static void virtio_mem_bbm_notify_cancel_offline(struct virtio_mem *vm,
+						 unsigned long bb_id,
+						 unsigned long pfn,
+						 unsigned long nr_pages)
+{
+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
+		return;
+	virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
+}
+
 /*
  * This callback will either be called synchronously from add_memory() or
  * asynchronously (e.g., triggered via user space). We have to be careful
@@ -949,6 +982,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		vm->hotplug_active = true;
 		if (vm->in_sbm)
 			virtio_mem_sbm_notify_going_offline(vm, id);
+		else
+			virtio_mem_bbm_notify_going_offline(vm, id,
+							    mhp->start_pfn,
+							    mhp->nr_pages);
 		break;
 	case MEM_GOING_ONLINE:
 		mutex_lock(&vm->hotplug_mutex);
@@ -999,6 +1036,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		if (vm->in_sbm)
 			virtio_mem_sbm_notify_cancel_offline(vm, id);
+		else
+			virtio_mem_bbm_notify_cancel_offline(vm, id,
+							     mhp->start_pfn,
+							     mhp->nr_pages);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1189,7 +1230,13 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
 								   sb_id, 1);
 		} else {
-			do_online = true;
+			/*
+			 * If the whole block is marked fake offline, keep
+			 * everything that way.
+			 */
+			id = virtio_mem_phys_to_bb_id(vm, addr);
+			do_online = virtio_mem_bbm_get_bb_state(vm, id) !=
+				    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE;
 		}
 		if (do_online)
 			generic_online_page(page, order);
@@ -1969,15 +2016,50 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
 						       unsigned long bb_id)
 {
+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
+	unsigned long end_pfn = start_pfn + nr_pages;
+	unsigned long pfn;
+	struct page *page;
 	int rc;
 
 	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
 			 VIRTIO_MEM_BBM_BB_ADDED))
 		return -EINVAL;
 
+	if (bbm_safe_unplug) {
+		/*
+		 * Start by fake-offlining all memory. Once we marked the device
+		 * block as fake-offline, all newly onlined memory will
+		 * automatically be kept fake-offline. Protect from concurrent
+		 * onlining/offlining until we have a consistent state.
+		 */
+		mutex_lock(&vm->hotplug_mutex);
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
+
+		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+			page = pfn_to_online_page(pfn);
+			if (!page)
+				continue;
+
+			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
+			if (rc) {
+				end_pfn = pfn;
+				goto rollback_safe_unplug;
+			}
+		}
+		mutex_unlock(&vm->hotplug_mutex);
+	}
+
 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
-	if (rc)
+	if (rc) {
+		if (bbm_safe_unplug) {
+			mutex_lock(&vm->hotplug_mutex);
+			goto rollback_safe_unplug;
+		}
 		return rc;
+	}
 
 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
 	if (rc)
@@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
 		virtio_mem_bbm_set_bb_state(vm, bb_id,
 					    VIRTIO_MEM_BBM_BB_UNUSED);
 	return rc;
+
+rollback_safe_unplug:
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		page = pfn_to_online_page(pfn);
+		if (!page)
+			continue;
+		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
+	}
+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
+	mutex_unlock(&vm->hotplug_mutex);
+	return rc;
 }
 
 /*
-- 
2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
  2020-10-12 12:52 ` [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add() David Hildenbrand
@ 2020-10-12 13:09   ` Pankaj Gupta
  2020-10-15  7:14   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-12 13:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 78c2fbcddcf8..b3eebac7191f 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -1072,7 +1072,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>                                       uint64_t *nb_sb)
>  {
>         const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
> -       int rc, rc2;
> +       int rc;
>
>         if (WARN_ON_ONCE(!count))
>                 return -EINVAL;
> @@ -1103,13 +1103,12 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>
>                 dev_err(&vm->vdev->dev,
>                         "adding memory block %lu failed with %d\n", mb_id, rc);
> -               rc2 = virtio_mem_mb_unplug_sb(vm, mb_id, 0, count);
>
>                 /*
>                  * TODO: Linux MM does not properly clean up yet in all cases
>                  * where adding of memory failed - especially on -ENOMEM.
>                  */
> -               if (rc2)
> +               if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
>                         new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
>                 virtio_mem_mb_set_state(vm, mb_id, new_state);
>                 return rc;

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
  2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
@ 2020-10-15  3:56   ` Wei Yang
  2020-10-15 19:26   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-15  3:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:55PM +0200, David Hildenbrand wrote:
>Let's determine the target nid only once in case we have none specified -
>usually, we'll end up with node 0 either way.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 28 +++++++++++-----------------
> 1 file changed, 11 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index ba4de598f663..a1f5bf7a571a 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -70,7 +70,7 @@ struct virtio_mem {
> 
> 	/* The device block size (for communicating with the device). */
> 	uint64_t device_block_size;
>-	/* The translated node id. NUMA_NO_NODE in case not specified. */
>+	/* The determined node id for all memory of the device. */
> 	int nid;
> 	/* Physical start address of the memory region. */
> 	uint64_t addr;
>@@ -406,10 +406,6 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	int nid = vm->nid;
>-
>-	if (nid == NUMA_NO_NODE)
>-		nid = memory_add_physaddr_to_nid(addr);
> 
> 	/*
> 	 * When force-unloading the driver and we still have memory added to
>@@ -423,7 +419,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
> 	}
> 
> 	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
>-	return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
>+	return add_memory_driver_managed(vm->nid, addr,
>+					 memory_block_size_bytes(),
> 					 vm->resource_name,
> 					 MEMHP_MERGE_RESOURCE);
> }
>@@ -440,13 +437,9 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
> static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	int nid = vm->nid;
>-
>-	if (nid == NUMA_NO_NODE)
>-		nid = memory_add_physaddr_to_nid(addr);
> 
> 	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
>-	return remove_memory(nid, addr, memory_block_size_bytes());
>+	return remove_memory(vm->nid, addr, memory_block_size_bytes());
> }
> 
> /*
>@@ -461,14 +454,11 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	int nid = vm->nid;
>-
>-	if (nid == NUMA_NO_NODE)
>-		nid = memory_add_physaddr_to_nid(addr);
> 
> 	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
> 		mb_id);
>-	return offline_and_remove_memory(nid, addr, memory_block_size_bytes());
>+	return offline_and_remove_memory(vm->nid, addr,
>+					 memory_block_size_bytes());
> }
> 
> /*
>@@ -1659,6 +1649,10 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
> 			&vm->region_size);
> 
>+	/* Determine the nid for the device based on the lowest address. */
>+	if (vm->nid == NUMA_NO_NODE)
>+		vm->nid = memory_add_physaddr_to_nid(vm->addr);
>+
> 	/*
> 	 * We always hotplug memory in memory block granularity. This way,
> 	 * we have to wait for exactly one memory block to online.
>@@ -1707,7 +1701,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 		 memory_block_size_bytes());
> 	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
> 		 (unsigned long long)vm->subblock_size);
>-	if (vm->nid != NUMA_NO_NODE)
>+	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
> 
> 	return 0;
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-12 12:52 ` [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb() David Hildenbrand
@ 2020-10-15  4:02   ` Wei Yang
  2020-10-15  8:00     ` David Hildenbrand
  2020-10-15 20:24   ` Pankaj Gupta
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-15  4:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:56PM +0200, David Hildenbrand wrote:
>We actually need one byte less (next_mb_id is exclusive, first_mb_id is
>inclusive). Simplify.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index a1f5bf7a571a..670b3faf412d 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>  */
> static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
> {
>-	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
>-	unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
>+	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>+	unsigned long new_bytes = old_bytes + 1;

This is correct.

So this looks more like a fix?

> 	int old_pages = PFN_UP(old_bytes);
> 	int new_pages = PFN_UP(new_bytes);
> 	uint8_t *new_mb_state;
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
  2020-10-12 12:52 ` [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling David Hildenbrand
@ 2020-10-15  7:06   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-15  7:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:57PM +0200, David Hildenbrand wrote:
>Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
>possible, so we don't have to deal with allocation orders.
>
>Add a comment why we have that restriction for now.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
  2020-10-12 12:52 ` [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add() David Hildenbrand
  2020-10-12 13:09   ` Pankaj Gupta
@ 2020-10-15  7:14   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-15  7:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:58PM +0200, David Hildenbrand wrote:
>We can drop rc2, we don't actually need the value.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>


-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-15  4:02   ` Wei Yang
@ 2020-10-15  8:00     ` David Hildenbrand
  2020-10-15 10:00       ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-15  8:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 15.10.20 06:02, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:52:56PM +0200, David Hildenbrand wrote:
>> We actually need one byte less (next_mb_id is exclusive, first_mb_id is
>> inclusive). Simplify.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index a1f5bf7a571a..670b3faf412d 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>>  */
>> static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>> {
>> -	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
>> -	unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
>> +	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>> +	unsigned long new_bytes = old_bytes + 1;
> 
> This is correct.
> 
> So this looks more like a fix?

We allocate an additional new page "one memory block too early".

So we would allocate the first page for blocks 0..510, and already
allocate the second page with block 511, although we could have fit it
into the first page. Block 512 will then find that the second page is
already there and simply use the second page.

So as we do it consistently, nothing will go wrong - that's why I
avoided using the "fix" terminology.

Thanks!

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-12 12:52 ` [PATCH v1 05/29] virtio-mem: generalize check for added memory David Hildenbrand
@ 2020-10-15  8:28   ` Wei Yang
  2020-10-15  8:50     ` David Hildenbrand
  2020-10-16 22:39   ` Wei Yang
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-15  8:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:59PM +0200, David Hildenbrand wrote:
>Let's check by traversing busy system RAM resources instead, to avoid
>relying on memory block states.
>
>Don't use walk_system_ram_range(), as that works on pages and we want to
>use the bare addresses we have easily at hand.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index b3eebac7191f..6bbd1cfd10d3 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -1749,6 +1749,20 @@ static void virtio_mem_delete_resource(struct virtio_mem *vm)
> 	vm->parent_resource = NULL;
> }
> 
>+static int virtio_mem_range_has_system_ram(struct resource *res, void *arg)
>+{
>+	return 1;
>+}
>+
>+static bool virtio_mem_has_memory_added(struct virtio_mem *vm)
>+{
>+	const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>+
>+	return walk_iomem_res_desc(IORES_DESC_NONE, flags, vm->addr,
>+				   vm->addr + vm->region_size, NULL,
>+				   virtio_mem_range_has_system_ram) == 1;
>+}
>+
> static int virtio_mem_probe(struct virtio_device *vdev)
> {
> 	struct virtio_mem *vm;
>@@ -1870,10 +1884,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	 * the system. And there is no way to stop the driver/device from going
> 	 * away. Warn at least.
> 	 */
>-	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
>+	if (virtio_mem_has_memory_added(vm)) {

I am not sure this would be more efficient.

> 		dev_warn(&vdev->dev, "device still has system memory added\n");
> 	} else {
> 		virtio_mem_delete_resource(vm);

BTW, I got one question during review.

Per my understanding, there are 4 states of a virtio memory block

  * OFFLINE[_PARTIAL]
  * ONLINE[_PARTIAL]

While, if my understanding is correct, those two offline states are transient.
If the required range is onlined, the state would be changed to
ONLINE[_PARTIAL] respectively. If it is not, the state is reverted to UNUSED
or PLUGGED.

What I am lost on is why you do virtio_mem_mb_remove() on an OFFLINE_PARTIAL
memory block, since we wait for the workqueue to finish its job.

Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory blocks.
How about memory blocks in other states? Is it not necessary to remove
ONLINE[_PARTIAL] memory blocks?

Thanks in advance, since I may missed some concepts.

>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb()
  2020-10-12 12:53 ` [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb() David Hildenbrand
@ 2020-10-15  8:32   ` Wei Yang
  2020-10-15  8:37     ` David Hildenbrand
  2020-10-15 20:30   ` Pankaj Gupta
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-15  8:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:00PM +0200, David Hildenbrand wrote:
>Avoid using memory block ids. Rename it to virtio_mem_contains_range().
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 6bbd1cfd10d3..821143db14fe 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -500,12 +500,13 @@ static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
> }
> 
> /*
>- * Test if a virtio-mem device owns a memory block. Can be called from
>+ * Test if a virtio-mem device contains a given range. Can be called from
>  * (notifier) callbacks lockless.
>  */
>-static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
>+static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
>+				      uint64_t size)
> {
>-	return mb_id >= vm->first_mb_id && mb_id <= vm->last_mb_id;
>+	return start >= vm->addr && start + size <= vm->addr + vm->region_size;

Do we have some reason to do this change?

> }
> 
> static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>@@ -800,7 +801,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> 	 */
> 	rcu_read_lock();
> 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
>-		if (!virtio_mem_owned_mb(vm, mb_id))
>+		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> 			continue;
> 
> 		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 08/29] virtio-mem: drop last_mb_id
  2020-10-12 12:53 ` [PATCH v1 08/29] virtio-mem: drop last_mb_id David Hildenbrand
@ 2020-10-15  8:35   ` Wei Yang
  2020-10-15 20:32   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-15  8:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:02PM +0200, David Hildenbrand wrote:
>No longer used, let's drop it.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

If above two patches are merged.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 4 ----
> 1 file changed, 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 37a0e338ae4a..5c93f8a65eba 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -84,8 +84,6 @@ struct virtio_mem {
> 
> 	/* Id of the first memory block of this device. */
> 	unsigned long first_mb_id;
>-	/* Id of the last memory block of this device. */
>-	unsigned long last_mb_id;
> 	/* Id of the last usable memory block of this device. */
> 	unsigned long last_usable_mb_id;
> 	/* Id of the next memory bock to prepare when needed. */
>@@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
> 						   memory_block_size_bytes());
> 	vm->next_mb_id = vm->first_mb_id;
>-	vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
>-			 vm->region_size) - 1;
> 
> 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb()
  2020-10-15  8:32   ` Wei Yang
@ 2020-10-15  8:37     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-15  8:37 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 15.10.20 10:32, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:00PM +0200, David Hildenbrand wrote:
>> Avoid using memory block ids. Rename it to virtio_mem_contains_range().
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index 6bbd1cfd10d3..821143db14fe 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -500,12 +500,13 @@ static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
>> }
>>
>> /*
>> - * Test if a virtio-mem device owns a memory block. Can be called from
>> + * Test if a virtio-mem device contains a given range. Can be called from
>>  * (notifier) callbacks lockless.
>>  */
>> -static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
>> +static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
>> +				      uint64_t size)
>> {
>> -	return mb_id >= vm->first_mb_id && mb_id <= vm->last_mb_id;
>> +	return start >= vm->addr && start + size <= vm->addr + vm->region_size;
> 
> Do we have some reason to do this change?

Big Block Mode :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-15  8:28   ` Wei Yang
@ 2020-10-15  8:50     ` David Hildenbrand
  2020-10-16  2:16       ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-15  8:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 15.10.20 10:28, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:52:59PM +0200, David Hildenbrand wrote:
>> Let's check by traversing busy system RAM resources instead, to avoid
>> relying on memory block states.
>>
>> Don't use walk_system_ram_range(), as that works on pages and we want to
>> use the bare addresses we have easily at hand.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 19 +++++++++++++++----
>> 1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index b3eebac7191f..6bbd1cfd10d3 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -1749,6 +1749,20 @@ static void virtio_mem_delete_resource(struct virtio_mem *vm)
>> 	vm->parent_resource = NULL;
>> }
>>
>> +static int virtio_mem_range_has_system_ram(struct resource *res, void *arg)
>> +{
>> +	return 1;
>> +}
>> +
>> +static bool virtio_mem_has_memory_added(struct virtio_mem *vm)
>> +{
>> +	const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> +
>> +	return walk_iomem_res_desc(IORES_DESC_NONE, flags, vm->addr,
>> +				   vm->addr + vm->region_size, NULL,
>> +				   virtio_mem_range_has_system_ram) == 1;
>> +}
>> +
>> static int virtio_mem_probe(struct virtio_device *vdev)
>> {
>> 	struct virtio_mem *vm;
>> @@ -1870,10 +1884,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>> 	 * the system. And there is no way to stop the driver/device from going
>> 	 * away. Warn at least.
>> 	 */
>> -	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
>> -	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
>> -	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
>> -	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
>> +	if (virtio_mem_has_memory_added(vm)) {
> 
> I am not sure this would be more efficient.

In general, no. However, this is a preparation for Big Block Mode, which
won't have memory block states.

(this path only triggers when unloading the driver - which most probably
only ever happens during my testing ... :) and we don't really care
about performance there)

> 
>> 		dev_warn(&vdev->dev, "device still has system memory added\n");
>> 	} else {
>> 		virtio_mem_delete_resource(vm);
> 
> BTW, I got one question during review.
> 
> Per my understanding, there are 4 states of a virtio memory block
> 
>   * OFFLINE[_PARTIAL]
>   * ONLINE[_PARTIAL]
> 
> While, if my understanding is correct, those two offline states are transient.
> If the required range is onlined, the state would be change to
> ONLINE[_PARTIAL] respectively. If it is not, the state is reverted to UNUSED
> or PLUGGED.

Very right.

> 
> What I am lost is why you do virtio_mem_mb_remove() on OFFLINE_PARTIAL memory
> block? Since we wait for the workqueue finish its job.

That's an interesting corner case. Assume you have a 128MB memory block
but only 64MB are plugged.

As long as we have our online_pages callback in place, we can hinder the
unplugged 64MB from getting exposed to the buddy
(virtio_mem_online_page_cb()). However, once we unloaded the driver,
this is no longer the case. If someone would online that memory block,
we would expose unplugged memory to the buddy - very bad.

So we have to remove these partially plugged, offline memory blocks when
losing control over them.

I tried to document that via:

"After we unregistered our callbacks, user space can online partially
plugged offline blocks. Make sure to remove them."

> 
> Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory block.
> How about memory block in other states? It is not necessary to remove
> ONLINE[_PARTIAL] memroy blocks?

Blocks that are fully plugged (ONLINE or OFFLINE) can get
onlined/offlined without us having to care. Works fine - we only have to
care about partially plugged blocks.

While we *could* unplug OFFLINE blocks, there is no way we can
deterministically offline+remove ONLINE blocks. So that memory has to
stay, even after we unloaded the driver (similar to the dax/kmem driver).

ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore,
as we still hold references to these struct pages
(virtio_mem_set_fake_offline()), and as we no longer have the memory
notifier in place, we can no longer agree to offline this memory (when
going_offline).

I tried to document that via

"After we unregistered our callbacks, user space can no longer offline
partially plugged online memory blocks. No need to worry about them."


> 
> Thanks in advance, since I may missed some concepts.

(force) driver unloading is a complicated corner case.

Thanks!

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 15/29] virito-mem: document Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 15/29] virito-mem: document Sub Block Mode (SBM) David Hildenbrand
@ 2020-10-15  9:33   ` David Hildenbrand
  2020-10-20  9:38     ` Pankaj Gupta
  2020-10-16  8:03   ` Wei Yang
  1 sibling, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-15  9:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Andrew Morton, Michael S . Tsirkin,
	Jason Wang, Pankaj Gupta

On 12.10.20 14:53, David Hildenbrand wrote:
> Let's add some documentation for the current mode - Sub Block Mode (SBM) -
> to prepare for a new mode - Big Block Mode (BBM).
> 
> Follow-up patches will properly factor out the existing Sub Block Mode
> (SBM) and implement Device Block Mode (DBM).

s/Device Block Mode (DBM)/Big Block Mode (BBM)/

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-15  8:00     ` David Hildenbrand
@ 2020-10-15 10:00       ` Wei Yang
  2020-10-15 10:01         ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-15 10:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Thu, Oct 15, 2020 at 10:00:15AM +0200, David Hildenbrand wrote:
>On 15.10.20 06:02, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:52:56PM +0200, David Hildenbrand wrote:
>>> We actually need one byte less (next_mb_id is exclusive, first_mb_id is
>>> inclusive). Simplify.
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> drivers/virtio/virtio_mem.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>>> index a1f5bf7a571a..670b3faf412d 100644
>>> --- a/drivers/virtio/virtio_mem.c
>>> +++ b/drivers/virtio/virtio_mem.c
>>> @@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>>>  */
>>> static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>>> {
>>> -	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
>>> -	unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
>>> +	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>>> +	unsigned long new_bytes = old_bytes + 1;
>> 
>> This is correct.
>> 
>> So this looks more like a fix?
>
>We allocate an additional new page "one memory block too early".
>
>So we would allocate the first page for blocks 0..510, and already
>allocate the second page with block 511, although we could have fit it
>into the first page. Block 512 will then find that the second page is
>already there and simply use the second page.
>
>So as we do it consistently, nothing will go wrong - that's why I
>avoided using the "fix" terminology.
>

Yes, my feeling is this is not a simplification. Instead this is a more
precise calculation.

How about use this subject?

virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()

>Thanks!
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-15 10:00       ` Wei Yang
@ 2020-10-15 10:01         ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-15 10:01 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 15.10.20 12:00, Wei Yang wrote:
> On Thu, Oct 15, 2020 at 10:00:15AM +0200, David Hildenbrand wrote:
>> On 15.10.20 06:02, Wei Yang wrote:
>>> On Mon, Oct 12, 2020 at 02:52:56PM +0200, David Hildenbrand wrote:
>>>> We actually need one byte less (next_mb_id is exclusive, first_mb_id is
>>>> inclusive). Simplify.
>>>>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> drivers/virtio/virtio_mem.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>>>> index a1f5bf7a571a..670b3faf412d 100644
>>>> --- a/drivers/virtio/virtio_mem.c
>>>> +++ b/drivers/virtio/virtio_mem.c
>>>> @@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>>>>  */
>>>> static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>>>> {
>>>> -	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
>>>> -	unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
>>>> +	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>>>> +	unsigned long new_bytes = old_bytes + 1;
>>>
>>> This is correct.
>>>
>>> So this looks more like a fix?
>>
>> We allocate an additional new page "one memory block too early".
>>
>> So we would allocate the first page for blocks 0..510, and already
>> allocate the second page with block 511, although we could have fit it
>> into the first page. Block 512 will then find that the second page is
>> already there and simply use the second page.
>>
>> So as we do it consistently, nothing will go wrong - that's why I
>> avoided using the "fix" terminology.
>>
> 
> Yes, my feeling is this is not a simplification. Instead this is a more
> precise calculation.
> 
> How about using this subject?
> 
> virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()

Agreed, thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2020-10-12 12:53 ` [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block David Hildenbrand
@ 2020-10-15 13:08   ` Michael S. Tsirkin
  2020-10-19  3:22   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Michael S. Tsirkin @ 2020-10-15 13:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Jason Wang, Pankaj Gupta, Michal Hocko, Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:21PM +0200, David Hildenbrand wrote:
> virtio-mem soon wants to use offline_and_remove_memory() on memory that
> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> remove that restriction.
> 
> Let's remember the old state and try to restore that if anything goes
> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> happen (usually only when a notifier fails to allocate memory, and these
> are rather rare).
> 
> This will be used by virtio-mem to offline+remove memory ranges that are
> bigger than a single memory block - for example, with a device block
> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> block size of 128MB.
> 
> While we could compress the state into 2 bits, using 8 bits is much
> easier.
> 
> This handling is similar to, but different from, acpi_scan_try_to_offline():
> 
> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> optimization is still relevant - it should only apply to ZONE_NORMAL
> (where we have no guarantees). If relevant, we can always add it.
> 
> b) acpi_scan_try_to_offline() simply onlines all memory in case
> something goes wrong. It doesn't restore previous online type. Let's do
> that, so we won't overwrite what e.g., user space configured.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Could I get some acks from mm folks for this one?
The rest can go in through my tree I guess ...
Andrew?

Thanks!

> ---
>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 89 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b44d4c7ba73b..217080ca93e5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size)
>  }
>  EXPORT_SYMBOL_GPL(remove_memory);
>  
> +static int try_offline_memory_block(struct memory_block *mem, void *arg)
> +{
> +	uint8_t online_type = MMOP_ONLINE_KERNEL;
> +	uint8_t **online_types = arg;
> +	struct page *page;
> +	int rc;
> +
> +	/*
> +	 * Sense the online_type via the zone of the memory block. Offlining
> +	 * with multiple zones within one memory block will be rejected
> +	 * by offlining code ... so we don't care about that.
> +	 */
> +	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
> +	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
> +		online_type = MMOP_ONLINE_MOVABLE;
> +
> +	rc = device_offline(&mem->dev);
> +	/*
> +	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
> +	 * so try_reonline_memory_block() can do the right thing.
> +	 */
> +	if (!rc)
> +		**online_types = online_type;
> +
> +	(*online_types)++;
> +	/* Ignore if already offline. */
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static int try_reonline_memory_block(struct memory_block *mem, void *arg)
> +{
> +	uint8_t **online_types = arg;
> +	int rc;
> +
> +	if (**online_types != MMOP_OFFLINE) {
> +		mem->online_type = **online_types;
> +		rc = device_online(&mem->dev);
> +		if (rc < 0)
> +			pr_warn("%s: Failed to re-online memory: %d",
> +				__func__, rc);
> +	}
> +
> +	/* Continue processing all remaining memory blocks. */
> +	(*online_types)++;
> +	return 0;
> +}
> +
>  /*
> - * Try to offline and remove a memory block. Might take a long time to
> - * finish in case memory is still in use. Primarily useful for memory devices
> - * that logically unplugged all memory (so it's no longer in use) and want to
> - * offline + remove the memory block.
> + * Try to offline and remove memory. Might take a long time to finish in case
> + * memory is still in use. Primarily useful for memory devices that logically
> + * unplugged all memory (so it's no longer in use) and want to offline + remove
> + * that memory.
>   */
>  int offline_and_remove_memory(int nid, u64 start, u64 size)
>  {
> -	struct memory_block *mem;
> -	int rc = -EINVAL;
> +	const unsigned long mb_count = size / memory_block_size_bytes();
> +	uint8_t *online_types, *tmp;
> +	int rc;
>  
>  	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
> -	    size != memory_block_size_bytes())
> -		return rc;
> +	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
> +		return -EINVAL;
> +
> +	/*
> +	 * We'll remember the old online type of each memory block, so we can
> +	 * try to revert whatever we did when offlining one memory block fails
> +	 * after offlining some others succeeded.
> +	 */
> +	online_types = kmalloc_array(mb_count, sizeof(*online_types),
> +				     GFP_KERNEL);
> +	if (!online_types)
> +		return -ENOMEM;
> +	/*
> +	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
> +	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
> +	 * try_reonline_memory_block().
> +	 */
> +	memset(online_types, MMOP_OFFLINE, mb_count);
>  
>  	lock_device_hotplug();
> -	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
> -	if (mem)
> -		rc = device_offline(&mem->dev);
> -	/* Ignore if the device is already offline. */
> -	if (rc > 0)
> -		rc = 0;
> +
> +	tmp = online_types;
> +	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
>  
>  	/*
> -	 * In case we succeeded to offline the memory block, remove it.
> +	 * In case we succeeded to offline all memory, remove it.
>  	 * This cannot fail as it cannot get onlined in the meantime.
>  	 */
>  	if (!rc) {
>  		rc = try_remove_memory(nid, start, size);
> -		WARN_ON_ONCE(rc);
> +		if (rc)
> +			pr_err("%s: Failed to remove memory: %d", __func__, rc);
> +	}
> +
> +	/*
> +	 * Rollback what we did. While memory onlining might theoretically fail
> +	 * (nacked by a notifier), it barely ever happens.
> +	 */
> +	if (rc) {
> +		tmp = online_types;
> +		walk_memory_blocks(start, size, &tmp,
> +				   try_reonline_memory_block);
>  	}
>  	unlock_device_hotplug();
>  
> +	kfree(online_types);
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(offline_and_remove_memory);
> -- 
> 2.26.2



* Re: [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
  2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
  2020-10-15  3:56   ` Wei Yang
@ 2020-10-15 19:26   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-15 19:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Let's determine the target nid only once in case we have none specified -
> usually, we'll end up with node 0 either way.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 28 +++++++++++-----------------
>  1 file changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index ba4de598f663..a1f5bf7a571a 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -70,7 +70,7 @@ struct virtio_mem {
>
>         /* The device block size (for communicating with the device). */
>         uint64_t device_block_size;
> -       /* The translated node id. NUMA_NO_NODE in case not specified. */
> +       /* The determined node id for all memory of the device. */
>         int nid;
>         /* Physical start address of the memory region. */
>         uint64_t addr;
> @@ -406,10 +406,6 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
>  static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
>  {
>         const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
> -       int nid = vm->nid;
> -
> -       if (nid == NUMA_NO_NODE)
> -               nid = memory_add_physaddr_to_nid(addr);
>
>         /*
>          * When force-unloading the driver and we still have memory added to
> @@ -423,7 +419,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
>         }
>
>         dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
> -       return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
> +       return add_memory_driver_managed(vm->nid, addr,
> +                                        memory_block_size_bytes(),
>                                          vm->resource_name,
>                                          MEMHP_MERGE_RESOURCE);
>  }
> @@ -440,13 +437,9 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
>  static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
>  {
>         const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
> -       int nid = vm->nid;
> -
> -       if (nid == NUMA_NO_NODE)
> -               nid = memory_add_physaddr_to_nid(addr);
>
>         dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
> -       return remove_memory(nid, addr, memory_block_size_bytes());
> +       return remove_memory(vm->nid, addr, memory_block_size_bytes());
>  }
>
>  /*
> @@ -461,14 +454,11 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
>                                             unsigned long mb_id)
>  {
>         const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
> -       int nid = vm->nid;
> -
> -       if (nid == NUMA_NO_NODE)
> -               nid = memory_add_physaddr_to_nid(addr);
>
>         dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
>                 mb_id);
> -       return offline_and_remove_memory(nid, addr, memory_block_size_bytes());
> +       return offline_and_remove_memory(vm->nid, addr,
> +                                        memory_block_size_bytes());
>  }
>
>  /*
> @@ -1659,6 +1649,10 @@ static int virtio_mem_init(struct virtio_mem *vm)
>         virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
>                         &vm->region_size);
>
> +       /* Determine the nid for the device based on the lowest address. */
> +       if (vm->nid == NUMA_NO_NODE)
> +               vm->nid = memory_add_physaddr_to_nid(vm->addr);
> +
>         /*
>          * We always hotplug memory in memory block granularity. This way,
>          * we have to wait for exactly one memory block to online.
> @@ -1707,7 +1701,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
>                  memory_block_size_bytes());
>         dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>                  (unsigned long long)vm->subblock_size);
> -       if (vm->nid != NUMA_NO_NODE)
> +       if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
>                 dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
>
>         return 0;

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-12 12:52 ` [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb() David Hildenbrand
  2020-10-15  4:02   ` Wei Yang
@ 2020-10-15 20:24   ` Pankaj Gupta
  2020-10-16  9:00     ` David Hildenbrand
  1 sibling, 1 reply; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-15 20:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> We actually need one byte less (next_mb_id is exclusive, first_mb_id is
> inclusive). Simplify.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index a1f5bf7a571a..670b3faf412d 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>   */
>  static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>  {
> -       unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
> -       unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
> +       unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
> +       unsigned long new_bytes = old_bytes + 1;

Maybe we can avoid the new_bytes & old_bytes variables and instead use a
single variable, which can later be used with PFN_UP/PFN_DOWN.

>         int old_pages = PFN_UP(old_bytes);
>         int new_pages = PFN_UP(new_bytes);
>         uint8_t *new_mb_state;


* Re: [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb()
  2020-10-12 12:53 ` [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb() David Hildenbrand
  2020-10-15  8:32   ` Wei Yang
@ 2020-10-15 20:30   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-15 20:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Avoid using memory block ids. Rename it to virtio_mem_contains_range().
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 6bbd1cfd10d3..821143db14fe 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -500,12 +500,13 @@ static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
>  }
>
>  /*
> - * Test if a virtio-mem device owns a memory block. Can be called from
> + * Test if a virtio-mem device contains a given range. Can be called from
>   * (notifier) callbacks lockless.
>   */
> -static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
> +static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> +                                     uint64_t size)
>  {
> -       return mb_id >= vm->first_mb_id && mb_id <= vm->last_mb_id;
> +       return start >= vm->addr && start + size <= vm->addr + vm->region_size;
>  }
>
>  static int virtio_mem_notify_going_online(struct virtio_mem *vm,
> @@ -800,7 +801,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>          */
>         rcu_read_lock();
>         list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> -               if (!virtio_mem_owned_mb(vm, mb_id))
> +               if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
>                         continue;
>
>                 sb_id = virtio_mem_phys_to_sb_id(vm, addr);

Looks good.
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining
  2020-10-12 12:53 ` [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining David Hildenbrand
@ 2020-10-15 20:31   ` Pankaj Gupta
  2020-10-16  6:11   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-15 20:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> No harm done, but let's be consistent.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index cb2e8f254650..00d1cfca4713 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -766,7 +766,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>   * (via generic_online_page()) using PageDirty().
>   */
>  static void virtio_mem_set_fake_offline(unsigned long pfn,
> -                                       unsigned int nr_pages, bool onlined)
> +                                       unsigned long nr_pages, bool onlined)
>  {
>         for (; nr_pages--; pfn++) {
>                 struct page *page = pfn_to_page(pfn);
> @@ -785,7 +785,7 @@ static void virtio_mem_set_fake_offline(unsigned long pfn,
>   * (via generic_online_page()), clear PageDirty().
>   */
>  static void virtio_mem_clear_fake_offline(unsigned long pfn,
> -                                         unsigned int nr_pages, bool onlined)
> +                                         unsigned long nr_pages, bool onlined)
>  {
>         for (; nr_pages--; pfn++) {
>                 struct page *page = pfn_to_page(pfn);
> @@ -800,10 +800,10 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
>   * Release a range of fake-offline pages to the buddy, effectively
>   * fake-onlining them.
>   */
> -static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
> +static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
>  {
>         const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
> -       int i;
> +       unsigned long i;
>
>         /*
>          * We are always called at least with MAX_ORDER_NR_PAGES

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 08/29] virtio-mem: drop last_mb_id
  2020-10-12 12:53 ` [PATCH v1 08/29] virtio-mem: drop last_mb_id David Hildenbrand
  2020-10-15  8:35   ` Wei Yang
@ 2020-10-15 20:32   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-15 20:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> No longer used, let's drop it.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 37a0e338ae4a..5c93f8a65eba 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -84,8 +84,6 @@ struct virtio_mem {
>
>         /* Id of the first memory block of this device. */
>         unsigned long first_mb_id;
> -       /* Id of the last memory block of this device. */
> -       unsigned long last_mb_id;
>         /* Id of the last usable memory block of this device. */
>         unsigned long last_usable_mb_id;
>         /* Id of the next memory bock to prepare when needed. */
> @@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
>         vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>                                                    memory_block_size_bytes());
>         vm->next_mb_id = vm->first_mb_id;
> -       vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
> -                        vm->region_size) - 1;
>
>         dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
>         dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-15  8:50     ` David Hildenbrand
@ 2020-10-16  2:16       ` Wei Yang
  2020-10-16  9:11         ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  2:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Thu, Oct 15, 2020 at 10:50:27AM +0200, David Hildenbrand wrote:
[...]
>> 
>>> 		dev_warn(&vdev->dev, "device still has system memory added\n");
>>> 	} else {
>>> 		virtio_mem_delete_resource(vm);
>> 
>> BTW, I got one question during review.
>> 
>> Per my understanding, there are 4 states of a virtio memory block
>> 
>>   * OFFLINE[_PARTIAL]
>>   * ONLINE[_PARTIAL]
>> 
>> While, if my understanding is correct, those two offline states are transient.
>> If the required range is onlined, the state would be change to
>> ONLINE[_PARTIAL] respectively. If it is not, the state is reverted to UNUSED
>> or PLUGGED.
>
>Very right.
>
>> 
>> What I am lost on is why you do virtio_mem_mb_remove() on an OFFLINE_PARTIAL
>> memory block, since we wait for the workqueue to finish its job.

I have tried to understand the logic, but I still have some confusion.

>
>That's an interesting corner case. Assume you have a 128MB memory block
>but only 64MB are plugged.

Since we just plug part of a memory block, its state is OFFLINE_PARTIAL at
first. But then we would add this memory and online it, which means the state
of this memory block becomes ONLINE_PARTIAL.

When is this state changed to OFFLINE_PARTIAL again?

>
>As long as we have our online_pages callback in place, we can hinder the
>unplugged 64MB from getting exposed to the buddy
>(virtio_mem_online_page_cb()). However, once we unloaded the driver,

Yes,

virtio_mem_set_fake_offline() would __SetPageOffline() to those pages.

>this is no longer the case. If someone would online that memory block,
>we would expose unplugged memory to the buddy - very bad.
>

Per my understanding, at this point in time the memory block is in the online
state, even though part of it is set to *fake* offline.

So how could the user trigger another online via the sysfs interface?

>So we have to remove these partially plugged, offline memory blocks when
>losing control over them.
>
>I tried to document that via:
>
>"After we unregistered our callbacks, user space can online partially
>plugged offline blocks. Make sure to remove them."
>
>> 
>> Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory block.
>> How about memory block in other states? It is not necessary to remove
>> ONLINE[_PARTIAL] memroy blocks?
>
>Blocks that are fully plugged (ONLINE or OFFLINE) can get
>onlined/offlined without us having to care. Works fine - we only have to
>care about partially plugged blocks.
>
>While we *could* unplug OFFLINE blocks, there is no way we can
>deterministically offline+remove ONLINE blocks. So that memory has to
>stay, even after we unloaded the driver (similar to the dax/kmem driver).

For OFFLINE memory blocks, would that leave us in this situation:

The guest doesn't need those pages, while the host still maps them?

>
>ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore,
>as we still hold references to these struct pages
>(virtio_mem_set_fake_offline()), and as we no longer have the memory
>notifier in place, we can no longer agree to offline this memory (when
>going_offline).
>

Ok, I seems to understand the logic now.

But how do we prevent an ONLINE_PARTIAL memory block from getting offlined?
There are three calls in virtio_mem_set_fake_offline(), and all of them adjust
page flags. How do they hold a reference to the struct page?

>I tried to document that via
>
>"After we unregistered our callbacks, user space can no longer offline
>partially plugged online memory blocks. No need to worry about them."
>
>
>> 
>> Thanks in advance, since I may missed some concepts.
>
>(force) driver unloading is a complicated corner case.
>
>Thanks!
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-12 12:53 ` [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory David Hildenbrand
@ 2020-10-16  4:03   ` Wei Yang
  2020-10-16  9:18     ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  4:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>Let's trigger from offlining code when we're not allowed to touch online
>memory.

Does this describe the change in virtio_mem_memory_notifier_cb()?

>
>Handle the other case (memmap possibly freeing up another memory block)
>when actually removing memory. When removing via virtio_mem_remove(),
>virtio_mem_retry() is a NOP and safe to use.
>
>While at it, move retry handling when offlining out of
>virtio_mem_notify_offline(), to share it with Device Block Mode (DBM)
>soon.

I may not understand the logic fully. Here is my understanding of the
current logic:


  virtio_mem_run_wq()
      virtio_mem_unplug_request()
          virtio_mem_mb_unplug_any_sb_offline()
	      virtio_mem_mb_remove()             --- 1
	  virtio_mem_mb_unplug_any_sb_online()
	      virtio_mem_mb_offline_and_remove() --- 2

This patch tries to trigger the wq at 1 and 2. And these two functions are
only valid during this code flow.

These two functions actually remove some memory from the system, so I am not
sure where the extra unpluggable memory comes from. I guess that memory comes
from the memory block device, mem_section, and memmap? And that memory is
still marked as online, right?

In case we can gather extra memory at 1 and form a whole memory block, so
that we can unplug an online memory block (by moving data to a new place),
this just affects the process at 2. This means there is no need to trigger
the wq at 1, and we can leave it at 2.

>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 40 ++++++++++++++++++++++++++-----------
> 1 file changed, 28 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 5c93f8a65eba..8ea00f0b2ecd 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -158,6 +158,7 @@ static DEFINE_MUTEX(virtio_mem_mutex);
> static LIST_HEAD(virtio_mem_devices);
> 
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
>+static void virtio_mem_retry(struct virtio_mem *vm);
> 
> /*
>  * Register a virtio-mem device so it will be considered for the online_page
>@@ -435,9 +436,17 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
> static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>+	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
>-	return remove_memory(vm->nid, addr, memory_block_size_bytes());
>+	rc = remove_memory(vm->nid, addr, memory_block_size_bytes());
>+	if (!rc)
>+		/*
>+		 * We might have freed up memory we can now unplug, retry
>+		 * immediately instead of waiting.
>+		 */
>+		virtio_mem_retry(vm);
>+	return rc;
> }
> 
> /*
>@@ -452,11 +461,19 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>+	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
> 		mb_id);
>-	return offline_and_remove_memory(vm->nid, addr,
>-					 memory_block_size_bytes());
>+	rc = offline_and_remove_memory(vm->nid, addr,
>+				       memory_block_size_bytes());
>+	if (!rc)
>+		/*
>+		 * We might have freed up memory we can now unplug, retry
>+		 * immediately instead of waiting.
>+		 */
>+		virtio_mem_retry(vm);
>+	return rc;
> }
> 
> /*
>@@ -534,15 +551,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
> 		BUG();
> 		break;
> 	}
>-
>-	/*
>-	 * Trigger the workqueue, maybe we can now unplug memory. Also,
>-	 * when we offline and remove a memory block, this will re-trigger
>-	 * us immediately - which is often nice because the removal of
>-	 * the memory block (e.g., memmap) might have freed up memory
>-	 * on other memory blocks we manage.
>-	 */
>-	virtio_mem_retry(vm);
> }
> 
> static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
>@@ -679,6 +687,14 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 		break;
> 	case MEM_OFFLINE:
> 		virtio_mem_notify_offline(vm, mb_id);
>+
>+		/*
>+		 * Trigger the workqueue. Now that we have some offline memory,
>+		 * maybe we can handle pending unplug requests.
>+		 */
>+		if (!unplug_online)
>+			virtio_mem_retry(vm);
>+
> 		vm->hotplug_active = false;
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread
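The immediate retry after a successful remove in the patch above pays off because removing a memory block also frees its "struct page" metadata (the memmap), which may free enough memory to unplug further blocks. As a rough illustration, a minimal sketch of the memmap overhead involved; the 64-byte struct page and 4 KiB page size are typical x86-64 values assumed here, not read from the kernel, and the helper name is invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough memmap overhead of a memory block: one struct page per base
 * page. With a 64-byte struct page and 4 KiB pages, a 128 MiB block
 * carries 2 MiB of memmap - memory that a successful remove gives back,
 * which is why retrying to unplug immediately can succeed.
 */
#define TOY_PAGE_SIZE		4096ULL
#define TOY_STRUCT_PAGE_SIZE	64ULL

static uint64_t toy_memmap_bytes(uint64_t block_size)
{
	return (block_size / TOY_PAGE_SIZE) * TOY_STRUCT_PAGE_SIZE;
}
```

So for the common 128 MiB Linux memory block, about 2 MiB becomes available per removed block.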

* Re: [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining
  2020-10-12 12:53 ` [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining David Hildenbrand
  2020-10-15 20:31   ` Pankaj Gupta
@ 2020-10-16  6:11   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  6:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:05PM +0200, David Hildenbrand wrote:
>No harm done, but let's be consistent.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index cb2e8f254650..00d1cfca4713 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -766,7 +766,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>  * (via generic_online_page()) using PageDirty().
>  */
> static void virtio_mem_set_fake_offline(unsigned long pfn,
>-					unsigned int nr_pages, bool onlined)
>+					unsigned long nr_pages, bool onlined)
> {
> 	for (; nr_pages--; pfn++) {
> 		struct page *page = pfn_to_page(pfn);
>@@ -785,7 +785,7 @@ static void virtio_mem_set_fake_offline(unsigned long pfn,
>  * (via generic_online_page()), clear PageDirty().
>  */
> static void virtio_mem_clear_fake_offline(unsigned long pfn,
>-					  unsigned int nr_pages, bool onlined)
>+					  unsigned long nr_pages, bool onlined)
> {
> 	for (; nr_pages--; pfn++) {
> 		struct page *page = pfn_to_page(pfn);
>@@ -800,10 +800,10 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
>  * Release a range of fake-offline pages to the buddy, effectively
>  * fake-onlining them.
>  */
>-static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
>+static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
> {
> 	const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
>-	int i;
>+	unsigned long i;
> 
> 	/*
> 	 * We are always called at least with MAX_ORDER_NR_PAGES
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread
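The switch from "unsigned int" to "unsigned long" above is mostly about consistency, but it also removes a latent truncation hazard: with 4 KiB pages, any range of 16 TiB or more no longer fits in a 32-bit page count. A standalone sketch of the difference; PAGE_SHIFT and both helper names are illustrative stand-ins, not driver code:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB base pages */

/* Truncating variant, mirroring the old "unsigned int nr_pages". */
static unsigned int nr_pages_u32(uint64_t bytes)
{
	return (unsigned int)(bytes >> PAGE_SHIFT);
}

/* Full-width variant, mirroring the new "unsigned long nr_pages". */
static uint64_t nr_pages_u64(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT;
}
```

A 16 TiB range is exactly 2^32 pages, which wraps to 0 in the 32-bit variant.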

* Re: [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
  2020-10-12 12:53 ` [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline() David Hildenbrand
@ 2020-10-16  6:24   ` Wei Yang
  2020-10-20  9:31   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  6:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:06PM +0200, David Hildenbrand wrote:
>... which now matches virtio_mem_fake_online(). We'll reuse this
>functionality soon.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 34 ++++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 10 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 00d1cfca4713..d132bc54ef57 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -832,6 +832,27 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
> 	}
> }
> 
>+/*
>+ * Try to allocate a range, marking pages fake-offline, effectively
>+ * fake-offlining them.
>+ */
>+static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
>+{
>+	int rc;
>+
>+	rc = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE,
>+				GFP_KERNEL);
>+	if (rc == -ENOMEM)
>+		/* whoops, out of memory */
>+		return rc;
>+	if (rc)
>+		return -EBUSY;
>+
>+	virtio_mem_set_fake_offline(pfn, nr_pages, true);
>+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>+	return 0;
>+}
>+
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> 	const unsigned long addr = page_to_phys(page);
>@@ -1335,17 +1356,10 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
> 
> 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			     sb_id * vm->subblock_size);
>-	rc = alloc_contig_range(start_pfn, start_pfn + nr_pages,
>-				MIGRATE_MOVABLE, GFP_KERNEL);
>-	if (rc == -ENOMEM)
>-		/* whoops, out of memory */
>-		return rc;
>-	if (rc)
>-		return -EBUSY;
> 
>-	/* Mark it as fake-offline before unplugging it */
>-	virtio_mem_set_fake_offline(start_pfn, nr_pages, true);
>-	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
>+	rc = virtio_mem_fake_offline(start_pfn, nr_pages);
>+	if (rc)
>+		return rc;
> 
> 	/* Try to unplug the allocated memory */
> 	rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread
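The factored-out virtio_mem_fake_offline() above keeps a small but deliberate error mapping: -ENOMEM from alloc_contig_range() is passed through as fatal for the request, while any other failure is folded into -EBUSY so callers treat it as transient and retry later. A minimal sketch of just that mapping; map_fake_offline_rc() is an illustrative stand-in, not a function in the driver:

```c
#include <assert.h>
#include <errno.h>

/*
 * Error mapping used when fake-offlining a range:
 * - -ENOMEM means we are genuinely out of memory; give up on the request.
 * - any other error (pages busy, pinned, migration failed) is transient,
 *   so report -EBUSY and let the caller retry later.
 */
static int map_fake_offline_rc(int rc)
{
	if (rc == -ENOMEM)
		return rc;	/* whoops, out of memory */
	if (rc)
		return -EBUSY;	/* transient failure, retry later */
	return 0;
}
```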

* Re: [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-12 12:53 ` [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier David Hildenbrand
@ 2020-10-16  7:15   ` Wei Yang
  2020-10-16  8:00     ` Wei Yang
  2020-10-18 12:38   ` Wei Yang
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  7:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:07PM +0200, David Hildenbrand wrote:
>Let's factor out the core pieces and place the implementation next to
>virtio_mem_fake_offline(). We'll reuse this functionality soon.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 73 +++++++++++++++++++++++++------------
> 1 file changed, 50 insertions(+), 23 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index d132bc54ef57..a2124892e510 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -168,6 +168,10 @@ static LIST_HEAD(virtio_mem_devices);
> 
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
> static void virtio_mem_retry(struct virtio_mem *vm);
>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>+						  unsigned long nr_pages);
>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>+						   unsigned long nr_pages);
> 
> /*
>  * Register a virtio-mem device so it will be considered for the online_page
>@@ -604,27 +608,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>-	struct page *page;
> 	unsigned long pfn;
>-	int sb_id, i;
>+	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
>-		/*
>-		 * Drop our reference to the pages so the memory can get
>-		 * offlined and add the unplugged pages to the managed
>-		 * page counters (so offlining code can correctly subtract
>-		 * them again).
>-		 */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>-		adjust_managed_page_count(pfn_to_page(pfn), nr_pages);

One question about the original code: why do we want to adjust the count here?

The code flow is

    __offline_pages()
        memory_notify(MEM_GOING_OFFLINE, &arg)
	    virtio_mem_notify_going_offline(vm, mb_id)
	        adjust_managed_page_count(pfn_to_page(pfn), nr_pages)
	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages)

Do we adjust the count twice?

>-		for (i = 0; i < nr_pages; i++) {
>-			page = pfn_to_page(pfn + i);
>-			if (WARN_ON(!page_ref_dec_and_test(page)))
>-				dump_page(page, "unplugged page referenced");
>-		}
>+		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> 	}
> }
> 
>@@ -633,21 +625,14 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> {
> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
> 	unsigned long pfn;
>-	int sb_id, i;
>+	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
>-		/*
>-		 * Get the reference we dropped when going offline and
>-		 * subtract the unplugged pages from the managed page
>-		 * counters.
>-		 */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>-		adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>-		for (i = 0; i < nr_pages; i++)
>-			page_ref_inc(pfn_to_page(pfn + i));
>+		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> 	}
> }
> 
>@@ -853,6 +838,48 @@ static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
> 	return 0;
> }
> 
>+/*
>+ * Handle fake-offline pages when memory is going offline - such that the
>+ * pages can be skipped by mm-core when offlining.
>+ */
>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>+						  unsigned long nr_pages)
>+{
>+	struct page *page;
>+	unsigned long i;
>+
>+	/*
>+	 * Drop our reference to the pages so the memory can get offlined
>+	 * and add the unplugged pages to the managed page counters (so
>+	 * offlining code can correctly subtract them again).
>+	 */
>+	adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
>+	for (i = 0; i < nr_pages; i++) {
>+		page = pfn_to_page(pfn + i);
>+		if (WARN_ON(!page_ref_dec_and_test(page)))
>+			dump_page(page, "fake-offline page referenced");
>+	}
>+}
>+
>+/*
>+ * Handle fake-offline pages when memory offlining is canceled - to undo
>+ * what we did in virtio_mem_fake_offline_going_offline().
>+ */
>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>+						   unsigned long nr_pages)
>+{
>+	unsigned long i;
>+
>+	/*
>+	 * Get the reference we dropped when going offline and subtract the
>+	 * unplugged pages from the managed page counters.
>+	 */
>+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>+	for (i = 0; i < nr_pages; i++)
>+		page_ref_inc(pfn_to_page(pfn + i));
>+}
>+
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> 	const unsigned long addr = page_to_phys(page);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-16  7:15   ` Wei Yang
@ 2020-10-16  8:00     ` Wei Yang
  2020-10-16  8:57       ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Andrew Morton, Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 03:15:03PM +0800, Wei Yang wrote:
>On Mon, Oct 12, 2020 at 02:53:07PM +0200, David Hildenbrand wrote:
>>Let's factor out the core pieces and place the implementation next to
>>virtio_mem_fake_offline(). We'll reuse this functionality soon.
>>
>>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>Cc: Jason Wang <jasowang@redhat.com>
>>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>Signed-off-by: David Hildenbrand <david@redhat.com>
>>---
>> drivers/virtio/virtio_mem.c | 73 +++++++++++++++++++++++++------------
>> 1 file changed, 50 insertions(+), 23 deletions(-)
>>
>>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>>index d132bc54ef57..a2124892e510 100644
>>--- a/drivers/virtio/virtio_mem.c
>>+++ b/drivers/virtio/virtio_mem.c
>>@@ -168,6 +168,10 @@ static LIST_HEAD(virtio_mem_devices);
>> 
>> static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
>> static void virtio_mem_retry(struct virtio_mem *vm);
>>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>>+						  unsigned long nr_pages);
>>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>>+						   unsigned long nr_pages);
>> 
>> /*
>>  * Register a virtio-mem device so it will be considered for the online_page
>>@@ -604,27 +608,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
>> 					    unsigned long mb_id)
>> {
>> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>>-	struct page *page;
>> 	unsigned long pfn;
>>-	int sb_id, i;
>>+	int sb_id;
>> 
>> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>> 			continue;
>>-		/*
>>-		 * Drop our reference to the pages so the memory can get
>>-		 * offlined and add the unplugged pages to the managed
>>-		 * page counters (so offlining code can correctly subtract
>>-		 * them again).
>>-		 */
>> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>> 			       sb_id * vm->subblock_size);
>>-		adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
>
>One question about the original code, why we want to adjust count here?
>
>The code flow is
>
>    __offline_pages()
>        memory_notify(MEM_GOING_OFFLINE, &arg)
>	    virtio_mem_notify_going_offline(vm, mb_id)
>	        adjust_managed_page_count(pfn_to_page(pfn), nr_pages)
>	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages)
>
>Do we adjust the count twice?
>

Ah, I see the reason why we need to adjust the count for *unplugged* sub-blocks.

>>-		for (i = 0; i < nr_pages; i++) {
>>-			page = pfn_to_page(pfn + i);
>>-			if (WARN_ON(!page_ref_dec_and_test(page)))

Another question: when do we grab a refcount for the unplugged pages? The one
you mentioned in virtio_mem_set_fake_offline().

>>-				dump_page(page, "unplugged page referenced");
>>-		}
>>+		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
>> 	}
>> }
>> 
>>@@ -633,21 +625,14 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
>> {
>> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>> 	unsigned long pfn;
>>-	int sb_id, i;
>>+	int sb_id;
>> 
>> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>> 			continue;
>>-		/*
>>-		 * Get the reference we dropped when going offline and
>>-		 * subtract the unplugged pages from the managed page
>>-		 * counters.
>>-		 */
>> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>> 			       sb_id * vm->subblock_size);
>>-		adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>>-		for (i = 0; i < nr_pages; i++)
>>-			page_ref_inc(pfn_to_page(pfn + i));
>>+		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
>> 	}
>> }
>> 
>>@@ -853,6 +838,48 @@ static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
>> 	return 0;
>> }
>> 
>>+/*
>>+ * Handle fake-offline pages when memory is going offline - such that the
>>+ * pages can be skipped by mm-core when offlining.
>>+ */
>>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>>+						  unsigned long nr_pages)
>>+{
>>+	struct page *page;
>>+	unsigned long i;
>>+
>>+	/*
>>+	 * Drop our reference to the pages so the memory can get offlined
>>+	 * and add the unplugged pages to the managed page counters (so
>>+	 * offlining code can correctly subtract them again).
>>+	 */
>>+	adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
>>+	for (i = 0; i < nr_pages; i++) {
>>+		page = pfn_to_page(pfn + i);
>>+		if (WARN_ON(!page_ref_dec_and_test(page)))
>>+			dump_page(page, "fake-offline page referenced");
>>+	}
>>+}
>>+
>>+/*
>>+ * Handle fake-offline pages when memory offlining is canceled - to undo
>>+ * what we did in virtio_mem_fake_offline_going_offline().
>>+ */
>>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>>+						   unsigned long nr_pages)
>>+{
>>+	unsigned long i;
>>+
>>+	/*
>>+	 * Get the reference we dropped when going offline and subtract the
>>+	 * unplugged pages from the managed page counters.
>>+	 */
>>+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>>+	for (i = 0; i < nr_pages; i++)
>>+		page_ref_inc(pfn_to_page(pfn + i));
>>+}
>>+
>> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>> {
>> 	const unsigned long addr = page_to_phys(page);
>>-- 
>>2.26.2
>
>-- 
>Wei Yang
>Help you, Help me

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread
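The question above about the refcount and the managed-page counter comes down to the symmetry the two factored-out helpers maintain: going offline adds the unplugged pages back to the managed counter (so offlining code can subtract them again) and drops the driver's per-page reference; canceling the offline must undo exactly both. A toy model of that invariant; the struct and helper names are illustrative only, not driver code:

```c
#include <assert.h>

/*
 * Toy model of virtio_mem_fake_offline_going_offline() /
 * virtio_mem_fake_offline_cancel_offline(): the cancel path must
 * restore both the managed-page counter and the per-page references
 * to their pre-offline values.
 */
struct toy_page { int refcount; };

static long managed_pages;

static void toy_going_offline(struct toy_page *pages, unsigned long nr)
{
	unsigned long i;

	managed_pages += nr;		/* offlining code subtracts them again */
	for (i = 0; i < nr; i++)
		pages[i].refcount--;	/* drop our reference */
}

static void toy_cancel_offline(struct toy_page *pages, unsigned long nr)
{
	unsigned long i;

	managed_pages -= nr;		/* take them back out of the counter */
	for (i = 0; i < nr; i++)
		pages[i].refcount++;	/* re-take our reference */
}
```

Running going-offline followed by cancel-offline leaves both the counter and every refcount where they started.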

* Re: [PATCH v1 15/29] virtio-mem: document Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 15/29] virtio-mem: document Sub Block Mode (SBM) David Hildenbrand
  2020-10-15  9:33   ` David Hildenbrand
@ 2020-10-16  8:03   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:09PM +0200, David Hildenbrand wrote:
>Let's add some documentation for the current mode - Sub Block Mode (SBM) -
>to prepare for a new mode - Big Block Mode (BBM).
>
>Follow-up patches will properly factor out the existing Sub Block Mode
>(SBM) and implement Big Block Mode (BBM).
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index faeb759687fe..fd8685673fe4 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -27,6 +27,21 @@ static bool unplug_online = true;
> module_param(unplug_online, bool, 0644);
> MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
> 
>+/*
>+ * virtio-mem currently supports the following modes of operation:
>+ *
>+ * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ *   size of a Sub Block (SB) is determined based on the device block size, the
>+ *   pageblock size, and the maximum allocation granularity of the buddy.
>+ *   Subblocks within a Linux memory block might either be plugged or unplugged.
>+ *   Memory is added to/removed from Linux MM in Linux memory block granularity.
>+ *
>+ * User space / core MM (auto onlining) is responsible for onlining added
>+ * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>+ * always onlined separately, and all memory within a Linux memory block is
>+ * onlined to the same zone - virtio-mem relies on this behavior.
>+ */
>+
> enum virtio_mem_mb_state {
> 	/* Unplugged, not added to Linux. Can be reused later. */
> 	VIRTIO_MEM_MB_STATE_UNUSED = 0,
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread
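The documented constraint that a subblock's size is "determined based on the device block size, the pageblock size, and the maximum allocation granularity of the buddy" can be sketched as a simple maximum: a subblock must cover at least a pageblock and at least the largest buddy allocation. The constants below are illustrative x86-64-style values and the helper names are invented, not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of subblock sizing in SBM: pick the larger of the pageblock
 * and the maximum buddy allocation (MAX_ORDER), then count how many
 * such subblocks fit into one Linux memory block.
 */
#define TOY_PAGE_SIZE		4096ULL
#define TOY_PAGEBLOCK_PAGES	512ULL	/* 2 MiB pageblocks */
#define TOY_MAX_ORDER_PAGES	1024ULL	/* 4 MiB maximum buddy allocation */

static uint64_t toy_subblock_size(void)
{
	uint64_t pages = TOY_PAGEBLOCK_PAGES > TOY_MAX_ORDER_PAGES ?
			 TOY_PAGEBLOCK_PAGES : TOY_MAX_ORDER_PAGES;

	return pages * TOY_PAGE_SIZE;
}

/* Subblocks per Linux memory block, e.g. a 128 MiB block. */
static uint64_t toy_nb_sb_per_mb(uint64_t memory_block_size)
{
	return memory_block_size / toy_subblock_size();
}
```

With these example values a subblock is 4 MiB, giving 32 subblocks per 128 MiB memory block - matching the "4MB subblocks" figure in the sb_bitmap comment.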

* Re: [PATCH v1 16/29] virtio-mem: memory block states are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
@ 2020-10-16  8:40   ` Wei Yang
  2020-10-16  8:43   ` Wei Yang
  2020-10-20  9:48   ` Pankaj Gupta
  2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:10PM +0200, David Hildenbrand wrote:
>Let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
>move applicable definitions, functions, and variables (related to
>memory block states).
>
>While at it:
>- Drop the "_STATE" part from memory block states
>- Rename "nb_mb_state" to "mb_count"
>- "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
>- Don't use lengthy "enum virtio_mem_smb_mb_state", simply use "uint8_t"
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 215 ++++++++++++++++++------------------
> 1 file changed, 109 insertions(+), 106 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fd8685673fe4..e76d6f769aa5 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -42,20 +42,23 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>  * onlined to the same zone - virtio-mem relies on this behavior.
>  */
> 
>-enum virtio_mem_mb_state {
>+/*
>+ * State of a Linux memory block in SBM.
>+ */
>+enum virtio_mem_sbm_mb_state {
> 	/* Unplugged, not added to Linux. Can be reused later. */
>-	VIRTIO_MEM_MB_STATE_UNUSED = 0,
>+	VIRTIO_MEM_SBM_MB_UNUSED = 0,
> 	/* (Partially) plugged, not added to Linux. Error on add_memory(). */
>-	VIRTIO_MEM_MB_STATE_PLUGGED,
>+	VIRTIO_MEM_SBM_MB_PLUGGED,
> 	/* Fully plugged, fully added to Linux, offline. */
>-	VIRTIO_MEM_MB_STATE_OFFLINE,
>+	VIRTIO_MEM_SBM_MB_OFFLINE,
> 	/* Partially plugged, fully added to Linux, offline. */
>-	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
>+	VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL,
> 	/* Fully plugged, fully added to Linux, online. */
>-	VIRTIO_MEM_MB_STATE_ONLINE,
>+	VIRTIO_MEM_SBM_MB_ONLINE,
> 	/* Partially plugged, fully added to Linux, online. */
>-	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
>-	VIRTIO_MEM_MB_STATE_COUNT
>+	VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL,
>+	VIRTIO_MEM_SBM_MB_COUNT
> };
> 
> struct virtio_mem {
>@@ -113,9 +116,6 @@ struct virtio_mem {
> 	 */
> 	const char *resource_name;
> 
>-	/* Summary of all memory block states. */
>-	unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
>-
> 	/*
> 	 * We don't want to add too much memory if it's not getting onlined,
> 	 * to avoid running OOM. Besides this threshold, we allow to have at
>@@ -125,27 +125,29 @@ struct virtio_mem {
> 	atomic64_t offline_size;
> 	uint64_t offline_threshold;
> 
>-	/*
>-	 * One byte state per memory block.
>-	 *
>-	 * Allocated via vmalloc(). When preparing new blocks, resized
>-	 * (alloc+copy+free) when needed (crossing pages with the next mb).
>-	 * (when crossing pages).
>-	 *
>-	 * With 128MB memory blocks, we have states for 512GB of memory in one
>-	 * page.
>-	 */
>-	uint8_t *mb_state;
>+	struct {
>+		/* Summary of all memory block states. */
>+		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
>+
>+		/*
>+		 * One byte state per memory block. Allocated via vmalloc().
>+		 * Resized (alloc+copy+free) on demand.
>+		 *
>+		 * With 128 MiB memory blocks, we have states for 512 GiB of
>+		 * memory in one 4 KiB page.
>+		 */
>+		uint8_t *mb_states;
>+	} sbm;
> 
> 	/*
>-	 * $nb_sb_per_mb bit per memory block. Handled similar to mb_state.
>+	 * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
> 	 *
> 	 * With 4MB subblocks, we manage 128GB of memory in one page.
> 	 */
> 	unsigned long *sb_bitmap;

Why not include this in sbm? I expect this is not necessary for BBM.

> 
> 	/*
>-	 * Mutex that protects the nb_mb_state, mb_state, and sb_bitmap.
>+	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
> 	 *
> 	 * When this lock is held the pointers can't change, ONLINE and
> 	 * OFFLINE blocks can't change the state and no subblocks will get
>@@ -254,70 +256,70 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> /*
>  * Set the state of a memory block, taking care of the state counter.
>  */
>-static void virtio_mem_mb_set_state(struct virtio_mem *vm, unsigned long mb_id,
>-				    enum virtio_mem_mb_state state)
>+static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
>+					unsigned long mb_id, uint8_t state)
> {
> 	const unsigned long idx = mb_id - vm->first_mb_id;
>-	enum virtio_mem_mb_state old_state;
>+	uint8_t old_state;
> 
>-	old_state = vm->mb_state[idx];
>-	vm->mb_state[idx] = state;
>+	old_state = vm->sbm.mb_states[idx];
>+	vm->sbm.mb_states[idx] = state;
> 
>-	BUG_ON(vm->nb_mb_state[old_state] == 0);
>-	vm->nb_mb_state[old_state]--;
>-	vm->nb_mb_state[state]++;
>+	BUG_ON(vm->sbm.mb_count[old_state] == 0);
>+	vm->sbm.mb_count[old_state]--;
>+	vm->sbm.mb_count[state]++;
> }
> 
> /*
>  * Get the state of a memory block.
>  */
>-static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>-							unsigned long mb_id)
>+static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
>+					   unsigned long mb_id)
> {
> 	const unsigned long idx = mb_id - vm->first_mb_id;
> 
>-	return vm->mb_state[idx];
>+	return vm->sbm.mb_states[idx];
> }
> 
> /*
>  * Prepare the state array for the next memory block.
>  */
>-static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>+static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> 	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
> 	unsigned long new_bytes = old_bytes + 1;
> 	int old_pages = PFN_UP(old_bytes);
> 	int new_pages = PFN_UP(new_bytes);
>-	uint8_t *new_mb_state;
>+	uint8_t *new_array;
> 
>-	if (vm->mb_state && old_pages == new_pages)
>+	if (vm->sbm.mb_states && old_pages == new_pages)
> 		return 0;
> 
>-	new_mb_state = vzalloc(new_pages * PAGE_SIZE);
>-	if (!new_mb_state)
>+	new_array = vzalloc(new_pages * PAGE_SIZE);
>+	if (!new_array)
> 		return -ENOMEM;
> 
> 	mutex_lock(&vm->hotplug_mutex);
>-	if (vm->mb_state)
>-		memcpy(new_mb_state, vm->mb_state, old_pages * PAGE_SIZE);
>-	vfree(vm->mb_state);
>-	vm->mb_state = new_mb_state;
>+	if (vm->sbm.mb_states)
>+		memcpy(new_array, vm->sbm.mb_states, old_pages * PAGE_SIZE);
>+	vfree(vm->sbm.mb_states);
>+	vm->sbm.mb_states = new_array;
> 	mutex_unlock(&vm->hotplug_mutex);
> 
> 	return 0;
> }
> 
>-#define virtio_mem_for_each_mb_state(_vm, _mb_id, _state) \
>+#define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
> 	for (_mb_id = _vm->first_mb_id; \
>-	     _mb_id < _vm->next_mb_id && _vm->nb_mb_state[_state]; \
>+	     _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id++) \
>-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
>+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
>-#define virtio_mem_for_each_mb_state_rev(_vm, _mb_id, _state) \
>+#define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
> 	for (_mb_id = _vm->next_mb_id - 1; \
>-	     _mb_id >= _vm->first_mb_id && _vm->nb_mb_state[_state]; \
>+	     _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id--) \
>-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
>+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
> /*
>  * Mark all selected subblocks plugged.
>@@ -573,9 +575,9 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> static int virtio_mem_notify_going_online(struct virtio_mem *vm,
> 					  unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
>-	case VIRTIO_MEM_MB_STATE_OFFLINE:
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
>+	case VIRTIO_MEM_SBM_MB_OFFLINE:
> 		return NOTIFY_OK;
> 	default:
> 		break;
>@@ -588,14 +590,14 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
> static void virtio_mem_notify_offline(struct virtio_mem *vm,
> 				      unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 		break;
>-	case VIRTIO_MEM_MB_STATE_ONLINE:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE);
>+	case VIRTIO_MEM_SBM_MB_ONLINE:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 		break;
> 	default:
> 		BUG();
>@@ -605,13 +607,14 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
> 
> static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
> 		break;
>-	case VIRTIO_MEM_MB_STATE_OFFLINE:
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
>+	case VIRTIO_MEM_SBM_MB_OFFLINE:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_ONLINE);
> 		break;
> 	default:
> 		BUG();
>@@ -1160,7 +1163,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> 		return -ENOSPC;
> 
> 	/* Resize the state array if required. */
>-	rc = virtio_mem_mb_state_prepare_next_mb(vm);
>+	rc = virtio_mem_sbm_mb_states_prepare_next_mb(vm);
> 	if (rc)
> 		return rc;
> 
>@@ -1169,7 +1172,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> 	if (rc)
> 		return rc;
> 
>-	vm->nb_mb_state[VIRTIO_MEM_MB_STATE_UNUSED]++;
>+	vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
> 	*mb_id = vm->next_mb_id++;
> 	return 0;
> }
>@@ -1203,16 +1206,16 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 	 * so the memory notifiers will find the block in the right state.
> 	 */
> 	if (count == vm->nb_sb_per_mb)
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	else
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 
> 	/* Add the memory block to linux - if that fails, try to unplug. */
> 	rc = virtio_mem_mb_add(vm, mb_id);
> 	if (rc) {
>-		enum virtio_mem_mb_state new_state = VIRTIO_MEM_MB_STATE_UNUSED;
>+		int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
> 
> 		dev_err(&vm->vdev->dev,
> 			"adding memory block %lu failed with %d\n", mb_id, rc);
>@@ -1222,8 +1225,8 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 		 * where adding of memory failed - especially on -ENOMEM.
> 		 */
> 		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
>-			new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
>-		virtio_mem_mb_set_state(vm, mb_id, new_state);
>+			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
>+		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
> 		return rc;
> 	}
> 
>@@ -1276,11 +1279,11 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> 
> 	if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		if (online)
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_ONLINE);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_ONLINE);
> 		else
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_OFFLINE);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	}
> 
> 	return 0;
>@@ -1302,8 +1305,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_lock(&vm->hotplug_mutex);
> 
> 	/* Try to plug subblocks of partially plugged online blocks. */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
> 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
>@@ -1311,8 +1314,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to plug subblocks of partially plugged offline blocks. */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
>@@ -1326,7 +1329,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_unlock(&vm->hotplug_mutex);
> 
> 	/* Try to plug and add unused blocks */
>-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_UNUSED) {
> 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
> 			return -ENOSPC;
> 
>@@ -1375,8 +1378,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> 
> 	/* some subblocks might have been unplugged even on failure */
> 	if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 	if (rc)
> 		return rc;
> 
>@@ -1387,8 +1390,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> 		 * unplugged. Temporarily drop the mutex, so
> 		 * any pending GOING_ONLINE requests can be serviced/rejected.
> 		 */
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 
> 		mutex_unlock(&vm->hotplug_mutex);
> 		rc = virtio_mem_mb_remove(vm, mb_id);
>@@ -1426,8 +1429,8 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
> 		return rc;
> 	}
> 
>-	virtio_mem_mb_set_state(vm, mb_id,
>-				VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
>+	virtio_mem_sbm_set_mb_state(vm, mb_id,
>+				    VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
> 	return 0;
> }
> 
>@@ -1487,8 +1490,8 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
> 		if (!rc)
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_UNUSED);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
> 	return 0;
>@@ -1514,8 +1517,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_lock(&vm->hotplug_mutex);
> 
> 	/* Try to unplug subblocks of partially plugged offline blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
>+				       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
> 							 &nb_sb);
> 		if (rc || !nb_sb)
>@@ -1524,8 +1527,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of plugged offline blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_OFFLINE) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
> 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
> 							 &nb_sb);
> 		if (rc || !nb_sb)
>@@ -1539,8 +1541,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of partially plugged online blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
>+				       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
> 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
> 							&nb_sb);
> 		if (rc || !nb_sb)
>@@ -1551,8 +1553,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of plugged online blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_ONLINE) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
> 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
> 							&nb_sb);
> 		if (rc || !nb_sb)
>@@ -1578,11 +1579,12 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> 	unsigned long mb_id;
> 	int rc;
> 
>-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_PLUGGED) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
> 		rc = virtio_mem_mb_unplug(vm, mb_id);
> 		if (rc)
> 			return rc;
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
> 	return 0;
>@@ -1974,11 +1976,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	 * After we unregistered our callbacks, user space can online partially
> 	 * plugged offline blocks. Make sure to remove them.
> 	 */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_remove(vm, mb_id);
> 		BUG_ON(rc);
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 	/*
> 	 * After we unregistered our callbacks, user space can no longer
>@@ -2003,7 +2006,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	}
> 
> 	/* remove all tracking data - no locking needed */
>-	vfree(vm->mb_state);
>+	vfree(vm->sbm.mb_states);
> 	vfree(vm->sb_bitmap);
> 
> 	/* reset the device and cleanup the queues */
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 17/29] virito-mem: subblock states are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 17/29] virito-mem: subblock " David Hildenbrand
@ 2020-10-16  8:43   ` Wei Yang
  2020-10-20  9:54   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:11PM +0200, David Hildenbrand wrote:
>Let's rename and move accordingly. While at it, rename sb_bitmap to
>"sb_states".
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

OK, so you split the change into two parts.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
> 1 file changed, 62 insertions(+), 56 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e76d6f769aa5..2cc497ad8298 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -137,17 +137,23 @@ struct virtio_mem {
> 		 * memory in one 4 KiB page.
> 		 */
> 		uint8_t *mb_states;
>-	} sbm;
> 
>-	/*
>-	 * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
>-	 *
>-	 * With 4MB subblocks, we manage 128GB of memory in one page.
>-	 */
>-	unsigned long *sb_bitmap;
>+		/*
>+		 * Bitmap: one bit per subblock. Allocated similar to
>+		 * sbm.mb_states.
>+		 *
>+		 * A set bit means the corresponding subblock is plugged,
>+		 * otherwise it's unplugged.
>+		 *
>+		 * With 4 MiB subblocks, we manage 128 GiB of memory in one
>+		 * 4 KiB page.
>+		 */
>+		unsigned long *sb_states;
>+	} sbm;
> 
> 	/*
>-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
>+	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>+	 * sbm.sb_states.
> 	 *
> 	 * When this lock is held the pointers can't change, ONLINE and
> 	 * OFFLINE blocks can't change the state and no subblocks will get
>@@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
>  *
>  * Will not modify the state of the memory block.
>  */
>-static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
>-					 unsigned long mb_id, int sb_id,
>-					 int count)
>+static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
>+					  unsigned long mb_id, int sb_id,
>+					  int count)
> {
> 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> 
>-	__bitmap_set(vm->sb_bitmap, bit, count);
>+	__bitmap_set(vm->sbm.sb_states, bit, count);
> }
> 
> /*
>@@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
>  *
>  * Will not modify the state of the memory block.
>  */
>-static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
>-					   unsigned long mb_id, int sb_id,
>-					   int count)
>+static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
>+					    unsigned long mb_id, int sb_id,
>+					    int count)
> {
> 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> 
>-	__bitmap_clear(vm->sb_bitmap, bit, count);
>+	__bitmap_clear(vm->sbm.sb_states, bit, count);
> }
> 
> /*
>  * Test if all selected subblocks are plugged.
>  */
>-static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
>-					  unsigned long mb_id, int sb_id,
>-					  int count)
>+static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
>+					   unsigned long mb_id, int sb_id,
>+					   int count)
> {
> 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> 
> 	if (count == 1)
>-		return test_bit(bit, vm->sb_bitmap);
>+		return test_bit(bit, vm->sbm.sb_states);
> 
> 	/* TODO: Helper similar to bitmap_set() */
>-	return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
>+	return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
> 	       bit + count;
> }
> 
> /*
>  * Test if all selected subblocks are unplugged.
>  */
>-static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
>-					    unsigned long mb_id, int sb_id,
>-					    int count)
>+static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
>+					     unsigned long mb_id, int sb_id,
>+					     int count)
> {
> 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> 
> 	/* TODO: Helper similar to bitmap_set() */
>-	return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
>+	return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
>+	       bit + count;
> }
> 
> /*
>  * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>  * none.
>  */
>-static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
>+static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
> 	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
> 
>-	return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
>-	       bit;
>+	return find_next_zero_bit(vm->sbm.sb_states,
>+				  bit + vm->nb_sb_per_mb, bit) - bit;
> }
> 
> /*
>  * Prepare the subblock bitmap for the next memory block.
>  */
>-static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
>+static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> 	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
> 	const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
> 	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
> 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> 	int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
>-	unsigned long *new_sb_bitmap, *old_sb_bitmap;
>+	unsigned long *new_bitmap, *old_bitmap;
> 
>-	if (vm->sb_bitmap && old_pages == new_pages)
>+	if (vm->sbm.sb_states && old_pages == new_pages)
> 		return 0;
> 
>-	new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
>-	if (!new_sb_bitmap)
>+	new_bitmap = vzalloc(new_pages * PAGE_SIZE);
>+	if (!new_bitmap)
> 		return -ENOMEM;
> 
> 	mutex_lock(&vm->hotplug_mutex);
>-	if (new_sb_bitmap)
>-		memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
>+	if (new_bitmap)
>+		memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
> 
>-	old_sb_bitmap = vm->sb_bitmap;
>-	vm->sb_bitmap = new_sb_bitmap;
>+	old_bitmap = vm->sbm.sb_states;
>+	vm->sbm.sb_states = new_bitmap;
> 	mutex_unlock(&vm->hotplug_mutex);
> 
>-	vfree(old_sb_bitmap);
>+	vfree(old_bitmap);
> 	return 0;
> }
> 
>@@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> 	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>@@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> 	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>@@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> 		 * If plugged, online the pages, otherwise, set them fake
> 		 * offline (PageOffline).
> 		 */
>-		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			generic_online_page(page, order);
> 		else
> 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 
> 	rc = virtio_mem_send_plug_request(vm, addr, size);
> 	if (!rc)
>-		virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
>+		virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
> 	return rc;
> }
> 
>@@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 
> 	rc = virtio_mem_send_unplug_request(vm, addr, size);
> 	if (!rc)
>-		virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
>+		virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
> 	return rc;
> }
> 
>@@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
> 	while (*nb_sb) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
>-		       virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
>+		       virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
> 			sb_id--;
> 		if (sb_id < 0)
> 			break;
> 		/* Try to unplug multiple subblocks at a time */
> 		count = 1;
> 		while (count < *nb_sb && sb_id > 0 &&
>-		       virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
>+		       virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
> 			count++;
> 			sb_id--;
> 		}
>@@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> 		return rc;
> 
> 	/* Resize the subblock bitmap if required. */
>-	rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
>+	rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
> 	if (rc)
> 		return rc;
> 
>@@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> 		return -EINVAL;
> 
> 	while (*nb_sb) {
>-		sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
>+		sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
> 		if (sb_id >= vm->nb_sb_per_mb)
> 			break;
> 		count = 1;
> 		while (count < *nb_sb &&
> 		       sb_id + count < vm->nb_sb_per_mb &&
>-		       !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
>-						      1))
>+		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> 			count++;
> 
> 		rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
>@@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> 		virtio_mem_fake_online(pfn, nr_pages);
> 	}
> 
>-	if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		if (online)
> 			virtio_mem_sbm_set_mb_state(vm, mb_id,
> 						    VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> 	rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
> 
> 	/* some subblocks might have been unplugged even on failure */
>-	if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 	if (rc)
> 		return rc;
> 
>-	if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		/*
> 		 * Remove the block from Linux - this should never fail.
> 		 * Hinder the block from getting onlined by marking it
>@@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 
> 	/* If possible, try to unplug the complete block in one shot. */
> 	if (*nb_sb >= vm->nb_sb_per_mb &&
>-	    virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
> 						    vm->nb_sb_per_mb);
> 		if (!rc) {
>@@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 	for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
>-		       !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			sb_id--;
> 		if (sb_id < 0)
> 			break;
>@@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 	 * remove it. This will usually not fail, as no memory is in use
> 	 * anymore - however some other notifiers might NACK the request.
> 	 */
>-	if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		mutex_unlock(&vm->hotplug_mutex);
> 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 
> 	/* remove all tracking data - no locking needed */
> 	vfree(vm->sbm.mb_states);
>-	vfree(vm->sb_bitmap);
>+	vfree(vm->sbm.sb_states);
> 
> 	/* reset the device and cleanup the queues */
> 	vdev->config->reset(vdev);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 16/29] virtio-mem: memory block states are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
  2020-10-16  8:40   ` Wei Yang
@ 2020-10-16  8:43   ` Wei Yang
  2020-10-20  9:48   ` Pankaj Gupta
  2 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:10PM +0200, David Hildenbrand wrote:
>Let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
>move applicable definitions, functions, and variables (related to
>memory block states).
>
>While at it:
>- Drop the "_STATE" part from memory block states
>- Rename "nb_mb_state" to "mb_count"
>- "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
>- Don't use lengthy "enum virtio_mem_smb_mb_state", simply use "uint8_t"
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 215 ++++++++++++++++++------------------
> 1 file changed, 109 insertions(+), 106 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fd8685673fe4..e76d6f769aa5 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -42,20 +42,23 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>  * onlined to the same zone - virtio-mem relies on this behavior.
>  */
> 
>-enum virtio_mem_mb_state {
>+/*
>+ * State of a Linux memory block in SBM.
>+ */
>+enum virtio_mem_sbm_mb_state {
> 	/* Unplugged, not added to Linux. Can be reused later. */
>-	VIRTIO_MEM_MB_STATE_UNUSED = 0,
>+	VIRTIO_MEM_SBM_MB_UNUSED = 0,
> 	/* (Partially) plugged, not added to Linux. Error on add_memory(). */
>-	VIRTIO_MEM_MB_STATE_PLUGGED,
>+	VIRTIO_MEM_SBM_MB_PLUGGED,
> 	/* Fully plugged, fully added to Linux, offline. */
>-	VIRTIO_MEM_MB_STATE_OFFLINE,
>+	VIRTIO_MEM_SBM_MB_OFFLINE,
> 	/* Partially plugged, fully added to Linux, offline. */
>-	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
>+	VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL,
> 	/* Fully plugged, fully added to Linux, online. */
>-	VIRTIO_MEM_MB_STATE_ONLINE,
>+	VIRTIO_MEM_SBM_MB_ONLINE,
> 	/* Partially plugged, fully added to Linux, online. */
>-	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
>-	VIRTIO_MEM_MB_STATE_COUNT
>+	VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL,
>+	VIRTIO_MEM_SBM_MB_COUNT
> };
> 
> struct virtio_mem {
>@@ -113,9 +116,6 @@ struct virtio_mem {
> 	 */
> 	const char *resource_name;
> 
>-	/* Summary of all memory block states. */
>-	unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
>-
> 	/*
> 	 * We don't want to add too much memory if it's not getting onlined,
> 	 * to avoid running OOM. Besides this threshold, we allow to have at
>@@ -125,27 +125,29 @@ struct virtio_mem {
> 	atomic64_t offline_size;
> 	uint64_t offline_threshold;
> 
>-	/*
>-	 * One byte state per memory block.
>-	 *
>-	 * Allocated via vmalloc(). When preparing new blocks, resized
>-	 * (alloc+copy+free) when needed (crossing pages with the next mb).
>-	 * (when crossing pages).
>-	 *
>-	 * With 128MB memory blocks, we have states for 512GB of memory in one
>-	 * page.
>-	 */
>-	uint8_t *mb_state;
>+	struct {
>+		/* Summary of all memory block states. */
>+		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
>+
>+		/*
>+		 * One byte state per memory block. Allocated via vmalloc().
>+		 * Resized (alloc+copy+free) on demand.
>+		 *
>+		 * With 128 MiB memory blocks, we have states for 512 GiB of
>+		 * memory in one 4 KiB page.
>+		 */
>+		uint8_t *mb_states;
>+	} sbm;
> 
> 	/*
>-	 * $nb_sb_per_mb bit per memory block. Handled similar to mb_state.
>+	 * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
> 	 *
> 	 * With 4MB subblocks, we manage 128GB of memory in one page.
> 	 */
> 	unsigned long *sb_bitmap;
> 
> 	/*
>-	 * Mutex that protects the nb_mb_state, mb_state, and sb_bitmap.
>+	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
> 	 *
> 	 * When this lock is held the pointers can't change, ONLINE and
> 	 * OFFLINE blocks can't change the state and no subblocks will get
>@@ -254,70 +256,70 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> /*
>  * Set the state of a memory block, taking care of the state counter.
>  */
>-static void virtio_mem_mb_set_state(struct virtio_mem *vm, unsigned long mb_id,
>-				    enum virtio_mem_mb_state state)
>+static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
>+					unsigned long mb_id, uint8_t state)
> {
> 	const unsigned long idx = mb_id - vm->first_mb_id;
>-	enum virtio_mem_mb_state old_state;
>+	uint8_t old_state;
> 
>-	old_state = vm->mb_state[idx];
>-	vm->mb_state[idx] = state;
>+	old_state = vm->sbm.mb_states[idx];
>+	vm->sbm.mb_states[idx] = state;
> 
>-	BUG_ON(vm->nb_mb_state[old_state] == 0);
>-	vm->nb_mb_state[old_state]--;
>-	vm->nb_mb_state[state]++;
>+	BUG_ON(vm->sbm.mb_count[old_state] == 0);
>+	vm->sbm.mb_count[old_state]--;
>+	vm->sbm.mb_count[state]++;
> }
> 
> /*
>  * Get the state of a memory block.
>  */
>-static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>-							unsigned long mb_id)
>+static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
>+					   unsigned long mb_id)
> {
> 	const unsigned long idx = mb_id - vm->first_mb_id;
> 
>-	return vm->mb_state[idx];
>+	return vm->sbm.mb_states[idx];
> }
> 
> /*
>  * Prepare the state array for the next memory block.
>  */
>-static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>+static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> 	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
> 	unsigned long new_bytes = old_bytes + 1;
> 	int old_pages = PFN_UP(old_bytes);
> 	int new_pages = PFN_UP(new_bytes);
>-	uint8_t *new_mb_state;
>+	uint8_t *new_array;
> 
>-	if (vm->mb_state && old_pages == new_pages)
>+	if (vm->sbm.mb_states && old_pages == new_pages)
> 		return 0;
> 
>-	new_mb_state = vzalloc(new_pages * PAGE_SIZE);
>-	if (!new_mb_state)
>+	new_array = vzalloc(new_pages * PAGE_SIZE);
>+	if (!new_array)
> 		return -ENOMEM;
> 
> 	mutex_lock(&vm->hotplug_mutex);
>-	if (vm->mb_state)
>-		memcpy(new_mb_state, vm->mb_state, old_pages * PAGE_SIZE);
>-	vfree(vm->mb_state);
>-	vm->mb_state = new_mb_state;
>+	if (vm->sbm.mb_states)
>+		memcpy(new_array, vm->sbm.mb_states, old_pages * PAGE_SIZE);
>+	vfree(vm->sbm.mb_states);
>+	vm->sbm.mb_states = new_array;
> 	mutex_unlock(&vm->hotplug_mutex);
> 
> 	return 0;
> }
> 
>-#define virtio_mem_for_each_mb_state(_vm, _mb_id, _state) \
>+#define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
> 	for (_mb_id = _vm->first_mb_id; \
>-	     _mb_id < _vm->next_mb_id && _vm->nb_mb_state[_state]; \
>+	     _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id++) \
>-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
>+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
>-#define virtio_mem_for_each_mb_state_rev(_vm, _mb_id, _state) \
>+#define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
> 	for (_mb_id = _vm->next_mb_id - 1; \
>-	     _mb_id >= _vm->first_mb_id && _vm->nb_mb_state[_state]; \
>+	     _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id--) \
>-		if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
>+		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
> /*
>  * Mark all selected subblocks plugged.
>@@ -573,9 +575,9 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> static int virtio_mem_notify_going_online(struct virtio_mem *vm,
> 					  unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
>-	case VIRTIO_MEM_MB_STATE_OFFLINE:
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
>+	case VIRTIO_MEM_SBM_MB_OFFLINE:
> 		return NOTIFY_OK;
> 	default:
> 		break;
>@@ -588,14 +590,14 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
> static void virtio_mem_notify_offline(struct virtio_mem *vm,
> 				      unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 		break;
>-	case VIRTIO_MEM_MB_STATE_ONLINE:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE);
>+	case VIRTIO_MEM_SBM_MB_ONLINE:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 		break;
> 	default:
> 		BUG();
>@@ -605,13 +607,14 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
> 
> static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> {
>-	switch (virtio_mem_mb_get_state(vm, mb_id)) {
>-	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
>+	switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
>+	case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
> 		break;
>-	case VIRTIO_MEM_MB_STATE_OFFLINE:
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
>+	case VIRTIO_MEM_SBM_MB_OFFLINE:
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_ONLINE);
> 		break;
> 	default:
> 		BUG();
>@@ -1160,7 +1163,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> 		return -ENOSPC;
> 
> 	/* Resize the state array if required. */
>-	rc = virtio_mem_mb_state_prepare_next_mb(vm);
>+	rc = virtio_mem_sbm_mb_states_prepare_next_mb(vm);
> 	if (rc)
> 		return rc;
> 
>@@ -1169,7 +1172,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> 	if (rc)
> 		return rc;
> 
>-	vm->nb_mb_state[VIRTIO_MEM_MB_STATE_UNUSED]++;
>+	vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
> 	*mb_id = vm->next_mb_id++;
> 	return 0;
> }
>@@ -1203,16 +1206,16 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 	 * so the memory notifiers will find the block in the right state.
> 	 */
> 	if (count == vm->nb_sb_per_mb)
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	else
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 
> 	/* Add the memory block to linux - if that fails, try to unplug. */
> 	rc = virtio_mem_mb_add(vm, mb_id);
> 	if (rc) {
>-		enum virtio_mem_mb_state new_state = VIRTIO_MEM_MB_STATE_UNUSED;
>+		int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
> 
> 		dev_err(&vm->vdev->dev,
> 			"adding memory block %lu failed with %d\n", mb_id, rc);
>@@ -1222,8 +1225,8 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 		 * where adding of memory failed - especially on -ENOMEM.
> 		 */
> 		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
>-			new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
>-		virtio_mem_mb_set_state(vm, mb_id, new_state);
>+			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
>+		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
> 		return rc;
> 	}
> 
>@@ -1276,11 +1279,11 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> 
> 	if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> 		if (online)
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_ONLINE);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_ONLINE);
> 		else
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_OFFLINE);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	}
> 
> 	return 0;
>@@ -1302,8 +1305,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_lock(&vm->hotplug_mutex);
> 
> 	/* Try to plug subblocks of partially plugged online blocks. */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
> 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
>@@ -1311,8 +1314,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to plug subblocks of partially plugged offline blocks. */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
>@@ -1326,7 +1329,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_unlock(&vm->hotplug_mutex);
> 
> 	/* Try to plug and add unused blocks */
>-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_UNUSED) {
> 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
> 			return -ENOSPC;
> 
>@@ -1375,8 +1378,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> 
> 	/* some subblocks might have been unplugged even on failure */
> 	if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 	if (rc)
> 		return rc;
> 
>@@ -1387,8 +1390,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> 		 * unplugged. Temporarily drop the mutex, so
> 		 * any pending GOING_ONLINE requests can be serviced/rejected.
> 		 */
>-		virtio_mem_mb_set_state(vm, mb_id,
>-					VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 
> 		mutex_unlock(&vm->hotplug_mutex);
> 		rc = virtio_mem_mb_remove(vm, mb_id);
>@@ -1426,8 +1429,8 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
> 		return rc;
> 	}
> 
>-	virtio_mem_mb_set_state(vm, mb_id,
>-				VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
>+	virtio_mem_sbm_set_mb_state(vm, mb_id,
>+				    VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
> 	return 0;
> }
> 
>@@ -1487,8 +1490,8 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
> 		if (!rc)
>-			virtio_mem_mb_set_state(vm, mb_id,
>-						VIRTIO_MEM_MB_STATE_UNUSED);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
> 	return 0;
>@@ -1514,8 +1517,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	mutex_lock(&vm->hotplug_mutex);
> 
> 	/* Try to unplug subblocks of partially plugged offline blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
>+				       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
> 							 &nb_sb);
> 		if (rc || !nb_sb)
>@@ -1524,8 +1527,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of plugged offline blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_OFFLINE) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
> 		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
> 							 &nb_sb);
> 		if (rc || !nb_sb)
>@@ -1539,8 +1541,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of partially plugged online blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
>+				       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
> 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
> 							&nb_sb);
> 		if (rc || !nb_sb)
>@@ -1551,8 +1553,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	}
> 
> 	/* Try to unplug subblocks of plugged online blocks. */
>-	virtio_mem_for_each_mb_state_rev(vm, mb_id,
>-					 VIRTIO_MEM_MB_STATE_ONLINE) {
>+	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
> 		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
> 							&nb_sb);
> 		if (rc || !nb_sb)
>@@ -1578,11 +1579,12 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> 	unsigned long mb_id;
> 	int rc;
> 
>-	virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_PLUGGED) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
> 		rc = virtio_mem_mb_unplug(vm, mb_id);
> 		if (rc)
> 			return rc;
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
> 	return 0;
>@@ -1974,11 +1976,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	 * After we unregistered our callbacks, user space can online partially
> 	 * plugged offline blocks. Make sure to remove them.
> 	 */
>-	virtio_mem_for_each_mb_state(vm, mb_id,
>-				     VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
>+	virtio_mem_sbm_for_each_mb(vm, mb_id,
>+				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
> 		rc = virtio_mem_mb_remove(vm, mb_id);
> 		BUG_ON(rc);
>-		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
>+		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 	/*
> 	 * After we unregistered our callbacks, user space can no longer
>@@ -2003,7 +2006,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	}
> 
> 	/* remove all tracking data - no locking needed */
>-	vfree(vm->mb_state);
>+	vfree(vm->sbm.mb_states);
> 	vfree(vm->sb_bitmap);
> 
> 	/* reset the device and cleanup the queues */
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap
  2020-10-12 12:53 ` [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap David Hildenbrand
@ 2020-10-16  8:46   ` Wei Yang
  2020-10-20  9:58   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:12PM +0200, David Hildenbrand wrote:
>The calculation is already complicated enough, let's limit it to one
>location.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 2cc497ad8298..73ff6e9ba839 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> 	     _mb_id--) \
> 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
>+/*
>+ * Calculate the bit number in the sb_states bitmap for the given subblock
>+ * inside the given memory block.
>+ */
>+static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
>+					  unsigned long mb_id, int sb_id)
>+{
>+	return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+}
>+
> /*
>  * Mark all selected subblocks plugged.
>  *
>@@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
> 					  unsigned long mb_id, int sb_id,
> 					  int count)
> {
>-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
> 
> 	__bitmap_set(vm->sbm.sb_states, bit, count);
> }
>@@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
> 					    unsigned long mb_id, int sb_id,
> 					    int count)
> {
>-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
> 
> 	__bitmap_clear(vm->sbm.sb_states, bit, count);
> }
>@@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
> 					   unsigned long mb_id, int sb_id,
> 					   int count)
> {
>-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
> 
> 	if (count == 1)
> 		return test_bit(bit, vm->sbm.sb_states);
>@@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> 					     unsigned long mb_id, int sb_id,
> 					     int count)
> {
>-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
> 
> 	/* TODO: Helper similar to bitmap_set() */
> 	return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
>@@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
>-	const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
>+	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
> 
> 	return find_next_zero_bit(vm->sbm.sb_states,
> 				  bit + vm->nb_sb_per_mb, bit) - bit;
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 19/29] virtio-mem: existing (un)plug functions are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 19/29] virtio-mem: existing (un)plug functions are specific to Sub Block Mode (SBM) David Hildenbrand
@ 2020-10-16  8:49   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:13PM +0200, David Hildenbrand wrote:
>Let's rename them accordingly. virtio_mem_plug_request() and
>virtio_mem_unplug_request() will be handled separately.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

The code is correct, though the new names are a bit long...

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 90 ++++++++++++++++++-------------------
> 1 file changed, 43 insertions(+), 47 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 73ff6e9ba839..fc2b1ff3beed 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -1075,8 +1075,8 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
>  * Plug selected subblocks. Updates the plugged state, but not the state
>  * of the memory block.
>  */
>-static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>-				 int sb_id, int count)
>+static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>+				  int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
> 			      sb_id * vm->subblock_size;
>@@ -1096,8 +1096,8 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>  * Unplug selected subblocks. Updates the plugged state, but not the state
>  * of the memory block.
>  */
>-static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>-				   int sb_id, int count)
>+static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>+				    int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
> 			      sb_id * vm->subblock_size;
>@@ -1122,8 +1122,8 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>  *
>  * Note: can fail after some subblocks were unplugged.
>  */
>-static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
>-				       unsigned long mb_id, uint64_t *nb_sb)
>+static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
>+					unsigned long mb_id, uint64_t *nb_sb)
> {
> 	int sb_id, count;
> 	int rc;
>@@ -1144,7 +1144,7 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
> 			sb_id--;
> 		}
> 
>-		rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
>+		rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
> 		if (rc)
> 			return rc;
> 		*nb_sb -= count;
>@@ -1161,18 +1161,18 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
>  *
>  * Note: can fail after some subblocks were unplugged.
>  */
>-static int virtio_mem_mb_unplug(struct virtio_mem *vm, unsigned long mb_id)
>+static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
> 	uint64_t nb_sb = vm->nb_sb_per_mb;
> 
>-	return virtio_mem_mb_unplug_any_sb(vm, mb_id, &nb_sb);
>+	return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
> }
> 
> /*
>  * Prepare tracking data for the next memory block.
>  */
>-static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
>-				      unsigned long *mb_id)
>+static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
>+					  unsigned long *mb_id)
> {
> 	int rc;
> 
>@@ -1200,9 +1200,8 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
>  *
>  * Will modify the state of the memory block.
>  */
>-static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>-				      unsigned long mb_id,
>-				      uint64_t *nb_sb)
>+static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
>+					  unsigned long mb_id, uint64_t *nb_sb)
> {
> 	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
> 	int rc;
>@@ -1214,7 +1213,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 	 * Plug the requested number of subblocks before adding it to linux,
> 	 * so that onlining will directly online all plugged subblocks.
> 	 */
>-	rc = virtio_mem_mb_plug_sb(vm, mb_id, 0, count);
>+	rc = virtio_mem_sbm_plug_sb(vm, mb_id, 0, count);
> 	if (rc)
> 		return rc;
> 
>@@ -1241,7 +1240,7 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
> 		 * TODO: Linux MM does not properly clean up yet in all cases
> 		 * where adding of memory failed - especially on -ENOMEM.
> 		 */
>-		if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
>+		if (virtio_mem_sbm_unplug_sb(vm, mb_id, 0, count))
> 			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
> 		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
> 		return rc;
>@@ -1259,8 +1258,9 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>  *
>  * Note: Can fail after some subblocks were successfully plugged.
>  */
>-static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
>-				     uint64_t *nb_sb, bool online)
>+static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>+				      unsigned long mb_id, uint64_t *nb_sb,
>+				      bool online)
> {
> 	unsigned long pfn, nr_pages;
> 	int sb_id, count;
>@@ -1279,7 +1279,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> 			count++;
> 
>-		rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
>+		rc = virtio_mem_sbm_plug_sb(vm, mb_id, sb_id, count);
> 		if (rc)
> 			return rc;
> 		*nb_sb -= count;
>@@ -1323,7 +1323,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	/* Try to plug subblocks of partially plugged online blocks. */
> 	virtio_mem_sbm_for_each_mb(vm, mb_id,
> 				   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
>-		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
>+		rc = virtio_mem_sbm_plug_any_sb(vm, mb_id, &nb_sb, true);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		cond_resched();
>@@ -1332,7 +1332,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	/* Try to plug subblocks of partially plugged offline blocks. */
> 	virtio_mem_sbm_for_each_mb(vm, mb_id,
> 				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>-		rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
>+		rc = virtio_mem_sbm_plug_any_sb(vm, mb_id, &nb_sb, false);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		cond_resched();
>@@ -1349,7 +1349,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
> 			return -ENOSPC;
> 
>-		rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
>+		rc = virtio_mem_sbm_plug_and_add_mb(vm, mb_id, &nb_sb);
> 		if (rc || !nb_sb)
> 			return rc;
> 		cond_resched();
>@@ -1360,10 +1360,10 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 		if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
> 			return -ENOSPC;
> 
>-		rc = virtio_mem_prepare_next_mb(vm, &mb_id);
>+		rc = virtio_mem_sbm_prepare_next_mb(vm, &mb_id);
> 		if (rc)
> 			return rc;
>-		rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
>+		rc = virtio_mem_sbm_plug_and_add_mb(vm, mb_id, &nb_sb);
> 		if (rc)
> 			return rc;
> 		cond_resched();
>@@ -1384,13 +1384,13 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>  *
>  * Note: Can fail after some subblocks were successfully unplugged.
>  */
>-static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
>-					       unsigned long mb_id,
>-					       uint64_t *nb_sb)
>+static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
>+						unsigned long mb_id,
>+						uint64_t *nb_sb)
> {
> 	int rc;
> 
>-	rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
>+	rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
> 
> 	/* some subblocks might have been unplugged even on failure */
> 	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>@@ -1422,9 +1422,9 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
>  *
>  * Will modify the state of the memory block.
>  */
>-static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
>-					  unsigned long mb_id, int sb_id,
>-					  int count)
>+static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
>+					   unsigned long mb_id, int sb_id,
>+					   int count)
> {
> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
> 	unsigned long start_pfn;
>@@ -1438,7 +1438,7 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
> 		return rc;
> 
> 	/* Try to unplug the allocated memory */
>-	rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);
>+	rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count);
> 	if (rc) {
> 		/* Return the memory to the buddy. */
> 		virtio_mem_fake_online(start_pfn, nr_pages);
>@@ -1460,17 +1460,17 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
>  * Note: Can fail after some subblocks were successfully unplugged. Can
>  *       return 0 even if subblocks were busy and could not get unplugged.
>  */
>-static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>-					      unsigned long mb_id,
>-					      uint64_t *nb_sb)
>+static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
>+					       unsigned long mb_id,
>+					       uint64_t *nb_sb)
> {
> 	int rc, sb_id;
> 
> 	/* If possible, try to unplug the complete block in one shot. */
> 	if (*nb_sb >= vm->nb_sb_per_mb &&
> 	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>-		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
>-						    vm->nb_sb_per_mb);
>+		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
>+						     vm->nb_sb_per_mb);
> 		if (!rc) {
> 			*nb_sb -= vm->nb_sb_per_mb;
> 			goto unplugged;
>@@ -1487,7 +1487,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> 		if (sb_id < 0)
> 			break;
> 
>-		rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, sb_id, 1);
>+		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, sb_id, 1);
> 		if (rc == -EBUSY)
> 			continue;
> 		else if (rc)
>@@ -1535,8 +1535,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	/* Try to unplug subblocks of partially plugged offline blocks. */
> 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
> 				       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>-		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
>-							 &nb_sb);
>+		rc = virtio_mem_sbm_unplug_any_sb_offline(vm, mb_id, &nb_sb);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		cond_resched();
>@@ -1544,8 +1543,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 
> 	/* Try to unplug subblocks of plugged offline blocks. */
> 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
>-		rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
>-							 &nb_sb);
>+		rc = virtio_mem_sbm_unplug_any_sb_offline(vm, mb_id, &nb_sb);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		cond_resched();
>@@ -1559,8 +1557,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	/* Try to unplug subblocks of partially plugged online blocks. */
> 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
> 				       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
>-		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
>-							&nb_sb);
>+		rc = virtio_mem_sbm_unplug_any_sb_online(vm, mb_id, &nb_sb);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		mutex_unlock(&vm->hotplug_mutex);
>@@ -1570,8 +1567,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 
> 	/* Try to unplug subblocks of plugged online blocks. */
> 	virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
>-		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
>-							&nb_sb);
>+		rc = virtio_mem_sbm_unplug_any_sb_online(vm, mb_id, &nb_sb);
> 		if (rc || !nb_sb)
> 			goto out_unlock;
> 		mutex_unlock(&vm->hotplug_mutex);
>@@ -1596,7 +1592,7 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> 	int rc;
> 
> 	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>-		rc = virtio_mem_mb_unplug(vm, mb_id);
>+		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
> 		if (rc)
> 			return rc;
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size " David Hildenbrand
@ 2020-10-16  8:51   ` Wei Yang
  2020-10-16  8:53   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
> 1 file changed, 48 insertions(+), 48 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fc2b1ff3beed..3a772714fec9 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -96,11 +96,6 @@ struct virtio_mem {
> 	/* Maximum region size in bytes. */
> 	uint64_t region_size;
> 
>-	/* The subblock size. */
>-	uint64_t subblock_size;
>-	/* The number of subblocks per memory block. */
>-	uint32_t nb_sb_per_mb;
>-
> 	/* Id of the first memory block of this device. */
> 	unsigned long first_mb_id;
> 	/* Id of the last usable memory block of this device. */
>@@ -126,6 +121,11 @@ struct virtio_mem {
> 	uint64_t offline_threshold;
> 
> 	struct {
>+		/* The subblock size. */
>+		uint64_t sb_size;
>+		/* The number of subblocks per Linux memory block. */
>+		uint32_t sbs_per_mb;
>+
> 		/* Summary of all memory block states. */
> 		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
> 
>@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
> 	const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
> 
>-	return (addr - mb_addr) / vm->subblock_size;
>+	return (addr - mb_addr) / vm->sbm.sb_size;
> }
> 
> /*
>@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> 					  unsigned long mb_id, int sb_id)
> {
>-	return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
> }
> 
> /*
>@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> }
> 
> /*
>- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
>  * none.
>  */
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> 	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
> 
> 	return find_next_zero_bit(vm->sbm.sb_states,
>-				  bit + vm->nb_sb_per_mb, bit) - bit;
>+				  bit + vm->sbm.sbs_per_mb, bit) - bit;
> }
> 
> /*
>@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> 	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>-	const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
>-	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
>+	const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
>+	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
> 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> 	int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> 	unsigned long *new_bitmap, *old_bitmap;
>@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> 	unsigned long pfn;
> 	int sb_id;
> 
>-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
> 		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> 	}
> }
>@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> 					     unsigned long mb_id)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> 	unsigned long pfn;
> 	int sb_id;
> 
>-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
> 		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> 	}
> }
>@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 				  int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>-			      sb_id * vm->subblock_size;
>-	const uint64_t size = count * vm->subblock_size;
>+			      sb_id * vm->sbm.sb_size;
>+	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
>@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 				    int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>-			      sb_id * vm->subblock_size;
>-	const uint64_t size = count * vm->subblock_size;
>+			      sb_id * vm->sbm.sb_size;
>+	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
>@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> 	int sb_id, count;
> 	int rc;
> 
>-	sb_id = vm->nb_sb_per_mb - 1;
>+	sb_id = vm->sbm.sbs_per_mb - 1;
> 	while (*nb_sb) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
>@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
>  */
> static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
>-	uint64_t nb_sb = vm->nb_sb_per_mb;
>+	uint64_t nb_sb = vm->sbm.sbs_per_mb;
> 
> 	return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
> }
>@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> 					  unsigned long mb_id, uint64_t *nb_sb)
> {
>-	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
>+	const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
> 	int rc;
> 
> 	if (WARN_ON_ONCE(!count))
>@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> 	 * Mark the block properly offline before adding it to Linux,
> 	 * so the memory notifiers will find the block in the right state.
> 	 */
>-	if (count == vm->nb_sb_per_mb)
>+	if (count == vm->sbm.sbs_per_mb)
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	else
>@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 
> 	while (*nb_sb) {
> 		sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
>-		if (sb_id >= vm->nb_sb_per_mb)
>+		if (sb_id >= vm->sbm.sbs_per_mb)
> 			break;
> 		count = 1;
> 		while (count < *nb_sb &&
>-		       sb_id + count < vm->nb_sb_per_mb &&
>+		       sb_id + count < vm->sbm.sbs_per_mb &&
> 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> 			count++;
> 
>@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 
> 		/* fake-online the pages if the memory block is online */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>-		nr_pages = PFN_DOWN(count * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
>+		nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
> 		virtio_mem_fake_online(pfn, nr_pages);
> 	}
> 
>-	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		if (online)
> 			virtio_mem_sbm_set_mb_state(vm, mb_id,
> 						    VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>  */
> static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
>-	uint64_t nb_sb = diff / vm->subblock_size;
>+	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
> 	int rc;
> 
>@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> 	rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
> 
> 	/* some subblocks might have been unplugged even on failure */
>-	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 	if (rc)
> 		return rc;
> 
>-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		/*
> 		 * Remove the block from Linux - this should never fail.
> 		 * Hinder the block from getting onlined by marking it
>@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
> 					   unsigned long mb_id, int sb_id,
> 					   int count)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
> 	unsigned long start_pfn;
> 	int rc;
> 
> 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			     sb_id * vm->subblock_size);
>+			     sb_id * vm->sbm.sb_size);
> 
> 	rc = virtio_mem_fake_offline(start_pfn, nr_pages);
> 	if (rc)
>@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	int rc, sb_id;
> 
> 	/* If possible, try to unplug the complete block in one shot. */
>-	if (*nb_sb >= vm->nb_sb_per_mb &&
>-	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (*nb_sb >= vm->sbm.sbs_per_mb &&
>+	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
>-						     vm->nb_sb_per_mb);
>+						     vm->sbm.sbs_per_mb);
> 		if (!rc) {
>-			*nb_sb -= vm->nb_sb_per_mb;
>+			*nb_sb -= vm->sbm.sbs_per_mb;
> 			goto unplugged;
> 		} else if (rc != -EBUSY)
> 			return rc;
> 	}
> 
> 	/* Fallback to single subblocks. */
>-	for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
>+	for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
> 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	 * remove it. This will usually not fail, as no memory is in use
> 	 * anymore - however some other notifiers might NACK the request.
> 	 */
>-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		mutex_unlock(&vm->hotplug_mutex);
> 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
>  */
> static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
>-	uint64_t nb_sb = diff / vm->subblock_size;
>+	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
> 	int rc;
> 
>@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	 * - Is required for now for alloc_contig_range() to work reliably -
> 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
> 	 */
>-	vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>-				  pageblock_nr_pages) * PAGE_SIZE;
>-	vm->subblock_size = max_t(uint64_t, vm->device_block_size,
>-				  vm->subblock_size);
>-	vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
>+	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+				pageblock_nr_pages) * PAGE_SIZE;
>+	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>+				vm->sbm.sb_size);
>+	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
> 
> 	/* Round up to the next full memory block */
> 	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> 		 memory_block_size_bytes());
> 	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>-		 (unsigned long long)vm->subblock_size);
>+		 (unsigned long long)vm->sbm.sb_size);
> 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
> 
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size " David Hildenbrand
  2020-10-16  8:51   ` Wei Yang
@ 2020-10-16  8:53   ` Wei Yang
  2020-10-16 13:17     ` David Hildenbrand
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

One trivial suggestion: could we move this patch closer to the data
structure movement patch?

I know this would be some work, since you have changed some of the code
logic, so rebasing would take you some time.

>---
> drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
> 1 file changed, 48 insertions(+), 48 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fc2b1ff3beed..3a772714fec9 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -96,11 +96,6 @@ struct virtio_mem {
> 	/* Maximum region size in bytes. */
> 	uint64_t region_size;
> 
>-	/* The subblock size. */
>-	uint64_t subblock_size;
>-	/* The number of subblocks per memory block. */
>-	uint32_t nb_sb_per_mb;
>-
> 	/* Id of the first memory block of this device. */
> 	unsigned long first_mb_id;
> 	/* Id of the last usable memory block of this device. */
>@@ -126,6 +121,11 @@ struct virtio_mem {
> 	uint64_t offline_threshold;
> 
> 	struct {
>+		/* The subblock size. */
>+		uint64_t sb_size;
>+		/* The number of subblocks per Linux memory block. */
>+		uint32_t sbs_per_mb;
>+
> 		/* Summary of all memory block states. */
> 		unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
> 
>@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
> 	const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
> 
>-	return (addr - mb_addr) / vm->subblock_size;
>+	return (addr - mb_addr) / vm->sbm.sb_size;
> }
> 
> /*
>@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> 					  unsigned long mb_id, int sb_id)
> {
>-	return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+	return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
> }
> 
> /*
>@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> }
> 
> /*
>- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
>  * none.
>  */
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> 	const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
> 
> 	return find_next_zero_bit(vm->sbm.sb_states,
>-				  bit + vm->nb_sb_per_mb, bit) - bit;
>+				  bit + vm->sbm.sbs_per_mb, bit) - bit;
> }
> 
> /*
>@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> 	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>-	const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
>-	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
>+	const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
>+	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
> 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> 	int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> 	unsigned long *new_bitmap, *old_bitmap;
>@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> 	unsigned long pfn;
> 	int sb_id;
> 
>-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
> 		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> 	}
> }
>@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> 					     unsigned long mb_id)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> 	unsigned long pfn;
> 	int sb_id;
> 
>-	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+	for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> 		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
> 		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> 	}
> }
>@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 				  int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>-			      sb_id * vm->subblock_size;
>-	const uint64_t size = count * vm->subblock_size;
>+			      sb_id * vm->sbm.sb_size;
>+	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
>@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 				    int sb_id, int count)
> {
> 	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>-			      sb_id * vm->subblock_size;
>-	const uint64_t size = count * vm->subblock_size;
>+			      sb_id * vm->sbm.sb_size;
>+	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
> 	dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
>@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> 	int sb_id, count;
> 	int rc;
> 
>-	sb_id = vm->nb_sb_per_mb - 1;
>+	sb_id = vm->sbm.sbs_per_mb - 1;
> 	while (*nb_sb) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
>@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
>  */
> static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
>-	uint64_t nb_sb = vm->nb_sb_per_mb;
>+	uint64_t nb_sb = vm->sbm.sbs_per_mb;
> 
> 	return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
> }
>@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> 					  unsigned long mb_id, uint64_t *nb_sb)
> {
>-	const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
>+	const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
> 	int rc;
> 
> 	if (WARN_ON_ONCE(!count))
>@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> 	 * Mark the block properly offline before adding it to Linux,
> 	 * so the memory notifiers will find the block in the right state.
> 	 */
>-	if (count == vm->nb_sb_per_mb)
>+	if (count == vm->sbm.sbs_per_mb)
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE);
> 	else
>@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 
> 	while (*nb_sb) {
> 		sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
>-		if (sb_id >= vm->nb_sb_per_mb)
>+		if (sb_id >= vm->sbm.sbs_per_mb)
> 			break;
> 		count = 1;
> 		while (count < *nb_sb &&
>-		       sb_id + count < vm->nb_sb_per_mb &&
>+		       sb_id + count < vm->sbm.sbs_per_mb &&
> 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> 			count++;
> 
>@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 
> 		/* fake-online the pages if the memory block is online */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			       sb_id * vm->subblock_size);
>-		nr_pages = PFN_DOWN(count * vm->subblock_size);
>+			       sb_id * vm->sbm.sb_size);
>+		nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
> 		virtio_mem_fake_online(pfn, nr_pages);
> 	}
> 
>-	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		if (online)
> 			virtio_mem_sbm_set_mb_state(vm, mb_id,
> 						    VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>  */
> static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
>-	uint64_t nb_sb = diff / vm->subblock_size;
>+	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
> 	int rc;
> 
>@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> 	rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
> 
> 	/* some subblocks might have been unplugged even on failure */
>-	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+	if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 	if (rc)
> 		return rc;
> 
>-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		/*
> 		 * Remove the block from Linux - this should never fail.
> 		 * Hinder the block from getting onlined by marking it
>@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
> 					   unsigned long mb_id, int sb_id,
> 					   int count)
> {
>-	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
>+	const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
> 	unsigned long start_pfn;
> 	int rc;
> 
> 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>-			     sb_id * vm->subblock_size);
>+			     sb_id * vm->sbm.sb_size);
> 
> 	rc = virtio_mem_fake_offline(start_pfn, nr_pages);
> 	if (rc)
>@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	int rc, sb_id;
> 
> 	/* If possible, try to unplug the complete block in one shot. */
>-	if (*nb_sb >= vm->nb_sb_per_mb &&
>-	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (*nb_sb >= vm->sbm.sbs_per_mb &&
>+	    virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
>-						     vm->nb_sb_per_mb);
>+						     vm->sbm.sbs_per_mb);
> 		if (!rc) {
>-			*nb_sb -= vm->nb_sb_per_mb;
>+			*nb_sb -= vm->sbm.sbs_per_mb;
> 			goto unplugged;
> 		} else if (rc != -EBUSY)
> 			return rc;
> 	}
> 
> 	/* Fallback to single subblocks. */
>-	for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
>+	for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> 		/* Find the next candidate subblock */
> 		while (sb_id >= 0 &&
> 		       !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	 * remove it. This will usually not fail, as no memory is in use
> 	 * anymore - however some other notifiers might NACK the request.
> 	 */
>-	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		mutex_unlock(&vm->hotplug_mutex);
> 		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
>  */
> static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
>-	uint64_t nb_sb = diff / vm->subblock_size;
>+	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
> 	int rc;
> 
>@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	 * - Is required for now for alloc_contig_range() to work reliably -
> 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
> 	 */
>-	vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>-				  pageblock_nr_pages) * PAGE_SIZE;
>-	vm->subblock_size = max_t(uint64_t, vm->device_block_size,
>-				  vm->subblock_size);
>-	vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
>+	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+				pageblock_nr_pages) * PAGE_SIZE;
>+	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>+				vm->sbm.sb_size);
>+	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
> 
> 	/* Round up to the next full memory block */
> 	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> 		 memory_block_size_bytes());
> 	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>-		 (unsigned long long)vm->subblock_size);
>+		 (unsigned long long)vm->sbm.sb_size);
> 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
> 
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 22/29] virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 22/29] virtio-mem: memory block ids " David Hildenbrand
@ 2020-10-16  8:54   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:16PM +0200, David Hildenbrand wrote:
>Let's move first_mb_id/next_mb_id/last_usable_mb_id accordingly.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 44 ++++++++++++++++++-------------------
> 1 file changed, 22 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index d06c8760b337..d3ab04f655ee 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -96,13 +96,6 @@ struct virtio_mem {
> 	/* Maximum region size in bytes. */
> 	uint64_t region_size;
> 
>-	/* Id of the first memory block of this device. */
>-	unsigned long first_mb_id;
>-	/* Id of the last usable memory block of this device. */
>-	unsigned long last_usable_mb_id;
>-	/* Id of the next memory bock to prepare when needed. */
>-	unsigned long next_mb_id;
>-
> 	/* The parent resource for all memory added via this device. */
> 	struct resource *parent_resource;
> 	/*
>@@ -121,6 +114,13 @@ struct virtio_mem {
> 	uint64_t offline_threshold;
> 
> 	struct {
>+		/* Id of the first memory block of this device. */
>+		unsigned long first_mb_id;
>+		/* Id of the last usable memory block of this device. */
>+		unsigned long last_usable_mb_id;
>+		/* Id of the next memory bock to prepare when needed. */
>+		unsigned long next_mb_id;
>+
> 		/* The subblock size. */
> 		uint64_t sb_size;
> 		/* The number of subblocks per Linux memory block. */
>@@ -265,7 +265,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
> 					unsigned long mb_id, uint8_t state)
> {
>-	const unsigned long idx = mb_id - vm->first_mb_id;
>+	const unsigned long idx = mb_id - vm->sbm.first_mb_id;
> 	uint8_t old_state;
> 
> 	old_state = vm->sbm.mb_states[idx];
>@@ -282,7 +282,7 @@ static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
> static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
> 					   unsigned long mb_id)
> {
>-	const unsigned long idx = mb_id - vm->first_mb_id;
>+	const unsigned long idx = mb_id - vm->sbm.first_mb_id;
> 
> 	return vm->sbm.mb_states[idx];
> }
>@@ -292,7 +292,7 @@ static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
>  */
> static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> {
>-	unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>+	unsigned long old_bytes = vm->sbm.next_mb_id - vm->sbm.first_mb_id;
> 	unsigned long new_bytes = old_bytes + 1;
> 	int old_pages = PFN_UP(old_bytes);
> 	int new_pages = PFN_UP(new_bytes);
>@@ -316,14 +316,14 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> }
> 
> #define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
>-	for (_mb_id = _vm->first_mb_id; \
>-	     _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
>+	for (_mb_id = _vm->sbm.first_mb_id; \
>+	     _mb_id < _vm->sbm.next_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id++) \
> 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
> #define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
>-	for (_mb_id = _vm->next_mb_id - 1; \
>-	     _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
>+	for (_mb_id = _vm->sbm.next_mb_id - 1; \
>+	     _mb_id >= _vm->sbm.first_mb_id && _vm->sbm.mb_count[_state]; \
> 	     _mb_id--) \
> 		if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
> 
>@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> 					  unsigned long mb_id, int sb_id)
> {
>-	return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
>+	return (mb_id - vm->sbm.first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
> }
> 
> /*
>@@ -414,7 +414,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>  */
> static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
>-	const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>+	const unsigned long old_nb_mb = vm->sbm.next_mb_id - vm->sbm.first_mb_id;
> 	const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
> 	const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
> 	int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
>@@ -1177,7 +1177,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> {
> 	int rc;
> 
>-	if (vm->next_mb_id > vm->last_usable_mb_id)
>+	if (vm->sbm.next_mb_id > vm->sbm.last_usable_mb_id)
> 		return -ENOSPC;
> 
> 	/* Resize the state array if required. */
>@@ -1191,7 +1191,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> 		return rc;
> 
> 	vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
>-	*mb_id = vm->next_mb_id++;
>+	*mb_id = vm->sbm.next_mb_id++;
> 	return 0;
> }
> 
>@@ -1622,7 +1622,7 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
> 			usable_region_size, &usable_region_size);
> 	end_addr = vm->addr + usable_region_size;
> 	end_addr = min(end_addr, phys_limit);
>-	vm->last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>+	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
> 
> 	/* see if there is a request to change the size */
> 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>@@ -1813,9 +1813,9 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
> 
> 	/* Round up to the next full memory block */
>-	vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>-						   memory_block_size_bytes());
>-	vm->next_mb_id = vm->first_mb_id;
>+	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>+						       memory_block_size_bytes());
>+	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
> 
> 	/* Prepare the offline threshold - make sure we can add two blocks. */
> 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-16  8:00     ` Wei Yang
@ 2020-10-16  8:57       ` David Hildenbrand
  2020-10-18 12:37         ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16  8:57 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

>> Do we adjust the count twice?
>>
> 
> Ah, now I see why we need to adjust the count for *unplugged* sub-blocks.

Exactly.

> 
>>> -		for (i = 0; i < nr_pages; i++) {
>>> -			page = pfn_to_page(pfn + i);
>>> -			if (WARN_ON(!page_ref_dec_and_test(page)))
> 
> Another question: when do we grab a refcount for the unplugged pages? The
> one you mentioned in virtio_mem_set_fake_offline().

Yeah, that was confusing on my side. I actually meant
virtio_mem_fake_offline() - patch #12.

We have a reference on unplugged (fake offline) blocks via

1. memmap initialization, if never onlined via generic_online_page()

So if we keep pages fake offline when onlining memory, they

a) Have a refcount of 1
b) Have *not* increased the managed page count

2. alloc_contig_range(), if fake offlined. After we fake-offlined pages
(e.g., patch #12), such pages

a) Have a refcount of 1
b) Have *not* increased the managed page count (because we manually
decreased it)


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux
  2020-10-12 12:53 ` [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux David Hildenbrand
@ 2020-10-16  8:59   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  8:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:17PM +0200, David Hildenbrand wrote:
>Let's use wrappers for the low-level functions that dev_dbg/dev_warn
>and work on addr + size, such that we can reuse them for adding/removing
>in other granularity.
>
>We only warn when adding memory fails, because that's something to pay
>attention to. We won't warn when removing fails; we'll reuse that in a
>racy context soon (and we do have proper BUG_ON() statements in the
>current cases where it must never happen).
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 107 ++++++++++++++++++++++++------------
> 1 file changed, 73 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index d3ab04f655ee..eb2ad31a8d8a 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -453,18 +453,16 @@ static bool virtio_mem_could_add_memory(struct virtio_mem *vm, uint64_t size)
> }
> 
> /*
>- * Try to add a memory block to Linux. This will usually only fail
>- * if out of memory.
>+ * Try adding memory to Linux. Will usually only fail if out of memory.
>  *
>  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
>  * onlining code).
>  *
>- * Will not modify the state of the memory block.
>+ * Will not modify the state of memory blocks in virtio-mem.
>  */
>-static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
>+static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr,
>+				 uint64_t size)
> {
>-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	const uint64_t size = memory_block_size_bytes();
> 	int rc;
> 
> 	/*
>@@ -478,32 +476,50 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
> 			return -ENOMEM;
> 	}
> 
>-	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
>+	dev_dbg(&vm->vdev->dev, "adding memory: 0x%llx - 0x%llx\n", addr,
>+		addr + size - 1);
> 	/* Memory might get onlined immediately. */
> 	atomic64_add(size, &vm->offline_size);
> 	rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
> 				       MEMHP_MERGE_RESOURCE);
>-	if (rc)
>+	if (rc) {
> 		atomic64_sub(size, &vm->offline_size);
>+		dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
>+		/*
>+		 * TODO: Linux MM does not properly clean up yet in all cases
>+		 * where adding of memory failed - especially on -ENOMEM.
>+		 */
>+	}
> 	return rc;
> }
> 
> /*
>- * Try to remove a memory block from Linux. Will only fail if the memory block
>- * is not offline.
>+ * See virtio_mem_add_memory(): Try adding a single Linux memory block.
>+ */
>+static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
>+{
>+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>+	const uint64_t size = memory_block_size_bytes();
>+
>+	return virtio_mem_add_memory(vm, addr, size);
>+}
>+
>+/*
>+ * Try removing memory from Linux. Will only fail if memory blocks aren't
>+ * offline.
>  *
>  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
>  * onlining code).
>  *
>- * Will not modify the state of the memory block.
>+ * Will not modify the state of memory blocks in virtio-mem.
>  */
>-static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
>+static int virtio_mem_remove_memory(struct virtio_mem *vm, uint64_t addr,
>+				    uint64_t size)
> {
>-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	const uint64_t size = memory_block_size_bytes();
> 	int rc;
> 
>-	dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
>+	dev_dbg(&vm->vdev->dev, "removing memory: 0x%llx - 0x%llx\n", addr,
>+		addr + size - 1);
> 	rc = remove_memory(vm->nid, addr, size);
> 	if (!rc) {
> 		atomic64_sub(size, &vm->offline_size);
>@@ -512,27 +528,41 @@ static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
> 		 * immediately instead of waiting.
> 		 */
> 		virtio_mem_retry(vm);
>+	} else {
>+		dev_dbg(&vm->vdev->dev, "removing memory failed: %d\n", rc);
> 	}
> 	return rc;
> }
> 
> /*
>- * Try to offline and remove a memory block from Linux.
>+ * See virtio_mem_remove_memory(): Try removing a single Linux memory block.
>+ */
>+static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
>+{
>+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>+	const uint64_t size = memory_block_size_bytes();
>+
>+	return virtio_mem_remove_memory(vm, addr, size);
>+}
>+
>+/*
>+ * Try offlining and removing memory from Linux.
>  *
>  * Must not be called with the vm->hotplug_mutex held (possible deadlock with
>  * onlining code).
>  *
>- * Will not modify the state of the memory block.
>+ * Will not modify the state of memory blocks in virtio-mem.
>  */
>-static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
>-					    unsigned long mb_id)
>+static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
>+						uint64_t addr,
>+						uint64_t size)
> {
>-	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>-	const uint64_t size = memory_block_size_bytes();
> 	int rc;
> 
>-	dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
>-		mb_id);
>+	dev_dbg(&vm->vdev->dev,
>+		"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
>+		addr + size - 1);
>+
> 	rc = offline_and_remove_memory(vm->nid, addr, size);
> 	if (!rc) {
> 		atomic64_sub(size, &vm->offline_size);
>@@ -541,10 +571,26 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
> 		 * immediately instead of waiting.
> 		 */
> 		virtio_mem_retry(vm);
>+	} else {
>+		dev_dbg(&vm->vdev->dev,
>+			"offlining and removing memory failed: %d\n", rc);
> 	}
> 	return rc;
> }
> 
>+/*
>+ * See virtio_mem_offline_and_remove_memory(): Try offlining and removing
>+ * a single Linux memory block.
>+ */
>+static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
>+						unsigned long mb_id)
>+{
>+	const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
>+	const uint64_t size = memory_block_size_bytes();
>+
>+	return virtio_mem_offline_and_remove_memory(vm, addr, size);
>+}
>+
> /*
>  * Trigger the workqueue so the device can perform its magic.
>  */
>@@ -1230,17 +1276,10 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> 					    VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> 
> 	/* Add the memory block to linux - if that fails, try to unplug. */
>-	rc = virtio_mem_mb_add(vm, mb_id);
>+	rc = virtio_mem_sbm_add_mb(vm, mb_id);
> 	if (rc) {
> 		int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
> 
>-		dev_err(&vm->vdev->dev,
>-			"adding memory block %lu failed with %d\n", mb_id, rc);
>-
>-		/*
>-		 * TODO: Linux MM does not properly clean up yet in all cases
>-		 * where adding of memory failed - especially on -ENOMEM.
>-		 */
> 		if (virtio_mem_sbm_unplug_sb(vm, mb_id, 0, count))
> 			new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
> 		virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
>@@ -1411,7 +1450,7 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> 					    VIRTIO_MEM_SBM_MB_UNUSED);
> 
> 		mutex_unlock(&vm->hotplug_mutex);
>-		rc = virtio_mem_mb_remove(vm, mb_id);
>+		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
> 		BUG_ON(rc);
> 		mutex_lock(&vm->hotplug_mutex);
> 	}
>@@ -1504,7 +1543,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	 */
> 	if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> 		mutex_unlock(&vm->hotplug_mutex);
>-		rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
>+		rc = virtio_mem_sbm_offline_and_remove_mb(vm, mb_id);
> 		mutex_lock(&vm->hotplug_mutex);
> 		if (!rc)
> 			virtio_mem_sbm_set_mb_state(vm, mb_id,
>@@ -1991,7 +2030,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	 */
> 	virtio_mem_sbm_for_each_mb(vm, mb_id,
> 				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>-		rc = virtio_mem_mb_remove(vm, mb_id);
>+		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
> 		BUG_ON(rc);
> 		virtio_mem_sbm_set_mb_state(vm, mb_id,
> 					    VIRTIO_MEM_SBM_MB_UNUSED);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb()
  2020-10-15 20:24   ` Pankaj Gupta
@ 2020-10-16  9:00     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16  9:00 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

On 15.10.20 22:24, Pankaj Gupta wrote:
>> We actually need one byte less (next_mb_id is exclusive, first_mb_id is
>> inclusive). Simplify.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  drivers/virtio/virtio_mem.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index a1f5bf7a571a..670b3faf412d 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -257,8 +257,8 @@ static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
>>   */
>>  static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
>>  {
>> -       unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id + 1;
>> -       unsigned long new_bytes = vm->next_mb_id - vm->first_mb_id + 2;
>> +       unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>> +       unsigned long new_bytes = old_bytes + 1;
> 
> Maybe we can avoid the new_bytes & old_bytes variables and instead use a
> single variable, which can later be used with PFN_UP/PFN_DOWN.

I'll see if it fits into a single line now - if it does, I'll move it
there. Thanks!
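
To make the byte/page arithmetic under discussion concrete, here is a minimal
user-space sketch (the names toy_pfn_up/state_array_pages and the fixed 4 KiB
page size are illustrative assumptions, not driver code):

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL

/* Rounds a byte count up to whole pages, like the kernel's PFN_UP(). */
static unsigned long toy_pfn_up(unsigned long bytes)
{
	return (bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/*
 * One state byte is tracked per memory block. Since next_mb_id is
 * exclusive and first_mb_id is inclusive, the number of tracked bytes
 * is simply the id difference - no "+ 1" needed.
 */
static unsigned long state_array_pages(unsigned long first_mb_id,
				       unsigned long next_mb_id)
{
	return toy_pfn_up(next_mb_id - first_mb_id);
}
```

For example, tracking 4096 blocks fits in one page of state bytes, while 4097
blocks need two.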

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request()
  2020-10-12 12:53 ` [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request() David Hildenbrand
@ 2020-10-16  9:07   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16  9:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:18PM +0200, David Hildenbrand wrote:
>Let's move the existing dev_dbg() into the functions, print if something
>went wrong, and also print for virtio_mem_send_unplug_all_request().
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 50 ++++++++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 15 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index eb2ad31a8d8a..e68d0d99590c 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -1053,23 +1053,33 @@ static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
> 		.u.plug.addr = cpu_to_virtio64(vm->vdev, addr),
> 		.u.plug.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks),
> 	};
>+	int rc = -ENOMEM;
> 
> 	if (atomic_read(&vm->config_changed))
> 		return -EAGAIN;
> 
>+	dev_dbg(&vm->vdev->dev, "plugging memory: 0x%llx - 0x%llx\n", addr,
>+		addr + size - 1);
>+
> 	switch (virtio_mem_send_request(vm, &req)) {
> 	case VIRTIO_MEM_RESP_ACK:
> 		vm->plugged_size += size;
> 		return 0;
> 	case VIRTIO_MEM_RESP_NACK:
>-		return -EAGAIN;
>+		rc = -EAGAIN;
>+		break;
> 	case VIRTIO_MEM_RESP_BUSY:
>-		return -ETXTBSY;
>+		rc = -ETXTBSY;
>+		break;
> 	case VIRTIO_MEM_RESP_ERROR:
>-		return -EINVAL;
>+		rc = -EINVAL;
>+		break;
> 	default:
>-		return -ENOMEM;
>+		break;
> 	}
>+
>+	dev_dbg(&vm->vdev->dev, "plugging memory failed: %d\n", rc);
>+	return rc;
> }
> 
> static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
>@@ -1081,21 +1091,30 @@ static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
> 		.u.unplug.addr = cpu_to_virtio64(vm->vdev, addr),
> 		.u.unplug.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks),
> 	};
>+	int rc = -ENOMEM;
> 
> 	if (atomic_read(&vm->config_changed))
> 		return -EAGAIN;
> 
>+	dev_dbg(&vm->vdev->dev, "unplugging memory: 0x%llx - 0x%llx\n", addr,
>+		addr + size - 1);
>+
> 	switch (virtio_mem_send_request(vm, &req)) {
> 	case VIRTIO_MEM_RESP_ACK:
> 		vm->plugged_size -= size;
> 		return 0;
> 	case VIRTIO_MEM_RESP_BUSY:
>-		return -ETXTBSY;
>+		rc = -ETXTBSY;
>+		break;
> 	case VIRTIO_MEM_RESP_ERROR:
>-		return -EINVAL;
>+		rc = -EINVAL;
>+		break;
> 	default:
>-		return -ENOMEM;
>+		break;
> 	}
>+
>+	dev_dbg(&vm->vdev->dev, "unplugging memory failed: %d\n", rc);
>+	return rc;
> }
> 
> static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
>@@ -1103,6 +1122,9 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
> 	const struct virtio_mem_req req = {
> 		.type = cpu_to_virtio16(vm->vdev, VIRTIO_MEM_REQ_UNPLUG_ALL),
> 	};
>+	int rc = -ENOMEM;
>+
>+	dev_dbg(&vm->vdev->dev, "unplugging all memory");
> 
> 	switch (virtio_mem_send_request(vm, &req)) {
> 	case VIRTIO_MEM_RESP_ACK:
>@@ -1112,10 +1134,14 @@ static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
> 		atomic_set(&vm->config_changed, 1);
> 		return 0;
> 	case VIRTIO_MEM_RESP_BUSY:
>-		return -ETXTBSY;
>+		rc = -ETXTBSY;
>+		break;
> 	default:
>-		return -ENOMEM;
>+		break;
> 	}
>+
>+	dev_dbg(&vm->vdev->dev, "unplugging all memory failed: %d\n", rc);
>+	return rc;
> }
> 
> /*
>@@ -1130,9 +1156,6 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
>-	dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
>-		sb_id, sb_id + count - 1);
>-
> 	rc = virtio_mem_send_plug_request(vm, addr, size);
> 	if (!rc)
> 		virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
>@@ -1151,9 +1174,6 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 	const uint64_t size = count * vm->sbm.sb_size;
> 	int rc;
> 
>-	dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
>-		mb_id, sb_id, sb_id + count - 1);
>-
> 	rc = virtio_mem_send_unplug_request(vm, addr, size);
> 	if (!rc)
> 		virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-16  2:16       ` Wei Yang
@ 2020-10-16  9:11         ` David Hildenbrand
  2020-10-16 10:02           ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16  9:11 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

>> That's an interesting corner case. Assume you have a 128MB memory block
>> but only 64MB are plugged.
> 
> Since we only plug part of the memory block, its state is OFFLINE_PARTIAL
> first. But then we would add this memory and online it, which means the state
> of the memory block becomes ONLINE_PARTIAL.
> 
> When is this state changed back to OFFLINE_PARTIAL?

Please note that memory onlining is *completely* controllable by user
space. User space can offline/online memory blocks as it wants. Not
saying this might actually be the right thing to do - but we cannot
trust that user space does the right thing.

So at any point in time, you have to assume that

a) added memory might not get onlined
b) previously onlined memory might get offlined
c) previously offline memory might get onlined

> 
>>
>> As long as we have our online_pages callback in place, we can hinder the
>> unplugged 64MB from getting exposed to the buddy
>> (virtio_mem_online_page_cb()). However, once we unloaded the driver,
> 
> Yes,
> 
> virtio_mem_set_fake_offline() would __SetPageOffline() to those pages.
> 
>> this is no longer the case. If someone would online that memory block,
>> we would expose unplugged memory to the buddy - very bad.
>>
> 
> Per my understanding, at this point in time, the memory block is in the
> online state, even if part of it is set to *fake* offline.
> 
> So how could the user trigger another online from the sysfs interface?

Assume we added a partially plugged memory block, which is now offline.
Further assume user space did not online the memory block (e.g., no udev
rules).

User space could happily online the block after unloading the driver.
Again, we have to assume user space could do crazy things.

> 
>> So we have to remove these partially plugged, offline memory blocks when
>> losing control over them.
>>
>> I tried to document that via:
>>
>> "After we unregistered our callbacks, user space can online partially
>> plugged offline blocks. Make sure to remove them."
>>
>>>
>>> Also, during virtio_mem_remove(), we only handle OFFLINE_PARTIAL memory
>>> blocks. What about memory blocks in other states? Is it not necessary to
>>> remove ONLINE[_PARTIAL] memory blocks?
>>
>> Blocks that are fully plugged (ONLINE or OFFLINE) can get
>> onlined/offlined without us having to care. Works fine - we only have to
>> care about partially plugged blocks.
>>
>> While we *could* unplug OFFLINE blocks, there is no way we can
>> deterministically offline+remove ONLINE blocks. So that memory has to
>> stay, even after we unloaded the driver (similar to the dax/kmem driver).
> 
> For OFFLINE memory blocks, wouldn't that leave us in the situation where
> the guest doesn't need those pages, while the host still maps them?

Yes, but the guest could online the memory and make use of it.

(again, whoever decides to unload the driver had better know what they are doing)

To do it even more cleanly, we would

a) Have to remove completely plugged offline blocks (not done)
b) Have to remove partially plugged offline blocks (done)
c) Actually send unplug requests to the hypervisor

Right now, only b) is done, because not doing so might actually cause harm (as
discussed). However, the problem is that c) might actually fail.

Long story short: we could add a) if it turns out to be a real issue. Then
again, unloading the driver isn't really recommended; the current
implementation just "keeps it working without crashes" - and I guess
that's good enough for now.

> 
>>
>> ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore,
>> as we still hold references to these struct pages
>> (virtio_mem_set_fake_offline()), and as we no longer have the memory
>> notifier in place, we can no longer agree to offline this memory (when
>> going_offline).
>>
> 
> OK, I seem to understand the logic now.
> 
> But how do we prevent an ONLINE_PARTIAL memory block from getting offlined?
> There are three calls in virtio_mem_set_fake_offline(), and all of them only
> adjust page flags. How do they hold a reference to the struct page?

Sorry, I should have given you the right pointer. (similar to my other
reply)

We hold a reference either via

1. alloc_contig_range()
2. memmap init code, when not calling generic_online_page().

So these fake-offline pages can never be actually offlined, because we
no longer have the memory notifier registered to fix that up.
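
The pinning can be sketched in a toy model (the struct and helper names are
illustrative assumptions, not kernel code): a page with an elevated reference
count cannot be offlined, and fake-offline pages keep such a reference.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
	int refcount;
};

/* Offlining only succeeds when nobody holds a reference. */
static bool toy_can_offline(const struct toy_page *p)
{
	return p->refcount == 0;
}

/* Fake-offlining pins the page by taking a reference. */
static void toy_set_fake_offline(struct toy_page *p)
{
	p->refcount++;
}
```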

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-16  4:03   ` Wei Yang
@ 2020-10-16  9:18     ` David Hildenbrand
  2020-10-18  3:57       ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16  9:18 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 16.10.20 06:03, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>> Let's trigger from offlining code when we're not allowed to touch online
>> memory.
> 
> Does this describe the change in virtio_mem_memory_notifier_cb()?

Ah, yes, I can try to make that clearer.

> 
>>
>> Handle the other case (memmap possibly freeing up another memory block)
>> when actually removing memory. When removing via virtio_mem_remove(),
>> virtio_mem_retry() is a NOP and safe to use.
>>
>> While at it, move retry handling when offlining out of
>> virtio_mem_notify_offline(), to share it with Big Block Mode (BBM)
>> soon.
> 
> I may not understand the logic fully. Here is my understanding of the
> current logic:
> 
> 
>   virtio_mem_run_wq()
>       virtio_mem_unplug_request()
>           virtio_mem_mb_unplug_any_sb_offline()
> 	      virtio_mem_mb_remove()             --- 1
> 	  virtio_mem_mb_unplug_any_sb_online()
> 	      virtio_mem_mb_offline_and_remove() --- 2
> 
> This patch tries to trigger the wq at 1 and 2. And these two functions are
> only valid during this code flow.

Exactly.

> 
> These two functions actually remove some memory from the system, so I am not
> sure where the extra unpluggable memory comes from. I guess that memory comes
> from the memory block device and the mem_section/memmap? And that memory is
> still marked as online, right?

Imagine you end up with the following layout (possible only after some
repeated plugging and unplugging of memory, otherwise it's obviously
impossible):

Memory block X: Contains only movable data

Memory block X + 1: Contains the memmap of memory block X.


We start to unplug from high to low.

1. Try to unplug/offline/remove block X + 1: fails, because of the
   memmap
2. Try to unplug/offline/remove block X: succeeds.
3. Not all requested memory got unplugged. Sleep for 30 seconds.
4. Retry to unplug/offline/remove block X + 1: succeeds

What we do in 2 is trigger a retry of ourselves. That means that in 3
we don't actually sleep, but retry immediately.

This has been proven helpful in some of my tests, where you want to
unplug *a lot* of memory again, not just some parts.


Triggering a retry is fairly cheap. Assume you don't actually have to
perform any more unplugging. The workqueue wakes up, detects that there
is nothing to do, and goes back to sleep.
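
The scenario above can be sketched as a toy model (the block numbering, helper
names, and hard-coded dependency rule are illustrative assumptions, not driver
code): block 1 hosts the memmap of block 0, so offlining block 1 only succeeds
once block 0 is gone, and a successful pass triggers an immediate retry.

```c
#include <stdbool.h>

#define TOY_NBLOCKS 2

/* Offlining/removing block 1 fails while block 0 is present, because
 * block 1 still holds block 0's memmap. */
static bool toy_try_offline_and_remove(bool present[], int block)
{
	if (block == 1 && present[0])
		return false;
	present[block] = false;
	return true;
}

/*
 * Unplug from high to low; any progress triggers another pass right
 * away (the immediate retry instead of the 30 second sleep). Returns
 * the number of passes until no further progress is made.
 */
static int toy_unplug_all(bool present[])
{
	int passes = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (int b = TOY_NBLOCKS - 1; b >= 0; b--) {
			if (present[b] &&
			    toy_try_offline_and_remove(present, b))
				progress = true;
		}
		passes++;
	}
	return passes;
}
```

With two blocks, pass 1 fails on block 1 but removes block 0, pass 2 removes
block 1, and pass 3 confirms there is nothing left to do.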

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
  2020-10-12 12:53 ` [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug David Hildenbrand
@ 2020-10-16  9:38   ` Wei Yang
  2020-10-16 13:13     ` David Hildenbrand
  2020-10-19  2:26   ` Wei Yang
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16  9:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>Currently, we do not support device block sizes that exceed the Linux
>memory block size. For example, having a device block size of 1 GiB (e.g.,
>gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>blocks.
>
>Let's implement Big Block Mode (BBM), whereby we add/remove at least
>one Linux memory block at a time. With a 1 GiB device block size, a Big
>Block (BB) will cover 8 Linux memory blocks.
>
>We'll keep registering the online_page_callback machinery, it will be used
>for safe memory hotunplug in BBM next.
>
>Note: BBM is properly prepared for variable-sized Linux memory
>blocks that we might see in the future. So we won't care how many Linux
>memory blocks a big block actually spans, and how the memory notifier is
>called.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
> 1 file changed, 402 insertions(+), 82 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e68d0d99590c..4d396ef98a92 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
> /*
>  * virtio-mem currently supports the following modes of operation:
>  *
>- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>  *   size of a Sub Block (SB) is determined based on the device block size, the
>  *   pageblock size, and the maximum allocation granularity of the buddy.
>  *   Subblocks within a Linux memory block might either be plugged or unplugged.
>  *   Memory is added/removed to Linux MM in Linux memory block granularity.
>  *
>+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>+ *   Memory is added/removed to Linux MM in Big Block granularity.
>+ *
>+ * The mode is determined automatically based on the Linux memory block size
>+ * and the device block size.
>+ *
>  * User space / core MM (auto onlining) is responsible for onlining added
>  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>  * always onlined separately, and all memory within a Linux memory block is
>@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
> 	VIRTIO_MEM_SBM_MB_COUNT
> };
> 
>+/*
>+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>+ */
>+enum virtio_mem_bbm_bb_state {
>+	/* Unplugged, not added to Linux. Can be reused later. */
>+	VIRTIO_MEM_BBM_BB_UNUSED = 0,
>+	/* Plugged, not added to Linux. Error on add_memory(). */
>+	VIRTIO_MEM_BBM_BB_PLUGGED,
>+	/* Plugged and added to Linux. */
>+	VIRTIO_MEM_BBM_BB_ADDED,
>+	VIRTIO_MEM_BBM_BB_COUNT
>+};
>+
> struct virtio_mem {
> 	struct virtio_device *vdev;
> 
>@@ -113,6 +132,9 @@ struct virtio_mem {
> 	atomic64_t offline_size;
> 	uint64_t offline_threshold;
> 
>+	/* If set, the driver is in SBM, otherwise in BBM. */
>+	bool in_sbm;
>+
> 	struct {
> 		/* Id of the first memory block of this device. */
> 		unsigned long first_mb_id;
>@@ -151,9 +173,27 @@ struct virtio_mem {
> 		unsigned long *sb_states;
> 	} sbm;
> 
>+	struct {
>+		/* Id of the first big block of this device. */
>+		unsigned long first_bb_id;
>+		/* Id of the last usable big block of this device. */
>+		unsigned long last_usable_bb_id;
>+		/* Id of the next device block to prepare when needed. */
>+		unsigned long next_bb_id;
>+
>+		/* Summary of all big block states. */
>+		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>+
>+		/* One byte state per big block. See sbm.mb_states. */
>+		uint8_t *bb_states;
>+
>+		/* The block size used for (un)plugged, adding/removing. */
>+		uint64_t bb_size;
>+	} bbm;

Can we use a union here?

>+
> 	/*
>-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>-	 * sbm.sb_states.
>+	 * Mutex that protects the sbm.mb_count, sbm.mb_states,
>+	 * sbm.sb_states, bbm.bb_count, and bbm.bb_states
> 	 *
> 	 * When this lock is held the pointers can't change, ONLINE and
> 	 * OFFLINE blocks can't change the state and no subblocks will get
>@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
> 	return mb_id * memory_block_size_bytes();
> }
> 
>+/*
>+ * Calculate the big block id of a given address.
>+ */
>+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>+					      uint64_t addr)
>+{
>+	return addr / vm->bbm.bb_size;
>+}
>+
>+/*
>+ * Calculate the physical start address of a given big block id.
>+ */
>+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>+					 unsigned long bb_id)
>+{
>+	return bb_id * vm->bbm.bb_size;
>+}
>+
> /*
>  * Calculate the subblock id of a given address.
>  */
>@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> 	return (addr - mb_addr) / vm->sbm.sb_size;
> }
> 
>+/*
>+ * Set the state of a big block, taking care of the state counter.
>+ */
>+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
>+					unsigned long bb_id,
>+					enum virtio_mem_bbm_bb_state state)
>+{
>+	const unsigned long idx = bb_id - vm->bbm.first_bb_id;
>+	enum virtio_mem_bbm_bb_state old_state;
>+
>+	old_state = vm->bbm.bb_states[idx];
>+	vm->bbm.bb_states[idx] = state;
>+
>+	BUG_ON(vm->bbm.bb_count[old_state] == 0);
>+	vm->bbm.bb_count[old_state]--;
>+	vm->bbm.bb_count[state]++;
>+}
>+
>+/*
>+ * Get the state of a big block.
>+ */
>+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
>+								unsigned long bb_id)
>+{
>+	return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
>+}
>+
>+/*
>+ * Prepare the big block state array for the next big block.
>+ */
>+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>+{
>+	unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
>+	unsigned long new_bytes = old_bytes + 1;
>+	int old_pages = PFN_UP(old_bytes);
>+	int new_pages = PFN_UP(new_bytes);
>+	uint8_t *new_array;
>+
>+	if (vm->bbm.bb_states && old_pages == new_pages)
>+		return 0;
>+
>+	new_array = vzalloc(new_pages * PAGE_SIZE);
>+	if (!new_array)
>+		return -ENOMEM;
>+
>+	mutex_lock(&vm->hotplug_mutex);
>+	if (vm->bbm.bb_states)
>+		memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
>+	vfree(vm->bbm.bb_states);
>+	vm->bbm.bb_states = new_array;
>+	mutex_unlock(&vm->hotplug_mutex);
>+
>+	return 0;
>+}
>+
>+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
>+	for (_bb_id = vm->bbm.first_bb_id; \
>+	     _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
>+	     _bb_id++) \
>+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>+
> /*
>  * Set the state of a memory block, taking care of the state counter.
>  */
>@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
> 	return virtio_mem_add_memory(vm, addr, size);
> }
> 
>+/*
>+ * See virtio_mem_add_memory(): Try adding a big block.
>+ */
>+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_add_memory(vm, addr, size);
>+}
>+
> /*
>  * Try removing memory from Linux. Will only fail if memory blocks aren't
>  * offline.
>@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 	struct memory_notify *mhp = arg;
> 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
> 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
>-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
> 	int rc = NOTIFY_OK;
>+	unsigned long id;
> 
> 	if (!virtio_mem_overlaps_range(vm, start, size))
> 		return NOTIFY_DONE;
> 
>-	/*
>-	 * Memory is onlined/offlined in memory block granularity. We cannot
>-	 * cross virtio-mem device boundaries and memory block boundaries. Bail
>-	 * out if this ever changes.
>-	 */
>-	if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>-			 !IS_ALIGNED(start, memory_block_size_bytes())))
>-		return NOTIFY_BAD;
>+	if (vm->in_sbm) {
>+		id = virtio_mem_phys_to_mb_id(start);
>+		/*
>+		 * In SBM, we add memory in separate memory blocks - we expect
>+		 * it to be onlined/offlined in the same granularity. Bail out
>+		 * if this ever changes.
>+		 */
>+		if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>+				 !IS_ALIGNED(start, memory_block_size_bytes())))
>+			return NOTIFY_BAD;
>+	} else {
>+		id = virtio_mem_phys_to_bb_id(vm, start);
>+		/*
>+		 * In BBM, we only care about onlining/offlining happening
>+		 * within a single big block, we don't care about the
>+		 * actual granularity as we don't track individual Linux
>+		 * memory blocks.
>+		 */
>+		if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
>+			return NOTIFY_BAD;
>+	}
> 
> 	/*
> 	 * Avoid circular locking lockdep warnings. We lock the mutex
>@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 			break;
> 		}
> 		vm->hotplug_active = true;
>-		virtio_mem_sbm_notify_going_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_going_offline(vm, id);
> 		break;
> 	case MEM_GOING_ONLINE:
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 			break;
> 		}
> 		vm->hotplug_active = true;
>-		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
>+		if (vm->in_sbm)
>+			rc = virtio_mem_sbm_notify_going_online(vm, id);
> 		break;
> 	case MEM_OFFLINE:
>-		virtio_mem_sbm_notify_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_offline(vm, id);
> 
> 		atomic64_add(size, &vm->offline_size);
> 		/*
>@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
> 	case MEM_ONLINE:
>-		virtio_mem_sbm_notify_online(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_online(vm, id);
> 
> 		atomic64_sub(size, &vm->offline_size);
> 		/*
>@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 	case MEM_CANCEL_OFFLINE:
> 		if (!vm->hotplug_active)
> 			break;
>-		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_cancel_offline(vm, id);
> 		vm->hotplug_active = false;
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
>@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> 	const unsigned long addr = page_to_phys(page);
>-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
>+	unsigned long id, sb_id;
> 	struct virtio_mem *vm;
>-	int sb_id;
>+	bool do_online;
> 
>-	/*
>-	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
>-	 * size/alignment and that this callback is is called with such a
>-	 * size/alignment. So we cannot cross subblocks and therefore
>-	 * also not memory blocks.
>-	 */
> 	rcu_read_lock();
> 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> 		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> 			continue;
> 
>-		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>-		/*
>-		 * If plugged, online the pages, otherwise, set them fake
>-		 * offline (PageOffline).
>-		 */
>-		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		if (vm->in_sbm) {
>+			/*
>+			 * We exploit here that subblocks have at least
>+			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
>+			 * cross subblocks within one call.
>+			 */
>+			id = virtio_mem_phys_to_mb_id(addr);
>+			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>+			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
>+								   sb_id, 1);
>+		} else {
>+			do_online = true;
>+		}
>+		if (do_online)
> 			generic_online_page(page, order);
> 		else
> 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 	return rc;
> }
> 
>+/*
>+ * Request to unplug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_send_unplug_request(vm, addr, size);
>+}
>+
>+/*
>+ * Request to plug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_send_plug_request(vm, addr, size);
>+}
>+
> /*
>  * Unplug the desired number of plugged subblocks of a offline or not-added
>  * memory block. Will fail if any subblock cannot get unplugged (instead of
>@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 	return 0;
> }
> 
>-/*
>- * Try to plug the requested amount of memory.
>- */
>-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
>@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	return rc;
> }
> 
>+/*
>+ * Plug a big block and add it to Linux.
>+ *
>+ * Will modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
>+					  unsigned long bb_id)
>+{
>+	int rc;
>+
>+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+			 VIRTIO_MEM_BBM_BB_UNUSED))
>+		return -EINVAL;
>+
>+	rc = virtio_mem_bbm_plug_bb(vm, bb_id);
>+	if (rc)
>+		return rc;
>+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>+
>+	rc = virtio_mem_bbm_add_bb(vm, bb_id);
>+	if (rc) {
>+		if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
>+			virtio_mem_bbm_set_bb_state(vm, bb_id,
>+						    VIRTIO_MEM_BBM_BB_UNUSED);
>+		else
>+			/* Retry from the main loop. */
>+			virtio_mem_bbm_set_bb_state(vm, bb_id,
>+						    VIRTIO_MEM_BBM_BB_PLUGGED);
>+		return rc;
>+	}
>+	return 0;
>+}
>+
>+/*
>+ * Prepare tracking data for the next big block.
>+ */
>+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
>+					  unsigned long *bb_id)
>+{
>+	int rc;
>+
>+	if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
>+		return -ENOSPC;
>+
>+	/* Resize the big block state array if required. */
>+	rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
>+	if (rc)
>+		return rc;
>+
>+	vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
>+	*bb_id = vm->bbm.next_bb_id;
>+	vm->bbm.next_bb_id++;
>+	return 0;
>+}
>+
>+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	uint64_t nb_bb = diff / vm->bbm.bb_size;
>+	unsigned long bb_id;
>+	int rc;
>+
>+	if (!nb_bb)
>+		return 0;
>+
>+	/* Try to plug and add unused big blocks */
>+	virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
>+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+			return -ENOSPC;
>+
>+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+		if (!rc)
>+			nb_bb--;
>+		if (rc || !nb_bb)
>+			return rc;
>+		cond_resched();
>+	}
>+
>+	/* Try to prepare, plug and add new big blocks */
>+	while (nb_bb) {
>+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+			return -ENOSPC;
>+
>+		rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
>+		if (rc)
>+			return rc;
>+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+		if (!rc)
>+			nb_bb--;
>+		if (rc)
>+			return rc;
>+		cond_resched();
>+	}
>+
>+	return 0;
>+}
>+
>+/*
>+ * Try to plug the requested amount of memory.
>+ */
>+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	if (vm->in_sbm)
>+		return virtio_mem_sbm_plug_request(vm, diff);
>+	return virtio_mem_bbm_plug_request(vm, diff);
>+}
>+
> /*
>  * Unplug the desired number of plugged subblocks of an offline memory block.
>  * Will fail if any subblock cannot get unplugged (instead of skipping it).
>@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	return 0;
> }
> 
>-/*
>- * Try to unplug the requested amount of memory.
>- */
>-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
>@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	return rc;
> }
> 
>+/*
>+ * Try to unplug the requested amount of memory.
>+ */
>+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	if (vm->in_sbm)
>+		return virtio_mem_sbm_unplug_request(vm, diff);
>+	return -EBUSY;
>+}
>+
> /*
>  * Try to unplug all blocks that couldn't be unplugged before, for example,
>  * because the hypervisor was busy.
>  */
> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> {
>-	unsigned long mb_id;
>+	unsigned long id;
> 	int rc;
> 
>-	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>-		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>+	if (!vm->in_sbm) {
>+		virtio_mem_bbm_for_each_bb(vm, id,
>+					   VIRTIO_MEM_BBM_BB_PLUGGED) {
>+			rc = virtio_mem_bbm_unplug_bb(vm, id);
>+			if (rc)
>+				return rc;
>+			virtio_mem_bbm_set_bb_state(vm, id,
>+						    VIRTIO_MEM_BBM_BB_UNUSED);
>+		}
>+		return 0;
>+	}
>+
>+	virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>+		rc = virtio_mem_sbm_unplug_mb(vm, id);
> 		if (rc)
> 			return rc;
>-		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+		virtio_mem_sbm_set_mb_state(vm, id,
> 					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
>@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
> 			usable_region_size, &usable_region_size);
> 	end_addr = vm->addr + usable_region_size;
> 	end_addr = min(end_addr, phys_limit);
>-	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>+
>+	if (vm->in_sbm)
>+		vm->sbm.last_usable_mb_id =
>+					 virtio_mem_phys_to_mb_id(end_addr) - 1;
>+	else
>+		vm->bbm.last_usable_bb_id =
>+				     virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
> 
> 	/* see if there is a request to change the size */
> 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
> static int virtio_mem_init(struct virtio_mem *vm)
> {
> 	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>+	uint64_t sb_size, addr;
> 	uint16_t node_id;
> 
> 	if (!vm->vdev->config->get) {
>@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	if (vm->nid == NUMA_NO_NODE)
> 		vm->nid = memory_add_physaddr_to_nid(vm->addr);
> 
>-	/*
>-	 * We always hotplug memory in memory block granularity. This way,
>-	 * we have to wait for exactly one memory block to online.
>-	 */
>-	if (vm->device_block_size > memory_block_size_bytes()) {
>-		dev_err(&vm->vdev->dev,
>-			"The block size is not supported (too big).\n");
>-		return -EINVAL;
>-	}
>-
> 	/* bad device setup - warn only */
> 	if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
> 		dev_warn(&vm->vdev->dev,
>@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	 * - Is required for now for alloc_contig_range() to work reliably -
> 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
> 	 */
>-	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>-				pageblock_nr_pages) * PAGE_SIZE;
>-	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>-				vm->sbm.sb_size);
>-	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>+	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+			pageblock_nr_pages) * PAGE_SIZE;
>+	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>+
>+	if (sb_size < memory_block_size_bytes()) {
>+		/* SBM: At least two subblocks per Linux memory block. */
>+		vm->in_sbm = true;
>+		vm->sbm.sb_size = sb_size;
>+		vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>+				     vm->sbm.sb_size;
>+
>+		/* Round up to the next full memory block */
>+		addr = vm->addr + memory_block_size_bytes() - 1;
>+		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>+		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+	} else {
>+		/* BBM: At least one Linux memory block. */
>+		vm->bbm.bb_size = vm->device_block_size;
> 
>-	/* Round up to the next full memory block */
>-	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>-						       memory_block_size_bytes());
>-	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
>+		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
>+	}
> 
> 	/* Prepare the offline threshold - make sure we can add two blocks. */
> 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
> 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
>+	/* In BBM, we also want at least two big blocks. */
>+	vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
>+				      vm->offline_threshold);
> 
> 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 		 (unsigned long long)vm->device_block_size);
> 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> 		 memory_block_size_bytes());
>-	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>-		 (unsigned long long)vm->sbm.sb_size);
>+	if (vm->in_sbm)
>+		dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>+			 (unsigned long long)vm->sbm.sb_size);
>+	else
>+		dev_info(&vm->vdev->dev, "big block size: 0x%llx",
>+			 (unsigned long long)vm->bbm.bb_size);
> 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
> 
>@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	cancel_work_sync(&vm->wq);
> 	hrtimer_cancel(&vm->retry_timer);
> 
>-	/*
>-	 * After we unregistered our callbacks, user space can online partially
>-	 * plugged offline blocks. Make sure to remove them.
>-	 */
>-	virtio_mem_sbm_for_each_mb(vm, mb_id,
>-				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>-		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>-		BUG_ON(rc);
>-		virtio_mem_sbm_set_mb_state(vm, mb_id,
>-					    VIRTIO_MEM_SBM_MB_UNUSED);
>+	if (vm->in_sbm) {
>+		/*
>+		 * After we unregistered our callbacks, user space can online
>+		 * partially plugged offline blocks. Make sure to remove them.
>+		 */
>+		virtio_mem_sbm_for_each_mb(vm, mb_id,
>+					   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>+			rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>+			BUG_ON(rc);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_UNUSED);
>+		}
>+		/*
>+		 * After we unregistered our callbacks, user space can no longer
>+		 * offline partially plugged online memory blocks. No need to
>+		 * worry about them.
>+		 */
> 	}
>-	/*
>-	 * After we unregistered our callbacks, user space can no longer
>-	 * offline partially plugged online memory blocks. No need to worry
>-	 * about them.
>-	 */
> 
> 	/* unregister callbacks */
> 	unregister_virtio_mem_device(vm);
>@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	}
> 
> 	/* remove all tracking data - no locking needed */
>-	vfree(vm->sbm.mb_states);
>-	vfree(vm->sbm.sb_states);
>+	if (vm->in_sbm) {
>+		vfree(vm->sbm.mb_states);
>+		vfree(vm->sbm.sb_states);
>+	} else {
>+		vfree(vm->bbm.bb_states);
>+	}
> 
> 	/* reset the device and cleanup the queues */
> 	vdev->config->reset(vdev);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-16  9:11         ` David Hildenbrand
@ 2020-10-16 10:02           ` Wei Yang
  2020-10-16 10:32             ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16 10:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 11:11:24AM +0200, David Hildenbrand wrote:
>>> That's an interesting corner case. Assume you have a 128MB memory block
>>> but only 64MB are plugged.
>> 
>> Since we just plug a part of memory block, this state is OFFLINE_PARTIAL
>> first. But then we would add these memory and online it. This means the state
>> of this memory block is ONLINE_PARTIAL.
>> 
>> When this state is changed to OFFLINE_PARTIAL again?
>
>Please note that memory onlining is *completely* controllable by user
>space. User space can offline/online memory blocks as it wants. Not
>saying this might actually be the right thing to do - but we cannot
>trust that user space does the right thing.
>
>So at any point in time, you have to assume that
>
>a) added memory might not get onlined
>b) previously onlined memory might get offlined
>c) previously offline memory might get onlined
>
>> 
>>>
>>> As long as we have our online_pages callback in place, we can hinder the
>>> unplugged 64MB from getting exposed to the buddy
>>> (virtio_mem_online_page_cb()). However, once we unloaded the driver,
>> 
>> Yes,
>> 
>> virtio_mem_set_fake_offline() would __SetPageOffline() to those pages.
>> 
>>> this is no longer the case. If someone would online that memory block,
>>> we would expose unplugged memory to the buddy - very bad.
>>>
>> 
>> Per my understanding, at this point of time, the memory block is at online
>> state. Even part of it is set to *fake* offline.
>> 
>> So how could user trigger another online from sysfs interface?
>
>Assume we added a partially plugged memory block, which is now offline.
>Further assume user space did not online the memory block (e.g., no udev
>rules).
>
>User space could happily online the block after unloading the driver.
>Again, we have to assume user space could do crazy things.
>

You are right, onlining memory is not a forced behavior.

>> 
>>> So we have to remove these partially plugged, offline memory blocks when
>>> losing control over them.
>>>
>>> I tried to document that via:
>>>
>>> "After we unregistered our callbacks, user space can online partially
>>> plugged offline blocks. Make sure to remove them."
>>>
>>>>
>>>> Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory block.
>>>> How about memory block in other states? It is not necessary to remove
>>>> ONLINE[_PARTIAL] memroy blocks?
>>>
>>> Blocks that are fully plugged (ONLINE or OFFLINE) can get
>>> onlined/offlined without us having to care. Works fine - we only have to
>>> care about partially plugged blocks.
>>>
>>> While we *could* unplug OFFLINE blocks, there is no way we can
>>> deterministically offline+remove ONLINE blocks. So that memory has to
>>> stay, even after we unloaded the driver (similar to the dax/kmem driver).
>> 
>> For OFFLINE memory blocks, would that leave the situation:
>> 
>> Guest doesn't need those pages, while host still maps them?
>
>Yes, but the guest could online the memory and make use of it.
>
>(again, whoever decides to unload the driver better be knowing what he does)
>
>To do it even more cleanly, we would
>
>a) Have to remove completely plugged offline blocks (not done)
>b) Have to remove partially plugged offline blocks (done)
>c) Actually send unplug requests to the hypervisor
>
>Right now, only b) is done, because it might actually cause harm (as
>discussed). However, the problem is, that c) might actually fail.
>
>Long short: we could add a) if it turns out to be a real issue. But
>than, unloading the driver isn't really suggested, the current
>implementation just "keeps it working without crashes" - and I guess
>that's good enough for now.
>
>> 
>>>
>>> ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore,
>>> as we still hold references to these struct pages
>>> (virtio_mem_set_fake_offline()), and as we no longer have the memory
>>> notifier in place, we can no longer agree to offline this memory (when
>>> going_offline).
>>>
>> 
>> Ok, I seems to understand the logic now.
>> 
>> But how we prevent ONLINE_PARTIAL memory block get offlined? There are three
>> calls in virtio_mem_set_fake_offline(), while all of them adjust page's flag.
>> How they hold reference to struct page?
>
>Sorry, I should have given you the right pointer. (similar to my other
>reply)
>
>We hold a reference either via
>
>1. alloc_contig_range()

I am not familiar with this one; I need to spend some time looking into it.

>2. memmap init code, when not calling generic_online_page().

I may be missing some code here. Before onlining pages, memmaps are allocated
in section_activate(). They are supposed to be zeroed. (I can't find the exact
code line.) I am not sure when we grab a refcount here.

>
>So these fake-offline pages can never be actually offlined, because we
>no longer have the memory notifier registered to fix that up.
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-16 10:02           ` Wei Yang
@ 2020-10-16 10:32             ` David Hildenbrand
  2020-10-16 22:38               ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16 10:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

>>> Ok, I seems to understand the logic now.
>>>
>>> But how we prevent ONLINE_PARTIAL memory block get offlined? There are three
>>> calls in virtio_mem_set_fake_offline(), while all of them adjust page's flag.
>>> How they hold reference to struct page?
>>
>> Sorry, I should have given you the right pointer. (similar to my other
>> reply)
>>
>> We hold a reference either via
>>
>> 1. alloc_contig_range()
> 
> I am not familiar with this one, need to spend some time to look into.

Each individual page will have a pagecount of 1.

> 
>> 2. memmap init code, when not calling generic_online_page().
> 
> I may miss some code here. Before online pages, memmaps are allocated in
> section_activate(). They are supposed to be zero-ed. (I don't get the exact
> code line.) I am not sure when we grab a refcount here.

Best to refer to __init_single_page() -> init_page_count().

Each page that wasn't onlined via generic_online_page() has a refcount
of 1 and looks like allocated.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
  2020-10-16  9:38   ` Wei Yang
@ 2020-10-16 13:13     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16 13:13 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On 16.10.20 11:38, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>> Currently, we do not support device block sizes that exceed the Linux
>> memory block size. For example, having a device block size of 1 GiB (e.g.,
>> gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>> blocks.
>>
>> Let's implement Big Block Mode (BBM), whereby we add/remove at least
>> one Linux memory block at a time. With a 1 GiB device block size, a Big
>> Block (BB) will cover 8 Linux memory blocks.
>>
>> We'll keep registering the online_page_callback machinery, it will be used
>> for safe memory hotunplug in BBM next.
>>
>> Note: BBM is properly prepared for variable-sized Linux memory
>> blocks that we might see in the future. So we won't care how many Linux
>> memory blocks a big block actually spans, and how the memory notifier is
>> called.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
>> 1 file changed, 402 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index e68d0d99590c..4d396ef98a92 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>> /*
>>  * virtio-mem currently supports the following modes of operation:
>>  *
>> - * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>> + * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>>  *   size of a Sub Block (SB) is determined based on the device block size, the
>>  *   pageblock size, and the maximum allocation granularity of the buddy.
>>  *   Subblocks within a Linux memory block might either be plugged or unplugged.
>>  *   Memory is added/removed to Linux MM in Linux memory block granularity.
>>  *
>> + * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>> + *   Memory is added/removed to Linux MM in Big Block granularity.
>> + *
>> + * The mode is determined automatically based on the Linux memory block size
>> + * and the device block size.
>> + *
>>  * User space / core MM (auto onlining) is responsible for onlining added
>>  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>>  * always onlined separately, and all memory within a Linux memory block is
>> @@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
>> 	VIRTIO_MEM_SBM_MB_COUNT
>> };
>>
>> +/*
>> + * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>> + */
>> +enum virtio_mem_bbm_bb_state {
>> +	/* Unplugged, not added to Linux. Can be reused later. */
>> +	VIRTIO_MEM_BBM_BB_UNUSED = 0,
>> +	/* Plugged, not added to Linux. Error on add_memory(). */
>> +	VIRTIO_MEM_BBM_BB_PLUGGED,
>> +	/* Plugged and added to Linux. */
>> +	VIRTIO_MEM_BBM_BB_ADDED,
>> +	VIRTIO_MEM_BBM_BB_COUNT
>> +};
>> +
>> struct virtio_mem {
>> 	struct virtio_device *vdev;
>>
>> @@ -113,6 +132,9 @@ struct virtio_mem {
>> 	atomic64_t offline_size;
>> 	uint64_t offline_threshold;
>>
>> +	/* If set, the driver is in SBM, otherwise in BBM. */
>> +	bool in_sbm;
>> +
>> 	struct {
>> 		/* Id of the first memory block of this device. */
>> 		unsigned long first_mb_id;
>> @@ -151,9 +173,27 @@ struct virtio_mem {
>> 		unsigned long *sb_states;
>> 	} sbm;
>>
>> +	struct {
>> +		/* Id of the first big block of this device. */
>> +		unsigned long first_bb_id;
>> +		/* Id of the last usable big block of this device. */
>> +		unsigned long last_usable_bb_id;
>> +		/* Id of the next device bock to prepare when needed. */
>> +		unsigned long next_bb_id;
>> +
>> +		/* Summary of all big block states. */
>> +		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>> +
>> +		/* One byte state per big block. See sbm.mb_states. */
>> +		uint8_t *bb_states;
>> +
>> +		/* The block size used for (un)plugged, adding/removing. */
>> +		uint64_t bb_size;
>> +	} bbm;
> 
> Can we use a union here?

As I had the same thought initially, it most probably makes sense :)

Thanks!


-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-16  8:53   ` Wei Yang
@ 2020-10-16 13:17     ` David Hildenbrand
  2020-10-18 12:41       ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-16 13:17 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 16.10.20 10:53, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> One trivial suggestion, could we move this patch close the data structure
> movement patch?
> 
> I know this would be some work, since you have changed some of the code logic.
> This would take you some time to rebase.

You mean after patch #17?

I guess I can move patch #18 (prereq) a little further up (e.g., after
patch #15). Guess moving it in front of #19 shouldn't be too hard.

Will give it a try - if it takes too much effort, I'll leave it like this.

Thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-16 10:32             ` David Hildenbrand
@ 2020-10-16 22:38               ` Wei Yang
  2020-10-17  7:39                 ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-16 22:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 12:32:50PM +0200, David Hildenbrand wrote:
>>>> Ok, I seems to understand the logic now.
>>>>
>>>> But how we prevent ONLINE_PARTIAL memory block get offlined? There are three
>>>> calls in virtio_mem_set_fake_offline(), while all of them adjust page's flag.
>>>> How they hold reference to struct page?
>>>
>>> Sorry, I should have given you the right pointer. (similar to my other
>>> reply)
>>>
>>> We hold a reference either via
>>>
>>> 1. alloc_contig_range()
>> 
>> I am not familiar with this one, need to spend some time to look into.
>
>Each individual page will have a pagecount of 1.
>
>> 
>>> 2. memmap init code, when not calling generic_online_page().
>> 
>> I may miss some code here. Before online pages, memmaps are allocated in
>> section_activate(). They are supposed to be zero-ed. (I don't get the exact
>> code line.) I am not sure when we grab a refcount here.
>
>Best to refer to __init_single_page() -> init_page_count().
>
>Each page that wasn't onlined via generic_online_page() has a refcount
>of 1 and looks like allocated.
>

Thanks, I see the logic.

    online_pages()
        move_pfn_range_to_zone()  --- 1)
        online_pages_range()      --- 2)

At 1), __init_single_page() would set the page count to 1. At 2),
generic_online_page() would clear the page count, while the callback would not.

Then I tried to find the place where a non-zero page count prevents offlining.
scan_movable_pages() would fail, since such a page is PageOffline() and has a
page count of 1.

So the guard that prevents offlining partially onlined pages is

    (PageOffline && page_count)

And your commit aa218795cb5fd583c94f

mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

was introduced to handle this case.

That's pretty clear now.

>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-12 12:52 ` [PATCH v1 05/29] virtio-mem: generalize check for added memory David Hildenbrand
  2020-10-15  8:28   ` Wei Yang
@ 2020-10-16 22:39   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-16 22:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:52:59PM +0200, David Hildenbrand wrote:
>Let's check by traversing busy system RAM resources instead, to avoid
>relying on memory block states.
>
>Don't use walk_system_ram_range(), as that works on pages and we want to
>use the bare addresses we have easily at hand.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index b3eebac7191f..6bbd1cfd10d3 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -1749,6 +1749,20 @@ static void virtio_mem_delete_resource(struct virtio_mem *vm)
> 	vm->parent_resource = NULL;
> }
> 
>+static int virtio_mem_range_has_system_ram(struct resource *res, void *arg)
>+{
>+	return 1;
>+}
>+
>+static bool virtio_mem_has_memory_added(struct virtio_mem *vm)
>+{
>+	const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>+
>+	return walk_iomem_res_desc(IORES_DESC_NONE, flags, vm->addr,
>+				   vm->addr + vm->region_size, NULL,
>+				   virtio_mem_range_has_system_ram) == 1;
>+}
>+
> static int virtio_mem_probe(struct virtio_device *vdev)
> {
> 	struct virtio_mem *vm;
>@@ -1870,10 +1884,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	 * the system. And there is no way to stop the driver/device from going
> 	 * away. Warn at least.
> 	 */
>-	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
>-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
>+	if (virtio_mem_has_memory_added(vm)) {
> 		dev_warn(&vdev->dev, "device still has system memory added\n");
> 	} else {
> 		virtio_mem_delete_resource(vm);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-16 22:38               ` Wei Yang
@ 2020-10-17  7:39                 ` David Hildenbrand
  2020-10-18 12:27                   ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-17  7:39 UTC (permalink / raw)
  To: Wei Yang
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Andrew Morton, Michael S . Tsirkin, Jason Wang, Pankaj Gupta


> Am 17.10.2020 um 00:38 schrieb Wei Yang <richard.weiyang@linux.alibaba.com>:
> 
> On Fri, Oct 16, 2020 at 12:32:50PM +0200, David Hildenbrand wrote:
>>>>> Ok, I seems to understand the logic now.
>>>>> 
>>>>> But how we prevent ONLINE_PARTIAL memory block get offlined? There are three
>>>>> calls in virtio_mem_set_fake_offline(), while all of them adjust page's flag.
>>>>> How they hold reference to struct page?
>>>> 
>>>> Sorry, I should have given you the right pointer. (similar to my other
>>>> reply)
>>>> 
>>>> We hold a reference either via
>>>> 
>>>> 1. alloc_contig_range()
>>> 
>>> I am not familiar with this one, need to spend some time to look into.
>> 
>> Each individual page will have a pagecount of 1.
>> 
>>> 
>>>> 2. memmap init code, when not calling generic_online_page().
>>> 
>>> I may miss some code here. Before online pages, memmaps are allocated in
>>> section_activate(). They are supposed to be zero-ed. (I don't get the exact
>>> code line.) I am not sure when we grab a refcount here.
>> 
>> Best to refer to __init_single_page() -> init_page_count().
>> 
>> Each page that wasn't onlined via generic_online_page() has a refcount
>> of 1 and looks like allocated.
>> 
> 
> Thanks, I see the logic.
> 
>    online_pages()
>        move_pfn_range_to_zone()  --- 1)
>    online_pages_range()      --- 2)
> 
> At 1), __init_single_page() would set page count to 1. At 2),
> generic_online_page() would clear page count, while the call back would not.
> 
> Then I am trying to search the place where un-zero page count prevent offline.
> scan_movable_pages() would fail, since this is a PageOffline() and has 1 page
> count.
> 
> So the GUARD we prevent offline partial-onlined pages is
> 
>    (PageOffline && page_count)
> 
> And your commit aa218795cb5fd583c94f
> 
> mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
> 
> is introduced to handle this case.
> 
> That's pretty clear now.
> 

I'm happy to see that I am no longer the only person who understands all this magic :)

Thanks for having a look / reviewing!

>> -- 
>> Thanks,
>> 
>> David / dhildenb
> 
> -- 
> Wei Yang
> Help you, Help me
> 



* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-16  9:18     ` David Hildenbrand
@ 2020-10-18  3:57       ` Wei Yang
  2020-10-19  9:04         ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-18  3:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 11:18:39AM +0200, David Hildenbrand wrote:
>On 16.10.20 06:03, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>>> Let's trigger from offlining code when we're not allowed to touch online

Here, does "touch" mean "unplug"? If so, maybe s/touch/unplug/ would be easier
to understand.

>>> memory.
>> 
>> This describes the change in virtio_mem_memory_notifier_cb()?
>
>Ah, yes, can try to make that clearer.
>
>> 
>>>
>>> Handle the other case (memmap possibly freeing up another memory block)
>>> when actually removing memory. When removing via virtio_mem_remove(),
>>> virtio_mem_retry() is a NOP and safe to use.
>>>
>>> While at it, move retry handling when offlining out of
>>> virtio_mem_notify_offline(), to share it with Device Block Mode (DBM)
>>> soon.
>> 
>> I may not understand the logic fully. Here is my understanding of current
>> logic:
>> 
>> 
>>   virtio_mem_run_wq()
>>       virtio_mem_unplug_request()
>>           virtio_mem_mb_unplug_any_sb_offline()
>> 	      virtio_mem_mb_remove()             --- 1
>> 	  virtio_mem_mb_unplug_any_sb_online()
>> 	      virtio_mem_mb_offline_and_remove() --- 2
>> 

I am trying to gain a better understanding of the logic of virtio_mem_retry().

Current logic seems clear to me. There are four places to trigger it:

    * notify_offline
    * notify_online
    * timer_expired
    * config_changed

In this patch, we try to optimize the first case, notify_offline.

Now, we always trigger a retry when one of our memory blocks gets offlined.
Per my understanding, this logic is correct but misses one case (or, to be
more precise, does not handle one case in a timely manner). The case this
patch wants to improve is virtio_mem_mb_remove(), if my understanding is
correct.

   virtio_mem_run_wq()
       virtio_mem_unplug_request()
           virtio_mem_mb_unplug_any_sb_offline()
 	      virtio_mem_mb_remove()             --- 1
           virtio_mem_mb_unplug_any_sb_online()
              virtio_mem_mb_offline_and_remove() --- 2

The above are the two functions this patch adjusts. For 2), it offlines the
memory block, which already triggers virtio_mem_retry(). But for 1), the
memory block is already offlined, so virtio_mem_retry() would not be
triggered before this patch. This is the case we want to improve here:
instead of waiting for the timer to expire, we trigger a retry immediately
after unplugging/removing an offlined memory block.

In addition, this patch adjusts the original virtio_mem_notify_offline()
path to only trigger virtio_mem_retry() when unplug_online is false. (This
means the offline event was triggered from user space instead of from an
unplug request.)

If my above analysis is correct, I have one small suggestion for this patch.
Instead of adjusting the current notify_offline handling, how about just
triggering the retry in virtio_mem_mb_remove()? Per my understanding, we
just want an immediate retry when unplugging an offlined memory block.

>> This patch tries to trigger the wq at 1 and 2. And these two functions are
>> only valid during this code flow.
>
>Exactly.
>
>> 
>> These two functions actually remove some memory from the system. So I am not
>> sure where the extra unpluggable memory comes from. I guess that memory is
>> from the memory block device and mem_section, memmap? While that memory is
>> still marked as online, right?
>
>Imagine you end up (only after some repeating plugging and unplugging of
>memory, otherwise it's obviously impossible):
>
>Memory block X: Contains only movable data
>
>Memory block X + 1: Contains memmap of Memory block X:
>
>
>We start to unplug from high, to low.
>
>1. Try to unplug/offline/remove block X + 1: fails, because of the
>   memmap
>2. Try to unplug/offline/remove block X: succeeds.
>3. Not all requested memory got unplugged. Sleep for 30 seconds.
>4. Retry to unplug/offline/remove block X + 1: succeeds
>
>What we do in 2, is that we trigger a retry of ourselves. That means,
>that in 3. we don't actually sleep, but retry immediately.
>
>This has been proven helpful in some of my tests, where you want to
>unplug *a lot* of memory again, not just some parts.
>
>
>Triggering a retry is fairly cheap. Assume you don't actually have to
>perform any more unplugging. The workqueue wakes up, detects that
>nothing is to do, and goes back to sleep.
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
  2020-10-17  7:39                 ` David Hildenbrand
@ 2020-10-18 12:27                   ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-18 12:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Sat, Oct 17, 2020 at 09:39:38AM +0200, David Hildenbrand wrote:
>
>> Am 17.10.2020 um 00:38 schrieb Wei Yang <richard.weiyang@linux.alibaba.com>:
>> 
>> On Fri, Oct 16, 2020 at 12:32:50PM +0200, David Hildenbrand wrote:
>>>>>> Ok, I seem to understand the logic now.
>>>>>> 
>>>>>> But how do we prevent an ONLINE_PARTIAL memory block from getting offlined?
>>>>>> There are three calls in virtio_mem_set_fake_offline(), and all of them
>>>>>> adjust page flags. How do they hold a reference to struct page?
>>>>> 
>>>>> Sorry, I should have given you the right pointer. (similar to my other
>>>>> reply)
>>>>> 
>>>>> We hold a reference either via
>>>>> 
>>>>> 1. alloc_contig_range()
>>>> 
>>>> I am not familiar with this one, need to spend some time to look into.
>>> 
>>> Each individual page will have a pagecount of 1.
>>> 
>>>> 
>>>>> 2. memmap init code, when not calling generic_online_page().
>>>> 
>>>> I may be missing some code here. Before onlining pages, memmaps are
>>>> allocated in section_activate(). They are supposed to be zeroed. (I can't
>>>> find the exact code line.) I am not sure when we grab a refcount here.
>>> 
>>> Best to refer to __init_single_page() -> init_page_count().
>>> 
>>> Each page that wasn't onlined via generic_online_page() has a refcount
>>> of 1 and looks like allocated.
>>> 
>> 
>> Thanks, I see the logic.
>> 
>>    online_pages()
>>        move_pfn_range_to_zone()  --- 1)
>>    online_pages_range()      --- 2)
>> 
>> At 1), __init_single_page() would set page count to 1. At 2),
>> generic_online_page() would clear page count, while the call back would not.
>> 
>> Then I am trying to find the place where a non-zero page count prevents
>> offline. scan_movable_pages() would fail, since the page is PageOffline()
>> and has a page count of 1.
>> 
>> So the GUARD we prevent offline partial-onlined pages is
>> 
>>    (PageOffline && page_count)
>> 
>> And your commit aa218795cb5fd583c94f
>> 
>> mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
>> 
>> is introduced to handle this case.
>> 
>> That's pretty clear now.
>> 
>
>I'm happy to see that I am no longer the only person who understands all this magic :)

Thanks for sharing the magic :-)

>
>Thanks for having a look / reviewing!
>
>>> -- 
>>> Thanks,
>>> 
>>> David / dhildenb
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-16  8:57       ` David Hildenbrand
@ 2020-10-18 12:37         ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-18 12:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 10:57:35AM +0200, David Hildenbrand wrote:
>>> Do we adjust the count twice?
>>>
>> 
>> Ah, I got the reason why we need to adjust count for *unplugged* sub-blocks.
>
>Exactly.
>
>> 
>>>> -		for (i = 0; i < nr_pages; i++) {
>>>> -			page = pfn_to_page(pfn + i);
>>>> -			if (WARN_ON(!page_ref_dec_and_test(page)))
>> 
>> Another question is when we grab a refcount for the unplugged pages? The one
>> you mentioned in virtio_mem_set_fake_offline().
>
>Yeah, that was confusing on my side. I actually meant
>virtio_mem_fake_offline() - patch #12.
>
>We have a reference on unplugged (fake offline) blocks via
>
>1. memmap initialization, if never online via generic_online_page()
>
>So if we keep pages fake offline when onlining memory, they
>
>a) Have a refcount of 1
>b) Have *not* increased the managed page count
>
>2. alloc_contig_range(), if fake offlined. After we fake-offlined pages
>(e.g., patch #12), such pages
>
>a) Have a refcount of 1
>b) Have *not* increased the managed page count (because we manually
>decreased it)
>

Yep, I got the reason now.

>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier
  2020-10-12 12:53 ` [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier David Hildenbrand
  2020-10-16  7:15   ` Wei Yang
@ 2020-10-18 12:38   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-18 12:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:07PM +0200, David Hildenbrand wrote:
>Let's factor out the core pieces and place the implementation next to
>virtio_mem_fake_offline(). We'll reuse this functionality soon.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

>---
> drivers/virtio/virtio_mem.c | 73 +++++++++++++++++++++++++------------
> 1 file changed, 50 insertions(+), 23 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index d132bc54ef57..a2124892e510 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -168,6 +168,10 @@ static LIST_HEAD(virtio_mem_devices);
> 
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order);
> static void virtio_mem_retry(struct virtio_mem *vm);
>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>+						  unsigned long nr_pages);
>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>+						   unsigned long nr_pages);
> 
> /*
>  * Register a virtio-mem device so it will be considered for the online_page
>@@ -604,27 +608,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> 					    unsigned long mb_id)
> {
> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>-	struct page *page;
> 	unsigned long pfn;
>-	int sb_id, i;
>+	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
>-		/*
>-		 * Drop our reference to the pages so the memory can get
>-		 * offlined and add the unplugged pages to the managed
>-		 * page counters (so offlining code can correctly subtract
>-		 * them again).
>-		 */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>-		adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
>-		for (i = 0; i < nr_pages; i++) {
>-			page = pfn_to_page(pfn + i);
>-			if (WARN_ON(!page_ref_dec_and_test(page)))
>-				dump_page(page, "unplugged page referenced");
>-		}
>+		virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> 	}
> }
> 
>@@ -633,21 +625,14 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> {
> 	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
> 	unsigned long pfn;
>-	int sb_id, i;
>+	int sb_id;
> 
> 	for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> 			continue;
>-		/*
>-		 * Get the reference we dropped when going offline and
>-		 * subtract the unplugged pages from the managed page
>-		 * counters.
>-		 */
> 		pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> 			       sb_id * vm->subblock_size);
>-		adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>-		for (i = 0; i < nr_pages; i++)
>-			page_ref_inc(pfn_to_page(pfn + i));
>+		virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> 	}
> }
> 
>@@ -853,6 +838,48 @@ static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
> 	return 0;
> }
> 
>+/*
>+ * Handle fake-offline pages when memory is going offline - such that the
>+ * pages can be skipped by mm-core when offlining.
>+ */
>+static void virtio_mem_fake_offline_going_offline(unsigned long pfn,
>+						  unsigned long nr_pages)
>+{
>+	struct page *page;
>+	unsigned long i;
>+
>+	/*
>+	 * Drop our reference to the pages so the memory can get offlined
>+	 * and add the unplugged pages to the managed page counters (so
>+	 * offlining code can correctly subtract them again).
>+	 */
>+	adjust_managed_page_count(pfn_to_page(pfn), nr_pages);
>+	/* Drop our reference to the pages so the memory can get offlined. */
>+	for (i = 0; i < nr_pages; i++) {
>+		page = pfn_to_page(pfn + i);
>+		if (WARN_ON(!page_ref_dec_and_test(page)))
>+			dump_page(page, "fake-offline page referenced");
>+	}
>+}
>+
>+/*
>+ * Handle fake-offline pages when memory offlining is canceled - to undo
>+ * what we did in virtio_mem_fake_offline_going_offline().
>+ */
>+static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>+						   unsigned long nr_pages)
>+{
>+	unsigned long i;
>+
>+	/*
>+	 * Get the reference we dropped when going offline and subtract the
>+	 * unplugged pages from the managed page counters.
>+	 */
>+	adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
>+	for (i = 0; i < nr_pages; i++)
>+		page_ref_inc(pfn_to_page(pfn + i));
>+}
>+
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> 	const unsigned long addr = page_to_phys(page);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-16 13:17     ` David Hildenbrand
@ 2020-10-18 12:41       ` Wei Yang
  2020-10-19 11:57         ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-18 12:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Fri, Oct 16, 2020 at 03:17:06PM +0200, David Hildenbrand wrote:
>On 16.10.20 10:53, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> 
>> One trivial suggestion, could we move this patch close the data structure
>> movement patch?
>> 
>> I know this would be some work, since you have changed some of the code logic.
>> This would take you some time to rebase.
>
>You mean after patch #17 ?

Yes

>
>I guess I can move patch #18 (prereq) a little further up (e.g., after
>patch #15). Guess moving it in front of #19 shouldn't be too hard.
>
>Will give it a try - if it takes too much effort, I'll leave it like this.
>

Not a big deal, but it would make the change more self-contained to me.

This is a big patch set to me. If it could be split into two parts, e.g.,
bug fixes/logic improvements and the BBM implementation, it would be
friendlier to review.

>Thanks!
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (28 preceding siblings ...)
  2020-10-12 12:53 ` [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe " David Hildenbrand
@ 2020-10-18 12:49 ` Wei Yang
  2020-10-18 15:29 ` Michael S. Tsirkin
  30 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-18 12:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Michal Hocko, Oscar Salvador,
	Pankaj Gupta, Wei Yang

On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
>virtio-mem currently only supports device block sizes that span at most
>a single Linux memory block. For example, gigantic pages in the hypervisor
>result on x86-64 in a device block size of 1 GiB - when the Linux memory
>block size is 128 MiB, we cannot support such devices (we fail loading the
>driver). Of course, we want to support any device block size in any Linux
>VM.
>
>Bigger device block sizes will become especially important once supporting
>VFIO in QEMU - each device block has to be mapped separately, and the
>maximum number of mappings for VFIO is 64k. So we usually want blocks in
>the gigabyte range when wanting to grow the VM big.
>
>This series:
>- Performs some cleanups
>- Factors out existing Sub Block Mode (SBM)
>- Implements memory hot(un)plug in Big Block Mode (BBM)
>
>I need one core-mm change, to make offline_and_remove_memory() eat bigger
>chunks.
>
>This series is based on "next-20201009" and can be found at:
>	git@gitlab.com:virtio-mem/linux.git virtio-mem-dbm-v1
>

I am trying to apply this patch set, but found I can't 'git fetch' this
repo. Is there another repo from which I could fetch this patch set?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM)
  2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
                   ` (29 preceding siblings ...)
  2020-10-18 12:49 ` [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) Wei Yang
@ 2020-10-18 15:29 ` Michael S. Tsirkin
  2020-10-18 16:34   ` David Hildenbrand
  30 siblings, 1 reply; 108+ messages in thread
From: Michael S. Tsirkin @ 2020-10-18 15:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Jason Wang, Michal Hocko, Oscar Salvador, Pankaj Gupta, Wei Yang

On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
> virtio-mem currently only supports device block sizes that span at most
> a single Linux memory block. For example, gigantic pages in the hypervisor
> result on x86-64 in a device block size of 1 GiB - when the Linux memory
> block size is 128 MiB, we cannot support such devices (we fail loading the
> driver). Of course, we want to support any device block size in any Linux
> VM.
> 
> Bigger device block sizes will become especially important once supporting
> VFIO in QEMU - each device block has to be mapped separately, and the
> maximum number of mappings for VFIO is 64k. So we usually want blocks in
> the gigabyte range when wanting to grow the VM big.

I guess it missed this Linux release, right? There's an mm change which did
not get an ack from the mm maintainers, so I can't merge it ...

> This series:
> - Performs some cleanups
> - Factors out existing Sub Block Mode (SBM)
> - Implements memory hot(un)plug in Big Block Mode (BBM)
> 
> I need one core-mm change, to make offline_and_remove_memory() eat bigger
> chunks.
> 
> This series is based on "next-20201009" and can be found at:
> 	git@gitlab.com:virtio-mem/linux.git virtio-mem-dbm-v1
> 
> Once some virtio-mem patches that are pending in the -mm tree are upstream
> (I guess they'll go in in 5.10), I'll resend based on Linus' tree.
> I suggest to take this (including the MM patch, acks/review please) via the
> vhost tree once time has come. In the meantime, I'll do more testing.
> 
> David Hildenbrand (29):
>   virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
>   virtio-mem: simplify calculation in
>     virtio_mem_mb_state_prepare_next_mb()
>   virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
>   virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
>   virtio-mem: generalize check for added memory
>   virtio-mem: generalize virtio_mem_owned_mb()
>   virtio-mem: generalize virtio_mem_overlaps_range()
>   virtio-mem: drop last_mb_id
>   virtio-mem: don't always trigger the workqueue when offlining memory
>   virtio-mem: generalize handling when memory is getting onlined
>     deferred
>   virtio-mem: use "unsigned long" for nr_pages when fake
>     onlining/offlining
>   virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
>   virtio-mem: factor out handling of fake-offline pages in memory
>     notifier
>   virtio-mem: retry fake-offlining via alloc_contig_range() on
>     ZONE_MOVABLE
>   virito-mem: document Sub Block Mode (SBM)
>   virtio-mem: memory block states are specific to Sub Block Mode (SBM)
>   virito-mem: subblock states are specific to Sub Block Mode (SBM)
>   virtio-mem: factor out calculation of the bit number within the
>     sb_states bitmap
>   virito-mem: existing (un)plug functions are specific to Sub Block Mode
>     (SBM)
>   virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block
>     Mode (SBM)
>   virtio-mem: memory notifier callbacks are specific to Sub Block Mode
>     (SBM)
>   virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
>   virtio-mem: factor out adding/removing memory from Linux
>   virtio-mem: print debug messages from virtio_mem_send_*_request()
>   virtio-mem: Big Block Mode (BBM) memory hotplug
>   virtio-mem: allow to force Big Block Mode (BBM) and set the big block
>     size
>   mm/memory_hotplug: extend offline_and_remove_memory() to handle more
>     than one memory block
>   virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
>   virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
> 
>  drivers/virtio/virtio_mem.c | 1783 +++++++++++++++++++++++++----------
>  mm/memory_hotplug.c         |  105 ++-
>  2 files changed, 1373 insertions(+), 515 deletions(-)
> 
> -- 
> 2.26.2


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM)
  2020-10-18 15:29 ` Michael S. Tsirkin
@ 2020-10-18 16:34   ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-18 16:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Andrew Morton, Jason Wang, Michal Hocko, Oscar Salvador,
	Pankaj Gupta, Wei Yang


> Am 18.10.2020 um 17:29 schrieb Michael S. Tsirkin <mst@redhat.com>:
> 
> On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
>> virtio-mem currently only supports device block sizes that span at most
>> a single Linux memory block. For example, gigantic pages in the hypervisor
>> result on x86-64 in a device block size of 1 GiB - when the Linux memory
>> block size is 128 MiB, we cannot support such devices (we fail loading the
>> driver). Of course, we want to support any device block size in any Linux
>> VM.
>> 
>> Bigger device block sizes will become especially important once supporting
>> VFIO in QEMU - each device block has to be mapped separately, and the
>> maximum number of mappings for VFIO is 64k. So we usually want blocks in
>> the gigabyte range when wanting to grow the VM big.
> 
> I guess it missed this Linux right? There's an mm change which did not
> get an ack from mm mainatiners, so I can't merge it ...

No issue, I was targeting 5.11 either way! I'll resend based on Linus' tree now that all prerequisites are upstream.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 21/29] virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 21/29] virtio-mem: memory notifier callbacks " David Hildenbrand
@ 2020-10-19  1:57   ` Wei Yang
  2020-10-19 10:22     ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-19  1:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 12, 2020 at 02:53:15PM +0200, David Hildenbrand wrote:
>Let's rename accordingly.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 3a772714fec9..d06c8760b337 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> 	return start >= vm->addr && start + size <= vm->addr + vm->region_size;
> }
> 
>-static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>-					  unsigned long mb_id)
>+static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
>+					      unsigned long mb_id)

Looking at this patch together with "virtio-mem: Big Block Mode (BBM) memory
hotplug", I think the code is a little "complex".

The final logic of virtio_mem_memory_notifier_cb() looks like this:

    virtio_mem_memory_notifier_cb()
        if (vm->in_sbm)
	    notify_xxx()
        if (vm->in_sbm)
	    notify_xxx()

Can we adjust this like

    virtio_mem_memory_notifier_cb()
	notify_xxx()
            if (!vm->in_sbm)
                return
	notify_xxx()
            if (!vm->in_sbm)
                return

This style looks a little better to me.


-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
  2020-10-12 12:53 ` [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug David Hildenbrand
  2020-10-16  9:38   ` Wei Yang
@ 2020-10-19  2:26   ` Wei Yang
  2020-10-19  9:15     ` David Hildenbrand
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-19  2:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>Currently, we do not support device block sizes that exceed the Linux
>memory block size. For example, having a device block size of 1 GiB (e.g.,
>gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>blocks.
>
>Let's implement Big Block Mode (BBM), whereby we add/remove at least
>one Linux memory block at a time. With a 1 GiB device block size, a Big
>Block (BB) will cover 8 Linux memory blocks.
>
>We'll keep registering the online_page_callback machinery, it will be used
>for safe memory hotunplug in BBM next.
>
>Note: BBM is properly prepared for variable-sized Linux memory
>blocks that we might see in the future. So we won't care how many Linux
>memory blocks a big block actually spans, and how the memory notifier is
>called.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
> 1 file changed, 402 insertions(+), 82 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e68d0d99590c..4d396ef98a92 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
> /*
>  * virtio-mem currently supports the following modes of operation:
>  *
>- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>  *   size of a Sub Block (SB) is determined based on the device block size, the
>  *   pageblock size, and the maximum allocation granularity of the buddy.
>  *   Subblocks within a Linux memory block might either be plugged or unplugged.
>  *   Memory is added/removed to Linux MM in Linux memory block granularity.
>  *
>+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>+ *   Memory is added/removed to Linux MM in Big Block granularity.
>+ *
>+ * The mode is determined automatically based on the Linux memory block size
>+ * and the device block size.
>+ *
>  * User space / core MM (auto onlining) is responsible for onlining added
>  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>  * always onlined separately, and all memory within a Linux memory block is
>@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
> 	VIRTIO_MEM_SBM_MB_COUNT
> };
> 
>+/*
>+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>+ */
>+enum virtio_mem_bbm_bb_state {
>+	/* Unplugged, not added to Linux. Can be reused later. */
>+	VIRTIO_MEM_BBM_BB_UNUSED = 0,
>+	/* Plugged, not added to Linux. Error on add_memory(). */
>+	VIRTIO_MEM_BBM_BB_PLUGGED,
>+	/* Plugged and added to Linux. */
>+	VIRTIO_MEM_BBM_BB_ADDED,
>+	VIRTIO_MEM_BBM_BB_COUNT
>+};
>+
> struct virtio_mem {
> 	struct virtio_device *vdev;
> 
>@@ -113,6 +132,9 @@ struct virtio_mem {
> 	atomic64_t offline_size;
> 	uint64_t offline_threshold;
> 
>+	/* If set, the driver is in SBM, otherwise in BBM. */
>+	bool in_sbm;
>+
> 	struct {
> 		/* Id of the first memory block of this device. */
> 		unsigned long first_mb_id;
>@@ -151,9 +173,27 @@ struct virtio_mem {
> 		unsigned long *sb_states;
> 	} sbm;
> 
>+	struct {
>+		/* Id of the first big block of this device. */
>+		unsigned long first_bb_id;
>+		/* Id of the last usable big block of this device. */
>+		unsigned long last_usable_bb_id;
>+		/* Id of the next device bock to prepare when needed. */
>+		unsigned long next_bb_id;
>+
>+		/* Summary of all big block states. */
>+		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>+
>+		/* One byte state per big block. See sbm.mb_states. */
>+		uint8_t *bb_states;
>+
>+		/* The block size used for (un)plugged, adding/removing. */
>+		uint64_t bb_size;
>+	} bbm;
>+
> 	/*
>-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>-	 * sbm.sb_states.
>+	 * Mutex that protects the sbm.mb_count, sbm.mb_states,
>+	 * sbm.sb_states, bbm.bb_count, and bbm.bb_states
> 	 *
> 	 * When this lock is held the pointers can't change, ONLINE and
> 	 * OFFLINE blocks can't change the state and no subblocks will get
>@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
> 	return mb_id * memory_block_size_bytes();
> }
> 
>+/*
>+ * Calculate the big block id of a given address.
>+ */
>+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>+					      uint64_t addr)
>+{
>+	return addr / vm->bbm.bb_size;
>+}
>+
>+/*
>+ * Calculate the physical start address of a given big block id.
>+ */
>+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>+					 unsigned long bb_id)
>+{
>+	return bb_id * vm->bbm.bb_size;
>+}
>+
> /*
>  * Calculate the subblock id of a given address.
>  */
>@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> 	return (addr - mb_addr) / vm->sbm.sb_size;
> }
> 
>+/*
>+ * Set the state of a big block, taking care of the state counter.
>+ */
>+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
>+					unsigned long bb_id,
>+					enum virtio_mem_bbm_bb_state state)
>+{
>+	const unsigned long idx = bb_id - vm->bbm.first_bb_id;
>+	enum virtio_mem_bbm_bb_state old_state;
>+
>+	old_state = vm->bbm.bb_states[idx];
>+	vm->bbm.bb_states[idx] = state;
>+
>+	BUG_ON(vm->bbm.bb_count[old_state] == 0);
>+	vm->bbm.bb_count[old_state]--;
>+	vm->bbm.bb_count[state]++;
>+}
>+
>+/*
>+ * Get the state of a big block.
>+ */
>+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
>+								unsigned long bb_id)
>+{
>+	return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
>+}
>+
>+/*
>+ * Prepare the big block state array for the next big block.
>+ */
>+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>+{
>+	unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
>+	unsigned long new_bytes = old_bytes + 1;
>+	int old_pages = PFN_UP(old_bytes);
>+	int new_pages = PFN_UP(new_bytes);
>+	uint8_t *new_array;
>+
>+	if (vm->bbm.bb_states && old_pages == new_pages)
>+		return 0;
>+
>+	new_array = vzalloc(new_pages * PAGE_SIZE);
>+	if (!new_array)
>+		return -ENOMEM;
>+
>+	mutex_lock(&vm->hotplug_mutex);
>+	if (vm->bbm.bb_states)
>+		memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
>+	vfree(vm->bbm.bb_states);
>+	vm->bbm.bb_states = new_array;
>+	mutex_unlock(&vm->hotplug_mutex);
>+
>+	return 0;
>+}
>+
>+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
>+	for (_bb_id = vm->bbm.first_bb_id; \
>+	     _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
>+	     _bb_id++) \
>+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>+
> /*
>  * Set the state of a memory block, taking care of the state counter.
>  */
>@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
> 	return virtio_mem_add_memory(vm, addr, size);
> }
> 
>+/*
>+ * See virtio_mem_add_memory(): Try adding a big block.
>+ */
>+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_add_memory(vm, addr, size);
>+}
>+
> /*
>  * Try removing memory from Linux. Will only fail if memory blocks aren't
>  * offline.
>@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 	struct memory_notify *mhp = arg;
> 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
> 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
>-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
> 	int rc = NOTIFY_OK;
>+	unsigned long id;
> 
> 	if (!virtio_mem_overlaps_range(vm, start, size))
> 		return NOTIFY_DONE;
> 
>-	/*
>-	 * Memory is onlined/offlined in memory block granularity. We cannot
>-	 * cross virtio-mem device boundaries and memory block boundaries. Bail
>-	 * out if this ever changes.
>-	 */
>-	if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>-			 !IS_ALIGNED(start, memory_block_size_bytes())))
>-		return NOTIFY_BAD;
>+	if (vm->in_sbm) {
>+		id = virtio_mem_phys_to_mb_id(start);
>+		/*
>+		 * In SBM, we add memory in separate memory blocks - we expect
>+		 * it to be onlined/offlined in the same granularity. Bail out
>+		 * if this ever changes.
>+		 */
>+		if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>+				 !IS_ALIGNED(start, memory_block_size_bytes())))
>+			return NOTIFY_BAD;
>+	} else {
>+		id = virtio_mem_phys_to_bb_id(vm, start);
>+		/*
>+		 * In BBM, we only care about onlining/offlining happening
>+		 * within a single big block, we don't care about the
>+		 * actual granularity as we don't track individual Linux
>+		 * memory blocks.
>+		 */
>+		if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
>+			return NOTIFY_BAD;
>+	}
> 
> 	/*
> 	 * Avoid circular locking lockdep warnings. We lock the mutex
>@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 			break;
> 		}
> 		vm->hotplug_active = true;
>-		virtio_mem_sbm_notify_going_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_going_offline(vm, id);
> 		break;
> 	case MEM_GOING_ONLINE:
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 			break;
> 		}
> 		vm->hotplug_active = true;
>-		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
>+		if (vm->in_sbm)
>+			rc = virtio_mem_sbm_notify_going_online(vm, id);
> 		break;
> 	case MEM_OFFLINE:
>-		virtio_mem_sbm_notify_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_offline(vm, id);
> 
> 		atomic64_add(size, &vm->offline_size);
> 		/*
>@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
> 	case MEM_ONLINE:
>-		virtio_mem_sbm_notify_online(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_online(vm, id);
> 
> 		atomic64_sub(size, &vm->offline_size);
> 		/*
>@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 	case MEM_CANCEL_OFFLINE:
> 		if (!vm->hotplug_active)
> 			break;
>-		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
>+		if (vm->in_sbm)
>+			virtio_mem_sbm_notify_cancel_offline(vm, id);
> 		vm->hotplug_active = false;
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
>@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> 	const unsigned long addr = page_to_phys(page);
>-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
>+	unsigned long id, sb_id;
> 	struct virtio_mem *vm;
>-	int sb_id;
>+	bool do_online;
> 
>-	/*
>-	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
>-	 * size/alignment and that this callback is is called with such a
>-	 * size/alignment. So we cannot cross subblocks and therefore
>-	 * also not memory blocks.
>-	 */
> 	rcu_read_lock();
> 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> 		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> 			continue;
> 
>-		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>-		/*
>-		 * If plugged, online the pages, otherwise, set them fake
>-		 * offline (PageOffline).
>-		 */
>-		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>+		if (vm->in_sbm) {
>+			/*
>+			 * We exploit here that subblocks have at least
>+			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
>+			 * cross subblocks within one call.
>+			 */
>+			id = virtio_mem_phys_to_mb_id(addr);
>+			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>+			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
>+								   sb_id, 1);
>+		} else {
>+			do_online = true;
>+		}
>+		if (do_online)
> 			generic_online_page(page, order);
> 		else
> 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> 	return rc;
> }
> 
>+/*
>+ * Request to unplug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_send_unplug_request(vm, addr, size);
>+}
>+
>+/*
>+ * Request to plug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_send_plug_request(vm, addr, size);
>+}
>+
> /*
>  * Unplug the desired number of plugged subblocks of an offline or not-added
>  * memory block. Will fail if any subblock cannot get unplugged (instead of
>@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> 	return 0;
> }
> 
>-/*
>- * Try to plug the requested amount of memory.
>- */
>-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
>@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> 	return rc;
> }
> 
>+/*
>+ * Plug a big block and add it to Linux.
>+ *
>+ * Will modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
>+					  unsigned long bb_id)
>+{
>+	int rc;
>+
>+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+			 VIRTIO_MEM_BBM_BB_UNUSED))
>+		return -EINVAL;
>+
>+	rc = virtio_mem_bbm_plug_bb(vm, bb_id);
>+	if (rc)
>+		return rc;
>+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>+
>+	rc = virtio_mem_bbm_add_bb(vm, bb_id);
>+	if (rc) {
>+		if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
>+			virtio_mem_bbm_set_bb_state(vm, bb_id,
>+						    VIRTIO_MEM_BBM_BB_UNUSED);
>+		else
>+			/* Retry from the main loop. */
>+			virtio_mem_bbm_set_bb_state(vm, bb_id,
>+						    VIRTIO_MEM_BBM_BB_PLUGGED);
>+		return rc;
>+	}
>+	return 0;
>+}
>+
>+/*
>+ * Prepare tracking data for the next big block.
>+ */
>+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
>+					  unsigned long *bb_id)
>+{
>+	int rc;
>+
>+	if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
>+		return -ENOSPC;
>+
>+	/* Resize the big block state array if required. */
>+	rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
>+	if (rc)
>+		return rc;
>+
>+	vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
>+	*bb_id = vm->bbm.next_bb_id;
>+	vm->bbm.next_bb_id++;
>+	return 0;
>+}
>+
>+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	uint64_t nb_bb = diff / vm->bbm.bb_size;
>+	unsigned long bb_id;
>+	int rc;
>+
>+	if (!nb_bb)
>+		return 0;
>+
>+	/* Try to plug and add unused big blocks */
>+	virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
>+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+			return -ENOSPC;
>+
>+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+		if (!rc)
>+			nb_bb--;
>+		if (rc || !nb_bb)
>+			return rc;
>+		cond_resched();
>+	}
>+
>+	/* Try to prepare, plug and add new big blocks */
>+	while (nb_bb) {
>+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+			return -ENOSPC;
>+
>+		rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
>+		if (rc)
>+			return rc;
>+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+		if (!rc)
>+			nb_bb--;
>+		if (rc)
>+			return rc;
>+		cond_resched();
>+	}
>+
>+	return 0;
>+}
>+
>+/*
>+ * Try to plug the requested amount of memory.
>+ */
>+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	if (vm->in_sbm)
>+		return virtio_mem_sbm_plug_request(vm, diff);
>+	return virtio_mem_bbm_plug_request(vm, diff);
>+}
>+
> /*
>  * Unplug the desired number of plugged subblocks of an offline memory block.
>  * Will fail if any subblock cannot get unplugged (instead of skipping it).
>@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> 	return 0;
> }
> 
>-/*
>- * Try to unplug the requested amount of memory.
>- */
>-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
> 	unsigned long mb_id;
>@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	return rc;
> }
> 
>+/*
>+ * Try to unplug the requested amount of memory.
>+ */
>+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	if (vm->in_sbm)
>+		return virtio_mem_sbm_unplug_request(vm, diff);
>+	return -EBUSY;
>+}
>+
> /*
>  * Try to unplug all blocks that couldn't be unplugged before, for example,
>  * because the hypervisor was busy.
>  */
> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> {
>-	unsigned long mb_id;
>+	unsigned long id;
> 	int rc;
> 
>-	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>-		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>+	if (!vm->in_sbm) {
>+		virtio_mem_bbm_for_each_bb(vm, id,
>+					   VIRTIO_MEM_BBM_BB_PLUGGED) {
>+			rc = virtio_mem_bbm_unplug_bb(vm, id);
>+			if (rc)
>+				return rc;
>+			virtio_mem_bbm_set_bb_state(vm, id,
>+						    VIRTIO_MEM_BBM_BB_UNUSED);
>+		}
>+		return 0;
>+	}
>+
>+	virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>+		rc = virtio_mem_sbm_unplug_mb(vm, id);
> 		if (rc)
> 			return rc;
>-		virtio_mem_sbm_set_mb_state(vm, mb_id,
>+		virtio_mem_sbm_set_mb_state(vm, id,
> 					    VIRTIO_MEM_SBM_MB_UNUSED);
> 	}
> 
>@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
> 			usable_region_size, &usable_region_size);
> 	end_addr = vm->addr + usable_region_size;
> 	end_addr = min(end_addr, phys_limit);
>-	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>+
>+	if (vm->in_sbm)
>+		vm->sbm.last_usable_mb_id =
>+					 virtio_mem_phys_to_mb_id(end_addr) - 1;
>+	else
>+		vm->bbm.last_usable_bb_id =
>+				     virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
> 
> 	/* see if there is a request to change the size */
> 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
> static int virtio_mem_init(struct virtio_mem *vm)
> {
> 	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>+	uint64_t sb_size, addr;
> 	uint16_t node_id;
> 
> 	if (!vm->vdev->config->get) {
>@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	if (vm->nid == NUMA_NO_NODE)
> 		vm->nid = memory_add_physaddr_to_nid(vm->addr);
> 
>-	/*
>-	 * We always hotplug memory in memory block granularity. This way,
>-	 * we have to wait for exactly one memory block to online.
>-	 */
>-	if (vm->device_block_size > memory_block_size_bytes()) {
>-		dev_err(&vm->vdev->dev,
>-			"The block size is not supported (too big).\n");
>-		return -EINVAL;
>-	}
>-
> 	/* bad device setup - warn only */
> 	if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
> 		dev_warn(&vm->vdev->dev,
>@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 	 * - Is required for now for alloc_contig_range() to work reliably -
> 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
> 	 */
>-	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>-				pageblock_nr_pages) * PAGE_SIZE;
>-	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>-				vm->sbm.sb_size);
>-	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>+	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+			pageblock_nr_pages) * PAGE_SIZE;
>+	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>+
>+	if (sb_size < memory_block_size_bytes()) {
>+		/* SBM: At least two subblocks per Linux memory block. */
>+		vm->in_sbm = true;
>+		vm->sbm.sb_size = sb_size;
>+		vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>+				     vm->sbm.sb_size;
>+
>+		/* Round up to the next full memory block */
>+		addr = vm->addr + memory_block_size_bytes() - 1;
>+		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>+		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+	} else {
>+		/* BBM: At least one Linux memory block. */
>+		vm->bbm.bb_size = vm->device_block_size;
> 
>-	/* Round up to the next full memory block */
>-	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>-						       memory_block_size_bytes());
>-	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);

Per my understanding, vm->addr is not guaranteed to be aligned to bb_size, right?

Why not round up to next big block?
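Something like the following userspace sketch (hypothetical helper names, not the driver code; it assumes bb_size divides addresses the way the driver does) shows what rounding up would look like:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring virtio_mem_phys_to_bb_id(). */
static unsigned long phys_to_bb_id(uint64_t addr, uint64_t bb_size)
{
	return addr / bb_size;
}

/*
 * Round addr up to the next bb_size boundary before converting, so a
 * device whose start address sits in the middle of a big block skips
 * the partially covered block instead of using it.
 */
static unsigned long first_usable_bb_id(uint64_t addr, uint64_t bb_size)
{
	return phys_to_bb_id(addr + bb_size - 1, bb_size);
}
```

With a 1 GiB big block size and a device starting 128 MiB into a big block, the partially covered block would be skipped.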

>+		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
>+	}
> 
> 	/* Prepare the offline threshold - make sure we can add two blocks. */
> 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
> 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
>+	/* In BBM, we also want at least two big blocks. */
>+	vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
>+				      vm->offline_threshold);
> 
> 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
> 		 (unsigned long long)vm->device_block_size);
> 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> 		 memory_block_size_bytes());
>-	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>-		 (unsigned long long)vm->sbm.sb_size);
>+	if (vm->in_sbm)
>+		dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>+			 (unsigned long long)vm->sbm.sb_size);
>+	else
>+		dev_info(&vm->vdev->dev, "big block size: 0x%llx",
>+			 (unsigned long long)vm->bbm.bb_size);
> 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
> 
>@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	cancel_work_sync(&vm->wq);
> 	hrtimer_cancel(&vm->retry_timer);
> 
>-	/*
>-	 * After we unregistered our callbacks, user space can online partially
>-	 * plugged offline blocks. Make sure to remove them.
>-	 */
>-	virtio_mem_sbm_for_each_mb(vm, mb_id,
>-				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>-		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>-		BUG_ON(rc);
>-		virtio_mem_sbm_set_mb_state(vm, mb_id,
>-					    VIRTIO_MEM_SBM_MB_UNUSED);
>+	if (vm->in_sbm) {
>+		/*
>+		 * After we unregistered our callbacks, user space can online
>+		 * partially plugged offline blocks. Make sure to remove them.
>+		 */
>+		virtio_mem_sbm_for_each_mb(vm, mb_id,
>+					   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>+			rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>+			BUG_ON(rc);
>+			virtio_mem_sbm_set_mb_state(vm, mb_id,
>+						    VIRTIO_MEM_SBM_MB_UNUSED);
>+		}
>+		/*
>+		 * After we unregistered our callbacks, user space can no longer
>+		 * offline partially plugged online memory blocks. No need to
>+		 * worry about them.
>+		 */
> 	}
>-	/*
>-	 * After we unregistered our callbacks, user space can no longer
>-	 * offline partially plugged online memory blocks. No need to worry
>-	 * about them.
>-	 */
> 
> 	/* unregister callbacks */
> 	unregister_virtio_mem_device(vm);
>@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> 	}
> 
> 	/* remove all tracking data - no locking needed */
>-	vfree(vm->sbm.mb_states);
>-	vfree(vm->sbm.sb_states);
>+	if (vm->in_sbm) {
>+		vfree(vm->sbm.mb_states);
>+		vfree(vm->sbm.sb_states);
>+	} else {
>+		vfree(vm->bbm.bb_states);
>+	}
> 
> 	/* reset the device and cleanup the queues */
> 	vdev->config->reset(vdev);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2020-10-12 12:53 ` [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block David Hildenbrand
  2020-10-15 13:08   ` Michael S. Tsirkin
@ 2020-10-19  3:22   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-19  3:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:21PM +0200, David Hildenbrand wrote:
>virtio-mem soon wants to use offline_and_remove_memory() for memory that
>exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>remove that restriction.
>
>Let's remember the old state and try to restore that if anything goes
>wrong. While re-onlining can, in general, fail, it's highly unlikely to
>happen (usually only when a notifier fails to allocate memory, and these
>are rather rare).
>
>This will be used by virtio-mem to offline+remove memory ranges that are
>bigger than a single memory block - for example, with a device block
>size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>block size of 128 MiB.
>
>While we could compress the state into 2 bits, using 8 bits is much
>easier.
>
>This handling is similar, but different to acpi_scan_try_to_offline():
>
>a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>optimization is still relevant - it should only apply to ZONE_NORMAL
>(where we have no guarantees). If relevant, we can always add it.
>
>b) acpi_scan_try_to_offline() simply onlines all memory in case
>something goes wrong. It doesn't restore previous online type. Let's do
>that, so we won't overwrite what e.g., user space configured.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Looks good to me.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
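As a side note, the remember-and-restore scheme from the changelog can be modeled in plain C. In this sketch, toy MMOP_* constants and a toy block array stand in for the real memory block devices; it only illustrates the control flow, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MMOP_OFFLINE        0
#define MMOP_ONLINE_KERNEL  1
#define MMOP_ONLINE_MOVABLE 2

/* Toy stand-in for a memory block device. */
struct toy_block { uint8_t online_type; int online; };

/*
 * Offline blocks one by one, recording the previous online type.
 * On failure, re-online exactly the blocks we managed to offline,
 * restoring their old type; unprocessed entries stay MMOP_OFFLINE
 * and are skipped during rollback.
 */
static int offline_all_or_rollback(struct toy_block *blocks, int nr,
				   int fail_at)
{
	uint8_t online_types[nr];
	int i;

	memset(online_types, MMOP_OFFLINE, nr);
	for (i = 0; i < nr; i++) {
		if (i == fail_at)
			goto rollback;
		online_types[i] = blocks[i].online_type;
		blocks[i].online = 0;
	}
	return 0;
rollback:
	for (i = 0; i < nr; i++) {
		if (online_types[i] != MMOP_OFFLINE) {
			blocks[i].online_type = online_types[i];
			blocks[i].online = 1;
		}
	}
	return -1;
}
```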

>---
> mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 89 insertions(+), 16 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b44d4c7ba73b..217080ca93e5 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size)
> }
> EXPORT_SYMBOL_GPL(remove_memory);
> 
>+static int try_offline_memory_block(struct memory_block *mem, void *arg)
>+{
>+	uint8_t online_type = MMOP_ONLINE_KERNEL;
>+	uint8_t **online_types = arg;
>+	struct page *page;
>+	int rc;
>+
>+	/*
>+	 * Sense the online_type via the zone of the memory block. Offlining
>+	 * with multiple zones within one memory block will be rejected
>+	 * by offlining code ... so we don't care about that.
>+	 */
>+	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
>+	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
>+		online_type = MMOP_ONLINE_MOVABLE;
>+
>+	rc = device_offline(&mem->dev);
>+	/*
>+	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
>+	 * so try_reonline_memory_block() can do the right thing.
>+	 */
>+	if (!rc)
>+		**online_types = online_type;
>+
>+	(*online_types)++;
>+	/* Ignore if already offline. */
>+	return rc < 0 ? rc : 0;
>+}
>+
>+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
>+{
>+	uint8_t **online_types = arg;
>+	int rc;
>+
>+	if (**online_types != MMOP_OFFLINE) {
>+		mem->online_type = **online_types;
>+		rc = device_online(&mem->dev);
>+		if (rc < 0)
>+			pr_warn("%s: Failed to re-online memory: %d",
>+				__func__, rc);
>+	}
>+
>+	/* Continue processing all remaining memory blocks. */
>+	(*online_types)++;
>+	return 0;
>+}
>+
> /*
>- * Try to offline and remove a memory block. Might take a long time to
>- * finish in case memory is still in use. Primarily useful for memory devices
>- * that logically unplugged all memory (so it's no longer in use) and want to
>- * offline + remove the memory block.
>+ * Try to offline and remove memory. Might take a long time to finish in case
>+ * memory is still in use. Primarily useful for memory devices that logically
>+ * unplugged all memory (so it's no longer in use) and want to offline + remove
>+ * that memory.
>  */
> int offline_and_remove_memory(int nid, u64 start, u64 size)
> {
>-	struct memory_block *mem;
>-	int rc = -EINVAL;
>+	const unsigned long mb_count = size / memory_block_size_bytes();
>+	uint8_t *online_types, *tmp;
>+	int rc;
> 
> 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
>-	    size != memory_block_size_bytes())
>-		return rc;
>+	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
>+		return -EINVAL;
>+
>+	/*
>+	 * We'll remember the old online type of each memory block, so we can
>+	 * try to revert whatever we did when offlining one memory block fails
>+	 * after offlining some others succeeded.
>+	 */
>+	online_types = kmalloc_array(mb_count, sizeof(*online_types),
>+				     GFP_KERNEL);
>+	if (!online_types)
>+		return -ENOMEM;
>+	/*
>+	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
>+	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
>+	 * try_reonline_memory_block().
>+	 */
>+	memset(online_types, MMOP_OFFLINE, mb_count);
> 
> 	lock_device_hotplug();
>-	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
>-	if (mem)
>-		rc = device_offline(&mem->dev);
>-	/* Ignore if the device is already offline. */
>-	if (rc > 0)
>-		rc = 0;
>+
>+	tmp = online_types;
>+	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
> 
> 	/*
>-	 * In case we succeeded to offline the memory block, remove it.
>+	 * In case we succeeded to offline all memory, remove it.
> 	 * This cannot fail as it cannot get onlined in the meantime.
> 	 */
> 	if (!rc) {
> 		rc = try_remove_memory(nid, start, size);
>-		WARN_ON_ONCE(rc);
>+		if (rc)
>+			pr_err("%s: Failed to remove memory: %d", __func__, rc);
>+	}
>+
>+	/*
>+	 * Rollback what we did. While memory onlining might theoretically fail
>+	 * (nacked by a notifier), it barely ever happens.
>+	 */
>+	if (rc) {
>+		tmp = online_types;
>+		walk_memory_blocks(start, size, &tmp,
>+				   try_reonline_memory_block);
> 	}
> 	unlock_device_hotplug();
> 
>+	kfree(online_types);
> 	return rc;
> }
> EXPORT_SYMBOL_GPL(offline_and_remove_memory);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
  2020-10-12 12:53 ` [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug David Hildenbrand
@ 2020-10-19  3:48   ` Wei Yang
  2020-10-19  9:12     ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-19  3:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:22PM +0200, David Hildenbrand wrote:
>Let's try to unplug completely offline big blocks first. Then (if
>enabled via unplug_online) try to offline and remove whole big blocks.
>
>No locking necessary - we can deal with concurrent onlining/offlining
>just fine.
>
>Note1: This is sub-optimal and might be dangerous in some environments: we
>could end up in an infinite loop when offlining (e.g., long-term pinnings),
>similar to DIMMs. We'll introduce safe memory hotunplug via
>fake-offlining next, and use this basic mode only when explicitly enabled.
>
>Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
>with bigger block sizes.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 156 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 155 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 94cf44b15cbf..6bcd0acbff32 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -388,6 +388,12 @@ static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
> 	     _bb_id++) \
> 		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
> 
>+#define virtio_mem_bbm_for_each_bb_rev(_vm, _bb_id, _state) \
>+	for (_bb_id = vm->bbm.next_bb_id - 1; \
>+	     _bb_id >= vm->bbm.first_bb_id && _vm->bbm.bb_count[_state]; \
>+	     _bb_id--) \
>+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>+
> /*
>  * Set the state of a memory block, taking care of the state counter.
>  */
>@@ -685,6 +691,18 @@ static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
> 	return virtio_mem_remove_memory(vm, addr, size);
> }
> 
>+/*
>+ * See virtio_mem_remove_memory(): Try to remove all Linux memory blocks covered
>+ * by the big block.
>+ */
>+static int virtio_mem_bbm_remove_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_remove_memory(vm, addr, size);
>+}
>+
> /*
>  * Try offlining and removing memory from Linux.
>  *
>@@ -731,6 +749,19 @@ static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
> 	return virtio_mem_offline_and_remove_memory(vm, addr, size);
> }
> 
>+/*
>+ * See virtio_mem_offline_and_remove_memory(): Try to offline and remove
>+ * all Linux memory blocks covered by the big block.
>+ */
>+static int virtio_mem_bbm_offline_and_remove_bb(struct virtio_mem *vm,
>+						unsigned long bb_id)
>+{
>+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+	const uint64_t size = vm->bbm.bb_size;
>+
>+	return virtio_mem_offline_and_remove_memory(vm, addr, size);
>+}
>+
> /*
>  * Trigger the workqueue so the device can perform its magic.
>  */
>@@ -1928,6 +1959,129 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> 	return rc;
> }
> 
>+/*
>+ * Try to offline and remove a big block from Linux and unplug it. Will fail
>+ * with -EBUSY if some memory is busy and cannot get unplugged.
>+ *
>+ * Will modify the state of the big block. Might temporarily drop the
>+ * hotplug_mutex.
>+ */
>+static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
>+						       unsigned long bb_id)
>+{
>+	int rc;
>+
>+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+			 VIRTIO_MEM_BBM_BB_ADDED))
>+		return -EINVAL;
>+
>+	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>+	if (rc)
>+		return rc;
>+
>+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>+	if (rc)
>+		virtio_mem_bbm_set_bb_state(vm, bb_id,
>+					    VIRTIO_MEM_BBM_BB_PLUGGED);
>+	else
>+		virtio_mem_bbm_set_bb_state(vm, bb_id,
>+					    VIRTIO_MEM_BBM_BB_UNUSED);
>+	return rc;
>+}
>+
>+/*
>+ * Try to remove a big block from Linux and unplug it. Will fail with
>+ * -EBUSY if some memory is online.
>+ *
>+ * Will modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_remove_and_unplug_bb(struct virtio_mem *vm,
>+					       unsigned long bb_id)
>+{
>+	int rc;
>+
>+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+			 VIRTIO_MEM_BBM_BB_ADDED))
>+		return -EINVAL;
>+
>+	rc = virtio_mem_bbm_remove_bb(vm, bb_id);
>+	if (rc)
>+		return -EBUSY;
>+
>+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>+	if (rc)
>+		virtio_mem_bbm_set_bb_state(vm, bb_id,
>+					    VIRTIO_MEM_BBM_BB_PLUGGED);
>+	else
>+		virtio_mem_bbm_set_bb_state(vm, bb_id,
>+					    VIRTIO_MEM_BBM_BB_UNUSED);
>+	return rc;
>+}
>+
>+/*
>+ * Test if a big block is completely offline.
>+ */
>+static bool virtio_mem_bbm_bb_is_offline(struct virtio_mem *vm,
>+					 unsigned long bb_id)
>+{
>+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
>+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
>+	unsigned long pfn;
>+
>+	for (pfn = start_pfn; pfn < start_pfn + nr_pages;
>+	     pfn += PAGES_PER_SECTION) {

Can we do the check at memory block granularity?
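Something like the following toy model (a userspace sketch with a simple per-section online flag; hypothetical names, not kernel code) would step per Linux memory block instead of per section. It is only valid under the assumption that memory is onlined/offlined at memory block granularity, so all sections of a block share the same state:

```c
#include <assert.h>
#include <stdbool.h>

#define SECTIONS_PER_BLOCK 8
#define NR_SECTIONS        64

/* Toy model: one online flag per section. */
static bool section_online[NR_SECTIONS];

/*
 * Check one section per memory block instead of every section.
 * Correct only if sections can only be onlined as part of whole
 * memory blocks, so the first section's state represents the block.
 */
static bool bb_is_offline(int first_section, int nr_sections)
{
	int s;

	for (s = first_section; s < first_section + nr_sections;
	     s += SECTIONS_PER_BLOCK)
		if (section_online[s])
			return false;
	return true;
}
```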

>+		if (pfn_to_online_page(pfn))
>+			return false;
>+	}
>+
>+	return true;
>+}
>+
>+static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+	uint64_t nb_bb = diff / vm->bbm.bb_size;
>+	uint64_t bb_id;
>+	int rc;
>+
>+	if (!nb_bb)
>+		return 0;
>+
>+	/* Try to unplug completely offline big blocks first. */
>+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
>+		cond_resched();
>+		/*
>+		 * As we're holding no locks, this check is racy as memory
>+		 * can get onlined in the meantime - but we'll fail gracefully.
>+		 */
>+		if (!virtio_mem_bbm_bb_is_offline(vm, bb_id))
>+			continue;
>+		rc = virtio_mem_bbm_remove_and_unplug_bb(vm, bb_id);
>+		if (rc == -EBUSY)
>+			continue;
>+		if (!rc)
>+			nb_bb--;
>+		if (rc || !nb_bb)
>+			return rc;
>+	}
>+
>+	if (!unplug_online)
>+		return 0;
>+
>+	/* Try to unplug any big blocks. */
>+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
>+		cond_resched();
>+		rc = virtio_mem_bbm_offline_remove_and_unplug_bb(vm, bb_id);
>+		if (rc == -EBUSY)
>+			continue;
>+		if (!rc)
>+			nb_bb--;
>+		if (rc || !nb_bb)
>+			return rc;
>+	}
>+
>+	return nb_bb ? -EBUSY : 0;
>+}
>+
> /*
>  * Try to unplug the requested amount of memory.
>  */
>@@ -1935,7 +2089,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
> 	if (vm->in_sbm)
> 		return virtio_mem_sbm_unplug_request(vm, diff);
>-	return -EBUSY;
>+	return virtio_mem_bbm_unplug_request(vm, diff);
> }
> 
> /*
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
  2020-10-12 12:53 ` [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe " David Hildenbrand
@ 2020-10-19  7:54   ` Wei Yang
  2020-10-19  8:50     ` David Hildenbrand
  2020-10-20  0:24   ` Wei Yang
  1 sibling, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-19  7:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote:
>Let's add a safe mechanism to unplug memory, avoiding long/endless loops
>when trying to offline memory - similar to in SBM.
>
>Fake-offline all memory (via alloc_contig_range()) before trying to
>offline+remove it. Use this mode as default, but allow to enable the other
>mode explicitly (which could give better memory hotunplug guarantees in

I don't get how the unsafe mode would provide better guarantees?

>some environments).
>
>The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0
>on the cmdline.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/virtio/virtio_mem.c | 97 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 95 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 6bcd0acbff32..09f11489be6f 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -37,6 +37,11 @@ module_param(bbm_block_size, ulong, 0444);
> MODULE_PARM_DESC(bbm_block_size,
> 		 "Big Block size in bytes. Default is 0 (auto-detection).");
> 
>+static bool bbm_safe_unplug = true;
>+module_param(bbm_safe_unplug, bool, 0444);
>+MODULE_PARM_DESC(bbm_safe_unplug,
>+	     "Use a safe unplug mechanism in BBM, avoiding long/endless loops");
>+
> /*
>  * virtio-mem currently supports the following modes of operation:
>  *
>@@ -87,6 +92,8 @@ enum virtio_mem_bbm_bb_state {
> 	VIRTIO_MEM_BBM_BB_PLUGGED,
> 	/* Plugged and added to Linux. */
> 	VIRTIO_MEM_BBM_BB_ADDED,
>+	/* All online parts are fake-offline, ready to remove. */
>+	VIRTIO_MEM_BBM_BB_FAKE_OFFLINE,
> 	VIRTIO_MEM_BBM_BB_COUNT
> };
> 
>@@ -889,6 +896,32 @@ static void virtio_mem_sbm_notify_cancel_offline(struct virtio_mem *vm,
> 	}
> }
> 
>+static void virtio_mem_bbm_notify_going_offline(struct virtio_mem *vm,
>+						unsigned long bb_id,
>+						unsigned long pfn,
>+						unsigned long nr_pages)
>+{
>+	/*
>+	 * When marked as "fake-offline", all online memory of this device block
>+	 * is allocated by us. Otherwise, we don't have any memory allocated.
>+	 */
>+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
>+		return;
>+	virtio_mem_fake_offline_going_offline(pfn, nr_pages);
>+}
>+
>+static void virtio_mem_bbm_notify_cancel_offline(struct virtio_mem *vm,
>+						 unsigned long bb_id,
>+						 unsigned long pfn,
>+						 unsigned long nr_pages)
>+{
>+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
>+		return;
>+	virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
>+}
>+
> /*
>  * This callback will either be called synchronously from add_memory() or
>  * asynchronously (e.g., triggered via user space). We have to be careful
>@@ -949,6 +982,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 		vm->hotplug_active = true;
> 		if (vm->in_sbm)
> 			virtio_mem_sbm_notify_going_offline(vm, id);
>+		else
>+			virtio_mem_bbm_notify_going_offline(vm, id,
>+							    mhp->start_pfn,
>+							    mhp->nr_pages);
> 		break;
> 	case MEM_GOING_ONLINE:
> 		mutex_lock(&vm->hotplug_mutex);
>@@ -999,6 +1036,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> 			break;
> 		if (vm->in_sbm)
> 			virtio_mem_sbm_notify_cancel_offline(vm, id);
>+		else
>+			virtio_mem_bbm_notify_cancel_offline(vm, id,
>+							     mhp->start_pfn,
>+							     mhp->nr_pages);
> 		vm->hotplug_active = false;
> 		mutex_unlock(&vm->hotplug_mutex);
> 		break;
>@@ -1189,7 +1230,13 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> 			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
> 								   sb_id, 1);
> 		} else {
>-			do_online = true;
>+			/*
>+			 * If the whole block is marked fake offline, keep
>+			 * everything that way.
>+			 */
>+			id = virtio_mem_phys_to_bb_id(vm, addr);
>+			do_online = virtio_mem_bbm_get_bb_state(vm, id) !=
>+				    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE;
> 		}
> 		if (do_online)
> 			generic_online_page(page, order);
>@@ -1969,15 +2016,50 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
> 						       unsigned long bb_id)
> {
>+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
>+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
>+	unsigned long end_pfn = start_pfn + nr_pages;
>+	unsigned long pfn;
>+	struct page *page;
> 	int rc;
> 
> 	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
> 			 VIRTIO_MEM_BBM_BB_ADDED))
> 		return -EINVAL;
> 
>+	if (bbm_safe_unplug) {
>+		/*
>+		 * Start by fake-offlining all memory. Once we marked the device
>+		 * block as fake-offline, all newly onlined memory will
>+		 * automatically be kept fake-offline. Protect from concurrent
>+		 * onlining/offlining until we have a consistent state.
>+		 */
>+		mutex_lock(&vm->hotplug_mutex);
>+		virtio_mem_bbm_set_bb_state(vm, bb_id,
>+					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
>+

State is set here.

>+		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>+			page = pfn_to_online_page(pfn);
>+			if (!page)
>+				continue;
>+
>+			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
>+			if (rc) {
>+				end_pfn = pfn;
>+				goto rollback_safe_unplug;
>+			}
>+		}
>+		mutex_unlock(&vm->hotplug_mutex);
>+	}
>+
> 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>-	if (rc)
>+	if (rc) {
>+		if (bbm_safe_unplug) {
>+			mutex_lock(&vm->hotplug_mutex);
>+			goto rollback_safe_unplug;
>+		}
> 		return rc;
>+	}
> 
> 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
> 	if (rc)

And changed to PLUGGED or UNUSED based on rc.

>@@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
> 		virtio_mem_bbm_set_bb_state(vm, bb_id,
> 					    VIRTIO_MEM_BBM_BB_UNUSED);
> 	return rc;
>+
>+rollback_safe_unplug:
>+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>+		page = pfn_to_online_page(pfn);
>+		if (!page)
>+			continue;
>+		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
>+	}
>+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);

And changed to ADDED if failed.

>+	mutex_unlock(&vm->hotplug_mutex);
>+	return rc;
> }

So in which case is the bbm state FAKE_OFFLINE during
virtio_mem_bbm_notify_going_offline() and
virtio_mem_bbm_notify_cancel_offline()?

> 
> /*
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
  2020-10-19  7:54   ` Wei Yang
@ 2020-10-19  8:50     ` David Hildenbrand
  2020-10-20  0:23       ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19  8:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On 19.10.20 09:54, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote:
>> Let's add a safe mechanism to unplug memory, avoiding long/endless loops
>> when trying to offline memory - similar to in SBM.
>>
>> Fake-offline all memory (via alloc_contig_range()) before trying to
>> offline+remove it. Use this mode as default, but allow to enable the other
>> mode explicitly (which could give better memory hotunplug guarantees in
> 
> I don't get how the unsafe mode would provide better guarantees?

It's primarily only relevant when there is a lot of concurrent action
going on while unplugging memory. Using alloc_contig_range() on
ZONE_MOVABLE can fail more easily than memory offlining.

alloc_contig_range() doesn't try as hard as memory offlining code to
isolate memory. There are known issues with temporary page pinning
(e.g., when a process dies) and the PCP. (mostly discovered via CMA
allocations)

See the TODO I add in patch #14.

[...]
>>
>> +	if (bbm_safe_unplug) {
>> +		/*
>> +		 * Start by fake-offlining all memory. Once we marked the device
>> +		 * block as fake-offline, all newly onlined memory will
>> +		 * automatically be kept fake-offline. Protect from concurrent
>> +		 * onlining/offlining until we have a consistent state.
>> +		 */
>> +		mutex_lock(&vm->hotplug_mutex);
>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
>> +
> 
> State is set here.
> 
>> +		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>> +			page = pfn_to_online_page(pfn);
>> +			if (!page)
>> +				continue;
>> +
>> +			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
>> +			if (rc) {
>> +				end_pfn = pfn;
>> +				goto rollback_safe_unplug;
>> +			}
>> +		}
>> +		mutex_unlock(&vm->hotplug_mutex);
>> +	}
>> +
>> 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>> -	if (rc)
>> +	if (rc) {
>> +		if (bbm_safe_unplug) {
>> +			mutex_lock(&vm->hotplug_mutex);
>> +			goto rollback_safe_unplug;
>> +		}
>> 		return rc;
>> +	}
>>
>> 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>> 	if (rc)
> 
> And changed to PLUGGED or UNUSED based on rc.

Right, after offlining+remove succeeded. So no longer added to Linux.

The final state depends on the success of the unplug request towards the
hypervisor.

> 
>> @@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
>> 		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> 					    VIRTIO_MEM_BBM_BB_UNUSED);
>> 	return rc;
>> +
>> +rollback_safe_unplug:
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>> +		page = pfn_to_online_page(pfn);
>> +		if (!page)
>> +			continue;
>> +		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
>> +	}
>> +	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
> 
> And changed to ADDED if failed.

Right, back to the initial state when entering this function.

> 
>> +	mutex_unlock(&vm->hotplug_mutex);
>> +	return rc;
>> }
> 
> So in which case is the bbm state FAKE_OFFLINE during
> virtio_mem_bbm_notify_going_offline() and
> virtio_mem_bbm_notify_cancel_offline()?

Exactly, so we can do our magic with fake-offline pages and our
virtio_mem_bbm_offline_and_remove_bb() can actually succeed.


-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-18  3:57       ` Wei Yang
@ 2020-10-19  9:04         ` David Hildenbrand
  2020-10-20  0:41           ` Wei Yang
  0 siblings, 1 reply; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19  9:04 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 18.10.20 05:57, Wei Yang wrote:
> On Fri, Oct 16, 2020 at 11:18:39AM +0200, David Hildenbrand wrote:
>> On 16.10.20 06:03, Wei Yang wrote:
>>> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>>>> Let's trigger from offlining code when we're not allowed to touch online
> 
> Here "touch" means "unplug"? If so, maybe s/touch/unplug/ would be easier to
> understand.

Yes, much better.

[...]

> I am trying to get a better understanding of the logic of virtio_mem_retry().
> 
> Current logic seems clear to me. There are four places to trigger it:
> 
>     * notify_offline
>     * notify_online
>     * timer_expired
>     * config_changed
> 
> In this patch, we try to optimize the first case, notify_offline.

Yes.

> 
> Now, we would always trigger a retry when one of our memory blocks gets
> offlined. Per my understanding, this logic is correct but misses one case
> (or, to be more precise, does not handle one case in a timely manner). The
> case this patch wants to improve is virtio_mem_mb_remove(), if my
> understanding is correct.
> 

Yes, that's one part of it. Read below.

>    virtio_mem_run_wq()
>        virtio_mem_unplug_request()
>            virtio_mem_mb_unplug_any_sb_offline()
>  	      virtio_mem_mb_remove()             --- 1
>            virtio_mem_mb_unplug_any_sb_online()
>               virtio_mem_mb_offline_and_remove() --- 2
> 
> The above are the two functions this patch adjusts. For 2), the memory block
> gets offlined, which already triggers virtio_mem_retry(). But for 1), the
> memory block is already offline, so virtio_mem_retry() was not triggered
> before this patch. This is the case we want to improve here: instead of
> waiting for the timer to expire, we trigger a retry immediately after
> unplugging/removing an offlined memory block.
> 
> And besides that, this patch adjusts the original virtio_mem_notify_offline()
> path to only trigger virtio_mem_retry() when unplug_online is false. (This
> means the offline event came from user space rather than from an unplug
> request.)
> 
> If my above analysis is correct, I have one small suggestion for this patch:
> instead of adjusting the current notify_offline handling, how about just
> triggering the retry in virtio_mem_mb_remove()? Per my understanding, we only
> want to trigger an immediate retry when unplugging an offlined memory block.

I probably should have added the following to the patch description:

"This is a preparation for Big Block Mode (BBM), whereby we can see some
temporary offlining of memory blocks without actually making progress"

Imagine you have a Big Block that spans two Linux memory blocks. Assume
the first Linux memory block has no unmovable data on it.

Assume you call offline_and_remove_memory()

1. Try to offline the first block. Works, notifiers triggered.
virtio_mem_retry().
2. Try to offline the second block. Does not work.
3. Re-online first block.
4. Exit to main loop, exit workqueue.
5. Retry immediately (due to virtio_mem_retry()), go to 1.

So, you'll keep retrying forever. Found while debugging that exact issue :)


-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
  2020-10-19  3:48   ` Wei Yang
@ 2020-10-19  9:12     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19  9:12 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On 19.10.20 05:48, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:22PM +0200, David Hildenbrand wrote:
>> Let's try to unplug completely offline big blocks first. Then, (if
>> enabled via unplug_offline) try to offline and remove whole big blocks.
>>
>> No locking necessary - we can deal with concurrent onlining/offlining
>> just fine.
>>
>> Note1: This is sub-optimal and might be dangerous in some environments: we
>> could end up in an infinite loop when offlining (e.g., long-term pinnings),
>> similar as with DIMMs. We'll introduce safe memory hotunplug via
>> fake-offlining next, and use this basic mode only when explicitly enabled.
>>
>> Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
>> with bigger block sizes.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 156 +++++++++++++++++++++++++++++++++++-
>> 1 file changed, 155 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index 94cf44b15cbf..6bcd0acbff32 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -388,6 +388,12 @@ static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>> 	     _bb_id++) \
>> 		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>>
>> +#define virtio_mem_bbm_for_each_bb_rev(_vm, _bb_id, _state) \
>> +	for (_bb_id = vm->bbm.next_bb_id - 1; \
>> +	     _bb_id >= vm->bbm.first_bb_id && _vm->bbm.bb_count[_state]; \
>> +	     _bb_id--) \
>> +		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>> +
>> /*
>>  * Set the state of a memory block, taking care of the state counter.
>>  */
>> @@ -685,6 +691,18 @@ static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
>> 	return virtio_mem_remove_memory(vm, addr, size);
>> }
>>
>> +/*
>> + * See virtio_mem_remove_memory(): Try to remove all Linux memory blocks covered
>> + * by the big block.
>> + */
>> +static int virtio_mem_bbm_remove_bb(struct virtio_mem *vm, unsigned long bb_id)
>> +{
>> +	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>> +	const uint64_t size = vm->bbm.bb_size;
>> +
>> +	return virtio_mem_remove_memory(vm, addr, size);
>> +}
>> +
>> /*
>>  * Try offlining and removing memory from Linux.
>>  *
>> @@ -731,6 +749,19 @@ static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
>> 	return virtio_mem_offline_and_remove_memory(vm, addr, size);
>> }
>>
>> +/*
>> + * See virtio_mem_offline_and_remove_memory(): Try to offline and remove a
>> + * all Linux memory blocks covered by the big block.
>> + */
>> +static int virtio_mem_bbm_offline_and_remove_bb(struct virtio_mem *vm,
>> +						unsigned long bb_id)
>> +{
>> +	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>> +	const uint64_t size = vm->bbm.bb_size;
>> +
>> +	return virtio_mem_offline_and_remove_memory(vm, addr, size);
>> +}
>> +
>> /*
>>  * Trigger the workqueue so the device can perform its magic.
>>  */
>> @@ -1928,6 +1959,129 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> 	return rc;
>> }
>>
>> +/*
>> + * Try to offline and remove a big block from Linux and unplug it. Will fail
>> + * with -EBUSY if some memory is busy and cannot get unplugged.
>> + *
>> + * Will modify the state of the memory block. Might temporarily drop the
>> + * hotplug_mutex.
>> + */
>> +static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
>> +						       unsigned long bb_id)
>> +{
>> +	int rc;
>> +
>> +	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>> +			 VIRTIO_MEM_BBM_BB_ADDED))
>> +		return -EINVAL;
>> +
>> +	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>> +	if (rc)
>> +		return rc;
>> +
>> +	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>> +	if (rc)
>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +					    VIRTIO_MEM_BBM_BB_PLUGGED);
>> +	else
>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +					    VIRTIO_MEM_BBM_BB_UNUSED);
>> +	return rc;
>> +}
>> +
>> +/*
>> + * Try to remove a big block from Linux and unplug it. Will fail with
>> + * -EBUSY if some memory is online.
>> + *
>> + * Will modify the state of the memory block.
>> + */
>> +static int virtio_mem_bbm_remove_and_unplug_bb(struct virtio_mem *vm,
>> +					       unsigned long bb_id)
>> +{
>> +	int rc;
>> +
>> +	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>> +			 VIRTIO_MEM_BBM_BB_ADDED))
>> +		return -EINVAL;
>> +
>> +	rc = virtio_mem_bbm_remove_bb(vm, bb_id);
>> +	if (rc)
>> +		return -EBUSY;
>> +
>> +	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>> +	if (rc)
>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +					    VIRTIO_MEM_BBM_BB_PLUGGED);
>> +	else
>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +					    VIRTIO_MEM_BBM_BB_UNUSED);
>> +	return rc;
>> +}
>> +
>> +/*
>> + * Test if a big block is completely offline.
>> + */
>> +static bool virtio_mem_bbm_bb_is_offline(struct virtio_mem *vm,
>> +					 unsigned long bb_id)
>> +{
>> +	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
>> +	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
>> +	unsigned long pfn;
>> +
>> +	for (pfn = start_pfn; pfn < start_pfn + nr_pages;
>> +	     pfn += PAGES_PER_SECTION) {
> 
> Can we do the check with memory block granularity?

I had that initially, but the code turned out nicer this way (e.g.,
PAGES_PER_SECTION).

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
  2020-10-19  2:26   ` Wei Yang
@ 2020-10-19  9:15     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19  9:15 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On 19.10.20 04:26, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>> Currently, we do not support device block sizes that exceed the Linux
>> memory block size. For example, having a device block size of 1 GiB (e.g.,
>> gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>> blocks.
>>
>> Let's implement Big Block Mode (BBM), whereby we add/remove at least
>> one Linux memory block at a time. With a 1 GiB device block size, a Big
>> Block (BB) will cover 8 Linux memory blocks.
>>
>> We'll keep registering the online_page_callback machinery, it will be used
>> for safe memory hotunplug in BBM next.
>>
>> Note: BBM is properly prepared for variable-sized Linux memory
>> blocks that we might see in the future. So we won't care how many Linux
>> memory blocks a big block actually spans, and how the memory notifier is
>> called.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
>> 1 file changed, 402 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index e68d0d99590c..4d396ef98a92 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>> /*
>>  * virtio-mem currently supports the following modes of operation:
>>  *
>> - * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>> + * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>>  *   size of a Sub Block (SB) is determined based on the device block size, the
>>  *   pageblock size, and the maximum allocation granularity of the buddy.
>>  *   Subblocks within a Linux memory block might either be plugged or unplugged.
>>  *   Memory is added/removed to Linux MM in Linux memory block granularity.
>>  *
>> + * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>> + *   Memory is added/removed to Linux MM in Big Block granularity.
>> + *
>> + * The mode is determined automatically based on the Linux memory block size
>> + * and the device block size.
>> + *
>>  * User space / core MM (auto onlining) is responsible for onlining added
>>  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>>  * always onlined separately, and all memory within a Linux memory block is
>> @@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
>> 	VIRTIO_MEM_SBM_MB_COUNT
>> };
>>
>> +/*
>> + * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>> + */
>> +enum virtio_mem_bbm_bb_state {
>> +	/* Unplugged, not added to Linux. Can be reused later. */
>> +	VIRTIO_MEM_BBM_BB_UNUSED = 0,
>> +	/* Plugged, not added to Linux. Error on add_memory(). */
>> +	VIRTIO_MEM_BBM_BB_PLUGGED,
>> +	/* Plugged and added to Linux. */
>> +	VIRTIO_MEM_BBM_BB_ADDED,
>> +	VIRTIO_MEM_BBM_BB_COUNT
>> +};
>> +
>> struct virtio_mem {
>> 	struct virtio_device *vdev;
>>
>> @@ -113,6 +132,9 @@ struct virtio_mem {
>> 	atomic64_t offline_size;
>> 	uint64_t offline_threshold;
>>
>> +	/* If set, the driver is in SBM, otherwise in BBM. */
>> +	bool in_sbm;
>> +
>> 	struct {
>> 		/* Id of the first memory block of this device. */
>> 		unsigned long first_mb_id;
>> @@ -151,9 +173,27 @@ struct virtio_mem {
>> 		unsigned long *sb_states;
>> 	} sbm;
>>
>> +	struct {
>> +		/* Id of the first big block of this device. */
>> +		unsigned long first_bb_id;
>> +		/* Id of the last usable big block of this device. */
>> +		unsigned long last_usable_bb_id;
>> +		/* Id of the next device bock to prepare when needed. */
>> +		unsigned long next_bb_id;
>> +
>> +		/* Summary of all big block states. */
>> +		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>> +
>> +		/* One byte state per big block. See sbm.mb_states. */
>> +		uint8_t *bb_states;
>> +
>> +		/* The block size used for (un)plugged, adding/removing. */
>> +		uint64_t bb_size;
>> +	} bbm;
>> +
>> 	/*
>> -	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>> -	 * sbm.sb_states.
>> +	 * Mutex that protects the sbm.mb_count, sbm.mb_states,
>> +	 * sbm.sb_states, bbm.bb_count, and bbm.bb_states
>> 	 *
>> 	 * When this lock is held the pointers can't change, ONLINE and
>> 	 * OFFLINE blocks can't change the state and no subblocks will get
>> @@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
>> 	return mb_id * memory_block_size_bytes();
>> }
>>
>> +/*
>> + * Calculate the big block id of a given address.
>> + */
>> +static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>> +					      uint64_t addr)
>> +{
>> +	return addr / vm->bbm.bb_size;
>> +}
>> +
>> +/*
>> + * Calculate the physical start address of a given big block id.
>> + */
>> +static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>> +					 unsigned long bb_id)
>> +{
>> +	return bb_id * vm->bbm.bb_size;
>> +}
>> +
>> /*
>>  * Calculate the subblock id of a given address.
>>  */
>> @@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
>> 	return (addr - mb_addr) / vm->sbm.sb_size;
>> }
>>
>> +/*
>> + * Set the state of a big block, taking care of the state counter.
>> + */
>> +static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
>> +					unsigned long bb_id,
>> +					enum virtio_mem_bbm_bb_state state)
>> +{
>> +	const unsigned long idx = bb_id - vm->bbm.first_bb_id;
>> +	enum virtio_mem_bbm_bb_state old_state;
>> +
>> +	old_state = vm->bbm.bb_states[idx];
>> +	vm->bbm.bb_states[idx] = state;
>> +
>> +	BUG_ON(vm->bbm.bb_count[old_state] == 0);
>> +	vm->bbm.bb_count[old_state]--;
>> +	vm->bbm.bb_count[state]++;
>> +}
>> +
>> +/*
>> + * Get the state of a big block.
>> + */
>> +static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
>> +								unsigned long bb_id)
>> +{
>> +	return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
>> +}
>> +
>> +/*
>> + * Prepare the big block state array for the next big block.
>> + */
>> +static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>> +{
>> +	unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
>> +	unsigned long new_bytes = old_bytes + 1;
>> +	int old_pages = PFN_UP(old_bytes);
>> +	int new_pages = PFN_UP(new_bytes);
>> +	uint8_t *new_array;
>> +
>> +	if (vm->bbm.bb_states && old_pages == new_pages)
>> +		return 0;
>> +
>> +	new_array = vzalloc(new_pages * PAGE_SIZE);
>> +	if (!new_array)
>> +		return -ENOMEM;
>> +
>> +	mutex_lock(&vm->hotplug_mutex);
>> +	if (vm->bbm.bb_states)
>> +		memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
>> +	vfree(vm->bbm.bb_states);
>> +	vm->bbm.bb_states = new_array;
>> +	mutex_unlock(&vm->hotplug_mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
>> +	for (_bb_id = vm->bbm.first_bb_id; \
>> +	     _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
>> +	     _bb_id++) \
>> +		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>> +
>> /*
>>  * Set the state of a memory block, taking care of the state counter.
>>  */
>> @@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
>> 	return virtio_mem_add_memory(vm, addr, size);
>> }
>>
>> +/*
>> + * See virtio_mem_add_memory(): Try adding a big block.
>> + */
>> +static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
>> +{
>> +	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>> +	const uint64_t size = vm->bbm.bb_size;
>> +
>> +	return virtio_mem_add_memory(vm, addr, size);
>> +}
>> +
>> /*
>>  * Try removing memory from Linux. Will only fail if memory blocks aren't
>>  * offline.
>> @@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>> 	struct memory_notify *mhp = arg;
>> 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
>> 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
>> -	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
>> 	int rc = NOTIFY_OK;
>> +	unsigned long id;
>>
>> 	if (!virtio_mem_overlaps_range(vm, start, size))
>> 		return NOTIFY_DONE;
>>
>> -	/*
>> -	 * Memory is onlined/offlined in memory block granularity. We cannot
>> -	 * cross virtio-mem device boundaries and memory block boundaries. Bail
>> -	 * out if this ever changes.
>> -	 */
>> -	if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>> -			 !IS_ALIGNED(start, memory_block_size_bytes())))
>> -		return NOTIFY_BAD;
>> +	if (vm->in_sbm) {
>> +		id = virtio_mem_phys_to_mb_id(start);
>> +		/*
>> +		 * In SBM, we add memory in separate memory blocks - we expect
>> +		 * it to be onlined/offlined in the same granularity. Bail out
>> +		 * if this ever changes.
>> +		 */
>> +		if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>> +				 !IS_ALIGNED(start, memory_block_size_bytes())))
>> +			return NOTIFY_BAD;
>> +	} else {
>> +		id = virtio_mem_phys_to_bb_id(vm, start);
>> +		/*
>> +		 * In BBM, we only care about onlining/offlining happening
>> +		 * within a single big block, we don't care about the
>> +		 * actual granularity as we don't track individual Linux
>> +		 * memory blocks.
>> +		 */
>> +		if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
>> +			return NOTIFY_BAD;
>> +	}
>>
>> 	/*
>> 	 * Avoid circular locking lockdep warnings. We lock the mutex
>> @@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>> 			break;
>> 		}
>> 		vm->hotplug_active = true;
>> -		virtio_mem_sbm_notify_going_offline(vm, mb_id);
>> +		if (vm->in_sbm)
>> +			virtio_mem_sbm_notify_going_offline(vm, id);
>> 		break;
>> 	case MEM_GOING_ONLINE:
>> 		mutex_lock(&vm->hotplug_mutex);
>> @@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>> 			break;
>> 		}
>> 		vm->hotplug_active = true;
>> -		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
>> +		if (vm->in_sbm)
>> +			rc = virtio_mem_sbm_notify_going_online(vm, id);
>> 		break;
>> 	case MEM_OFFLINE:
>> -		virtio_mem_sbm_notify_offline(vm, mb_id);
>> +		if (vm->in_sbm)
>> +			virtio_mem_sbm_notify_offline(vm, id);
>>
>> 		atomic64_add(size, &vm->offline_size);
>> 		/*
>> @@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>> 		mutex_unlock(&vm->hotplug_mutex);
>> 		break;
>> 	case MEM_ONLINE:
>> -		virtio_mem_sbm_notify_online(vm, mb_id);
>> +		if (vm->in_sbm)
>> +			virtio_mem_sbm_notify_online(vm, id);
>>
>> 		atomic64_sub(size, &vm->offline_size);
>> 		/*
>> @@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
>> 	case MEM_CANCEL_OFFLINE:
>> 		if (!vm->hotplug_active)
>> 			break;
>> -		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
>> +		if (vm->in_sbm)
>> +			virtio_mem_sbm_notify_cancel_offline(vm, id);
>> 		vm->hotplug_active = false;
>> 		mutex_unlock(&vm->hotplug_mutex);
>> 		break;
>> @@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>> {
>> 	const unsigned long addr = page_to_phys(page);
>> -	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
>> +	unsigned long id, sb_id;
>> 	struct virtio_mem *vm;
>> -	int sb_id;
>> +	bool do_online;
>>
>> -	/*
>> -	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
>> -	 * size/alignment and that this callback is is called with such a
>> -	 * size/alignment. So we cannot cross subblocks and therefore
>> -	 * also not memory blocks.
>> -	 */
>> 	rcu_read_lock();
>> 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
>> 		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
>> 			continue;
>>
>> -		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>> -		/*
>> -		 * If plugged, online the pages, otherwise, set them fake
>> -		 * offline (PageOffline).
>> -		 */
>> -		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>> +		if (vm->in_sbm) {
>> +			/*
>> +			 * We exploit here that subblocks have at least
>> +			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
>> +			 * cross subblocks within one call.
>> +			 */
>> +			id = virtio_mem_phys_to_mb_id(addr);
>> +			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>> +			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
>> +								   sb_id, 1);
>> +		} else {
>> +			do_online = true;
>> +		}
>> +		if (do_online)
>> 			generic_online_page(page, order);
>> 		else
>> 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>> @@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>> 	return rc;
>> }
>>
>> +/*
>> + * Request to unplug a big block.
>> + *
>> + * Will not modify the state of the big block.
>> + */
>> +static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
>> +{
>> +	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>> +	const uint64_t size = vm->bbm.bb_size;
>> +
>> +	return virtio_mem_send_unplug_request(vm, addr, size);
>> +}
>> +
>> +/*
>> + * Request to plug a big block.
>> + *
>> + * Will not modify the state of the big block.
>> + */
>> +static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
>> +{
>> +	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>> +	const uint64_t size = vm->bbm.bb_size;
>> +
>> +	return virtio_mem_send_plug_request(vm, addr, size);
>> +}
>> +
>> /*
>>  * Unplug the desired number of plugged subblocks of a offline or not-added
>>  * memory block. Will fail if any subblock cannot get unplugged (instead of
>> @@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>> 	return 0;
>> }
>>
>> -/*
>> - * Try to plug the requested amount of memory.
>> - */
>> -static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>> +static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>> {
>> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
>> 	unsigned long mb_id;
>> @@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>> 	return rc;
>> }
>>
>> +/*
>> + * Plug a big block and add it to Linux.
>> + *
>> + * Will modify the state of the big block.
>> + */
>> +static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
>> +					  unsigned long bb_id)
>> +{
>> +	int rc;
>> +
>> +	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>> +			 VIRTIO_MEM_BBM_BB_UNUSED))
>> +		return -EINVAL;
>> +
>> +	rc = virtio_mem_bbm_plug_bb(vm, bb_id);
>> +	if (rc)
>> +		return rc;
>> +	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>> +
>> +	rc = virtio_mem_bbm_add_bb(vm, bb_id);
>> +	if (rc) {
>> +		if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
>> +			virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +						    VIRTIO_MEM_BBM_BB_UNUSED);
>> +		else
>> +			/* Retry from the main loop. */
>> +			virtio_mem_bbm_set_bb_state(vm, bb_id,
>> +						    VIRTIO_MEM_BBM_BB_PLUGGED);
>> +		return rc;
>> +	}
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Prepare tracking data for the next big block.
>> + */
>> +static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
>> +					  unsigned long *bb_id)
>> +{
>> +	int rc;
>> +
>> +	if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
>> +		return -ENOSPC;
>> +
>> +	/* Resize the big block state array if required. */
>> +	rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
>> +	if (rc)
>> +		return rc;
>> +
>> +	vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
>> +	*bb_id = vm->bbm.next_bb_id;
>> +	vm->bbm.next_bb_id++;
>> +	return 0;
>> +}
>> +
>> +static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>> +{
>> +	uint64_t nb_bb = diff / vm->bbm.bb_size;
>> +	unsigned long bb_id;
>> +	int rc;
>> +
>> +	if (!nb_bb)
>> +		return 0;
>> +
>> +	/* Try to plug and add unused big blocks */
>> +	virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
>> +		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>> +			return -ENOSPC;
>> +
>> +		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>> +		if (!rc)
>> +			nb_bb--;
>> +		if (rc || !nb_bb)
>> +			return rc;
>> +		cond_resched();
>> +	}
>> +
>> +	/* Try to prepare, plug and add new big blocks */
>> +	while (nb_bb) {
>> +		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>> +			return -ENOSPC;
>> +
>> +		rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
>> +		if (rc)
>> +			return rc;
>> +		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>> +		if (!rc)
>> +			nb_bb--;
>> +		if (rc)
>> +			return rc;
>> +		cond_resched();
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Try to plug the requested amount of memory.
>> + */
>> +static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>> +{
>> +	if (vm->in_sbm)
>> +		return virtio_mem_sbm_plug_request(vm, diff);
>> +	return virtio_mem_bbm_plug_request(vm, diff);
>> +}
>> +
>> /*
>>  * Unplug the desired number of plugged subblocks of an offline memory block.
>>  * Will fail if any subblock cannot get unplugged (instead of skipping it).
>> @@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
>> 	return 0;
>> }
>>
>> -/*
>> - * Try to unplug the requested amount of memory.
>> - */
>> -static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> +static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> {
>> 	uint64_t nb_sb = diff / vm->sbm.sb_size;
>> 	unsigned long mb_id;
>> @@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> 	return rc;
>> }
>>
>> +/*
>> + * Try to unplug the requested amount of memory.
>> + */
>> +static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> +{
>> +	if (vm->in_sbm)
>> +		return virtio_mem_sbm_unplug_request(vm, diff);
>> +	return -EBUSY;
>> +}
>> +
>> /*
>>  * Try to unplug all blocks that couldn't be unplugged before, for example,
>>  * because the hypervisor was busy.
>>  */
>> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
>> {
>> -	unsigned long mb_id;
>> +	unsigned long id;
>> 	int rc;
>>
>> -	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>> -		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>> +	if (!vm->in_sbm) {
>> +		virtio_mem_bbm_for_each_bb(vm, id,
>> +					   VIRTIO_MEM_BBM_BB_PLUGGED) {
>> +			rc = virtio_mem_bbm_unplug_bb(vm, id);
>> +			if (rc)
>> +				return rc;
>> +			virtio_mem_bbm_set_bb_state(vm, id,
>> +						    VIRTIO_MEM_BBM_BB_UNUSED);
>> +		}
>> +		return 0;
>> +	}
>> +
>> +	virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>> +		rc = virtio_mem_sbm_unplug_mb(vm, id);
>> 		if (rc)
>> 			return rc;
>> -		virtio_mem_sbm_set_mb_state(vm, mb_id,
>> +		virtio_mem_sbm_set_mb_state(vm, id,
>> 					    VIRTIO_MEM_SBM_MB_UNUSED);
>> 	}
>>
>> @@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
>> 			usable_region_size, &usable_region_size);
>> 	end_addr = vm->addr + usable_region_size;
>> 	end_addr = min(end_addr, phys_limit);
>> -	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>> +
>> +	if (vm->in_sbm)
>> +		vm->sbm.last_usable_mb_id =
>> +					 virtio_mem_phys_to_mb_id(end_addr) - 1;
>> +	else
>> +		vm->bbm.last_usable_bb_id =
>> +				     virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
>>
>> 	/* see if there is a request to change the size */
>> 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>> @@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
>> static int virtio_mem_init(struct virtio_mem *vm)
>> {
>> 	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>> +	uint64_t sb_size, addr;
>> 	uint16_t node_id;
>>
>> 	if (!vm->vdev->config->get) {
>> @@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
>> 	if (vm->nid == NUMA_NO_NODE)
>> 		vm->nid = memory_add_physaddr_to_nid(vm->addr);
>>
>> -	/*
>> -	 * We always hotplug memory in memory block granularity. This way,
>> -	 * we have to wait for exactly one memory block to online.
>> -	 */
>> -	if (vm->device_block_size > memory_block_size_bytes()) {
>> -		dev_err(&vm->vdev->dev,
>> -			"The block size is not supported (too big).\n");
>> -		return -EINVAL;
>> -	}
>> -
>> 	/* bad device setup - warn only */
>> 	if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
>> 		dev_warn(&vm->vdev->dev,
>> @@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
>> 	 * - Is required for now for alloc_contig_range() to work reliably -
>> 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
>> 	 */
>> -	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>> -				pageblock_nr_pages) * PAGE_SIZE;
>> -	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>> -				vm->sbm.sb_size);
>> -	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>> +	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>> +			pageblock_nr_pages) * PAGE_SIZE;
>> +	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>> +
>> +	if (sb_size < memory_block_size_bytes()) {
>> +		/* SBM: At least two subblocks per Linux memory block. */
>> +		vm->in_sbm = true;
>> +		vm->sbm.sb_size = sb_size;
>> +		vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>> +				     vm->sbm.sb_size;
>> +
>> +		/* Round up to the next full memory block */
>> +		addr = vm->addr + memory_block_size_bytes() - 1;
>> +		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>> +		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>> +	} else {
>> +		/* BBM: At least one Linux memory block. */
>> +		vm->bbm.bb_size = vm->device_block_size;
>>
>> -	/* Round up to the next full memory block */
>> -	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>> -						       memory_block_size_bytes());
>> -	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>> +		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
> 
> Per my understanding, vm->addr is not guaranteed to be bb_size aligned, right?
> 

The virtio spec enforces alignment to device block size. (QEMU is buggy
with bigger block sizes, though. Fix is on the QEMU list.)

> Why not round up to next big block?

Will implicitly be done in patch #26, where it might no longer be
guaranteed.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH v1 21/29] virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)
  2020-10-19  1:57   ` Wei Yang
@ 2020-10-19 10:22     ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19 10:22 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 19.10.20 03:57, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:15PM +0200, David Hildenbrand wrote:
>> Let's rename accordingly.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
>> 1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index 3a772714fec9..d06c8760b337 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
>> 	return start >= vm->addr && start + size <= vm->addr + vm->region_size;
>> }
>>
>> -static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>> -					  unsigned long mb_id)
>> +static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
>> +					      unsigned long mb_id)
> 
> Looking at this patch together with "virtio-mem: Big Block Mode (BBM)
> memory hotplug", I think the code is a little "complex".
> 
> The final logic of virtio_mem_memory_notifier_cb() looks like this:
> 
>     virtio_mem_memory_notifier_cb()
>         if (vm->in_sbm)
> 	    notify_xxx()
>         if (vm->in_sbm)
> 	    notify_xxx()
> 
> Can we adjust this like
> 
>     virtio_mem_memory_notifier_cb()
> 	notify_xxx()
>             if (vm->in_sbm)
>                 return
> 	notify_xxx()
>             if (vm->in_sbm)
>                 return
> 
> This style looks a little better to me.

Then we lose all the shared code after any of the mode-specific
handling? Like we have in MEM_OFFLINE, MEM_ONLINE, MEM_CANCEL_OFFLINE, ...

Don't think this will improve the situation.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
  2020-10-18 12:41       ` Wei Yang
@ 2020-10-19 11:57         ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-19 11:57 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On 18.10.20 14:41, Wei Yang wrote:
> On Fri, Oct 16, 2020 at 03:17:06PM +0200, David Hildenbrand wrote:
>> On 16.10.20 10:53, Wei Yang wrote:
>>> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>>>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>>>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> One trivial suggestion: could we move this patch closer to the data structure
>>> movement patch?
>>>
>>> I know this would be some work, since you have changed some of the code logic.
>>> This would take you some time to rebase.
>>
>> You mean after patch #17 ?
> 
> Yes
> 
>>
>> I guess I can move patch #18 (prereq) a little further up (e.g., after
>> patch #15). Guess moving it in front of #19 shouldn't be too hard.
>>
>> Will give it a try - if it takes too much effort, I'll leave it like this.
>>
> 
> Not a big deal, though it would make the change feel more self-contained to me.
> 
> This is a big patch set. If it could be split into two parts, e.g., bug
> fixes/logic improvements and the BBM implementation, it would be friendlier
> to review.

I'll most probably keep it as a single series, but reshuffle the patches
into

1. cleanups
2. preparations
3. BBM

That should make things easier to digest. Thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
  2020-10-19  8:50     ` David Hildenbrand
@ 2020-10-20  0:23       ` Wei Yang
  0 siblings, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-20  0:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On Mon, Oct 19, 2020 at 10:50:39AM +0200, David Hildenbrand wrote:
>On 19.10.20 09:54, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote:
>>> Let's add a safe mechanism to unplug memory, avoiding long/endless loops
>>> when trying to offline memory - similar to in SBM.
>>>
>>> Fake-offline all memory (via alloc_contig_range()) before trying to
>>> offline+remove it. Use this mode as default, but allow to enable the other
>>> mode explicitly (which could give better memory hotunplug guarantees in
>> 
>> I don't see how the unsafe mode would give better guarantees?
>
>It's primarily only relevant when there is a lot of concurrent action
>going on while unplugging memory. Using alloc_contig_range() on
>ZONE_MOVABLE can fail more easily than memory offlining.
>
>alloc_contig_range() doesn't try as hard as memory offlining code to
>isolate memory. There are known issues with temporary page pinning
>(e.g., when a process dies) and the PCP. (mostly discovered via CMA
>allocations)
>
>See the TODO I add in patch #14.
>
>[...]
>>>
>>> +	if (bbm_safe_unplug) {
>>> +		/*
>>> +		 * Start by fake-offlining all memory. Once we marked the device
>>> +		 * block as fake-offline, all newly onlined memory will
>>> +		 * automatically be kept fake-offline. Protect from concurrent
>>> +		 * onlining/offlining until we have a consistent state.
>>> +		 */
>>> +		mutex_lock(&vm->hotplug_mutex);
>>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>>> +					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
>>> +
>> 
>> State is set here.
>> 
>>> +		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +			page = pfn_to_online_page(pfn);
>>> +			if (!page)
>>> +				continue;
>>> +
>>> +			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
>>> +			if (rc) {
>>> +				end_pfn = pfn;
>>> +				goto rollback_safe_unplug;
>>> +			}
>>> +		}
>>> +		mutex_unlock(&vm->hotplug_mutex);
>>> +	}
>>> +
>>> 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>>> -	if (rc)
>>> +	if (rc) {
>>> +		if (bbm_safe_unplug) {
>>> +			mutex_lock(&vm->hotplug_mutex);
>>> +			goto rollback_safe_unplug;
>>> +		}
>>> 		return rc;
>>> +	}
>>>
>>> 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>>> 	if (rc)
>> 
>> And changed to PLUGGED or UNUSED based on rc.
>
>Right, after offlining+remove succeeded. So no longer added to Linux.
>
>The final state depends on the success of the unplug request towards the
>hypervisor.
>
>> 
>>> @@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
>>> 		virtio_mem_bbm_set_bb_state(vm, bb_id,
>>> 					    VIRTIO_MEM_BBM_BB_UNUSED);
>>> 	return rc;
>>> +
>>> +rollback_safe_unplug:
>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +		page = pfn_to_online_page(pfn);
>>> +		if (!page)
>>> +			continue;
>>> +		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
>>> +	}
>>> +	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>> 
>> And changed to ADDED if failed.
>
>Right, back to the initial state when entering this function.
>
>> 
>>> +	mutex_unlock(&vm->hotplug_mutex);
>>> +	return rc;
>>> }
>> 
>> So in which case, the bbm state is FAKE_OFFLINE during
>> virtio_mem_bbm_notify_going_offline() and
>> virtio_mem_bbm_notify_cancel_offline() ?
>
>Exactly, so we can do our magic with fake-offline pages and our
>virtio_mem_bbm_offline_and_remove_bb() can actually succeed.

Ah, my fault. The exact code flow is this:

    virtio_mem_bbm_offline_remove_and_unplug_bb()
        virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
	virtio_mem_fake_offline(pfn, PAGES_PER_SECTION)
	virtio_mem_bbm_offline_and_remove_bb(vm, bb_id)
	    offline and trigger memory notification  --- 1)

The notification is necessary at 1) to release the refcount, which is grabbed
during fake offline.

>
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
  2020-10-12 12:53 ` [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe " David Hildenbrand
  2020-10-19  7:54   ` Wei Yang
@ 2020-10-20  0:24   ` Wei Yang
  1 sibling, 0 replies; 108+ messages in thread
From: Wei Yang @ 2020-10-20  0:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador, Wei Yang

On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote:
>Let's add a safe mechanism to unplug memory, avoiding long/endless loops
>when trying to offline memory - similar to in SBM.
>
>Fake-offline all memory (via alloc_contig_range()) before trying to
>offline+remove it. Use this mode as default, but allow to enable the other
>mode explicitly (which could give better memory hotunplug guarantees in
>some environments).
>
>The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0
>on the cmdline.
>
>Cc: "Michael S. Tsirkin" <mst@redhat.com>
>Cc: Jason Wang <jasowang@redhat.com>
>Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-19  9:04         ` David Hildenbrand
@ 2020-10-20  0:41           ` Wei Yang
  2020-10-20  9:09             ` David Hildenbrand
  0 siblings, 1 reply; 108+ messages in thread
From: Wei Yang @ 2020-10-20  0:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta

On Mon, Oct 19, 2020 at 11:04:40AM +0200, David Hildenbrand wrote:
>On 18.10.20 05:57, Wei Yang wrote:
>> On Fri, Oct 16, 2020 at 11:18:39AM +0200, David Hildenbrand wrote:
>>> On 16.10.20 06:03, Wei Yang wrote:
>>>> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>>>>> Let's trigger from offlining code when we're not allowed to touch online
>> 
>> Here "touch" means "unplug"? If so, maybe s/touch/unplug/ would be easier
>> to understand.
>
>Yes, much better.
>
>[...]
>
>> I am trying to get more understanding about the logic of virtio_mem_retry().
>> 
>> Current logic seems clear to me. There are four places to trigger it:
>> 
>>     * notify_offline
>>     * notify_online
>>     * timer_expired
>>     * config_changed
>> 
>> In this patch, we try to optimize the first case, notify_offline.
>
>Yes.
>
>> 
>> Now, we always trigger a retry when one of our memory blocks gets offlined.
>> Per my understanding, this logic is correct but misses one case (or, more
>> precisely, does not handle one case in a timely manner). The case this patch
>> wants to improve is virtio_mem_mb_remove(), if my understanding is correct.
>> 
>
>Yes, that's one part of it. Read below.
>
>>    virtio_mem_run_wq()
>>        virtio_mem_unplug_request()
>>            virtio_mem_mb_unplug_any_sb_offline()
>>  	      virtio_mem_mb_remove()             --- 1
>>            virtio_mem_mb_unplug_any_sb_online()
>>               virtio_mem_mb_offline_and_remove() --- 2
>> 
>> The above is two functions this patch adjusts. For 2), it will offline the
>> memory block, thus will trigger virtio_mem_retry() originally. But for 1), the
>> memory block is already offlined, so virtio_mem_retry() will not be triggered
>> originally. This is the case we want to improve in this patch. Instead of wait
>> for timer expire, we trigger retry immediately after unplug/remove an offlined
>> memory block.
>> 
>> And after this change, this patch still adjusts the original
>> virtio_mem_notify_offline() path to only trigger virtio_mem_retry() when
>> unplug_online is false. (This means the offline event is notified from user
>> space instead of from an unplug event.)
>> 
>> If my above analysis is correct, I have one small suggestion for this patch.
>> Instead of adjusting the current notify_offline handling, how about just
>> triggering the retry during virtio_mem_mb_remove()? Per my understanding, we
>> just want to trigger the retry immediately when unplugging an offlined
>> memory block.
>
>I probably should have added the following to the patch description:
>
>"This is a preparation for Big Block Mode (BBM), whereby we can see some
>temporary offlining of memory blocks without actually making progress"
>
>Imagine you have a Big Block that spans two Linux memory blocks. Assume
>the first Linux memory block has no unmovable data on it.
>
>Assume you call offline_and_remove_memory()
>
>1. Try to offline the first block. Works, notifiers triggered.
>virtio_mem_retry().

After this patch, the virtio_mem_retry() is remove here.

>2. Try to offline the second block. Does not work.
>3. Re-online first block.
>4. Exit to main loop, exit workqueue.

Since offline_and_remove_memory() doesn't succeed, virtio_mem_retry() is not
triggered.

>5. Retry immediately (due to virtio_mem_retry()), go to 1.

So we won't have endless loop.

>
>So, you'll keep retrying forever. Found while debugging that exact issue :)
>

If this is the case, my suggestion is to record it in the changelog.
Otherwise, we may lose track of this corner case, which is important to this
change.

>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory
  2020-10-20  0:41           ` Wei Yang
@ 2020-10-20  9:09             ` David Hildenbrand
  0 siblings, 0 replies; 108+ messages in thread
From: David Hildenbrand @ 2020-10-20  9:09 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-kernel, linux-mm, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang, Pankaj Gupta


>> So, you'll keep retrying forever. Found while debugging that exact issue :)
>>
> 
> If this is the case, my suggestion is to record it in the changelog.
> Otherwise, we may lose this corner case which is important to this change.

Yes, already added it - thanks!


-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range()
  2020-10-12 12:53 ` [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range() David Hildenbrand
@ 2020-10-20  9:22   ` Pankaj Gupta
  0 siblings, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Avoid using memory block ids. While at it, use uint64_t for
> address/size.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 821143db14fe..37a0e338ae4a 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -489,14 +489,10 @@ static int virtio_mem_translate_node_id(struct virtio_mem *vm, uint16_t node_id)
>   * Test if a virtio-mem device overlaps with the given range. Can be called
>   * from (notifier) callbacks lockless.
>   */
> -static bool virtio_mem_overlaps_range(struct virtio_mem *vm,
> -                                     unsigned long start, unsigned long size)
> +static bool virtio_mem_overlaps_range(struct virtio_mem *vm, uint64_t start,
> +                                     uint64_t size)
>  {
> -       unsigned long dev_start = virtio_mem_mb_id_to_phys(vm->first_mb_id);
> -       unsigned long dev_end = virtio_mem_mb_id_to_phys(vm->last_mb_id) +
> -                               memory_block_size_bytes();
> -
> -       return start < dev_end && dev_start < start + size;
> +       return start < vm->addr + vm->region_size && vm->addr < start + size;
>  }

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
  2020-10-12 12:53 ` [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline() David Hildenbrand
  2020-10-16  6:24   ` Wei Yang
@ 2020-10-20  9:31   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> ... which now matches virtio_mem_fake_online(). We'll reuse this
> functionality soon.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 00d1cfca4713..d132bc54ef57 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -832,6 +832,27 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
>         }
>  }
>
> +/*
> + * Try to allocate a range, marking pages fake-offline, effectively
> + * fake-offlining them.
> + */
> +static int virtio_mem_fake_offline(unsigned long pfn, unsigned long nr_pages)
> +{
> +       int rc;
> +
> +       rc = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE,
> +                               GFP_KERNEL);
> +       if (rc == -ENOMEM)
> +               /* whoops, out of memory */
> +               return rc;
> +       if (rc)
> +               return -EBUSY;
> +
> +       virtio_mem_set_fake_offline(pfn, nr_pages, true);
> +       adjust_managed_page_count(pfn_to_page(pfn), -nr_pages);
> +       return 0;
> +}
> +
>  static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>  {
>         const unsigned long addr = page_to_phys(page);
> @@ -1335,17 +1356,10 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
>
>         start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>                              sb_id * vm->subblock_size);
> -       rc = alloc_contig_range(start_pfn, start_pfn + nr_pages,
> -                               MIGRATE_MOVABLE, GFP_KERNEL);
> -       if (rc == -ENOMEM)
> -               /* whoops, out of memory */
> -               return rc;
> -       if (rc)
> -               return -EBUSY;
>
> -       /* Mark it as fake-offline before unplugging it */
> -       virtio_mem_set_fake_offline(start_pfn, nr_pages, true);
> -       adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
> +       rc = virtio_mem_fake_offline(start_pfn, nr_pages);
> +       if (rc)
> +               return rc;
>
>         /* Try to unplug the allocated memory */
>         rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, count);

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

^ permalink raw reply	[flat|nested] 108+ messages in thread
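[Editorial note: the error mapping that the factored-out virtio_mem_fake_offline() performs — pass -ENOMEM through, translate any other alloc_contig_range() failure into -EBUSY — can be illustrated with a small userspace sketch. The function name and the injected return codes below are illustrative mocks, not the driver's actual API.]

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace sketch of the error mapping in virtio_mem_fake_offline().
 * alloc_contig_range() is mocked: the caller injects its return code.
 */
static int fake_offline_rc(int alloc_rc)
{
	if (alloc_rc == -ENOMEM)
		return alloc_rc;	/* propagate: truly out of memory */
	if (alloc_rc)
		return -EBUSY;		/* any other failure: range busy, retry later */
	/* success: pages would now be marked fake-offline and the
	 * managed page count adjusted */
	return 0;
}
```

The key design point is that only -ENOMEM is propagated verbatim; every other migration failure is collapsed into -EBUSY so callers can treat it as a transient condition.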

* Re: [PATCH v1 15/29] virito-mem: document Sub Block Mode (SBM)
  2020-10-15  9:33   ` David Hildenbrand
@ 2020-10-20  9:38     ` Pankaj Gupta
  0 siblings, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> > Let's add some documentation for the current mode - Sub Block Mode (SBM) -
> > to prepare for a new mode - Big Block Mode (BBM).
> >
> > Follow-up patches will properly factor out the existing Sub Block Mode
> > (SBM) and implement Device Block Mode (DBM).
>
> s/Device Block Mode (DBM)/Big Block Mode (BBM)/
>

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

^ permalink raw reply	[flat|nested] 108+ messages in thread
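[Editorial note: the SBM geometry being documented — each Linux memory block is split into subblocks of the device block size, each tracked by one bit — can be sketched as follows. The block/subblock sizes are example values, not read from a device; the bit-index arithmetic mirrors the driver's bitmap helpers.]

```c
#include <assert.h>

/* Example geometry: 128 MiB Linux memory blocks, 4 MiB subblocks. */
#define MB_SIZE      (128UL << 20)
#define SB_SIZE      (4UL << 20)
#define NB_SB_PER_MB (MB_SIZE / SB_SIZE)

/*
 * Bit index into the global subblock bitmap: nb_sb_per_mb consecutive
 * bits per memory block, as in the driver's plug/unplug helpers.
 */
static unsigned long sb_bit(unsigned long first_mb_id, unsigned long mb_id,
			    int sb_id)
{
	return (mb_id - first_mb_id) * NB_SB_PER_MB + sb_id;
}
```

With these example sizes each memory block contributes 32 bits, so one 4 KiB page of bitmap covers 128 GiB of memory, matching the comment in the driver.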

* Re: [PATCH v1 16/29] virtio-mem: memory block states are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
  2020-10-16  8:40   ` Wei Yang
  2020-10-16  8:43   ` Wei Yang
@ 2020-10-20  9:48   ` Pankaj Gupta
  2 siblings, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
> move applicable definitions, functions, and variables (related to
> memory block states).
>
> While at it:
> - Drop the "_STATE" part from memory block states
> - Rename "nb_mb_state" to "mb_count"
> - "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
> - Don't use lengthy "enum virtio_mem_sbm_mb_state", simply use "uint8_t"
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 215 ++++++++++++++++++------------------
>  1 file changed, 109 insertions(+), 106 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index fd8685673fe4..e76d6f769aa5 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -42,20 +42,23 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>   * onlined to the same zone - virtio-mem relies on this behavior.
>   */
>
> -enum virtio_mem_mb_state {
> +/*
> + * State of a Linux memory block in SBM.
> + */
> +enum virtio_mem_sbm_mb_state {
>         /* Unplugged, not added to Linux. Can be reused later. */
> -       VIRTIO_MEM_MB_STATE_UNUSED = 0,
> +       VIRTIO_MEM_SBM_MB_UNUSED = 0,
>         /* (Partially) plugged, not added to Linux. Error on add_memory(). */
> -       VIRTIO_MEM_MB_STATE_PLUGGED,
> +       VIRTIO_MEM_SBM_MB_PLUGGED,
>         /* Fully plugged, fully added to Linux, offline. */
> -       VIRTIO_MEM_MB_STATE_OFFLINE,
> +       VIRTIO_MEM_SBM_MB_OFFLINE,
>         /* Partially plugged, fully added to Linux, offline. */
> -       VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
> +       VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL,
>         /* Fully plugged, fully added to Linux, online. */
> -       VIRTIO_MEM_MB_STATE_ONLINE,
> +       VIRTIO_MEM_SBM_MB_ONLINE,
>         /* Partially plugged, fully added to Linux, online. */
> -       VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
> -       VIRTIO_MEM_MB_STATE_COUNT
> +       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL,
> +       VIRTIO_MEM_SBM_MB_COUNT
>  };
>
>  struct virtio_mem {
> @@ -113,9 +116,6 @@ struct virtio_mem {
>          */
>         const char *resource_name;
>
> -       /* Summary of all memory block states. */
> -       unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
> -
>         /*
>          * We don't want to add too much memory if it's not getting onlined,
>          * to avoid running OOM. Besides this threshold, we allow to have at
> @@ -125,27 +125,29 @@ struct virtio_mem {
>         atomic64_t offline_size;
>         uint64_t offline_threshold;
>
> -       /*
> -        * One byte state per memory block.
> -        *
> -        * Allocated via vmalloc(). When preparing new blocks, resized
> -        * (alloc+copy+free) when needed (crossing pages with the next mb).
> -        * (when crossing pages).
> -        *
> -        * With 128MB memory blocks, we have states for 512GB of memory in one
> -        * page.
> -        */
> -       uint8_t *mb_state;
> +       struct {
> +               /* Summary of all memory block states. */
> +               unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
> +
> +               /*
> +                * One byte state per memory block. Allocated via vmalloc().
> +                * Resized (alloc+copy+free) on demand.
> +                *
> +                * With 128 MiB memory blocks, we have states for 512 GiB of
> +                * memory in one 4 KiB page.
> +                */
> +               uint8_t *mb_states;
> +       } sbm;
>
>         /*
> -        * $nb_sb_per_mb bit per memory block. Handled similar to mb_state.
> +        * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
>          *
>          * With 4MB subblocks, we manage 128GB of memory in one page.
>          */
>         unsigned long *sb_bitmap;
>
>         /*
> -        * Mutex that protects the nb_mb_state, mb_state, and sb_bitmap.
> +        * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
>          *
>          * When this lock is held the pointers can't change, ONLINE and
>          * OFFLINE blocks can't change the state and no subblocks will get
> @@ -254,70 +256,70 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
>  /*
>   * Set the state of a memory block, taking care of the state counter.
>   */
> -static void virtio_mem_mb_set_state(struct virtio_mem *vm, unsigned long mb_id,
> -                                   enum virtio_mem_mb_state state)
> +static void virtio_mem_sbm_set_mb_state(struct virtio_mem *vm,
> +                                       unsigned long mb_id, uint8_t state)
>  {
>         const unsigned long idx = mb_id - vm->first_mb_id;
> -       enum virtio_mem_mb_state old_state;
> +       uint8_t old_state;
>
> -       old_state = vm->mb_state[idx];
> -       vm->mb_state[idx] = state;
> +       old_state = vm->sbm.mb_states[idx];
> +       vm->sbm.mb_states[idx] = state;
>
> -       BUG_ON(vm->nb_mb_state[old_state] == 0);
> -       vm->nb_mb_state[old_state]--;
> -       vm->nb_mb_state[state]++;
> +       BUG_ON(vm->sbm.mb_count[old_state] == 0);
> +       vm->sbm.mb_count[old_state]--;
> +       vm->sbm.mb_count[state]++;
>  }
>
>  /*
>   * Get the state of a memory block.
>   */
> -static enum virtio_mem_mb_state virtio_mem_mb_get_state(struct virtio_mem *vm,
> -                                                       unsigned long mb_id)
> +static uint8_t virtio_mem_sbm_get_mb_state(struct virtio_mem *vm,
> +                                          unsigned long mb_id)
>  {
>         const unsigned long idx = mb_id - vm->first_mb_id;
>
> -       return vm->mb_state[idx];
> +       return vm->sbm.mb_states[idx];
>  }
>
>  /*
>   * Prepare the state array for the next memory block.
>   */
> -static int virtio_mem_mb_state_prepare_next_mb(struct virtio_mem *vm)
> +static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
>  {
>         unsigned long old_bytes = vm->next_mb_id - vm->first_mb_id;
>         unsigned long new_bytes = old_bytes + 1;
>         int old_pages = PFN_UP(old_bytes);
>         int new_pages = PFN_UP(new_bytes);
> -       uint8_t *new_mb_state;
> +       uint8_t *new_array;
>
> -       if (vm->mb_state && old_pages == new_pages)
> +       if (vm->sbm.mb_states && old_pages == new_pages)
>                 return 0;
>
> -       new_mb_state = vzalloc(new_pages * PAGE_SIZE);
> -       if (!new_mb_state)
> +       new_array = vzalloc(new_pages * PAGE_SIZE);
> +       if (!new_array)
>                 return -ENOMEM;
>
>         mutex_lock(&vm->hotplug_mutex);
> -       if (vm->mb_state)
> -               memcpy(new_mb_state, vm->mb_state, old_pages * PAGE_SIZE);
> -       vfree(vm->mb_state);
> -       vm->mb_state = new_mb_state;
> +       if (vm->sbm.mb_states)
> +               memcpy(new_array, vm->sbm.mb_states, old_pages * PAGE_SIZE);
> +       vfree(vm->sbm.mb_states);
> +       vm->sbm.mb_states = new_array;
>         mutex_unlock(&vm->hotplug_mutex);
>
>         return 0;
>  }
>
> -#define virtio_mem_for_each_mb_state(_vm, _mb_id, _state) \
> +#define virtio_mem_sbm_for_each_mb(_vm, _mb_id, _state) \
>         for (_mb_id = _vm->first_mb_id; \
> -            _mb_id < _vm->next_mb_id && _vm->nb_mb_state[_state]; \
> +            _mb_id < _vm->next_mb_id && _vm->sbm.mb_count[_state]; \
>              _mb_id++) \
> -               if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
> +               if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
>
> -#define virtio_mem_for_each_mb_state_rev(_vm, _mb_id, _state) \
> +#define virtio_mem_sbm_for_each_mb_rev(_vm, _mb_id, _state) \
>         for (_mb_id = _vm->next_mb_id - 1; \
> -            _mb_id >= _vm->first_mb_id && _vm->nb_mb_state[_state]; \
> +            _mb_id >= _vm->first_mb_id && _vm->sbm.mb_count[_state]; \
>              _mb_id--) \
> -               if (virtio_mem_mb_get_state(_vm, _mb_id) == _state)
> +               if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
>
>  /*
>   * Mark all selected subblocks plugged.
> @@ -573,9 +575,9 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
>  static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>                                           unsigned long mb_id)
>  {
> -       switch (virtio_mem_mb_get_state(vm, mb_id)) {
> -       case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
> -       case VIRTIO_MEM_MB_STATE_OFFLINE:
> +       switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
> +       case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
> +       case VIRTIO_MEM_SBM_MB_OFFLINE:
>                 return NOTIFY_OK;
>         default:
>                 break;
> @@ -588,14 +590,14 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>  static void virtio_mem_notify_offline(struct virtio_mem *vm,
>                                       unsigned long mb_id)
>  {
> -       switch (virtio_mem_mb_get_state(vm, mb_id)) {
> -       case VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL:
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
> +       switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
> +       case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
>                 break;
> -       case VIRTIO_MEM_MB_STATE_ONLINE:
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_OFFLINE);
> +       case VIRTIO_MEM_SBM_MB_ONLINE:
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_OFFLINE);
>                 break;
>         default:
>                 BUG();
> @@ -605,13 +607,14 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
>
>  static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
>  {
> -       switch (virtio_mem_mb_get_state(vm, mb_id)) {
> -       case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
> +       switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
> +       case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                       VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
>                 break;
> -       case VIRTIO_MEM_MB_STATE_OFFLINE:
> -               virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
> +       case VIRTIO_MEM_SBM_MB_OFFLINE:
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_ONLINE);
>                 break;
>         default:
>                 BUG();
> @@ -1160,7 +1163,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
>                 return -ENOSPC;
>
>         /* Resize the state array if required. */
> -       rc = virtio_mem_mb_state_prepare_next_mb(vm);
> +       rc = virtio_mem_sbm_mb_states_prepare_next_mb(vm);
>         if (rc)
>                 return rc;
>
> @@ -1169,7 +1172,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
>         if (rc)
>                 return rc;
>
> -       vm->nb_mb_state[VIRTIO_MEM_MB_STATE_UNUSED]++;
> +       vm->sbm.mb_count[VIRTIO_MEM_SBM_MB_UNUSED]++;
>         *mb_id = vm->next_mb_id++;
>         return 0;
>  }
> @@ -1203,16 +1206,16 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>          * so the memory notifiers will find the block in the right state.
>          */
>         if (count == vm->nb_sb_per_mb)
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_OFFLINE);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_OFFLINE);
>         else
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
>
>         /* Add the memory block to linux - if that fails, try to unplug. */
>         rc = virtio_mem_mb_add(vm, mb_id);
>         if (rc) {
> -               enum virtio_mem_mb_state new_state = VIRTIO_MEM_MB_STATE_UNUSED;
> +               int new_state = VIRTIO_MEM_SBM_MB_UNUSED;
>
>                 dev_err(&vm->vdev->dev,
>                         "adding memory block %lu failed with %d\n", mb_id, rc);
> @@ -1222,8 +1225,8 @@ static int virtio_mem_mb_plug_and_add(struct virtio_mem *vm,
>                  * where adding of memory failed - especially on -ENOMEM.
>                  */
>                 if (virtio_mem_mb_unplug_sb(vm, mb_id, 0, count))
> -                       new_state = VIRTIO_MEM_MB_STATE_PLUGGED;
> -               virtio_mem_mb_set_state(vm, mb_id, new_state);
> +                       new_state = VIRTIO_MEM_SBM_MB_PLUGGED;
> +               virtio_mem_sbm_set_mb_state(vm, mb_id, new_state);
>                 return rc;
>         }
>
> @@ -1276,11 +1279,11 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
>
>         if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>                 if (online)
> -                       virtio_mem_mb_set_state(vm, mb_id,
> -                                               VIRTIO_MEM_MB_STATE_ONLINE);
> +                       virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                                   VIRTIO_MEM_SBM_MB_ONLINE);
>                 else
> -                       virtio_mem_mb_set_state(vm, mb_id,
> -                                               VIRTIO_MEM_MB_STATE_OFFLINE);
> +                       virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                                   VIRTIO_MEM_SBM_MB_OFFLINE);
>         }
>
>         return 0;
> @@ -1302,8 +1305,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>         mutex_lock(&vm->hotplug_mutex);
>
>         /* Try to plug subblocks of partially plugged online blocks. */
> -       virtio_mem_for_each_mb_state(vm, mb_id,
> -                                    VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
> +       virtio_mem_sbm_for_each_mb(vm, mb_id,
> +                                  VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
>                 rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, true);
>                 if (rc || !nb_sb)
>                         goto out_unlock;
> @@ -1311,8 +1314,8 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>         }
>
>         /* Try to plug subblocks of partially plugged offline blocks. */
> -       virtio_mem_for_each_mb_state(vm, mb_id,
> -                                    VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
> +       virtio_mem_sbm_for_each_mb(vm, mb_id,
> +                                  VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>                 rc = virtio_mem_mb_plug_any_sb(vm, mb_id, &nb_sb, false);
>                 if (rc || !nb_sb)
>                         goto out_unlock;
> @@ -1326,7 +1329,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>         mutex_unlock(&vm->hotplug_mutex);
>
>         /* Try to plug and add unused blocks */
> -       virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
> +       virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_UNUSED) {
>                 if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
>                         return -ENOSPC;
>
> @@ -1375,8 +1378,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
>
>         /* some subblocks might have been unplugged even on failure */
>         if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
>         if (rc)
>                 return rc;
>
> @@ -1387,8 +1390,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
>                  * unplugged. Temporarily drop the mutex, so
>                  * any pending GOING_ONLINE requests can be serviced/rejected.
>                  */
> -               virtio_mem_mb_set_state(vm, mb_id,
> -                                       VIRTIO_MEM_MB_STATE_UNUSED);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_UNUSED);
>
>                 mutex_unlock(&vm->hotplug_mutex);
>                 rc = virtio_mem_mb_remove(vm, mb_id);
> @@ -1426,8 +1429,8 @@ static int virtio_mem_mb_unplug_sb_online(struct virtio_mem *vm,
>                 return rc;
>         }
>
> -       virtio_mem_mb_set_state(vm, mb_id,
> -                               VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
> +       virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                   VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL);
>         return 0;
>  }
>
> @@ -1487,8 +1490,8 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>                 rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
>                 mutex_lock(&vm->hotplug_mutex);
>                 if (!rc)
> -                       virtio_mem_mb_set_state(vm, mb_id,
> -                                               VIRTIO_MEM_MB_STATE_UNUSED);
> +                       virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                                   VIRTIO_MEM_SBM_MB_UNUSED);
>         }
>
>         return 0;
> @@ -1514,8 +1517,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>         mutex_lock(&vm->hotplug_mutex);
>
>         /* Try to unplug subblocks of partially plugged offline blocks. */
> -       virtio_mem_for_each_mb_state_rev(vm, mb_id,
> -                                        VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
> +       virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
> +                                      VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>                 rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
>                                                          &nb_sb);
>                 if (rc || !nb_sb)
> @@ -1524,8 +1527,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>         }
>
>         /* Try to unplug subblocks of plugged offline blocks. */
> -       virtio_mem_for_each_mb_state_rev(vm, mb_id,
> -                                        VIRTIO_MEM_MB_STATE_OFFLINE) {
> +       virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_OFFLINE) {
>                 rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id,
>                                                          &nb_sb);
>                 if (rc || !nb_sb)
> @@ -1539,8 +1541,8 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>         }
>
>         /* Try to unplug subblocks of partially plugged online blocks. */
> -       virtio_mem_for_each_mb_state_rev(vm, mb_id,
> -                                        VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
> +       virtio_mem_sbm_for_each_mb_rev(vm, mb_id,
> +                                      VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL) {
>                 rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
>                                                         &nb_sb);
>                 if (rc || !nb_sb)
> @@ -1551,8 +1553,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>         }
>
>         /* Try to unplug subblocks of plugged online blocks. */
> -       virtio_mem_for_each_mb_state_rev(vm, mb_id,
> -                                        VIRTIO_MEM_MB_STATE_ONLINE) {
> +       virtio_mem_sbm_for_each_mb_rev(vm, mb_id, VIRTIO_MEM_SBM_MB_ONLINE) {
>                 rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
>                                                         &nb_sb);
>                 if (rc || !nb_sb)
> @@ -1578,11 +1579,12 @@ static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
>         unsigned long mb_id;
>         int rc;
>
> -       virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_PLUGGED) {
> +       virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>                 rc = virtio_mem_mb_unplug(vm, mb_id);
>                 if (rc)
>                         return rc;
> -               virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_UNUSED);
>         }
>
>         return 0;
> @@ -1974,11 +1976,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>          * After we unregistered our callbacks, user space can online partially
>          * plugged offline blocks. Make sure to remove them.
>          */
> -       virtio_mem_for_each_mb_state(vm, mb_id,
> -                                    VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) {
> +       virtio_mem_sbm_for_each_mb(vm, mb_id,
> +                                  VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>                 rc = virtio_mem_mb_remove(vm, mb_id);
>                 BUG_ON(rc);
> -               virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED);
> +               virtio_mem_sbm_set_mb_state(vm, mb_id,
> +                                           VIRTIO_MEM_SBM_MB_UNUSED);
>         }
>         /*
>          * After we unregistered our callbacks, user space can no longer
> @@ -2003,7 +2006,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>         }
>
>         /* remove all tracking data - no locking needed */
> -       vfree(vm->mb_state);
> +       vfree(vm->sbm.mb_states);
>         vfree(vm->sb_bitmap);
>
>         /* reset the device and cleanup the queues */

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

^ permalink raw reply	[flat|nested] 108+ messages in thread
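[Editorial note: the bookkeeping this patch renames — one byte of state per memory block plus a per-state counter kept consistent on every transition — can be sketched in userspace. Names and the block count are illustrative; the BUG_ON() becomes an assert().]

```c
#include <assert.h>
#include <stdint.h>

enum { MB_UNUSED = 0, MB_PLUGGED, MB_OFFLINE, MB_OFFLINE_PARTIAL,
       MB_ONLINE, MB_ONLINE_PARTIAL, MB_COUNT };

#define NB_MB 8
static uint8_t mb_states[NB_MB];	/* one byte of state per block */
static unsigned long mb_count[MB_COUNT] = { [MB_UNUSED] = NB_MB };

/*
 * Set the state of a memory block, keeping the per-state summary
 * counters consistent (mirrors virtio_mem_sbm_set_mb_state()).
 */
static void set_mb_state(unsigned long idx, uint8_t state)
{
	uint8_t old = mb_states[idx];

	mb_states[idx] = state;
	assert(mb_count[old] > 0);	/* mirrors the driver's BUG_ON() */
	mb_count[old]--;
	mb_count[state]++;
}
```

The counters let the for-each iterators bail out early once no blocks remain in the requested state, without scanning the whole state array.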

* Re: [PATCH v1 17/29] virito-mem: subblock states are specific to Sub Block Mode (SBM)
  2020-10-12 12:53 ` [PATCH v1 17/29] virito-mem: subblock " David Hildenbrand
  2020-10-16  8:43   ` Wei Yang
@ 2020-10-20  9:54   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> Let's rename and move accordingly. While at it, rename sb_bitmap to
> "sb_states".
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
>  1 file changed, 62 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index e76d6f769aa5..2cc497ad8298 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -137,17 +137,23 @@ struct virtio_mem {
>                  * memory in one 4 KiB page.
>                  */
>                 uint8_t *mb_states;
> -       } sbm;
>
> -       /*
> -        * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
> -        *
> -        * With 4MB subblocks, we manage 128GB of memory in one page.
> -        */
> -       unsigned long *sb_bitmap;
> +               /*
> +                * Bitmap: one bit per subblock. Allocated similar to
> +                * sbm.mb_states.
> +                *
> +                * A set bit means the corresponding subblock is plugged,
> +                * otherwise it's unplugged.
> +                *
> +                * With 4 MiB subblocks, we manage 128 GiB of memory in one
> +                * 4 KiB page.
> +                */
> +               unsigned long *sb_states;
> +       } sbm;
>
>         /*
> -        * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
> +        * Mutex that protects the sbm.mb_count, sbm.mb_states, and
> +        * sbm.sb_states.
>          *
>          * When this lock is held the pointers can't change, ONLINE and
>          * OFFLINE blocks can't change the state and no subblocks will get
> @@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
>   *
>   * Will not modify the state of the memory block.
>   */
> -static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
> -                                        unsigned long mb_id, int sb_id,
> -                                        int count)
> +static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
> +                                         unsigned long mb_id, int sb_id,
> +                                         int count)
>  {
>         const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> -       __bitmap_set(vm->sb_bitmap, bit, count);
> +       __bitmap_set(vm->sbm.sb_states, bit, count);
>  }
>
>  /*
> @@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
>   *
>   * Will not modify the state of the memory block.
>   */
> -static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
> -                                          unsigned long mb_id, int sb_id,
> -                                          int count)
> +static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
> +                                           unsigned long mb_id, int sb_id,
> +                                           int count)
>  {
>         const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> -       __bitmap_clear(vm->sb_bitmap, bit, count);
> +       __bitmap_clear(vm->sbm.sb_states, bit, count);
>  }
>
>  /*
>   * Test if all selected subblocks are plugged.
>   */
> -static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
> -                                         unsigned long mb_id, int sb_id,
> -                                         int count)
> +static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
> +                                          unsigned long mb_id, int sb_id,
> +                                          int count)
>  {
>         const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
>         if (count == 1)
> -               return test_bit(bit, vm->sb_bitmap);
> +               return test_bit(bit, vm->sbm.sb_states);
>
>         /* TODO: Helper similar to bitmap_set() */
> -       return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
> +       return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
>                bit + count;
>  }
>
>  /*
>   * Test if all selected subblocks are unplugged.
>   */
> -static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
> -                                           unsigned long mb_id, int sb_id,
> -                                           int count)
> +static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> +                                            unsigned long mb_id, int sb_id,
> +                                            int count)
>  {
>         const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
>         /* TODO: Helper similar to bitmap_set() */
> -       return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
> +       return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
> +              bit + count;
>  }
>
>  /*
>   * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>   * none.
>   */
> -static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
> +static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>                                             unsigned long mb_id)
>  {
>         const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
>
> -       return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
> -              bit;
> +       return find_next_zero_bit(vm->sbm.sb_states,
> +                                 bit + vm->nb_sb_per_mb, bit) - bit;
>  }
>
>  /*
>   * Prepare the subblock bitmap for the next memory block.
>   */
> -static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
> +static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
>  {
>         const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>         const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
>         const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
>         int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
>         int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> -       unsigned long *new_sb_bitmap, *old_sb_bitmap;
> +       unsigned long *new_bitmap, *old_bitmap;
>
> -       if (vm->sb_bitmap && old_pages == new_pages)
> +       if (vm->sbm.sb_states && old_pages == new_pages)
>                 return 0;
>
> -       new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
> -       if (!new_sb_bitmap)
> +       new_bitmap = vzalloc(new_pages * PAGE_SIZE);
> +       if (!new_bitmap)
>                 return -ENOMEM;
>
>         mutex_lock(&vm->hotplug_mutex);
> -       if (new_sb_bitmap)
> -               memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
> +       if (new_bitmap)
> +               memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
>
> -       old_sb_bitmap = vm->sb_bitmap;
> -       vm->sb_bitmap = new_sb_bitmap;
> +       old_bitmap = vm->sbm.sb_states;
> +       vm->sbm.sb_states = new_bitmap;
>         mutex_unlock(&vm->hotplug_mutex);
>
> -       vfree(old_sb_bitmap);
> +       vfree(old_bitmap);
>         return 0;
>  }
>
> @@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
>         int sb_id;
>
>         for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> -               if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> +               if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>                         continue;
>                 pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>                                sb_id * vm->subblock_size);
> @@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
>         int sb_id;
>
>         for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> -               if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> +               if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>                         continue;
>                 pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>                                sb_id * vm->subblock_size);
> @@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>                  * If plugged, online the pages, otherwise, set them fake
>                  * offline (PageOffline).
>                  */
> -               if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> +               if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>                         generic_online_page(page, order);
>                 else
>                         virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
> @@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
>         rc = virtio_mem_send_plug_request(vm, addr, size);
>         if (!rc)
> -               virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
> +               virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
>         return rc;
>  }
>
> @@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
>         rc = virtio_mem_send_unplug_request(vm, addr, size);
>         if (!rc)
> -               virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
> +               virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
>         return rc;
>  }
>
> @@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
>         while (*nb_sb) {
>                 /* Find the next candidate subblock */
>                 while (sb_id >= 0 &&
> -                      virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
> +                      virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
>                         sb_id--;
>                 if (sb_id < 0)
>                         break;
>                 /* Try to unplug multiple subblocks at a time */
>                 count = 1;
>                 while (count < *nb_sb && sb_id > 0 &&
> -                      virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
> +                      virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
>                         count++;
>                         sb_id--;
>                 }
> @@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
>                 return rc;
>
>         /* Resize the subblock bitmap if required. */
> -       rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
> +       rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
>         if (rc)
>                 return rc;
>
> @@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
>                 return -EINVAL;
>
>         while (*nb_sb) {
> -               sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
> +               sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
>                 if (sb_id >= vm->nb_sb_per_mb)
>                         break;
>                 count = 1;
>                 while (count < *nb_sb &&
>                        sb_id + count < vm->nb_sb_per_mb &&
> -                      !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
> -                                                     1))
> +                      !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
>                         count++;
>
>                 rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
> @@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
>                 virtio_mem_fake_online(pfn, nr_pages);
>         }
>
> -       if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> +       if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>                 if (online)
>                         virtio_mem_sbm_set_mb_state(vm, mb_id,
>                                                     VIRTIO_MEM_SBM_MB_ONLINE);
> @@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
>         rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
>
>         /* some subblocks might have been unplugged even on failure */
> -       if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> +       if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>                 virtio_mem_sbm_set_mb_state(vm, mb_id,
>                                             VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
>         if (rc)
>                 return rc;
>
> -       if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> +       if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>                 /*
>                  * Remove the block from Linux - this should never fail.
>                  * Hinder the block from getting onlined by marking it
> @@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>
>         /* If possible, try to unplug the complete block in one shot. */
>         if (*nb_sb >= vm->nb_sb_per_mb &&
> -           virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> +           virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>                 rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
>                                                     vm->nb_sb_per_mb);
>                 if (!rc) {
> @@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>         for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
>                 /* Find the next candidate subblock */
>                 while (sb_id >= 0 &&
> -                      !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> +                      !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>                         sb_id--;
>                 if (sb_id < 0)
>                         break;
> @@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>          * remove it. This will usually not fail, as no memory is in use
>          * anymore - however some other notifiers might NACK the request.
>          */
> -       if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> +       if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>                 mutex_unlock(&vm->hotplug_mutex);
>                 rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
>                 mutex_lock(&vm->hotplug_mutex);
> @@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>
>         /* remove all tracking data - no locking needed */
>         vfree(vm->sbm.mb_states);
> -       vfree(vm->sb_bitmap);
> +       vfree(vm->sbm.sb_states);
>
>         /* reset the device and cleanup the queues */
>         vdev->config->reset(vdev);

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


* Re: [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap
  2020-10-12 12:53 ` [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap David Hildenbrand
  2020-10-16  8:46   ` Wei Yang
@ 2020-10-20  9:58   ` Pankaj Gupta
  1 sibling, 0 replies; 108+ messages in thread
From: Pankaj Gupta @ 2020-10-20  9:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, virtualization, Andrew Morton,
	Michael S . Tsirkin, Jason Wang

> The calculation is already complicated enough, let's limit it to one
> location.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 2cc497ad8298..73ff6e9ba839 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
>              _mb_id--) \
>                 if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
>
> +/*
> + * Calculate the bit number in the sb_states bitmap for the given subblock
> + * inside the given memory block.
> + */
> +static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> +                                         unsigned long mb_id, int sb_id)
> +{
> +       return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +}
> +
>  /*
>   * Mark all selected subblocks plugged.
>   *
> @@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
>                                           unsigned long mb_id, int sb_id,
>                                           int count)
>  {
> -       const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +       const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
>         __bitmap_set(vm->sbm.sb_states, bit, count);
>  }
> @@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
>                                             unsigned long mb_id, int sb_id,
>                                             int count)
>  {
> -       const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +       const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
>         __bitmap_clear(vm->sbm.sb_states, bit, count);
>  }
> @@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
>                                            unsigned long mb_id, int sb_id,
>                                            int count)
>  {
> -       const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +       const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
>         if (count == 1)
>                 return test_bit(bit, vm->sbm.sb_states);
> @@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
>                                              unsigned long mb_id, int sb_id,
>                                              int count)
>  {
> -       const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +       const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
>         /* TODO: Helper similar to bitmap_set() */
>         return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
> @@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
>  static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>                                             unsigned long mb_id)
>  {
> -       const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
> +       const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
>
>         return find_next_zero_bit(vm->sbm.sb_states,
>                                   bit + vm->nb_sb_per_mb, bit) - bit;

Agree, there are a lot of *b things; good to clean up as many as possible.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


end of thread

Thread overview: 108+ messages (download: mbox.gz / follow: Atom feed)
2020-10-12 12:52 [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) David Hildenbrand
2020-10-12 12:52 ` [PATCH v1 01/29] virtio-mem: determine nid only once using memory_add_physaddr_to_nid() David Hildenbrand
2020-10-15  3:56   ` Wei Yang
2020-10-15 19:26   ` Pankaj Gupta
2020-10-12 12:52 ` [PATCH v1 02/29] virtio-mem: simplify calculation in virtio_mem_mb_state_prepare_next_mb() David Hildenbrand
2020-10-15  4:02   ` Wei Yang
2020-10-15  8:00     ` David Hildenbrand
2020-10-15 10:00       ` Wei Yang
2020-10-15 10:01         ` David Hildenbrand
2020-10-15 20:24   ` Pankaj Gupta
2020-10-16  9:00     ` David Hildenbrand
2020-10-12 12:52 ` [PATCH v1 03/29] virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling David Hildenbrand
2020-10-15  7:06   ` Wei Yang
2020-10-12 12:52 ` [PATCH v1 04/29] virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add() David Hildenbrand
2020-10-12 13:09   ` Pankaj Gupta
2020-10-15  7:14   ` Wei Yang
2020-10-12 12:52 ` [PATCH v1 05/29] virtio-mem: generalize check for added memory David Hildenbrand
2020-10-15  8:28   ` Wei Yang
2020-10-15  8:50     ` David Hildenbrand
2020-10-16  2:16       ` Wei Yang
2020-10-16  9:11         ` David Hildenbrand
2020-10-16 10:02           ` Wei Yang
2020-10-16 10:32             ` David Hildenbrand
2020-10-16 22:38               ` Wei Yang
2020-10-17  7:39                 ` David Hildenbrand
2020-10-18 12:27                   ` Wei Yang
2020-10-16 22:39   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 06/29] virtio-mem: generalize virtio_mem_owned_mb() David Hildenbrand
2020-10-15  8:32   ` Wei Yang
2020-10-15  8:37     ` David Hildenbrand
2020-10-15 20:30   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 07/29] virtio-mem: generalize virtio_mem_overlaps_range() David Hildenbrand
2020-10-20  9:22   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 08/29] virtio-mem: drop last_mb_id David Hildenbrand
2020-10-15  8:35   ` Wei Yang
2020-10-15 20:32   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory David Hildenbrand
2020-10-16  4:03   ` Wei Yang
2020-10-16  9:18     ` David Hildenbrand
2020-10-18  3:57       ` Wei Yang
2020-10-19  9:04         ` David Hildenbrand
2020-10-20  0:41           ` Wei Yang
2020-10-20  9:09             ` David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 10/29] virtio-mem: generalize handling when memory is getting onlined deferred David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 11/29] virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining David Hildenbrand
2020-10-15 20:31   ` Pankaj Gupta
2020-10-16  6:11   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 12/29] virtio-mem: factor out fake-offlining into virtio_mem_fake_offline() David Hildenbrand
2020-10-16  6:24   ` Wei Yang
2020-10-20  9:31   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 13/29] virtio-mem: factor out handling of fake-offline pages in memory notifier David Hildenbrand
2020-10-16  7:15   ` Wei Yang
2020-10-16  8:00     ` Wei Yang
2020-10-16  8:57       ` David Hildenbrand
2020-10-18 12:37         ` Wei Yang
2020-10-18 12:38   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 14/29] virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 15/29] virito-mem: document Sub Block Mode (SBM) David Hildenbrand
2020-10-15  9:33   ` David Hildenbrand
2020-10-20  9:38     ` Pankaj Gupta
2020-10-16  8:03   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 16/29] virtio-mem: memory block states are specific to " David Hildenbrand
2020-10-16  8:40   ` Wei Yang
2020-10-16  8:43   ` Wei Yang
2020-10-20  9:48   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 17/29] virito-mem: subblock " David Hildenbrand
2020-10-16  8:43   ` Wei Yang
2020-10-20  9:54   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 18/29] virtio-mem: factor out calculation of the bit number within the sb_states bitmap David Hildenbrand
2020-10-16  8:46   ` Wei Yang
2020-10-20  9:58   ` Pankaj Gupta
2020-10-12 12:53 ` [PATCH v1 19/29] virito-mem: existing (un)plug functions are specific to Sub Block Mode (SBM) David Hildenbrand
2020-10-16  8:49   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 20/29] virtio-mem: nb_sb_per_mb and subblock_size " David Hildenbrand
2020-10-16  8:51   ` Wei Yang
2020-10-16  8:53   ` Wei Yang
2020-10-16 13:17     ` David Hildenbrand
2020-10-18 12:41       ` Wei Yang
2020-10-19 11:57         ` David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 21/29] virtio-mem: memory notifier callbacks " David Hildenbrand
2020-10-19  1:57   ` Wei Yang
2020-10-19 10:22     ` David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 22/29] virtio-mem: memory block ids " David Hildenbrand
2020-10-16  8:54   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 23/29] virtio-mem: factor out adding/removing memory from Linux David Hildenbrand
2020-10-16  8:59   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 24/29] virtio-mem: print debug messages from virtio_mem_send_*_request() David Hildenbrand
2020-10-16  9:07   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug David Hildenbrand
2020-10-16  9:38   ` Wei Yang
2020-10-16 13:13     ` David Hildenbrand
2020-10-19  2:26   ` Wei Yang
2020-10-19  9:15     ` David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 26/29] virtio-mem: allow to force Big Block Mode (BBM) and set the big block size David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block David Hildenbrand
2020-10-15 13:08   ` Michael S. Tsirkin
2020-10-19  3:22   ` Wei Yang
2020-10-12 12:53 ` [PATCH v1 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug David Hildenbrand
2020-10-19  3:48   ` Wei Yang
2020-10-19  9:12     ` David Hildenbrand
2020-10-12 12:53 ` [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe " David Hildenbrand
2020-10-19  7:54   ` Wei Yang
2020-10-19  8:50     ` David Hildenbrand
2020-10-20  0:23       ` Wei Yang
2020-10-20  0:24   ` Wei Yang
2020-10-18 12:49 ` [PATCH v1 00/29] virtio-mem: Big Block Mode (BBM) Wei Yang
2020-10-18 15:29 ` Michael S. Tsirkin
2020-10-18 16:34   ` David Hildenbrand
