linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1
@ 2021-11-26 13:42 David Hildenbrand
  2021-11-26 13:42 ` [PATCH v1 1/2] virtio-mem: prepare page onlining code " David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-11-26 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Michael S. Tsirkin, Jason Wang, Zi Yan,
	Gavin Shan, Hui Zhu, Eric Ren, Sebastien Boeuf, Pankaj Gupta,
	Wei Yang, virtualization, linux-mm

The virtio-mem driver currently supports logical hot(un)plug in
MAX_ORDER - 1 granularity (4MiB on x86-64) or bigger. We want to support
pageblock granularity (2MiB on x86-64), to make hot(un)plug even more
flexible, and to improve hotunplug when using ZONE_NORMAL.

With pageblock granularity, we then have a granularity comparable to
hugepage ballooning. Further, there are ideas to increase MAX_ORDER, so
we really want to decouple it from MAX_ORDER.
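
For orientation, a quick standalone sketch (not part of the series) of
where those numbers come from, assuming the x86-64 defaults of 4KiB
base pages, MAX_ORDER = 11 and pageblock_order = 9:

#include <stdio.h>

#define PAGE_SHIFT	12	/* 4KiB base pages */
#define MAX_ORDER	11	/* x86-64 default */
#define PAGEBLOCK_ORDER	9	/* pageblock_order on x86-64 */

int main(void)
{
	/* 1 << 10 pages of 4KiB = 4MiB: today's minimum unit */
	printf("MAX_ORDER - 1 granularity: %lu MiB\n",
	       ((1UL << (MAX_ORDER - 1)) << PAGE_SHIFT) >> 20);
	/* 1 << 9 pages of 4KiB = 2MiB: the pageblock target */
	printf("pageblock granularity:     %lu MiB\n",
	       ((1UL << PAGEBLOCK_ORDER) << PAGE_SHIFT) >> 20);
	return 0;
}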

While ZONE_MOVABLE should mostly work already, alloc_contig_range() still
needs work to be able to properly handle pageblock granularity on
ZONE_NORMAL. This support is in the works [1], so let's prepare
virtio-mem for supporting smaller granularity than MAX_ORDER - 1.

Tested with ZONE_MOVABLE after removing the MAX_ORDER - 1 granularity
limitation in virtio-mem, and using different device block sizes (2MiB,
4MiB, 8MiB).

[1] https://lkml.kernel.org/r/20211115193725.737539-1-zi.yan@sent.com

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: virtualization@lists.linux-foundation.org
Cc: linux-mm@kvack.org

David Hildenbrand (2):
  virtio-mem: prepare page onlining code for granularity smaller than
    MAX_ORDER - 1
  virtio-mem: prepare fake page onlining code for granularity smaller
    than MAX_ORDER - 1

 drivers/virtio/virtio_mem.c | 110 ++++++++++++++++++++++++------------
 1 file changed, 74 insertions(+), 36 deletions(-)

-- 
2.31.1



* [PATCH v1 1/2] virtio-mem: prepare page onlining code for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1 David Hildenbrand
@ 2021-11-26 13:42 ` David Hildenbrand
  2021-12-09 11:26   ` Eric Ren
  2021-11-26 13:42 ` [PATCH v1 2/2] virtio-mem: prepare fake " David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-11-26 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Michael S. Tsirkin, Jason Wang, Zi Yan,
	Gavin Shan, Hui Zhu, Eric Ren, Sebastien Boeuf, Pankaj Gupta,
	Wei Yang, virtualization, linux-mm

Let's prepare our page onlining code for subblock size smaller than
MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have
some subblocks in the range plugged and some unplugged. In that case,
fall back to subblock granularity to expose only the plugged
parts to the buddy.
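
As a concrete illustration (a standalone toy model, not driver code):
with 2MiB subblocks and 4KiB pages, one order-10 (4MiB) page spans two
subblocks. If only the first is plugged, the loop drops to subblock
order exactly once and handles each 2MiB piece on its own:

#include <stdio.h>
#include <stdbool.h>

#define SB_ORDER 9	/* 2MiB subblocks with 4KiB pages */

/* Toy plug state: subblock 0 plugged, subblock 1 unplugged. */
static bool plugged(unsigned long sb_id) { return sb_id == 0; }

static bool all(unsigned long sb, unsigned long count, bool want)
{
	for (unsigned long i = 0; i < count; i++)
		if (plugged(sb + i) != want)
			return false;
	return true;
}

int main(void)
{
	unsigned long pfn = 0, end = 1UL << 10;	/* one 4MiB page */
	unsigned int order = 10;

	while (pfn < end) {
		unsigned long sb = pfn >> SB_ORDER;
		unsigned long count =
			((pfn + (1UL << order) - 1) >> SB_ORDER) - sb + 1;

		if (all(sb, count, true)) {
			printf("online  pfn=%lu order=%u\n", pfn, order);
		} else if (count == 1 || all(sb, count, false)) {
			printf("offline pfn=%lu order=%u\n", pfn, order);
		} else {
			order = SB_ORDER;	/* mixture: drop once */
			continue;
		}
		pfn += 1UL << order;
	}
	return 0;
}

This prints one "online" and one "offline" line at order 9, mirroring
the fallback path in the patch below.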

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 86 ++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e5a8782769..03e1c5743699 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -20,6 +20,7 @@
 #include <linux/mutex.h>
 #include <linux/bitmap.h>
 #include <linux/lockdep.h>
+#include <linux/log2.h>
 
 #include <acpi/acpi_numa.h>
 
@@ -1228,28 +1229,46 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
 		page_ref_inc(pfn_to_page(pfn + i));
 }
 
-static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
+static void virtio_mem_online_page(struct virtio_mem *vm,
+				   struct page *page, unsigned int order)
 {
-	const unsigned long addr = page_to_phys(page);
-	unsigned long id, sb_id;
-	struct virtio_mem *vm;
+	const unsigned long start = page_to_phys(page);
+	const unsigned long end = start + PFN_PHYS(1 << order);
+	unsigned long addr, next, id, sb_id, count;
 	bool do_online;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
-		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
-			continue;
+	/*
+	 * We can get called with any order up to MAX_ORDER - 1. If our
+	 * subblock size is smaller than that and we have a mixture of plugged
+	 * and unplugged subblocks within such a page, we have to process in
+	 * smaller granularity. In that case we'll adjust the order exactly once
+	 * within the loop.
+	 */
+	for (addr = start; addr < end; ) {
+		next = addr + PFN_PHYS(1 << order);
 
 		if (vm->in_sbm) {
-			/*
-			 * We exploit here that subblocks have at least
-			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
-			 * cross subblocks within one call.
-			 */
 			id = virtio_mem_phys_to_mb_id(addr);
 			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
-			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
-								   sb_id, 1);
+			count = virtio_mem_phys_to_sb_id(vm, next - 1) - sb_id + 1;
+
+			if (virtio_mem_sbm_test_sb_plugged(vm, id, sb_id, count)) {
+				/* Fully plugged. */
+				do_online = true;
+			} else if (count == 1 ||
+				   virtio_mem_sbm_test_sb_unplugged(vm, id, sb_id, count)) {
+				/* Fully unplugged. */
+				do_online = false;
+			} else {
+				/*
+				 * Mixture, process sub-blocks instead. This
+				 * will be at least the size of a pageblock.
+				 * We'll run into this case exactly once.
+				 */
+				order = ilog2(vm->sbm.sb_size) - PAGE_SHIFT;
+				do_online = virtio_mem_sbm_test_sb_plugged(vm, id, sb_id, 1);
+				continue;
+			}
 		} else {
 			/*
 			 * If the whole block is marked fake offline, keep
@@ -1260,18 +1279,38 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 				    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE;
 		}
 
+		if (do_online)
+			generic_online_page(pfn_to_page(PFN_DOWN(addr)), order);
+		else
+			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
+						    false);
+		addr = next;
+	}
+}
+
+static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
+{
+	const unsigned long addr = page_to_phys(page);
+	struct virtio_mem *vm;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
 		/*
-		 * virtio_mem_set_fake_offline() might sleep, we don't need
-		 * the device anymore. See virtio_mem_remove() how races
+		 * Pages we're onlining will never cross memory blocks and,
+		 * therefore, not virtio-mem devices.
+		 */
+		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
+			continue;
+
+		/*
+		 * virtio_mem_set_fake_offline() might sleep. We can safely
+		 * drop the RCU lock at this point because the device
+		 * cannot go away. See virtio_mem_remove() for how races
 		 * between memory onlining and device removal are handled.
 		 */
 		rcu_read_unlock();
 
-		if (do_online)
-			generic_online_page(page, order);
-		else
-			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
-						    false);
+		virtio_mem_online_page(vm, page, order);
 		return;
 	}
 	rcu_read_unlock();
@@ -2438,8 +2477,7 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 	/*
 	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
 	 * pageblock_nr_pages pages. This:
-	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
-	 *   and fake page onlining code (virtio_mem_fake_online).
+	 * - Simplifies our fake page onlining code (virtio_mem_fake_online).
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-- 
2.31.1



* [PATCH v1 2/2] virtio-mem: prepare fake page onlining code for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1 David Hildenbrand
  2021-11-26 13:42 ` [PATCH v1 1/2] virtio-mem: prepare page onlining code " David Hildenbrand
@ 2021-11-26 13:42 ` David Hildenbrand
  2021-12-09 11:51   ` Eric Ren
  2021-11-29 16:47 ` [PATCH v1 0/2] virtio-mem: prepare " Zi Yan
  2021-11-30 23:56 ` Michael S. Tsirkin
  3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-11-26 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Michael S. Tsirkin, Jason Wang, Zi Yan,
	Gavin Shan, Hui Zhu, Eric Ren, Sebastien Boeuf, Pankaj Gupta,
	Wei Yang, virtualization, linux-mm

Let's prepare our fake page onlining code for subblock size smaller than
MAX_ORDER - 1: we might get called for ranges not covering properly
aligned MAX_ORDER - 1 pages. We have to dynamically detect the order
to use.
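
For illustration, the detection boils down to picking the largest order
for which both the start pfn and the length are suitably aligned (toy
standalone version, not the driver code):

#include <stdio.h>

#define MAX_ORDER 11	/* x86-64 default */

static unsigned int pick_order(unsigned long pfn, unsigned long nr_pages)
{
	unsigned int order = MAX_ORDER - 1;

	/* Open-coded IS_ALIGNED(pfn | nr_pages, 1 << order). */
	while ((pfn | nr_pages) & ((1UL << order) - 1))
		order--;
	return order;
}

int main(void)
{
	/* 2MiB-aligned, 2MiB-sized range of 4KiB pages -> 9 */
	printf("%u\n", pick_order(512, 512));
	/* 4MiB-aligned, 4MiB-sized range -> 10, as before  */
	printf("%u\n", pick_order(0, 1024));
	return 0;
}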

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 03e1c5743699..50de7582c9f8 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1121,15 +1121,18 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  */
 static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 {
-	const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
+	unsigned long order = MAX_ORDER - 1;
 	unsigned long i;
 
 	/*
-	 * We are always called at least with MAX_ORDER_NR_PAGES
-	 * granularity/alignment (e.g., the way subblocks work). All pages
-	 * inside such a block are alike.
+	 * We might get called for ranges that don't cover properly aligned
+	 * MAX_ORDER - 1 pages; however, we can only online properly aligned
+	 * pages with an order of MAX_ORDER - 1 at maximum.
 	 */
-	for (i = 0; i < nr_pages; i += max_nr_pages) {
+	while (!IS_ALIGNED(pfn | nr_pages, 1 << order))
+		order--;
+
+	for (i = 0; i < nr_pages; i += 1 << order) {
 		struct page *page = pfn_to_page(pfn + i);
 
 		/*
@@ -1139,14 +1142,12 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 		 * alike.
 		 */
 		if (PageDirty(page)) {
-			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
-						      false);
-			generic_online_page(page, MAX_ORDER - 1);
+			virtio_mem_clear_fake_offline(pfn + i, 1 << order, false);
+			generic_online_page(page, order);
 		} else {
-			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
-						      true);
-			free_contig_range(pfn + i, max_nr_pages);
-			adjust_managed_page_count(page, max_nr_pages);
+			virtio_mem_clear_fake_offline(pfn + i, 1 << order, true);
+			free_contig_range(pfn + i, 1 << order);
+			adjust_managed_page_count(page, 1 << order);
 		}
 	}
 }
@@ -2477,7 +2478,6 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 	/*
 	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
 	 * pageblock_nr_pages pages. This:
-	 * - Simplifies our fake page onlining code (virtio_mem_fake_online).
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-- 
2.31.1



* Re: [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1 David Hildenbrand
  2021-11-26 13:42 ` [PATCH v1 1/2] virtio-mem: prepare page onlining code " David Hildenbrand
  2021-11-26 13:42 ` [PATCH v1 2/2] virtio-mem: prepare fake " David Hildenbrand
@ 2021-11-29 16:47 ` Zi Yan
  2021-11-30 14:51   ` David Hildenbrand
  2021-11-30 23:56 ` Michael S. Tsirkin
  3 siblings, 1 reply; 10+ messages in thread
From: Zi Yan @ 2021-11-29 16:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Michael S. Tsirkin, Jason Wang, Gavin Shan,
	Hui Zhu, Eric Ren, Sebastien Boeuf, Pankaj Gupta, Wei Yang,
	virtualization, linux-mm


On 26 Nov 2021, at 8:42, David Hildenbrand wrote:

> The virtio-mem driver currently supports logical hot(un)plug in
> MAX_ORDER - 1 granularity (4MiB on x86-64) or bigger. We want to support
> pageblock granularity (2MiB on x86-64), to make hot(un)plug even more
> flexible, and to improve hotunplug when using ZONE_NORMAL.
>
> With pageblock granularity, we then have a granularity comparable to
> hugepage ballooning. Further, there are ideas to increase MAX_ORDER, so
> we really want to decouple it from MAX_ORDER.
>
> While ZONE_MOVABLE should mostly work already, alloc_contig_range() still
> needs work to be able to properly handle pageblock granularity on
> ZONE_NORMAL. This support is in the works [1], so let's prepare
> virtio-mem for supporting smaller granularity than MAX_ORDER - 1.
>
> Tested with ZONE_MOVABLE after removing the MAX_ORDER - 1 granularity
> limitation in virtio-mem, and using different device block sizes (2MiB,
> 4MiB, 8MiB).
>
> [1] https://lkml.kernel.org/r/20211115193725.737539-1-zi.yan@sent.com

The patchset looks good to me. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>


> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Gavin Shan <gshan@redhat.com>
> Cc: Hui Zhu <teawater@gmail.com>
> Cc: Eric Ren <renzhengeek@gmail.com>
> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-mm@kvack.org
>
> David Hildenbrand (2):
>   virtio-mem: prepare page onlining code for granularity smaller than
>     MAX_ORDER - 1
>   virtio-mem: prepare fake page onlining code for granularity smaller
>     than MAX_ORDER - 1
>
>  drivers/virtio/virtio_mem.c | 110 ++++++++++++++++++++++++------------
>  1 file changed, 74 insertions(+), 36 deletions(-)
>
> -- 
> 2.31.1


--
Best Regards,
Yan, Zi


* Re: [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1
  2021-11-29 16:47 ` [PATCH v1 0/2] virtio-mem: prepare " Zi Yan
@ 2021-11-30 14:51   ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-11-30 14:51 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-kernel, Michael S. Tsirkin, Jason Wang, Gavin Shan,
	Hui Zhu, Eric Ren, Sebastien Boeuf, Pankaj Gupta, Wei Yang,
	virtualization, linux-mm

On 29.11.21 17:47, Zi Yan wrote:
> On 26 Nov 2021, at 8:42, David Hildenbrand wrote:
> 
>> The virtio-mem driver currently supports logical hot(un)plug in
>> MAX_ORDER - 1 granularity (4MiB on x86-64) or bigger. We want to support
>> pageblock granularity (2MiB on x86-64), to make hot(un)plug even more
>> flexible, and to improve hotunplug when using ZONE_NORMAL.
>>
>> With pageblock granularity, we then have a granularity comparable to
>> hugepage ballooning. Further, there are ideas to increase MAX_ORDER, so
>> we really want to decouple it from MAX_ORDER.
>>
>> While ZONE_MOVABLE should mostly work already, alloc_contig_range() still
>> needs work to be able to properly handle pageblock granularity on
>> ZONE_NORMAL. This support is in the works [1], so let's prepare
>> virtio-mem for supporting smaller granularity than MAX_ORDER - 1.
>>
>> Tested with ZONE_MOVABLE after removing the MAX_ORDER - 1 granularity
>> limitation in virtio-mem, and using different device block sizes (2MiB,
>> 4MiB, 8MiB).
>>
>> [1] https://lkml.kernel.org/r/20211115193725.737539-1-zi.yan@sent.com
> 
> The patchset looks good to me. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks a lot!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1 David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-11-29 16:47 ` [PATCH v1 0/2] virtio-mem: prepare " Zi Yan
@ 2021-11-30 23:56 ` Michael S. Tsirkin
  2021-12-01  8:24   ` David Hildenbrand
  3 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2021-11-30 23:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Jason Wang, Zi Yan, Gavin Shan, Hui Zhu, Eric Ren,
	Sebastien Boeuf, Pankaj Gupta, Wei Yang, virtualization,
	linux-mm

On Fri, Nov 26, 2021 at 02:42:07PM +0100, David Hildenbrand wrote:
> The virtio-mem driver currently supports logical hot(un)plug in
> MAX_ORDER - 1 granularity (4MiB on x86-64) or bigger. We want to support
> pageblock granularity (2MiB on x86-64), to make hot(un)plug even more
> flexible, and to improve hotunplug when using ZONE_NORMAL.
> 
> With pageblock granularity, we then have a granularity comparable to
> hugepage ballooning. Further, there are ideas to increase MAX_ORDER, so
> we really want to decouple it from MAX_ORDER.
> 
> While ZONE_MOVABLE should mostly work already, alloc_contig_range() still
> needs work to be able to properly handle pageblock granularity on
> ZONE_NORMAL. This support is in the works [1], so let's prepare
> virtio-mem for supporting smaller granularity than MAX_ORDER - 1.

is there value to merging this separately? or should this just
be part of that patchset?

> Tested with ZONE_MOVABLE after removing the MAX_ORDER - 1 granularity
> limitation in virtio-mem, and using different device block sizes (2MiB,
> 4MiB, 8MiB).
> 
> [1] https://lkml.kernel.org/r/20211115193725.737539-1-zi.yan@sent.com
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Gavin Shan <gshan@redhat.com>
> Cc: Hui Zhu <teawater@gmail.com>
> Cc: Eric Ren <renzhengeek@gmail.com>
> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-mm@kvack.org
> 
> David Hildenbrand (2):
>   virtio-mem: prepare page onlining code for granularity smaller than
>     MAX_ORDER - 1
>   virtio-mem: prepare fake page onlining code for granularity smaller
>     than MAX_ORDER - 1
> 
>  drivers/virtio/virtio_mem.c | 110 ++++++++++++++++++++++++------------
>  1 file changed, 74 insertions(+), 36 deletions(-)
> 
> -- 
> 2.31.1



* Re: [PATCH v1 0/2] virtio-mem: prepare for granularity smaller than MAX_ORDER - 1
  2021-11-30 23:56 ` Michael S. Tsirkin
@ 2021-12-01  8:24   ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-12-01  8:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Jason Wang, Zi Yan, Gavin Shan, Hui Zhu, Eric Ren,
	Sebastien Boeuf, Pankaj Gupta, Wei Yang, virtualization,
	linux-mm

On 01.12.21 00:56, Michael S. Tsirkin wrote:
> On Fri, Nov 26, 2021 at 02:42:07PM +0100, David Hildenbrand wrote:
>> The virtio-mem driver currently supports logical hot(un)plug in
>> MAX_ORDER - 1 granularity (4MiB on x86-64) or bigger. We want to support
>> pageblock granularity (2MiB on x86-64), to make hot(un)plug even more
>> flexible, and to improve hotunplug when using ZONE_NORMAL.
>>
>> With pageblock granularity, we then have a granularity comparable to
>> hugepage ballooning. Further, there are ideas to increase MAX_ORDER, so
>> we really want to decouple it from MAX_ORDER.
>>
>> While ZONE_MOVABLE should mostly work already, alloc_contig_range() still
>> needs work to be able to properly handle pageblock granularity on
>> ZONE_NORMAL. This support is in the works [1], so let's prepare
>> virtio-mem for supporting smaller granularity than MAX_ORDER - 1.
> 
> is there value to merging this separately? or should this just
> be part of that patchset?
> 

The value would be to give it additional testing ahead of time. But we
could just carry it along. Whatever you prefer. (I'd suggest merging it
right away)

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 1/2] virtio-mem: prepare page onlining code for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 ` [PATCH v1 1/2] virtio-mem: prepare page onlining code " David Hildenbrand
@ 2021-12-09 11:26   ` Eric Ren
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Ren @ 2021-12-09 11:26 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Michael S. Tsirkin, Jason Wang, Zi Yan, Gavin Shan, Hui Zhu,
	Sebastien Boeuf, Pankaj Gupta, Wei Yang, virtualization,
	linux-mm

Hi David,

Thanks for working on this!

On 2021/11/26 21:42, David Hildenbrand wrote:
> Let's prepare our page onlining code for subblock size smaller than
> MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have
> some subblocks in the range plugged and some unplugged. In that case,
> fall back to subblock granularity to expose only the plugged
> parts to the buddy.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   drivers/virtio/virtio_mem.c | 86 ++++++++++++++++++++++++++-----------
>   1 file changed, 62 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 96e5a8782769..03e1c5743699 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -20,6 +20,7 @@
>   #include <linux/mutex.h>
>   #include <linux/bitmap.h>
>   #include <linux/lockdep.h>
> +#include <linux/log2.h>
>   
>   #include <acpi/acpi_numa.h>
>   
> @@ -1228,28 +1229,46 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
>   		page_ref_inc(pfn_to_page(pfn + i));
>   }
>   
> -static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> +static void virtio_mem_online_page(struct virtio_mem *vm,
> +				   struct page *page, unsigned int order)
>   {
> -	const unsigned long addr = page_to_phys(page);
> -	unsigned long id, sb_id;
> -	struct virtio_mem *vm;
> +	const unsigned long start = page_to_phys(page);
> +	const unsigned long end = start + PFN_PHYS(1 << order);
> +	unsigned long addr, next, id, sb_id, count;
>   	bool do_online;
>   
> -	rcu_read_lock();
> -	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> -		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> -			continue;
> +	/*
> +	 * We can get called with any order up to MAX_ORDER - 1. If our
> +	 * subblock size is smaller than that and we have a mixture of plugged
> +	 * and unplugged subblocks within such a page, we have to process in
> +	 * smaller granularity. In that case we'll adjust the order exactly once
> +	 * within the loop.
> +	 */
> +	for (addr = start; addr < end; ) {
> +		next = addr + PFN_PHYS(1 << order);
>   
>   		if (vm->in_sbm) {
> -			/*
> -			 * We exploit here that subblocks have at least
> -			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
> -			 * cross subblocks within one call.
> -			 */
>   			id = virtio_mem_phys_to_mb_id(addr);
>   			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
> -			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
> -								   sb_id, 1);
> +			count = virtio_mem_phys_to_sb_id(vm, next - 1) - sb_id + 1;
> +
> +			if (virtio_mem_sbm_test_sb_plugged(vm, id, sb_id, count)) {
> +				/* Fully plugged. */
> +				do_online = true;
> +			} else if (count == 1 ||
> +				   virtio_mem_sbm_test_sb_unplugged(vm, id, sb_id, count)) {
> +				/* Fully unplugged. */
> +				do_online = false;
> +			} else {
> +				/*
> +				 * Mixture, process sub-blocks instead. This
> +				 * will be at least the size of a pageblock.
> +				 * We'll run into this case exactly once.
> +				 */
> +				order = ilog2(vm->sbm.sb_size) - PAGE_SHIFT;
> +				do_online = virtio_mem_sbm_test_sb_plugged(vm, id, sb_id, 1);
> +				continue;
> +			}
>   		} else {
>   			/*
>   			 * If the whole block is marked fake offline, keep
> @@ -1260,18 +1279,38 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
>   				    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE;
>   		}
>   
> +		if (do_online)
> +			generic_online_page(pfn_to_page(PFN_DOWN(addr)), order);
> +		else
> +			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
> +						    false);
Should we just use PHYS_PFN() here? addr is obviously PFN aligned. 
Anyway, it doesn't matter.
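
(For reference -- paraphrasing include/linux/pfn.h from memory, so worth
double-checking against your tree -- both helpers boil down to the same
shift, and PHYS_PFN() merely documents that the argument is a physical
address:

#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define PHYS_PFN(x)	((unsigned long)((x) >> PAGE_SHIFT))

so the two are interchangeable for a page-aligned addr.)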

LGTM.
Reviewed-by: Eric Ren <renzhengeek@gmail.com>
> +		addr = next;
> +	}
> +}
> +
> +static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> +{
> +	const unsigned long addr = page_to_phys(page);
> +	struct virtio_mem *vm;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
>   		/*
> -		 * virtio_mem_set_fake_offline() might sleep, we don't need
> -		 * the device anymore. See virtio_mem_remove() how races
> +		 * Pages we're onlining will never cross memory blocks and,
> +		 * therefore, not virtio-mem devices.
> +		 */
> +		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> +			continue;
> +
> +		/*
> +		 * virtio_mem_set_fake_offline() might sleep. We can safely
> +		 * drop the RCU lock at this point because the device
> +		 * cannot go away. See virtio_mem_remove() for how races
>   		 * between memory onlining and device removal are handled.
>   		 */
>   		rcu_read_unlock();
>   
> -		if (do_online)
> -			generic_online_page(page, order);
> -		else
> -			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
> -						    false);
> +		virtio_mem_online_page(vm, page, order);
>   		return;
>   	}
>   	rcu_read_unlock();
> @@ -2438,8 +2477,7 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
>   	/*
>   	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
>   	 * pageblock_nr_pages pages. This:
> -	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
> -	 *   and fake page onlining code (virtio_mem_fake_online).
> +	 * - Simplifies our fake page onlining code (virtio_mem_fake_online).
>   	 * - Is required for now for alloc_contig_range() to work reliably -
>   	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
>   	 */



* Re: [PATCH v1 2/2] virtio-mem: prepare fake page onlining code for granularity smaller than MAX_ORDER - 1
  2021-11-26 13:42 ` [PATCH v1 2/2] virtio-mem: prepare fake " David Hildenbrand
@ 2021-12-09 11:51   ` Eric Ren
  2021-12-09 11:52     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Ren @ 2021-12-09 11:51 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Michael S. Tsirkin, Jason Wang, Zi Yan, Gavin Shan, Hui Zhu,
	Sebastien Boeuf, Pankaj Gupta, Wei Yang, virtualization,
	linux-mm

Hi,

On 2021/11/26 21:42, David Hildenbrand wrote:
> Let's prepare our fake page onlining code for subblock size smaller than
> MAX_ORDER - 1: we might get called for ranges not covering properly
> aligned MAX_ORDER - 1 pages. We have to dynamically detect the order
> to use.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   drivers/virtio/virtio_mem.c | 26 +++++++++++++-------------
>   1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 03e1c5743699..50de7582c9f8 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -1121,15 +1121,18 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
>    */
>   static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
>   {
> -	const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
> +	unsigned long order = MAX_ORDER - 1;
>   	unsigned long i;
>   
>   	/*
> -	 * We are always called at least with MAX_ORDER_NR_PAGES
> -	 * granularity/alignment (e.g., the way subblocks work). All pages
> -	 * inside such a block are alike.
> +	 * We might get called for ranges that don't cover properly aligned
> +	 * MAX_ORDER - 1 pages; however, we can only online properly aligned
> +	 * pages with an order of MAX_ORDER - 1 at maximum.
>   	 */
> -	for (i = 0; i < nr_pages; i += max_nr_pages) {
> +	while (!IS_ALIGNED(pfn | nr_pages, 1 << order))
> +		order--;
> +
> +	for (i = 0; i < nr_pages; i += 1 << order) {
>   		struct page *page = pfn_to_page(pfn + i);
>   
>   		/*
> @@ -1139,14 +1142,12 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
>   		 * alike.
>   		 */
>   		if (PageDirty(page)) {
> -			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
> -						      false);
> -			generic_online_page(page, MAX_ORDER - 1);
> +			virtio_mem_clear_fake_offline(pfn + i, 1 << order, false);
> +			generic_online_page(page, order);
>   		} else {
> -			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
> -						      true);
> -			free_contig_range(pfn + i, max_nr_pages);
> -			adjust_managed_page_count(page, max_nr_pages);
> +			virtio_mem_clear_fake_offline(pfn + i, 1 << order, true);
> +			free_contig_range(pfn + i, 1 << order);
> +			adjust_managed_page_count(page, 1 << order);
In the loop, pfn + i and 1 << order are computed repeatedly: 1 << order
is the step size and pfn + i is the position at each step.
Better to compute each number once per iteration?

LGTM.
Reviewed-by: Eric Ren <renzhengeek@gmail.com>
>   		}
>   	}
>   }
> @@ -2477,7 +2478,6 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
>   	/*
>   	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
>   	 * pageblock_nr_pages pages. This:
> -	 * - Simplifies our fake page onlining code (virtio_mem_fake_online).
>   	 * - Is required for now for alloc_contig_range() to work reliably -
>   	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
>   	 */



* Re: [PATCH v1 2/2] virtio-mem: prepare fake page onlining code for granularity smaller than MAX_ORDER - 1
  2021-12-09 11:51   ` Eric Ren
@ 2021-12-09 11:52     ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-12-09 11:52 UTC (permalink / raw)
  To: Eric Ren, linux-kernel
  Cc: Michael S. Tsirkin, Jason Wang, Zi Yan, Gavin Shan, Hui Zhu,
	Sebastien Boeuf, Pankaj Gupta, Wei Yang, virtualization,
	linux-mm

Hi Eric,

thanks for the review!

>>   		if (PageDirty(page)) {
>> -			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
>> -						      false);
>> -			generic_online_page(page, MAX_ORDER - 1);
>> +			virtio_mem_clear_fake_offline(pfn + i, 1 << order, false);
>> +			generic_online_page(page, order);
>>   		} else {
>> -			virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
>> -						      true);
>> -			free_contig_range(pfn + i, max_nr_pages);
>> -			adjust_managed_page_count(page, max_nr_pages);
>> +			virtio_mem_clear_fake_offline(pfn + i, 1 << order, true);
>> +			free_contig_range(pfn + i, 1 << order);
>> +			adjust_managed_page_count(page, 1 << order);
> In the loop, pfn + i, 1 << order are repeatedly calculated. 1 << order 
> is a step size, pfn + i  is each step position.
> Better to figure the numer once each iter?

The compiler better be smart enough to calculate such constants once :)
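
For reference, the manual hoisting suggested above would look roughly
like this -- an illustrative rearrangement of the loop body from the
patch, not a tested change:

	const unsigned long step = 1UL << order;

	for (i = 0; i < nr_pages; i += step) {
		const unsigned long cur = pfn + i;
		struct page *page = pfn_to_page(cur);

		if (PageDirty(page)) {
			virtio_mem_clear_fake_offline(cur, step, false);
			generic_online_page(page, order);
		} else {
			virtio_mem_clear_fake_offline(cur, step, true);
			free_contig_range(cur, step);
			adjust_managed_page_count(page, step);
		}
	}

GCC hoists loop-invariant expressions like 1 << order anyway, so both
variants should compile to the same code.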

> 
> LGTM.
> Reviewed-by: Eric Ren <renzhengeek@gmail.com>

Thanks!


-- 
Thanks,

David / dhildenb


