* [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
@ 2020-10-02 15:44 Claudio Imbrenda
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

The KVM unit tests are increasingly being used to test more than just
KVM. They are being used to test TCG, qemu I/O device emulation, other
hypervisors, and even actual hardware.

The existing memory allocators are becoming more and more inadequate to
the needs of the upcoming unit tests (but also some existing ones, see
below).

Some important features that are lacking:
* ability to perform a small physical page allocation with a big
  alignment without wasting huge amounts of memory
* ability to allocate physical pages from specific pools/areas (e.g.
  below 16M, or 4G, etc.)
* ability to reserve arbitrary pages (if free), removing them from the
  free pool

Some other features that are nice, but not so fundamental:
* no need for the generic allocator to keep track of metadata
  (i.e. allocation size); this is now handled by the lower level
  allocators
* coalescing adjacent free blocks into bigger ones, so that contiguous
  memory freed in small chunks in random order can be reused for large
  allocations

This is achieved in the following ways:

For the virtual allocator:
* the virtual allocator needs one extra page of metadata, but only
  for allocations that don't fit in one page

For the page allocator:
* the page allocator has up to 6 memory pools, each with a metadata
  area; the metadata has a byte for each page in the area, describing
  the order of the block it belongs to, and whether it is free (see
  the sketch after this list)
* if there are no free blocks of the desired size, a bigger block is
  split until we reach the required size; the unused parts of the block
  are put back in the free lists
* if an allocation needs a block with a larger alignment than its size,
  a larger block of (at least) the required order is split; the unused
  parts are put back in the appropriate free lists
* if the allocation could not be satisfied, the next allowed area is
  searched; the allocation fails only when all allowed areas have been
  tried
* new functions to perform allocations from specific areas; the areas
  are arch-dependent and should be set up by the arch code
* for now x86 has a memory area for "lowest" memory under 16MB, one for
  "low" memory under 4GB and one for the rest, while s390x has one for
  memory under 2GB and one for the rest; suggestions for finer-grained
  areas or for the other architectures are welcome
* upon freeing a block, an attempt is made to coalesce it with its
  neighbour (if that is free); the process is repeated for the larger
  block thus obtained, as long as possible
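
For clarity, here is a minimal, self-contained sketch (not part of this
series) of the per-page metadata byte described above, using the same
ORDER_MASK/ALLOC_MASK encoding that patch 4 introduces; the example
array, helpers and main() are made up purely for illustration:

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define ORDER_MASK 0x3f  /* low bits: order of the block the page belongs to */
  #define ALLOC_MASK 0x40  /* set when the block is allocated */

  static unsigned order_of(uint8_t state)  { return state & ORDER_MASK; }
  static bool is_allocated(uint8_t state)  { return state & ALLOC_MASK; }

  int main(void)
  {
          /* hypothetical metadata for 8 pages: one free order-2 block
           * (4 pages) followed by one allocated order-2 block */
          uint8_t page_states[8] = { 2, 2, 2, 2,
                                     ALLOC_MASK | 2, ALLOC_MASK | 2,
                                     ALLOC_MASK | 2, ALLOC_MASK | 2 };
          int i;

          for (i = 0; i < 8; i++)
                  printf("page %d: order %u, %s\n", i,
                         order_of(page_states[i]),
                         is_allocated(page_states[i]) ? "allocated" : "free");
          return 0;
  }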

For the physical allocator:
* the minimum alignment is now handled manually, since it has been
  removed from the common struct


This patchset addresses some current but otherwise unsolvable issues on
s390x, such as the need to allocate a block under 2GB for each SMP CPU
upon CPU activation.

This patchset has been tested on s390x, amd64 and i386. It has also been
compiled on aarch64.

V1->V2:
* Renamed some functions, as per review comments
* Improved commit messages
* Split the list handling functions into an independent header
* Added arch-specific headers to define the memory areas
* Fixed some minor issues
* The magic value for small allocations in the virtual allocator is now
  put right before the returned pointer, like for large allocations
* Added comments to make the code more readable
* Many minor fixes

Claudio Imbrenda (7):
  lib/list: Add double linked list management functions
  lib/vmalloc: vmalloc support for handling allocation metadata
  lib/asm: Add definitions of memory areas
  lib/alloc_page: complete rewrite of the page allocator
  lib/alloc: simplify free and malloc
  lib/alloc.h: remove align_min from struct alloc_ops
  lib/alloc_page: allow reserving arbitrary memory ranges

 lib/asm-generic/memory_areas.h |  11 +
 lib/arm/asm/memory_areas.h     |  11 +
 lib/arm64/asm/memory_areas.h   |  11 +
 lib/powerpc/asm/memory_areas.h |  11 +
 lib/ppc64/asm/memory_areas.h   |  11 +
 lib/s390x/asm/memory_areas.h   |  17 ++
 lib/x86/asm/memory_areas.h     |  22 ++
 lib/alloc.h                    |   3 +-
 lib/alloc_page.h               |  80 ++++-
 lib/list.h                     |  53 ++++
 lib/alloc.c                    |  42 +--
 lib/alloc_page.c               | 541 +++++++++++++++++++++++++++------
 lib/alloc_phys.c               |   9 +-
 lib/arm/setup.c                |   2 +-
 lib/s390x/sclp.c               |   6 +-
 lib/s390x/smp.c                |   6 +-
 lib/vmalloc.c                  | 121 ++++++--
 s390x/smp.c                    |   4 +-
 18 files changed, 789 insertions(+), 172 deletions(-)
 create mode 100644 lib/asm-generic/memory_areas.h
 create mode 100644 lib/arm/asm/memory_areas.h
 create mode 100644 lib/arm64/asm/memory_areas.h
 create mode 100644 lib/powerpc/asm/memory_areas.h
 create mode 100644 lib/ppc64/asm/memory_areas.h
 create mode 100644 lib/s390x/asm/memory_areas.h
 create mode 100644 lib/x86/asm/memory_areas.h
 create mode 100644 lib/list.h

-- 
2.26.2



* [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-02 18:18   ` Andrew Jones
  2020-11-06 11:29   ` Paolo Bonzini
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata Claudio Imbrenda
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Add simple double linked lists.

Apart from the struct itself, there are functions to add and remove
items, and to check for emptiness.
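
For context, a usage sketch of the new API (illustrative only, not part
of the patch); it assumes lib/list.h below is included and that
assert() is available, and the item/function names are made up:

  static struct linked_list head = { &head, &head };  /* empty list */

  static void list_example(void)
  {
          struct linked_list item;

          list_add(&head, &item);         /* insert item right after head */
          assert(!is_list_empty(&head));
          list_remove(&item);             /* unlink it again */
          assert(is_list_empty(&head));
  }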

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/list.h | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 lib/list.h

diff --git a/lib/list.h b/lib/list.h
new file mode 100644
index 0000000..702a78c
--- /dev/null
+++ b/lib/list.h
@@ -0,0 +1,53 @@
+#ifndef LIST_H
+#define LIST_H
+
+#include <stdbool.h>
+
+/*
+ * Simple double linked list. The pointer to the list is a list item itself,
+ * like in the kernel implementation.
+ */
+struct linked_list {
+	struct linked_list *prev;
+	struct linked_list *next;
+};
+
+/*
+ * An empty list is a list item whose prev and next both point to itself.
+ * Returns true if the list is empty.
+ */
+static inline bool is_list_empty(struct linked_list *p)
+{
+	return !p->next || !p->prev || p == p->next || p == p->prev;
+}
+
+/*
+ * Remove the given element from the list, if the list is not already empty.
+ * The removed element is returned.
+ */
+static inline struct linked_list *list_remove(struct linked_list *l)
+{
+	if (is_list_empty(l))
+		return NULL;
+
+	l->prev->next = l->next;
+	l->next->prev = l->prev;
+	l->prev = l->next = NULL;
+
+	return l;
+}
+
+/*
+ * Add the given element after the given list head.
+ */
+static inline void list_add(struct linked_list *head, struct linked_list *li)
+{
+	assert(li);
+	assert(head);
+	li->prev = head;
+	li->next = head->next;
+	head->next->prev = li;
+	head->next = li;
+}
+
+#endif
-- 
2.26.2



* [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-03  8:46   ` Andrew Jones
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas Claudio Imbrenda
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Add allocation metadata handling to vmalloc.

In upcoming patches, allocation metadata will have to be handled
directly by the lower level allocators, and will not be handled by the
common wrapper.

In this patch, the number of allocated pages plus a magic value are
written immediately before the returned pointer. This means that
multi-page allocations will allocate one extra page (which is not worse
than what the current allocator does).

For small allocations there is an optimization: the returned address is
intentionally not page-aligned. This signals that the allocation
spanned one page only. In this case the metadata is only the magic
value, and it is also saved immediately before the returned pointer.
Since the pointer does not point to the beginning of the page, there is
always space in the same page for the magic value.
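
For illustration, a sketch (not part of the patch) of the two resulting
layouts and of how the free path can tell them apart just by looking at
the pointer; it assumes PAGE_SIZE and the usual integer types are
available, and the helper name is made up:

  /*
   * multi-page allocation:                 single-page allocation:
   *   [ metadata page | data pages ... ]     [ pad | magic | data ... ]
   *                   ^ returned pointer                   ^ returned pointer
   */
  static bool is_single_page_alloc(const void *mem)
  {
          /* small allocations are returned deliberately unaligned, so a
           * page-aligned pointer must come from a multi-page allocation */
          return ((uintptr_t)mem & (PAGE_SIZE - 1)) != 0;
  }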

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/vmalloc.c | 105 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 95 insertions(+), 10 deletions(-)

diff --git a/lib/vmalloc.c b/lib/vmalloc.c
index e0c7b6b..2f25734 100644
--- a/lib/vmalloc.c
+++ b/lib/vmalloc.c
@@ -15,6 +15,16 @@
 #include <bitops.h>
 #include "vmalloc.h"
 
+#define VM_MAGIC 0x7E57C0DE
+
+#define GET_METADATA(x) (((struct metadata *)(x)) - 1)
+#define GET_MAGIC(x) (*((unsigned long *)(x) - 1))
+
+struct metadata {
+	unsigned long npages;
+	unsigned long magic;
+};
+
 static struct spinlock lock;
 static void *vfree_top = 0;
 static void *page_root;
@@ -25,8 +35,14 @@ static void *page_root;
  *
  * nr is the number of pages to allocate
  * alignment_pages is the alignment of the allocation *in pages*
+ * metadata indicates whether an extra (unaligned) page needs to be allocated
+ * right before the main (aligned) allocation.
+ *
+ * The return value points to the first allocated virtual page, which will
+ * be the (potentially unaligned) metadata page if the metadata flag is
+ * specified.
  */
-void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
+static void *do_alloc_vpages(ulong nr, unsigned int align_order, bool metadata)
 {
 	uintptr_t ptr;
 
@@ -34,6 +50,8 @@ void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
 	ptr = (uintptr_t)vfree_top;
 	ptr -= PAGE_SIZE * nr;
 	ptr &= GENMASK_ULL(63, PAGE_SHIFT + align_order);
+	if (metadata)
+		ptr -= PAGE_SIZE;
 	vfree_top = (void *)ptr;
 	spin_unlock(&lock);
 
@@ -41,6 +59,11 @@ void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
 	return (void *)ptr;
 }
 
+void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
+{
+	return do_alloc_vpages(nr, align_order, false);
+}
+
 void *alloc_vpages(ulong nr)
 {
 	return alloc_vpages_aligned(nr, 0);
@@ -69,35 +92,97 @@ void *vmap(phys_addr_t phys, size_t size)
 	return mem;
 }
 
+/*
+ * Allocate one page, for an object with specified alignment.
+ * The resulting pointer will be aligned to the required alignment, but
+ * intentionally not page-aligned.
+ * The metadata for single pages allocation is just the magic value,
+ * which is placed right before the pointer, like for bigger allocations.
+ */
+static void *vm_alloc_one_page(size_t alignment)
+{
+	void *p;
+
+	/* this guarantees that there will be space for the magic value */
+	assert(alignment >= sizeof(uintptr_t));
+	assert(alignment < PAGE_SIZE);
+	p = alloc_vpage();
+	install_page(page_root, virt_to_phys(alloc_page()), p);
+	p = (void *)((uintptr_t)p + alignment);
+	/* write the magic value right before the returned address */
+	GET_MAGIC(p) = VM_MAGIC;
+	return p;
+}
+
 /*
  * Allocate virtual memory, with the specified minimum alignment.
+ * If the allocation fits in one page, only one page is allocated. Otherwise
+ * enough pages are allocated for the object, plus one to keep metadata
+ * information about the allocation.
  */
 static void *vm_memalign(size_t alignment, size_t size)
 {
+	struct metadata *m;
 	phys_addr_t pa;
-	void *mem, *p;
+	uintptr_t p;
+	void *mem;
+	size_t i;
 
+	if (!size)
+		return NULL;
 	assert(is_power_of_2(alignment));
 
+	if (alignment < sizeof(uintptr_t))
+		alignment = sizeof(uintptr_t);
+	/* it fits in one page, allocate only one page */
+	if (alignment + size <= PAGE_SIZE)
+		return vm_alloc_one_page(alignment);
 	size = PAGE_ALIGN(size) / PAGE_SIZE;
 	alignment = get_order(PAGE_ALIGN(alignment) / PAGE_SIZE);
-	mem = p = alloc_vpages_aligned(size, alignment);
-	while (size--) {
+	mem = do_alloc_vpages(size, alignment, true);
+	p = (uintptr_t)mem;
+	/* skip the metadata page */
+	mem = (void *)(p + PAGE_SIZE);
+	/*
+	 * time to actually allocate the physical pages to back our virtual
+	 * allocation; note that we need to allocate one extra page (for the
+	 * metadata), hence the <=
+	 */
+	for (i = 0; i <= size; i++, p += PAGE_SIZE) {
 		pa = virt_to_phys(alloc_page());
 		assert(pa);
-		install_page(page_root, pa, p);
-		p += PAGE_SIZE;
+		install_page(page_root, pa, (void *)p);
 	}
+	m = GET_METADATA(mem);
+	m->npages = size;
+	m->magic = VM_MAGIC;
 	return mem;
 }
 
 static void vm_free(void *mem, size_t size)
 {
-	while (size) {
-		free_page(phys_to_virt(virt_to_pte_phys(page_root, mem)));
-		mem += PAGE_SIZE;
-		size -= PAGE_SIZE;
+	struct metadata *m;
+	uintptr_t ptr, end;
+
+	/* the pointer is not page-aligned, it was a single-page allocation */
+	if (!IS_ALIGNED((uintptr_t)mem, PAGE_SIZE)) {
+		assert(GET_MAGIC(mem) == VM_MAGIC);
+		ptr = virt_to_pte_phys(page_root, mem) & PAGE_MASK;
+		free_page(phys_to_virt(ptr));
+		return;
 	}
+
+	/* the pointer is page-aligned, it was a multi-page allocation */
+	m = GET_METADATA(mem);
+	assert(m->magic == VM_MAGIC);
+	assert(m->npages > 0);
+	/* free all the pages including the metadata page */
+	ptr = (uintptr_t)mem - PAGE_SIZE;
+	end = ptr + m->npages * PAGE_SIZE;
+	for ( ; ptr < end; ptr += PAGE_SIZE)
+		free_page(phys_to_virt(virt_to_pte_phys(page_root, (void *)ptr)));
+	/* free the last one separately to avoid overflow issues */
+	free_page(phys_to_virt(virt_to_pte_phys(page_root, (void *)ptr)));
 }
 
 static struct alloc_ops vmalloc_ops = {
-- 
2.26.2



* [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-03  9:23   ` Andrew Jones
  2020-11-06 11:34   ` Paolo Bonzini
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator Claudio Imbrenda
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Add definitions and boundaries of memory areas for some architectures.
This is needed by the next patch.

Most architectures only get one generic memory area, whereas x86 and
s390x get some more attention:

x86 gets
* lowest area (24-bit addresses)
* low area (32-bit addresses)
* the rest

s390x gets
* low area (31-bit addresses)
* the rest

Notice that the number indicates the order in which the areas are
scanned when more than one area is indicated. The default order tries
to get allocations from higher address ranges before trying lower ones.
This tries to keep the precious lower addresses as free as possible.
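
As an illustration (not part of the patch), this is how the x86
constants below are meant to be combined into bitmasks and passed to
the area-aware allocation call added in the next patch; the function
body and the asserts are a made-up example:

  static void area_example(void)
  {
          void *p;

          /* any area: no constraint on the physical address */
          p = alloc_pages_area(AREA_ANY, 0);
          assert(p);
          /* 32-bit addressable: AREA_DMA32 == AREA_LOWEST | AREA_LOW == 4 | 2 */
          p = alloc_pages_area(AREA_DMA32, 0);
          assert(p);
          /* 24-bit addressable: AREA_DMA24 == AREA_LOWEST == 4 */
          p = alloc_pages_area(AREA_DMA24, 0);
          assert(p);
  }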

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/asm-generic/memory_areas.h | 11 +++++++++++
 lib/arm/asm/memory_areas.h     | 11 +++++++++++
 lib/arm64/asm/memory_areas.h   | 11 +++++++++++
 lib/powerpc/asm/memory_areas.h | 11 +++++++++++
 lib/ppc64/asm/memory_areas.h   | 11 +++++++++++
 lib/s390x/asm/memory_areas.h   | 17 +++++++++++++++++
 lib/x86/asm/memory_areas.h     | 22 ++++++++++++++++++++++
 7 files changed, 94 insertions(+)
 create mode 100644 lib/asm-generic/memory_areas.h
 create mode 100644 lib/arm/asm/memory_areas.h
 create mode 100644 lib/arm64/asm/memory_areas.h
 create mode 100644 lib/powerpc/asm/memory_areas.h
 create mode 100644 lib/ppc64/asm/memory_areas.h
 create mode 100644 lib/s390x/asm/memory_areas.h
 create mode 100644 lib/x86/asm/memory_areas.h

diff --git a/lib/asm-generic/memory_areas.h b/lib/asm-generic/memory_areas.h
new file mode 100644
index 0000000..927baa7
--- /dev/null
+++ b/lib/asm-generic/memory_areas.h
@@ -0,0 +1,11 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN 0
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
diff --git a/lib/arm/asm/memory_areas.h b/lib/arm/asm/memory_areas.h
new file mode 100644
index 0000000..927baa7
--- /dev/null
+++ b/lib/arm/asm/memory_areas.h
@@ -0,0 +1,11 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN 0
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
diff --git a/lib/arm64/asm/memory_areas.h b/lib/arm64/asm/memory_areas.h
new file mode 100644
index 0000000..927baa7
--- /dev/null
+++ b/lib/arm64/asm/memory_areas.h
@@ -0,0 +1,11 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN 0
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
diff --git a/lib/powerpc/asm/memory_areas.h b/lib/powerpc/asm/memory_areas.h
new file mode 100644
index 0000000..927baa7
--- /dev/null
+++ b/lib/powerpc/asm/memory_areas.h
@@ -0,0 +1,11 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN 0
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
diff --git a/lib/ppc64/asm/memory_areas.h b/lib/ppc64/asm/memory_areas.h
new file mode 100644
index 0000000..927baa7
--- /dev/null
+++ b/lib/ppc64/asm/memory_areas.h
@@ -0,0 +1,11 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN 0
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
diff --git a/lib/s390x/asm/memory_areas.h b/lib/s390x/asm/memory_areas.h
new file mode 100644
index 0000000..4856a27
--- /dev/null
+++ b/lib/s390x/asm/memory_areas.h
@@ -0,0 +1,17 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN BIT(31-12)
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_LOW_PFN 0
+#define AREA_LOW_NUMBER 1
+#define AREA_LOW 2
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#define AREA_DMA31 AREA_LOW
+
+#endif
diff --git a/lib/x86/asm/memory_areas.h b/lib/x86/asm/memory_areas.h
new file mode 100644
index 0000000..d704df3
--- /dev/null
+++ b/lib/x86/asm/memory_areas.h
@@ -0,0 +1,22 @@
+#ifndef MEMORY_AREAS_H
+#define MEMORY_AREAS_H
+
+#define AREA_NORMAL_PFN BIT(32-12)
+#define AREA_NORMAL_NUMBER 0
+#define AREA_NORMAL 1
+
+#define AREA_LOW_PFN BIT(24-12)
+#define AREA_LOW_NUMBER 1
+#define AREA_LOW 2
+
+#define AREA_LOWEST_PFN 0
+#define AREA_LOWEST_NUMBER 2
+#define AREA_LOWEST 4
+
+#define AREA_DMA24 AREA_LOWEST
+#define AREA_DMA32 (AREA_LOWEST | AREA_LOW)
+
+#define AREA_ANY -1
+#define AREA_ANY_NUMBER 0xff
+
+#endif
-- 
2.26.2



* [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (2 preceding siblings ...)
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-05 12:40   ` Andrew Jones
  2020-12-08  0:41   ` Nadav Amit
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 5/7] lib/alloc: simplify free and malloc Claudio Imbrenda
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

This is a complete rewrite of the page allocator.

This will bring a few improvements:
* no need to specify the size when freeing
* allocate small areas with a large alignment without wasting memory
* ability to initialize and use multiple memory areas (e.g. DMA)
* more sanity checks

A few things have changed:
* initialization cannot be done with free_pages like before,
  page_alloc_init_area has to be used instead

Arch-specific changes:
* s390x now uses the area below 2GiB for SMP lowcore initialization.

Details:
Each memory area has metadata at the very beginning. The metadata is a
byte array with one entry per usable page (so, excluding the metadata
itself). Each entry indicates if the page is special (unused for now),
if it is allocated, and the order of the block. Both free and allocated
pages are part of larger blocks.

Additional fixed-size metadata for each area is kept in a static array.
This metadata contains start and end page frame numbers, the pointer to
the metadata array, and the array of freelists. The array of freelists
has an entry for each possible order (indicated by the macro NLISTS,
defined as BITS_PER_LONG - PAGE_SHIFT).

On allocation, if the free list for the needed size is empty, larger
blocks are split. When a small allocation with a large alignment is
requested, an appropriately large block is split, to guarantee the
alignment.

When a block is freed, an attempt is made to merge it with its
neighbour, repeating the process as long as possible.
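
For reference, a sketch (not part of the patch) of the initialization
and usage sequence the new allocator expects, mirroring the
lib/arm/setup.c hunk below; the function name is made up and
base_pfn/top_pfn stand for the arch-specific range of free page frames:

  static void page_alloc_example(uintptr_t base_pfn, uintptr_t top_pfn)
  {
          void *p;

          /* make one memory area available, then switch alloc_ops over */
          page_alloc_init_area(AREA_ANY_NUMBER, base_pfn, top_pfn);
          page_alloc_ops_enable();

          /* 4 naturally aligned contiguous pages, freed by order */
          p = alloc_pages(2);
          assert(p);
          free_pages_by_order(p, 2);
  }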

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/alloc_page.h |  63 +++++-
 lib/alloc_page.c | 484 +++++++++++++++++++++++++++++++++++++----------
 lib/arm/setup.c  |   2 +-
 lib/s390x/sclp.c |   6 +-
 lib/s390x/smp.c  |   2 +-
 lib/vmalloc.c    |  13 +-
 6 files changed, 453 insertions(+), 117 deletions(-)

diff --git a/lib/alloc_page.h b/lib/alloc_page.h
index 88540d1..81847ae 100644
--- a/lib/alloc_page.h
+++ b/lib/alloc_page.h
@@ -8,12 +8,71 @@
 #ifndef ALLOC_PAGE_H
 #define ALLOC_PAGE_H 1
 
+#include <asm/memory_areas.h>
+
+/* Returns true if the page allocator has been initialized */
 bool page_alloc_initialized(void);
+
+/*
+ * Initializes a memory area.
+ * n is the number of the area to initialize
+ * base_pfn is the physical frame number of the start of the area to initialize
+ * top_pfn is the physical frame number of the first page immediately after
+ * the end of the area to initialize
+ */
+void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t top_pfn);
+
+/* Enables the page allocator. At least one area must have been initialized */
 void page_alloc_ops_enable(void);
+
+/*
+ * Allocate aligned memory from the specified areas.
+ * areas is a bitmap of allowed areas
+ * alignment must be a power of 2
+ */
+void *memalign_pages_area(unsigned int areas, size_t alignment, size_t size);
+
+/*
+ * Allocate aligned memory from any area.
+ * Equivalent to memalign_pages_area(~0, alignment, size).
+ */
+void *memalign_pages(size_t alignment, size_t size);
+
+/*
+ * Allocate naturally aligned memory from the specified areas.
+ * Equivalent to memalign_pages_area(areas, 1ull << order, 1ull << order).
+ */
+void *alloc_pages_area(unsigned int areas, unsigned int order);
+
+/*
+ * Allocate one page from any area.
+ * Equivalent to alloc_pages(0);
+ */
 void *alloc_page(void);
+
+/*
+ * Allocate naturally aligned memory from any area.
+ * Equivalent to alloc_pages_area(~0, order);
+ */
 void *alloc_pages(unsigned int order);
-void free_page(void *page);
+
+/*
+ * Frees a memory block allocated with any of the memalign_pages* or
+ * alloc_pages* functions.
+ * The pointer must point to the start of the block.
+ */
 void free_pages(void *mem, size_t size);
-void free_pages_by_order(void *mem, unsigned int order);
+
+/* For backwards compatibility */
+static inline void free_page(void *mem)
+{
+	return free_pages(mem, 1);
+}
+
+/* For backwards compatibility */
+static inline void free_pages_by_order(void *mem, unsigned int order)
+{
+	free_pages(mem, 1ull << order);
+}
 
 #endif
diff --git a/lib/alloc_page.c b/lib/alloc_page.c
index 74fe726..29d221f 100644
--- a/lib/alloc_page.c
+++ b/lib/alloc_page.c
@@ -9,169 +9,445 @@
 #include "alloc_phys.h"
 #include "alloc_page.h"
 #include "bitops.h"
+#include "list.h"
 #include <asm/page.h>
 #include <asm/io.h>
 #include <asm/spinlock.h>
+#include <asm/memory_areas.h>
 
+#define IS_ALIGNED_ORDER(x,order) IS_ALIGNED((x),BIT_ULL(order))
+#define NLISTS ((BITS_PER_LONG) - (PAGE_SHIFT))
+#define PFN(x) ((uintptr_t)(x) >> PAGE_SHIFT)
+
+#define MAX_AREAS	6
+
+#define ORDER_MASK	0x3f
+#define ALLOC_MASK	0x40
+
+struct mem_area {
+	/* Physical frame number of the first usable frame in the area */
+	uintptr_t base;
+	/* Physical frame number of the first frame outside the area */
+	uintptr_t top;
+	/* Combination ALLOC_MASK and order */
+	u8 *page_states;
+	/* One freelist for each possible block size, up to NLISTS */
+	struct linked_list freelists[NLISTS];
+};
+
+static struct mem_area areas[MAX_AREAS];
+static unsigned int areas_mask;
 static struct spinlock lock;
-static void *freelist = 0;
 
 bool page_alloc_initialized(void)
 {
-	return freelist != 0;
+	return areas_mask != 0;
 }
 
-void free_pages(void *mem, size_t size)
+static inline bool area_or_metadata_contains(struct mem_area *a, uintptr_t pfn)
 {
-	void *old_freelist;
-	void *end;
+	return (pfn >= PFN(a->page_states)) && (pfn < a->top);
+}
 
-	assert_msg((unsigned long) mem % PAGE_SIZE == 0,
-		   "mem not page aligned: %p", mem);
+static inline bool area_contains(struct mem_area *a, uintptr_t pfn)
+{
+	return (pfn >= a->base) && (pfn < a->top);
+}
 
-	assert_msg(size % PAGE_SIZE == 0, "size not page aligned: %#zx", size);
+/*
+ * Splits the free block starting at addr into 2 blocks of half the size.
+ *
+ * The function depends on the following assumptions:
+ * - The allocator must have been initialized
+ * - the block must be within the memory area
+ * - all pages in the block must be free and not special
+ * - the pointer must point to the start of the block
+ * - all pages in the block must have the same block size.
+ * - the block size must be greater than 0
+ * - the block size must be smaller than the maximum allowed
+ * - the block must be in a free list
+ * - the function is called with the lock held
+ */
+static void split(struct mem_area *a, void *addr)
+{
+	uintptr_t pfn = PFN(addr);
+	struct linked_list *p;
+	uintptr_t i, idx;
+	u8 order;
 
-	assert_msg(size == 0 || (uintptr_t)mem == -size ||
-		   (uintptr_t)mem + size > (uintptr_t)mem,
-		   "mem + size overflow: %p + %#zx", mem, size);
+	assert(a && area_contains(a, pfn));
+	idx = pfn - a->base;
+	order = a->page_states[idx];
+	assert(!(order & ~ORDER_MASK) && order && (order < NLISTS));
+	assert(IS_ALIGNED_ORDER(pfn, order));
+	assert(area_contains(a, pfn + BIT(order) - 1));
 
-	if (size == 0) {
-		freelist = NULL;
-		return;
-	}
+	/* Remove the block from its free list */
+	p = list_remove(addr);
+	assert(p);
 
-	spin_lock(&lock);
-	old_freelist = freelist;
-	freelist = mem;
-	end = mem + size;
-	while (mem + PAGE_SIZE != end) {
-		*(void **)mem = (mem + PAGE_SIZE);
-		mem += PAGE_SIZE;
+	/* update the block size for each page in the block */
+	for (i = 0; i < BIT(order); i++) {
+		assert(a->page_states[idx + i] == order);
+		a->page_states[idx + i] = order - 1;
 	}
-
-	*(void **)mem = old_freelist;
-	spin_unlock(&lock);
+	order--;
+	/* add the first half block to the appropriate free list */
+	list_add(a->freelists + order, p);
+	/* add the second half block to the appropriate free list */
+	list_add(a->freelists + order, (void *)((pfn + BIT(order)) * PAGE_SIZE));
 }
 
-void free_pages_by_order(void *mem, unsigned int order)
+/*
+ * Returns a block whose alignment and size are at least the parameter values.
+ * If there is not enough free memory, NULL is returned.
+ *
+ * Both parameters must be not larger than the largest allowed order
+ */
+static void *page_memalign_order(struct mem_area *a, u8 al, u8 sz)
 {
-	free_pages(mem, 1ul << (order + PAGE_SHIFT));
+	struct linked_list *p, *res = NULL;
+	u8 order;
+
+	assert((al < NLISTS) && (sz < NLISTS));
+	/* we need the bigger of the two as starting point */
+	order = sz > al ? sz : al;
+
+	/* search all free lists for some memory */
+	for ( ; order < NLISTS; order++) {
+		p = a->freelists[order].next;
+		if (!is_list_empty(p))
+			break;
+	}
+	/* out of memory */
+	if (order >= NLISTS)
+		return NULL;
+
+	/*
+	 * the block is bigger than what we need because either there were
+	 * no smaller blocks, or the smaller blocks were not aligned to our
+	 * needs; therefore we split the block until we reach the needed size
+	 */
+	for (; order > sz; order--)
+		split(a, p);
+
+	res = list_remove(p);
+	memset(a->page_states + (PFN(res) - a->base), ALLOC_MASK | order, BIT(order));
+	return res;
 }
 
-void *alloc_page()
+/*
+ * Try to merge two blocks into a bigger one.
+ * Returns true in case of a successful merge.
+ * Merging will succeed only if both blocks have the same block size and are
+ * both free.
+ *
+ * The function depends on the following assumptions:
+ * - the first parameter is strictly smaller than the second
+ * - the parameters must point each to the start of their block
+ * - the two parameters point to adjacent blocks
+ * - the two blocks are both in a free list
+ * - all of the pages of the two blocks must be free
+ * - all of the pages of the two blocks must have the same block size
+ * - the function is called with the lock held
+ */
+static bool coalesce(struct mem_area *a, u8 order, uintptr_t pfn, uintptr_t pfn2)
 {
-	void *p;
+	uintptr_t first, second, i;
+	struct linked_list *li;
 
-	if (!freelist)
-		return 0;
+	assert(IS_ALIGNED_ORDER(pfn, order) && IS_ALIGNED_ORDER(pfn2, order));
+	assert(pfn2 == pfn + BIT(order));
+	assert(a);
 
-	spin_lock(&lock);
-	p = freelist;
-	freelist = *(void **)freelist;
-	spin_unlock(&lock);
+	/* attempting to coalesce two blocks that belong to different areas */
+	if (!area_contains(a, pfn) || !area_contains(a, pfn2 + BIT(order) - 1))
+		return false;
+	first = pfn - a->base;
+	second = pfn2 - a->base;
+	/* the two blocks have different sizes, cannot coalesce */
+	if ((a->page_states[first] != order) || (a->page_states[second] != order))
+		return false;
 
-	if (p)
-		memset(p, 0, PAGE_SIZE);
-	return p;
+	/* we can coalesce, remove both blocks from their freelists */
+	li = list_remove((void *)(pfn2 << PAGE_SHIFT));
+	assert(li);
+	li = list_remove((void *)(pfn << PAGE_SHIFT));
+	assert(li);
+	/* check the metadata entries and update with the new size */
+	for (i = 0; i < (2ull << order); i++) {
+		assert(a->page_states[first + i] == order);
+		a->page_states[first + i] = order + 1;
+	}
+	/* finally add the newly coalesced block to the appropriate freelist */
+	list_add(a->freelists + order + 1, li);
+	return true;
 }
 
 /*
- * Allocates (1 << order) physically contiguous and naturally aligned pages.
- * Returns NULL if there's no memory left.
+ * Free a block of memory.
+ * The parameter can be NULL, in which case nothing happens.
+ *
+ * The function depends on the following assumptions:
+ * - the parameter is page aligned
+ * - the parameter belongs to an existing memory area
+ * - the parameter points to the beginning of the block
+ * - the size of the block is less than the maximum allowed
+ * - the block is completely contained in its memory area
+ * - all pages in the block have the same block size
+ * - no pages in the memory block were already free
+ * - no pages in the memory block are special
  */
-void *alloc_pages(unsigned int order)
+static void _free_pages(void *mem)
 {
-	/* Generic list traversal. */
-	void *prev;
-	void *curr = NULL;
-	void *next = freelist;
-
-	/* Looking for a run of length (1 << order). */
-	unsigned long run = 0;
-	const unsigned long n = 1ul << order;
-	const unsigned long align_mask = (n << PAGE_SHIFT) - 1;
-	void *run_start = NULL;
-	void *run_prev = NULL;
-	unsigned long run_next_pa = 0;
-	unsigned long pa;
+	uintptr_t pfn2, pfn = PFN(mem);
+	struct mem_area *a = NULL;
+	uintptr_t i, p;
+	u8 order;
 
-	assert(order < sizeof(unsigned long) * 8);
+	if (!mem)
+		return;
+	assert(IS_ALIGNED((uintptr_t)mem, PAGE_SIZE));
 
-	spin_lock(&lock);
-	for (;;) {
-		prev = curr;
-		curr = next;
+	/* find which area this pointer belongs to*/
+	for (i = 0; !a && (i < MAX_AREAS); i++) {
+		if ((areas_mask & BIT(i)) && area_contains(areas + i, pfn))
+			a = areas + i;
+	}
+	assert_msg(a, "memory does not belong to any area: %p", mem);
 
-		if (!curr) {
-			run_start = NULL;
-			break;
-		}
+	p = pfn - a->base;
+	order = a->page_states[p] & ORDER_MASK;
 
-		next = *((void **) curr);
-		pa = virt_to_phys(curr);
-
-		if (run == 0) {
-			if (!(pa & align_mask)) {
-				run_start = curr;
-				run_prev = prev;
-				run_next_pa = pa + PAGE_SIZE;
-				run = 1;
-			}
-		} else if (pa == run_next_pa) {
-			run_next_pa += PAGE_SIZE;
-			run += 1;
-		} else {
-			run = 0;
-		}
+	/* ensure that the first page is allocated and not special */
+	assert(a->page_states[p] == (order | ALLOC_MASK));
+	/* ensure that the order has a sane value */
+	assert(order < NLISTS);
+	/* ensure that the block is aligned properly for its size */
+	assert(IS_ALIGNED_ORDER(pfn, order));
+	/* ensure that the area can contain the whole block */
+	assert(area_contains(a, pfn + BIT(order) - 1));
 
-		if (run == n) {
-			if (run_prev)
-				*((void **) run_prev) = next;
-			else
-				freelist = next;
-			break;
-		}
+	for (i = 0; i < BIT(order); i++) {
+		/* check that all pages of the block have consistent metadata */
+		assert(a->page_states[p + i] == (ALLOC_MASK | order));
+		/* set the page as free */
+		a->page_states[p + i] &= ~ALLOC_MASK;
 	}
-	spin_unlock(&lock);
-	if (run_start)
-		memset(run_start, 0, n * PAGE_SIZE);
-	return run_start;
+	/* provisionally add the block to the appropriate free list */
+	list_add(a->freelists + order, mem);
+	/* try to coalesce the block with neighbouring blocks if possible */
+	do {
+		/*
+		 * get the order again since it might have changed after
+		 * coalescing in a previous iteration
+		 */
+		order = a->page_states[p] & ORDER_MASK;
+		/*
+		 * let's consider this block and the next one if this block
+		 * is aligned to the next size, otherwise let's consider the
+		 * previous block and this one
+		 */
+		if (!IS_ALIGNED_ORDER(pfn, order + 1))
+			pfn = pfn - BIT(order);
+		pfn2 = pfn + BIT(order);
+		/* repeat as long as we manage to coalesce something */
+	} while (coalesce(a, order, pfn, pfn2));
 }
 
+void free_pages(void *mem, size_t size)
+{
+	spin_lock(&lock);
+	_free_pages(mem);
+	spin_unlock(&lock);
+}
 
-void free_page(void *page)
+static void *page_memalign_order_area(unsigned area, u8 ord, u8 al)
 {
+	void *res = NULL;
+	int i;
+
 	spin_lock(&lock);
-	*(void **)page = freelist;
-	freelist = page;
+	area &= areas_mask;
+	for (i = 0; !res && (i < MAX_AREAS); i++)
+		if (area & BIT(i))
+			res = page_memalign_order(areas + i, ord, al);
 	spin_unlock(&lock);
+	return res;
 }
 
-static void *page_memalign(size_t alignment, size_t size)
+/*
+ * Allocates (1 << order) physically contiguous and naturally aligned pages.
+ * Returns NULL if the allocation was not possible.
+ */
+void *alloc_pages_area(unsigned int area, unsigned int order)
 {
-	unsigned long n = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
-	unsigned int order;
+	return page_memalign_order_area(area, order, order);
+}
 
-	if (!size)
-		return NULL;
+void *alloc_pages(unsigned int order)
+{
+	return alloc_pages_area(AREA_ANY, order);
+}
 
-	order = get_order(n);
+/*
+ * Allocates (1 << order) physically contiguous aligned pages.
+ * Returns NULL if the allocation was not possible.
+ */
+void *memalign_pages_area(unsigned int area, size_t alignment, size_t size)
+{
+	assert(is_power_of_2(alignment));
+	alignment = get_order(PAGE_ALIGN(alignment) >> PAGE_SHIFT);
+	size = get_order(PAGE_ALIGN(size) >> PAGE_SHIFT);
+	assert(alignment < NLISTS);
+	assert(size < NLISTS);
+	return page_memalign_order_area(area, size, alignment);
+}
 
-	return alloc_pages(order);
+void *memalign_pages(size_t alignment, size_t size)
+{
+	return memalign_pages_area(AREA_ANY, alignment, size);
 }
 
-static void page_free(void *mem, size_t size)
+/*
+ * Allocates one page
+ */
+void *alloc_page()
 {
-	free_pages(mem, size);
+	return alloc_pages(0);
 }
 
 static struct alloc_ops page_alloc_ops = {
-	.memalign = page_memalign,
-	.free = page_free,
+	.memalign = memalign_pages,
+	.free = free_pages,
 	.align_min = PAGE_SIZE,
 };
 
+/*
+ * Enables the page allocator.
+ *
+ * Prerequisites:
+ * - at least one memory area has been initialized
+ */
 void page_alloc_ops_enable(void)
 {
+	spin_lock(&lock);
+	assert(page_alloc_initialized());
 	alloc_ops = &page_alloc_ops;
+	spin_unlock(&lock);
+}
+
+/*
+ * Adds a new memory area to the pool of available memory.
+ *
+ * Prerequisites:
+ * - the lock is held
+ * - start and top are page frame numbers
+ * - start is smaller than top
+ * - top does not fall outside of addressable memory
+ * - there is at least one more slot free for memory areas
+ * - if a specific memory area number has been indicated, it needs to be free
+ * - the memory area to add does not overlap with existing areas
+ * - the memory area to add has at least 5 pages available
+ */
+static void _page_alloc_init_area(u8 n, uintptr_t start_pfn, uintptr_t top_pfn)
+{
+	size_t table_size, npages, i;
+	struct mem_area *a;
+	u8 order = 0;
+
+	/* the number must be within the allowed range */
+	assert(n < MAX_AREAS);
+	/* the new area number must be unused */
+	assert(!(areas_mask & BIT(n)));
+
+	/* other basic sanity checks */
+	assert(top_pfn > start_pfn);
+	assert(top_pfn - start_pfn > 4);
+	assert(top_pfn < BIT_ULL(sizeof(void *) * 8 - PAGE_SHIFT));
+
+	/* calculate the size of the metadata table in pages */
+	table_size = (top_pfn - start_pfn + PAGE_SIZE) / (PAGE_SIZE + 1);
+
+	/* fill in the values of the new area */
+	a = areas + n;
+	a->page_states = (void *)(start_pfn << PAGE_SHIFT);
+	a->base = start_pfn + table_size;
+	a->top = top_pfn;
+	npages = top_pfn - a->base;
+	assert((a->base - start_pfn) * PAGE_SIZE >= npages);
+
+	/* check that the new area does not overlap with any existing areas */
+	for (i = 0; i < MAX_AREAS; i++) {
+		if (!(areas_mask & BIT(i)))
+			continue;
+		assert(!area_or_metadata_contains(areas + i, start_pfn));
+		assert(!area_or_metadata_contains(areas + i, top_pfn - 1));
+		assert(!area_or_metadata_contains(a, PFN(areas[i].page_states)));
+		assert(!area_or_metadata_contains(a, areas[i].top - 1));
+	}
+	/* initialize all freelists for the new area */
+	for (i = 0; i < NLISTS; i++)
+		a->freelists[i].next = a->freelists[i].prev = a->freelists + i;
+
+	/* initialize the metadata for the available memory */
+	for (i = a->base; i < a->top; i += 1ull << order) {
+		/* search which order to start from */
+		while (i + BIT(order) > a->top) {
+			assert(order);
+			order--;
+		}
+		/*
+		 * we need both loops, one for the start and the other for
+		 * the end of the block, in case it spans a power of two
+		 * boundary
+		 */
+		while (IS_ALIGNED_ORDER(i, order + 1) && (i + BIT(order + 1) <= a->top))
+			order++;
+		assert(order < NLISTS);
+		/* initialize the metadata and add to the freelist */
+		memset(a->page_states + (i - a->base), order, BIT(order));
+		list_add(a->freelists + order, (void *)(i << PAGE_SHIFT));
+	}
+	/* finally mark the area as present */
+	areas_mask |= BIT(n);
+}
+
+static void __page_alloc_init_area(u8 n, uintptr_t cutoff, uintptr_t base_pfn, uintptr_t *top_pfn)
+{
+	if (*top_pfn > cutoff) {
+		spin_lock(&lock);
+		if (base_pfn >= cutoff) {
+			_page_alloc_init_area(n, base_pfn, *top_pfn);
+			*top_pfn = 0;
+		} else {
+			_page_alloc_init_area(n, cutoff, *top_pfn);
+			*top_pfn = cutoff;
+		}
+		spin_unlock(&lock);
+	}
+}
+
+/*
+ * Adds a new memory area to the pool of available memory.
+ *
+ * Prerequisites:
+ * see _page_alloc_init_area
+ */
+void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t top_pfn)
+{
+	if (n != AREA_ANY_NUMBER) {
+		__page_alloc_init_area(n, 0, base_pfn, &top_pfn);
+		return;
+	}
+#ifdef AREA_HIGH_PFN
+	__page_alloc_init_area(AREA_HIGH_NUMBER, AREA_HIGH_PFN, base_pfn, &top_pfn);
+#endif
+	__page_alloc_init_area(AREA_NORMAL_NUMBER, AREA_NORMAL_PFN, base_pfn, &top_pfn);
+#ifdef AREA_LOW_PFN
+	__page_alloc_init_area(AREA_LOW_NUMBER, AREA_LOW_PFN, base_pfn, &top_pfn);
+#endif
+#ifdef AREA_LOWEST_PFN
+	__page_alloc_init_area(AREA_LOWEST_NUMBER, AREA_LOWEST_PFN, base_pfn, &top_pfn);
+#endif
 }
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 78562e4..3f03ca6 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -155,7 +155,7 @@ static void mem_init(phys_addr_t freemem_start)
 	assert(sizeof(long) == 8 || !(base >> 32));
 	if (sizeof(long) != 8 && (top >> 32) != 0)
 		top = ((uint64_t)1 << 32);
-	free_pages((void *)(unsigned long)base, top - base);
+	page_alloc_init_area(0, base >> PAGE_SHIFT, top >> PAGE_SHIFT);
 	page_alloc_ops_enable();
 }
 
diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
index 4054d0e..4e2ac18 100644
--- a/lib/s390x/sclp.c
+++ b/lib/s390x/sclp.c
@@ -37,11 +37,11 @@ static void mem_init(phys_addr_t mem_end)
 
 	phys_alloc_init(freemem_start, mem_end - freemem_start);
 	phys_alloc_get_unused(&base, &top);
-	base = (base + PAGE_SIZE - 1) & -PAGE_SIZE;
-	top = top & -PAGE_SIZE;
+	base = PAGE_ALIGN(base) >> PAGE_SHIFT;
+	top = top >> PAGE_SHIFT;
 
 	/* Make the pages available to the physical allocator */
-	free_pages((void *)(unsigned long)base, top - base);
+	page_alloc_init_area(AREA_ANY_NUMBER, base, top);
 	page_alloc_ops_enable();
 }
 
diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c
index 2860e9c..ea93329 100644
--- a/lib/s390x/smp.c
+++ b/lib/s390x/smp.c
@@ -190,7 +190,7 @@ int smp_cpu_setup(uint16_t addr, struct psw psw)
 
 	sigp_retry(cpu->addr, SIGP_INITIAL_CPU_RESET, 0, NULL);
 
-	lc = alloc_pages(1);
+	lc = alloc_pages_area(AREA_DMA31, 1);
 	cpu->lowcore = lc;
 	memset(lc, 0, PAGE_SIZE * 2);
 	sigp_retry(cpu->addr, SIGP_SET_PREFIX, (unsigned long )lc, NULL);
diff --git a/lib/vmalloc.c b/lib/vmalloc.c
index 2f25734..3aec5ac 100644
--- a/lib/vmalloc.c
+++ b/lib/vmalloc.c
@@ -217,18 +217,19 @@ void setup_vm()
 	 * so that it can be used to allocate page tables.
 	 */
 	if (!page_alloc_initialized()) {
-		base = PAGE_ALIGN(base);
-		top = top & -PAGE_SIZE;
-		free_pages(phys_to_virt(base), top - base);
+		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
+		top = top >> PAGE_SHIFT;
+		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
+		page_alloc_ops_enable();
 	}
 
 	find_highmem();
 	phys_alloc_get_unused(&base, &top);
 	page_root = setup_mmu(top);
 	if (base != top) {
-		base = PAGE_ALIGN(base);
-		top = top & -PAGE_SIZE;
-		free_pages(phys_to_virt(base), top - base);
+		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
+		top = top >> PAGE_SHIFT;
+		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
 	}
 
 	spin_lock(&lock);
-- 
2.26.2



* [kvm-unit-tests PATCH v2 5/7] lib/alloc: simplify free and malloc
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (3 preceding siblings ...)
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops Claudio Imbrenda
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Remove the size parameter from the various free functions.

Since the backends can handle the allocation sizes on their own,
simplify the generic malloc wrappers.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/alloc.h      |  2 +-
 lib/alloc_page.h |  6 +++---
 lib/alloc.c      | 42 +++++-------------------------------------
 lib/alloc_page.c |  2 +-
 lib/s390x/smp.c  |  4 ++--
 lib/vmalloc.c    |  2 +-
 s390x/smp.c      |  4 ++--
 7 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/lib/alloc.h b/lib/alloc.h
index c44d459..9b4b634 100644
--- a/lib/alloc.h
+++ b/lib/alloc.h
@@ -24,7 +24,7 @@
 
 struct alloc_ops {
 	void *(*memalign)(size_t alignment, size_t size);
-	void (*free)(void *ptr, size_t size);
+	void (*free)(void *ptr);
 	size_t align_min;
 };
 
diff --git a/lib/alloc_page.h b/lib/alloc_page.h
index 81847ae..6c23018 100644
--- a/lib/alloc_page.h
+++ b/lib/alloc_page.h
@@ -61,18 +61,18 @@ void *alloc_pages(unsigned int order);
  * alloc_pages* functions.
  * The pointer must point to the start of the block.
  */
-void free_pages(void *mem, size_t size);
+void free_pages(void *mem);
 
 /* For backwards compatibility */
 static inline void free_page(void *mem)
 {
-	return free_pages(mem, 1);
+	return free_pages(mem);
 }
 
 /* For backwards compatibility */
 static inline void free_pages_by_order(void *mem, unsigned int order)
 {
-	free_pages(mem, 1ull << order);
+	free_pages(mem);
 }
 
 #endif
diff --git a/lib/alloc.c b/lib/alloc.c
index 9d89d24..a46f464 100644
--- a/lib/alloc.c
+++ b/lib/alloc.c
@@ -50,56 +50,24 @@ void *calloc(size_t nmemb, size_t size)
 	return ptr;
 }
 
-#define METADATA_EXTRA	(2 * sizeof(uintptr_t))
-#define OFS_SLACK	(-2 * sizeof(uintptr_t))
-#define OFS_SIZE	(-sizeof(uintptr_t))
-
-static inline void *block_begin(void *mem)
-{
-	uintptr_t slack = *(uintptr_t *)(mem + OFS_SLACK);
-	return mem - slack;
-}
-
-static inline uintptr_t block_size(void *mem)
-{
-	return *(uintptr_t *)(mem + OFS_SIZE);
-}
-
 void free(void *ptr)
 {
-	if (!alloc_ops->free)
-		return;
-
-	void *base = block_begin(ptr);
-	uintptr_t sz = block_size(ptr);
-
-	alloc_ops->free(base, sz);
+	if (alloc_ops->free)
+		alloc_ops->free(ptr);
 }
 
 void *memalign(size_t alignment, size_t size)
 {
 	void *p;
-	uintptr_t blkalign;
-	uintptr_t mem;
 
 	if (!size)
 		return NULL;
 
-	assert(alignment >= sizeof(void *) && is_power_of_2(alignment));
+	assert(is_power_of_2(alignment));
 	assert(alloc_ops && alloc_ops->memalign);
 
-	size += alignment - 1;
-	blkalign = MAX(alignment, alloc_ops->align_min);
-	size = ALIGN(size + METADATA_EXTRA, alloc_ops->align_min);
-	p = alloc_ops->memalign(blkalign, size);
+	p = alloc_ops->memalign(alignment, size);
 	assert(p);
 
-	/* Leave room for metadata before aligning the result.  */
-	mem = (uintptr_t)p + METADATA_EXTRA;
-	mem = ALIGN(mem, alignment);
-
-	/* Write the metadata */
-	*(uintptr_t *)(mem + OFS_SLACK) = mem - (uintptr_t)p;
-	*(uintptr_t *)(mem + OFS_SIZE) = size;
-	return (void *)mem;
+	return (void *)p;
 }
diff --git a/lib/alloc_page.c b/lib/alloc_page.c
index 29d221f..046082a 100644
--- a/lib/alloc_page.c
+++ b/lib/alloc_page.c
@@ -255,7 +255,7 @@ static void _free_pages(void *mem)
 	} while (coalesce(a, order, pfn, pfn2));
 }
 
-void free_pages(void *mem, size_t size)
+void free_pages(void *mem)
 {
 	spin_lock(&lock);
 	_free_pages(mem);
diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c
index ea93329..77d80ca 100644
--- a/lib/s390x/smp.c
+++ b/lib/s390x/smp.c
@@ -163,8 +163,8 @@ int smp_cpu_destroy(uint16_t addr)
 	rc = smp_cpu_stop_nolock(addr, false);
 	if (!rc) {
 		cpu = smp_cpu_from_addr(addr);
-		free_pages(cpu->lowcore, 2 * PAGE_SIZE);
-		free_pages(cpu->stack, 4 * PAGE_SIZE);
+		free_pages(cpu->lowcore);
+		free_pages(cpu->stack);
 		cpu->lowcore = (void *)-1UL;
 		cpu->stack = (void *)-1UL;
 	}
diff --git a/lib/vmalloc.c b/lib/vmalloc.c
index 3aec5ac..986a34c 100644
--- a/lib/vmalloc.c
+++ b/lib/vmalloc.c
@@ -159,7 +159,7 @@ static void *vm_memalign(size_t alignment, size_t size)
 	return mem;
 }
 
-static void vm_free(void *mem, size_t size)
+static void vm_free(void *mem)
 {
 	struct metadata *m;
 	uintptr_t ptr, end;
diff --git a/s390x/smp.c b/s390x/smp.c
index ad30e3c..4ca1dce 100644
--- a/s390x/smp.c
+++ b/s390x/smp.c
@@ -143,7 +143,7 @@ static void test_store_status(void)
 	sigp(1, SIGP_STORE_STATUS_AT_ADDRESS, (uintptr_t)status, NULL);
 	while (!status->prefix) { mb(); }
 	report(1, "status written");
-	free_pages(status, PAGE_SIZE * 2);
+	free_pages(status);
 	report_prefix_pop();
 	smp_cpu_stop(1);
 
@@ -276,7 +276,7 @@ static void test_reset_initial(void)
 	report_prefix_pop();
 
 	report(smp_cpu_stopped(1), "cpu stopped");
-	free_pages(status, PAGE_SIZE);
+	free_pages(status);
 	report_prefix_pop();
 }
 
-- 
2.26.2



* [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (4 preceding siblings ...)
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 5/7] lib/alloc: simplify free and malloc Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-11-06 11:35   ` Paolo Bonzini
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 7/7] lib/alloc_page: allow reserving arbitrary memory ranges Claudio Imbrenda
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Remove align_min from struct alloc_ops.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/alloc.h      | 1 -
 lib/alloc_page.c | 1 -
 lib/alloc_phys.c | 9 +++++----
 lib/vmalloc.c    | 1 -
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/alloc.h b/lib/alloc.h
index 9b4b634..db90b01 100644
--- a/lib/alloc.h
+++ b/lib/alloc.h
@@ -25,7 +25,6 @@
 struct alloc_ops {
 	void *(*memalign)(size_t alignment, size_t size);
 	void (*free)(void *ptr);
-	size_t align_min;
 };
 
 extern struct alloc_ops *alloc_ops;
diff --git a/lib/alloc_page.c b/lib/alloc_page.c
index 046082a..3c6c4ee 100644
--- a/lib/alloc_page.c
+++ b/lib/alloc_page.c
@@ -320,7 +320,6 @@ void *alloc_page()
 static struct alloc_ops page_alloc_ops = {
 	.memalign = memalign_pages,
 	.free = free_pages,
-	.align_min = PAGE_SIZE,
 };
 
 /*
diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index 72e20f7..a4d2bf2 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -29,8 +29,8 @@ static phys_addr_t base, top;
 static void *early_memalign(size_t alignment, size_t size);
 static struct alloc_ops early_alloc_ops = {
 	.memalign = early_memalign,
-	.align_min = DEFAULT_MINIMUM_ALIGNMENT
 };
+static size_t align_min;
 
 struct alloc_ops *alloc_ops = &early_alloc_ops;
 
@@ -39,8 +39,7 @@ void phys_alloc_show(void)
 	int i;
 
 	spin_lock(&lock);
-	printf("phys_alloc minimum alignment: %#" PRIx64 "\n",
-		(u64)early_alloc_ops.align_min);
+	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
 	for (i = 0; i < nr_regions; ++i)
 		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
 			(u64)regions[i].base,
@@ -64,7 +63,7 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
 {
 	assert(align && !(align & (align - 1)));
 	spin_lock(&lock);
-	early_alloc_ops.align_min = align;
+	align_min = align;
 	spin_unlock(&lock);
 }
 
@@ -83,6 +82,8 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
 		top_safe = MIN(top_safe, 1ULL << 32);
 
 	assert(base < top_safe);
+	if (align < align_min)
+		align = align_min;
 
 	addr = ALIGN(base, align);
 	size += addr - base;
diff --git a/lib/vmalloc.c b/lib/vmalloc.c
index 986a34c..b28a390 100644
--- a/lib/vmalloc.c
+++ b/lib/vmalloc.c
@@ -188,7 +188,6 @@ static void vm_free(void *mem)
 static struct alloc_ops vmalloc_ops = {
 	.memalign = vm_memalign,
 	.free = vm_free,
-	.align_min = PAGE_SIZE,
 };
 
 void __attribute__((__weak__)) find_highmem(void)
-- 
2.26.2



* [kvm-unit-tests PATCH v2 7/7] lib/alloc_page: allow reserving arbitrary memory ranges
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (5 preceding siblings ...)
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops Claudio Imbrenda
@ 2020-10-02 15:44 ` Claudio Imbrenda
  2020-10-05 11:54 ` [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Pierre Morel
  2020-11-06 11:36 ` Paolo Bonzini
  8 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-02 15:44 UTC (permalink / raw)
  To: kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier

Two new functions are introduced that allow specific memory ranges to
be reserved and freed.

This is useful when a testcase needs memory at very specific addresses,
with the guarantee that the page allocator will not touch those pages.
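
For illustration (not part of the patch), this is how a testcase could
pin a specific range with the two new functions; the address, page
count and function name are made-up examples:

  static void reserve_example(void)
  {
          uintptr_t addr = 0x1000000;     /* hypothetical address the test needs */

          /* reserve 4 pages at addr, if they are currently free */
          if (!alloc_pages_special(addr, 4))
                  return;
          /* ... the page allocator will not touch these pages ... */
          free_pages_special(addr, 4);
  }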

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 lib/alloc_page.h | 15 ++++++++++
 lib/alloc_page.c | 78 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/lib/alloc_page.h b/lib/alloc_page.h
index 6c23018..816ff5d 100644
--- a/lib/alloc_page.h
+++ b/lib/alloc_page.h
@@ -75,4 +75,19 @@ static inline void free_pages_by_order(void *mem, unsigned int order)
 	free_pages(mem);
 }
 
+/*
+ * Allocates and reserves the specified memory range if possible.
+ * Returns NULL in case of failure.
+ */
+void *alloc_pages_special(uintptr_t addr, size_t npages);
+
+/*
+ * Frees a reserved memory range that had been reserved with
+ * alloc_pages_special.
+ * The memory range does not need to match a previous allocation
+ * exactly, it can also be a subset, in which case only the specified
+ * pages will be freed and unreserved.
+ */
+void free_pages_special(uintptr_t addr, size_t npages);
+
 #endif
diff --git a/lib/alloc_page.c b/lib/alloc_page.c
index 3c6c4ee..d9665a4 100644
--- a/lib/alloc_page.c
+++ b/lib/alloc_page.c
@@ -23,13 +23,14 @@
 
 #define ORDER_MASK	0x3f
 #define ALLOC_MASK	0x40
+#define SPECIAL_MASK	0x80
 
 struct mem_area {
 	/* Physical frame number of the first usable frame in the area */
 	uintptr_t base;
 	/* Physical frame number of the first frame outside the area */
 	uintptr_t top;
-	/* Combination ALLOC_MASK and order */
+	/* Combination of SPECIAL_MASK, ALLOC_MASK, and order */
 	u8 *page_states;
 	/* One freelist for each possible block size, up to NLISTS */
 	struct linked_list freelists[NLISTS];
@@ -136,6 +137,16 @@ static void *page_memalign_order(struct mem_area *a, u8 al, u8 sz)
 	return res;
 }
 
+static struct mem_area *get_area(uintptr_t pfn)
+{
+	uintptr_t i;
+
+	for (i = 0; i < MAX_AREAS; i++)
+		if ((areas_mask & BIT(i)) && area_contains(areas + i, pfn))
+			return areas + i;
+	return NULL;
+}
+
 /*
  * Try to merge two blocks into a bigger one.
  * Returns true in case of a successful merge.
@@ -210,10 +221,7 @@ static void _free_pages(void *mem)
 	assert(IS_ALIGNED((uintptr_t)mem, PAGE_SIZE));
 
 	/* find which area this pointer belongs to*/
-	for (i = 0; !a && (i < MAX_AREAS); i++) {
-		if ((areas_mask & BIT(i)) && area_contains(areas + i, pfn))
-			a = areas + i;
-	}
+	a = get_area(pfn);
 	assert_msg(a, "memory does not belong to any area: %p", mem);
 
 	p = pfn - a->base;
@@ -262,6 +270,66 @@ void free_pages(void *mem)
 	spin_unlock(&lock);
 }
 
+static void *_alloc_page_special(uintptr_t addr)
+{
+	struct mem_area *a;
+	uintptr_t mask, i;
+
+	a = get_area(PFN(addr));
+	assert(a);
+	i = PFN(addr) - a->base;
+	if (a->page_states[i] & (ALLOC_MASK | SPECIAL_MASK))
+		return NULL;
+	while (a->page_states[i]) {
+		mask = GENMASK_ULL(63, PAGE_SHIFT + a->page_states[i]);
+		split(a, (void *)(addr & mask));
+	}
+	a->page_states[i] = SPECIAL_MASK;
+	return (void *)addr;
+}
+
+static void _free_page_special(uintptr_t addr)
+{
+	struct mem_area *a;
+	uintptr_t i;
+
+	a = get_area(PFN(addr));
+	assert(a);
+	i = PFN(addr) - a->base;
+	assert(a->page_states[i] == SPECIAL_MASK);
+	a->page_states[i] = ALLOC_MASK;
+	_free_pages((void *)addr);
+}
+
+void *alloc_pages_special(uintptr_t addr, size_t n)
+{
+	uintptr_t i;
+
+	assert(IS_ALIGNED(addr, PAGE_SIZE));
+	spin_lock(&lock);
+	for (i = 0; i < n; i++)
+		if (!_alloc_page_special(addr + i * PAGE_SIZE))
+			break;
+	if (i < n) {
+		for (n = 0 ; n < i; n++)
+			_free_page_special(addr + n * PAGE_SIZE);
+		addr = 0;
+	}
+	spin_unlock(&lock);
+	return (void *)addr;
+}
+
+void free_pages_special(uintptr_t addr, size_t n)
+{
+	uintptr_t i;
+
+	assert(IS_ALIGNED(addr, PAGE_SIZE));
+	spin_lock(&lock);
+	for (i = 0; i < n; i++)
+		_free_page_special(addr + i * PAGE_SIZE);
+	spin_unlock(&lock);
+}
+
 static void *page_memalign_order_area(unsigned area, u8 ord, u8 al)
 {
 	void *res = NULL;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
@ 2020-10-02 18:18   ` Andrew Jones
  2020-10-05  6:57     ` Claudio Imbrenda
  2020-11-06 11:29   ` Paolo Bonzini
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-10-02 18:18 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Fri, Oct 02, 2020 at 05:44:14PM +0200, Claudio Imbrenda wrote:
> Add simple double linked lists.
> 
> Apart from the struct itself, there are functions to add and remove
> items, and check for emptiness.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  lib/list.h | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>  create mode 100644 lib/list.h
> 
> diff --git a/lib/list.h b/lib/list.h
> new file mode 100644
> index 0000000..702a78c
> --- /dev/null
> +++ b/lib/list.h
> @@ -0,0 +1,53 @@
> +#ifndef LIST_H
> +#define LIST_H
> +
> +#include <stdbool.h>
> +
> +/*
> + * Simple double linked list. The pointer to the list is a list item itself,
> + * like in the kernel implementation.
> + */
> +struct linked_list {
> +	struct linked_list *prev;
> +	struct linked_list *next;
> +};
> +
> +/*
> + * An empty list is a list item whose prev and next both point to itself.
> + * Returns true if the list is empty.
> + */
> +static inline bool is_list_empty(struct linked_list *p)

I'd prefer the 'is' to be dropped or to come after 'list' in the name.
Or, why not just use all the same names as the kernel, including call
the structure list_head?

> +{
> +	return !p->next || !p->prev || p == p->next || p == p->prev;

The '||'s can't be right and I'm not sure what you want to do about the
NULLs. I think forbidding them is probably best, meaning this function
should be

 {
  assert(p && p->prev && p->next);
  return p->prev == p && p->next == p;
 }

But since p can't be NULL, then we should always return true from this
function anyway, as a list with a single entry ('p') isn't empty.
The kernel's list_empty() call explicitly calls its parameter 'head',
because it expects a list's head pointer, which is a pointer to a
list_head that is not embedded in a structure.

> +}
> +
> +/*
> + * Remove the given element from the list, if the list is not already empty.
> + * The removed element is returned.
> + */
> +static inline struct linked_list *list_remove(struct linked_list *l)
> +{
> +	if (is_list_empty(l))
> +		return NULL;

This isn't necessary. Removing an entry from a list of one entry is still
the entry.

> +
> +	l->prev->next = l->next;
> +	l->next->prev = l->prev;
> +	l->prev = l->next = NULL;
> +
> +	return l;
> +}
> +
> +/*
> + * Add the given element after the given list head.
> + */
> +static inline void list_add(struct linked_list *head, struct linked_list *li)
> +{
> +	assert(li);
> +	assert(head);
> +	li->prev = head;
> +	li->next = head->next;
> +	head->next->prev = li;
> +	head->next = li;
> +}
> +
> +#endif
> -- 
> 2.26.2
>

I think we should just copy the kernel's list_head much closer.

Thanks,
drew 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata Claudio Imbrenda
@ 2020-10-03  8:46   ` Andrew Jones
  2020-10-05  7:00     ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-10-03  8:46 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Fri, Oct 02, 2020 at 05:44:15PM +0200, Claudio Imbrenda wrote:
> Add allocation metadata handling to vmalloc.
> 
> In upcoming patches, allocation metadata will have to be handled
> directly by the lower level allocators, and will not be handled by the
> common wrapper.
> 
> In this patch, the number of allocated pages plus a magic value are
> written immediately before the returned pointer. This means that multi
> page allocations will allocate one extra page (which is not worse than
> what the current allocator does).
> 
> For small allocations there is an optimization: the returned address is
> intentionally not page-aligned. This signals that the allocation
> spanned one page only. In this case the metadata is only the magic
> value, and it is also saved immediately before the returned pointer.
> Since the pointer does not point to the beginning of the page, there is
> always space in the same page for the magic value.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  lib/vmalloc.c | 105 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 95 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/vmalloc.c b/lib/vmalloc.c
> index e0c7b6b..2f25734 100644
> --- a/lib/vmalloc.c
> +++ b/lib/vmalloc.c
> @@ -15,6 +15,16 @@
>  #include <bitops.h>
>  #include "vmalloc.h"
>  
> +#define VM_MAGIC 0x7E57C0DE
> +
> +#define GET_METADATA(x) (((struct metadata *)(x)) - 1)
> +#define GET_MAGIC(x) (*((unsigned long *)(x) - 1))
> +
> +struct metadata {
> +	unsigned long npages;
> +	unsigned long magic;
> +};
> +
>  static struct spinlock lock;
>  static void *vfree_top = 0;
>  static void *page_root;
> @@ -25,8 +35,14 @@ static void *page_root;
>   *
>   * nr is the number of pages to allocate
>   * alignment_pages is the alignment of the allocation *in pages*
> + * metadata indicates whether an extra (unaligned) page needs to be allocated
> + * right before the main (aligned) allocation.
> + *
> + * The return value points to the first allocated virtual page, which will
> + * be the (potentially unaligned) metadata page if the metadata flag is
> + * specified.
>   */
> -void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
> +static void *do_alloc_vpages(ulong nr, unsigned int align_order, bool metadata)
>  {
>  	uintptr_t ptr;
>  
> @@ -34,6 +50,8 @@ void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
>  	ptr = (uintptr_t)vfree_top;
>  	ptr -= PAGE_SIZE * nr;
>  	ptr &= GENMASK_ULL(63, PAGE_SHIFT + align_order);
> +	if (metadata)
> +		ptr -= PAGE_SIZE;
>  	vfree_top = (void *)ptr;
>  	spin_unlock(&lock);
>  
> @@ -41,6 +59,11 @@ void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
>  	return (void *)ptr;
>  }
>  
> +void *alloc_vpages_aligned(ulong nr, unsigned int align_order)
> +{
> +	return do_alloc_vpages(nr, align_order, false);
> +}
> +
>  void *alloc_vpages(ulong nr)
>  {
>  	return alloc_vpages_aligned(nr, 0);
> @@ -69,35 +92,97 @@ void *vmap(phys_addr_t phys, size_t size)
>  	return mem;
>  }
>  
> +/*
> + * Allocate one page, for an object with specified alignment.
> + * The resulting pointer will be aligned to the required alignment, but
> + * intentionally not page-aligned.
> + * The metadata for single pages allocation is just the magic value,
> + * which is placed right before the pointer, like for bigger allocations.
> + */
> +static void *vm_alloc_one_page(size_t alignment)
> +{
> +	void *p;
> +
> +	/* this guarantees that there will be space for the magic value */
> +	assert(alignment >= sizeof(uintptr_t));
> +	assert(alignment < PAGE_SIZE);
> +	p = alloc_vpage();
> +	install_page(page_root, virt_to_phys(alloc_page()), p);
> +	p = (void *)((uintptr_t)p + alignment);
> +	/* write the magic value right before the returned address */
> +	GET_MAGIC(p) = VM_MAGIC;
> +	return p;
> +}
> +
>  /*
>   * Allocate virtual memory, with the specified minimum alignment.
> + * If the allocation fits in one page, only one page is allocated. Otherwise
> + * enough pages are allocated for the object, plus one to keep metadata
> + * information about the allocation.
>   */
>  static void *vm_memalign(size_t alignment, size_t size)
>  {
> +	struct metadata *m;
>  	phys_addr_t pa;
> -	void *mem, *p;
> +	uintptr_t p;
> +	void *mem;
> +	size_t i;
>  
> +	if (!size)
> +		return NULL;
>  	assert(is_power_of_2(alignment));
>  
> +	if (alignment < sizeof(uintptr_t))
> +		alignment = sizeof(uintptr_t);
> +	/* it fits in one page, allocate only one page */
> +	if (alignment + size <= PAGE_SIZE)
> +		return vm_alloc_one_page(alignment);
>  	size = PAGE_ALIGN(size) / PAGE_SIZE;
>  	alignment = get_order(PAGE_ALIGN(alignment) / PAGE_SIZE);
> -	mem = p = alloc_vpages_aligned(size, alignment);
> -	while (size--) {
> +	mem = do_alloc_vpages(size, alignment, true);
> +	p = (uintptr_t)mem;
> +	/* skip the metadata page */
> +	mem = (void *)(p + PAGE_SIZE);
> +	/*
> +	 * time to actually allocate the physical pages to back our virtual
> +	 * allocation; note that we need to allocate one extra page (for the
> +	 * metadata), hence the <=
> +	 */
> +	for (i = 0; i <= size; i++, p += PAGE_SIZE) {
>  		pa = virt_to_phys(alloc_page());
>  		assert(pa);
> -		install_page(page_root, pa, p);
> -		p += PAGE_SIZE;
> +		install_page(page_root, pa, (void *)p);
>  	}
> +	m = GET_METADATA(mem);
> +	m->npages = size;
> +	m->magic = VM_MAGIC;
>  	return mem;
>  }
>  
>  static void vm_free(void *mem, size_t size)
>  {
> -	while (size) {
> -		free_page(phys_to_virt(virt_to_pte_phys(page_root, mem)));
> -		mem += PAGE_SIZE;
> -		size -= PAGE_SIZE;
> +	struct metadata *m;
> +	uintptr_t ptr, end;
> +
> +	/* the pointer is not page-aligned, it was a single-page allocation */
> +	if (!IS_ALIGNED((uintptr_t)mem, PAGE_SIZE)) {
> +		assert(GET_MAGIC(mem) == VM_MAGIC);
> +		ptr = virt_to_pte_phys(page_root, mem) & PAGE_MASK;
> +		free_page(phys_to_virt(ptr));
> +		return;
>  	}
> +
> +	/* the pointer is page-aligned, it was a multi-page allocation */
> +	m = GET_METADATA(mem);
> +	assert(m->magic == VM_MAGIC);
> +	assert(m->npages > 0);
> +	/* free all the pages including the metadata page */
> +	ptr = (uintptr_t)mem - PAGE_SIZE;
> +	end = ptr + m->npages * PAGE_SIZE;
> +	for ( ; ptr < end; ptr += PAGE_SIZE)
> +		free_page(phys_to_virt(virt_to_pte_phys(page_root, (void *)ptr)));
> +	/* free the last one separately to avoid overflow issues */
> +	free_page(phys_to_virt(virt_to_pte_phys(page_root, (void *)ptr)));

I don't get this. How is

 for (p = start; p < end; p += step)
   process(p);
 process(p)

different from

 for (p = start; p <= end; p += step)
   process(p);

To avoid overflow issues we should simply ensure start and end are
computed correctly. Also, I'd prefer 'end' point to the actual end,
not the last included page, e.g. start=0x1000, end=0x1fff. Then we
have

 start = get_start();
 assert(PAGE_ALIGN(start) == start);
 end = start + nr_pages * PAGE_SIZE - 1;
 assert(start < end);
 for (p = start; p < end; p += PAGE_SIZE)
   process(p);

Thanks,
drew


>  }
>  
>  static struct alloc_ops vmalloc_ops = {
> -- 
> 2.26.2
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas Claudio Imbrenda
@ 2020-10-03  9:23   ` Andrew Jones
  2020-10-05  7:10     ` Claudio Imbrenda
  2020-11-06 11:34   ` Paolo Bonzini
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-10-03  9:23 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Fri, Oct 02, 2020 at 05:44:16PM +0200, Claudio Imbrenda wrote:
> Add definitions and boundaries of memory areas for some architectures.
> This is needed by the next patch.
> 
> Most architectures only get one generic memory area, whereas x86 and
> s390x get some more attention:
> 
> x86 gets
> * lowest area (24-bit addresses)
> * low area (32-bit addresses)
> * the rest
> 
> s390x gets
> * low area (31-bit addresses)
> * the rest
> 
> Notice that the number indicates the order in which the areas are
> scanned when more than one area is indicated. The default order tries
> to get allocations from higher address ranges before trying lower ones.
> This tries to keep the precious lower addresses as free as possible.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  lib/asm-generic/memory_areas.h | 11 +++++++++++
>  lib/arm/asm/memory_areas.h     | 11 +++++++++++
>  lib/arm64/asm/memory_areas.h   | 11 +++++++++++
>  lib/powerpc/asm/memory_areas.h | 11 +++++++++++
>  lib/ppc64/asm/memory_areas.h   | 11 +++++++++++
>  lib/s390x/asm/memory_areas.h   | 17 +++++++++++++++++
>  lib/x86/asm/memory_areas.h     | 22 ++++++++++++++++++++++
>  7 files changed, 94 insertions(+)
>  create mode 100644 lib/asm-generic/memory_areas.h
>  create mode 100644 lib/arm/asm/memory_areas.h
>  create mode 100644 lib/arm64/asm/memory_areas.h
>  create mode 100644 lib/powerpc/asm/memory_areas.h
>  create mode 100644 lib/ppc64/asm/memory_areas.h
>  create mode 100644 lib/s390x/asm/memory_areas.h
>  create mode 100644 lib/x86/asm/memory_areas.h
> 
> diff --git a/lib/asm-generic/memory_areas.h b/lib/asm-generic/memory_areas.h
> new file mode 100644
> index 0000000..927baa7
> --- /dev/null
> +++ b/lib/asm-generic/memory_areas.h
> @@ -0,0 +1,11 @@
> +#ifndef MEMORY_AREAS_H

_ASM_GENERIC_MEMORY_AREAS_H_

> +#define MEMORY_AREAS_H
> +
> +#define AREA_NORMAL_PFN 0
> +#define AREA_NORMAL_NUMBER 0
> +#define AREA_NORMAL 1
> +
> +#define AREA_ANY -1
> +#define AREA_ANY_NUMBER 0xff

Do we really need both a "type number", like AREA_NORMAL, and a
"number number" (AREA_NORMAL_NUMBER)? Why not just search in the order
of the type numbers? Or in the reverse order, if that's better? Also,
would an enum be more appropriate for the type numbers?

> +
> +#endif
> diff --git a/lib/arm/asm/memory_areas.h b/lib/arm/asm/memory_areas.h
> new file mode 100644
> index 0000000..927baa7
> --- /dev/null
> +++ b/lib/arm/asm/memory_areas.h
> @@ -0,0 +1,11 @@
> +#ifndef MEMORY_AREAS_H

_ASMARM_MEMORY_AREAS_H_

We certainly don't want the same define as the generic file, as it's
possible an arch will want to simply extend the generic code, e.g.

 #ifndef _ASMARM_MEMORY_AREAS_H_
 #define _ASMARM_MEMORY_AREAS_H_
 #include <asm-generic/memory_areas.h>

 /* ARM memory area stuff here */

 #endif

This comment applies to the rest of memory_areas.h files. Look at
other lib/$ARCH/asm/*.h files to get the define prefix.

> +#define MEMORY_AREAS_H
> +
> +#define AREA_NORMAL_PFN 0
> +#define AREA_NORMAL_NUMBER 0
> +#define AREA_NORMAL 1
> +
> +#define AREA_ANY -1
> +#define AREA_ANY_NUMBER 0xff
> +
> +#endif
[...]
> diff --git a/lib/s390x/asm/memory_areas.h b/lib/s390x/asm/memory_areas.h
> new file mode 100644
> index 0000000..4856a27
> --- /dev/null
> +++ b/lib/s390x/asm/memory_areas.h
> @@ -0,0 +1,17 @@
> +#ifndef MEMORY_AREAS_H
> +#define MEMORY_AREAS_H
> +
> +#define AREA_NORMAL_PFN BIT(31-12)

BIT(31 - PAGE_SHIFT)

> +#define AREA_NORMAL_NUMBER 0
> +#define AREA_NORMAL 1
> +
> +#define AREA_LOW_PFN 0
> +#define AREA_LOW_NUMBER 1
> +#define AREA_LOW 2
> +
> +#define AREA_ANY -1
> +#define AREA_ANY_NUMBER 0xff
> +
> +#define AREA_DMA31 AREA_LOW

I don't work on s390x, but I'd rather not add too many aliases. I think
a single name with the min and max address bits embedded in it is
probably best. Maybe something like AREA_0_31 and AREA_31_52, or
whatever.

> +
> +#endif
> diff --git a/lib/x86/asm/memory_areas.h b/lib/x86/asm/memory_areas.h
> new file mode 100644
> index 0000000..d704df3
> --- /dev/null
> +++ b/lib/x86/asm/memory_areas.h
> @@ -0,0 +1,22 @@
> +#ifndef MEMORY_AREAS_H
> +#define MEMORY_AREAS_H
> +
> +#define AREA_NORMAL_PFN BIT(32-12)
> +#define AREA_NORMAL_NUMBER 0
> +#define AREA_NORMAL 1
> +
> +#define AREA_LOW_PFN BIT(24-12)
> +#define AREA_LOW_NUMBER 1
> +#define AREA_LOW 2
> +
> +#define AREA_LOWEST_PFN 0
> +#define AREA_LOWEST_NUMBER 2
> +#define AREA_LOWEST 4
> +
> +#define AREA_DMA24 AREA_LOWEST
> +#define AREA_DMA32 (AREA_LOWEST | AREA_LOW)

Aha, now I finally see that there's a type number and a number number
because the type number is a bit, presumably for some flag field that's
coming in a later patch. I'll have to look at the other patches to see
how that's used, but at the moment I feel like there should be another
way to represent memory areas without requiring a handful of defines
and aliases for each one.

Thanks,
drew

> +
> +#define AREA_ANY -1
> +#define AREA_ANY_NUMBER 0xff
> +
> +#endif
> -- 
> 2.26.2
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions
  2020-10-02 18:18   ` Andrew Jones
@ 2020-10-05  6:57     ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05  6:57 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Fri, 2 Oct 2020 20:18:44 +0200
Andrew Jones <drjones@redhat.com> wrote:

[...]
> 
> I think we should just copy the kernel's list_head much closer.

fair enough; I think I'll just throw away this patch and steal the
kernel's implementation

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata
  2020-10-03  8:46   ` Andrew Jones
@ 2020-10-05  7:00     ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05  7:00 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Sat, 3 Oct 2020 10:46:39 +0200
Andrew Jones <drjones@redhat.com> wrote:

[...]

> > +	/* the pointer is page-aligned, it was a multi-page
> > allocation */
> > +	m = GET_METADATA(mem);
> > +	assert(m->magic == VM_MAGIC);
> > +	assert(m->npages > 0);
> > +	/* free all the pages including the metadata page */
> > +	ptr = (uintptr_t)mem - PAGE_SIZE;
> > +	end = ptr + m->npages * PAGE_SIZE;
> > +	for ( ; ptr < end; ptr += PAGE_SIZE)
> > +		free_page(phys_to_virt(virt_to_pte_phys(page_root,
> > (void *)ptr)));
> > +	/* free the last one separately to avoid overflow issues */
> > +	free_page(phys_to_virt(virt_to_pte_phys(page_root, (void
> > *)ptr)));  
> 
> I don't get this. How is
> 
>  for (p = start; p < end; p += step)
>    process(p);
>  process(p)
> 
> different from
> 
>  for (p = start; p <= end; p += step)
>    process(p);

there was a reason at some point; I think the code evolved past it and
these lines stayed there as-is

> To avoid overflow issues we should simply ensure start and end are
> computed correctly. Also, I'd prefer 'end' point to the actual end,
> not the last included page, e.g. start=0x1000, end=0x1fff. Then we
> have
> 
>  start = get_start();
>  assert(PAGE_ALIGN(start) == start);
>  end = start + nr_pages * PAGE_SIZE - 1;
>  assert(start < end);
>  for (p = start; p < end; p += PAGE_SIZE)
>    process(p);
> 
> Thanks,
> drew

yeah I'll definitely fix it

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-10-03  9:23   ` Andrew Jones
@ 2020-10-05  7:10     ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05  7:10 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Sat, 3 Oct 2020 11:23:27 +0200
Andrew Jones <drjones@redhat.com> wrote:

[...]

> > diff --git a/lib/asm-generic/memory_areas.h
> > b/lib/asm-generic/memory_areas.h new file mode 100644
> > index 0000000..927baa7
> > --- /dev/null
> > +++ b/lib/asm-generic/memory_areas.h
> > @@ -0,0 +1,11 @@
> > +#ifndef MEMORY_AREAS_H  
> 
> _ASM_GENERIC_MEMORY_AREAS_H_
> 
> > +#define MEMORY_AREAS_H
> > +
> > +#define AREA_NORMAL_PFN 0
> > +#define AREA_NORMAL_NUMBER 0
> > +#define AREA_NORMAL 1
> > +
> > +#define AREA_ANY -1
> > +#define AREA_ANY_NUMBER 0xff  
> 
> Do we really need both a "type number", like AREA_NORMAL, and a
> "number number" (AREA_NORMAL_NUMBER)? Why not just search in the order
> of the type numbers? Or in the reverse order, if that's better? Also,
> would an enum be more appropriate for the type numbers?

you understood the reason later on, but also consider that enums cannot
be extended: the set of areas needs to be arch-dependent, and not every
arch has all the elements.
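
for instance, since each area define is a bit, callers can combine them in
a single mask; a small sketch using the x86 names from this patch and
alloc_pages_area() from the next one:

 /* allocate one page from either the lowest (<16M) or the low (<4G) area */
 void *p = alloc_pages_area(AREA_LOWEST | AREA_LOW, 0);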
 
> > +
> > +#endif
> > diff --git a/lib/arm/asm/memory_areas.h b/lib/arm/asm/memory_areas.h
> > new file mode 100644
> > index 0000000..927baa7
> > --- /dev/null
> > +++ b/lib/arm/asm/memory_areas.h
> > @@ -0,0 +1,11 @@
> > +#ifndef MEMORY_AREAS_H  
> 
> _ASMARM_MEMORY_AREAS_H_
> 
> We certainly don't want the same define as the generic file, as it's
> possible an arch will want to simply extend the generic code, e.g.
> 
>  #ifndef _ASMARM_MEMORY_AREAS_H_
>  #define _ASMARM_MEMORY_AREAS_H_
>  #include <asm-generic/memory_areas.h>

I see now, I guess if an arch is fine with the generic version it can
include it instead of redefining it.

I'll fix the defines and the names

>  /* ARM memory area stuff here */
> 
>  #endif
> 
> This comment applies to the rest of memory_areas.h files. Look at
> other lib/$ARCH/asm/*.h files to get the define prefix.
> 
> > +#define MEMORY_AREAS_H
> > +
> > +#define AREA_NORMAL_PFN 0
> > +#define AREA_NORMAL_NUMBER 0
> > +#define AREA_NORMAL 1
> > +
> > +#define AREA_ANY -1
> > +#define AREA_ANY_NUMBER 0xff
> > +
> > +#endif  
> [...]
> > diff --git a/lib/s390x/asm/memory_areas.h
> > b/lib/s390x/asm/memory_areas.h new file mode 100644
> > index 0000000..4856a27
> > --- /dev/null
> > +++ b/lib/s390x/asm/memory_areas.h
> > @@ -0,0 +1,17 @@
> > +#ifndef MEMORY_AREAS_H
> > +#define MEMORY_AREAS_H
> > +
> > +#define AREA_NORMAL_PFN BIT(31-12)  
> 
> BIT(31 - PAGE_SHIFT)
> 
> > +#define AREA_NORMAL_NUMBER 0
> > +#define AREA_NORMAL 1
> > +
> > +#define AREA_LOW_PFN 0
> > +#define AREA_LOW_NUMBER 1
> > +#define AREA_LOW 2
> > +
> > +#define AREA_ANY -1
> > +#define AREA_ANY_NUMBER 0xff
> > +
> > +#define AREA_DMA31 AREA_LOW  
> 
> I don't work on s390x, but I'd rather not add too many aliases. I
> think a single name with the min and max address bits embedded in it
> is probably best. Maybe something like AREA_0_31 and AREA_31_52, or
> whatever.

the aliases are arch-specific and are just there to make life easier; you
could just always use the generic macros.

the generic macros, by the way, need to be generic because they are
used in common code, and there we can't have arch-specific names
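
as an example, this is how patch 4 uses the s390x alias when it needs the
lowcore below 2GB:

 lc = alloc_pages_area(AREA_DMA31, 1);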

> > +
> > +#endif
> > diff --git a/lib/x86/asm/memory_areas.h b/lib/x86/asm/memory_areas.h
> > new file mode 100644
> > index 0000000..d704df3
> > --- /dev/null
> > +++ b/lib/x86/asm/memory_areas.h
> > @@ -0,0 +1,22 @@
> > +#ifndef MEMORY_AREAS_H
> > +#define MEMORY_AREAS_H
> > +
> > +#define AREA_NORMAL_PFN BIT(32-12)
> > +#define AREA_NORMAL_NUMBER 0
> > +#define AREA_NORMAL 1
> > +
> > +#define AREA_LOW_PFN BIT(24-12)
> > +#define AREA_LOW_NUMBER 1
> > +#define AREA_LOW 2
> > +
> > +#define AREA_LOWEST_PFN 0
> > +#define AREA_LOWEST_NUMBER 2
> > +#define AREA_LOWEST 4
> > +
> > +#define AREA_DMA24 AREA_LOWEST
> > +#define AREA_DMA32 (AREA_LOWEST | AREA_LOW)  
> 
> Aha, now I finally see that there's a type number and a number number
> because the type number is a bit, presumably for some flag field
> that's coming in a later patch. I'll have to look at the other

I will fix the patch description to make it clear

> patches to see how that's used, but at the moment I feel like there
> should be another way to represent memory areas without requiring a
> handful of defines and aliases for each one.

I think this is the most straightforward way, even though it might not
necessarily look very clean, but... suggestions welcome :)

> Thanks,
> drew
> 
> > +
> > +#define AREA_ANY -1
> > +#define AREA_ANY_NUMBER 0xff
> > +
> > +#endif
> > -- 
> > 2.26.2
> >   
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (6 preceding siblings ...)
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 7/7] lib/alloc_page: allow reserving arbitrary memory ranges Claudio Imbrenda
@ 2020-10-05 11:54 ` Pierre Morel
  2020-10-05 12:35   ` Claudio Imbrenda
  2020-11-06 11:36 ` Paolo Bonzini
  8 siblings, 1 reply; 42+ messages in thread
From: Pierre Morel @ 2020-10-05 11:54 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm, pbonzini; +Cc: frankja, david, thuth, cohuck, lvivier



On 2020-10-02 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
> 
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
> 
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>    alignment withtout wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areaas (e.g.
>    below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>    free pool
> 
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>    (i.e. allocation size), this is now handled by the lower level
>    allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>    freed in small blocks in a random order to be used for large
>    allocations again
> 
> This is achieved in the following ways:
> 
> For the virtual allocator:
> * only the virtul allocator needs one extra page of metadata, but only
>    for allocations that wouldn't fit in one page
> 
> For the page allocator:
> * page allocator has up to 6 memory pools, each pool has a metadata
>    area; the metadata has a byte for each page in the area, describing
>    the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>    split until we reach the required size; the unused parts of the block
>    are put back in the free lists
> * if an allocation needs ablock with a larger alignment than its size, a
>    larger block of (at least) the required order is split; the unused parts
>    put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>    searched; the allocation fails only when all allowed areas have been
>    tried
> * new functions to perform allocations from specific areas; the areas
>    are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>    "low" memory under 4GB and one for the rest, while s390x has one for under
>    2GB and one for the rest; suggestions for more fine grained areas or for
>    the other architectures are welcome


When designing a page allocator, the topology is not the only
characteristic we may need to specify.
Specific page characteristics like rights, access flags, or cache behavior
may be useful when testing I/O on some architectures.
This obviously will need some connection to the MMU handling.

Wouldn't it be interesting to use a bitmap flag as an argument to
page_alloc() to define separate regions, even if the connection with the
MMU is done in a future series?
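
Something like this, purely as an illustration (both the helper name and
the flag names here are hypothetical, not an existing interface):

 /* hypothetical: allocate an uncached, read-only page for an I/O test */
 void *buf = alloc_pages_flags(AREA_ANY, 0, PAGE_FLAG_UNCACHED | PAGE_FLAG_RO);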

Regards,
Pierre

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-05 11:54 ` [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Pierre Morel
@ 2020-10-05 12:35   ` Claudio Imbrenda
  2020-10-05 12:49     ` Andrew Jones
  2020-10-05 12:57     ` Pierre Morel
  0 siblings, 2 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05 12:35 UTC (permalink / raw)
  To: Pierre Morel; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 5 Oct 2020 13:54:42 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

[...]

> While doing a page allocator, the topology is not the only 
> characteristic we may need to specify.
> Specific page characteristics like rights, access flags, cache
> behavior may be useful when testing I/O for some architectures.
> This obviously will need some connection to the MMU handling.
> 
> Wouldn't it be interesting to use a bitmap flag as argument to 
> page_alloc() to define separate regions, even if the connection with
> the MMU is done in a future series?

the physical allocator is only concerned with the physical pages. if
you need special MMU flags to be set, then you should enable the MMU
and fiddle with the flags and settings yourself.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator Claudio Imbrenda
@ 2020-10-05 12:40   ` Andrew Jones
  2020-10-05 15:56     ` Claudio Imbrenda
  2020-12-08  0:41   ` Nadav Amit
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-10-05 12:40 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Fri, Oct 02, 2020 at 05:44:17PM +0200, Claudio Imbrenda wrote:
> This is a complete rewrite of the page allocator.
> 
> This will bring a few improvements:
> * no need to specify the size when freeing
> * allocate small areas with a large alignment without wasting memory
> * ability to initialize and use multiple memory areas (e.g. DMA)
> * more sanity checks
> 
> A few things have changed:
> * initialization cannot be done with free_pages like before,
>   page_alloc_init_area has to be used instead
> 
> Arch-specific changes:
> * s390x now uses the area below 2GiB for SMP lowcore initialization.
> 
> Details:
> Each memory area has metadata at the very beginning. The metadata is a
> byte array with one entry per usable page (so, excluding the metadata
> itself). Each entry indicates if the page is special (unused for now),
> if it is allocated, and the order of the block. Both free and allocated
> pages are part of larger blocks.
> 
> Some more fixed size metadata is present in a fixed-size static array.
> This metadata contains start and end page frame numbers, the pointer to
> the metadata array, and the array of freelists. The array of freelists
> has an entry for each possible order (indicated by the macro NLISTS,
> defined as BITS_PER_LONG - PAGE_SHIFT).
> 
> On allocation, if the free list for the needed size is empty, larger
> blocks are split. When a small allocation with a large alignment is
> requested, an appropriately large block is split, to guarantee the
> alignment.
> 
> When a block is freed, an attempt will be made to merge it into the
> neighbour, iterating the process as long as possible.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  lib/alloc_page.h |  63 +++++-
>  lib/alloc_page.c | 484 +++++++++++++++++++++++++++++++++++++----------
>  lib/arm/setup.c  |   2 +-
>  lib/s390x/sclp.c |   6 +-
>  lib/s390x/smp.c  |   2 +-
>  lib/vmalloc.c    |  13 +-
>  6 files changed, 453 insertions(+), 117 deletions(-)
> 
> diff --git a/lib/alloc_page.h b/lib/alloc_page.h
> index 88540d1..81847ae 100644
> --- a/lib/alloc_page.h
> +++ b/lib/alloc_page.h
> @@ -8,12 +8,71 @@
>  #ifndef ALLOC_PAGE_H
>  #define ALLOC_PAGE_H 1
>  
> +#include <asm/memory_areas.h>
> +
> +/* Returns true if the page allocator has been initialized */
>  bool page_alloc_initialized(void);
> +
> +/*
> + * Initializes a memory area.
> + * n is the number of the area to initialize
> + * base_pfn is the physical frame number of the start of the area to initialize
> + * top_pfn is the physical frame number of the first page immediately after
> + * the end of the area to initialize
> + */
> +void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t top_pfn);
> +
> +/* Enables the page allocator. At least one area must have been initialized */
>  void page_alloc_ops_enable(void);
> +
> +/*
> + * Allocate aligned memory from the specified areas.
> + * areas is a bitmap of allowed areas
> + * alignment must be a power of 2
> + */
> +void *memalign_pages_area(unsigned int areas, size_t alignment, size_t size);

I guess 'size' is the number of pages? Or is it bytes that automatically
get rounded up to a multiple of PAGE_SIZE?

> +
> +/*
> + * Allocate aligned memory from any area.
> + * Equivalent to memalign_pages_area(~0, alignment, size).
> + */
> +void *memalign_pages(size_t alignment, size_t size);
> +
> +/*
> + * Allocate naturally aligned memory from the specified areas.
> + * Equivalent to memalign_pages_area(areas, 1ull << order, 1ull << order).
> + */
> +void *alloc_pages_area(unsigned int areas, unsigned int order);
> +
> +/*
> + * Allocate one page from any area.
> + * Equivalent to alloc_pages(0);
> + */
>  void *alloc_page(void);
> +
> +/*
> + * Allocate naturally aligned memory from any area.
> + * Equivalent to alloc_pages_area(~0, order);
> + */
>  void *alloc_pages(unsigned int order);
> -void free_page(void *page);
> +
> +/*
> + * Frees a memory block allocated with any of the memalign_pages* or
> + * alloc_pages* functions.
> + * The pointer must point to the start of the block.
> + */
>  void free_pages(void *mem, size_t size);
> -void free_pages_by_order(void *mem, unsigned int order);
> +
> +/* For backwards compatibility */
> +static inline void free_page(void *mem)
> +{
> +	return free_pages(mem, 1);
> +}
> +
> +/* For backwards compatibility */

We don't preserve interfaces for backwards compatibility in
kvm-unit-tests. If the interface is no longer a good idea,
then we can change all the call sites. However, in this case,
it is a good idea, as it balances alloc_page(), so I'd just
write "Equivalent to free_pages(mem, 1)".

> +static inline void free_pages_by_order(void *mem, unsigned int order)
> +{
> +	free_pages(mem, 1ull << order);
> +}
>  
>  #endif
> diff --git a/lib/alloc_page.c b/lib/alloc_page.c
> index 74fe726..29d221f 100644
> --- a/lib/alloc_page.c
> +++ b/lib/alloc_page.c
> @@ -9,169 +9,445 @@
>  #include "alloc_phys.h"
>  #include "alloc_page.h"
>  #include "bitops.h"
> +#include "list.h"
>  #include <asm/page.h>
>  #include <asm/io.h>
>  #include <asm/spinlock.h>
> +#include <asm/memory_areas.h>
>  
> +#define IS_ALIGNED_ORDER(x,order) IS_ALIGNED((x),BIT_ULL(order))
> +#define NLISTS ((BITS_PER_LONG) - (PAGE_SHIFT))

nit: too many ()

> +#define PFN(x) ((uintptr_t)(x) >> PAGE_SHIFT)
> +
> +#define MAX_AREAS	6

This looks like something that should be defined memory_areas.h.

> +
> +#define ORDER_MASK	0x3f
> +#define ALLOC_MASK	0x40
> +
> +struct mem_area {
> +	/* Physical frame number of the first usable frame in the area */
> +	uintptr_t base;
> +	/* Physical frame number of the first frame outside the area */
> +	uintptr_t top;
> +	/* Combination ALLOC_MASK and order */
> +	u8 *page_states;

I can't tell what page_states is supposed to be from its name or the
comment.

> +	/* One freelist for each possible block size, up to NLISTS */
> +	struct linked_list freelists[NLISTS];
> +};
> +
> +static struct mem_area areas[MAX_AREAS];
> +static unsigned int areas_mask;
>  static struct spinlock lock;
> -static void *freelist = 0;
>  
>  bool page_alloc_initialized(void)
>  {
> -	return freelist != 0;
> +	return areas_mask != 0;
>  }
>  
> -void free_pages(void *mem, size_t size)
> +static inline bool area_or_metadata_contains(struct mem_area *a, uintptr_t pfn)

I can't tell what this function is supposed to return by its name and it
has no comment.

>  {
> -	void *old_freelist;
> -	void *end;
> +	return (pfn >= PFN(a->page_states)) && (pfn < a->top);

Why are we comparing an input pfn with the page_states virtual
pfn? I guess you want 'pfn >= base' ?

> +}
>  
> -	assert_msg((unsigned long) mem % PAGE_SIZE == 0,
> -		   "mem not page aligned: %p", mem);
> +static inline bool area_contains(struct mem_area *a, uintptr_t pfn)

Please complete the sentence in these function names, e.g.
area_contains_pfn(), which appears to be what this function
is checking. I still don't understand what the
area_or_metadata_contains() function is supposed to be checking
though.

> +{
> +	return (pfn >= a->base) && (pfn < a->top);
> +}
>  
> -	assert_msg(size % PAGE_SIZE == 0, "size not page aligned: %#zx", size);
> +/*
> + * Splits the free block starting at addr into 2 blocks of half the size.
> + *
> + * The function depends on the following assumptions:
> + * - The allocator must have been initialized
> + * - the block must be within the memory area
> + * - all pages in the block must be free and not special
> + * - the pointer must point to the start of the block
> + * - all pages in the block must have the same block size.

pages should all have the same page size, no? How does a
page have a block size?

> + * - the block size must be greater than 0
> + * - the block size must be smaller than the maximum allowed
> + * - the block must be in a free list
> + * - the function is called with the lock held
> + */
> +static void split(struct mem_area *a, void *addr)
> +{
> +	uintptr_t pfn = PFN(addr);

addr appears to be virtual, since it's a 'void *', not a phys_addr_t.
PFN() only does a shift, so that's the virtual PFN. However we're
comparing that with memory area pfn's, which are supposed to be
physical frame numbers, at least according to the comments in the
struct.

> +	struct linked_list *p;
> +	uintptr_t i, idx;
> +	u8 order;
>  
> -	assert_msg(size == 0 || (uintptr_t)mem == -size ||
> -		   (uintptr_t)mem + size > (uintptr_t)mem,
> -		   "mem + size overflow: %p + %#zx", mem, size);
> +	assert(a && area_contains(a, pfn));
> +	idx = pfn - a->base;
> +	order = a->page_states[idx];

pfn - a->base could potentially be huge, so we can't be using that
as an array index.

I think what you want is something like

 pfn = PFN(virt_to_phys(addr));
 order = ffs(pfn) - 1;
 if (pfn != order)
   order <<= 1;

Then you have order and can use it as a index into an array if necessary.

> +	assert(!(order & ~ORDER_MASK) && order && (order < NLISTS));

What's wrong with order==0? What is ORDER_MASK good for?

> +	assert(IS_ALIGNED_ORDER(pfn, order));

I'm getting the feeling that split() shouldn't operate on an addr, but
rather some internal index of a block list or something. There are
just too many requirements as to what addr is supposed to be.

> +	assert(area_contains(a, pfn + BIT(order) - 1));
>  
> -	if (size == 0) {
> -		freelist = NULL;
> -		return;
> -	}
> +	/* Remove the block from its free list */
> +	p = list_remove(addr);

So addr is a pointer to a linked_list object? I'm getting really confused.

My editor says I'm only 30% of my way through this patch, so I don't think
I'm going to have time to look at the rest. I think this patch is trying
to do too much at once. IMO, we need to implement a single new feature at
a time. Possibly starting with some cleanup patches for the current code
first.

Thanks,
drew

> +	assert(p);
>  
> -	spin_lock(&lock);
> -	old_freelist = freelist;
> -	freelist = mem;
> -	end = mem + size;
> -	while (mem + PAGE_SIZE != end) {
> -		*(void **)mem = (mem + PAGE_SIZE);
> -		mem += PAGE_SIZE;
> +	/* update the block size for each page in the block */
> +	for (i = 0; i < BIT(order); i++) {
> +		assert(a->page_states[idx + i] == order);
> +		a->page_states[idx + i] = order - 1;
>  	}
> -
> -	*(void **)mem = old_freelist;
> -	spin_unlock(&lock);
> +	order--;
> +	/* add the first half block to the appropriate free list */
> +	list_add(a->freelists + order, p);
> +	/* add the second half block to the appropriate free list */
> +	list_add(a->freelists + order, (void *)((pfn + BIT(order)) * PAGE_SIZE));
>  }
>  
> -void free_pages_by_order(void *mem, unsigned int order)
> +/*
> + * Returns a block whose alignment and size are at least the parameter values.
> + * If there is not enough free memory, NULL is returned.
> + *
> + * Both parameters must be not larger than the largest allowed order
> + */
> +static void *page_memalign_order(struct mem_area *a, u8 al, u8 sz)
>  {
> -	free_pages(mem, 1ul << (order + PAGE_SHIFT));
> +	struct linked_list *p, *res = NULL;
> +	u8 order;
> +
> +	assert((al < NLISTS) && (sz < NLISTS));
> +	/* we need the bigger of the two as starting point */
> +	order = sz > al ? sz : al;
> +
> +	/* search all free lists for some memory */
> +	for ( ; order < NLISTS; order++) {
> +		p = a->freelists[order].next;
> +		if (!is_list_empty(p))
> +			break;
> +	}
> +	/* out of memory */
> +	if (order >= NLISTS)
> +		return NULL;
> +
> +	/*
> +	 * the block is bigger than what we need because either there were
> +	 * no smaller blocks, or the smaller blocks were not aligned to our
> +	 * needs; therefore we split the block until we reach the needed size
> +	 */
> +	for (; order > sz; order--)
> +		split(a, p);
> +
> +	res = list_remove(p);
> +	memset(a->page_states + (PFN(res) - a->base), ALLOC_MASK | order, BIT(order));
> +	return res;
>  }
>  
> -void *alloc_page()
> +/*
> + * Try to merge two blocks into a bigger one.
> + * Returns true in case of a successful merge.
> + * Merging will succeed only if both blocks have the same block size and are
> + * both free.
> + *
> + * The function depends on the following assumptions:
> + * - the first parameter is strictly smaller than the second
> + * - the parameters must point each to the start of their block
> + * - the two parameters point to adjacent blocks
> + * - the two blocks are both in a free list
> + * - all of the pages of the two blocks must be free
> + * - all of the pages of the two blocks must have the same block size
> + * - the function is called with the lock held
> + */
> +static bool coalesce(struct mem_area *a, u8 order, uintptr_t pfn, uintptr_t pfn2)
>  {
> -	void *p;
> +	uintptr_t first, second, i;
> +	struct linked_list *li;
>  
> -	if (!freelist)
> -		return 0;
> +	assert(IS_ALIGNED_ORDER(pfn, order) && IS_ALIGNED_ORDER(pfn2, order));
> +	assert(pfn2 == pfn + BIT(order));
> +	assert(a);
>  
> -	spin_lock(&lock);
> -	p = freelist;
> -	freelist = *(void **)freelist;
> -	spin_unlock(&lock);
> +	/* attempting to coalesce two blocks that belong to different areas */
> +	if (!area_contains(a, pfn) || !area_contains(a, pfn2 + BIT(order) - 1))
> +		return false;
> +	first = pfn - a->base;
> +	second = pfn2 - a->base;
> +	/* the two blocks have different sizes, cannot coalesce */
> +	if ((a->page_states[first] != order) || (a->page_states[second] != order))
> +		return false;
>  
> -	if (p)
> -		memset(p, 0, PAGE_SIZE);
> -	return p;
> +	/* we can coalesce, remove both blocks from their freelists */
> +	li = list_remove((void *)(pfn2 << PAGE_SHIFT));
> +	assert(li);
> +	li = list_remove((void *)(pfn << PAGE_SHIFT));
> +	assert(li);
> +	/* check the metadata entries and update with the new size */
> +	for (i = 0; i < (2ull << order); i++) {
> +		assert(a->page_states[first + i] == order);
> +		a->page_states[first + i] = order + 1;
> +	}
> +	/* finally add the newly coalesced block to the appropriate freelist */
> +	list_add(a->freelists + order + 1, li);
> +	return true;
>  }
>  
>  /*
> - * Allocates (1 << order) physically contiguous and naturally aligned pages.
> - * Returns NULL if there's no memory left.
> + * Free a block of memory.
> + * The parameter can be NULL, in which case nothing happens.
> + *
> + * The function depends on the following assumptions:
> + * - the parameter is page aligned
> + * - the parameter belongs to an existing memory area
> + * - the parameter points to the beginning of the block
> + * - the size of the block is less than the maximum allowed
> + * - the block is completely contained in its memory area
> + * - all pages in the block have the same block size
> + * - no pages in the memory block were already free
> + * - no pages in the memory block are special
>   */
> -void *alloc_pages(unsigned int order)
> +static void _free_pages(void *mem)
>  {
> -	/* Generic list traversal. */
> -	void *prev;
> -	void *curr = NULL;
> -	void *next = freelist;
> -
> -	/* Looking for a run of length (1 << order). */
> -	unsigned long run = 0;
> -	const unsigned long n = 1ul << order;
> -	const unsigned long align_mask = (n << PAGE_SHIFT) - 1;
> -	void *run_start = NULL;
> -	void *run_prev = NULL;
> -	unsigned long run_next_pa = 0;
> -	unsigned long pa;
> +	uintptr_t pfn2, pfn = PFN(mem);
> +	struct mem_area *a = NULL;
> +	uintptr_t i, p;
> +	u8 order;
>  
> -	assert(order < sizeof(unsigned long) * 8);
> +	if (!mem)
> +		return;
> +	assert(IS_ALIGNED((uintptr_t)mem, PAGE_SIZE));
>  
> -	spin_lock(&lock);
> -	for (;;) {
> -		prev = curr;
> -		curr = next;
> +	/* find which area this pointer belongs to*/
> +	for (i = 0; !a && (i < MAX_AREAS); i++) {
> +		if ((areas_mask & BIT(i)) && area_contains(areas + i, pfn))
> +			a = areas + i;
> +	}
> +	assert_msg(a, "memory does not belong to any area: %p", mem);
>  
> -		if (!curr) {
> -			run_start = NULL;
> -			break;
> -		}
> +	p = pfn - a->base;
> +	order = a->page_states[p] & ORDER_MASK;
>  
> -		next = *((void **) curr);
> -		pa = virt_to_phys(curr);
> -
> -		if (run == 0) {
> -			if (!(pa & align_mask)) {
> -				run_start = curr;
> -				run_prev = prev;
> -				run_next_pa = pa + PAGE_SIZE;
> -				run = 1;
> -			}
> -		} else if (pa == run_next_pa) {
> -			run_next_pa += PAGE_SIZE;
> -			run += 1;
> -		} else {
> -			run = 0;
> -		}
> +	/* ensure that the first page is allocated and not special */
> +	assert(a->page_states[p] == (order | ALLOC_MASK));
> +	/* ensure that the order has a sane value */
> +	assert(order < NLISTS);
> +	/* ensure that the block is aligned properly for its size */
> +	assert(IS_ALIGNED_ORDER(pfn, order));
> +	/* ensure that the area can contain the whole block */
> +	assert(area_contains(a, pfn + BIT(order) - 1));
>  
> -		if (run == n) {
> -			if (run_prev)
> -				*((void **) run_prev) = next;
> -			else
> -				freelist = next;
> -			break;
> -		}
> +	for (i = 0; i < BIT(order); i++) {
> +		/* check that all pages of the block have consistent metadata */
> +		assert(a->page_states[p + i] == (ALLOC_MASK | order));
> +		/* set the page as free */
> +		a->page_states[p + i] &= ~ALLOC_MASK;
>  	}
> -	spin_unlock(&lock);
> -	if (run_start)
> -		memset(run_start, 0, n * PAGE_SIZE);
> -	return run_start;
> +	/* provisionally add the block to the appropriate free list */
> +	list_add(a->freelists + order, mem);
> +	/* try to coalesce the block with neighbouring blocks if possible */
> +	do {
> +		/*
> +		 * get the order again since it might have changed after
> +		 * coalescing in a previous iteration
> +		 */
> +		order = a->page_states[p] & ORDER_MASK;
> +		/*
> +		 * let's consider this block and the next one if this block
> +		 * is aligned to the next size, otherwise let's consider the
> +		 * previous block and this one
> +		 */
> +		if (!IS_ALIGNED_ORDER(pfn, order + 1))
> +			pfn = pfn - BIT(order);
> +		pfn2 = pfn + BIT(order);
> +		/* repeat as long as we manage to coalesce something */
> +	} while (coalesce(a, order, pfn, pfn2));
>  }
>  
> +void free_pages(void *mem, size_t size)
> +{
> +	spin_lock(&lock);
> +	_free_pages(mem);
> +	spin_unlock(&lock);
> +}
>  
> -void free_page(void *page)
> +static void *page_memalign_order_area(unsigned area, u8 ord, u8 al)
>  {
> +	void *res = NULL;
> +	int i;
> +
>  	spin_lock(&lock);
> -	*(void **)page = freelist;
> -	freelist = page;
> +	area &= areas_mask;
> +	for (i = 0; !res && (i < MAX_AREAS); i++)
> +		if (area & BIT(i))
> +			res = page_memalign_order(areas + i, ord, al);
>  	spin_unlock(&lock);
> +	return res;
>  }
>  
> -static void *page_memalign(size_t alignment, size_t size)
> +/*
> + * Allocates (1 << order) physically contiguous and naturally aligned pages.
> + * Returns NULL if the allocation was not possible.
> + */
> +void *alloc_pages_area(unsigned int area, unsigned int order)
>  {
> -	unsigned long n = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
> -	unsigned int order;
> +	return page_memalign_order_area(area, order, order);
> +}
>  
> -	if (!size)
> -		return NULL;
> +void *alloc_pages(unsigned int order)
> +{
> +	return alloc_pages_area(AREA_ANY, order);
> +}
>  
> -	order = get_order(n);
> +/*
> + * Allocates (1 << order) physically contiguous aligned pages.
> + * Returns NULL if the allocation was not possible.
> + */
> +void *memalign_pages_area(unsigned int area, size_t alignment, size_t size)
> +{
> +	assert(is_power_of_2(alignment));
> +	alignment = get_order(PAGE_ALIGN(alignment) >> PAGE_SHIFT);
> +	size = get_order(PAGE_ALIGN(size) >> PAGE_SHIFT);
> +	assert(alignment < NLISTS);
> +	assert(size < NLISTS);
> +	return page_memalign_order_area(area, size, alignment);
> +}
>  
> -	return alloc_pages(order);
> +void *memalign_pages(size_t alignment, size_t size)
> +{
> +	return memalign_pages_area(AREA_ANY, alignment, size);
>  }
>  
> -static void page_free(void *mem, size_t size)
> +/*
> + * Allocates one page
> + */
> +void *alloc_page()
>  {
> -	free_pages(mem, size);
> +	return alloc_pages(0);
>  }
>  
>  static struct alloc_ops page_alloc_ops = {
> -	.memalign = page_memalign,
> -	.free = page_free,
> +	.memalign = memalign_pages,
> +	.free = free_pages,
>  	.align_min = PAGE_SIZE,
>  };
>  
> +/*
> + * Enables the page allocator.
> + *
> + * Prerequisites:
> + * - at least one memory area has been initialized
> + */
>  void page_alloc_ops_enable(void)
>  {
> +	spin_lock(&lock);
> +	assert(page_alloc_initialized());
>  	alloc_ops = &page_alloc_ops;
> +	spin_unlock(&lock);
> +}
> +
> +/*
> + * Adds a new memory area to the pool of available memory.
> + *
> + * Prerequisites:
> + * - the lock is held
> + * - start and top are page frame numbers
> + * - start is smaller than top
> + * - top does not fall outside of addressable memory
> + * - there is at least one more slot free for memory areas
> + * - if a specific memory area number has been indicated, it needs to be free
> + * - the memory area to add does not overlap with existing areas
> + * - the memory area to add has at least 5 pages available
> + */
> +static void _page_alloc_init_area(u8 n, uintptr_t start_pfn, uintptr_t top_pfn)
> +{
> +	size_t table_size, npages, i;
> +	struct mem_area *a;
> +	u8 order = 0;
> +
> +	/* the number must be within the allowed range */
> +	assert(n < MAX_AREAS);
> +	/* the new area number must be unused */
> +	assert(!(areas_mask & BIT(n)));
> +
> +	/* other basic sanity checks */
> +	assert(top_pfn > start_pfn);
> +	assert(top_pfn - start_pfn > 4);
> +	assert(top_pfn < BIT_ULL(sizeof(void *) * 8 - PAGE_SHIFT));
> +
> +	/* calculate the size of the metadata table in pages */
> +	table_size = (top_pfn - start_pfn + PAGE_SIZE) / (PAGE_SIZE + 1);
> +
> +	/* fill in the values of the new area */
> +	a = areas + n;
> +	a->page_states = (void *)(start_pfn << PAGE_SHIFT);
> +	a->base = start_pfn + table_size;
> +	a->top = top_pfn;
> +	npages = top_pfn - a->base;
> +	assert((a->base - start_pfn) * PAGE_SIZE >= npages);
> +
> +	/* check that the new area does not overlap with any existing areas */
> +	for (i = 0; i < MAX_AREAS; i++) {
> +		if (!(areas_mask & BIT(i)))
> +			continue;
> +		assert(!area_or_metadata_contains(areas + i, start_pfn));
> +		assert(!area_or_metadata_contains(areas + i, top_pfn - 1));
> +		assert(!area_or_metadata_contains(a, PFN(areas[i].page_states)));
> +		assert(!area_or_metadata_contains(a, areas[i].top - 1));
> +	}
> +	/* initialize all freelists for the new area */
> +	for (i = 0; i < NLISTS; i++)
> +		a->freelists[i].next = a->freelists[i].prev = a->freelists + i;
> +
> +	/* initialize the metadata for the available memory */
> +	for (i = a->base; i < a->top; i += 1ull << order) {
> +		/* search which order to start from */
> +		while (i + BIT(order) > a->top) {
> +			assert(order);
> +			order--;
> +		}
> +		/*
> +		 * we need both loops, one for the start and the other for
> +		 * the end of the block, in case it spans a power of two
> +		 * boundary
> +		 */
> +		while (IS_ALIGNED_ORDER(i, order + 1) && (i + BIT(order + 1) <= a->top))
> +			order++;
> +		assert(order < NLISTS);
> +		/* initialize the metadata and add to the freelist */
> +		memset(a->page_states + (i - a->base), order, BIT(order));
> +		list_add(a->freelists + order, (void *)(i << PAGE_SHIFT));
> +	}
> +	/* finally mark the area as present */
> +	areas_mask |= BIT(n);
> +}
> +
> +static void __page_alloc_init_area(u8 n, uintptr_t cutoff, uintptr_t base_pfn, uintptr_t *top_pfn)
> +{
> +	if (*top_pfn > cutoff) {
> +		spin_lock(&lock);
> +		if (base_pfn >= cutoff) {
> +			_page_alloc_init_area(n, base_pfn, *top_pfn);
> +			*top_pfn = 0;
> +		} else {
> +			_page_alloc_init_area(n, cutoff, *top_pfn);
> +			*top_pfn = cutoff;
> +		}
> +		spin_unlock(&lock);
> +	}
> +}
> +
> +/*
> + * Adds a new memory area to the pool of available memory.
> + *
> + * Prerequisites:
> + * see _page_alloc_init_area
> + */
> +void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t top_pfn)
> +{
> +	if (n != AREA_ANY_NUMBER) {
> +		__page_alloc_init_area(n, 0, base_pfn, &top_pfn);
> +		return;
> +	}
> +#ifdef AREA_HIGH_PFN
> +	__page_alloc_init_area(AREA_HIGH_NUMBER, AREA_HIGH_PFN, base_pfn, &top_pfn);
> +#endif
> +	__page_alloc_init_area(AREA_NORMAL_NUMBER, AREA_NORMAL_PFN, base_pfn, &top_pfn);
> +#ifdef AREA_LOW_PFN
> +	__page_alloc_init_area(AREA_LOW_NUMBER, AREA_LOW_PFN, base_pfn, &top_pfn);
> +#endif
> +#ifdef AREA_LOWEST_PFN
> +	__page_alloc_init_area(AREA_LOWEST_NUMBER, AREA_LOWEST_PFN, base_pfn, &top_pfn);
> +#endif
>  }
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 78562e4..3f03ca6 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -155,7 +155,7 @@ static void mem_init(phys_addr_t freemem_start)
>  	assert(sizeof(long) == 8 || !(base >> 32));
>  	if (sizeof(long) != 8 && (top >> 32) != 0)
>  		top = ((uint64_t)1 << 32);
> -	free_pages((void *)(unsigned long)base, top - base);
> +	page_alloc_init_area(0, base >> PAGE_SHIFT, top >> PAGE_SHIFT);
>  	page_alloc_ops_enable();
>  }
>  
> diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
> index 4054d0e..4e2ac18 100644
> --- a/lib/s390x/sclp.c
> +++ b/lib/s390x/sclp.c
> @@ -37,11 +37,11 @@ static void mem_init(phys_addr_t mem_end)
>  
>  	phys_alloc_init(freemem_start, mem_end - freemem_start);
>  	phys_alloc_get_unused(&base, &top);
> -	base = (base + PAGE_SIZE - 1) & -PAGE_SIZE;
> -	top = top & -PAGE_SIZE;
> +	base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> +	top = top >> PAGE_SHIFT;
>  
>  	/* Make the pages available to the physical allocator */
> -	free_pages((void *)(unsigned long)base, top - base);
> +	page_alloc_init_area(AREA_ANY_NUMBER, base, top);
>  	page_alloc_ops_enable();
>  }
>  
> diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c
> index 2860e9c..ea93329 100644
> --- a/lib/s390x/smp.c
> +++ b/lib/s390x/smp.c
> @@ -190,7 +190,7 @@ int smp_cpu_setup(uint16_t addr, struct psw psw)
>  
>  	sigp_retry(cpu->addr, SIGP_INITIAL_CPU_RESET, 0, NULL);
>  
> -	lc = alloc_pages(1);
> +	lc = alloc_pages_area(AREA_DMA31, 1);
>  	cpu->lowcore = lc;
>  	memset(lc, 0, PAGE_SIZE * 2);
>  	sigp_retry(cpu->addr, SIGP_SET_PREFIX, (unsigned long )lc, NULL);
> diff --git a/lib/vmalloc.c b/lib/vmalloc.c
> index 2f25734..3aec5ac 100644
> --- a/lib/vmalloc.c
> +++ b/lib/vmalloc.c
> @@ -217,18 +217,19 @@ void setup_vm()
>  	 * so that it can be used to allocate page tables.
>  	 */
>  	if (!page_alloc_initialized()) {
> -		base = PAGE_ALIGN(base);
> -		top = top & -PAGE_SIZE;
> -		free_pages(phys_to_virt(base), top - base);
> +		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> +		top = top >> PAGE_SHIFT;
> +		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
> +		page_alloc_ops_enable();
>  	}
>  
>  	find_highmem();
>  	phys_alloc_get_unused(&base, &top);
>  	page_root = setup_mmu(top);
>  	if (base != top) {
> -		base = PAGE_ALIGN(base);
> -		top = top & -PAGE_SIZE;
> -		free_pages(phys_to_virt(base), top - base);
> +		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> +		top = top >> PAGE_SHIFT;
> +		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
>  	}
>  
>  	spin_lock(&lock);
> -- 
> 2.26.2
> 



* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-05 12:35   ` Claudio Imbrenda
@ 2020-10-05 12:49     ` Andrew Jones
  2020-10-05 12:57     ` Pierre Morel
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2020-10-05 12:49 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Pierre Morel, kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, Oct 05, 2020 at 02:35:03PM +0200, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> [...]
> 
> > While doing a page allocator, the topology is not the only 
> > characteristic we may need to specify.
> > Specific page characteristics like rights, access flags, cache
> > behavior may be useful when testing I/O for some architectures.
> > This obviously will need some connection to the MMU handling.
> > 
> > Wouldn't it be interesting to use a bitmap flag as argument to 
> > page_alloc() to define separate regions, even if the connection with
> > the MMU is done in a future series?
> 
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
>

Given enough need, we could create a collection of functions like

 alloc_pages_ro()
 alloc_pages_uncached()
 ...

These functions wouldn't have generic implementations, only arch-specific
implementations, and those implementations would simply do a typical
allocation, followed by an iteration of each PTE where the arch-specific
flags get set.
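
As a rough sketch (set_page_flags() and PTE_RO below are made-up,
illustrative names, not existing kvm-unit-tests API), an arch could
provide something like:

  /* sketch: allocate a block, then mark every page of it read-only */
  void *alloc_pages_ro(unsigned int order)
  {
	void *mem = alloc_pages(order);
	unsigned long i;

	if (!mem)
		return NULL;
	for (i = 0; i < (1ul << order); i++)
		set_page_flags(mem + i * PAGE_SIZE, PTE_RO);
	return mem;
  }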

Thanks,
drew



* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-05 12:35   ` Claudio Imbrenda
  2020-10-05 12:49     ` Andrew Jones
@ 2020-10-05 12:57     ` Pierre Morel
  2020-10-05 14:59       ` Claudio Imbrenda
  1 sibling, 1 reply; 42+ messages in thread
From: Pierre Morel @ 2020-10-05 12:57 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier



On 2020-10-05 14:35, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> [...]
> 
>> While doing a page allocator, the topology is not the only
>> characteristic we may need to specify.
>> Specific page characteristics like rights, access flags, cache
>> behavior may be useful when testing I/O for some architectures.
>> This obviously will need some connection to the MMU handling.
>>
>> Wouldn't it be interesting to use a bitmap flag as argument to
>> page_alloc() to define separate regions, even if the connection with
>> the MMU is done in a future series?
> 
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
> 

AFAIU the page_allocator() works on virtual addresses if the MMU has 
been initialized.

Considering that more and more tests will enable the MMU by default, 
eventually with a simple logical mapping, it seems to me that having the 
possibility to give the page allocator more information about the page 
access configuration could be interesting.

I find that needing two different interfaces, both related to memory 
handling, to get a proper memory configuration for an I/O page may be 
complicated without some way to link the page allocator and the MMU 
tables together.


-- 
Pierre Morel
IBM Lab Boeblingen


* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-05 12:57     ` Pierre Morel
@ 2020-10-05 14:59       ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05 14:59 UTC (permalink / raw)
  To: Pierre Morel; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 5 Oct 2020 14:57:15 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 2020-10-05 14:35, Claudio Imbrenda wrote:
> > On Mon, 5 Oct 2020 13:54:42 +0200
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> > 
> > [...]
> >   
> >> While doing a page allocator, the topology is not the only
> >> characteristic we may need to specify.
> >> Specific page characteristics like rights, access flags, cache
> >> behavior may be useful when testing I/O for some architectures.
> >> This obviously will need some connection to the MMU handling.
> >>
> >> Wouldn't it be interesting to use a bitmap flag as argument to
> >> page_alloc() to define separate regions, even if the connection
> >> with the MMU is done in a future series?  
> > 
> > the physical allocator is only concerned with the physical pages. if
> > you need special MMU flags to be set, then you should enable the MMU
> > and fiddle with the flags and settings yourself.
> >   
> 
> AFAIU the page_allocator() works on virtual addresses if the MMU has 
> been initialized.

no, it still works on physical addresses, which happen to be identity
mapped by the MMU. don't forget that the page tables are
themselves allocated with the page allocator. 

> Considering that more and more tests will enable the MMU by default, 
> eventually with a simple logical mapping, it seems to me that having
> the possibility to give the page allocator more information about the
> page access configuration could be interesting.

I disagree.

I think we should not violate the layering here.

> I find that using two different interfaces, both related to memory 
> handling, to have a proper memory configuration for an I/O page may
> be complicated without some way to link page allocator and MMU tables
> together.

If you want to allocate an identity mapped page and also change its
properties at the same time, you can always write a wrapper.

keep the page allocator working only on physical addresses, add a
function to change the properties of the mapping, and add a wrapper
for the two.
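
something along these lines (just a sketch; set_mapping_flags() is a
hypothetical function that would change the MMU properties of an
identity-mapped page, it does not exist yet):

  void *alloc_page_with_flags(unsigned long mmu_flags)
  {
	void *page = alloc_page();

	if (page)
		set_mapping_flags(page, mmu_flags);
	return page;
  }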





* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-05 12:40   ` Andrew Jones
@ 2020-10-05 15:56     ` Claudio Imbrenda
  2020-10-05 16:53       ` Andrew Jones
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05 15:56 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 5 Oct 2020 14:40:39 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Fri, Oct 02, 2020 at 05:44:17PM +0200, Claudio Imbrenda wrote:
> > This is a complete rewrite of the page allocator.
> > 
> > This will bring a few improvements:
> > * no need to specify the size when freeing
> > * allocate small areas with a large alignment without wasting memory
> > * ability to initialize and use multiple memory areas (e.g. DMA)
> > * more sanity checks
> > 
> > A few things have changed:
> > * initialization cannot be done with free_pages like before,
> >   page_alloc_init_area has to be used instead
> > 
> > Arch-specific changes:
> > * s390x now uses the area below 2GiB for SMP lowcore initialization.
> > 
> > Details:
> > Each memory area has metadata at the very beginning. The metadata
> > is a byte array with one entry per usable page (so, excluding the
> > metadata itself). Each entry indicates if the page is special
> > (unused for now), if it is allocated, and the order of the block.
> > Both free and allocated pages are part of larger blocks.
> > 
> > Some more fixed size metadata is present in a fixed-size static
> > array. This metadata contains start and end page frame numbers, the
> > pointer to the metadata array, and the array of freelists. The
> > array of freelists has an entry for each possible order (indicated
> > by the macro NLISTS, defined as BITS_PER_LONG - PAGE_SHIFT).
> > 
> > On allocation, if the free list for the needed size is empty, larger
> > blocks are split. When a small allocation with a large alignment is
> > requested, an appropriately large block is split, to guarantee the
> > alignment.
> > 
> > When a block is freed, an attempt will be made to merge it into the
> > neighbour, iterating the process as long as possible.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > ---
> >  lib/alloc_page.h |  63 +++++-
> >  lib/alloc_page.c | 484 +++++++++++++++++++++++++++++++++++++----------
> >  lib/arm/setup.c  |   2 +-
> >  lib/s390x/sclp.c |   6 +-
> >  lib/s390x/smp.c  |   2 +-
> >  lib/vmalloc.c    |  13 +-
> >  6 files changed, 453 insertions(+), 117 deletions(-)
> > 
> > diff --git a/lib/alloc_page.h b/lib/alloc_page.h
> > index 88540d1..81847ae 100644
> > --- a/lib/alloc_page.h
> > +++ b/lib/alloc_page.h
> > @@ -8,12 +8,71 @@
> >  #ifndef ALLOC_PAGE_H
> >  #define ALLOC_PAGE_H 1
> >  
> > +#include <asm/memory_areas.h>
> > +
> > +/* Returns true if the page allocator has been initialized */
> >  bool page_alloc_initialized(void);
> > +
> > +/*
> > + * Initializes a memory area.
> > + * n is the number of the area to initialize
> > + * base_pfn is the physical frame number of the start of the area
> > to initialize
> > + * top_pfn is the physical frame number of the first page
> > immediately after
> > + * the end of the area to initialize
> > + */
> > +void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t
> > top_pfn); +
> > +/* Enables the page allocator. At least one area must have been
> > initialized */ void page_alloc_ops_enable(void);
> > +
> > +/*
> > + * Allocate aligned memory from the specified areas.
> > + * areas is a bitmap of allowed areas
> > + * alignment must be a power of 2
> > + */
> > +void *memalign_pages_area(unsigned int areas, size_t alignment,
> > size_t size);  
> 
> I guess 'size' is the number of pages? Or is it bytes that
> automatically get rounded up to a multiple of PAGE_SIZE?

it's the latter, since this is the backend for memalign. it has the
same parameters as memalign. I will improve the comments.
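
for example (usage sketch), a request for 5000 bytes with a 64 KiB
alignment gets rounded up to two pages, carved out of a 64 KiB-aligned
block:

  void *buf = memalign_pages(64 * 1024, 5000);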

> > +
> > +/*
> > + * Allocate aligned memory from any area.
> > + * Equivalent to memalign_pages_area(~0, alignment, size).
> > + */
> > +void *memalign_pages(size_t alignment, size_t size);
> > +
> > +/*
> > + * Allocate naturally aligned memory from the specified areas.
> > + * Equivalent to memalign_pages_area(areas, 1ull << order, 1ull <<
> > order).
> > + */
> > +void *alloc_pages_area(unsigned int areas, unsigned int order);
> > +
> > +/*
> > + * Allocate one page from any area.
> > + * Equivalent to alloc_pages(0);
> > + */
> >  void *alloc_page(void);
> > +
> > +/*
> > + * Allocate naturally aligned memory from any area.
> > + * Equivalent to alloc_pages_area(~0, order);
> > + */
> >  void *alloc_pages(unsigned int order);
> > -void free_page(void *page);
> > +
> > +/*
> > + * Frees a memory block allocated with any of the memalign_pages*
> > or
> > + * alloc_pages* functions.
> > + * The pointer must point to the start of the block.
> > + */
> >  void free_pages(void *mem, size_t size);
> > -void free_pages_by_order(void *mem, unsigned int order);
> > +
> > +/* For backwards compatibility */
> > +static inline void free_page(void *mem)
> > +{
> > +	return free_pages(mem, 1);
> > +}
> > +
> > +/* For backwards compatibility */  
> 
> We don't preserve interfaces for backwards compatibility in
> kvm-unit-tests. If the interface is no longer a good idea,
> then we can change all the call sites. However, in this case,
> it is a good idea, as it balances alloc_page(), so I'd just
> write "Equivalent to free_pages(mem, 1)".

I will fix the comment

> > +static inline void free_pages_by_order(void *mem, unsigned int
> > order) +{
> > +	free_pages(mem, 1ull << order);
> > +}
> >  
> >  #endif
> > diff --git a/lib/alloc_page.c b/lib/alloc_page.c
> > index 74fe726..29d221f 100644
> > --- a/lib/alloc_page.c
> > +++ b/lib/alloc_page.c
> > @@ -9,169 +9,445 @@
> >  #include "alloc_phys.h"
> >  #include "alloc_page.h"
> >  #include "bitops.h"
> > +#include "list.h"
> >  #include <asm/page.h>
> >  #include <asm/io.h>
> >  #include <asm/spinlock.h>
> > +#include <asm/memory_areas.h>
> >  
> > +#define IS_ALIGNED_ORDER(x,order) IS_ALIGNED((x),BIT_ULL(order))
> > +#define NLISTS ((BITS_PER_LONG) - (PAGE_SHIFT))  
> 
> nit: too many ()

will fix

> > +#define PFN(x) ((uintptr_t)(x) >> PAGE_SHIFT)
> > +
> > +#define MAX_AREAS	6  
> 
> This looks like something that should be defined memory_areas.h.

makes sense. fill fix

> > +
> > +#define ORDER_MASK	0x3f
> > +#define ALLOC_MASK	0x40
> > +
> > +struct mem_area {
> > +	/* Physical frame number of the first usable frame in the
> > area */
> > +	uintptr_t base;
> > +	/* Physical frame number of the first frame outside the
> > area */
> > +	uintptr_t top;
> > +	/* Combination ALLOC_MASK and order */
> > +	u8 *page_states;  
> 
> I can't tell what page_states is supposed to be by its name nor the
> comment.

it's the per-page metadata array, as described in the patch
description. it has one entry for each usable page in the area.
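
the area is laid out with the metadata first and the usable pages right
after it, i.e. (condensed from _page_alloc_init_area() further down):

  a->page_states = (void *)(start_pfn << PAGE_SHIFT); /* metadata first    */
  a->base = start_pfn + table_size;                   /* then usable pages */
  a->top = top_pfn;
  /* the entry for a given pfn is a->page_states[pfn - a->base] */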

> > +	/* One freelist for each possible block size, up to NLISTS
> > */
> > +	struct linked_list freelists[NLISTS];
> > +};
> > +
> > +static struct mem_area areas[MAX_AREAS];
> > +static unsigned int areas_mask;
> >  static struct spinlock lock;
> > -static void *freelist = 0;
> >  
> >  bool page_alloc_initialized(void)
> >  {
> > -	return freelist != 0;
> > +	return areas_mask != 0;
> >  }
> >  
> > -void free_pages(void *mem, size_t size)
> > +static inline bool area_or_metadata_contains(struct mem_area *a,
> > uintptr_t pfn)  
> 
> I can't tell what this function is suppose to return by its name and
> it has no comment.
> 
> >  {
> > -	void *old_freelist;
> > -	void *end;
> > +	return (pfn >= PFN(a->page_states)) && (pfn < a->top);  
> 
> Why are we comparing an input pfn with the page_states virtual
> pfn? I guess you want 'pfn >= base' ?

no, I want to see if the address overlaps the start of the metadata,
which is placed immediately before the usable range. it is used in one
place below, unfortunately below the point at which you gave up reading

> > +}
> >  
> > -	assert_msg((unsigned long) mem % PAGE_SIZE == 0,
> > -		   "mem not page aligned: %p", mem);
> > +static inline bool area_contains(struct mem_area *a, uintptr_t
> > pfn)  
> 
> Please complete the sentence in these function names, e.g.
> area_contains_pfn(), which appears to be what this function
> is checking. I still don't understand what the
> area_or_metadata_contains() function is supposed to be checking
> though.

as written in the patch description, at the very beginning of an area
there is a metadata array, with one entry per page. 

area_or_metadata_contains checks if a pfn falls in the usable area or
in its metadata area, whereas area_contains only checks if a pfn falls
in the usable area.

I will add some more comments to clarify

> > +{
> > +	return (pfn >= a->base) && (pfn < a->top);
> > +}
> >  
> > -	assert_msg(size % PAGE_SIZE == 0, "size not page aligned:
> > %#zx", size); +/*
> > + * Splits the free block starting at addr into 2 blocks of half
> > the size.
> > + *
> > + * The function depends on the following assumptions:
> > + * - The allocator must have been initialized
> > + * - the block must be within the memory area
> > + * - all pages in the block must be free and not special
> > + * - the pointer must point to the start of the block
> > + * - all pages in the block must have the same block size.  
> 
> pages should all have the same page size, no? How does a
> page have a block size?

no, a page belongs to a power-of-two aligned block of pages, in the
worst case it's an order 0 block (so one page), but in most cases it
will belong to a larger block.

this is basically where we store the size of an allocation so we don't
have to provide it back again when freeing
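
when freeing, the size is recovered from the metadata of the first page
of the block, essentially (simplified from _free_pages() below):

  idx = pfn - a->base;                      /* index into page_states[] */
  order = a->page_states[idx] & ORDER_MASK; /* order of the whole block */
  /* the block spans BIT(order) pages starting at pfn */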

> > + * - the block size must be greater than 0
> > + * - the block size must be smaller than the maximum allowed
> > + * - the block must be in a free list
> > + * - the function is called with the lock held
> > + */
> > +static void split(struct mem_area *a, void *addr)
> > +{
> > +	uintptr_t pfn = PFN(addr);  
> 
> addr appears to be virtual, since it's a 'void *', not a phys_addr_t.

it's a pointer, doesn't mean it's virtual. In fact it will be physical,
since the page allocator only works with physical addresses.

if the MMU has been activated and all of memory is virtual but identity
mapped, it does not make any difference.

the addresses do not belong to the vmalloc range

> PFN() only does a shift, so that's the virtual PFN. However we're

again, it's just a PFN

> comparing that with memory area pfn's, which are supposed to be
> physical frame numbers, at least according to the comments in the

correct. we are comparing physical PFNs with other physical PFNs

> struct.
> 
> > +	struct linked_list *p;
> > +	uintptr_t i, idx;
> > +	u8 order;
> >  
> > -	assert_msg(size == 0 || (uintptr_t)mem == -size ||
> > -		   (uintptr_t)mem + size > (uintptr_t)mem,
> > -		   "mem + size overflow: %p + %#zx", mem, size);
> > +	assert(a && area_contains(a, pfn));
> > +	idx = pfn - a->base;
> > +	order = a->page_states[idx];  
> 
> pfn - a->base could potentially be huge, so we can't be using that
> as an array index.

this is exactly what we are doing. we need the index of the page in the
per-page metadata array.

from the array we get the order of the block the page belongs to.
 
> I think what you want is something like
> 
>  pfn = PFN(virt_to_phys(addr));
>  order = ffs(pfn) - 1;
>  if (pfn != order)
>    order <<= 1;
> 
> Then you have order and can use it as a index into an array if
> necessary.

now this does not make any sense

> > +	assert(!(order & ~ORDER_MASK) && order && (order <
> > NLISTS));  
> 
> What's wrong with order==0? What is ORDER_MASK good for?

as written in the comment in the struct, the metadata contains the
order and a bit to mark when a page is allocated. ORDER_MASK is used to
extract the order part.

a block with order 0 is a single page. you cannot split a page. that's
why order 0 is bad.

split takes a pointer to a free block and splits the block in two blocks
of half the size each. this is needed when there are no free blocks of
the suitable size, but there are bigger free blocks

> > +	assert(IS_ALIGNED_ORDER(pfn, order));  
> 
> I'm getting the feeling that split() shouldn't operate on an addr, but
> rather some internal index of a block list or something. There are
> just too many requirements as to what addr is supposed to be.

split is an internal function, it's static for a reason. basically
there is exactly one spot where it should be used, and that is the one
where it is used.
 
> > +	assert(area_contains(a, pfn + BIT(order) - 1));
> >  
> > -	if (size == 0) {
> > -		freelist = NULL;
> > -		return;
> > -	}
> > +	/* Remove the block from its free list */
> > +	p = list_remove(addr);  
> 
> So addr is a pointer to a linked_list object? I'm getting really
> confused.

it's literally a free list; the beginning of each free block of memory
contains the pointers so that it can be handled using list operations

> My editor says I'm only 30% of my way through this patch, so I don't
> think I'm going to have time to look at the rest. I think this patch
> is trying to do too much at once. IMO, we need to implement a single

this patch is implementing the page allocator; it is a single new
feature.

> new feature at a time. Possibly starting with some cleanup patches
> for the current code first.

there is no way to split this any further, I need to break some
interfaces, so I need to do it all at once.

> Thanks,
> drew
> 
> > +	assert(p);
> >  
> > -	spin_lock(&lock);
> > -	old_freelist = freelist;
> > -	freelist = mem;
> > -	end = mem + size;
> > -	while (mem + PAGE_SIZE != end) {
> > -		*(void **)mem = (mem + PAGE_SIZE);
> > -		mem += PAGE_SIZE;
> > +	/* update the block size for each page in the block */
> > +	for (i = 0; i < BIT(order); i++) {
> > +		assert(a->page_states[idx + i] == order);
> > +		a->page_states[idx + i] = order - 1;
> >  	}
> > -
> > -	*(void **)mem = old_freelist;
> > -	spin_unlock(&lock);
> > +	order--;
> > +	/* add the first half block to the appropriate free list */
> > +	list_add(a->freelists + order, p);
> > +	/* add the second half block to the appropriate free list
> > */
> > +	list_add(a->freelists + order, (void *)((pfn + BIT(order))
> > * PAGE_SIZE)); }
> >  
> > -void free_pages_by_order(void *mem, unsigned int order)
> > +/*
> > + * Returns a block whose alignment and size are at least the
> > parameter values.
> > + * If there is not enough free memory, NULL is returned.
> > + *
> > + * Both parameters must be not larger than the largest allowed
> > order
> > + */
> > +static void *page_memalign_order(struct mem_area *a, u8 al, u8 sz)
> >  {
> > -	free_pages(mem, 1ul << (order + PAGE_SHIFT));
> > +	struct linked_list *p, *res = NULL;
> > +	u8 order;
> > +
> > +	assert((al < NLISTS) && (sz < NLISTS));
> > +	/* we need the bigger of the two as starting point */
> > +	order = sz > al ? sz : al;
> > +
> > +	/* search all free lists for some memory */
> > +	for ( ; order < NLISTS; order++) {
> > +		p = a->freelists[order].next;
> > +		if (!is_list_empty(p))
> > +			break;
> > +	}
> > +	/* out of memory */
> > +	if (order >= NLISTS)
> > +		return NULL;
> > +
> > +	/*
> > +	 * the block is bigger than what we need because either
> > there were
> > +	 * no smaller blocks, or the smaller blocks were not
> > aligned to our
> > +	 * needs; therefore we split the block until we reach the
> > needed size
> > +	 */
> > +	for (; order > sz; order--)
> > +		split(a, p);
> > +
> > +	res = list_remove(p);
> > +	memset(a->page_states + (PFN(res) - a->base), ALLOC_MASK |
> > order, BIT(order));
> > +	return res;
> >  }
> >  
> > -void *alloc_page()
> > +/*
> > + * Try to merge two blocks into a bigger one.
> > + * Returns true in case of a successful merge.
> > + * Merging will succeed only if both blocks have the same block
> > size and are
> > + * both free.
> > + *
> > + * The function depends on the following assumptions:
> > + * - the first parameter is strictly smaller than the second
> > + * - the parameters must point each to the start of their block
> > + * - the two parameters point to adjacent blocks
> > + * - the two blocks are both in a free list
> > + * - all of the pages of the two blocks must be free
> > + * - all of the pages of the two blocks must have the same block
> > size
> > + * - the function is called with the lock held
> > + */
> > +static bool coalesce(struct mem_area *a, u8 order, uintptr_t pfn,
> > uintptr_t pfn2) {
> > -	void *p;
> > +	uintptr_t first, second, i;
> > +	struct linked_list *li;
> >  
> > -	if (!freelist)
> > -		return 0;
> > +	assert(IS_ALIGNED_ORDER(pfn, order) &&
> > IS_ALIGNED_ORDER(pfn2, order));
> > +	assert(pfn2 == pfn + BIT(order));
> > +	assert(a);
> >  
> > -	spin_lock(&lock);
> > -	p = freelist;
> > -	freelist = *(void **)freelist;
> > -	spin_unlock(&lock);
> > +	/* attempting to coalesce two blocks that belong to
> > different areas */
> > +	if (!area_contains(a, pfn) || !area_contains(a, pfn2 +
> > BIT(order) - 1))
> > +		return false;
> > +	first = pfn - a->base;
> > +	second = pfn2 - a->base;
> > +	/* the two blocks have different sizes, cannot coalesce */
> > +	if ((a->page_states[first] != order) ||
> > (a->page_states[second] != order))
> > +		return false;
> >  
> > -	if (p)
> > -		memset(p, 0, PAGE_SIZE);
> > -	return p;
> > +	/* we can coalesce, remove both blocks from their
> > freelists */
> > +	li = list_remove((void *)(pfn2 << PAGE_SHIFT));
> > +	assert(li);
> > +	li = list_remove((void *)(pfn << PAGE_SHIFT));
> > +	assert(li);
> > +	/* check the metadata entries and update with the new size
> > */
> > +	for (i = 0; i < (2ull << order); i++) {
> > +		assert(a->page_states[first + i] == order);
> > +		a->page_states[first + i] = order + 1;
> > +	}
> > +	/* finally add the newly coalesced block to the
> > appropriate freelist */
> > +	list_add(a->freelists + order + 1, li);
> > +	return true;
> >  }
> >  
> >  /*
> > - * Allocates (1 << order) physically contiguous and naturally
> > aligned pages.
> > - * Returns NULL if there's no memory left.
> > + * Free a block of memory.
> > + * The parameter can be NULL, in which case nothing happens.
> > + *
> > + * The function depends on the following assumptions:
> > + * - the parameter is page aligned
> > + * - the parameter belongs to an existing memory area
> > + * - the parameter points to the beginning of the block
> > + * - the size of the block is less than the maximum allowed
> > + * - the block is completely contained in its memory area
> > + * - all pages in the block have the same block size
> > + * - no pages in the memory block were already free
> > + * - no pages in the memory block are special
> >   */
> > -void *alloc_pages(unsigned int order)
> > +static void _free_pages(void *mem)
> >  {
> > -	/* Generic list traversal. */
> > -	void *prev;
> > -	void *curr = NULL;
> > -	void *next = freelist;
> > -
> > -	/* Looking for a run of length (1 << order). */
> > -	unsigned long run = 0;
> > -	const unsigned long n = 1ul << order;
> > -	const unsigned long align_mask = (n << PAGE_SHIFT) - 1;
> > -	void *run_start = NULL;
> > -	void *run_prev = NULL;
> > -	unsigned long run_next_pa = 0;
> > -	unsigned long pa;
> > +	uintptr_t pfn2, pfn = PFN(mem);
> > +	struct mem_area *a = NULL;
> > +	uintptr_t i, p;
> > +	u8 order;
> >  
> > -	assert(order < sizeof(unsigned long) * 8);
> > +	if (!mem)
> > +		return;
> > +	assert(IS_ALIGNED((uintptr_t)mem, PAGE_SIZE));
> >  
> > -	spin_lock(&lock);
> > -	for (;;) {
> > -		prev = curr;
> > -		curr = next;
> > +	/* find which area this pointer belongs to*/
> > +	for (i = 0; !a && (i < MAX_AREAS); i++) {
> > +		if ((areas_mask & BIT(i)) && area_contains(areas +
> > i, pfn))
> > +			a = areas + i;
> > +	}
> > +	assert_msg(a, "memory does not belong to any area: %p",
> > mem); 
> > -		if (!curr) {
> > -			run_start = NULL;
> > -			break;
> > -		}
> > +	p = pfn - a->base;
> > +	order = a->page_states[p] & ORDER_MASK;
> >  
> > -		next = *((void **) curr);
> > -		pa = virt_to_phys(curr);
> > -
> > -		if (run == 0) {
> > -			if (!(pa & align_mask)) {
> > -				run_start = curr;
> > -				run_prev = prev;
> > -				run_next_pa = pa + PAGE_SIZE;
> > -				run = 1;
> > -			}
> > -		} else if (pa == run_next_pa) {
> > -			run_next_pa += PAGE_SIZE;
> > -			run += 1;
> > -		} else {
> > -			run = 0;
> > -		}
> > +	/* ensure that the first page is allocated and not special
> > */
> > +	assert(a->page_states[p] == (order | ALLOC_MASK));
> > +	/* ensure that the order has a sane value */
> > +	assert(order < NLISTS);
> > +	/* ensure that the block is aligned properly for its size
> > */
> > +	assert(IS_ALIGNED_ORDER(pfn, order));
> > +	/* ensure that the area can contain the whole block */
> > +	assert(area_contains(a, pfn + BIT(order) - 1));
> >  
> > -		if (run == n) {
> > -			if (run_prev)
> > -				*((void **) run_prev) = next;
> > -			else
> > -				freelist = next;
> > -			break;
> > -		}
> > +	for (i = 0; i < BIT(order); i++) {
> > +		/* check that all pages of the block have
> > consistent metadata */
> > +		assert(a->page_states[p + i] == (ALLOC_MASK |
> > order));
> > +		/* set the page as free */
> > +		a->page_states[p + i] &= ~ALLOC_MASK;
> >  	}
> > -	spin_unlock(&lock);
> > -	if (run_start)
> > -		memset(run_start, 0, n * PAGE_SIZE);
> > -	return run_start;
> > +	/* provisionally add the block to the appropriate free
> > list */
> > +	list_add(a->freelists + order, mem);
> > +	/* try to coalesce the block with neighbouring blocks if
> > possible */
> > +	do {
> > +		/*
> > +		 * get the order again since it might have changed
> > after
> > +		 * coalescing in a previous iteration
> > +		 */
> > +		order = a->page_states[p] & ORDER_MASK;
> > +		/*
> > +		 * let's consider this block and the next one if
> > this block
> > +		 * is aligned to the next size, otherwise let's
> > consider the
> > +		 * previous block and this one
> > +		 */
> > +		if (!IS_ALIGNED_ORDER(pfn, order + 1))
> > +			pfn = pfn - BIT(order);
> > +		pfn2 = pfn + BIT(order);
> > +		/* repeat as long as we manage to coalesce
> > something */
> > +	} while (coalesce(a, order, pfn, pfn2));
> >  }
> >  
> > +void free_pages(void *mem, size_t size)
> > +{
> > +	spin_lock(&lock);
> > +	_free_pages(mem);
> > +	spin_unlock(&lock);
> > +}
> >  
> > -void free_page(void *page)
> > +static void *page_memalign_order_area(unsigned area, u8 ord, u8 al)
> >  {
> > +	void *res = NULL;
> > +	int i;
> > +
> >  	spin_lock(&lock);
> > -	*(void **)page = freelist;
> > -	freelist = page;
> > +	area &= areas_mask;
> > +	for (i = 0; !res && (i < MAX_AREAS); i++)
> > +		if (area & BIT(i))
> > +			res = page_memalign_order(areas + i, ord,
> > al); spin_unlock(&lock);
> > +	return res;
> >  }
> >  
> > -static void *page_memalign(size_t alignment, size_t size)
> > +/*
> > + * Allocates (1 << order) physically contiguous and naturally
> > aligned pages.
> > + * Returns NULL if the allocation was not possible.
> > + */
> > +void *alloc_pages_area(unsigned int area, unsigned int order)
> >  {
> > -	unsigned long n = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
> > -	unsigned int order;
> > +	return page_memalign_order_area(area, order, order);
> > +}
> >  
> > -	if (!size)
> > -		return NULL;
> > +void *alloc_pages(unsigned int order)
> > +{
> > +	return alloc_pages_area(AREA_ANY, order);
> > +}
> >  
> > -	order = get_order(n);
> > +/*
> > + * Allocates (1 << order) physically contiguous aligned pages.
> > + * Returns NULL if the allocation was not possible.
> > + */
> > +void *memalign_pages_area(unsigned int area, size_t alignment,
> > size_t size) +{
> > +	assert(is_power_of_2(alignment));
> > +	alignment = get_order(PAGE_ALIGN(alignment) >> PAGE_SHIFT);
> > +	size = get_order(PAGE_ALIGN(size) >> PAGE_SHIFT);
> > +	assert(alignment < NLISTS);
> > +	assert(size < NLISTS);
> > +	return page_memalign_order_area(area, size, alignment);
> > +}
> >  
> > -	return alloc_pages(order);
> > +void *memalign_pages(size_t alignment, size_t size)
> > +{
> > +	return memalign_pages_area(AREA_ANY, alignment, size);
> >  }
> >  
> > -static void page_free(void *mem, size_t size)
> > +/*
> > + * Allocates one page
> > + */
> > +void *alloc_page()
> >  {
> > -	free_pages(mem, size);
> > +	return alloc_pages(0);
> >  }
> >  
> >  static struct alloc_ops page_alloc_ops = {
> > -	.memalign = page_memalign,
> > -	.free = page_free,
> > +	.memalign = memalign_pages,
> > +	.free = free_pages,
> >  	.align_min = PAGE_SIZE,
> >  };
> >  
> > +/*
> > + * Enables the page allocator.
> > + *
> > + * Prerequisites:
> > + * - at least one memory area has been initialized
> > + */
> >  void page_alloc_ops_enable(void)
> >  {
> > +	spin_lock(&lock);
> > +	assert(page_alloc_initialized());
> >  	alloc_ops = &page_alloc_ops;
> > +	spin_unlock(&lock);
> > +}
> > +
> > +/*
> > + * Adds a new memory area to the pool of available memory.
> > + *
> > + * Prerequisites:
> > + * - the lock is held
> > + * - start and top are page frame numbers
> > + * - start is smaller than top
> > + * - top does not fall outside of addressable memory
> > + * - there is at least one more slot free for memory areas
> > + * - if a specific memory area number has been indicated, it needs
> > to be free
> > + * - the memory area to add does not overlap with existing areas
> > + * - the memory area to add has at least 5 pages available
> > + */
> > +static void _page_alloc_init_area(u8 n, uintptr_t start_pfn,
> > uintptr_t top_pfn) +{
> > +	size_t table_size, npages, i;
> > +	struct mem_area *a;
> > +	u8 order = 0;
> > +
> > +	/* the number must be within the allowed range */
> > +	assert(n < MAX_AREAS);
> > +	/* the new area number must be unused */
> > +	assert(!(areas_mask & BIT(n)));
> > +
> > +	/* other basic sanity checks */
> > +	assert(top_pfn > start_pfn);
> > +	assert(top_pfn - start_pfn > 4);
> > +	assert(top_pfn < BIT_ULL(sizeof(void *) * 8 - PAGE_SHIFT));
> > +
> > +	/* calculate the size of the metadata table in pages */
> > +	table_size = (top_pfn - start_pfn + PAGE_SIZE) /
> > (PAGE_SIZE + 1); +
> > +	/* fill in the values of the new area */
> > +	a = areas + n;
> > +	a->page_states = (void *)(start_pfn << PAGE_SHIFT);
> > +	a->base = start_pfn + table_size;
> > +	a->top = top_pfn;
> > +	npages = top_pfn - a->base;
> > +	assert((a->base - start_pfn) * PAGE_SIZE >= npages);
> > +
> > +	/* check that the new area does not overlap with any
> > existing areas */
> > +	for (i = 0; i < MAX_AREAS; i++) {
> > +		if (!(areas_mask & BIT(i)))
> > +			continue;
> > +		assert(!area_or_metadata_contains(areas + i,
> > start_pfn));
> > +		assert(!area_or_metadata_contains(areas + i,
> > top_pfn - 1));
> > +		assert(!area_or_metadata_contains(a,
> > PFN(areas[i].page_states)));
> > +		assert(!area_or_metadata_contains(a, areas[i].top
> > - 1));
> > +	}
> > +	/* initialize all freelists for the new area */
> > +	for (i = 0; i < NLISTS; i++)
> > +		a->freelists[i].next = a->freelists[i].prev =
> > a->freelists + i; +
> > +	/* initialize the metadata for the available memory */
> > +	for (i = a->base; i < a->top; i += 1ull << order) {
> > +		/* search which order to start from */
> > +		while (i + BIT(order) > a->top) {
> > +			assert(order);
> > +			order--;
> > +		}
> > +		/*
> > +		 * we need both loops, one for the start and the
> > other for
> > +		 * the end of the block, in case it spans a power
> > of two
> > +		 * boundary
> > +		 */
> > +		while (IS_ALIGNED_ORDER(i, order + 1) && (i +
> > BIT(order + 1) <= a->top))
> > +			order++;
> > +		assert(order < NLISTS);
> > +		/* initialize the metadata and add to the freelist
> > */
> > +		memset(a->page_states + (i - a->base), order,
> > BIT(order));
> > +		list_add(a->freelists + order, (void *)(i <<
> > PAGE_SHIFT));
> > +	}
> > +	/* finally mark the area as present */
> > +	areas_mask |= BIT(n);
> > +}
> > +
> > +static void __page_alloc_init_area(u8 n, uintptr_t cutoff,
> > uintptr_t base_pfn, uintptr_t *top_pfn) +{
> > +	if (*top_pfn > cutoff) {
> > +		spin_lock(&lock);
> > +		if (base_pfn >= cutoff) {
> > +			_page_alloc_init_area(n, base_pfn,
> > *top_pfn);
> > +			*top_pfn = 0;
> > +		} else {
> > +			_page_alloc_init_area(n, cutoff, *top_pfn);
> > +			*top_pfn = cutoff;
> > +		}
> > +		spin_unlock(&lock);
> > +	}
> > +}
> > +
> > +/*
> > + * Adds a new memory area to the pool of available memory.
> > + *
> > + * Prerequisites:
> > + * see _page_alloc_init_area
> > + */
> > +void page_alloc_init_area(u8 n, uintptr_t base_pfn, uintptr_t
> > top_pfn) +{
> > +	if (n != AREA_ANY_NUMBER) {
> > +		__page_alloc_init_area(n, 0, base_pfn, &top_pfn);
> > +		return;
> > +	}
> > +#ifdef AREA_HIGH_PFN
> > +	__page_alloc_init_area(AREA_HIGH_NUMBER, AREA_HIGH_PFN,
> > base_pfn, &top_pfn); +#endif
> > +	__page_alloc_init_area(AREA_NORMAL_NUMBER,
> > AREA_NORMAL_PFN, base_pfn, &top_pfn); +#ifdef AREA_LOW_PFN
> > +	__page_alloc_init_area(AREA_LOW_NUMBER, AREA_LOW_PFN,
> > base_pfn, &top_pfn); +#endif
> > +#ifdef AREA_LOWEST_PFN
> > +	__page_alloc_init_area(AREA_LOWEST_NUMBER,
> > AREA_LOWEST_PFN, base_pfn, &top_pfn); +#endif
> >  }
> > diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> > index 78562e4..3f03ca6 100644
> > --- a/lib/arm/setup.c
> > +++ b/lib/arm/setup.c
> > @@ -155,7 +155,7 @@ static void mem_init(phys_addr_t freemem_start)
> >  	assert(sizeof(long) == 8 || !(base >> 32));
> >  	if (sizeof(long) != 8 && (top >> 32) != 0)
> >  		top = ((uint64_t)1 << 32);
> > -	free_pages((void *)(unsigned long)base, top - base);
> > +	page_alloc_init_area(0, base >> PAGE_SHIFT, top >>
> > PAGE_SHIFT); page_alloc_ops_enable();
> >  }
> >  
> > diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c
> > index 4054d0e..4e2ac18 100644
> > --- a/lib/s390x/sclp.c
> > +++ b/lib/s390x/sclp.c
> > @@ -37,11 +37,11 @@ static void mem_init(phys_addr_t mem_end)
> >  
> >  	phys_alloc_init(freemem_start, mem_end - freemem_start);
> >  	phys_alloc_get_unused(&base, &top);
> > -	base = (base + PAGE_SIZE - 1) & -PAGE_SIZE;
> > -	top = top & -PAGE_SIZE;
> > +	base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> > +	top = top >> PAGE_SHIFT;
> >  
> >  	/* Make the pages available to the physical allocator */
> > -	free_pages((void *)(unsigned long)base, top - base);
> > +	page_alloc_init_area(AREA_ANY_NUMBER, base, top);
> >  	page_alloc_ops_enable();
> >  }
> >  
> > diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c
> > index 2860e9c..ea93329 100644
> > --- a/lib/s390x/smp.c
> > +++ b/lib/s390x/smp.c
> > @@ -190,7 +190,7 @@ int smp_cpu_setup(uint16_t addr, struct psw psw)
> >  
> >  	sigp_retry(cpu->addr, SIGP_INITIAL_CPU_RESET, 0, NULL);
> >  
> > -	lc = alloc_pages(1);
> > +	lc = alloc_pages_area(AREA_DMA31, 1);
> >  	cpu->lowcore = lc;
> >  	memset(lc, 0, PAGE_SIZE * 2);
> >  	sigp_retry(cpu->addr, SIGP_SET_PREFIX, (unsigned long )lc,
> > NULL); diff --git a/lib/vmalloc.c b/lib/vmalloc.c
> > index 2f25734..3aec5ac 100644
> > --- a/lib/vmalloc.c
> > +++ b/lib/vmalloc.c
> > @@ -217,18 +217,19 @@ void setup_vm()
> >  	 * so that it can be used to allocate page tables.
> >  	 */
> >  	if (!page_alloc_initialized()) {
> > -		base = PAGE_ALIGN(base);
> > -		top = top & -PAGE_SIZE;
> > -		free_pages(phys_to_virt(base), top - base);
> > +		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> > +		top = top >> PAGE_SHIFT;
> > +		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
> > +		page_alloc_ops_enable();
> >  	}
> >  
> >  	find_highmem();
> >  	phys_alloc_get_unused(&base, &top);
> >  	page_root = setup_mmu(top);
> >  	if (base != top) {
> > -		base = PAGE_ALIGN(base);
> > -		top = top & -PAGE_SIZE;
> > -		free_pages(phys_to_virt(base), top - base);
> > +		base = PAGE_ALIGN(base) >> PAGE_SHIFT;
> > +		top = top >> PAGE_SHIFT;
> > +		page_alloc_init_area(AREA_ANY_NUMBER, base, top);
> >  	}
> >  
> >  	spin_lock(&lock);
> > -- 
> > 2.26.2
> >   
> 



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-05 15:56     ` Claudio Imbrenda
@ 2020-10-05 16:53       ` Andrew Jones
  2020-10-05 17:18         ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-10-05 16:53 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, Oct 05, 2020 at 05:56:13PM +0200, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 14:40:39 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> 
> > On Fri, Oct 02, 2020 at 05:44:17PM +0200, Claudio Imbrenda wrote:
> > >  {
> > > -	void *old_freelist;
> > > -	void *end;
> > > +	return (pfn >= PFN(a->page_states)) && (pfn < a->top);  
> > 
> > Why are we comparing an input pfn with the page_states virtual
> > pfn? I guess you want 'pfn >= base' ?
> 
> no, I want to see if the address overlaps the start of the metadata,
> which is placed immediately before the usable range. it is used in one
> place below, unfortunately below the point at which you gave up reading

But we're mixing and matching virtual and physical addresses. I assume
pfn is the physical frame number, thus physical. a->top should also
be physical, but PFN(a->page_states) is virtual. Now that I understand
what this function does, I think you want something like

  meta_start = PFN(virt_to_phys(a->page_states));
  meta_end = PFN(virt_to_phys(&a->page_states[block_size]));

  (pfn >= meta_start && pfn < meta_end) || (pfn >= a->base && pfn < a->top)

There's also another metadata page to consider though, right? The one
where the page_states pointer is stored.

> > > + *
> > > + * The function depends on the following assumptions:
> > > + * - The allocator must have been initialized
> > > + * - the block must be within the memory area
> > > + * - all pages in the block must be free and not special
> > > + * - the pointer must point to the start of the block
> > > + * - all pages in the block must have the same block size.  
> > 
> > pages should all have the same page size, no? How does a
> > page have a block size?
> 
> no, a page belongs to a power-of-two aligned block of pages, in the
> worst case it's an order 0 block (so one page), but in most cases it
> will belong to a larger block.

So the comment should be "all blocks in the memory area must have
the same block size" ?

> 
> this is basically where we store the size of an allocation so we don't
> have to provide it back again when freeing
> 
> > > + * - the block size must be greater than 0
> > > + * - the block size must be smaller than the maximum allowed
> > > + * - the block must be in a free list
> > > + * - the function is called with the lock held
> > > + */
> > > +static void split(struct mem_area *a, void *addr)
> > > +{
> > > +	uintptr_t pfn = PFN(addr);  
> > 
> > addr appears to be virtual, since it's a 'void *', not a phys_addr_t.
> 
> it's a pointer, doesn't mean it's virtual. In fact it will be physical,
> since the page allocator only works with physical addresses.

That won't work for 32-bit processors that support physical addresses
that are larger than 32-bits - if the target has more than 4G and the
unit test wants to access that high memory. Using phys_addr_t everywhere
you want a physical address avoids that problem.

> 
> if the MMU has been activated and all of memory is virtual but identity
> mapped, it does not make any difference.

True for 64-bit targets, not possible for targets than can't address all
memory. Also, I don't think we should write general code that assumes we
have an identity map, or that the current virtual mapping we're using is
the identity mapping. It's best to be explicit with 

 phys_addr_t paddr = virt_to_phys(vaddr);

and then use paddr wherever a physical address is expected, not a pointer.

> 
> the addresses do not belong to the vmalloc range
> 
> > PFN() only does a shift, so that's the virtual PFN. However we're
> 
> again, it's just a PFN
> 
> > comparing that with memory area pfn's, which are supposed to be
> > physical frame numbers, at least according to the comments in the
> 
> correct. we are comparing physical PFNs with other physical PFNs

Maybe. But that's not obvious from the code.

> 
> > struct.
> > 
> > > +	struct linked_list *p;
> > > +	uintptr_t i, idx;
> > > +	u8 order;
> > >  
> > > -	assert_msg(size == 0 || (uintptr_t)mem == -size ||
> > > -		   (uintptr_t)mem + size > (uintptr_t)mem,
> > > -		   "mem + size overflow: %p + %#zx", mem, size);
> > > +	assert(a && area_contains(a, pfn));
> > > +	idx = pfn - a->base;
> > > +	order = a->page_states[idx];  
> > 
> > pfn - a->base could potentially be huge, so we can't be using that
> > as an array index.
> 
> this is exactly what we are doing. we need the index of the page in the
> per-page metadata array.

Ah right, one byte per page. That's indeed going to be a large array.

> 
> from the array we get the order of the block the page belongs to.
>  
> > I think what you want is something like
> > 
> >  pfn = PFN(virt_to_phys(addr));
> >  order = ffs(pfn) - 1;
> >  if (pfn != order)
> >    order <<= 1;
> > 
> > Then you have order and can use it as a index into an array if
> > necessary.
> 
> now this does not make any sense

I was thinking we wanted the order of the address, not of some block
the address is somehow associated with and mapped to by this byte.

> 
> > > +	assert(!(order & ~ORDER_MASK) && order && (order <
> > > NLISTS));  
> > 
> > What's wrong with order==0? What is ORDER_MASK good for?
> 
> as written in the comment in the struct, the metadata contains the
> order and a bit to mark when a page is allocated. ORDER_MASK is used to
> extract the order part.
> 
> a block with order 0 is a single page. you cannot split a page. that's
> why order 0 is bad.
> 
> split takes a pointer to a free block and splits the block in two blocks
> of half the size each. this is needed when there are no free blocks of
> the suitable size, but there are bigger free blocks
> 
> > > +	assert(IS_ALIGNED_ORDER(pfn, order));  
> > 
> > I'm getting the feeling that split() shouldn't operate on an addr, but
> > rather some internal index of a block list or something. There are
> > just too many requirements as to what addr is supposed to be.
> 
> split is an internal function, it's static for a reason. basically
> there is exactly one spot where it should be used, and that is the one
> where it is used.
>  
> > > +	assert(area_contains(a, pfn + BIT(order) - 1));
> > >  
> > > -	if (size == 0) {
> > > -		freelist = NULL;
> > > -		return;
> > > -	}
> > > +	/* Remove the block from its free list */
> > > +	p = list_remove(addr);  
> > 
> > So addr is a pointer to a linked_list object? I'm getting really
> > confused.
> 
> it's literally a free list; the beginning of each free block of memory
> contains the pointers so that it can be handled using list operations

Using a list member for list operations and container_of() to get back to
the object would help readability.

> 
> > My editor says I'm only 30% of my way through this patch, so I don't
> > think I'm going to have time to look at the rest. I think this patch
> > is trying to do too much at once. IMO, we need to implement a single
> 
> this patch is implementing the page allocator; it is a single new
> feature.
> 
> > new feature at a time. Possibly starting with some cleanup patches
> > for the current code first.
> 
> there is no way to split this any further, I need to break some
> interfaces, so I need to do it all at once.

You can always build up a new implementation with a series of patches
that don't wire up to anything. A final patch can simply wire up the
new and delete the old.

Thanks,
drew



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-05 16:53       ` Andrew Jones
@ 2020-10-05 17:18         ` Claudio Imbrenda
  2020-10-05 18:04           ` Andrew Jones
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-10-05 17:18 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 5 Oct 2020 18:53:11 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Mon, Oct 05, 2020 at 05:56:13PM +0200, Claudio Imbrenda wrote:
> > On Mon, 5 Oct 2020 14:40:39 +0200
> > Andrew Jones <drjones@redhat.com> wrote:
> >   
> > > On Fri, Oct 02, 2020 at 05:44:17PM +0200, Claudio Imbrenda wrote:
> > >  
> > > >  {
> > > > -	void *old_freelist;
> > > > -	void *end;
> > > > +	return (pfn >= PFN(a->page_states)) && (pfn < a->top);
> > > >    
> > > 
> > > Why are we comparing an input pfn with the page_states virtual
> > > pfn? I guess you want 'pfn >= base' ?  
> > 
> > no, I want to see if the address overlaps the start of the metadata,
> > which is placed immediately before the usable range. it is used in
> > one place below, unfortunately below the point at which you gave up
> > reading  
> 
> But we're mixing and matching virtual and physical addresses. I assume

the page allocator should work with physical pages and addresses, since
it works at a lower level than virtual addresses. 

> pfn is the physical frame number, thus physical. a->top should also
> be physical, but PFN(a->page_states) is virtual. Now that I understand
> what this function does, I think you want something like
> 
>   meta_start = PFN(virt_to_phys(a->page_states));
>   meta_end = PFN(virt_to_phys(&a->page_states[block_size]));
> 
>   (pfn >= meta_start && pfn < meta_end) || (pfn >= a->base && pfn <
> a->top)
> 
> There's also another metadata page to consider though, right? The one
> where the page_states pointer is stored.

no, that's a small static array, it's not stored inside the area
 
> > > > + *
> > > > + * The function depends on the following assumptions:
> > > > + * - The allocator must have been initialized
> > > > + * - the block must be within the memory area
> > > > + * - all pages in the block must be free and not special
> > > > + * - the pointer must point to the start of the block
> > > > + * - all pages in the block must have the same block size.    
> > > 
> > > pages should all have the same page size, no? How does a
> > > page have a block size?  
> > 
> > no, a page belongs to a power-of-two aligned block of pages, in the
> > worst case it's an order 0 block (so one page), but in most cases it
> > will belong to a larger block.  
> 
> So the comment should be "all blocks in the memory area must have
> the same block size" ?

no, all pages belonging to the block must have the same block size.
it's a basic sanity check. if a block of order e.g. 5 has some pages
marked with a different order, then something went terribly wrong.

> > 
> > this is basically where we store the size of an allocation so we
> > don't have to provide it back again when freeing
> >   
> > > > + * - the block size must be greater than 0
> > > > + * - the block size must be smaller than the maximum allowed
> > > > + * - the block must be in a free list
> > > > + * - the function is called with the lock held
> > > > + */
> > > > +static void split(struct mem_area *a, void *addr)
> > > > +{
> > > > +	uintptr_t pfn = PFN(addr);    
> > > 
> > > addr appears to be virtual, since it's a 'void *', not a
> > > phys_addr_t.  
> > 
> > it's a pointer, doesn't mean it's virtual. In fact it will be
> > physical, since the page allocator only works with physical
> > addresses.  
> 
> That won't work for 32-bit processors that support physical addresses
> that are larger than 32-bits - if the target has more than 4G and the
> unit test wants to access that high memory. Using phys_addr_t
> everywhere you want a physical address avoids that problem.

no it doesn't. you can't even handle the freelist if it's in high
memory. consider that the current allocator also does not handle high
memory. and there is no infrastructure to temporarily map high memory.

if a 32 bit test wants to access high memory, it will have to manage it
manually

> > 
> > if the MMU has been activated and all of memory is virtual but
> > identity mapped, it does not make any difference.  
> 
> > True for 64-bit targets, not possible for targets that can't address
> all memory. Also, I don't think we should write general code that
> assumes we have an identity map, or that the current virtual mapping
> we're using is the identity mapping. It's best to be explicit with 
> 
>  phys_addr_t paddr = virt_to_phys(vaddr);
> 
> and then use paddr wherever a physical address is expected, not a
> pointer.

this means adding a lot of complexity to the infrastructure

> > 
> > the addresses do not belong to the vmalloc range
> >   
> > > PFN() only does a shift, so that's the virtual PFN. However we're
> > >  
> > 
> > again, it's just a PFN
> >   
> > > comparing that with memory area pfn's, which are supposed to be
> > > physical frame numbers, at least according to the comments in the
> > >  
> > 
> > correct. we are comparing physical PFNs with other physical PFNs  
> 
> Maybe. But that's not obvious from the code.
> 
> >   
> > > struct.
> > >   
> > > > +	struct linked_list *p;
> > > > +	uintptr_t i, idx;
> > > > +	u8 order;
> > > >  
> > > > -	assert_msg(size == 0 || (uintptr_t)mem == -size ||
> > > > -		   (uintptr_t)mem + size > (uintptr_t)mem,
> > > > -		   "mem + size overflow: %p + %#zx", mem,
> > > > size);
> > > > +	assert(a && area_contains(a, pfn));
> > > > +	idx = pfn - a->base;
> > > > +	order = a->page_states[idx];    
> > > 
> > > pfn - a->base could potentially be huge, so we can't be using that
> > > as an array index.  
> > 
> > this is exactly what we are doing. we need the index of the page in
> > the per-page metadata array.  
> 
> Ah right, one byte per page. That's indeed going to be a large array.
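
(For scale, with 4 KiB pages: 4 GiB of memory is 1 Mi pages, so the array
costs about 1 MiB, i.e. roughly 1/4096 of the area, which matches the
table_size computation in the patch.)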
> 
> > 
> > from the array we get the order of the block the page belongs to.
> >    
> > > I think what you want is something like
> > > 
> > >  pfn = PFN(virt_to_phys(addr));
> > >  order = ffs(pfn) - 1;
> > >  if (pfn != order)
> > >    order <<= 1;
> > > 
> > > Then you have order and can use it as a index into an array if
> > > necessary.  
> > 
> > now this does not make any sense  
> 
> I was thinking we wanted the order of the address, not of some block
> the address is somehow associated with and mapped to by this byte.
> 
> >   
> > > > +	assert(!(order & ~ORDER_MASK) && order && (order <
> > > > NLISTS));    
> > > 
> > > What's wrong with order==0? What is ORDER_MASK good for?  
> > 
> > as written in the comment in the struct, the metadata contains the
> > order and a bit to mark when a page is allocated. ORDER_MASK is
> > used to extract the order part.
> > 
> > a block with order 0 is a single page. you cannot split a page.
> > that's why order 0 is bad.
> > 
> > split takes a pointer to a free block and splits the block in two
> > blocks of half the size each. this is needed when there are no free
> > blocks of the suitable size, but there are bigger free blocks
> >   
> > > > +	assert(IS_ALIGNED_ORDER(pfn, order));    
> > > 
> > > I'm getting the feeling that split() shouldn't operate on an
> > > addr, but rather some internal index of a block list or
> > > something. There are just too many requirements as to what addr
> > > is supposed to be.  
> > 
> > split is an internal function, it's static for a reason. basically
> > there is exactly one spot where it should be used, and that is the
> > one where it is used.
> >    
> > > > +	assert(area_contains(a, pfn + BIT(order) - 1));
> > > >  
> > > > -	if (size == 0) {
> > > > -		freelist = NULL;
> > > > -		return;
> > > > -	}
> > > > +	/* Remove the block from its free list */
> > > > +	p = list_remove(addr);    
> > > 
> > > So addr is a pointer to a linked_list object? I'm getting really
> > > confused.  
> > 
> > it's literally a free list; the beginning of each free block of
> > memory contains the pointers so that it can be handled using list
> > operations  
> 
> Using a list member for list operations and container_of() to get
> back to the object would help readability.

except that the "object" is the block of free memory, and the only
thing inside it is the list object at the very beginning. I think it's
overkill to use container_of in that case, since the only thing is the
list itself, and it's at the beginning of the free block.
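
a container_of() version would just wrap the node in a struct with no
other members, something like this (free_block is made up, it does not
exist in the series):

  struct free_block {
	struct linked_list node;	/* the only member */
  };

  struct free_block *b = container_of(li, struct free_block, node);
  /* with node as the only (first) member, b == (void *)li */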

> >   
> > > My editor says I'm only 30% of my way through this patch, so I
> > > don't think I'm going to have time to look at the rest. I think
> > > this patch is trying to do too much at once. IMO, we need to
> > > implement a single  
> > 
> > this patch is implementing the page allocator; it is a single new
> > feature.
> >   
> > > new feature at a time. Possibly starting with some cleanup patches
> > > for the current code first.  
> > 
> > there is no way to split this any further, I need to break some
> > interfaces, so I need to do it all at once.  
> 
> You can always build up a new implementation with a series of patches
> that don't wire up to anything. A final patch can simply wire up the
> new and delete the old.
> 
> Thanks,
> drew

so what I gather in the end is that I have to write a physical memory
allocator for a complex OS that has to seamlessly access high memory.

I'll try to come up with something


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-05 17:18         ` Claudio Imbrenda
@ 2020-10-05 18:04           ` Andrew Jones
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2020-10-05 18:04 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, Oct 05, 2020 at 07:18:17PM +0200, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 18:53:11 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> > That won't work for 32-bit processors that support physical addresses
> > that are larger than 32-bits - if the target has more than 4G and the
> > unit test wants to access that high memory. Using phys_addr_t
> > everywhere you want a physical address avoids that problem.
> 
> no it doesn't. you can't even handle the freelist if it's in high
> memory. consider that the current allocator also does not handle high
> > memory. and there is no infrastructure to temporarily map high memory.
> 
> if a 32 bit test wants to access high memory, it will have to manage it
> manually

But the point of this series is to rewrite the allocator in order to
add features such as allocations from different memory zones, like
high memory, and also to fix its shortcomings. It seems like we should
start by fixing the current allocator's lack of concern for physical
and virtual address distinctions.

> > Using a list member for list operations and container_of() to get
> > back to the object would help readability.
> 
> except that the "object" is the block of free memory, and the only
> thing inside it is the list object at the very beginning. I think it's
> overkill to use container_of in that case, since the only thing is the
> list itself, and it's at the beginning of the free block.

Fair enough, but I'm also thinking it might be overkill to introduce
a general list API at all if all we need to do is insert and delete
blocks, especially when we already know where their pointers live.

> so what I gather at the end is that I have to write a physical memory
> allocator for a complex OS that has to seamlessly access high memory.

We definitely don't want complexity. Trade-offs for simplicity are
good, but they should be implemented in a way that asserts when
assumptions fail. If we want to limit memory access to 4G for 32-bit
architectures, then we should add some code to clamp upper physical
addresses and some asserts for unit tests that try to use them. If
the allocator wants to depend on the identity map to simplify its
implementation, then that's probably OK too, but I'd still keep
the interfaces clean: physical addresses should be phys_addr_t, and
virtual addresses that are supposed to be the identity mapping of physical
addresses should be checked, e.g. assert(virt_to_phys(vaddr) == vaddr).
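
As a small illustration of that last point, a sketch of such a check around
the existing page allocator; the wrapper name is made up and the includes
assume the usual lib/ layout:

#include <libcflat.h>	/* assert(), phys_addr_t */
#include <alloc_page.h>	/* alloc_page() */
#include <asm/page.h>	/* virt_to_phys() */

static void *alloc_identity_page(void)
{
	void *vaddr = alloc_page();

	/* The identity mapping is an assumption; turn it into a check. */
	assert(virt_to_phys(vaddr) == (phys_addr_t)(unsigned long)vaddr);
	return vaddr;
}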

> 
> I'll try to come up with something
> 

Thanks,
drew



* Re: [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
  2020-10-02 18:18   ` Andrew Jones
@ 2020-11-06 11:29   ` Paolo Bonzini
  1 sibling, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2020-11-06 11:29 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: frankja, david, thuth, cohuck, lvivier

On 02/10/20 17:44, Claudio Imbrenda wrote:
> +#include <stdbool.h>
> +
> +/*
> + * Simple double linked list. The pointer to the list is a list item itself,
> + * like in the kernel implementation.

More precisely, *circular* doubly-linked list.  Just a minor note. :)

Paolo
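
For readers unfamiliar with the kernel-style construct, a minimal sketch of
such a list, where the head is itself an item and an empty list points to
itself; the names are illustrative, not the ones from the series' lib/list.h:

#include <stdbool.h>

struct list {
	struct list *prev, *next;
};

/* An empty list is a head whose links point back to the head itself. */
static inline void list_init(struct list *head)
{
	head->prev = head->next = head;
}

static inline bool list_is_empty(const struct list *head)
{
	return head->next == head;
}

/* Insert item right after pos; pos may be the head, since the head is a node. */
static inline void list_insert_after(struct list *pos, struct list *item)
{
	item->prev = pos;
	item->next = pos->next;
	pos->next->prev = item;
	pos->next = item;
}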



* Re: [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas Claudio Imbrenda
  2020-10-03  9:23   ` Andrew Jones
@ 2020-11-06 11:34   ` Paolo Bonzini
  2020-11-06 12:58     ` Claudio Imbrenda
  1 sibling, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2020-11-06 11:34 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: frankja, david, thuth, cohuck, lvivier

On 02/10/20 17:44, Claudio Imbrenda wrote:
> x86 gets
> * lowest area (24-bit addresses)
> * low area (32-bit addresses)
> * the rest

x86 if anything could use a 36-bit area; the 24-bit one is out of scope 
for what kvm-unit-tests does.

So something like this:

diff --git a/lib/x86/asm/memory_areas.h b/lib/x86/asm/memory_areas.h
index d704df3..952f5bd 100644
--- a/lib/x86/asm/memory_areas.h
+++ b/lib/x86/asm/memory_areas.h
@@ -1,20 +1,19 @@
  #ifndef MEMORY_AREAS_H
  #define MEMORY_AREAS_H

-#define AREA_NORMAL_PFN BIT(32-12)
+#define AREA_NORMAL_PFN BIT(36-12)
  #define AREA_NORMAL_NUMBER 0
  #define AREA_NORMAL 1

-#define AREA_LOW_PFN BIT(24-12)
-#define AREA_LOW_NUMBER 1
-#define AREA_LOW 2
+#define AREA_PAE_HIGH_PFN BIT(32-12)
+#define AREA_PAE_HIGH_NUMBER 1
+#define AREA_PAE_HIGH 2

-#define AREA_LOWEST_PFN 0
-#define AREA_LOWEST_NUMBER 2
-#define AREA_LOWEST 4
+#define AREA_LOW_PFN 0
+#define AREA_LOW_NUMBER 2
+#define AREA_LOW 4

-#define AREA_DMA24 AREA_LOWEST
-#define AREA_DMA32 (AREA_LOWEST | AREA_LOW)
+#define AREA_PAE (AREA_PAE_HIGH | AREA_LOW)

  #define AREA_ANY -1
  #define AREA_ANY_NUMBER 0xff

Paolo
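
To make the mask arithmetic concrete, a short sketch of how a caller could
combine these areas; the allocation helper stands in for the series' per-area
allocation functions, and its name and signature are assumptions:

/*
 * Boundaries implied by the defines above, assuming 4 KiB pages:
 *   AREA_LOW       PFN < BIT(32-12)               -> below 4 GiB
 *   AREA_PAE_HIGH  BIT(32-12) <= PFN < BIT(36-12) -> 4 GiB to 64 GiB
 *   AREA_NORMAL    PFN >= BIT(36-12)              -> 64 GiB and up
 * The AREA_* constants are bit masks, so AREA_PAE (AREA_PAE_HIGH | AREA_LOW)
 * means "anywhere reachable with 36-bit physical addresses".
 */
static void *alloc_pae_reachable_page(void)
{
	/* Hypothetical helper: mask of allowed areas, then allocation order. */
	return alloc_pages_area(AREA_PAE, 0);	/* order 0: one 4 KiB page */
}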



* Re: [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops Claudio Imbrenda
@ 2020-11-06 11:35   ` Paolo Bonzini
  2020-11-06 12:56     ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2020-11-06 11:35 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: frankja, david, thuth, cohuck, lvivier

On 02/10/20 17:44, Claudio Imbrenda wrote:
> Remove align_min from struct alloc_ops.

... because? :)

Paolo

> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   lib/alloc.h      | 1 -
>   lib/alloc_page.c | 1 -
>   lib/alloc_phys.c | 9 +++++----
>   lib/vmalloc.c    | 1 -
>   4 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/alloc.h b/lib/alloc.h
> index 9b4b634..db90b01 100644
> --- a/lib/alloc.h
> +++ b/lib/alloc.h
> @@ -25,7 +25,6 @@
>   struct alloc_ops {
>   	void *(*memalign)(size_t alignment, size_t size);
>   	void (*free)(void *ptr);
> -	size_t align_min;
>   };
>   
>   extern struct alloc_ops *alloc_ops;
> diff --git a/lib/alloc_page.c b/lib/alloc_page.c
> index 046082a..3c6c4ee 100644
> --- a/lib/alloc_page.c
> +++ b/lib/alloc_page.c
> @@ -320,7 +320,6 @@ void *alloc_page()
>   static struct alloc_ops page_alloc_ops = {
>   	.memalign = memalign_pages,
>   	.free = free_pages,
> -	.align_min = PAGE_SIZE,
>   };
>   
>   /*
> diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
> index 72e20f7..a4d2bf2 100644
> --- a/lib/alloc_phys.c
> +++ b/lib/alloc_phys.c
> @@ -29,8 +29,8 @@ static phys_addr_t base, top;
>   static void *early_memalign(size_t alignment, size_t size);
>   static struct alloc_ops early_alloc_ops = {
>   	.memalign = early_memalign,
> -	.align_min = DEFAULT_MINIMUM_ALIGNMENT
>   };
> +static size_t align_min;
>   
>   struct alloc_ops *alloc_ops = &early_alloc_ops;
>   
> @@ -39,8 +39,7 @@ void phys_alloc_show(void)
>   	int i;
>   
>   	spin_lock(&lock);
> -	printf("phys_alloc minimum alignment: %#" PRIx64 "\n",
> -		(u64)early_alloc_ops.align_min);
> +	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
>   	for (i = 0; i < nr_regions; ++i)
>   		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
>   			(u64)regions[i].base,
> @@ -64,7 +63,7 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
>   {
>   	assert(align && !(align & (align - 1)));
>   	spin_lock(&lock);
> -	early_alloc_ops.align_min = align;
> +	align_min = align;
>   	spin_unlock(&lock);
>   }
>   
> @@ -83,6 +82,8 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
>   		top_safe = MIN(top_safe, 1ULL << 32);
>   
>   	assert(base < top_safe);
> +	if (align < align_min)
> +		align = align_min;
>   
>   	addr = ALIGN(base, align);
>   	size += addr - base;
> diff --git a/lib/vmalloc.c b/lib/vmalloc.c
> index 986a34c..b28a390 100644
> --- a/lib/vmalloc.c
> +++ b/lib/vmalloc.c
> @@ -188,7 +188,6 @@ static void vm_free(void *mem)
>   static struct alloc_ops vmalloc_ops = {
>   	.memalign = vm_memalign,
>   	.free = vm_free,
> -	.align_min = PAGE_SIZE,
>   };
>   
>   void __attribute__((__weak__)) find_highmem(void)
> 



* Re: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
  2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
                   ` (7 preceding siblings ...)
  2020-10-05 11:54 ` [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Pierre Morel
@ 2020-11-06 11:36 ` Paolo Bonzini
  8 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2020-11-06 11:36 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: frankja, david, thuth, cohuck, lvivier

On 02/10/20 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
> 
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
> 
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>    alignment withtout wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areaas (e.g.
>    below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>    free pool
> 
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>    (i.e. allocation size), this is now handled by the lower level
>    allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>    freed in small blocks in a random order to be used for large
>    allocations again
> 
> This is achieved in the following ways:
> 
> For the virtual allocator:
> * only the virtul allocator needs one extra page of metadata, but only
>    for allocations that wouldn't fit in one page
> 
> For the page allocator:
> * page allocator has up to 6 memory pools, each pool has a metadata
>    area; the metadata has a byte for each page in the area, describing
>    the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>    split until we reach the required size; the unused parts of the block
>    are put back in the free lists
> * if an allocation needs ablock with a larger alignment than its size, a
>    larger block of (at least) the required order is split; the unused parts
>    put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>    searched; the allocation fails only when all allowed areas have been
>    tried
> * new functions to perform allocations from specific areas; the areas
>    are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>    "low" memory under 4GB and one for the rest, while s390x has one for under
>    2GB and one for the rest; suggestions for more fine grained areas or for
>    the other architectures are welcome
> * upon freeing a block, an attempt is made to coalesce it into the
>    appropriate neighbour (if it is free), and so on for the resulting
>    larger block thus obtained
> 
> For the physical allocator:
> * the minimum alignment is now handled manually, since it has been
>    removed from the common struct
> 
> 
> This patchset addresses some current but otherwise unsolvable issues on
> s390x, such as the need to allocate a block under 2GB for each SMP CPU
> upon CPU activation.
> 
> This patchset has been tested on s390x, amd64 and i386. It has also been
> compiled on aarch64.
> 
> V1->V2:
> * Renamed some functions, as per review comments
> * Improved commit messages
> * Split the list handling functions into an independent header
> * Addded arch-specific headers to define the memory areas
> * Fixed some minor issues
> * The magic value for small allocations in the virtual allocator is now
>    put right before the returned pointer, like for large allocations
> * Added comments to make the code more readable
> * Many minor fixes

Queued with the exception of patch 6 (still waiting for the CI to 
finish, but still).

Paolo



* Re: [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops
  2020-11-06 11:35   ` Paolo Bonzini
@ 2020-11-06 12:56     ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-11-06 12:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, frankja, david, thuth, cohuck, lvivier

On Fri, 6 Nov 2020 12:35:31 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 02/10/20 17:44, Claudio Imbrenda wrote:
> > Remove align_min from struct alloc_ops.  
> 
> ... because? :)

because nobody is using it anymore at this point :)

> Paolo
> 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > ---
> >   lib/alloc.h      | 1 -
> >   lib/alloc_page.c | 1 -
> >   lib/alloc_phys.c | 9 +++++----
> >   lib/vmalloc.c    | 1 -
> >   4 files changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/lib/alloc.h b/lib/alloc.h
> > index 9b4b634..db90b01 100644
> > --- a/lib/alloc.h
> > +++ b/lib/alloc.h
> > @@ -25,7 +25,6 @@
> >   struct alloc_ops {
> >   	void *(*memalign)(size_t alignment, size_t size);
> >   	void (*free)(void *ptr);
> > -	size_t align_min;
> >   };
> >   
> >   extern struct alloc_ops *alloc_ops;
> > diff --git a/lib/alloc_page.c b/lib/alloc_page.c
> > index 046082a..3c6c4ee 100644
> > --- a/lib/alloc_page.c
> > +++ b/lib/alloc_page.c
> > @@ -320,7 +320,6 @@ void *alloc_page()
> >   static struct alloc_ops page_alloc_ops = {
> >   	.memalign = memalign_pages,
> >   	.free = free_pages,
> > -	.align_min = PAGE_SIZE,
> >   };
> >   
> >   /*
> > diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
> > index 72e20f7..a4d2bf2 100644
> > --- a/lib/alloc_phys.c
> > +++ b/lib/alloc_phys.c
> > @@ -29,8 +29,8 @@ static phys_addr_t base, top;
> >   static void *early_memalign(size_t alignment, size_t size);
> >   static struct alloc_ops early_alloc_ops = {
> >   	.memalign = early_memalign,
> > -	.align_min = DEFAULT_MINIMUM_ALIGNMENT
> >   };
> > +static size_t align_min;
> >   
> >   struct alloc_ops *alloc_ops = &early_alloc_ops;
> >   
> > @@ -39,8 +39,7 @@ void phys_alloc_show(void)
> >   	int i;
> >   
> >   	spin_lock(&lock);
> > -	printf("phys_alloc minimum alignment: %#" PRIx64 "\n",
> > -		(u64)early_alloc_ops.align_min);
> > +	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
> >  	for (i = 0; i < nr_regions; ++i)
> >   		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
> >   			(u64)regions[i].base,
> > @@ -64,7 +63,7 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
> >  {
> >   	assert(align && !(align & (align - 1)));
> >   	spin_lock(&lock);
> > -	early_alloc_ops.align_min = align;
> > +	align_min = align;
> >   	spin_unlock(&lock);
> >   }
> >   
> > @@ -83,6 +82,8 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
> >  		top_safe = MIN(top_safe, 1ULL << 32);
> > 
> >   	assert(base < top_safe);
> > +	if (align < align_min)
> > +		align = align_min;
> >   
> >   	addr = ALIGN(base, align);
> >   	size += addr - base;
> > diff --git a/lib/vmalloc.c b/lib/vmalloc.c
> > index 986a34c..b28a390 100644
> > --- a/lib/vmalloc.c
> > +++ b/lib/vmalloc.c
> > @@ -188,7 +188,6 @@ static void vm_free(void *mem)
> >   static struct alloc_ops vmalloc_ops = {
> >   	.memalign = vm_memalign,
> >   	.free = vm_free,
> > -	.align_min = PAGE_SIZE,
> >   };
> >   
> >   void __attribute__((__weak__)) find_highmem(void)
> >   
> 



* Re: [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-11-06 11:34   ` Paolo Bonzini
@ 2020-11-06 12:58     ` Claudio Imbrenda
  2020-11-06 13:04       ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-11-06 12:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, frankja, david, thuth, cohuck, lvivier

On Fri, 6 Nov 2020 12:34:10 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 02/10/20 17:44, Claudio Imbrenda wrote:
> > x86 gets
> > * lowest area (24-bit addresses)
> > * low area (32-bit addresses)
> > * the rest  
> 
> x86 if anything could use a 36-bit area; the 24-bit one is out of
> scope for what kvm-unit-tests does.

sure... I went with what I remembered about the x86 architecture, but
I'm not an expert

my patch was meant to be some "sensible defaults" that people with
more knowledge should override anyway :)

> So something like this:
> 
> diff --git a/lib/x86/asm/memory_areas.h b/lib/x86/asm/memory_areas.h
> index d704df3..952f5bd 100644
> --- a/lib/x86/asm/memory_areas.h
> +++ b/lib/x86/asm/memory_areas.h
> @@ -1,20 +1,19 @@
>   #ifndef MEMORY_AREAS_H
>   #define MEMORY_AREAS_H
> 
> -#define AREA_NORMAL_PFN BIT(32-12)
> +#define AREA_NORMAL_PFN BIT(36-12)
>   #define AREA_NORMAL_NUMBER 0
>   #define AREA_NORMAL 1
> 
> -#define AREA_LOW_PFN BIT(24-12)
> -#define AREA_LOW_NUMBER 1
> -#define AREA_LOW 2
> +#define AREA_PAE_HIGH_PFN BIT(32-12)
> +#define AREA_PAE_HIGH_NUMBER 1
> +#define AREA_PAE_HIGH 2
> 
> -#define AREA_LOWEST_PFN 0
> -#define AREA_LOWEST_NUMBER 2
> -#define AREA_LOWEST 4
> +#define AREA_LOW_PFN 0
> +#define AREA_LOW_NUMBER 2
> +#define AREA_LOW 4
> 
> -#define AREA_DMA24 AREA_LOWEST
> -#define AREA_DMA32 (AREA_LOWEST | AREA_LOW)
> +#define AREA_PAE (AREA_PAE_HIGH | AREA_LOW)
> 
>   #define AREA_ANY -1
>   #define AREA_ANY_NUMBER 0xff
> 
> Paolo
> 



* Re: [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas
  2020-11-06 12:58     ` Claudio Imbrenda
@ 2020-11-06 13:04       ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2020-11-06 13:04 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: kvm, frankja, david, thuth, cohuck, lvivier

On 06/11/20 13:58, Claudio Imbrenda wrote:
>> x86 if anything could use a 36-bit area; the 24-bit one is out of
>> scope for what kvm-unit-tests does.
> sure... I went with what I remembered about the x86 architecture, but
> I'm not an expert
> 
> my patch was meant to be some "sensible defaults" that people with
> more knowledge should override anyway :)

Yep, got it.  Though, "why are 24-bit addresses special" (for ISA DMA 
that is only used by floppies and SoundBlaster cards) is actually way 
more for experts than "why are 36-bit addresses special" at this point 
in time!

Paolo



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator Claudio Imbrenda
  2020-10-05 12:40   ` Andrew Jones
@ 2020-12-08  0:41   ` Nadav Amit
  2020-12-08  1:10     ` Nadav Amit
  2020-12-08  9:11     ` Claudio Imbrenda
  1 sibling, 2 replies; 42+ messages in thread
From: Nadav Amit @ 2020-12-08  0:41 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: KVM, pbonzini, frankja, david, thuth, cohuck, lvivier

> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> This is a complete rewrite of the page allocator.

This patch causes me crashes:

  lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))

It appears that two areas are registered on AREA_LOW_NUMBER, as setup_vm()
can call (and calls on my system) page_alloc_init_area() twice.

setup_vm() uses AREA_ANY_NUMBER as the area number argument but eventually
this means, according to the code, that __page_alloc_init_area() would use
AREA_LOW_NUMBER.

I do not understand the rationale behind these areas well enough to fix it.

Thanks,
Nadav


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  0:41   ` Nadav Amit
@ 2020-12-08  1:10     ` Nadav Amit
  2020-12-08  9:15       ` Claudio Imbrenda
  2020-12-08  9:11     ` Claudio Imbrenda
  1 sibling, 1 reply; 42+ messages in thread
From: Nadav Amit @ 2020-12-08  1:10 UTC (permalink / raw)
  To: Claudio Imbrenda; +Cc: KVM, pbonzini, frankja, david, thuth, cohuck, lvivier

> On Dec 7, 2020, at 4:41 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>> 
>> This is a complete rewrite of the page allocator.
> 
> This patch causes me crashes:
> 
>  lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
> 
> It appears that two areas are registered on AREA_LOW_NUMBER, as setup_vm()
> can call (and calls on my system) page_alloc_init_area() twice.
> 
> setup_vm() uses AREA_ANY_NUMBER as the area number argument but eventually
> this means, according to the code, that __page_alloc_init_area() would use
> AREA_LOW_NUMBER.
> 
> I do not understand the rationale behind these areas well enough to fix it.

One more thing: I changed the previous allocator to zero any allocated page.
Without it, I get strange failures when I do not run the tests on KVM, which
are presumably caused by some intentional or unintentional hidden assumption
of kvm-unit-tests that the memory is zeroed.

Can you restore this behavior? I can also send this one-line fix, but I do
not want to overstep on your (hopeful) fix for the previous problem that I
mentioned (AREA_ANY_NUMBER).

Thanks,
Nadav


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  0:41   ` Nadav Amit
  2020-12-08  1:10     ` Nadav Amit
@ 2020-12-08  9:11     ` Claudio Imbrenda
  2020-12-08  9:16       ` Nadav Amit
  1 sibling, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-12-08  9:11 UTC (permalink / raw)
  To: Nadav Amit; +Cc: KVM, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 7 Dec 2020 16:41:18 -0800
Nadav Amit <nadav.amit@gmail.com> wrote:

> > On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
> > <imbrenda@linux.ibm.com> wrote:
> > 
> > This is a complete rewrite of the page allocator.  
> 
> This patch causes me crashes:
> 
>   lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
> 
> It appears that two areas are registered on AREA_LOW_NUMBER, as
> setup_vm() can call (and calls on my system) page_alloc_init_area()
> twice.

which system is that? page_alloc_init_area should not be called twice
on the same area!

> setup_vm() uses AREA_ANY_NUMBER as the area number argument but
> eventually this means, according to the code, that
> __page_alloc_init_area() would use AREA_LOW_NUMBER.
> 
> I do not understand the rationale behind these areas well enough to
> fix it.

I'll see what I can fix


Claudio


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  1:10     ` Nadav Amit
@ 2020-12-08  9:15       ` Claudio Imbrenda
  2020-12-08  9:23         ` Nadav Amit
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-12-08  9:15 UTC (permalink / raw)
  To: Nadav Amit; +Cc: KVM, pbonzini, frankja, david, thuth, cohuck, lvivier

On Mon, 7 Dec 2020 17:10:13 -0800
Nadav Amit <nadav.amit@gmail.com> wrote:

> > On Dec 7, 2020, at 4:41 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> >   
> >> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
> >> <imbrenda@linux.ibm.com> wrote:
> >> 
> >> This is a complete rewrite of the page allocator.  
> > 
> > This patch causes me crashes:
> > 
> >  lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
> > 
> > It appears that two areas are registered on AREA_LOW_NUMBER, as
> > setup_vm() can call (and calls on my system) page_alloc_init_area()
> > twice.
> > 
> > setup_vm() uses AREA_ANY_NUMBER as the area number argument but
> > eventually this means, according to the code, that
> > __page_alloc_init_area() would use AREA_LOW_NUMBER.
> > 
> > I do not understand the rationale behind these areas well enough to
> > fix it.  
> 
> One more thing: I changed the previous allocator to zero any
> allocated page. Without it, I get strange failures when I do not run
> the tests on KVM, which are presumably caused by some intentional or
> unintentional hidden assumption of kvm-unit-tests that the memory is
> zeroed.
> 
> Can you restore this behavior? I can also send this one-line fix, but
> I do not want to overstep on your (hopeful) fix for the previous
> problem that I mentioned (AREA_ANY_NUMBER).

no. Some tests depend on the fact that the memory is being touched for
the first time.

if your test depends on memory being zeroed on allocation, maybe you
can zero the memory yourself in the test?

otherwise I can try adding a function to explicitly allocate a zeroed
page.


Claudio
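
For reference, such a function could be as small as the sketch below, assuming
alloc_page(), memset() and PAGE_SIZE from the usual lib headers; the name is
made up and it is not part of the series:

#include <string.h>	/* memset() */
#include <alloc_page.h>	/* alloc_page() */
#include <asm/page.h>	/* PAGE_SIZE */

/* Explicitly zeroed variant, leaving plain alloc_page() untouched. */
static void *alloc_zeroed_page(void)
{
	void *p = alloc_page();

	if (p)
		memset(p, 0, PAGE_SIZE);
	return p;
}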


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  9:11     ` Claudio Imbrenda
@ 2020-12-08  9:16       ` Nadav Amit
  0 siblings, 0 replies; 42+ messages in thread
From: Nadav Amit @ 2020-12-08  9:16 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, Laurent Vivier

> On Dec 8, 2020, at 1:11 AM, Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> On Mon, 7 Dec 2020 16:41:18 -0800
> Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>>> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
>>> <imbrenda@linux.ibm.com> wrote:
>>> 
>>> This is a complete rewrite of the page allocator.  
>> 
>> This patch causes me crashes:
>> 
>>  lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
>> 
>> It appears that two areas are registered on AREA_LOW_NUMBER, as
>> setup_vm() can call (and calls on my system) page_alloc_init_area()
>> twice.
> 
> which system is that? page_alloc_init_area should not be called twice
> on the same area!

It is not the “same area”, it just uses the same id, see setup_vm():

        if (!page_alloc_initialized()) {
                base = PAGE_ALIGN(base) >> PAGE_SHIFT;
                top = top >> PAGE_SHIFT;

		// FIRST
                page_alloc_init_area(AREA_ANY_NUMBER, base, top);
                page_alloc_ops_enable();
        }

        find_highmem();
        phys_alloc_get_unused(&base, &top);
        page_root = setup_mmu(top);
        if (base != top) {
                base = PAGE_ALIGN(base) >> PAGE_SHIFT;
                top = top >> PAGE_SHIFT;

		// SECOND
                page_alloc_init_area(AREA_ANY_NUMBER, base, top);
        }

The problem occurs when I run KVM-unit-tests on VMware, but it would presumably
also happen on bare-metal.

> 
>> setup_vm() uses AREA_ANY_NUMBER as the area number argument but
>> eventually this means, according to the code, that
>> __page_alloc_init_area() would use AREA_LOW_NUMBER.
>> 
>> I do not understand the rationale behind these areas well enough to
>> fix it.
> 
> I'll see what I can fix

Thanks,
Nadav



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  9:15       ` Claudio Imbrenda
@ 2020-12-08  9:23         ` Nadav Amit
  2020-12-08 10:00           ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Nadav Amit @ 2020-12-08  9:23 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

> On Dec 8, 2020, at 1:15 AM, Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> On Mon, 7 Dec 2020 17:10:13 -0800
> Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>>> On Dec 7, 2020, at 4:41 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>>> 
>>>> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
>>>> <imbrenda@linux.ibm.com> wrote:
>>>> 
>>>> This is a complete rewrite of the page allocator.  
>>> 
>>> This patch causes me crashes:
>>> 
>>> lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
>>> 
>>> It appears that two areas are registered on AREA_LOW_NUMBER, as
>>> setup_vm() can call (and calls on my system) page_alloc_init_area()
>>> twice.
>>> 
>>> setup_vm() uses AREA_ANY_NUMBER as the area number argument but
>>> eventually this means, according to the code, that
>>> __page_alloc_init_area() would use AREA_LOW_NUMBER.
>>> 
>>> I do not understand the rationale behind these areas well enough to
>>> fix it.  
>> 
>> One more thing: I changed the previous allocator to zero any
>> allocated page. Without it, I get strange failures when I do not run
>> the tests on KVM, which are presumably caused by some intentional or
>> unintentional hidden assumption of kvm-unit-tests that the memory is
>> zeroed.
>> 
>> Can you restore this behavior? I can also send this one-line fix, but
>> I do not want to overstep on your (hopeful) fix for the previous
>> problem that I mentioned (AREA_ANY_NUMBER).
> 
> no. Some tests depend on the fact that the memory is being touched for
> the first time.
> 
> if your test depends on memory being zeroed on allocation, maybe you
> can zero the memory yourself in the test?
> 
> otherwise I can try adding a function to explicitly allocate a zeroed
> page.

To be fair, I do not know which non-zeroed memory causes the failure, and
debugging these kind of failures is hard and sometimes non-deterministic. For
instance, the failure I got this time was:

	Test suite: vmenter
	VM-Fail on vmlaunch: error number is 7. See Intel 30.4.

And other VM-entry failures, which are not easy to debug, especially on
bare-metal.

Note that the failing test is not new, and unfortunately these kind of
errors (wrong assumption that memory is zeroed) are not rare, since KVM
indeed zeroes the memory (unlike other hypervisors and bare-metal).

The previous allocator had the behavior of zeroing the memory to avoid such
problems. I would argue that zeroing should be the default behavior, and if
someone wants to have the memory “untouched” for a specific test (which
one?) he should use an alternative function for this matter.



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08  9:23         ` Nadav Amit
@ 2020-12-08 10:00           ` Claudio Imbrenda
  2020-12-08 12:48             ` Nadav Amit
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-12-08 10:00 UTC (permalink / raw)
  To: Nadav Amit
  Cc: KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

On Tue, 8 Dec 2020 01:23:59 -0800
Nadav Amit <nadav.amit@gmail.com> wrote:

> > On Dec 8, 2020, at 1:15 AM, Claudio Imbrenda
> > <imbrenda@linux.ibm.com> wrote:
> > 
> > On Mon, 7 Dec 2020 17:10:13 -0800
> > Nadav Amit <nadav.amit@gmail.com> wrote:
> >   
> >>> On Dec 7, 2020, at 4:41 PM, Nadav Amit <nadav.amit@gmail.com>
> >>> wrote: 
> >>>> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
> >>>> <imbrenda@linux.ibm.com> wrote:
> >>>> 
> >>>> This is a complete rewrite of the page allocator.    
> >>> 
> >>> This patch causes me crashes:
> >>> 
> >>> lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
> >>> 
> >>> It appears that two areas are registered on AREA_LOW_NUMBER, as
> >>> setup_vm() can call (and calls on my system)
> >>> page_alloc_init_area() twice.
> >>> 
> >>> setup_vm() uses AREA_ANY_NUMBER as the area number argument but
> >>> eventually this means, according to the code, that
> >>> __page_alloc_init_area() would use AREA_LOW_NUMBER.
> >>> 
> >>> I do not understand the rationale behind these areas well enough
> >>> to fix it.    
> >> 
> >> One more thing: I changed the previous allocator to zero any
> >> allocated page. Without it, I get strange failures when I do not
> >> run the tests on KVM, which are presumably caused by some
> >> intentional or unintentional hidden assumption of kvm-unit-tests
> >> that the memory is zeroed.
> >> 
> >> Can you restore this behavior? I can also send this one-line fix,
> >> but I do not want to overstep on your (hopeful) fix for the
> >> previous problem that I mentioned (AREA_ANY_NUMBER).  
> > 
> > no. Some tests depend on the fact that the memory is being touched
> > for the first time.
> > 
> > if your test depends on memory being zeroed on allocation, maybe you
> > can zero the memory yourself in the test?
> > 
> > otherwise I can try adding a function to explicitly allocate a
> > zeroed page.  
> 
> To be fair, I do not know which non-zeroed memory causes the failure,
> and debugging these kind of failures is hard and sometimes
> non-deterministic. For instance, the failure I got this time was:
> 
> 	Test suite: vmenter
> 	VM-Fail on vmlaunch: error number is 7. See Intel 30.4.
> 
> And other VM-entry failures, which are not easy to debug, especially
> on bare-metal.

so you are running the test on bare metal?

that is something I had not tested

> Note that the failing test is not new, and unfortunately these kind of
> errors (wrong assumption that memory is zeroed) are not rare, since
> KVM indeed zeroes the memory (unlike other hypervisors and
> bare-metal).
> 
> The previous allocator had the behavior of zeroing the memory to

I don't remember such behaviour, but I'll have a look

> avoid such problems. I would argue that zeroing should be the default
> behavior, and if someone wants to have the memory “untouched” for a
> specific test (which one?) he should use an alternative function for
> this matter.

probably we need some commandline switches to change the behaviour of
the allocator according to the specific needs of each testcase


I'll see what I can do


Claudio


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08 10:00           ` Claudio Imbrenda
@ 2020-12-08 12:48             ` Nadav Amit
  2020-12-08 13:41               ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Nadav Amit @ 2020-12-08 12:48 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

> On Dec 8, 2020, at 2:00 AM, Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> On Tue, 8 Dec 2020 01:23:59 -0800
> Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>>> On Dec 8, 2020, at 1:15 AM, Claudio Imbrenda
>>> <imbrenda@linux.ibm.com> wrote:
>>> 
>>> On Mon, 7 Dec 2020 17:10:13 -0800
>>> Nadav Amit <nadav.amit@gmail.com> wrote:
>>> 
>>>>> On Dec 7, 2020, at 4:41 PM, Nadav Amit <nadav.amit@gmail.com>
>>>>> wrote: 
>>>>>> On Oct 2, 2020, at 8:44 AM, Claudio Imbrenda
>>>>>> <imbrenda@linux.ibm.com> wrote:
>>>>>> 
>>>>>> This is a complete rewrite of the page allocator.    
>>>>> 
>>>>> This patch causes me crashes:
>>>>> 
>>>>> lib/alloc_page.c:433: assert failed: !(areas_mask & BIT(n))
>>>>> 
>>>>> It appears that two areas are registered on AREA_LOW_NUMBER, as
>>>>> setup_vm() can call (and calls on my system)
>>>>> page_alloc_init_area() twice.
>>>>> 
>>>>> setup_vm() uses AREA_ANY_NUMBER as the area number argument but
>>>>> eventually this means, according to the code, that
>>>>> __page_alloc_init_area() would use AREA_LOW_NUMBER.
>>>>> 
>>>>> I do not understand the rationale behind these areas well enough
>>>>> to fix it.    
>>>> 
>>>> One more thing: I changed the previous allocator to zero any
>>>> allocated page. Without it, I get strange failures when I do not
>>>> run the tests on KVM, which are presumably caused by some
>>>> intentional or unintentional hidden assumption of kvm-unit-tests
>>>> that the memory is zeroed.
>>>> 
>>>> Can you restore this behavior? I can also send this one-line fix,
>>>> but I do not want to overstep on your (hopeful) fix for the
>>>> previous problem that I mentioned (AREA_ANY_NUMBER).  
>>> 
>>> no. Some tests depend on the fact that the memory is being touched
>>> for the first time.
>>> 
>>> if your test depends on memory being zeroed on allocation, maybe you
>>> can zero the memory yourself in the test?
>>> 
>>> otherwise I can try adding a function to explicitly allocate a
>>> zeroed page.  
>> 
>> To be fair, I do not know which non-zeroed memory causes the failure,
>> and debugging these kind of failures is hard and sometimes
>> non-deterministic. For instance, the failure I got this time was:
>> 
>> 	Test suite: vmenter
>> 	VM-Fail on vmlaunch: error number is 7. See Intel 30.4.
>> 
>> And other VM-entry failures, which are not easy to debug, especially
>> on bare-metal.
> 
> so you are running the test on bare metal?
> 
> that is something I had not tested

Bare-metal / VMware.

> 
>> Note that the failing test is not new, and unfortunately these kind of
>> errors (wrong assumption that memory is zeroed) are not rare, since
>> KVM indeed zeroes the memory (unlike other hypervisors and
>> bare-metal).
>> 
>> The previous allocator had the behavior of zeroing the memory to
> 
> I don't remember such behaviour, but I'll have a look

See https://www.spinics.net/lists/kvm/msg186474.html

> 
>> avoid such problems. I would argue that zeroing should be the default
>> behavior, and if someone wants to have the memory “untouched” for a
>> specific test (which one?) he should use an alternative function for
>> this matter.
> 
> probably we need some commandline switches to change the behaviour of
> the allocator according to the specific needs of each testcase
> 
> 
> I'll see what I can do

I do not think commandline switches are the right way. I think that
reproducibility requires the memory to always be in a given state before the
tests begin. There are latent bugs in kvm-unit-tests that are not apparent
when the memory is zeroed. I do not think anyone wants to waste time on
resolving these bugs.



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08 12:48             ` Nadav Amit
@ 2020-12-08 13:41               ` Claudio Imbrenda
  2020-12-08 14:26                 ` Andrew Jones
  0 siblings, 1 reply; 42+ messages in thread
From: Claudio Imbrenda @ 2020-12-08 13:41 UTC (permalink / raw)
  To: Nadav Amit
  Cc: KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

On Tue, 8 Dec 2020 04:48:09 -0800
Nadav Amit <nadav.amit@gmail.com> wrote:

[...]

> >> And other VM-entry failures, which are not easy to debug,
> >> especially on bare-metal.  
> > 
> > so you are running the test on bare metal?
> > 
> > that is something I had not tested  
> 
> Bare-metal / VMware.
> 
> >   
> >> Note that the failing test is not new, and unfortunately these
> >> kind of errors (wrong assumption that memory is zeroed) are not
> >> rare, since KVM indeed zeroes the memory (unlike other hypervisors
> >> and bare-metal).
> >> 
> >> The previous allocator had the behavior of zeroing the memory to  
> > 
> > I don't remember such behaviour, but I'll have a look  
> 
> See https://www.spinics.net/lists/kvm/msg186474.html

hmmm it seems I had completely missed it, oops

> >   
> >> avoid such problems. I would argue that zeroing should be the
> >> default behavior, and if someone wants to have the memory
> >> “untouched” for a specific test (which one?) he should use an
> >> alternative function for this matter.  
> > 
> > probably we need some commandline switches to change the behaviour
> > of the allocator according to the specific needs of each testcase
> > 
> > 
> > I'll see what I can do  
> 
> I do not think commandline switches are the right way. I think that
> reproducibility requires the memory to always be in a given state

there are some bugs that are only exposed when the memory has never
been touched. zeroing the memory on allocation guarantees that we will
__never__ be able to detect those bugs.

> before the tests begin. There are latent bugs in kvm-unit-tests that

then maybe we should just fix those bugs? :)

> are not apparent when the memory is zeroed. I do not think anyone
> wants to waste time on resolving these bugs.

I disagree. if a unit test has a bug, it should be fixed.

some tests apparently need the allocator to clear the memory, while
other tests depend on the memory being untouched. this is clearly
impossible to solve without some kind of switch


I would like to know what the others think about this issue too


Claudio


* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08 13:41               ` Claudio Imbrenda
@ 2020-12-08 14:26                 ` Andrew Jones
  2020-12-09  8:53                   ` Claudio Imbrenda
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2020-12-08 14:26 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Nadav Amit, KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

On Tue, Dec 08, 2020 at 02:41:39PM +0100, Claudio Imbrenda wrote:
> On Tue, 8 Dec 2020 04:48:09 -0800
> Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> [...]
> 
> > >> And other VM-entry failures, which are not easy to debug,
> > >> especially on bare-metal.  
> > > 
> > > so you are running the test on bare metal?
> > > 
> > > that is something I had not tested  
> > 
> > Base-metal / VMware.
> > 
> > >   
> > >> Note that the failing test is not new, and unfortunately these
> > >> kind of errors (wrong assumption that memory is zeroed) are not
> > >> rare, since KVM indeed zeroes the memory (unlike other hypervisors
> > >> and bare-metal).
> > >> 
> > >> The previous allocator had the behavior of zeroing the memory to  
> > > 
> > > I don't remember such behaviour, but I'll have a look  
> > 
> > See https://www.spinics.net/lists/kvm/msg186474.html
> 
> hmmm it seems I had completely missed it, oops
> 
> > >   
> > >> avoid such problems. I would argue that zeroing should be the
> > >> default behavior, and if someone wants to have the memory
> > >> “untouched” for a specific test (which one?) he should use an
> > >> alternative function for this matter.  
> > > 
> > > probably we need some commandline switches to change the behaviour
> > > of the allocator according to the specific needs of each testcase
> > > 
> > > 
> > > I'll see what I can do  
> > 
> > I do not think commandline switches are the right way. I think that
> > reproducibility requires the memory to always be in a given state
> 
> there are some bugs that are only exposed when the memory has never
> been touched. zeroing the memory on allocation guarantees that we will
> __never__ be able to detect those bugs.
> 
> > before the tests begin. There are latent bugs in kvm-unit-tests that
> 
> then maybe it's the case to fix those bugs? :)
> 
> > are not apparent when the memory is zeroed. I do not think anyone
> > wants to waste time on resolving these bugs.
> 
> I disagree. if a unit test has a bug, it should be fixed.
> 
> some tests apparently need the allocator to clear the memory, while
> other tests depend on the memory being untouched. this is clearly
> impossible to solve without some kind of switch
> 
> 
> I would like to know what the others think about this issue too
>

If the allocator supports memory being returned and then reallocated, then
the generic allocation API cannot guarantee that the memory is untouched
anyway. So, if a test requires untouched memory, it should use a specific
API. I think setup() should probably just set some physical memory regions
aside for that purpose, exposing them somehow to unit tests. The unit
tests can then do anything they want with them. The generic API might
as well continue zeroing memory by default.

I never got around to finishing my review of the memory areas. Maybe that
can be modified to support this "untouched" area simply by labeling an
area as such and by not accepting returned pages to that area.

Thanks,
drew



* Re: [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator
  2020-12-08 14:26                 ` Andrew Jones
@ 2020-12-09  8:53                   ` Claudio Imbrenda
  0 siblings, 0 replies; 42+ messages in thread
From: Claudio Imbrenda @ 2020-12-09  8:53 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Nadav Amit, KVM, Paolo Bonzini, Janosch Frank, David Hildenbrand,
	Thomas Huth, cohuck, lvivier

On Tue, 8 Dec 2020 15:26:10 +0100
Andrew Jones <drjones@redhat.com> wrote:

[...]
 
> > > are not apparent when the memory is zeroed. I do not think anyone
> > > wants to waste time on resolving these bugs.  
> > 
> > I disagree. if a unit test has a bug, it should be fixed.
> > 
> > some tests apparently need the allocator to clear the memory, while
> > other tests depend on the memory being untouched. this is clearly
> > impossible to solve without some kind of switch
> > 
> > 
> > I would like to know what the others think about this issue too
> >  
> 
> If the allocator supports memory being returned and then reallocated,
> then the generic allocation API cannot guarantee that the memory is
> untouched anyway. So, if a test requires untouched memory, it should
> use a specific API. I think setup() should probably just set some
> physical memory regions aside for that purpose, exposing them somehow
> to unit tests. The unit tests can then do anything they want with
> them. The generic API might as well continue zeroing memory by
> default.

I think I have an idea for a solution that will allow for untouched
pages and zeroed pages, on request, without any additional changes.

Give me a few days ;)
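
One possible shape for such an interface, sketched as a wrapper around the
existing alloc_pages(); the flag, the function name and the signature are
hypothetical and not what was eventually merged, and a "never touched" variant
would need support inside the allocator itself, so it is left out:

#include <string.h>	/* memset() */
#include <alloc_page.h>	/* alloc_pages() */
#include <asm/page.h>	/* PAGE_SIZE */

#define FLAG_ZERO	0x1	/* hand back memory cleared to zero */

/* Hypothetical per-allocation switch; a real version would live in the allocator. */
static void *alloc_pages_flags(unsigned int order, unsigned int flags)
{
	void *p = alloc_pages(order);

	if (p && (flags & FLAG_ZERO))
		memset(p, 0, PAGE_SIZE << order);
	return p;
}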

> I never got around to finishing my review of the memory areas. Maybe
> that can be modified to support this "untouched" area simply by
> labeling an area as such and by not accepting returned pages to that
> area.
> 
> Thanks,
> drew
> 



Thread overview: 42+ messages
2020-10-02 15:44 [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Claudio Imbrenda
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 1/7] lib/list: Add double linked list management functions Claudio Imbrenda
2020-10-02 18:18   ` Andrew Jones
2020-10-05  6:57     ` Claudio Imbrenda
2020-11-06 11:29   ` Paolo Bonzini
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 2/7] lib/vmalloc: vmalloc support for handling allocation metadata Claudio Imbrenda
2020-10-03  8:46   ` Andrew Jones
2020-10-05  7:00     ` Claudio Imbrenda
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 3/7] lib/asm: Add definitions of memory areas Claudio Imbrenda
2020-10-03  9:23   ` Andrew Jones
2020-10-05  7:10     ` Claudio Imbrenda
2020-11-06 11:34   ` Paolo Bonzini
2020-11-06 12:58     ` Claudio Imbrenda
2020-11-06 13:04       ` Paolo Bonzini
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 4/7] lib/alloc_page: complete rewrite of the page allocator Claudio Imbrenda
2020-10-05 12:40   ` Andrew Jones
2020-10-05 15:56     ` Claudio Imbrenda
2020-10-05 16:53       ` Andrew Jones
2020-10-05 17:18         ` Claudio Imbrenda
2020-10-05 18:04           ` Andrew Jones
2020-12-08  0:41   ` Nadav Amit
2020-12-08  1:10     ` Nadav Amit
2020-12-08  9:15       ` Claudio Imbrenda
2020-12-08  9:23         ` Nadav Amit
2020-12-08 10:00           ` Claudio Imbrenda
2020-12-08 12:48             ` Nadav Amit
2020-12-08 13:41               ` Claudio Imbrenda
2020-12-08 14:26                 ` Andrew Jones
2020-12-09  8:53                   ` Claudio Imbrenda
2020-12-08  9:11     ` Claudio Imbrenda
2020-12-08  9:16       ` Nadav Amit
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 5/7] lib/alloc: simplify free and malloc Claudio Imbrenda
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 6/7] lib/alloc.h: remove align_min from struct alloc_ops Claudio Imbrenda
2020-11-06 11:35   ` Paolo Bonzini
2020-11-06 12:56     ` Claudio Imbrenda
2020-10-02 15:44 ` [kvm-unit-tests PATCH v2 7/7] lib/alloc_page: allow reserving arbitrary memory ranges Claudio Imbrenda
2020-10-05 11:54 ` [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators Pierre Morel
2020-10-05 12:35   ` Claudio Imbrenda
2020-10-05 12:49     ` Andrew Jones
2020-10-05 12:57     ` Pierre Morel
2020-10-05 14:59       ` Claudio Imbrenda
2020-11-06 11:36 ` Paolo Bonzini
