* [PATCH 0/3] [RFC] KVM: Guest page hinting
@ 2017-08-01 20:48 Nitesh Narayan Lal
  2017-08-01 20:48 ` [PATCH 1/3] KVM: Support for guest " Nitesh Narayan Lal
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-01 20:48 UTC (permalink / raw)
  To: kvm; +Cc: riel, mst, david, yang.zhang.wz, pagupta, wei.w.wang

The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. This approach has some different trade-offs compared to ballooning:
- For guests with DAX (no page cache) it rapidly hands back all free memory to the host.
- It resolves the complexity of ballooning policy, as the whole process is transparent and occurs automatically.
- It makes more frequent hypercalls than ballooning: more efficient memory use at the cost of higher CPU use.
- The guest can quickly re-use memory if it is needed again, with MADV_FREE use on the host side (see the sketch below).
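
To make the MADV_FREE point concrete, here is a minimal host-side sketch. Only madvise(2) and MADV_FREE are real interfaces here; the helper name and the assumption that a hinted guest range has already been translated to a host virtual address are illustrative only:

    #include <sys/mman.h>

    /*
     * Illustrative helper (not part of this series): lazily reclaim a
     * hinted guest range.  MADV_FREE lets the host reclaim the pages
     * under memory pressure, while the guest can re-dirty them cheaply
     * if the memory is needed again.
     */
    static int hint_freed_range(void *hva, size_t len)
    {
            return madvise(hva, len, MADV_FREE);
    }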

This patch-set is divided into three parts, and this is just the first part, which contains the implementation used for preparing the list of guest free pages that will be sent to the host via a hypercall.
The patch-set leverages the existing arch_free_page() and arch_alloc_page() hooks to add this functionality. It uses two lists: one cpu-local and one cpu-global. Whenever a page is freed it is added to the respective cpu-local list until that list is full. Once the list is full, a seqlock is taken to prevent any further page allocations, and the cpu-local list is traversed to check for any fragmentation caused by re-allocations. Fragmented entries are defragmented and added to the cpu-global list until it, in turn, is full. Once the cpu-global list is full it is parsed and compressed.
A hypercall is made only if the total number of entries is above the specified threshold, since a hypercall may affect performance if done frequently and hence needs to be minimized. This is the primary reason for compression: it replaces multiple consecutive entries with a single one and removes the duplicate entries that would otherwise cause frequent exhaustion of the cpu-global list. After compressing the hyperlist there are three possibilities, sketched in code below:
* If the number of entries in the cpu-global list is greater than the hypercall threshold, a hypercall is issued.
* If parsing of the cpu-local list is complete but the number of cpu-global entries is below the threshold, those entries are copied back to a cpu-local list.
* If parsing of the cpu-local list is not yet complete and the number of cpu-global entries is below the threshold, parsing of the cpu-local list continues, with cpu-global entries added from the newly available index obtained after compression.
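
A sketch of that decision logic, using the helper names from the patches below (compress_hyperlist() merges and deduplicates, e.g. the entries (pfn=100, pages=2) and (pfn=102, pages=3) collapse into (pfn=100, pages=5), and returns the updated index; local_scan_done is an illustrative stand-in for "the cpu-local list has been fully parsed"):

    /* inside the slow path, once the cpu-global list fills up: */
    hyper_idx = compress_hyperlist();
    if (hyper_idx >= HYPERLIST_THRESHOLD - 1) {
            make_hypercall();       /* case 1: report ranges, empty the list */
            hyper_idx = 0;
    } else if (local_scan_done) {
            copy_hyperlist(hyper_idx);  /* case 2: spill back to cpu-local */
    } else {
            /* case 3: keep parsing, appending from hyper_idx onwards */
    }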

--
Regards
Nitesh 


* [PATCH 1/3] KVM: Support for guest page hinting
  2017-08-01 20:48 [PATCH 0/3] [RFC] KVM: Guest page hinting Nitesh Narayan Lal
@ 2017-08-01 20:48 ` Nitesh Narayan Lal
  2017-08-02  7:12   ` kbuild test robot
  2017-08-01 20:48 ` [PATCH 2/3] KVM: Guest page hinting functionality Nitesh Narayan Lal
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-01 20:48 UTC (permalink / raw)
  To: kvm
  Cc: riel, mst, david, yang.zhang.wz, pagupta, wei.w.wang, Nitesh Narayan Lal

This patch includes the following:
1. A basic skeleton for the support
2. Enablement of the x86 platform to use it

Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
 arch/x86/kvm/Makefile   |  1 +
 include/linux/gfp.h     |  7 +++++++
 virt/kvm/Kconfig        |  4 ++++
 virt/kvm/page_hinting.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+)
 create mode 100644 virt/kvm/page_hinting.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 09d4b17..6f7c382 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,6 +10,7 @@ KVM := ../../../virt/kvm
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_FREE_PAGE_HINTING)	+= $(KVM)/page_hinting.o
 
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a89d37e..37963b8 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -424,6 +424,13 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
 }
 
+#ifdef	CONFIG_KVM_FREE_PAGE_HINTING
+#define HAVE_ARCH_ALLOC_PAGE
+#define HAVE_ARCH_FREE_PAGE
+void arch_free_page(struct page *page, int order);
+void arch_alloc_page(struct page *page, int order);
+#endif
+
 #ifndef HAVE_ARCH_FREE_PAGE
 static inline void arch_free_page(struct page *page, int order) { }
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index b0cc1a3..57f1c7b 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -50,3 +50,7 @@ config KVM_COMPAT
 
 config HAVE_KVM_IRQ_BYPASS
        bool
+
+config KVM_FREE_PAGE_HINTING
+       def_bool m
+       depends on KVM
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
new file mode 100644
index 0000000..39d2b1d
--- /dev/null
+++ b/virt/kvm/page_hinting.c
@@ -0,0 +1,46 @@
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/page_ref.h>
+#include <linux/kvm_host.h>
+#include <linux/sort.h>
+
+#include <trace/events/kvm.h>
+
+#define MAX_FGPT_ENTRIES	1000
+#define HYPERLIST_THRESHOLD	500
+/*
+ * struct kvm_free_pages - Tracks the pages which are freed by the guest.
+ * @pfn	- page frame number for the page which is to be freed
+ * @pages - number of pages which are supposed to be freed.
+ * A global array object is used to hold the list of pfn and number of pages
+ * which are freed by the guest. This list may also have fragmented pages so
+ * defragmentation is a must prior to the hypercall.
+ */
+struct kvm_free_pages {
+	unsigned long pfn;
+	unsigned int pages;
+};
+
+/*
+ * hypervisor_pages - It is a dummy structure passed with the hypercall.
+ * @pfn - page frame number for the page which is to be freed.
+ * @pages - number of pages which are supposed to be freed.
+ * A global array object is used to hold the list of pfn and pages and is
+ * passed as part of the hypercall.
+ */
+struct hypervisor_pages {
+	unsigned long pfn;
+	unsigned int pages;
+};
+
+DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
+DEFINE_PER_CPU(int, kvm_pt_idx);
+struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+
+void arch_alloc_page(struct page *page, int order)
+{
+}
+
+void arch_free_page(struct page *page, int order)
+{
+}
-- 
2.9.4


* [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-01 20:48 [PATCH 0/3] [RFC] KVM: Guest page hinting Nitesh Narayan Lal
  2017-08-01 20:48 ` [PATCH 1/3] KVM: Support for guest " Nitesh Narayan Lal
@ 2017-08-01 20:48 ` Nitesh Narayan Lal
  2017-08-02  7:01   ` Pankaj Gupta
  2017-08-02 12:19   ` Pankaj Gupta
  2017-08-01 20:48 ` [PATCH 3/3] KVM: Adding tracepoints for guest page hinting Nitesh Narayan Lal
  2017-08-04  5:16 ` [PATCH 0/3] [RFC] KVM: Guest " Yang Zhang
  3 siblings, 2 replies; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-01 20:48 UTC (permalink / raw)
  To: kvm
  Cc: riel, mst, david, yang.zhang.wz, pagupta, wei.w.wang, Nitesh Narayan Lal

This patch adds the guest implementation in order to maintain the list of
pages which are freed by the guest and are not reused. To avoid any
reallocation, a seqlock is taken once the list is completely filled.
It does not yet carry the hypercall-related changes.
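
The synchronization in a nutshell, as a sketch that just paraphrases the diff below: arch_free_page_slowpath() holds the write side of guest_page_lock while it drains the per-cpu list, and arch_alloc_page() spins on the read side, so allocations stall until an in-flight drain completes.

    /* writer: arch_free_page_slowpath() */
    write_seqlock(&guest_page_lock);
    /* ... walk and drain the per-cpu free-page list ... */
    write_sequnlock(&guest_page_lock);

    /* reader: arch_alloc_page() waits out an in-flight drain */
    do {
            seq = read_seqbegin(&guest_page_lock);
    } while (read_seqretry(&guest_page_lock, seq));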

Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
 virt/kvm/page_hinting.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 244 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 39d2b1d..cfdc513 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -3,7 +3,7 @@
 #include <linux/page_ref.h>
 #include <linux/kvm_host.h>
 #include <linux/sort.h>
-
+#include <linux/kernel.h>
 #include <trace/events/kvm.h>
 
 #define MAX_FGPT_ENTRIES	1000
@@ -33,14 +33,256 @@ struct hypervisor_pages {
 	unsigned int pages;
 };
 
+static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
 DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
 DEFINE_PER_CPU(int, kvm_pt_idx);
-struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES - 1];
+
+static void empty_hyperlist(void)
+{
+	int i = 0;
+
+	while (i < MAX_FGPT_ENTRIES - 1) {
+		hypervisor_pagelist[i].pfn = 0;
+		hypervisor_pagelist[i].pages = 0;
+		i++;
+	}
+}
+
+static void make_hypercall(void)
+{
+	/*
+	 * Dummy function: To be filled later.
+	 */
+	empty_hyperlist();
+}
+
+static int sort_pfn(const void *a1, const void *b1)
+{
+	const struct hypervisor_pages *a = a1;
+	const struct hypervisor_pages *b = b1;
+
+	if (a->pfn > b->pfn)
+		return 1;
+
+	if (a->pfn < b->pfn)
+		return -1;
+
+	return 0;
+}
+
+static int pack_hyperlist(void)
+{
+	int i = 0, j = 0;
+
+	while (i < MAX_FGPT_ENTRIES  - 1) {
+		if (hypervisor_pagelist[i].pfn != 0) {
+			hypervisor_pagelist[j].pfn =
+					hypervisor_pagelist[i].pfn;
+			hypervisor_pagelist[j].pages =
+					hypervisor_pagelist[i].pages;
+			j++;
+		}
+		i++;
+	}
+	i = j;
+	while (j < MAX_FGPT_ENTRIES - 1) {
+		hypervisor_pagelist[j].pfn = 0;
+		hypervisor_pagelist[j].pages = 0;
+		j++;
+	}
+	return i;
+}
+
+int compress_hyperlist(void)
+{
+	int i = 0, j = 1, merge_counter = 0, ret = 0;
+
+	sort(hypervisor_pagelist, MAX_FGPT_ENTRIES - 1,
+	     sizeof(struct hypervisor_pages), sort_pfn, NULL);
+	while (i < MAX_FGPT_ENTRIES - 1 && j < MAX_FGPT_ENTRIES - 1) {
+		unsigned long pfni = hypervisor_pagelist[i].pfn;
+		unsigned int pagesi = hypervisor_pagelist[i].pages;
+		unsigned long pfnj = hypervisor_pagelist[j].pfn;
+		unsigned int pagesj = hypervisor_pagelist[j].pages;
+
+		if (pfnj <= pfni) {
+			if (((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) &&
+			    ((pfnj + pagesj - 1) >= (pfni - 1))) {
+				hypervisor_pagelist[i].pfn = pfnj;
+				hypervisor_pagelist[i].pages += pfni - pfnj;
+				hypervisor_pagelist[j].pfn = 0;
+				hypervisor_pagelist[j].pages = 0;
+				j++;
+				merge_counter++;
+				continue;
+			} else if ((pfnj + pagesj - 1) > (pfni + pagesi - 1)) {
+				hypervisor_pagelist[i].pfn = pfnj;
+				hypervisor_pagelist[i].pages = pagesj;
+				hypervisor_pagelist[j].pfn = 0;
+				hypervisor_pagelist[j].pages = 0;
+				j++;
+				merge_counter++;
+				continue;
+			}
+		}
+		if (pfnj > pfni) {
+			if ((pfnj + pagesj - 1) > (pfni + pagesi - 1) &&
+			    (pfnj <= pfni + pagesi)) {
+				hypervisor_pagelist[i].pages +=
+						(pfnj + pagesj - 1) -
+						(pfni + pagesi - 1);
+				hypervisor_pagelist[j].pfn = 0;
+				hypervisor_pagelist[j].pages = 0;
+				j++;
+				merge_counter++;
+				continue;
+			}
+			if ((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) {
+				hypervisor_pagelist[j].pfn = 0;
+				hypervisor_pagelist[j].pages = 0;
+				j++;
+				merge_counter++;
+				continue;
+			}
+		}
+		i = j;
+		j++;
+	}
+	if (merge_counter != 0)
+		ret = pack_hyperlist() - 1;
+	else
+		ret = MAX_FGPT_ENTRIES - 1;
+	return ret;
+}
+
+void copy_hyperlist(int hyper_idx)
+{
+	int *idx = &get_cpu_var(kvm_pt_idx);
+	struct kvm_free_pages *free_page_obj;
+	int i = 0;
+
+	free_page_obj = &get_cpu_var(kvm_pt)[0];
+	while (i < hyper_idx) {
+		free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
+		free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
+		*idx += 1;
+		i++;
+	}
+	empty_hyperlist();
+	put_cpu_var(kvm_pt);
+	put_cpu_var(kvm_pt_idx);
+}
+
+/*
+ * arch_free_page_slowpath() - This function adds the guest free page entries
+ * to the hypervisor_pages list, defragmenting any kvm_free_pages entry that
+ * needs it prior to the addition.
+ */
+void arch_free_page_slowpath(void)
+{
+	int idx = 0;
+	int hyper_idx = -1;
+	int *kvm_idx = &get_cpu_var(kvm_pt_idx);
+	struct kvm_free_pages *free_page_obj = &get_cpu_var(kvm_pt)[0];
+
+	write_seqlock(&guest_page_lock);
+	while (idx < MAX_FGPT_ENTRIES) {
+		unsigned long pfn = free_page_obj[idx].pfn;
+		unsigned long pfn_end = free_page_obj[idx].pfn +
+					free_page_obj[idx].pages - 1;
+		bool prev_free = false;
+
+		while (pfn <= pfn_end) {
+			struct page *p = pfn_to_page(pfn);
+
+			if (PageCompound(p)) {
+				struct page *head_page = compound_head(p);
+				unsigned long head_pfn = page_to_pfn(head_page);
+				unsigned int alloc_pages =
+					1 << compound_order(head_page);
+
+				pfn = head_pfn + alloc_pages;
+				prev_free = false;
+				continue;
+			}
+			if (page_ref_count(p)) {
+				pfn++;
+				prev_free = false;
+				continue;
+			}
+			/*
+			 * The page is free so add it to the list and free the
+			 * hypervisor_pagelist if required.
+			 */
+			if (!prev_free) {
+				hyper_idx++;
+				if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
+					hyper_idx =  compress_hyperlist();
+					if (hyper_idx >=
+					    HYPERLIST_THRESHOLD - 1) {
+						make_hypercall();
+						hyper_idx = 0;
+					}
+				}
+				hypervisor_pagelist[hyper_idx].pfn = pfn;
+				hypervisor_pagelist[hyper_idx].pages = 1;
+				/*
+				 * If the next contiguous page is free, it can
+				 * be added to this same entry.
+				 */
+				prev_free = true;
+			} else {
+				/*
+				 * Multiple adjacent free pages
+				 */
+				hypervisor_pagelist[hyper_idx].pages++;
+			}
+			pfn++;
+		}
+		free_page_obj[idx].pfn = 0;
+		free_page_obj[idx].pages = 0;
+		idx++;
+	}
+	*kvm_idx = 0;
+	put_cpu_var(kvm_pt);
+	put_cpu_var(kvm_pt_idx);
+	write_sequnlock(&guest_page_lock);
+}
 
 void arch_alloc_page(struct page *page, int order)
 {
+	unsigned int seq;
+
+	/*
+	 * arch_free_page will acquire the lock once the list carrying guest
+	 * free pages is full and a hypercall will be made. Until the complete
+	 * free page list is traversed, no further allocation will be allowed.
+	 */
+	do {
+		seq = read_seqbegin(&guest_page_lock);
+	} while (read_seqretry(&guest_page_lock, seq));
 }
 
 void arch_free_page(struct page *page, int order)
 {
+	int *free_page_idx = &get_cpu_var(kvm_pt_idx);
+	struct kvm_free_pages *free_page_obj;
+	unsigned long flags;
+
+	/*
+	 * use of global variables may trigger a race condition between irq and
+	 * process context causing unwanted overwrites. This will be replaced
+	 * with a better solution to prevent such race conditions.
+	 */
+	local_irq_save(flags);
+	free_page_obj = &get_cpu_var(kvm_pt)[0];
+	free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
+	free_page_obj[*free_page_idx].pages = 1 << order;
+	*free_page_idx += 1;
+	if (*free_page_idx == MAX_FGPT_ENTRIES)
+		arch_free_page_slowpath();
+	put_cpu_var(kvm_pt);
+	put_cpu_var(kvm_pt_idx);
+	local_irq_restore(flags);
 }
-- 
2.9.4


* [PATCH 3/3] KVM: Adding tracepoints for guest page hinting
  2017-08-01 20:48 [PATCH 0/3] [RFC] KVM: Guest page hinting Nitesh Narayan Lal
  2017-08-01 20:48 ` [PATCH 1/3] KVM: Support for guest " Nitesh Narayan Lal
  2017-08-01 20:48 ` [PATCH 2/3] KVM: Guest page hinting functionality Nitesh Narayan Lal
@ 2017-08-01 20:48 ` Nitesh Narayan Lal
  2017-08-04  5:16 ` [PATCH 0/3] [RFC] KVM: Guest " Yang Zhang
  3 siblings, 0 replies; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-01 20:48 UTC (permalink / raw)
  To: kvm
  Cc: riel, mst, david, yang.zhang.wz, pagupta, wei.w.wang, Nitesh Narayan Lal

Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
 include/trace/events/kvm.h | 101 +++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/page_hinting.c    |  17 ++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 8ade3eb..96d29a4 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -393,6 +393,107 @@ TRACE_EVENT(kvm_halt_poll_ns,
 #define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \
 	trace_kvm_halt_poll_ns(false, vcpu_id, new, old)
 
+TRACE_EVENT(guest_free_page,
+	    TP_PROTO(struct page *page, unsigned int order),
+
+	TP_ARGS(page, order),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(unsigned int, order)
+	),
+
+	TP_fast_assign(
+		__entry->pfn            = page_to_pfn(page);
+		__entry->order          = order;
+	),
+
+	TP_printk("page=%p pfn=%lu number of pages=%d",
+		  pfn_to_page(__entry->pfn),
+		  __entry->pfn,
+		  (1 << __entry->order))
+);
+
+TRACE_EVENT(guest_alloc_page,
+	    TP_PROTO(struct page *page, unsigned int order),
+
+	TP_ARGS(page, order),
+
+	TP_STRUCT__entry(
+		__field(unsigned long,  pfn)
+		__field(unsigned int,   order)
+	),
+
+	TP_fast_assign(
+		__entry->pfn            = page_to_pfn(page);
+		__entry->order          = order;
+		),
+
+	TP_printk("page=%p pfn=%lu number of pages=%d",
+		  pfn_to_page(__entry->pfn),
+		  __entry->pfn,
+		  (1 << __entry->order))
+);
+
+TRACE_EVENT(guest_free_page_slowpath,
+	    TP_PROTO(unsigned long pfn, unsigned int pages),
+
+	TP_ARGS(pfn, pages),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(unsigned int, pages)
+	),
+
+	TP_fast_assign(
+		__entry->pfn            = pfn;
+		__entry->pages          = pages;
+	),
+
+	TP_printk("pfn=%lu number of pages=%u",
+		  __entry->pfn,
+		  __entry->pages)
+);
+
+TRACE_EVENT(guest_pfn_dump,
+	    TP_PROTO(char *type, unsigned long pfn, unsigned int pages),
+
+	TP_ARGS(type, pfn, pages),
+
+	TP_STRUCT__entry(
+		__field(char *, type)
+		__field(unsigned long, pfn)
+		__field(unsigned int, pages)
+	),
+
+	TP_fast_assign(
+		__entry->type		= type;
+		__entry->pfn            = pfn;
+		__entry->pages          = pages;
+		),
+
+	TP_printk("Type=%s pfn=%lu number of pages=%d",
+		  __entry->type,
+		  __entry->pfn,
+		  __entry->pages)
+);
+
+TRACE_EVENT(guest_str_dump,
+	    TP_PROTO(char *str),
+
+	TP_ARGS(str),
+
+	TP_STRUCT__entry(
+		__field(char *, str)
+	),
+
+	TP_fast_assign(
+		__entry->str		= str;
+		),
+
+	TP_printk("Debug=%s",
+		  __entry->str)
+);
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index cfdc513..5fb390e0 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -55,6 +55,7 @@ static void make_hypercall(void)
 	 * Dummy function: To be filled later.
 	 */
 	empty_hyperlist();
+	trace_guest_str_dump("Hypercall to host...:");
 }
 
 static int sort_pfn(const void *a1, const void *b1)
@@ -77,6 +78,9 @@ static int pack_hyperlist(void)
 
 	while (i < MAX_FGPT_ENTRIES  - 1) {
 		if (hypervisor_pagelist[i].pfn != 0) {
+			trace_guest_pfn_dump("Packing Hyperlist",
+					     hypervisor_pagelist[i].pfn,
+					     hypervisor_pagelist[i].pages);
 			hypervisor_pagelist[j].pfn =
 					hypervisor_pagelist[i].pfn;
 			hypervisor_pagelist[j].pages =
@@ -164,6 +168,9 @@ void copy_hyperlist(int hyper_idx)
 
 	free_page_obj = &get_cpu_var(kvm_pt)[0];
 	while (i < hyper_idx) {
+		trace_guest_pfn_dump("HyperList entry copied",
+				     hypervisor_pagelist[i].pfn,
+				     hypervisor_pagelist[i].pages);
 		free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
 		free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
 		*idx += 1;
@@ -204,11 +211,14 @@ void arch_free_page_slowpath(void)
 
 				pfn = head_pfn + alloc_pages;
 				prev_free = false;
+				trace_guest_pfn_dump("Compound",
+						     head_pfn, alloc_pages);
 				continue;
 			}
 			if (page_ref_count(p)) {
+				trace_guest_pfn_dump("Single", pfn, 1);
 				pfn++;
 				prev_free = false;
 				continue;
 			}
 			/*
@@ -216,6 +226,11 @@ void arch_free_page_slowpath(void)
 			 * hypervisor_pagelist if required.
 			 */
 			if (!prev_free) {
+				if (hyper_idx != -1) {
+					trace_guest_free_page_slowpath(
+					hypervisor_pagelist[hyper_idx].pfn,
+					hypervisor_pagelist[hyper_idx].pages);
+				}
 				hyper_idx++;
 				if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
 					hyper_idx =  compress_hyperlist();
@@ -262,6 +277,7 @@ void arch_alloc_page(struct page *page, int order)
 	do {
 		seq = read_seqbegin(&guest_page_lock);
 	} while (read_seqretry(&guest_page_lock, seq));
+	trace_guest_alloc_page(page, order);
 }
 
 void arch_free_page(struct page *page, int order)
@@ -277,6 +293,7 @@ void arch_free_page(struct page *page, int order)
 	 */
 	local_irq_save(flags);
 	free_page_obj = &get_cpu_var(kvm_pt)[0];
+	trace_guest_free_page(page, order);
 	free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
 	free_page_obj[*free_page_idx].pages = 1 << order;
 	*free_page_idx += 1;
-- 
2.9.4


* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-01 20:48 ` [PATCH 2/3] KVM: Guest page hinting functionality Nitesh Narayan Lal
@ 2017-08-02  7:01   ` Pankaj Gupta
  2017-08-02 18:59     ` Nitesh Narayan Lal
  2017-08-02 12:19   ` Pankaj Gupta
  1 sibling, 1 reply; 13+ messages in thread
From: Pankaj Gupta @ 2017-08-02  7:01 UTC (permalink / raw)
  To: Nitesh Narayan Lal; +Cc: kvm, riel, mst, david, yang zhang wz, wei w wang

Hi Nitesh,

I tried to look at the code. 
Some minor comments inline.

Thanks,
Pankaj

> 
> This patch adds the guest implementation in order to maintain the list of
> pages which are freed by the guest and are not reused. To avoid any
> reallocation, a seqlock is taken once the list is completely filled.
> It does not yet carry the hypercall-related changes.
> 
> Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
> ---
>  virt/kvm/page_hinting.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 244 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
> index 39d2b1d..cfdc513 100644
> --- a/virt/kvm/page_hinting.c
> +++ b/virt/kvm/page_hinting.c
> @@ -3,7 +3,7 @@
>  #include <linux/page_ref.h>
>  #include <linux/kvm_host.h>
>  #include <linux/sort.h>
> -
> +#include <linux/kernel.h>
>  #include <trace/events/kvm.h>
>  
>  #define MAX_FGPT_ENTRIES	1000
> @@ -33,14 +33,256 @@ struct hypervisor_pages {
>  	unsigned int pages;
>  };
>  
> +static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
>  DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
>  DEFINE_PER_CPU(int, kvm_pt_idx);
> -struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES - 1];
> +
> +static void empty_hyperlist(void)
> +{
> +	int i = 0;
> +
> +	while (i < MAX_FGPT_ENTRIES - 1) {

      MAX_FGPT_ENTRIES in place of 'MAX_FGPT_ENTRIES - 1' here
      and at similar other places?

> +		hypervisor_pagelist[i].pfn = 0;
> +		hypervisor_pagelist[i].pages = 0;
> +		i++;
> +	}
> +}
> +
> +static void make_hypercall(void)
> +{
> +	/*
> +	 * Dummy function: To be filled later.
> +	 */
> +	empty_hyperlist();
> +}
> +
> +static int sort_pfn(const void *a1, const void *b1)
> +{
> +	const struct hypervisor_pages *a = a1;
> +	const struct hypervisor_pages *b = b1;
> +
> +	if (a->pfn > b->pfn)
> +		return 1;
> +
> +	if (a->pfn < b->pfn)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int pack_hyperlist(void)
> +{
> +	int i = 0, j = 0;
> +
> +	while (i < MAX_FGPT_ENTRIES  - 1) {
> +		if (hypervisor_pagelist[i].pfn != 0) {

   no need to copy values at the same index if non-zero

> +			hypervisor_pagelist[j].pfn =
> +					hypervisor_pagelist[i].pfn;
> +			hypervisor_pagelist[j].pages =
> +					hypervisor_pagelist[i].pages;
> +			j++;
> +		}
> +		i++;
> +	}
> +	i = j;
> +	while (j < MAX_FGPT_ENTRIES - 1) {
> +		hypervisor_pagelist[j].pfn = 0;
> +		hypervisor_pagelist[j].pages = 0;
> +		j++;
> +	}
> +	return i;
> +}
> +
> +int compress_hyperlist(void)
> +{
> +	int i = 0, j = 1, merge_counter = 0, ret = 0;
> +
> +	sort(hypervisor_pagelist, MAX_FGPT_ENTRIES - 1,
> +	     sizeof(struct hypervisor_pages), sort_pfn, NULL);

> +	while (i < MAX_FGPT_ENTRIES - 1 && j < MAX_FGPT_ENTRIES - 1) {
> +		unsigned long pfni = hypervisor_pagelist[i].pfn;
> +		unsigned int pagesi = hypervisor_pagelist[i].pages;
> +		unsigned long pfnj = hypervisor_pagelist[j].pfn;
> +		unsigned int pagesj = hypervisor_pagelist[j].pages;
> +
> +		if (pfnj <= pfni) {
> +			if (((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) &&
> +			    ((pfnj + pagesj - 1) >= (pfni - 1))) {
> +				hypervisor_pagelist[i].pfn = pfnj;
> +				hypervisor_pagelist[i].pages += pfni - pfnj;
> +				hypervisor_pagelist[j].pfn = 0;
> +				hypervisor_pagelist[j].pages = 0;
> +				j++;
> +				merge_counter++;
> +				continue;
> +			} else if ((pfnj + pagesj - 1) > (pfni + pagesi - 1)) {
> +				hypervisor_pagelist[i].pfn = pfnj;
> +				hypervisor_pagelist[i].pages = pagesj;
> +				hypervisor_pagelist[j].pfn = 0;
> +				hypervisor_pagelist[j].pages = 0;
> +				j++;
> +				merge_counter++;
> +				continue;
> +			}
> +		}
> +		if (pfnj > pfni) {
             
          else if here

> +			if ((pfnj + pagesj - 1) > (pfni + pagesi - 1) &&
> +			    (pfnj <= pfni + pagesi)) {
> +				hypervisor_pagelist[i].pages +=
> +						(pfnj + pagesj - 1) -
> +						(pfni + pagesi - 1);
> +				hypervisor_pagelist[j].pfn = 0;
> +				hypervisor_pagelist[j].pages = 0;
> +				j++;
> +				merge_counter++;
> +				continue;
> +			}
> +			if ((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) {
> +				hypervisor_pagelist[j].pfn = 0;
> +				hypervisor_pagelist[j].pages = 0;
> +				j++;
> +				merge_counter++;
> +				continue;
> +			}
> +		}
> +		i = j;
> +		j++;
> +	}
> +	if (merge_counter != 0)
> +		ret = pack_hyperlist() - 1;
> +	else
> +		ret = MAX_FGPT_ENTRIES - 1;
> +	return ret;
> +}
> +
> +void copy_hyperlist(int hyper_idx)
> +{
> +	int *idx = &get_cpu_var(kvm_pt_idx);
> +	struct kvm_free_pages *free_page_obj;
> +	int i = 0;
> +
> +	free_page_obj = &get_cpu_var(kvm_pt)[0];
> +	while (i < hyper_idx) {
> +		free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
> +		free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
> +		*idx += 1;
> +		i++;
> +	}
> +	empty_hyperlist();
> +	put_cpu_var(kvm_pt);
> +	put_cpu_var(kvm_pt_idx);
> +}
> +
> +/*
> + * arch_free_page_slowpath() - This function adds the guest free page entries
> + * to the hypervisor_pages list, defragmenting any kvm_free_pages entry that
> + * needs it prior to the addition.
> + */
> +void arch_free_page_slowpath(void)
> +{
> +	int idx = 0;
> +	int hyper_idx = -1;
> +	int *kvm_idx = &get_cpu_var(kvm_pt_idx);
> +	struct kvm_free_pages *free_page_obj = &get_cpu_var(kvm_pt)[0];
> +
> +	write_seqlock(&guest_page_lock);
> +	while (idx < MAX_FGPT_ENTRIES) {
> +		unsigned long pfn = free_page_obj[idx].pfn;
> +		unsigned long pfn_end = free_page_obj[idx].pfn +
> +					free_page_obj[idx].pages - 1;
> +		bool prev_free = false;
> +
> +		while (pfn <= pfn_end) {
> +			struct page *p = pfn_to_page(pfn);
> +
> +			if (PageCompound(p)) {
> +				struct page *head_page = compound_head(p);
> +				unsigned long head_pfn = page_to_pfn(head_page);
> +				unsigned int alloc_pages =
> +					1 << compound_order(head_page);
> +
> +				pfn = head_pfn + alloc_pages;
> +				prev_free = false;
> +				continue;
> +			}
> +			if (page_ref_count(p)) {
> +				pfn++;
> +				prev_free = false;
> +				continue;
> +			}
> +			/*
> +			 * The page is free so add it to the list and free the
> +			 * hypervisor_pagelist if required.
> +			 */
> +			if (!prev_free) {
> +				hyper_idx++;
> +				if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
> +					hyper_idx =  compress_hyperlist();
> +					if (hyper_idx >=
> +					    HYPERLIST_THRESHOLD - 1) {
> +						make_hypercall();
> +						hyper_idx = 0;
> +					}
> +				}
> +				hypervisor_pagelist[hyper_idx].pfn = pfn;
> +				hypervisor_pagelist[hyper_idx].pages = 1;
> +				/*
> +				 * If the next contiguous page is free, it can
> +				 * be added to this same entry.
> +				 */
> +				prev_free = true;
> +			} else {
> +				/*
> +				 * Multiple adjacent free pages
> +				 */
> +				hypervisor_pagelist[hyper_idx].pages++;
> +			}
> +			pfn++;
> +		}
> +		free_page_obj[idx].pfn = 0;
> +		free_page_obj[idx].pages = 0;
> +		idx++;
> +	}
> +	*kvm_idx = 0;
> +	put_cpu_var(kvm_pt);
> +	put_cpu_var(kvm_pt_idx);
> +	write_sequnlock(&guest_page_lock);
> +}
>  
>  void arch_alloc_page(struct page *page, int order)
>  {
> +	unsigned int seq;
> +
> +	/*
> +	 * arch_free_page will acquire the lock once the list carrying guest
> +	 * free pages is full and a hypercall will be made. Until the complete
> +	 * free page list is traversed, no further allocation will be allowed.
> +	 */
> +	do {
> +		seq = read_seqbegin(&guest_page_lock);
> +	} while (read_seqretry(&guest_page_lock, seq));
>  }
>  
>  void arch_free_page(struct page *page, int order)
>  {
> +	int *free_page_idx = &get_cpu_var(kvm_pt_idx);
> +	struct kvm_free_pages *free_page_obj;
> +	unsigned long flags;
> +
> +	/*
> +	 * use of global variables may trigger a race condition between irq and
> +	 * process context causing unwanted overwrites. This will be replaced
> +	 * with a better solution to prevent such race conditions.
> +	 */
> +	local_irq_save(flags);
> +	free_page_obj = &get_cpu_var(kvm_pt)[0];
> +	free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
> +	free_page_obj[*free_page_idx].pages = 1 << order;
> +	*free_page_idx += 1;
> +	if (*free_page_idx == MAX_FGPT_ENTRIES)
> +		arch_free_page_slowpath();
> +	put_cpu_var(kvm_pt);
> +	put_cpu_var(kvm_pt_idx);
> +	local_irq_restore(flags);
>  }
> --
> 2.9.4
> 
> 


* Re: [PATCH 1/3] KVM: Support for guest page hinting
  2017-08-01 20:48 ` [PATCH 1/3] KVM: Support for guest " Nitesh Narayan Lal
@ 2017-08-02  7:12   ` kbuild test robot
  0 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-08-02  7:12 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: kbuild-all, kvm, riel, mst, david, yang.zhang.wz, pagupta,
	wei.w.wang, Nitesh Narayan Lal


Hi Nitesh,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.13-rc3 next-20170801]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nitesh-Narayan-Lal/KVM-Guest-page-hinting/20170802-130559
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   mm/page_alloc.o: In function `__free_pages_ok':
>> page_alloc.c:(.text+0x16e9): undefined reference to `arch_free_page'
   mm/page_alloc.o: In function `post_alloc_hook':
>> page_alloc.c:(.text+0x2424): undefined reference to `arch_alloc_page'
   mm/page_alloc.o: In function `free_hot_cold_page':
   page_alloc.c:(.text+0x3307): undefined reference to `arch_free_page'
   mm/page_alloc.o: In function `get_page_from_freelist':
   page_alloc.c:(.text+0x3fdd): undefined reference to `arch_alloc_page'
   page_alloc.c:(.text+0x45e3): undefined reference to `arch_alloc_page'
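
These errors are consistent with the Kconfig entry: "def_bool m" lets the hinting code be built as a module, while mm/page_alloc.c calls arch_free_page()/arch_alloc_page() from built-in code, leaving the symbols unresolved there. One plausible fix, offered only as an assumption (it is not stated in this thread), is to force the code built-in:

    config KVM_FREE_PAGE_HINTING
           def_bool y
           depends on KVM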

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-01 20:48 ` [PATCH 2/3] KVM: Guest page hinting functionality Nitesh Narayan Lal
  2017-08-02  7:01   ` Pankaj Gupta
@ 2017-08-02 12:19   ` Pankaj Gupta
  2017-08-02 15:02     ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Pankaj Gupta @ 2017-08-02 12:19 UTC (permalink / raw)
  To: Nitesh Narayan Lal; +Cc: kvm, riel, mst, david, yang zhang wz, wei w wang


> 
> This patch adds the guest implementation in order to maintain the list of
> pages which are freed by the guest and are not reused. To avoid any
> reallocation, a seqlock is taken once the list is completely filled.
> It does not yet carry the hypercall-related changes.
> 
> Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
> ---
>  virt/kvm/page_hinting.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 244 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
> index 39d2b1d..cfdc513 100644
> --- a/virt/kvm/page_hinting.c
> +++ b/virt/kvm/page_hinting.c
> @@ -3,7 +3,7 @@
>  #include <linux/page_ref.h>
>  #include <linux/kvm_host.h>
>  #include <linux/sort.h>
> -
> +#include <linux/kernel.h>
> #include <trace/events/kvm.h>
>  
>  #define MAX_FGPT_ENTRIES        1000
> @@ -33,14 +33,256 @@ struct hypervisor_pages {
>          unsigned int pages;
>  };
>  
> +static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
>  DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
>  DEFINE_PER_CPU(int, kvm_pt_idx);
> -struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES - 1];
> +
> +static void empty_hyperlist(void)
> +{
> +        int i = 0;
> +
> +        while (i < MAX_FGPT_ENTRIES - 1) {
> +                hypervisor_pagelist[i].pfn = 0;
> +                hypervisor_pagelist[i].pages = 0;
> +                i++;
> +        }
> +}
> +
> +static void make_hypercall(void)
> +{
> +        /*
> +         * Dummy function: To be filled later.
> +         */
> +        empty_hyperlist();
> +}
> +
> +static int sort_pfn(const void *a1, const void *b1)
> +{
> +        const struct hypervisor_pages *a = a1;
> +        const struct hypervisor_pages *b = b1;
> +
> +        if (a->pfn > b->pfn)
> +                return 1;
> +
> +        if (a->pfn < b->pfn)
> +                return -1;
> +
> +        return 0;
> +}
> +
> +static int pack_hyperlist(void)
> +{
> +        int i = 0, j = 0;
> +
> +        while (i < MAX_FGPT_ENTRIES  - 1) {
> +                if (hypervisor_pagelist[i].pfn != 0) {
> +                        hypervisor_pagelist[j].pfn =
> +                                        hypervisor_pagelist[i].pfn;
> +                        hypervisor_pagelist[j].pages =
> +                                        hypervisor_pagelist[i].pages;
> +                        j++;
> +                }
> +                i++;
> +        }
> +        i = j;
> +        while (j < MAX_FGPT_ENTRIES - 1) {
> +                hypervisor_pagelist[j].pfn = 0;
> +                hypervisor_pagelist[j].pages = 0;
> +                j++;
> +        }
> +        return i;
> +}
> +
> +int compress_hyperlist(void)
> +{
> +        int i = 0, j = 1, merge_counter = 0, ret = 0;
> +
> +        sort(hypervisor_pagelist, MAX_FGPT_ENTRIES - 1,
> +             sizeof(struct hypervisor_pages), sort_pfn, NULL);
> +        while (i < MAX_FGPT_ENTRIES - 1 && j < MAX_FGPT_ENTRIES - 1) {
> +                unsigned long pfni = hypervisor_pagelist[i].pfn;
> +                unsigned int pagesi = hypervisor_pagelist[i].pages;
> +                unsigned long pfnj = hypervisor_pagelist[j].pfn;
> +                unsigned int pagesj = hypervisor_pagelist[j].pages;
> +
> +                if (pfnj <= pfni) {
> +                        if (((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) &&
> +                            ((pfnj + pagesj - 1) >= (pfni - 1))) {
> +                                hypervisor_pagelist[i].pfn = pfnj;
> +                                hypervisor_pagelist[i].pages += pfni - pfnj;
> +                                hypervisor_pagelist[j].pfn = 0;
> +                                hypervisor_pagelist[j].pages = 0;
> +                                j++;
> +                                merge_counter++;
> +                                continue;
> +                        } else if ((pfnj + pagesj - 1) > (pfni + pagesi - 1)) {
> +                                hypervisor_pagelist[i].pfn = pfnj;
> +                                hypervisor_pagelist[i].pages = pagesj;
> +                                hypervisor_pagelist[j].pfn = 0;
> +                                hypervisor_pagelist[j].pages = 0;
> +                                j++;
> +                                merge_counter++;
> +                                continue;
> +                        }
> +                }
> +                if (pfnj > pfni) {
> +                        if ((pfnj + pagesj - 1) > (pfni + pagesi - 1) &&
> +                            (pfnj <= pfni + pagesi)) {
> +                                hypervisor_pagelist[i].pages +=
> +                                                (pfnj + pagesj - 1) -
> +                                                (pfni + pagesi - 1);
> +                                hypervisor_pagelist[j].pfn = 0;
> +                                hypervisor_pagelist[j].pages = 0;
> +                                j++;
> +                                merge_counter++;
> +                                continue;
> +                        }
> +                        if ((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) {
> +                                hypervisor_pagelist[j].pfn = 0;
> +                                hypervisor_pagelist[j].pages = 0;
> +                                j++;
> +                                merge_counter++;
> +                                continue;
> +                        }
> +                }
> +                i = j;
> +                j++;
> +        }
> +        if (merge_counter != 0)
> +                ret = pack_hyperlist() - 1;
> +        else
> +                ret = MAX_FGPT_ENTRIES - 1;
> +        return ret;
> +}
> +
> +void copy_hyperlist(int hyper_idx)
> +{
> +        int *idx = &get_cpu_var(kvm_pt_idx);
> +        struct kvm_free_pages *free_page_obj;
> +        int i = 0;
> +
> +        free_page_obj = &get_cpu_var(kvm_pt)[0];
> +        while (i < hyper_idx) {
> +                free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
> +                free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
> +                *idx += 1;
> +                i++;
> +        }
> +        empty_hyperlist();
> +        put_cpu_var(kvm_pt);
> +        put_cpu_var(kvm_pt_idx);
> +}
> +
> +/*
> + * arch_free_page_slowpath() - This function adds the guest free page entries
> + * to the hypervisor_pages list, defragmenting any kvm_free_pages entry that
> + * needs it prior to the addition.
> + */
> +void arch_free_page_slowpath(void)
> +{
> +        int idx = 0;
> +        int hyper_idx = -1;
> +        int *kvm_idx = &get_cpu_var(kvm_pt_idx);
> +        struct kvm_free_pages *free_page_obj = &get_cpu_var(kvm_pt)[0];
> +
> +        write_seqlock(&guest_page_lock);
> +        while (idx < MAX_FGPT_ENTRIES) {
> +                unsigned long pfn = free_page_obj[idx].pfn;
> +                unsigned long pfn_end = free_page_obj[idx].pfn +
> +                                        free_page_obj[idx].pages - 1;
> +                bool prev_free = false;
> +
> +                while (pfn <= pfn_end) {
> +                        struct page *p = pfn_to_page(pfn);

 Just one thing here: if we want/plan to use pmem as RAM this might not
 work, because the pfn will not have a corresponding 'struct page'?

> +
> +                        if (PageCompound(p)) {
> +                                struct page *head_page = compound_head(p);
> +                                unsigned long head_pfn = page_to_pfn(head_page);
> +                                unsigned int alloc_pages =
> +                                        1 << compound_order(head_page);
> +
> +                                pfn = head_pfn + alloc_pages;
> +                                prev_free = false;
> +                                continue;
> +                        }
> +                        if (page_ref_count(p)) {
> +                                pfn++;
> +                                prev_free = false;
> +                                continue;
> +                        }
> +                        /*
> +                         * The page is free so add it to the list and free the
> +                         * hypervisor_pagelist if required.
> +                         */
> +                        if (!prev_free) {
> +                                hyper_idx++;
> +                                if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
> +                                        hyper_idx =  compress_hyperlist();
> +                                        if (hyper_idx >=
> +                                            HYPERLIST_THRESHOLD - 1) {
> +                                                make_hypercall();
> +                                                hyper_idx = 0;
> +                                        }
> +                                }
> +                                hypervisor_pagelist[hyper_idx].pfn = pfn;
> +                                hypervisor_pagelist[hyper_idx].pages = 1;
> +                                /*
> +                                 * If the next contiguous page is free, it can
> +                                 * be added to this same entry.
> +                                 */
> +                                prev_free = true;
> +                        } else {
> +                                /*
> +                                 * Multiple adjacent free pages
> +                                 */
> +                                hypervisor_pagelist[hyper_idx].pages++;
> +                        }
> +                        pfn++;
> +                }
> +                free_page_obj[idx].pfn = 0;
> +                free_page_obj[idx].pages = 0;
> +                idx++;
> +        }
> +        *kvm_idx = 0;
> +        put_cpu_var(kvm_pt);
> +        put_cpu_var(kvm_pt_idx);
> +        write_sequnlock(&guest_page_lock);
> +}
>  
>  void arch_alloc_page(struct page *page, int order)
>  {
> +        unsigned int seq;
> +
> +        /*
> +         * arch_free_page will acquire the lock once the list carrying guest
> +         * free pages is full and a hypercall will be made. Until the complete
> +         * free page list is traversed, no further allocation will be allowed.
> +         */
> +        do {
> +                seq = read_seqbegin(&guest_page_lock);
> +        } while (read_seqretry(&guest_page_lock, seq));
>  }
>  
>  void arch_free_page(struct page *page, int order)
>  {
> +        int *free_page_idx = &get_cpu_var(kvm_pt_idx);
> +        struct kvm_free_pages *free_page_obj;
> +        unsigned long flags;
> +
> +        /*
> +         * use of global variables may trigger a race condition between irq and
> +         * process context causing unwanted overwrites. This will be replaced
> +         * with a better solution to prevent such race conditions.
> +         */
> +        local_irq_save(flags);
> +        free_page_obj = &get_cpu_var(kvm_pt)[0];
> +        free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
> +        free_page_obj[*free_page_idx].pages = 1 << order;
> +        *free_page_idx += 1;
> +        if (*free_page_idx == MAX_FGPT_ENTRIES)
> +                arch_free_page_slowpath();
> +        put_cpu_var(kvm_pt);
> +        put_cpu_var(kvm_pt_idx);
> +        local_irq_restore(flags);
>  }
> --
> 2.9.4
> 
> 


* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-02 12:19   ` Pankaj Gupta
@ 2017-08-02 15:02     ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2017-08-02 15:02 UTC (permalink / raw)
  To: Pankaj Gupta, Nitesh Narayan Lal
  Cc: kvm, mst, david, yang zhang wz, wei w wang


On Wed, 2017-08-02 at 08:19 -0400, Pankaj Gupta wrote:
> > 
> > +        write_seqlock(&guest_page_lock);
> > +        while (idx < MAX_FGPT_ENTRIES) {
> > +                unsigned long pfn = free_page_obj[idx].pfn;
> > +                unsigned long pfn_end = free_page_obj[idx].pfn +
> > +                                        free_page_obj[idx].pages - 1;
> > +                bool prev_free = false;
> > +
> > +                while (pfn <= pfn_end) {
> > +                        struct page *p = pfn_to_page(pfn);
> 
>  Just one thing here: if we want/plan to use pmem as RAM this might
>  not work, because the pfn will not have a corresponding 'struct page'?

pmem is never really used as RAM; allocating
and freeing space in pmem is done by a filesystem,
not by the memory management subsystem.
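
Purely as an illustration (a sketch, not part of the posted patch), a guard using the kernel's existing pfn_valid() helper would make the walk skip any frame that lacks a struct page:

    while (pfn <= pfn_end) {
            struct page *p;

            /* skip pfns with no struct page backing them */
            if (!pfn_valid(pfn)) {
                    pfn++;
                    prev_free = false;
                    continue;
            }
            p = pfn_to_page(pfn);
            /* ... existing compound/refcount checks ... */
    }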

-- 
All rights reversed



* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-02  7:01   ` Pankaj Gupta
@ 2017-08-02 18:59     ` Nitesh Narayan Lal
  2017-08-02 19:20       ` Rik van Riel
  0 siblings, 1 reply; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-02 18:59 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: kvm, riel, mst, david, yang.zhang.wz, wei.w.wang, mst, david,
	yang zhang wz, wei w wang



Hi Pankaj,

Thanks for your comments, please find my response inline.


On 08/02/2017 03:01 AM, Pankaj Gupta wrote:
> Hi Nitesh,
>
> I tried to look at the code. 
> Some minor comments inline.
>
> Thanks,
> Pankaj
>
>> This patch adds the guest implementation in order to maintain the list of
>> pages which are freed by the guest and are not reused. To avoid any
>> reallocation, a seqlock is taken once the list is completely filled.
>> It does not yet carry the hypercall-related changes.
>>
>> Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
>> ---
>>  virt/kvm/page_hinting.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 244 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
>> index 39d2b1d..cfdc513 100644
>> --- a/virt/kvm/page_hinting.c
>> +++ b/virt/kvm/page_hinting.c
>> @@ -3,7 +3,7 @@
>>  #include <linux/page_ref.h>
>>  #include <linux/kvm_host.h>
>>  #include <linux/sort.h>
>> -
>> +#include <linux/kernel.h>
>>  #include <trace/events/kvm.h>
>>  
>>  #define MAX_FGPT_ENTRIES	1000
>> @@ -33,14 +33,256 @@ struct hypervisor_pages {
>>  	unsigned int pages;
>>  };
>>  
>> +static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
>>  DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
>>  DEFINE_PER_CPU(int, kvm_pt_idx);
>> -struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
>> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES - 1];
>> +
>> +static void empty_hyperlist(void)
>> +{
>> +	int i = 0;
>> +
>> +	while (i < MAX_FGPT_ENTRIES - 1) {
>       MAX_FGPT_ENTRIES in place of 'MAX_FGPT_ENTRIES - 1' here
>       and at similar other places?
This is because the CPU-local list has a total of 1000 entries
(MAX_FGPT_ENTRIES) whereas the CPU-global list has 999 entries. If you
look at arch_free_page_slowpath() and consider a situation where there
are 1000 entries of singly allocated free pages in the cpu-local list,
i.e., none of them are re-allocated: while adding them to the cpu-global
list, when the cpu-local list index reaches 1000 the outer loop will
terminate, due to which the cpu-global list index will never reach 1000
and compress_hyperlist()/make_hypercall() will never be called.
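
A worked trace of that corner case, assuming 1000 non-contiguous single-page entries: hyper_idx starts at -1 and is incremented once per entry, so it takes the values 0..998 for the first 999 entries. On the 1000th entry it becomes 999 (MAX_FGPT_ENTRIES - 1), which triggers compress_hyperlist() before slot 999 is ever written; and if nothing merges, the returned value (999) is at or above HYPERLIST_THRESHOLD - 1, so the list is emptied via the hypercall. Either way index 999 is never stored to, which is why 999 slots are enough.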
>
>> +		hypervisor_pagelist[i].pfn = 0;
>> +		hypervisor_pagelist[i].pages = 0;
>> +		i++;
>> +	}
>> +}
>> +
>> +static void make_hypercall(void)
>> +{
>> +	/*
>> +	 * Dummy function: To be filled later.
>> +	 */
>> +	empty_hyperlist();
>> +}
>> +
>> +static int sort_pfn(const void *a1, const void *b1)
>> +{
>> +	const struct hypervisor_pages *a = a1;
>> +	const struct hypervisor_pages *b = b1;
>> +
>> +	if (a->pfn > b->pfn)
>> +		return 1;
>> +
>> +	if (a->pfn < b->pfn)
>> +		return -1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int pack_hyperlist(void)
>> +{
>> +	int i = 0, j = 0;
>> +
>> +	while (i < MAX_FGPT_ENTRIES  - 1) {
>> +		if (hypervisor_pagelist[i].pfn != 0) {
>    no need to copy values at the same index if non-zero
I agree, I will fix it in the next version.
>                    
>
>> +			hypervisor_pagelist[j].pfn =
>> +					hypervisor_pagelist[i].pfn;
>> +			hypervisor_pagelist[j].pages =
>> +					hypervisor_pagelist[i].pages;
>> +			j++;
>> +		}
>> +		i++;
>> +	}
>> +	i = j;
>> +	while (j < MAX_FGPT_ENTRIES - 1) {
>> +		hypervisor_pagelist[j].pfn = 0;
>> +		hypervisor_pagelist[j].pages = 0;
>> +		j++;
>> +	}
>> +	return i;
>> +}
>> +
>> +int compress_hyperlist(void)
>> +{
>> +	int i = 0, j = 1, merge_counter = 0, ret = 0;
>> +
>> +	sort(hypervisor_pagelist, MAX_FGPT_ENTRIES - 1,
>> +	     sizeof(struct hypervisor_pages), sort_pfn, NULL);
>> +	while (i < MAX_FGPT_ENTRIES - 1 && j < MAX_FGPT_ENTRIES - 1) {
>> +		unsigned long pfni = hypervisor_pagelist[i].pfn;
>> +		unsigned int pagesi = hypervisor_pagelist[i].pages;
>> +		unsigned long pfnj = hypervisor_pagelist[j].pfn;
>> +		unsigned int pagesj = hypervisor_pagelist[j].pages;
>> +
>> +		if (pfnj <= pfni) {
>> +			if (((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) &&
>> +			    ((pfnj + pagesj - 1) >= (pfni - 1))) {
>> +				hypervisor_pagelist[i].pfn = pfnj;
>> +				hypervisor_pagelist[i].pages += pfni - pfnj;
>> +				hypervisor_pagelist[j].pfn = 0;
>> +				hypervisor_pagelist[j].pages = 0;
>> +				j++;
>> +				merge_counter++;
>> +				continue;
>> +			} else if ((pfnj + pagesj - 1) > (pfni + pagesi - 1)) {
>> +				hypervisor_pagelist[i].pfn = pfnj;
>> +				hypervisor_pagelist[i].pages = pagesj;
>> +				hypervisor_pagelist[j].pfn = 0;
>> +				hypervisor_pagelist[j].pages = 0;
>> +				j++;
>> +				merge_counter++;
>> +				continue;
>> +			}
>> +		}
>> +		if (pfnj > pfni) {
>              
>           else if here
I will fix this in the next version.
>> +			if ((pfnj + pagesj - 1) > (pfni + pagesi - 1) &&
>> +			    (pfnj <= pfni + pagesi)) {
>> +				hypervisor_pagelist[i].pages +=
>> +						(pfnj + pagesj - 1) -
>> +						(pfni + pagesi - 1);
>> +				hypervisor_pagelist[j].pfn = 0;
>> +				hypervisor_pagelist[j].pages = 0;
>> +				j++;
>> +				merge_counter++;
>> +				continue;
>> +			}
>> +			if ((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) {
>> +				hypervisor_pagelist[j].pfn = 0;
>> +				hypervisor_pagelist[j].pages = 0;
>> +				j++;
>> +				merge_counter++;
>> +				continue;
>> +			}
>> +		}
>> +		i = j;
>> +		j++;
>> +	}
>> +	if (merge_counter != 0)
>> +		ret = pack_hyperlist() - 1;
>> +	else
>> +		ret = MAX_FGPT_ENTRIES - 1;
>> +	return ret;
>> +}
>> +
>> +void copy_hyperlist(int hyper_idx)
>> +{
>> +	int *idx = &get_cpu_var(kvm_pt_idx);
>> +	struct kvm_free_pages *free_page_obj;
>> +	int i = 0;
>> +
>> +	free_page_obj = &get_cpu_var(kvm_pt)[0];
>> +	while (i < hyper_idx) {
>> +		free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
>> +		free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
>> +		*idx += 1;
>> +		i++;
>> +	}
>> +	empty_hyperlist();
>> +	put_cpu_var(kvm_pt);
>> +	put_cpu_var(kvm_pt_idx);
>> +}
>> +
>> +/*
>> + * arch_free_page_slowpath() - This function adds the guest free page entries
>> + * to the hypervisor_pages list, defragmenting any kvm_free_pages entry that
>> + * needs it prior to the addition.
>> + */
>> +void arch_free_page_slowpath(void)
>> +{
>> +	int idx = 0;
>> +	int hyper_idx = -1;
>> +	int *kvm_idx = &get_cpu_var(kvm_pt_idx);
>> +	struct kvm_free_pages *free_page_obj = &get_cpu_var(kvm_pt)[0];
>> +
>> +	write_seqlock(&guest_page_lock);
>> +	while (idx < MAX_FGPT_ENTRIES) {
>> +		unsigned long pfn = free_page_obj[idx].pfn;
>> +		unsigned long pfn_end = free_page_obj[idx].pfn +
>> +					free_page_obj[idx].pages - 1;
>> +		bool prev_free = false;
>> +
>> +		while (pfn <= pfn_end) {
>> +			struct page *p = pfn_to_page(pfn);
>> +
>> +			if (PageCompound(p)) {
>> +				struct page *head_page = compound_head(p);
>> +				unsigned long head_pfn = page_to_pfn(head_page);
>> +				unsigned int alloc_pages =
>> +					1 << compound_order(head_page);
>> +
>> +				pfn = head_pfn + alloc_pages;
>> +				prev_free = false;
>> +				continue;
>> +			}
>> +			if (page_ref_count(p)) {
>> +				pfn++;
>> +				prev_free = false;
>> +				continue;
>> +			}
>> +			/*
>> +			 * The page is free so add it to the list and free the
>> +			 * hypervisor_pagelist if required.
>> +			 */
>> +			if (!prev_free) {
>> +				hyper_idx++;
>> +				if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
>> +					hyper_idx =  compress_hyperlist();
>> +					if (hyper_idx >=
>> +					    HYPERLIST_THRESHOLD - 1) {
>> +						make_hypercall();
>> +						hyper_idx = 0;
>> +					}
>> +				}
>> +				hypervisor_pagelist[hyper_idx].pfn = pfn;
>> +				hypervisor_pagelist[hyper_idx].pages = 1;
>> +				/*
>> +				 * If the next contiguous page is free, it can
>> +				 * be added to this same entry.
>> +				 */
>> +				prev_free = true;
>> +			} else {
>> +				/*
>> +				 * Multiple adjacent free pages
>> +				 */
>> +				hypervisor_pagelist[hyper_idx].pages++;
>> +			}
>> +			pfn++;
>> +		}
>> +		free_page_obj[idx].pfn = 0;
>> +		free_page_obj[idx].pages = 0;
>> +		idx++;
>> +	}
>> +	*kvm_idx = 0;
>> +	put_cpu_var(kvm_pt);
>> +	put_cpu_var(kvm_pt_idx);
>> +	write_sequnlock(&guest_page_lock);
>> +}
>>  
>>  void arch_alloc_page(struct page *page, int order)
>>  {
>> +	unsigned int seq;
>> +
>> +	/*
>> +	 * arch_free_page will acquire the lock once the list carrying guest
>> +	 * free pages is full and a hypercall will be made. Until the complete
>> +	 * free page list is traversed, no further allocation will be allowed.
>> +	 */
>> +	do {
>> +		seq = read_seqbegin(&guest_page_lock);
>> +	} while (read_seqretry(&guest_page_lock, seq));
>>  }
>>  
>>  void arch_free_page(struct page *page, int order)
>>  {
>> +	int *free_page_idx = &get_cpu_var(kvm_pt_idx);
>> +	struct kvm_free_pages *free_page_obj;
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * use of global variables may trigger a race condition between irq and
>> +	 * process context causing unwanted overwrites. This will be replaced
>> +	 * with a better solution to prevent such race conditions.
>> +	 */
>> +	local_irq_save(flags);
>> +	free_page_obj = &get_cpu_var(kvm_pt)[0];
>> +	free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
>> +	free_page_obj[*free_page_idx].pages = 1 << order;
>> +	*free_page_idx += 1;
>> +	if (*free_page_idx == MAX_FGPT_ENTRIES)
>> +		arch_free_page_slowpath();
>> +	put_cpu_var(kvm_pt);
>> +	put_cpu_var(kvm_pt_idx);
>> +	local_irq_restore(flags);
>>  }
>> --
>> 2.9.4
>>
>>

-- 
Regards
Nitesh




* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-02 18:59     ` Nitesh Narayan Lal
@ 2017-08-02 19:20       ` Rik van Riel
  2017-08-02 20:37         ` Nitesh Narayan Lal
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2017-08-02 19:20 UTC (permalink / raw)
  To: Nitesh Narayan Lal, Pankaj Gupta
  Cc: kvm, mst, david, yang.zhang.wz, wei.w.wang


On Wed, 2017-08-02 at 14:59 -0400, Nitesh Narayan Lal wrote:
> 
> > > -struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> > > +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES - 1];
> > > +
> > > +static void empty_hyperlist(void)
> > > +{
> > > +	int i = 0;
> > > +
> > > +	while (i < MAX_FGPT_ENTRIES - 1) {
> > 
> >       MAX_FGPT_ENTRIES in place of 'MAX_FGPT_ENTRIES - 1' here
> >       and at similar other places?
> 
> This is because the CPU-local list has a total of 1000 entries
> (MAX_FGPT_ENTRIES) whereas the CPU-global list has 999 entries. If you
> look at arch_free_page_slowpath() and consider a situation where there
> are 1000 entries of singly allocated free pages in the cpu-local list,
> i.e., none of them are re-allocated: while adding them to the cpu-global
> list, when the cpu-local list index reaches 1000 the outer loop will
> terminate, due to which the cpu-global list index will never reach 1000
> and compress_hyperlist()/make_hypercall() will never be called.
> > 
> 
Can you explain why the hypervisor_pagelist
is smaller than the cpu local list?

This makes no sense to me. Why are they not
the same size?

That would certainly make the code easier to read.

-- 
All rights reversed


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] KVM: Guest page hinting functionality
  2017-08-02 19:20       ` Rik van Riel
@ 2017-08-02 20:37         ` Nitesh Narayan Lal
  0 siblings, 0 replies; 13+ messages in thread
From: Nitesh Narayan Lal @ 2017-08-02 20:37 UTC (permalink / raw)
  To: Rik van Riel, Pankaj Gupta
  Cc: kvm, mst, david, yang.zhang.wz, wei.w.wang





On 08/02/2017 03:20 PM, Rik van Riel wrote:
> On Wed, 2017-08-02 at 14:59 -0400, Nitesh Narayan Lal wrote:
>>>> -struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
>>>> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES -
>>>> 1];
>>>> +
>>>> +static void empty_hyperlist(void)
>>>> +{
>>>> +	int i = 0;
>>>> +
>>>> +	while (i < MAX_FGPT_ENTRIES - 1) {
>>>       MAX_FGPT_ENTRIES in place of 'MAX_FGPT_ENTRIES - 1' here
>>>       and at similar other places?
>> This is because the CPU-local list has a total of 1000 entries
>> (MAX_FGPT_ENTRIES) whereas the CPU-global list has 999. If you look
>> at arch_free_page_slowpath() and consider a situation where the
>> cpu-local list holds 1000 entries of singly allocated free pages,
>> i.e., none of them re-allocated: while adding them to the cpu-global
>> list, the outer loop terminates once the cpu-local list index
>> reaches 1000, so the cpu-global list index can never reach 1000 and
>> compress_hyperlist()/make_hypercall() would never be called.
> Can you explain why the hypervisor_pagelist
> is smaller than the cpu local list?
>
> This makes no sense to me. Why are they not
> the same size?
>
> That would certainly make the code easier to read.

I am sorry. There is no point in complicating things when they could
be done in a simpler way.
I will make the required changes in my next patch-set.
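
For what it's worth, the agreed simplification could look roughly like
this -- both lists sized MAX_FGPT_ENTRIES, with the full-list check
done inside the walk. This is a sketch of a possible follow-up, not
the actual next revision (hyper_compressed_idx is a hypothetical name
for the first free slot left after compression):

struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];

	/* inside arch_free_page_slowpath(), after copying one entry: */
	if (hyper_idx == MAX_FGPT_ENTRIES) {
		compress_hyperlist();	/* may also trigger make_hypercall() */
		hyper_idx = hyper_compressed_idx;
	}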

-- 
Regards
Nitesh



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] [RFC] KVM: Guest page hinting
  2017-08-01 20:48 [PATCH 0/3] [RFC] KVM: Guest page hinting Nitesh Narayan Lal
                   ` (2 preceding siblings ...)
  2017-08-01 20:48 ` [PATCH 3/3] KVM: Adding tracepoints for guest page hinting Nitesh Narayan Lal
@ 2017-08-04  5:16 ` Yang Zhang
  2017-08-04 11:59   ` David Hildenbrand
  3 siblings, 1 reply; 13+ messages in thread
From: Yang Zhang @ 2017-08-04  5:16 UTC (permalink / raw)
  To: Nitesh Narayan Lal, kvm; +Cc: riel, mst, david, pagupta, wei.w.wang, david

On 2017/8/2 4:48, Nitesh Narayan Lal wrote:
> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. This approach has some different trade-offs compared to ballooning:
> -For guests with DAX (no page cache) it rapidly hands back all free memory to the host.
> -Resolves the complexity of ballooning policy as the whole process is transparent and occurs automatically.
> -More frequent hypercalls than ballooning. More efficient memory use at the cost of higher CPU use.
> -Guest can quickly re-use memory if it is needed again, with MADV_FREE use on the host side.

Hi Nitesh,

I didn't look into the patch set, but it seems two people are already
working on something similar:
https://lkml.org/lkml/2017/7/12/249
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

Are you doing the same thing?

>
> This patch set is divided into three parts and this is just the first part which contains the implementation used for preparing the list of guest free pages which will be sent to the host via hypercall.
> The patch-set leverages the existing arch_free_page() and arch_alloc_page() to add this functionality. It uses two lists one cpu-local and other cpu-global.  Whenever a page is freed it is added to the respective cpu-local list until it is full. Once the list is full a seqlock is taken to prevent any further page allocations and the per cpu-local list is traversed in order to check for any fragmentation due to reallocations. If present those entries are defragmented and are added to the cpu-global list until it is full. Once the cpu-global list is full it is parsed and compressed.
> A hypercall is made only if the total number of entries are above the specified threshold value. A hypercall may affect the performance if done frequently and hence it needs to be minimized. This is the primary reason for compression, as it ensures replacement of multiple consecutive entries to a single one and removal of all duplicate entries causing frequent exhaustion of cpu-global list. After compressing the hyperlist there could be three following possibilities:
> *If the number of entries in this cpu-global list is greater than the threshold required for hypercall value then a hypercall is issued.
> *If the parsing of the cpu-local list is complete but the number of cpu-global list entries is less than the threshold then they are copied to a cpu-local list.
> *In case the parsing of the cpu-local list is yet not complete and the number of entries in the cpu-global list is less than the threshold then the parsing of the cpu-local list is continued and entries in the cpu-global list are added from the newly available index acquired after compression.
>
> -
> Regards
> Nitesh
>


-- 
Yang
Alibaba Cloud Computing

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] [RFC] KVM: Guest page hinting
  2017-08-04  5:16 ` [PATCH 0/3] [RFC] KVM: Guest " Yang Zhang
@ 2017-08-04 11:59   ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2017-08-04 11:59 UTC (permalink / raw)
  To: Yang Zhang, Nitesh Narayan Lal, kvm; +Cc: riel, mst, pagupta, wei.w.wang

On 04.08.2017 07:16, Yang Zhang wrote:
> On 2017/8/2 4:48, Nitesh Narayan Lal wrote:
>> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. This approach has some different trade-offs compared to ballooning:
>> -For guests with DAX (no page cache) it rapidly hands back all free memory to the host.
>> -Resolves the complexity of ballooning policy as the whole process is transparent and occurs automatically.
>> -More frequent hypercalls than ballooning. More efficient memory use at the cost of higher CPU use.
>> -Guest can quickly re-use memory if it is needed again, with MADV_FREE use on the host side.
> 
> Hi Nitesh,
> 
> I didn't look into the patch set. But seems there already two people are 
> doing the similar thing:
> https://lkml.org/lkml/2017/7/12/249
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html
> 
> Are you doing the same thing?
> 

(I'm working on virtio-mem, so this is from my point of view.)

virtio-balloon was used for different purposes:
1. Forcing memory pressure onto the guest, making it shrink its page cache.
2. Collaborative memory management. Flexible memory assignment between
guests.
3. Memory hot(un)plug.

While one was able to do these things to some degree, there are certain
limitations that cannot be fixed (not going into detail).

Now, work is being done on virtio-balloon to do free page hinting in
special scenarios (when migrating the guest, so unused pages are not
migrated).

What people are working on right now:

- fake DAX targets 1.
- free page hinting targets 2.
- virtio-mem targets 3.

So this work (guest page hinting) "collides" with the current work being
done on virtio-balloon. But it is way superior, as free pages are
_always_ hinted, not just when the guest is being migrated.

-- 

Thanks,

David

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-08-04 11:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01 20:48 [PATCH 0/3] [RFC] KVM: Guest page hinting Nitesh Narayan Lal
2017-08-01 20:48 ` [PATCH 1/3] KVM: Support for guest " Nitesh Narayan Lal
2017-08-02  7:12   ` kbuild test robot
2017-08-01 20:48 ` [PATCH 2/3] KVM: Guest page hinting functionality Nitesh Narayan Lal
2017-08-02  7:01   ` Pankaj Gupta
2017-08-02 18:59     ` Nitesh Narayan Lal
2017-08-02 19:20       ` Rik van Riel
2017-08-02 20:37         ` Nitesh Narayan Lal
2017-08-02 12:19   ` Pankaj Gupta
2017-08-02 15:02     ` Rik van Riel
2017-08-01 20:48 ` [PATCH 3/3] KVM: Adding tracepoints for guest page hinting Nitesh Narayan Lal
2017-08-04  5:16 ` [PATCH 0/3] [RFC] KVM: Guest " Yang Zhang
2017-08-04 11:59   ` David Hildenbrand
