linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/12] KVM: introduce readonly memslot
@ 2012-08-07  9:47 Xiao Guangrong
  2012-08-07  9:48 ` [PATCH v5 01/12] KVM: fix missing check for memslot flags Xiao Guangrong
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

Changelog:
- introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
- introduce KVM_HVA_ERR_BAD and optimize error hva indicators

The test case can be found at:
http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2

In the current code, if we map a read-only memory region from host to guest
and the page is not currently mapped in the host, we get a fault pfn; since
async page faults are not allowed on that path, the VM crashes.

As Avi suggested, we introduce read-only memory regions to map ROM/ROMD
into the guest: read access to a read-only memslot works as usual, while
write access causes a KVM_EXIT_MMIO exit to userspace.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 01/12] KVM: fix missing check for memslot flags
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
@ 2012-08-07  9:48 ` Xiao Guangrong
  2012-08-07  9:48 ` [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace Xiao Guangrong
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:48 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Check the flags when a memslot is registered from userspace, as Avi suggested.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a2e85af..7b94d70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -678,6 +678,14 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)
 	slots->generation++;
 }

+static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
+{
+	if (mem->flags & ~KVM_MEM_LOG_DIRTY_PAGES)
+		return -EINVAL;
+
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -698,6 +706,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	struct kvm_memory_slot old, new;
 	struct kvm_memslots *slots, *old_memslots;

+	r = check_memory_region_flags(mem);
+	if (r)
+		goto out;
+
 	r = -EINVAL;
 	/* General sanity checks */
 	if (mem->memory_size & (PAGE_SIZE - 1))
-- 
1.7.7.6



* [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
  2012-08-07  9:48 ` [PATCH v5 01/12] KVM: fix missing check for memslot flags Xiao Guangrong
@ 2012-08-07  9:48 ` Xiao Guangrong
  2012-08-09 18:48   ` Marcelo Tosatti
  2012-08-07  9:49 ` [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic Xiao Guangrong
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:48 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Quote Avi's comment:
| KVM_MEMSLOT_INVALID is actually an internal symbol, not used by
| userspace.  Please move it to kvm_host.h.

Also, move KVM_MEMSLOT_INVALID to the highest bit, keeping it clear of the
low bits used for userspace-visible flags.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 include/linux/kvm.h      |    1 -
 include/linux/kvm_host.h |    2 ++
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..dc3aa2a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -103,7 +103,6 @@ struct kvm_userspace_memory_region {

 /* for kvm_memory_region::flags */
 #define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID      (1UL << 1)

 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2b897e..4c2da5a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -35,6 +35,8 @@
 #define KVM_MMIO_SIZE 8
 #endif

+#define KVM_MEMSLOT_INVALID	(1UL << 31)
+
 /*
  * If we support unaligned MMIO, at most one fragment will be split into two:
  */
-- 
1.7.7.6



* [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
  2012-08-07  9:48 ` [PATCH v5 01/12] KVM: fix missing check for memslot flags Xiao Guangrong
  2012-08-07  9:48 ` [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace Xiao Guangrong
@ 2012-08-07  9:49 ` Xiao Guangrong
  2012-08-09 18:50   ` Marcelo Tosatti
  2012-08-07  9:50 ` [PATCH v5 04/12] KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic Xiao Guangrong
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:49 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

It replaces hva_to_pfn_atomic.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c       |    5 +----
 include/linux/kvm_host.h |    3 ++-
 virt/kvm/kvm_main.c      |   14 ++++++++------
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9651c2c..5548971 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2510,15 +2510,12 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 				     bool no_dirty_log)
 {
 	struct kvm_memory_slot *slot;
-	unsigned long hva;

 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
 	if (!slot)
 		return KVM_PFN_ERR_FAULT;

-	hva = gfn_to_hva_memslot(slot, gfn);
-
-	return hva_to_pfn_atomic(hva);
+	return gfn_to_pfn_memslot_atomic(slot, gfn);
 }

 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4c2da5a..187c9c8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -457,7 +457,6 @@ void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);

-pfn_t hva_to_pfn_atomic(unsigned long addr);
 pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
 		       bool write_fault, bool *writable);
@@ -465,6 +464,8 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable);
 pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
+pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
+
 void kvm_release_pfn_dirty(pfn_t pfn);
 void kvm_release_pfn_clean(pfn_t pfn);
 void kvm_set_pfn_dirty(pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7b94d70..543f9b7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1102,12 +1102,6 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	return pfn;
 }

-pfn_t hva_to_pfn_atomic(unsigned long addr)
-{
-	return hva_to_pfn(addr, true, NULL, true, NULL);
-}
-EXPORT_SYMBOL_GPL(hva_to_pfn_atomic);
-
 static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
 			  bool write_fault, bool *writable)
 {
@@ -1155,6 +1149,14 @@ pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 	return hva_to_pfn(addr, false, NULL, true, NULL);
 }

+pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	unsigned long addr = gfn_to_hva_memslot(slot, gfn);
+
+	return hva_to_pfn(addr, true, NULL, true, NULL);
+}
+EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
+
 int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
 								  int nr_pages)
 {
-- 
1.7.7.6



* [PATCH v5 04/12] KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-08-07  9:49 ` [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic Xiao Guangrong
@ 2012-08-07  9:50 ` Xiao Guangrong
  2012-08-07  9:51 ` [PATCH v5 05/12] KVM: reorganize hva_to_pfn Xiao Guangrong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:50 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

This set of functions is only used to read data from host address space. In a
later patch, gfn_to_hva_read will return only a read-only hva, and the
function name is a hint that gfn_to_hva_read pairs with
kvm_read_hva()/kvm_read_hva_atomic().

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c |   29 +++++++++++++++++++++++++----
 1 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 543f9b7..26ffc87 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1002,6 +1002,27 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);

+/*
+ * The hva returned by this function is only allowed to be read.
+ * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
+ */
+static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
+{
+	return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
+}
+
+static int kvm_read_hva(void *data, void __user *hva, int len)
+{
+	return __copy_from_user(data, hva, len);
+
+}
+
+static int kvm_read_hva_atomic(void *data, void __user *hva, int len)
+{
+	return __copy_from_user_inatomic(data, hva, len);
+
+}
+
 int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
 	unsigned long start, int write, struct page **page)
 {
@@ -1274,10 +1295,10 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 	int r;
 	unsigned long addr;

-	addr = gfn_to_hva(kvm, gfn);
+	addr = gfn_to_hva_read(kvm, gfn);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
-	r = __copy_from_user(data, (void __user *)addr + offset, len);
+	r = kvm_read_hva(data, (void __user *)addr + offset, len);
 	if (r)
 		return -EFAULT;
 	return 0;
@@ -1312,11 +1333,11 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	int offset = offset_in_page(gpa);

-	addr = gfn_to_hva(kvm, gfn);
+	addr = gfn_to_hva_read(kvm, gfn);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	pagefault_disable();
-	r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
+	r = kvm_read_hva_atomic(data, (void __user *)addr + offset, len);
 	pagefault_enable();
 	if (r)
 		return -EFAULT;
-- 
1.7.7.6



* [PATCH v5 05/12] KVM: reorganize hva_to_pfn
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (3 preceding siblings ...)
  2012-08-07  9:50 ` [PATCH v5 04/12] KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic Xiao Guangrong
@ 2012-08-07  9:51 ` Xiao Guangrong
  2012-08-10 17:51   ` Marcelo Tosatti
  2012-08-07  9:51 ` [PATCH v5 06/12] KVM: use 'writable' as a hint to map writable pfn Xiao Guangrong
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

hva_to_pfn does too many things; this patch reorganizes the code to make it
more readable.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c |  159 +++++++++++++++++++++++++++++++--------------------
 1 files changed, 97 insertions(+), 62 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 26ffc87..dd01bcb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1043,83 +1043,118 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 	return rc == -EHWPOISON;
 }

-static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-			bool write_fault, bool *writable)
+/*
+ * The atomic path to get the writable pfn, which will be stored in @pfn;
+ * true indicates success, otherwise false is returned.
+ */
+static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
+			    bool write_fault, bool *writable, pfn_t *pfn)
 {
 	struct page *page[1];
-	int npages = 0;
-	pfn_t pfn;
+	int npages;

-	/* we can do it either atomically or asynchronously, not both */
-	BUG_ON(atomic && async);
+	if (!(async || atomic))
+		return false;

-	BUG_ON(!write_fault && !writable);
+	npages = __get_user_pages_fast(addr, 1, 1, page);
+	if (npages == 1) {
+		*pfn = page_to_pfn(page[0]);

-	if (writable)
-		*writable = true;
+		if (writable)
+			*writable = true;
+		return true;
+	}
+
+	return false;
+}

-	if (atomic || async)
-		npages = __get_user_pages_fast(addr, 1, 1, page);
+/*
+ * The slow path to get the pfn of the specified host virtual address;
+ * 1 indicates success, -errno is returned if an error is detected.
+ */
+static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+			   bool *writable, pfn_t *pfn)
+{
+	struct page *page[1];
+	int npages = 0;

-	if (unlikely(npages != 1) && !atomic) {
-		might_sleep();
+	might_sleep();

-		if (writable)
-			*writable = write_fault;
-
-		if (async) {
-			down_read(&current->mm->mmap_sem);
-			npages = get_user_page_nowait(current, current->mm,
-						     addr, write_fault, page);
-			up_read(&current->mm->mmap_sem);
-		} else
-			npages = get_user_pages_fast(addr, 1, write_fault,
-						     page);
-
-		/* map read fault as writable if possible */
-		if (unlikely(!write_fault) && npages == 1) {
-			struct page *wpage[1];
-
-			npages = __get_user_pages_fast(addr, 1, 1, wpage);
-			if (npages == 1) {
-				*writable = true;
-				put_page(page[0]);
-				page[0] = wpage[0];
-			}
-			npages = 1;
+	if (writable)
+		*writable = write_fault;
+
+	if (async) {
+		down_read(&current->mm->mmap_sem);
+		npages = get_user_page_nowait(current, current->mm,
+					      addr, write_fault, page);
+		up_read(&current->mm->mmap_sem);
+	} else
+		npages = get_user_pages_fast(addr, 1, write_fault,
+					     page);
+	if (npages != 1)
+		return npages;
+
+	/* map read fault as writable if possible */
+	if (unlikely(!write_fault)) {
+		struct page *wpage[1];
+
+		npages = __get_user_pages_fast(addr, 1, 1, wpage);
+		if (npages == 1) {
+			*writable = true;
+			put_page(page[0]);
+			page[0] = wpage[0];
 		}
+
+		npages = 1;
 	}
+	*pfn = page_to_pfn(page[0]);
+	return npages;
+}
+
+static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
+			bool write_fault, bool *writable)
+{
+	struct vm_area_struct *vma;
+	pfn_t pfn = 0;
+	int npages;

-	if (unlikely(npages != 1)) {
-		struct vm_area_struct *vma;
+	/* we can do it either atomically or asynchronously, not both */
+	BUG_ON(atomic && async);

-		if (atomic)
-			return KVM_PFN_ERR_FAULT;
+	BUG_ON(!write_fault && !writable);

-		down_read(&current->mm->mmap_sem);
-		if (npages == -EHWPOISON ||
-			(!async && check_user_page_hwpoison(addr))) {
-			up_read(&current->mm->mmap_sem);
-			return KVM_PFN_ERR_HWPOISON;
-		}
+	if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
+		return pfn;

-		vma = find_vma_intersection(current->mm, addr, addr+1);
-
-		if (vma == NULL)
-			pfn = KVM_PFN_ERR_FAULT;
-		else if ((vma->vm_flags & VM_PFNMAP)) {
-			pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
-				vma->vm_pgoff;
-			BUG_ON(!kvm_is_mmio_pfn(pfn));
-		} else {
-			if (async && (vma->vm_flags & VM_WRITE))
-				*async = true;
-			pfn = KVM_PFN_ERR_FAULT;
-		}
-		up_read(&current->mm->mmap_sem);
-	} else
-		pfn = page_to_pfn(page[0]);
+	if (atomic)
+		return KVM_PFN_ERR_FAULT;

+	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	if (npages == 1)
+		return pfn;
+
+	down_read(&current->mm->mmap_sem);
+	if (npages == -EHWPOISON ||
+	      (!async && check_user_page_hwpoison(addr))) {
+		pfn = KVM_PFN_ERR_HWPOISON;
+		goto exit;
+	}
+
+	vma = find_vma_intersection(current->mm, addr, addr + 1);
+
+	if (vma == NULL)
+		pfn = KVM_PFN_ERR_FAULT;
+	else if ((vma->vm_flags & VM_PFNMAP)) {
+		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
+			vma->vm_pgoff;
+		BUG_ON(!kvm_is_mmio_pfn(pfn));
+	} else {
+		if (async && (vma->vm_flags & VM_WRITE))
+			*async = true;
+		pfn = KVM_PFN_ERR_FAULT;
+	}
+exit:
+	up_read(&current->mm->mmap_sem);
 	return pfn;
 }

-- 
1.7.7.6



* [PATCH v5 06/12] KVM: use 'writable' as a hint to map writable pfn
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (4 preceding siblings ...)
  2012-08-07  9:51 ` [PATCH v5 05/12] KVM: reorganize hva_to_pfn Xiao Guangrong
@ 2012-08-07  9:51 ` Xiao Guangrong
  2012-08-07  9:52 ` [PATCH v5 07/12] KVM: introduce KVM_PFN_ERR_RO_FAULT Xiao Guangrong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In the current code, we always map a writable pfn for a read fault. In order
to support read-only memslots, map a writable pfn only if 'writable'
is not NULL.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd01bcb..bf6dede 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1056,6 +1056,14 @@ static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
 	if (!(async || atomic))
 		return false;

+	/*
+	 * Fast-pin a writable pfn only if it is a write-fault request
+	 * or the caller allows mapping a writable pfn for a read-fault
+	 * request.
+	 */
+	if (!(write_fault || writable))
+		return false;
+
 	npages = __get_user_pages_fast(addr, 1, 1, page);
 	if (npages == 1) {
 		*pfn = page_to_pfn(page[0]);
@@ -1095,7 +1103,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		return npages;

 	/* map read fault as writable if possible */
-	if (unlikely(!write_fault)) {
+	if (unlikely(!write_fault) && writable) {
 		struct page *wpage[1];

 		npages = __get_user_pages_fast(addr, 1, 1, wpage);
@@ -1111,6 +1119,20 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	return npages;
 }

+/*
+ * Pin the guest page in memory and return its pfn.
+ * @addr: host virtual address which maps memory to the guest
+ * @atomic: if true, the function must not sleep
+ * @async: whether this function needs to wait for IO to complete if the
+ *         host page is not in memory
+ * @write_fault: whether we should get a writable host page
+ * @writable: whether it is allowed to map a writable host page for !@write_fault
+ *
+ * The function will map a writable host page for these two cases:
+ * 1): @write_fault = true
+ * 2): @write_fault = false && @writable; @writable will tell the caller
+ *     whether the mapping is writable.
+ */
 static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 			bool write_fault, bool *writable)
 {
-- 
1.7.7.6



* [PATCH v5 07/12] KVM: introduce KVM_PFN_ERR_RO_FAULT
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (5 preceding siblings ...)
  2012-08-07  9:51 ` [PATCH v5 06/12] KVM: use 'writable' as a hint to map writable pfn Xiao Guangrong
@ 2012-08-07  9:52 ` Xiao Guangrong
  2012-08-07  9:52 ` [PATCH v5 08/12] KVM: introduce KVM_HVA_ERR_BAD Xiao Guangrong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:52 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In a later patch, it will indicate failure when we try to get a writable
pfn from a read-only memslot.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 include/linux/kvm_host.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 187c9c8..8306f24 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -60,6 +60,7 @@
 #define KVM_PFN_ERR_FAULT	(KVM_PFN_ERR_MASK)
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_BAD		(KVM_PFN_ERR_MASK + 2)
+#define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 3)

 static inline bool is_error_pfn(pfn_t pfn)
 {
-- 
1.7.7.6



* [PATCH v5 08/12] KVM: introduce KVM_HVA_ERR_BAD
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (6 preceding siblings ...)
  2012-08-07  9:52 ` [PATCH v5 07/12] KVM: introduce KVM_PFN_ERR_RO_FAULT Xiao Guangrong
@ 2012-08-07  9:52 ` Xiao Guangrong
  2012-08-07  9:53 ` [PATCH v5 09/12] KVM: introduce KVM_HVA_ERR_RO_BAD Xiao Guangrong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:52 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Also, remove bad_hva() and make kvm_is_error_hva() inline.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 include/linux/kvm_host.h |    8 +++++++-
 virt/kvm/kvm_main.c      |   13 +------------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8306f24..7737e45 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -77,6 +77,13 @@ static inline bool is_invalid_pfn(pfn_t pfn)
 	return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
 }

+#define KVM_HVA_ERR_BAD	(PAGE_OFFSET)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+	return addr == PAGE_OFFSET;
+}
+
 #define KVM_ERR_PTR_BAD_PAGE	(ERR_PTR(-ENOENT))

 static inline bool is_error_page(struct page *page)
@@ -425,7 +432,6 @@ id_to_memslot(struct kvm_memslots *slots, int id)
 	return slot;
 }

-int kvm_is_error_hva(unsigned long addr);
 int kvm_set_memory_region(struct kvm *kvm,
 			  struct kvm_userspace_memory_region *mem,
 			  int user_alloc);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bf6dede..1bd83a6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -931,17 +931,6 @@ void kvm_disable_largepages(void)
 }
 EXPORT_SYMBOL_GPL(kvm_disable_largepages);

-static inline unsigned long bad_hva(void)
-{
-	return PAGE_OFFSET;
-}
-
-int kvm_is_error_hva(unsigned long addr)
-{
-	return addr == bad_hva();
-}
-EXPORT_SYMBOL_GPL(kvm_is_error_hva);
-
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
 {
 	return __gfn_to_memslot(kvm_memslots(kvm), gfn);
@@ -988,7 +977,7 @@ static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 				     gfn_t *nr_pages)
 {
 	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
-		return bad_hva();
+		return KVM_HVA_ERR_BAD;

 	if (nr_pages)
 		*nr_pages = slot->npages - (gfn - slot->base_gfn);
-- 
1.7.7.6



* [PATCH v5 09/12] KVM: introduce KVM_HVA_ERR_RO_BAD
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (7 preceding siblings ...)
  2012-08-07  9:52 ` [PATCH v5 08/12] KVM: introduce KVM_HVA_ERR_BAD Xiao Guangrong
@ 2012-08-07  9:53 ` Xiao Guangrong
  2012-08-07  9:54 ` [PATCH v5 10/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:53 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In a later patch, it will indicate failure when we try to get a writable
hva from a read-only memslot.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 include/linux/kvm_host.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7737e45..2df6861 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -77,11 +77,12 @@ static inline bool is_invalid_pfn(pfn_t pfn)
 	return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
 }

-#define KVM_HVA_ERR_BAD	(PAGE_OFFSET)
+#define KVM_HVA_ERR_BAD		(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD	(PAGE_OFFSET + PAGE_SIZE)

 static inline bool kvm_is_error_hva(unsigned long addr)
 {
-	return addr == PAGE_OFFSET;
+	return addr >= PAGE_OFFSET;
 }

 #define KVM_ERR_PTR_BAD_PAGE	(ERR_PTR(-ENOENT))
-- 
1.7.7.6



* [PATCH v5 10/12] KVM: introduce readonly memslot
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (8 preceding siblings ...)
  2012-08-07  9:53 ` [PATCH v5 09/12] KVM: introduce KVM_HVA_ERR_RO_BAD Xiao Guangrong
@ 2012-08-07  9:54 ` Xiao Guangrong
  2012-08-07  9:54 ` [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info Xiao Guangrong
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:54 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In the current code, if we map a read-only memory region from host to guest
and the page is not currently mapped in the host, we get a fault pfn; since
async page faults are not allowed on that path, the VM crashes.

We introduce read-only memory regions to map ROM/ROMD into the guest: read
access to a read-only memslot works as usual, while write access causes a
KVM_EXIT_MMIO exit to userspace.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/api.txt |   10 +++-
 arch/x86/include/asm/kvm.h        |    1 +
 arch/x86/kvm/mmu.c                |    9 ++++
 arch/x86/kvm/x86.c                |    1 +
 include/linux/kvm.h               |    6 ++-
 virt/kvm/kvm_main.c               |   81 +++++++++++++++++++++++++++++--------
 6 files changed, 87 insertions(+), 21 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..b91bfd4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
 };

 /* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)

 This ioctl allows the user to create or modify a guest physical memory
 slot.  When changing an existing slot, it may be moved in the guest
@@ -873,9 +874,12 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.

-The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
+The flags field supports two flags.  The first is KVM_MEM_LOG_DIRTY_PAGES, which
 instructs kvm to keep track of writes to memory within the slot.  See
-the KVM_GET_DIRTY_LOG ioctl.
+the KVM_GET_DIRTY_LOG ioctl.  The second is KVM_MEM_READONLY, available when
+the KVM_CAP_READONLY_MEM capability is supported; it indicates that the guest
+memory is read-only, i.e. the guest is only allowed to read it.  Writes will
+be posted to userspace as KVM_EXIT_MMIO exits.

 When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
 region are automatically reflected into the guest.  For example, an mmap()
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 246617e..521bf25 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -25,6 +25,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_READONLY_MEM

 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5548971..8e312a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2647,6 +2647,15 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *

 static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 {
+	/*
+	 * Do not cache the mmio info caused by writing the readonly gfn
+	 * into the spte; otherwise, read access on the readonly gfn would
+	 * also cause an mmio page fault and be treated as mmio access.
+	 * Return 1 to tell kvm to emulate it.
+	 */
+	if (pfn == KVM_PFN_ERR_RO_FAULT)
+		return 1;
+
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
 		return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ebf65c..4c86239 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2167,6 +2167,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_PCI_2_3:
 	case KVM_CAP_KVMCLOCK_CTRL:
+	case KVM_CAP_READONLY_MEM:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index dc3aa2a..94867d0 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -102,7 +102,8 @@ struct kvm_userspace_memory_region {
 };

 /* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
+#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
+#define KVM_MEM_READONLY	(1UL << 1)

 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -617,6 +618,9 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#ifdef __KVM_HAVE_READONLY_MEM
+#define KVM_CAP_READONLY_MEM 81
+#endif

 #ifdef KVM_CAP_IRQ_ROUTING

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1bd83a6..5e899d9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -680,7 +680,13 @@ void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)

 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 {
-	if (mem->flags & ~KVM_MEM_LOG_DIRTY_PAGES)
+	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
+
+#ifdef KVM_CAP_READONLY_MEM
+	valid_flags |= KVM_MEM_READONLY;
+#endif
+
+	if (mem->flags & ~valid_flags)
 		return -EINVAL;

 	return 0;
@@ -973,18 +979,32 @@ out:
 	return size;
 }

-static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
-				     gfn_t *nr_pages)
+static bool memslot_is_readonly(struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
+static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+				       gfn_t *nr_pages, bool write)
 {
 	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
 		return KVM_HVA_ERR_BAD;

+	if (memslot_is_readonly(slot) && write)
+		return KVM_HVA_ERR_RO_BAD;
+
 	if (nr_pages)
 		*nr_pages = slot->npages - (gfn - slot->base_gfn);

 	return gfn_to_hva_memslot(slot, gfn);
 }

+static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+				     gfn_t *nr_pages)
+{
+	return __gfn_to_hva_many(slot, gfn, nr_pages, true);
+}
+
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
@@ -997,7 +1017,7 @@ EXPORT_SYMBOL_GPL(gfn_to_hva);
  */
 static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
 {
-	return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
+	return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
 }

 static int kvm_read_hva(void *data, void __user *hva, int len)
@@ -1108,6 +1128,17 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	return npages;
 }

+static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
+{
+	if (unlikely(!(vma->vm_flags & VM_READ)))
+		return false;
+
+	if (write_fault && (unlikely(!(vma->vm_flags & VM_WRITE))))
+		return false;
+
+	return true;
+}
+
 /*
  * Pin guest page in memory and return its pfn.
  * @addr: host virtual address which maps memory to the guest
@@ -1132,8 +1163,6 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);

-	BUG_ON(!write_fault && !writable);
-
 	if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
 		return pfn;

@@ -1160,7 +1189,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 			vma->vm_pgoff;
 		BUG_ON(!kvm_is_mmio_pfn(pfn));
 	} else {
-		if (async && (vma->vm_flags & VM_WRITE))
+		if (async && vma_is_valid(vma, write_fault))
 			*async = true;
 		pfn = KVM_PFN_ERR_FAULT;
 	}
@@ -1169,19 +1198,40 @@ exit:
 	return pfn;
 }

+static pfn_t
+__gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
+		     bool *async, bool write_fault, bool *writable)
+{
+	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
+
+	if (addr == KVM_HVA_ERR_RO_BAD)
+		return KVM_PFN_ERR_RO_FAULT;
+
+	if (kvm_is_error_hva(addr))
+		return KVM_PFN_ERR_BAD;
+
+	/* Do not map writable pfn in the readonly memslot. */
+	if (writable && memslot_is_readonly(slot)) {
+		*writable = false;
+		writable = NULL;
+	}
+
+	return hva_to_pfn(addr, atomic, async, write_fault,
+			  writable);
+}
+
 static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
 			  bool write_fault, bool *writable)
 {
-	unsigned long addr;
+	struct kvm_memory_slot *slot;

 	if (async)
 		*async = false;

-	addr = gfn_to_hva(kvm, gfn);
-	if (kvm_is_error_hva(addr))
-		return KVM_PFN_ERR_BAD;
+	slot = gfn_to_memslot(kvm, gfn);

-	return hva_to_pfn(addr, atomic, async, write_fault, writable);
+	return __gfn_to_pfn_memslot(slot, gfn, atomic, async, write_fault,
+				    writable);
 }

 pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
@@ -1212,15 +1262,12 @@ EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);

 pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	unsigned long addr = gfn_to_hva_memslot(slot, gfn);
-	return hva_to_pfn(addr, false, NULL, true, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL);
 }

 pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	unsigned long addr = gfn_to_hva_memslot(slot, gfn);
-
-	return hva_to_pfn(addr, true, NULL, true, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);

-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (9 preceding siblings ...)
  2012-08-07  9:54 ` [PATCH v5 10/12] KVM: introduce readonly memslot Xiao Guangrong
@ 2012-08-07  9:54 ` Xiao Guangrong
  2012-08-10 18:03   ` Marcelo Tosatti
  2012-08-07  9:55 ` [PATCH v5 12/12] KVM: indicate readonly access fault Xiao Guangrong
  2012-08-10 18:14 ` [PATCH v5 00/12] KVM: introduce readonly memslot Marcelo Tosatti
  12 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:54 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Introduce set_mmio_exit_info to clean up the common code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |   33 +++++++++++++++++----------------
 1 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c86239..8cde327 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3761,9 +3761,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
 			   void *val, int bytes)
 {
-	struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
-
-	memcpy(vcpu->run->mmio.data, frag->data, frag->len);
 	return X86EMUL_CONTINUE;
 }

@@ -3831,6 +3828,20 @@ mmio:
 	return X86EMUL_CONTINUE;
 }

+static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
+			       struct kvm_mmio_fragment *frag, bool write)
+{
+	struct kvm_run *run = vcpu->run;
+
+	run->exit_reason = KVM_EXIT_MMIO;
+	run->mmio.phys_addr = frag->gpa;
+	run->mmio.len = frag->len;
+	run->mmio.is_write = vcpu->mmio_is_write = write;
+
+	if (write)
+		memcpy(run->mmio.data, frag->data, frag->len);
+}
+
 int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
 			void *val, unsigned int bytes,
 			struct x86_exception *exception,
@@ -3870,14 +3881,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
 		return rc;

 	gpa = vcpu->mmio_fragments[0].gpa;
-
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_cur_fragment = 0;

-	vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
-	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
-	vcpu->run->exit_reason = KVM_EXIT_MMIO;
-	vcpu->run->mmio.phys_addr = gpa;
+	set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);

 	return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
 }
@@ -5486,7 +5493,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
  */
 static int complete_mmio(struct kvm_vcpu *vcpu)
 {
-	struct kvm_run *run = vcpu->run;
 	struct kvm_mmio_fragment *frag;
 	int r;

@@ -5497,7 +5503,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 		/* Complete previous fragment */
 		frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
 		if (!vcpu->mmio_is_write)
-			memcpy(frag->data, run->mmio.data, frag->len);
+			memcpy(frag->data, vcpu->run->mmio.data, frag->len);
 		if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
 			vcpu->mmio_needed = 0;
 			if (vcpu->mmio_is_write)
@@ -5507,12 +5513,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 		}
 		/* Initiate next fragment */
 		++frag;
-		run->exit_reason = KVM_EXIT_MMIO;
-		run->mmio.phys_addr = frag->gpa;
-		if (vcpu->mmio_is_write)
-			memcpy(run->mmio.data, frag->data, frag->len);
-		run->mmio.len = frag->len;
-		run->mmio.is_write = vcpu->mmio_is_write;
+		set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
 		return 0;

 	}
-- 
1.7.7.6



* [PATCH v5 12/12] KVM: indicate readonly access fault
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (10 preceding siblings ...)
  2012-08-07  9:54 ` [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info Xiao Guangrong
@ 2012-08-07  9:55 ` Xiao Guangrong
  2012-08-10 18:14 ` [PATCH v5 00/12] KVM: introduce readonly memslot Marcelo Tosatti
  12 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-07  9:55 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Introduce write_readonly_mem in the mmio exit info to indicate that the
exit was caused by a write access to a readonly memslot

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c       |   12 ++++++++----
 include/linux/kvm.h      |    3 +++
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |    3 +++
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8cde327..5e10482 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3704,9 +3704,10 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,

 	ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
 	if (ret < 0)
-		return 0;
+		return ret;
+
 	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
-	return 1;
+	return 0;
 }

 struct read_write_emulator_ops {
@@ -3736,7 +3737,7 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
 static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
 			void *val, int bytes)
 {
-	return !kvm_read_guest(vcpu->kvm, gpa, val, bytes);
+	return kvm_read_guest(vcpu->kvm, gpa, val, bytes);
 }

 static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3798,7 +3799,8 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	if (ret)
 		goto mmio;

-	if (ops->read_write_emulate(vcpu, gpa, val, bytes))
+	ret = ops->read_write_emulate(vcpu, gpa, val, bytes);
+	if (!ret)
 		return X86EMUL_CONTINUE;

 mmio:
@@ -3820,6 +3822,7 @@ mmio:
 		frag->gpa = gpa;
 		frag->data = val;
 		frag->len = now;
+		frag->write_readonly_mem = (ret == -EPERM);

 		gpa += now;
 		val += now;
@@ -3837,6 +3840,7 @@ static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
 	run->mmio.phys_addr = frag->gpa;
 	run->mmio.len = frag->len;
 	run->mmio.is_write = vcpu->mmio_is_write = write;
+	run->mmio.write_readonly_mem = frag->write_readonly_mem;

 	if (write)
 		memcpy(run->mmio.data, frag->data, frag->len);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 94867d0..9261541 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -222,6 +222,9 @@ struct kvm_run {
 			__u8  data[8];
 			__u32 len;
 			__u8  is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+			__u8 write_readonly_mem;
+#endif
 		} mmio;
 		/* KVM_EXIT_HYPERCALL */
 		struct {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2df6861..c47979d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -184,6 +184,7 @@ struct kvm_mmio_fragment {
 	gpa_t gpa;
 	void *data;
 	unsigned len;
+	bool write_readonly_mem;
 };

 struct kvm_vcpu {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5e899d9..d0babc9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1445,6 +1445,9 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 	unsigned long addr;

 	addr = gfn_to_hva(kvm, gfn);
+	if (addr == KVM_HVA_ERR_RO_BAD)
+		return -EPERM;
+
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	r = __copy_to_user((void __user *)addr + offset, data, len);
-- 
1.7.7.6



* Re: [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace
  2012-08-07  9:48 ` [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace Xiao Guangrong
@ 2012-08-09 18:48   ` Marcelo Tosatti
  2012-08-10  2:11     ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-09 18:48 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Aug 07, 2012 at 05:48:49PM +0800, Xiao Guangrong wrote:
> Quote Avi's comment:
> | KVM_MEMSLOT_INVALID is actually an internal symbol, not used by
> | userspace.  Please move it to kvm_host.h.
> 
> Also, move KVM_MEMSLOT_INVALID to the highest bit
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  include/linux/kvm.h      |    1 -
>  include/linux/kvm_host.h |    2 ++
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..dc3aa2a 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -103,7 +103,6 @@ struct kvm_userspace_memory_region {
> 
>  /* for kvm_memory_region::flags */
>  #define KVM_MEM_LOG_DIRTY_PAGES  1UL
> -#define KVM_MEMSLOT_INVALID      (1UL << 1)
> 
>  /* for KVM_IRQ_LINE */
>  struct kvm_irq_level {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d2b897e..4c2da5a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -35,6 +35,8 @@
>  #define KVM_MMIO_SIZE 8
>  #endif
> 
> +#define KVM_MEMSLOT_INVALID	(1UL << 31)
> +
>  /*
>   * If we support unaligned MMIO, at most one fragment will be split into two:
>   */

Please document which range is for userspace visible flags, which range
is reserved. Mention that in both headers, point to each other
("userspace definitions are there" / "internal definitions are there").

16/16 bits for each should be fine.




* Re: [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic
  2012-08-07  9:49 ` [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic Xiao Guangrong
@ 2012-08-09 18:50   ` Marcelo Tosatti
  2012-08-10  3:22     ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-09 18:50 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Aug 07, 2012 at 05:49:36PM +0800, Xiao Guangrong wrote:
> It can be used instead of hva_to_pfn_atomic
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c       |    5 +----
>  include/linux/kvm_host.h |    3 ++-
>  virt/kvm/kvm_main.c      |   14 ++++++++------
>  3 files changed, 11 insertions(+), 11 deletions(-)

What if someone needs gfn_to_hva_memslot in the future, which is not
unlikely?



* Re: [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace
  2012-08-09 18:48   ` Marcelo Tosatti
@ 2012-08-10  2:11     ` Xiao Guangrong
  0 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-10  2:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/10/2012 02:48 AM, Marcelo Tosatti wrote:

>> +#define KVM_MEMSLOT_INVALID	(1UL << 31)
>> +
>>  /*
>>   * If we support unaligned MMIO, at most one fragment will be split into two:
>>   */
> 
> Please document which range is for userspace visible flags, which range
> is reserved. Mention that in both headers, point to each other
> ("userspace definitions are there" / "internal definitions are there").
> 
> 16/16 bits for each should be fine.

Okay, sounds good to me; I will do it in the next version. Thank you, Marcelo!




* Re: [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic
  2012-08-09 18:50   ` Marcelo Tosatti
@ 2012-08-10  3:22     ` Xiao Guangrong
  0 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-10  3:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/10/2012 02:50 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:49:36PM +0800, Xiao Guangrong wrote:
>> It can be used instead of hva_to_pfn_atomic
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c       |    5 +----
>>  include/linux/kvm_host.h |    3 ++-
>>  virt/kvm/kvm_main.c      |   14 ++++++++------
>>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> What if someone needs gfn_to_hva_memslot in the future, which is not
> unlikely?

Marcelo,

We already have the function gfn_to_hva_memslot in kvm_host.h, which
is not directly used on x86. In this patchset, gfn_to_hva_memslot does
not check slot->flags.

I think we can base it on gfn_to_hva_many:
static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
                                               gfn_t gfn)
{
	return gfn_to_hva_many(slot, gfn, NULL);
}

Then all gfn_to_*() functions are based on gfn_to_hva_many, which checks
slot->flags properly.



* Re: [PATCH v5 05/12] KVM: reorganize hva_to_pfn
  2012-08-07  9:51 ` [PATCH v5 05/12] KVM: reorganize hva_to_pfn Xiao Guangrong
@ 2012-08-10 17:51   ` Marcelo Tosatti
  2012-08-11  3:11     ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-10 17:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Aug 07, 2012 at 05:51:05PM +0800, Xiao Guangrong wrote:
> We do too many things in hva_to_pfn; this patch reorganizes the code
> to make it more readable
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  virt/kvm/kvm_main.c |  159 +++++++++++++++++++++++++++++++--------------------
>  1 files changed, 97 insertions(+), 62 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 26ffc87..dd01bcb 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1043,83 +1043,118 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>  	return rc == -EHWPOISON;
>  }
> 
> -static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
> -			bool write_fault, bool *writable)
> +/*
> + * The atomic path to get the writable pfn which will be stored in @pfn,
> + * true indicates success, otherwise false is returned.
> + */
> +static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
> +			    bool write_fault, bool *writable, pfn_t *pfn)
>  {
>  	struct page *page[1];
> -	int npages = 0;
> -	pfn_t pfn;
> +	int npages;
> 
> -	/* we can do it either atomically or asynchronously, not both */
> -	BUG_ON(atomic && async);
> +	if (!(async || atomic))
> +		return false;
> 
> -	BUG_ON(!write_fault && !writable);
> +	npages = __get_user_pages_fast(addr, 1, 1, page);
> +	if (npages == 1) {
> +		*pfn = page_to_pfn(page[0]);
> 
> -	if (writable)
> -		*writable = true;
> +		if (writable)
> +			*writable = true;
> +		return true;
> +	}
> +
> +	return false;
> +}
> 
> -	if (atomic || async)
> -		npages = __get_user_pages_fast(addr, 1, 1, page);
> +/*
> + * The slow path to get the pfn of the specified host virtual address,
> + * 1 indicates success, -errno is returned if error is detected.
> + */
> +static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> +			   bool *writable, pfn_t *pfn)
> +{
> +	struct page *page[1];
> +	int npages = 0;
> 
> -	if (unlikely(npages != 1) && !atomic) {
> -		might_sleep();
> +	might_sleep();
> 
> -		if (writable)
> -			*writable = write_fault;
> -
> -		if (async) {
> -			down_read(&current->mm->mmap_sem);
> -			npages = get_user_page_nowait(current, current->mm,
> -						     addr, write_fault, page);
> -			up_read(&current->mm->mmap_sem);
> -		} else
> -			npages = get_user_pages_fast(addr, 1, write_fault,
> -						     page);
> -
> -		/* map read fault as writable if possible */
> -		if (unlikely(!write_fault) && npages == 1) {
> -			struct page *wpage[1];
> -
> -			npages = __get_user_pages_fast(addr, 1, 1, wpage);
> -			if (npages == 1) {
> -				*writable = true;
> -				put_page(page[0]);
> -				page[0] = wpage[0];
> -			}
> -			npages = 1;
> +	if (writable)
> +		*writable = write_fault;
> +
> +	if (async) {
> +		down_read(&current->mm->mmap_sem);
> +		npages = get_user_page_nowait(current, current->mm,
> +					      addr, write_fault, page);
> +		up_read(&current->mm->mmap_sem);
> +	} else
> +		npages = get_user_pages_fast(addr, 1, write_fault,
> +					     page);
> +	if (npages != 1)
> +		return npages;

 * Returns number of pages pinned. This may be fewer than the number
 * requested. If nr_pages is 0 or negative, returns 0. If no pages
 * were pinned, returns -errno.
 */
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                        struct page **pages)


Current behaviour is

        if (atomic || async)
                npages = __get_user_pages_fast(addr, 1, 1, page);

	if (npages != 1) 
		slow path retry;

The changes above change this, don't they?



* Re: [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info
  2012-08-07  9:54 ` [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info Xiao Guangrong
@ 2012-08-10 18:03   ` Marcelo Tosatti
  2012-08-11  3:13     ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-10 18:03 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Aug 07, 2012 at 05:54:42PM +0800, Xiao Guangrong wrote:
> Introduce set_mmio_exit_info to cleanup the common code
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/x86.c |   33 +++++++++++++++++----------------
>  1 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4c86239..8cde327 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3761,9 +3761,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
>  static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
>  			   void *val, int bytes)
>  {
> -	struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
> -
> -	memcpy(vcpu->run->mmio.data, frag->data, frag->len);
>  	return X86EMUL_CONTINUE;
>  }
> 
> @@ -3831,6 +3828,20 @@ mmio:
>  	return X86EMUL_CONTINUE;
>  }
> 
> +static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
> +			       struct kvm_mmio_fragment *frag, bool write)
> +{
> +	struct kvm_run *run = vcpu->run;
> +
> +	run->exit_reason = KVM_EXIT_MMIO;
> +	run->mmio.phys_addr = frag->gpa;
> +	run->mmio.len = frag->len;
> +	run->mmio.is_write = vcpu->mmio_is_write = write;
> +
> +	if (write)
> +		memcpy(run->mmio.data, frag->data, frag->len);
> +}
> +
>  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>  			void *val, unsigned int bytes,
>  			struct x86_exception *exception,
> @@ -3870,14 +3881,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>  		return rc;
> 
>  	gpa = vcpu->mmio_fragments[0].gpa;
> -
>  	vcpu->mmio_needed = 1;
>  	vcpu->mmio_cur_fragment = 0;
> 
> -	vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
> -	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
> -	vcpu->run->exit_reason = KVM_EXIT_MMIO;
> -	vcpu->run->mmio.phys_addr = gpa;
> +	set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);
> 
>  	return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
>  }
> @@ -5486,7 +5493,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>   */
>  static int complete_mmio(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_run *run = vcpu->run;
>  	struct kvm_mmio_fragment *frag;
>  	int r;
> 
> @@ -5497,7 +5503,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>  		/* Complete previous fragment */
>  		frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
>  		if (!vcpu->mmio_is_write)
> -			memcpy(frag->data, run->mmio.data, frag->len);
> +			memcpy(frag->data, vcpu->run->mmio.data, frag->len);
>  		if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
>  			vcpu->mmio_needed = 0;
>  			if (vcpu->mmio_is_write)
> @@ -5507,12 +5513,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>  		}
>  		/* Initiate next fragment */
>  		++frag;
> -		run->exit_reason = KVM_EXIT_MMIO;
> -		run->mmio.phys_addr = frag->gpa;
> -		if (vcpu->mmio_is_write)
> -			memcpy(run->mmio.data, frag->data, frag->len);
> -		run->mmio.len = frag->len;
> -		run->mmio.is_write = vcpu->mmio_is_write;
> +		set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
>  		return 0;
> 
>  	}
> -- 
> 1.7.7.6

IMO having a function is unnecessary (it makes the code harder to follow).


* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
                   ` (11 preceding siblings ...)
  2012-08-07  9:55 ` [PATCH v5 12/12] KVM: indicate readonly access fault Xiao Guangrong
@ 2012-08-10 18:14 ` Marcelo Tosatti
  2012-08-11  3:36   ` Xiao Guangrong
  2012-08-14 14:00   ` Avi Kivity
  12 siblings, 2 replies; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-10 18:14 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> Changelog:
> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> 
> The test case can be found at:
> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> 
> In current code, if we map a readonly memory space from host to guest
> and the page is not currently mapped in the host, we will get a fault-pfn
> and async is not allowed, then the vm will crash.
> 
> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> to the guest, read access is happy for readonly memslot, write access on
> readonly memslot will cause KVM_EXIT_MMIO exit.

Memory slots whose QEMU mapping is write-protected are supported
today, as long as there are no write faults.

What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
again?

The initial objective was to fix a vm crash, can you explain that
initial problem?



* Re: [PATCH v5 05/12] KVM: reorganize hva_to_pfn
  2012-08-10 17:51   ` Marcelo Tosatti
@ 2012-08-11  3:11     ` Xiao Guangrong
  0 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-11  3:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/11/2012 01:51 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:51:05PM +0800, Xiao Guangrong wrote:
>> We do too many things in hva_to_pfn; this patch reorganizes the code
>> to make it more readable
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  virt/kvm/kvm_main.c |  159 +++++++++++++++++++++++++++++++--------------------
>>  1 files changed, 97 insertions(+), 62 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 26ffc87..dd01bcb 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1043,83 +1043,118 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>>  	return rc == -EHWPOISON;
>>  }
>>
>> -static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
>> -			bool write_fault, bool *writable)
>> +/*
>> + * The atomic path to get the writable pfn which will be stored in @pfn,
>> + * true indicates success, otherwise false is returned.
>> + */
>> +static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
>> +			    bool write_fault, bool *writable, pfn_t *pfn)
>>  {
>>  	struct page *page[1];
>> -	int npages = 0;
>> -	pfn_t pfn;
>> +	int npages;
>>
>> -	/* we can do it either atomically or asynchronously, not both */
>> -	BUG_ON(atomic && async);
>> +	if (!(async || atomic))
>> +		return false;
>>
>> -	BUG_ON(!write_fault && !writable);
>> +	npages = __get_user_pages_fast(addr, 1, 1, page);
>> +	if (npages == 1) {
>> +		*pfn = page_to_pfn(page[0]);
>>
>> -	if (writable)
>> -		*writable = true;
>> +		if (writable)
>> +			*writable = true;
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>>
>> -	if (atomic || async)
>> -		npages = __get_user_pages_fast(addr, 1, 1, page);
>> +/*
>> + * The slow path to get the pfn of the specified host virtual address,
>> + * 1 indicates success, -errno is returned if error is detected.
>> + */
>> +static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>> +			   bool *writable, pfn_t *pfn)
>> +{
>> +	struct page *page[1];
>> +	int npages = 0;
>>
>> -	if (unlikely(npages != 1) && !atomic) {
>> -		might_sleep();
>> +	might_sleep();
>>
>> -		if (writable)
>> -			*writable = write_fault;
>> -
>> -		if (async) {
>> -			down_read(&current->mm->mmap_sem);
>> -			npages = get_user_page_nowait(current, current->mm,
>> -						     addr, write_fault, page);
>> -			up_read(&current->mm->mmap_sem);
>> -		} else
>> -			npages = get_user_pages_fast(addr, 1, write_fault,
>> -						     page);
>> -
>> -		/* map read fault as writable if possible */
>> -		if (unlikely(!write_fault) && npages == 1) {
>> -			struct page *wpage[1];
>> -
>> -			npages = __get_user_pages_fast(addr, 1, 1, wpage);
>> -			if (npages == 1) {
>> -				*writable = true;
>> -				put_page(page[0]);
>> -				page[0] = wpage[0];
>> -			}
>> -			npages = 1;
>> +	if (writable)
>> +		*writable = write_fault;
>> +
>> +	if (async) {
>> +		down_read(&current->mm->mmap_sem);
>> +		npages = get_user_page_nowait(current, current->mm,
>> +					      addr, write_fault, page);
>> +		up_read(&current->mm->mmap_sem);
>> +	} else
>> +		npages = get_user_pages_fast(addr, 1, write_fault,
>> +					     page);
>> +	if (npages != 1)
>> +		return npages;
> 
>  * Returns number of pages pinned. This may be fewer than the number
>  * requested. If nr_pages is 0 or negative, returns 0. If no pages
>  * were pinned, returns -errno.
>  */
> int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>                         struct page **pages)
> 
> 
> Current behaviour is
> 
>         if (atomic || async)
>                 npages = __get_user_pages_fast(addr, 1, 1, page);
> 
> 	if (npages != 1) 
> 		slow path retry;
> 
> The changes above change this, don't they?

Marcelo,

Sorry, I do not see why you think the logic was changed; in this patch,
the logic is:

	/* return true if it is successful. */
        if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
                return pfn;

	/* atomic can not go to slow path. */
        if (atomic)
                return KVM_PFN_ERR_FAULT;

	/* get pfn by the slow path */
        npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
        if (npages == 1)
                return pfn;

	/* the error-handle path. */
	......


Did I miss something?




* Re: [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info
  2012-08-10 18:03   ` Marcelo Tosatti
@ 2012-08-11  3:13     ` Xiao Guangrong
  0 siblings, 0 replies; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-11  3:13 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/11/2012 02:03 AM, Marcelo Tosatti wrote:

>>  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>>  			void *val, unsigned int bytes,
>>  			struct x86_exception *exception,
>> @@ -3870,14 +3881,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>>  		return rc;
>>
>>  	gpa = vcpu->mmio_fragments[0].gpa;
>> -
>>  	vcpu->mmio_needed = 1;
>>  	vcpu->mmio_cur_fragment = 0;
>>
>> -	vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
>> -	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
>> -	vcpu->run->exit_reason = KVM_EXIT_MMIO;
>> -	vcpu->run->mmio.phys_addr = gpa;
>> +	set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);
>>
>>  	return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
>>  }
>> @@ -5486,7 +5493,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>   */
>>  static int complete_mmio(struct kvm_vcpu *vcpu)
>>  {
>> -	struct kvm_run *run = vcpu->run;
>>  	struct kvm_mmio_fragment *frag;
>>  	int r;
>>
>> @@ -5497,7 +5503,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>>  		/* Complete previous fragment */
>>  		frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
>>  		if (!vcpu->mmio_is_write)
>> -			memcpy(frag->data, run->mmio.data, frag->len);
>> +			memcpy(frag->data, vcpu->run->mmio.data, frag->len);
>>  		if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
>>  			vcpu->mmio_needed = 0;
>>  			if (vcpu->mmio_is_write)
>> @@ -5507,12 +5513,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>>  		}
>>  		/* Initiate next fragment */
>>  		++frag;
>> -		run->exit_reason = KVM_EXIT_MMIO;
>> -		run->mmio.phys_addr = frag->gpa;
>> -		if (vcpu->mmio_is_write)
>> -			memcpy(run->mmio.data, frag->data, frag->len);
>> -		run->mmio.len = frag->len;
>> -		run->mmio.is_write = vcpu->mmio_is_write;
>> +		set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
>>  		return 0;
>>
>>  	}
>> -- 
>> 1.7.7.6
> 
> IMO having a function is unnecessary (it makes it harder the code).

Okay, I will drop this patch.




* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-10 18:14 ` [PATCH v5 00/12] KVM: introduce readonly memslot Marcelo Tosatti
@ 2012-08-11  3:36   ` Xiao Guangrong
  2012-08-13 17:39     ` Marcelo Tosatti
  2012-08-14 14:00   ` Avi Kivity
  1 sibling, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-11  3:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>>
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
> 
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
> 
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?
> 

It is fine to map a !write host memory region to a readonly memslot;
the two can coexist.

A readonly memslot checks write permission by looking at slot->flags,
while !write host memory checks write permission in the hva_to_pfn()
function, which examines vma->flags. There is no conflict.

> The initial objective was to fix a vm crash, can you explain that
> initial problem?
>

The issue was triggered by this code:

                } else {
                        if (async && (vma->vm_flags & VM_WRITE))
                                *async = true;
                        pfn = KVM_PFN_ERR_FAULT;
                }

If the host memory region is readonly (!(vma->vm_flags & VM_WRITE)) and
its physical page is swapped out (or the file data has not been read in),
get_user_page_nowait will fail and the code above refuses to set async,
so we end up with a fault pfn and async=false.

I guess this issue also exists in "QEMU write protected mapping" as
you mentioned above.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-11  3:36   ` Xiao Guangrong
@ 2012-08-13 17:39     ` Marcelo Tosatti
  2012-08-14  2:58       ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-13 17:39 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a fault-pfn
> >> and async is not allowed, then the vm will crash.
> >>
> >> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> > 
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> > 
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> > 
> 
> It is happy to map !write host memory space to the readonly memslot,
> and they can coexist as well.
> 
> readonly memslot checks the write-permission by seeing slot->flags and
> !write memory checks the write-permission in hva_to_pfn() function
> which checks vma->flags. It is no conflict.

Yes, there is no conflict. The point is, if you can use the
mmap(PROT_READ) interface (supporting read faults on read-only slots)
for this behavior, what is the advantage of a new memslot flag?

I'm not saying mmap(PROT_READ) is the best interface, i am just asking
why it is not.

> > The initial objective was to fix a vm crash, can you explain that
> > initial problem?
> >
> 
> The issue was trigged by this code:
> 
>                 } else {
>                         if (async && (vma->vm_flags & VM_WRITE))
>                                 *async = true;
>                         pfn = KVM_PFN_ERR_FAULT;
>                 }
> 
> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
> its physical page is swapped out (or the file data does not be read in),
> get_user_page_nowait will fail, above code reject to set async,
> then we will get a fault pfn and async=false.
> 
> I guess this issue also exists in "QEMU write protected mapping" as
> you mentioned above.

Yes, it does. As far as i understand, what that check does from a high
level pov is:

- Did get_user_pages_nowait() fail due to a swapped-out page (in which 
case we should try to swap the page in asynchronously), or due to 
another reason (in which case an error should be returned)?

Using vma->vm_flags VM_WRITE for that is trying to guess why
get_user_pages_nowait() failed, because it (gup_nowait return values) 
does not provide sufficient information by itself.

Can't that be fixed separately? 

Another issue which is also present with the mmap(PROT_READ) scheme is
interaction with reexecute_instruction. That is, unless i am mistaken,
reexecute_instruction can succeed (return true) on a region that is
write protected. This breaks the "write faults on read-only slots exit
to userspace via EXIT_MMIO" behaviour.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-13 17:39     ` Marcelo Tosatti
@ 2012-08-14  2:58       ` Xiao Guangrong
  2012-08-14 15:25         ` Marcelo Tosatti
  0 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-14  2:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
>> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
>>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>>>> Changelog:
>>>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>>>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>>>
>>>> The test case can be found at:
>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>>>
>>>> In current code, if we map a readonly memory space from host to guest
>>>> and the page is not currently mapped in the host, we will get a fault-pfn
>>>> and async is not allowed, then the vm will crash.
>>>>
>>>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>>>> to the guest, read access is happy for readonly memslot, write access on
>>>> readonly memslot will cause KVM_EXIT_MMIO exit.
>>>
>>> Memory slots whose QEMU mapping is write protected is supported
>>> today, as long as there are no write faults.
>>>
>>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
>>> again?
>>>
>>
>> It is happy to map !write host memory space to the readonly memslot,
>> and they can coexist as well.
>>
>> readonly memslot checks the write-permission by seeing slot->flags and
>> !write memory checks the write-permission in hva_to_pfn() function
>> which checks vma->flags. It is no conflict.
> 
> Yes, there is no conflict. The point is, if you can use the
> mmap(PROT_READ) interface (supporting read faults on read-only slots)
> for this behavior, what is the advantage of a new memslot flag?
> 

You can find the discussion at:
https://lkml.org/lkml/2012/5/22/228

> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> why it is not.

My fault. :(

> 
>>> The initial objective was to fix a vm crash, can you explain that
>>> initial problem?
>>>
>>
>> The issue was trigged by this code:
>>
>>                 } else {
>>                         if (async && (vma->vm_flags & VM_WRITE))
>>                                 *async = true;
>>                         pfn = KVM_PFN_ERR_FAULT;
>>                 }
>>
>> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
>> its physical page is swapped out (or the file data does not be read in),
>> get_user_page_nowait will fail, above code reject to set async,
>> then we will get a fault pfn and async=false.
>>
>> I guess this issue also exists in "QEMU write protected mapping" as
>> you mentioned above.
> 
> Yes, it does. As far as i understand, what that check does from a high
> level pov is:
> 
> - Did get_user_pages_nowait() fail due to a swapped out page (in which 
> case we should try to swappin the page asynchronously), or due to 
> another reason (for which case an error should be returned).
> 
> Using vma->vm_flags VM_WRITE for that is trying to guess why
> get_user_pages_nowait() failed, because it (gup_nowait return values) 
> does not provide sufficient information by itself.
> 

That is exactly what I did in the first version. :)

You can see it, and the reason why it was switched to the new approach
(readonly memslot), at the link above (the first message in the thread).

> Can't that be fixed separately? 
> 
> Another issue which is also present with the mmap(PROT_READ) scheme is
> interaction with reexecute_instruction. That is, unless i am mistaken,
> reexecute_instruction can succeed (return true) on a region that is
> write protected. This breaks the "write faults on read-only slots exit
> to userspace via EXIT_MMIO" behaviour.

Sorry, why? After re-entry to the guest, can it not generate a correct MMIO exit?



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-10 18:14 ` [PATCH v5 00/12] KVM: introduce readonly memslot Marcelo Tosatti
  2012-08-11  3:36   ` Xiao Guangrong
@ 2012-08-14 14:00   ` Avi Kivity
  2012-08-14 15:51     ` Marcelo Tosatti
  1 sibling, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-08-14 14:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, LKML, KVM

On 08/10/2012 09:14 PM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>> 
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>> 
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>> 
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
> 
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
> 
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?

Userspace may want to modify the ROM (for example, when programming a
flash device).  It is also possible to map an hva range rw through one
slot and ro through another.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-14  2:58       ` Xiao Guangrong
@ 2012-08-14 15:25         ` Marcelo Tosatti
  2012-08-16  5:49           ` Xiao Guangrong
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-14 15:25 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

[-- Attachment #1: Type: text/plain, Size: 4491 bytes --]

On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> > On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> >> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> >>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >>>> Changelog:
> >>>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >>>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>>>
> >>>> The test case can be found at:
> >>>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>>>
> >>>> In current code, if we map a readonly memory space from host to guest
> >>>> and the page is not currently mapped in the host, we will get a fault-pfn
> >>>> and async is not allowed, then the vm will crash.
> >>>>
> >>>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >>>> to the guest, read access is happy for readonly memslot, write access on
> >>>> readonly memslot will cause KVM_EXIT_MMIO exit.
> >>>
> >>> Memory slots whose QEMU mapping is write protected is supported
> >>> today, as long as there are no write faults.
> >>>
> >>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> >>> again?
> >>>
> >>
> >> It is happy to map !write host memory space to the readonly memslot,
> >> and they can coexist as well.
> >>
> >> readonly memslot checks the write-permission by seeing slot->flags and
> >> !write memory checks the write-permission in hva_to_pfn() function
> >> which checks vma->flags. It is no conflict.
> > 
> > Yes, there is no conflict. The point is, if you can use the
> > mmap(PROT_READ) interface (supporting read faults on read-only slots)
> > for this behavior, what is the advantage of a new memslot flag?
> > 
> 
> You can get the discussion at:
> https://lkml.org/lkml/2012/5/22/228
> 
> > I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> > why it is not.
> 
> My fault. :(
> 
> > 
> >>> The initial objective was to fix a vm crash, can you explain that
> >>> initial problem?
> >>>
> >>
> >> The issue was trigged by this code:
> >>
> >>                 } else {
> >>                         if (async && (vma->vm_flags & VM_WRITE))
> >>                                 *async = true;
> >>                         pfn = KVM_PFN_ERR_FAULT;
> >>                 }
> >>
> >> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
> >> its physical page is swapped out (or the file data does not be read in),
> >> get_user_page_nowait will fail, above code reject to set async,
> >> then we will get a fault pfn and async=false.
> >>
> >> I guess this issue also exists in "QEMU write protected mapping" as
> >> you mentioned above.
> > 
> > Yes, it does. As far as i understand, what that check does from a high
> > level pov is:
> > 
> > - Did get_user_pages_nowait() fail due to a swapped out page (in which 
> > case we should try to swappin the page asynchronously), or due to 
> > another reason (for which case an error should be returned).
> > 
> > Using vma->vm_flags VM_WRITE for that is trying to guess why
> > get_user_pages_nowait() failed, because it (gup_nowait return values) 
> > does not provide sufficient information by itself.
> > 
> 
> That is exactly what i did in the first version. :)
> 
> You can see it and the reason why it switched to the new way (readonly memslot)
> in the above website (the first message in thread).

Userspace can create multiple mappings for the same memory region, for
example via shared memory (shm_open), and have different protections for
the two (or more) mappings. I had an old patch doing this; it's attached.

> > Can't that be fixed separately? 
> > 
> > Another issue which is also present with the mmap(PROT_READ) scheme is
> > interaction with reexecute_instruction. That is, unless i am mistaken,
> > reexecute_instruction can succeed (return true) on a region that is
> > write protected. This breaks the "write faults on read-only slots exit
> > to userspace via EXIT_MMIO" behaviour.
> 
> Sorry, Why? After re-entry to the guest, it can not generate a correct MMIO?

reexecute_instruction validates the presence of a GPA by looking at
registered memslots. But if the access is a write and the userspace
memory map is read-only, reexecute_instruction should exit via MMIO.

That is, reexecute_instruction must validate the GPA using registered
memslots AND additionally the userspace map permission, not only the
registered memslot.
 

[-- Attachment #2: qemukvm-guest-mapping --]
[-- Type: text/plain, Size: 5327 bytes --]

Index: qemu-kvm-gpage-cache/cpu-common.h
===================================================================
--- qemu-kvm-gpage-cache.orig/cpu-common.h
+++ qemu-kvm-gpage-cache/cpu-common.h
@@ -33,6 +33,7 @@ ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
+void *qemu_get_ram_ptr_guest(ram_addr_t addr);
 /* This should not be used by devices.  */
 int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
 ram_addr_t qemu_ram_addr_from_host(void *ptr);
Index: qemu-kvm-gpage-cache/exec.c
===================================================================
--- qemu-kvm-gpage-cache.orig/exec.c
+++ qemu-kvm-gpage-cache/exec.c
@@ -35,6 +35,7 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "cache-utils.h"
+#include "sysemu.h"
 
 #if !defined(TARGET_IA64)
 #include "tcg.h"
@@ -124,6 +125,7 @@ static int in_migration;
 
 typedef struct RAMBlock {
     uint8_t *host;
+    uint8_t *guest;
     ram_addr_t offset;
     ram_addr_t length;
     struct RAMBlock *next;
@@ -2450,7 +2452,8 @@ static long gethugepagesize(const char *
     return fs.f_bsize;
 }
 
-static void *file_ram_alloc(ram_addr_t memory, const char *path)
+static void *file_ram_alloc(ram_addr_t memory, const char *path,
+                            RAMBlock *block)
 {
     char *filename;
     void *area;
@@ -2507,7 +2510,12 @@ static void *file_ram_alloc(ram_addr_t m
      * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
      * to sidestep this quirk.
      */
-    flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+    if (mem_guest_map)
+        flags = MAP_SHARED;
+    else if (mem_prealloc)
+        flags = MAP_POPULATE|MAP_SHARED;
+    else
+        flags = MAP_PRIVATE;
     area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
 #else
     area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
@@ -2517,12 +2525,22 @@ static void *file_ram_alloc(ram_addr_t m
 	close(fd);
 	return (NULL);
     }
+    if (mem_guest_map) {
+        block->guest = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+        if (block->guest == MAP_FAILED) {
+            perror("alloc_mem_area: can't mmap guest map");
+            munmap(area, memory);
+            close(fd);
+            return NULL;
+        }
+    }
     return area;
 }
 
 #else
 
-static void *file_ram_alloc(ram_addr_t memory, const char *path)
+static void *file_ram_alloc(ram_addr_t memory, const char *path,
+                            RAMBlock *block)
 {
     return NULL;
 }
@@ -2538,7 +2556,7 @@ ram_addr_t qemu_ram_alloc(ram_addr_t siz
     size = TARGET_PAGE_ALIGN(size);
     new_block = qemu_malloc(sizeof(*new_block));
 
-    new_block->host = file_ram_alloc(size, mem_path);
+    new_block->host = file_ram_alloc(size, mem_path, new_block);
     if (!new_block->host) {
 #if defined(TARGET_S390X) && defined(CONFIG_KVM)
     /* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */
@@ -2584,7 +2602,8 @@ void qemu_ram_free(ram_addr_t addr)
    It should not be used for general purpose DMA.
    Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
  */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+
+static void *__qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *prev;
     RAMBlock **prevp;
@@ -2610,9 +2629,27 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
         block->next = *prevp;
         *prevp = block;
     }
+    return block;
+}
+
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+    RAMBlock *block = __qemu_get_ram_ptr(addr);
+
     return block->host + (addr - block->offset);
 }
 
+void *qemu_get_ram_ptr_guest(ram_addr_t addr)
+{
+    RAMBlock *block;
+
+    if (!mem_guest_map)
+        return qemu_get_ram_ptr(addr);
+
+    block =  __qemu_get_ram_ptr(addr);
+    return block->guest + (addr - block->offset);
+}
+
 int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
     RAMBlock *prev;
Index: qemu-kvm-gpage-cache/qemu-kvm.c
===================================================================
--- qemu-kvm-gpage-cache.orig/qemu-kvm.c
+++ qemu-kvm-gpage-cache/qemu-kvm.c
@@ -2327,7 +2327,7 @@ void kvm_set_phys_mem(target_phys_addr_t
 #endif
 
     r = kvm_register_phys_mem(kvm_context, start_addr,
-                              qemu_get_ram_ptr(phys_offset), size, 0);
+                              qemu_get_ram_ptr_guest(phys_offset), size, 0);
     if (r < 0) {
         printf("kvm_cpu_register_physical_memory: failed\n");
         exit(1);
Index: qemu-kvm-gpage-cache/sysemu.h
===================================================================
--- qemu-kvm-gpage-cache.orig/sysemu.h
+++ qemu-kvm-gpage-cache/sysemu.h
@@ -15,6 +15,7 @@
 
 /* vl.c */
 extern const char *bios_name;
+extern int mem_guest_map;
 
 #define QEMU_FILE_TYPE_BIOS   0
 #define QEMU_FILE_TYPE_KEYMAP 1
Index: qemu-kvm-gpage-cache/vl.c
===================================================================
--- qemu-kvm-gpage-cache.orig/vl.c
+++ qemu-kvm-gpage-cache/vl.c
@@ -248,6 +248,7 @@ const char *mem_path = NULL;
 #ifdef MAP_POPULATE
 int mem_prealloc = 1;	/* force preallocation of physical target memory */
 #endif
+int mem_guest_map = 1; /* separate qemu/guest mappings for RAM */
 #ifdef TARGET_ARM
 int old_param = 0;
 #endif

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-14 14:00   ` Avi Kivity
@ 2012-08-14 15:51     ` Marcelo Tosatti
  2012-08-15 10:44       ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-14 15:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, LKML, KVM

On Tue, Aug 14, 2012 at 05:00:33PM +0300, Avi Kivity wrote:
> On 08/10/2012 09:14 PM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >> 
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >> 
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a fault-pfn
> >> and async is not allowed, then the vm will crash.
> >> 
> >> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> > 
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> > 
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> 
> Userspace may want to modify the ROM (for example, when programming a
> flash device).  It is also possible to map an hva range rw through one
> slot and ro through another.

Right, that can be done with multiple userspace maps to the same
anonymous memory region (see the other email).

The bugs that were noticed should be fixed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-14 15:51     ` Marcelo Tosatti
@ 2012-08-15 10:44       ` Avi Kivity
  2012-08-15 17:53         ` Marcelo Tosatti
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-08-15 10:44 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, LKML, KVM

On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
>> 
>> Userspace may want to modify the ROM (for example, when programming a
>> flash device).  It is also possible to map an hva range rw through one
>> slot and ro through another.
> 
> Right, can do that with multiple userspace maps to the same anonymous 
> memory region (see other email).

Yes it's possible.  It requires that we move all memory allocation to be
fd based, since userspace can't predict what memory will be dual-mapped
(at least if emulated hardware allows this).  Is this a reasonable
requirement?  Do ksm/thp/autonuma work with this?



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-15 10:44       ` Avi Kivity
@ 2012-08-15 17:53         ` Marcelo Tosatti
  2012-08-16  9:03           ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-15 17:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, LKML, KVM

On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
> >> 
> >> Userspace may want to modify the ROM (for example, when programming a
> >> flash device).  It is also possible to map an hva range rw through one
> >> slot and ro through another.
> > 
> > Right, can do that with multiple userspace maps to the same anonymous 
> > memory region (see other email).
> 
> Yes it's possible.  It requires that we move all memory allocation to be
> fd based, since userspace can't predict what memory will be dual-mapped
> (at least if emulated hardware allows this).

It can:
- Create named memory object, with associated fd.
- Copy data from large anonymous memory region to named memory.
- Unmap region that must be dual-mapped from large anonymous memory chunk.
- Map named memory object at address.

The last step can be replaced by adjusting KVM memory slots.

The disadvantage of protection information in memory slots
is that it duplicates functionality that is handled by 
userspace mappings.

Moreover, multiple memory maps are necessary for any
split-qemu-into-smaller-pieces solutions.

>  Is this a reasonable
> requirement?  Do ksm/thp/autonuma work with this?

As mentioned, only memory used for ROM purposes must be dual mapped. 

I don't think there is any way to create multiple mappings 
to one anonymous memory object ATM, but POSIX defines it
(posix_typed_mem_open).

The limitation of thp/ksm on shared memory also affects any other user
of shared memory, so it should be fixed there.

Also, QEMU ROM is allocated separately from RAM, correct?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-14 15:25         ` Marcelo Tosatti
@ 2012-08-16  5:49           ` Xiao Guangrong
  2012-08-16 16:03             ` Marcelo Tosatti
  0 siblings, 1 reply; 35+ messages in thread
From: Xiao Guangrong @ 2012-08-16  5:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 08/14/2012 11:25 PM, Marcelo Tosatti wrote:
> On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
>> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
>>> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
>>>> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
>>>>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>>>>>> Changelog:
>>>>>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>>>>>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>>>>>
>>>>>> The test case can be found at:
>>>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>>>>>
>>>>>> In current code, if we map a readonly memory space from host to guest
>>>>>> and the page is not currently mapped in the host, we will get a fault-pfn
>>>>>> and async is not allowed, then the vm will crash.
>>>>>>
>>>>>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>>>>>> to the guest, read access is happy for readonly memslot, write access on
>>>>>> readonly memslot will cause KVM_EXIT_MMIO exit.
>>>>>
>>>>> Memory slots whose QEMU mapping is write protected is supported
>>>>> today, as long as there are no write faults.
>>>>>
>>>>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
>>>>> again?
>>>>>
>>>>
>>>> It is happy to map !write host memory space to the readonly memslot,
>>>> and they can coexist as well.
>>>>
>>>> readonly memslot checks the write-permission by seeing slot->flags and
>>>> !write memory checks the write-permission in hva_to_pfn() function
>>>> which checks vma->flags. It is no conflict.
>>>
>>> Yes, there is no conflict. The point is, if you can use the
>>> mmap(PROT_READ) interface (supporting read faults on read-only slots)
>>> for this behavior, what is the advantage of a new memslot flag?
>>>
>>
>> You can get the discussion at:
>> https://lkml.org/lkml/2012/5/22/228
>>
>>> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
>>> why it is not.
>>
>> My fault. :(
>>
>>>
>>>>> The initial objective was to fix a vm crash, can you explain that
>>>>> initial problem?
>>>>>
>>>>
>>>> The issue was trigged by this code:
>>>>
>>>>                 } else {
>>>>                         if (async && (vma->vm_flags & VM_WRITE))
>>>>                                 *async = true;
>>>>                         pfn = KVM_PFN_ERR_FAULT;
>>>>                 }
>>>>
>>>> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
>>>> its physical page is swapped out (or the file data does not be read in),
>>>> get_user_page_nowait will fail, above code reject to set async,
>>>> then we will get a fault pfn and async=false.
>>>>
>>>> I guess this issue also exists in "QEMU write protected mapping" as
>>>> you mentioned above.
>>>
>>> Yes, it does. As far as i understand, what that check does from a high
>>> level pov is:
>>>
>>> - Did get_user_pages_nowait() fail due to a swapped out page (in which 
>>> case we should try to swappin the page asynchronously), or due to 
>>> another reason (for which case an error should be returned).
>>>
>>> Using vma->vm_flags VM_WRITE for that is trying to guess why
>>> get_user_pages_nowait() failed, because it (gup_nowait return values) 
>>> does not provide sufficient information by itself.
>>>
>>
>> That is exactly what i did in the first version. :)
>>
>> You can see it and the reason why it switched to the new way (readonly memslot)
>> in the above website (the first message in thread).
> 
> Userspace can create multiple mappings for the same memory region, for
> example via shared memory (shm_open), and have different protections for
> the two (or more) regions. I had old patch doing this, its attached.
> 

In this way, if the guest tries to write a readonly gfn, the VM will
crash, since FAULT_PFN is returned on the page-fault path. The VMM
cannot detect this kind of fault, so we have these problems:
- even if the guest merely tries to write the ROM of a PCI device, the
  guest dies; we would rather ignore the write, which behaves more like
  a real machine.
- we cannot implement ROMD, because a write to a ROMD is an MMIO access.

Yes, we could rework get_user_page_nowait and get_user_pages_fast to
tell us the fault reason, but I think that is more complex.

>>> Can't that be fixed separately? 
>>>
>>> Another issue which is also present with the mmap(PROT_READ) scheme is
>>> interaction with reexecute_instruction. That is, unless i am mistaken,
>>> reexecute_instruction can succeed (return true) on a region that is
>>> write protected. This breaks the "write faults on read-only slots exit
>>> to userspace via EXIT_MMIO" behaviour.
>>
>> Sorry, Why? After re-entry to the guest, it can not generate a correct MMIO?
> 
> reexecute_instruction validates presence of GPA by looking at registered
> memslots. But if the access is a write, and userspace memory map is
> read-only, reexecute_instruction should exit via MMIO.
> 
> That is, reexecute_instruction must validate GPA using registered
> memslots AND additionaly userspace map permission, not only registered
> memslot.
> 

What will happen if we always retry an unhandleable instruction which tries to
write readonly memory? It will go into an endless loop (write-fault ->
emulation fail -> write-fault ...), right?

I do not think exiting via MMIO is a good idea, because the instruction can not
be emulated; after userspace finishes the MMIO, the emulation will fail again.

I think we can simply exit via KVM_EXIT_INTERNAL_ERROR for all such accesses to
readonly memory, because:
- it is fine for read access, since a read fault is always fixed on the
  page-fault path and never reaches x86_emulate_instruction()

- for write access, we can not emulate it anyway. That is not bad, since it
  only happens for instructions kvm does not support.

What is your idea?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-15 17:53         ` Marcelo Tosatti
@ 2012-08-16  9:03           ` Avi Kivity
  2012-08-16 15:57             ` Marcelo Tosatti
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-08-16  9:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, LKML, KVM

On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
> On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
>> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
>> >> 
>> >> Userspace may want to modify the ROM (for example, when programming a
>> >> flash device).  It is also possible to map an hva range rw through one
>> >> slot and ro through another.
>> > 
>> > Right, can do that with multiple userspace maps to the same anonymous 
>> > memory region (see other email).
>> 
>> Yes it's possible.  It requires that we move all memory allocation to be
>> fd based, since userspace can't predict what memory will be dual-mapped
>> (at least if emulated hardware allows this).
> 
> It can:
> - Create named memory object, with associated fd.
> - Copy data from large anonymous memory region to named memory.

That doesn't work if dma is in progress (assigned device).  It also
doubles the amount of memory in use.

> - Unmap region that must be dual-mapped from large anonymous memory chunk.
> - Map named memory object at address.
> 
> The last step can be replaced by adjusting KVM memory slots.
> 
> The disadvantage of protection information in memory slots
> is that it duplicates functionality that is handled by 
> userspace mappings.

Agree.  So does the memory slots mechanism, and even dirty logging.

> 
> Moreover, multiple memory maps are necessary for any
> split-qemu-into-smaller-pieces solutions.

Complex users can use complex mechanisms, but let's keep the simple stuff
simple.

> 
>>  Is this a reasonable
>> requirement?  Do ksm/thp/autonuma work with this?
> 
> As mentioned, only memory used for ROM purposes must be dual mapped. 
> 
> I don't think there is any way to create multiple mappings 
> to one anonymous memory object ATM, but POSIX defines it
> (posix_typed_mem_open).
> 
> The limitation of thp/ksm on shared memory also affects any other user
> of shared memory, so it should be fixed there.
> 
> Also, QEMU ROM is allocated separately from RAM, correct?
> 

Correct.  But the chipset is also able to write-protect some ranges
in the 0xc0000-0x100000 area via the PAM.  It is able to write-protect
both RAM and PCI memory (usually mapped to flash).



-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-16  9:03           ` Avi Kivity
@ 2012-08-16 15:57             ` Marcelo Tosatti
  2012-08-16 16:17               ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-16 15:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, LKML, KVM

On Thu, Aug 16, 2012 at 12:03:01PM +0300, Avi Kivity wrote:
> On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
> > On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
> >> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
> >> >> 
> >> >> Userspace may want to modify the ROM (for example, when programming a
> >> >> flash device).  It is also possible to map an hva range rw through one
> >> >> slot and ro through another.
> >> > 
> >> > Right, can do that with multiple userspace maps to the same anonymous 
> >> > memory region (see other email).
> >> 
> >> Yes it's possible.  It requires that we move all memory allocation to be
> >> fd based, since userspace can't predict what memory will be dual-mapped
> >> (at least if emulated hardware allows this).
> > 
> > It can:
> > - Create named memory object, with associated fd.
> > - Copy data from large anonymous memory region to named memory.
> 
> That doesn't work if dma is in progress (assigned device).  It also
> doubles the amount of memory in use.
>
> > - Unmap region that must be dual-mapped from large anonymous memory chunk.
> > - Map named memory object at address.
> > 
> > The last step can be replaced by adjusting KVM memory slots.
> > 
> > The disadvantage of protection information in memory slots
> > is that it duplicates functionality that is handled by 
> > userspace mappings.
> 
> Agree.  So does the memory slots mechanism, and even dirty logging.
>
> > Moreover, multiple memory maps are necessary for any
> > split-qemu-into-smaller-pieces solutions.
> 
> Complex users can use complex mechanisms, but let's keep the simple stuff
> simple.
> 
> > 
> >>  Is this a reasonable
> >> requirement?  Do ksm/thp/autonuma work with this?
> > 
> > As mentioned, only memory used for ROM purposes must be dual mapped. 
> > 
> > I don't think there is any way to create multiple mappings 
> > to one anonymous memory object ATM, but POSIX defines it
> > (posix_typed_mem_open).
> > 
> > The limitation of thp/ksm on shared memory also affects any other user
> > of shared memory, so it should be fixed there.
> > 
> > Also, QEMU ROM is allocated separately from RAM, correct?
> > 
> 
> Correct.  But the chipset is also able to write-protect some ranges
> in the 0xc0000-0x100000 area via the PAM.  It is able to write-protect
> both RAM and PCI memory (usually mapped to flash).

You are convinced that adding read-write protection information to the
memory slots, which controls access by the guest, in addition to the
userspace host pagetables, is useful. OK.



* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-16  5:49           ` Xiao Guangrong
@ 2012-08-16 16:03             ` Marcelo Tosatti
  0 siblings, 0 replies; 35+ messages in thread
From: Marcelo Tosatti @ 2012-08-16 16:03 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Thu, Aug 16, 2012 at 01:49:11PM +0800, Xiao Guangrong wrote:
> On 08/14/2012 11:25 PM, Marcelo Tosatti wrote:
> > On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
> >> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> >>> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> >>>> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> >>>>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >>>>>> Changelog:
> >>>>>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >>>>>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>>>>>
> >>>>>> The test case can be found at:
> >>>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>>>>>
> >>>>>> In current code, if we map a readonly memory space from host to guest
> >>>>>> and the page is not currently mapped in the host, we will get a fault-pfn
> >>>>>> and async is not allowed, then the vm will crash.
> >>>>>>
> >>>>>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >>>>>> to the guest, read access is happy for readonly memslot, write access on
> >>>>>> readonly memslot will cause KVM_EXIT_MMIO exit.
> >>>>>
> >>>>> Memory slots whose QEMU mapping is write protected is supported
> >>>>> today, as long as there are no write faults.
> >>>>>
> >>>>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> >>>>> again?
> >>>>>
> >>>>
> >>>> It is happy to map !write host memory space to the readonly memslot,
> >>>> and they can coexist as well.
> >>>>
> >>>> readonly memslot checks the write-permission by seeing slot->flags and
> >>>> !write memory checks the write-permission in hva_to_pfn() function
> >>>> which checks vma->flags. It is no conflict.
> >>>
> >>> Yes, there is no conflict. The point is, if you can use the
> >>> mmap(PROT_READ) interface (supporting read faults on read-only slots)
> >>> for this behavior, what is the advantage of a new memslot flag?
> >>>
> >>
> >> You can get the discussion at:
> >> https://lkml.org/lkml/2012/5/22/228
> >>
> >>> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> >>> why it is not.
> >>
> >> My fault. :(
> >>
> >>>
> >>>>> The initial objective was to fix a vm crash, can you explain that
> >>>>> initial problem?
> >>>>>
> >>>>
> >>>> The issue was trigged by this code:
> >>>>
> >>>>                 } else {
> >>>>                         if (async && (vma->vm_flags & VM_WRITE))
> >>>>                                 *async = true;
> >>>>                         pfn = KVM_PFN_ERR_FAULT;
> >>>>                 }
> >>>>
> >>>> If the host memory region is readonly (!(vma->vm_flags & VM_WRITE)) and
> >>>> its physical page is swapped out (or the file data has not been read in),
> >>>> get_user_page_nowait will fail, the code above refuses to set async,
> >>>> and then we get a fault pfn with async=false.
> >>>>
> >>>> I guess this issue also exists in "QEMU write protected mapping" as
> >>>> you mentioned above.
> >>>
> >>> Yes, it does. As far as i understand, what that check does from a high
> >>> level pov is:
> >>>
> >>> - Did get_user_pages_nowait() fail due to a swapped-out page (in which
> >>> case we should try to swap in the page asynchronously), or due to
> >>> another reason (in which case an error should be returned).
> >>>
> >>> Using vma->vm_flags VM_WRITE for that is trying to guess why
> >>> get_user_pages_nowait() failed, because it (gup_nowait return values) 
> >>> does not provide sufficient information by itself.
> >>>
> >>
> >> That is exactly what i did in the first version. :)
> >>
> >> You can see it and the reason why it switched to the new way (readonly memslot)
> >> in the above website (the first message in thread).
> > 
> > Userspace can create multiple mappings for the same memory region, for
> > example via shared memory (shm_open), and have different protections for
> > the two (or more) regions. I had old patch doing this, its attached.
> > 
> 
> In this way, if the guest tries to write a readonly gfn, the vm will crash since
> FAULT_PFN is returned on the page-fault path. The VMM can not detect this kind
> of fault, so we have these problems:
> - even if the guest only tries to write the ROM on a PCI device, the guest will
>   die, but we should ignore this write instead; that behaves more like a real
>   machine.
> 
> - we can not implement ROMD, because a write to a ROMD region is an MMIO access
> 
> Yes, we could rework get_user_page_nowait and get_user_pages_fast to tell us
> the fault reason, but I think that is more complex.
> 
> >>> Can't that be fixed separately? 
> >>>
> >>> Another issue which is also present with the mmap(PROT_READ) scheme is
> >>> interaction with reexecute_instruction. That is, unless i am mistaken,
> >>> reexecute_instruction can succeed (return true) on a region that is
> >>> write protected. This breaks the "write faults on read-only slots exit
> >>> to userspace via EXIT_MMIO" behaviour.
> >>
> >> Sorry, why? After re-entry to the guest, can it not generate a correct MMIO?
> > 
> > reexecute_instruction validates presence of GPA by looking at registered
> > memslots. But if the access is a write, and userspace memory map is
> > read-only, reexecute_instruction should exit via MMIO.
> > 
> > That is, reexecute_instruction must validate GPA using registered
> > memslots AND additionaly userspace map permission, not only registered
> > memslot.
> > 
> 
> What will happen if we always retry an unhandleable instruction which tries to
> write readonly memory? It will go into an endless loop (write-fault ->
> emulation fail -> write-fault ...), right?

I think so... that's what would happen on real hardware.

> I do not think exiting via MMIO is a good idea, because the instruction can not
> be emulated; after userspace finishes the MMIO, the emulation will fail again.
> 
> I think we can simply exit via KVM_EXIT_INTERNAL_ERROR for all such accesses to
> readonly memory, because:
> - it is fine for read access, since a read fault is always fixed on the
>   page-fault path and never reaches x86_emulate_instruction()
> 
> - for write access, we can not emulate it anyway. That is not bad, since it
>   only happens for instructions kvm does not support.
> 
> What is your idea?

Either KVM_EXIT_INTERNAL_ERROR or leave the guest looping. 



* Re: [PATCH v5 00/12] KVM: introduce readonly memslot
  2012-08-16 15:57             ` Marcelo Tosatti
@ 2012-08-16 16:17               ` Avi Kivity
  0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2012-08-16 16:17 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, LKML, KVM

On 08/16/2012 06:57 PM, Marcelo Tosatti wrote:
>> 
>> Correct.  But the chipset is also able to to write-protect some ranges
>> in the 0xc0000-0x100000 area via the PAM.  It is able to write-protect
>> both RAM and PCI memory (usually mapped to flash).
> 
> You are convinced that adding read-write protection information to the
> memory slots, which controls access by the guest, in addition to the
> userspace host pagetables, is useful. OK.

In fact if we started from scratch I'd go for one huge slot, with
PROT_NONE for mmio and non-kvm APIs for dirty bitmaps, and use Linux mm
APIs to manage the details.  This would make kvm x86_64 only (no way to
access the PCI space on i386) but it would simplify a lot of the
internal translation layer.  But we're not there.


-- 
error compiling committee.c: too many arguments to function


end of thread, other threads:[~2012-08-16 16:17 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-07  9:47 [PATCH v5 00/12] KVM: introduce readonly memslot Xiao Guangrong
2012-08-07  9:48 ` [PATCH v5 01/12] KVM: fix missing check for memslot flags Xiao Guangrong
2012-08-07  9:48 ` [PATCH v5 02/12] KVM: hide KVM_MEMSLOT_INVALID from userspace Xiao Guangrong
2012-08-09 18:48   ` Marcelo Tosatti
2012-08-10  2:11     ` Xiao Guangrong
2012-08-07  9:49 ` [PATCH v5 03/12] KVM: introduce gfn_to_pfn_memslot_atomic Xiao Guangrong
2012-08-09 18:50   ` Marcelo Tosatti
2012-08-10  3:22     ` Xiao Guangrong
2012-08-07  9:50 ` [PATCH v5 04/12] KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic Xiao Guangrong
2012-08-07  9:51 ` [PATCH v5 05/12] KVM: reorganize hva_to_pfn Xiao Guangrong
2012-08-10 17:51   ` Marcelo Tosatti
2012-08-11  3:11     ` Xiao Guangrong
2012-08-07  9:51 ` [PATCH v5 06/12] KVM: use 'writable' as a hint to map writable pfn Xiao Guangrong
2012-08-07  9:52 ` [PATCH v5 07/12] KVM: introduce KVM_PFN_ERR_RO_FAULT Xiao Guangrong
2012-08-07  9:52 ` [PATCH v5 08/12] KVM: introduce KVM_HVA_ERR_BAD Xiao Guangrong
2012-08-07  9:53 ` [PATCH v5 09/12] KVM: introduce KVM_HVA_ERR_RO_BAD Xiao Guangrong
2012-08-07  9:54 ` [PATCH v5 10/12] KVM: introduce readonly memslot Xiao Guangrong
2012-08-07  9:54 ` [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info Xiao Guangrong
2012-08-10 18:03   ` Marcelo Tosatti
2012-08-11  3:13     ` Xiao Guangrong
2012-08-07  9:55 ` [PATCH v5 12/12] KVM: indicate readonly access fault Xiao Guangrong
2012-08-10 18:14 ` [PATCH v5 00/12] KVM: introduce readonly memslot Marcelo Tosatti
2012-08-11  3:36   ` Xiao Guangrong
2012-08-13 17:39     ` Marcelo Tosatti
2012-08-14  2:58       ` Xiao Guangrong
2012-08-14 15:25         ` Marcelo Tosatti
2012-08-16  5:49           ` Xiao Guangrong
2012-08-16 16:03             ` Marcelo Tosatti
2012-08-14 14:00   ` Avi Kivity
2012-08-14 15:51     ` Marcelo Tosatti
2012-08-15 10:44       ` Avi Kivity
2012-08-15 17:53         ` Marcelo Tosatti
2012-08-16  9:03           ` Avi Kivity
2012-08-16 15:57             ` Marcelo Tosatti
2012-08-16 16:17               ` Avi Kivity
