All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach
@ 2010-04-09  9:27 Takuya Yoshikawa
  2010-04-09  9:30 ` [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space Takuya Yoshikawa
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:27 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

Hi, this is the first version!


We've first implemented the x86 specific parts without introducing
new APIs: so this code works with current qemu-kvm.

Although we have many things to do, we'd like to get some comments
to see we are going to the right direction.


Note: we are now testing this and now getting to be thinking we may
  be able to improve some performance especially concerning to the
  migration, graphics, etc, sorry but not confirmed yet.

Thanks in advance,
  Takuya

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
@ 2010-04-09  9:30 ` Takuya Yoshikawa
  2010-04-11 17:08   ` Avi Kivity
  2010-04-09  9:32 ` [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps Takuya Yoshikawa
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:30 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

This work is initially suggested by Avi Kivity for moving the
dirty bitmaps used by KVM to user space: This makes it possible
to manipulate the bitmaps from qemu without copying from KVM.

Note: We are now brushing up this code before sending to x86's
  maintainers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 arch/x86/include/asm/uaccess.h |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..25a694e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,43 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user: - Set a bit of a bitmap in user space.
+ * @nr:   Bit offset to set.
+ * @addr: Base address, in user space.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * This macro sets a bit of a bitmap in user space. Note that this
+ * is same as __set_bit but not set_bit in the sense that setting
+ * the bit is not done atomically.
+ *
+ * Returns zero on success, -EFAULT on error.
+ */
+#define __set_bit_user_asm(nr, addr, err, errret)			\
+	asm volatile("1:	bts %1,%2\n"				\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:	mov %3,%0\n"				\
+		     "	jmp 2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : "=r"(err)					\
+		     : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user(nr, addr)						\
+({									\
+	int __ret_sbu = 0;						\
+									\
+	might_fault();							\
+	if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))			\
+		__set_bit_user_asm(nr, addr, __ret_sbu, -EFAULT);	\
+	else								\
+		__ret_sbu = -EFAULT;					\
+									\
+	__ret_sbu;							\
+})
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps
  2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
  2010-04-09  9:30 ` [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space Takuya Yoshikawa
@ 2010-04-09  9:32 ` Takuya Yoshikawa
  2010-04-11 17:12   ` Avi Kivity
  2010-04-09  9:34 ` [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy " Takuya Yoshikawa
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:32 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

We will use this later in other parts.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 arch/powerpc/kvm/book3s.c |    2 +-
 arch/x86/kvm/x86.c        |    2 +-
 include/linux/kvm_host.h  |    5 +++++
 virt/kvm/kvm_main.c       |    4 ++--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a7ab2ea..3ca857b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1136,7 +1136,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		kvm_for_each_vcpu(n, vcpu, kvm)
 			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
-		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd5c3d3..450ecfe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2664,7 +2664,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+	n = kvm_dirty_bitmap_bytes(memslot);
 
 	r = -ENOMEM;
 	dirty_bitmap = vmalloc(n);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8e91fa7..dd6bcf4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -119,6 +119,11 @@ struct kvm_memory_slot {
 	int user_alloc;
 };
 
+static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+}
+
 struct kvm_kernel_irq_routing_entry {
 	u32 gsi;
 	u32 type;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9379533..5ab581e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -645,7 +645,7 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
+		int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
 
 		new.dirty_bitmap = vmalloc(dirty_bytes);
 		if (!new.dirty_bitmap)
@@ -777,7 +777,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+	n = kvm_dirty_bitmap_bytes(memslot);
 
 	for (i = 0; !any && i < n/sizeof(long); ++i)
 		any = memslot->dirty_bitmap[i];
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps
  2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
  2010-04-09  9:30 ` [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space Takuya Yoshikawa
  2010-04-09  9:32 ` [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps Takuya Yoshikawa
@ 2010-04-09  9:34 ` Takuya Yoshikawa
  2010-04-11 17:13   ` Avi Kivity
  2010-04-09  9:35 ` [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps Takuya Yoshikawa
  2010-04-09  9:38 ` [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space" Takuya Yoshikawa
  4 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:34 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

For x86, we will change the allocation and free parts to do_mmap() and
do_munmap(). This patch makes it cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 virt/kvm/kvm_main.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5ab581e..f919bd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -431,6 +431,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -443,7 +449,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -454,7 +460,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -516,6 +521,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -645,12 +662,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = vmalloc(dirty_bytes);
-		if (!new.dirty_bitmap)
+		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
-		memset(new.dirty_bitmap, 0, dirty_bytes);
 		/* destroy any largepage mappings for dirty tracking */
 		if (old.npages)
 			flush_shadow = 1;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
  2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2010-04-09  9:34 ` [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy " Takuya Yoshikawa
@ 2010-04-09  9:35 ` Takuya Yoshikawa
  2010-04-11 17:15   ` Avi Kivity
  2010-04-09  9:38 ` [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space" Takuya Yoshikawa
  4 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:35 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

Currently, x86 vmalloc()s a dirty bitmap every time when we swich
to the next dirty bitmap. To avoid this, we use the double buffering
technique: we also move the bitmaps to userspace, so that extra
bitmaps will not use the precious kernel resource.

This idea is based on Avi's suggestion.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 include/linux/kvm_host.h        |    6 ++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0c49c88..b502bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,9 @@
 #include <asm/mtrr.h>
 #include <asm/msr-index.h>
 
+/* Select x86 specific features in <linux/kvm_host.h> */
+#define __KVM_HAVE_USER_DIRTYBITMAP
+
 #define KVM_MAX_VCPUS 64
 #define KVM_MEMORY_SLOTS 32
 /* memory slots that does not exposed to userspace */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd6bcf4..07092d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -110,7 +110,13 @@ struct kvm_memory_slot {
 	unsigned long npages;
 	unsigned long flags;
 	unsigned long *rmap;
+#ifndef __KVM_HAVE_USER_DIRTYBITMAP
 	unsigned long *dirty_bitmap;
+#else
+	unsigned long __user *dirty_bitmap;
+	unsigned long __user *dirty_bitmap_old;
+	bool is_dirty;
+#endif
 	struct {
 		unsigned long rmap_pde;
 		int write_count;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2010-04-09  9:35 ` [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps Takuya Yoshikawa
@ 2010-04-09  9:38 ` Takuya Yoshikawa
  2010-04-11 17:21   ` Avi Kivity
  4 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-09  9:38 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, fernando

By this patch, bitmap allocation is replaced with do_mmap() and
bitmap manipulation is replaced with *_user() functions.

Note that this does not change the APIs between kernel and user space.
To get more advantage from this hack, we need to add a new interface
for triggering the bitmap swith and getting the bitmap addresses: the
addresses is in user space and we can export them to qemu.

TODO:
1. We want to use copy_in_user() for 32bit case too.
   Note that this is only for the compatibility issue: in the future,
   we hope, qemu will not need to use this ioctl.
2. We have to implement test_bit_user() to avoid extra set_bit.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c       |  118 +++++++++++++++++++++++++++++++++++++--------
 include/linux/kvm_host.h |    4 ++
 virt/kvm/kvm_main.c      |   30 +++++++++++-
 3 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 450ecfe..995b970 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2642,16 +2642,99 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
+int kvm_arch_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long user_addr1;
+	unsigned long user_addr2;
+	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	down_write(&current->mm->mmap_sem);
+	user_addr1 = do_mmap(NULL, 0, dirty_bytes,
+			     PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
+	if (IS_ERR((void *)user_addr1)) {
+		up_write(&current->mm->mmap_sem);
+		return PTR_ERR((void *)user_addr1);
+	}
+	user_addr2 = do_mmap(NULL, 0, dirty_bytes,
+			     PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
+	if (IS_ERR((void *)user_addr2)) {
+		do_munmap(current->mm, user_addr1, dirty_bytes);
+		up_write(&current->mm->mmap_sem);
+		return PTR_ERR((void *)user_addr2);
+	}
+	up_write(&current->mm->mmap_sem);
+
+	memslot->dirty_bitmap = (unsigned long __user *)user_addr1;
+	memslot->dirty_bitmap_old = (unsigned long __user *)user_addr2;
+	clear_user(memslot->dirty_bitmap, dirty_bytes);
+	clear_user(memslot->dirty_bitmap_old, dirty_bytes);
+
+	return 0;
+}
+
+void kvm_arch_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	int n = kvm_dirty_bitmap_bytes(memslot);
+
+	if (!memslot->dirty_bitmap)
+		return;
+
+	down_write(&current->mm->mmap_sem);
+	do_munmap(current->mm, (unsigned long)memslot->dirty_bitmap, n);
+	do_munmap(current->mm, (unsigned long)memslot->dirty_bitmap_old, n);
+	up_write(&current->mm->mmap_sem);
+
+	memslot->dirty_bitmap = NULL;
+	memslot->dirty_bitmap_old = NULL;
+}
+
+static int kvm_copy_dirty_bitmap(unsigned long __user *to,
+				 const unsigned long __user *from, int n)
+{
+#ifdef CONFIG_X86_64
+	if (copy_in_user(to, from, n) < 0) {
+		printk(KERN_WARNING "%s: copy_in_user failed\n", __func__);
+		return -EFAULT;
+	}
+	return 0;
+#else
+	int ret = 0;
+	void *p = vmalloc(n);
+
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	if (copy_from_user(p, from, n) < 0) {
+		printk(KERN_WARNING "%s: copy_from_user failed\n", __func__);
+		ret = -EFAULT;
+		goto out_free;
+	}
+	if (copy_to_user(to, p, n) < 0) {
+		printk(KERN_WARNING "%s: copy_to_user failed\n", __func__);
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+out_free:
+	vfree(p);
+out:
+	return ret;
+#endif
+}
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, n, i;
+	int r, n;
 	struct kvm_memory_slot *memslot;
-	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
+	unsigned long __user *dirty_bitmap;
+	unsigned long __user *dirty_bitmap_old;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2664,44 +2747,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
+	dirty_bitmap = memslot->dirty_bitmap;
+	dirty_bitmap_old = memslot->dirty_bitmap_old;
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
+	n = kvm_dirty_bitmap_bytes(memslot);
+	clear_user(dirty_bitmap_old, n);
 
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
+		r = -ENOMEM;
 		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 		if (!slots)
-			goto out_free;
+			goto out;
 
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
+		slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
+		slots->memslots[log->slot].is_dirty = false;
 
 		old_slots = kvm->memslots;
 		rcu_assign_pointer(kvm->memslots, slots);
 		synchronize_srcu_expedited(&kvm->srcu);
-		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
 	}
 
-	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
+	r = kvm_copy_dirty_bitmap(log->dirty_bitmap, dirty_bitmap, n);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 07092d6..834812f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -276,6 +276,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem,
 				struct kvm_memory_slot old,
 				int user_alloc);
+#ifdef __KVM_HAVE_USER_DIRTYBITMAP
+int kvm_arch_create_dirty_bitmap(struct kvm_memory_slot *memslot);
+void kvm_arch_destroy_dirty_bitmap(struct kvm_memory_slot *memslot);
+#endif
 void kvm_disable_largepages(void);
 void kvm_arch_flush_shadow(struct kvm *kvm);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f919bd1..038a677 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -433,8 +433,12 @@ out_err_nodisable:
 
 static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
+#ifdef __KVM_HAVE_USER_DIRTYBITMAP
+	kvm_arch_destroy_dirty_bitmap(memslot);
+#else
 	vfree(memslot->dirty_bitmap);
 	memslot->dirty_bitmap = NULL;
+#endif
 }
 
 /*
@@ -463,13 +467,26 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	free->rmap = NULL;
 }
 
+/*
+ * We don't munmap dirty bitmaps by ourselves in the case of vm destruction.
+ */
+static void kvm_pre_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+#ifdef __KVM_HAVE_USER_DIRTYBITMAP
+	memslot->dirty_bitmap = NULL;
+	memslot->dirty_bitmap_old = NULL;
+#endif
+}
+
 void kvm_free_physmem(struct kvm *kvm)
 {
 	int i;
 	struct kvm_memslots *slots = kvm->memslots;
 
-	for (i = 0; i < slots->nmemslots; ++i)
+	for (i = 0; i < slots->nmemslots; ++i) {
+		kvm_pre_destroy_dirty_bitmap(&slots->memslots[i]);
 		kvm_free_physmem_slot(&slots->memslots[i], NULL);
+	}
 
 	kfree(kvm->memslots);
 }
@@ -523,6 +540,9 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 
 static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
+#ifdef __KVM_HAVE_USER_DIRTYBITMAP
+	return kvm_arch_create_dirty_bitmap(memslot);
+#else
 	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
 
 	memslot->dirty_bitmap = vmalloc(dirty_bytes);
@@ -530,6 +550,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 		return -ENOMEM;
 
 	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+#endif
 	return 0;
 }
 
@@ -1197,9 +1218,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	if (memslot && memslot->dirty_bitmap) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 
+#ifdef __KVM_HAVE_USER_DIRTYBITMAP
+		if (set_bit_user(rel_gfn, memslot->dirty_bitmap) < 0)
+			printk(KERN_WARNING "%s: set_bit_user failed\n", __func__);
+
+		memslot->is_dirty = true;
+#else
 		/* avoid RMW */
 		if (!generic_test_le_bit(rel_gfn, memslot->dirty_bitmap))
 			generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+#endif
 	}
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-09  9:30 ` [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space Takuya Yoshikawa
@ 2010-04-11 17:08   ` Avi Kivity
  2010-04-12  1:29     ` Takuya Yoshikawa
  2010-04-21  4:56     ` Fernando Luis Vázquez Cao
  0 siblings, 2 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-11 17:08 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/09/2010 12:30 PM, Takuya Yoshikawa wrote:
> This work is initially suggested by Avi Kivity for moving the
> dirty bitmaps used by KVM to user space: This makes it possible
> to manipulate the bitmaps from qemu without copying from KVM.
>
> Note: We are now brushing up this code before sending to x86's
>    maintainers.
>
>    

The subject prefix will need to be x86:, not KVM:, since it isn't kvm 
specific, and you will need to beef up the description since you will 
undoubtedly be asked why this is needed.

Also, please add the generic implementation (in a separate patch).  We 
have dirty bitmaps for ppc as well.

> +/**
> + * set_bit_user: - Set a bit of a bitmap in user space.
> + * @nr:   Bit offset to set.
> + * @addr: Base address, in user space.
> + *
> + * Context: User context only.  This function may sleep.
> + *
> + * This macro sets a bit of a bitmap in user space. Note that this
> + * is same as __set_bit but not set_bit in the sense that setting
> + * the bit is not done atomically.
> + *
> + * Returns zero on success, -EFAULT on error.
> + */
> +#define __set_bit_user_asm(nr, addr, err, errret)			\
> +	asm volatile("1:	bts %1,%2\n"				\
> +		     "2:\n"						\
> +		     ".section .fixup,\"ax\"\n"				\
> +		     "3:	mov %3,%0\n"				\
> +		     "	jmp 2b\n"					\
> +		     ".previous\n"					\
> +		     _ASM_EXTABLE(1b, 3b)				\
> +		     : "=r"(err)					\
> +		     : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
> +
> +#define set_bit_user(nr, addr)						\
> +({									\
> +	int __ret_sbu = 0;						\
> +									\
> +	might_fault();							\
> +	if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))			\
> +		__set_bit_user_asm(nr, addr, __ret_sbu, -EFAULT);	\
> +	else								\
> +		__ret_sbu = -EFAULT;					\
> +									\
> +	__ret_sbu;							\
> +})
> +
>    

Should be called __set_bit_user() since it is non-atomic.

An interesting wart is that this will use the kernel's word size instead 
of userspace word size for access.  So, a 32-bit process might allocate 
a 4-byte bitmap, and a 64-bit kernel will use a 64-bit access to touch 
it, which might result in a fault.  This might be resolved by 
documenting that userspace bitmaps must be a multiple of 64-bits in size 
and recommending that they be 64-bit aligned as well.

Can you replace the macros with inline functions?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps
  2010-04-09  9:32 ` [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps Takuya Yoshikawa
@ 2010-04-11 17:12   ` Avi Kivity
  2010-04-12  1:53     ` Takuya Yoshikawa
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-11 17:12 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/09/2010 12:32 PM, Takuya Yoshikawa wrote:
> We will use this later in other parts.
>
>    

s/rapper/wrapper/...

>
> +static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
> +{
> +	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
> +}
> +
>    

'int' may overflow.

>   struct kvm_kernel_irq_routing_entry {
>   	u32 gsi;
>   	u32 type;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9379533..5ab581e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -645,7 +645,7 @@ skip_lpage:
>
>   	/* Allocate page dirty bitmap if needed */
>   	if ((new.flags&  KVM_MEM_LOG_DIRTY_PAGES)&&  !new.dirty_bitmap) {
> -		unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
> +		int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
>    

Ah, an existing bug.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps
  2010-04-09  9:34 ` [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy " Takuya Yoshikawa
@ 2010-04-11 17:13   ` Avi Kivity
  2010-04-12  2:07     ` Takuya Yoshikawa
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-11 17:13 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote:
> For x86, we will change the allocation and free parts to do_mmap() and
> do_munmap(). This patch makes it cleaner.
>
>    

Should be done for all architectures.  I don't want different ways of 
creating dirty bitmaps for different architectures.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
  2010-04-09  9:35 ` [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps Takuya Yoshikawa
@ 2010-04-11 17:15   ` Avi Kivity
  2010-04-12  2:15     ` Takuya Yoshikawa
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-11 17:15 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/09/2010 12:35 PM, Takuya Yoshikawa wrote:
> Currently, x86 vmalloc()s a dirty bitmap every time when we swich
> to the next dirty bitmap. To avoid this, we use the double buffering
> technique: we also move the bitmaps to userspace, so that extra
> bitmaps will not use the precious kernel resource.
>
> This idea is based on Avi's suggestion.
>
>    

Please fold this into the next patch.  Introducing new data members 
without their users is hard to follow.


>   #define KVM_MAX_VCPUS 64
>   #define KVM_MEMORY_SLOTS 32
>   /* memory slots that does not exposed to userspace */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index dd6bcf4..07092d6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -110,7 +110,13 @@ struct kvm_memory_slot {
>   	unsigned long npages;
>   	unsigned long flags;
>   	unsigned long *rmap;
> +#ifndef __KVM_HAVE_USER_DIRTYBITMAP
>   	unsigned long *dirty_bitmap;
> +#else
> +	unsigned long __user *dirty_bitmap;
> +	unsigned long __user *dirty_bitmap_old;
> +	bool is_dirty;
> +#endif
>    

And, if we make set_user_bit() generic, we don't need the ifdefs.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-09  9:38 ` [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space" Takuya Yoshikawa
@ 2010-04-11 17:21   ` Avi Kivity
  2010-04-12  2:29     ` Takuya Yoshikawa
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-11 17:21 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/09/2010 12:38 PM, Takuya Yoshikawa wrote:
> By this patch, bitmap allocation is replaced with do_mmap() and
> bitmap manipulation is replaced with *_user() functions.
>
> Note that this does not change the APIs between kernel and user space.
> To get more advantage from this hack, we need to add a new interface
> for triggering the bitmap swith and getting the bitmap addresses: the
> addresses is in user space and we can export them to qemu.
>
>    

I would like to merge the new API together with the patchset, since 
without it, the improvement is insufficient.

> TODO:
> 1. We want to use copy_in_user() for 32bit case too.
>    

Definitely.  Why doesn't it work now?

>     Note that this is only for the compatibility issue: in the future,
>     we hope, qemu will not need to use this ioctl.
> 2. We have to implement test_bit_user() to avoid extra set_bit.
>    

This was important in the days of shadow paging.  I'm not so sure about 
it with nested paging, since we'll typically only fault a page once per 
iteration.  Since we're very likely to actually write these days, the 
extra access is wasteful.

>
> +int kvm_arch_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> +{
> +	unsigned long user_addr1;
> +	unsigned long user_addr2;
> +	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
> +
> +	down_write(&current->mm->mmap_sem);
> +	user_addr1 = do_mmap(NULL, 0, dirty_bytes,
> +			     PROT_READ | PROT_WRITE,
> +			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
> +	if (IS_ERR((void *)user_addr1)) {
> +		up_write(&current->mm->mmap_sem);
> +		return PTR_ERR((void *)user_addr1);
> +	}
> +	user_addr2 = do_mmap(NULL, 0, dirty_bytes,
> +			     PROT_READ | PROT_WRITE,
> +			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
> +	if (IS_ERR((void *)user_addr2)) {
> +		do_munmap(current->mm, user_addr1, dirty_bytes);
> +		up_write(&current->mm->mmap_sem);
> +		return PTR_ERR((void *)user_addr2);
> +	}
> +	up_write(&current->mm->mmap_sem);
>    

I think a single do_mmap() call for both bitmaps is simpler in terms of 
error handling.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-11 17:08   ` Avi Kivity
@ 2010-04-12  1:29     ` Takuya Yoshikawa
  2010-04-12  9:12       ` Avi Kivity
  2010-04-21  4:56     ` Fernando Luis Vázquez Cao
  1 sibling, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  1:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando

(2010/04/12 2:08), Avi Kivity wrote:
> On 04/09/2010 12:30 PM, Takuya Yoshikawa wrote:
>> This work is initially suggested by Avi Kivity for moving the
>> dirty bitmaps used by KVM to user space: This makes it possible
>> to manipulate the bitmaps from qemu without copying from KVM.
>>
>> Note: We are now brushing up this code before sending to x86's
>> maintainers.
>>
>
> The subject prefix will need to be x86:, not KVM:, since it isn't kvm
> specific, and you will need to beef up the description since you will
> undoubtedly be asked why this is needed.

OK, I'll do my best. I wish somebody other than KVM people also want to
use this.

>
> Also, please add the generic implementation (in a separate patch). We
> have dirty bitmaps for ppc as well.

Yes, I've already implemented asm-generic set bit user, but need to finish
other dirty bitmap related parts for ppc to explain why it is needed.

>
>> +/**
>> + * set_bit_user: - Set a bit of a bitmap in user space.
>> + * @nr: Bit offset to set.
>> + * @addr: Base address, in user space.
>> + *
>> + * Context: User context only. This function may sleep.
>> + *
>> + * This macro sets a bit of a bitmap in user space. Note that this
>> + * is same as __set_bit but not set_bit in the sense that setting
>> + * the bit is not done atomically.
>> + *
>> + * Returns zero on success, -EFAULT on error.
>> + */
>> +#define __set_bit_user_asm(nr, addr, err, errret) \
>> + asm volatile("1: bts %1,%2\n" \
>> + "2:\n" \
>> + ".section .fixup,\"ax\"\n" \
>> + "3: mov %3,%0\n" \
>> + " jmp 2b\n" \
>> + ".previous\n" \
>> + _ASM_EXTABLE(1b, 3b) \
>> + : "=r"(err) \
>> + : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
>> +
>> +#define set_bit_user(nr, addr) \
>> +({ \
>> + int __ret_sbu = 0; \
>> + \
>> + might_fault(); \
>> + if (access_ok(VERIFY_WRITE, addr, nr/8 + 1)) \
>> + __set_bit_user_asm(nr, addr, __ret_sbu, -EFAULT); \
>> + else \
>> + __ret_sbu = -EFAULT; \
>> + \
>> + __ret_sbu; \
>> +})
>> +
>
> Should be called __set_bit_user() since it is non-atomic.

Actually I first named it like that and then noticed that in the uaccess'
convention, __ prefix means it is "with less checking" version.

I don't know which is better in this case, should be "set_bit_user_non_atomic"
though bit long? May be judged by x86 people.

>
> An interesting wart is that this will use the kernel's word size instead
> of userspace word size for access. So, a 32-bit process might allocate a
> 4-byte bitmap, and a 64-bit kernel will use a 64-bit access to touch it,
> which might result in a fault. This might be resolved by documenting
> that userspace bitmaps must be a multiple of 64-bits in size and
> recommending that they be 64-bit aligned as well.

Oh, I didn't care about that. I'll explain in the next version.

>
> Can you replace the macros with inline functions?
>

Yes, I can. I have both inline and macro versions and talking with Fernando
which will be preferred.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps
  2010-04-11 17:12   ` Avi Kivity
@ 2010-04-12  1:53     ` Takuya Yoshikawa
  0 siblings, 0 replies; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  1:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando

(2010/04/12 2:12), Avi Kivity wrote:
> On 04/09/2010 12:32 PM, Takuya Yoshikawa wrote:
>> We will use this later in other parts.
>>
>
> s/rapper/wrapper/...

Oh, my poor English, sorry.

>
>>
>> +static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot
>> *memslot)
>> +{
>> + return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
>> +}
>> +
>
> 'int' may overflow.

Then, we have to fix a bit more.

In mark_page_dirty(), we pass unsigned long rel_gfn to __set_bit(),
but __set_bit() takes the offset as int.

We have to shift the base before using __set_bit().

I'll send this as a separate bug fix.

>
>> struct kvm_kernel_irq_routing_entry {
>> u32 gsi;
>> u32 type;
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 9379533..5ab581e 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -645,7 +645,7 @@ skip_lpage:
>>
>> /* Allocate page dirty bitmap if needed */
>> if ((new.flags& KVM_MEM_LOG_DIRTY_PAGES)&& !new.dirty_bitmap) {
>> - unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
>> + int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
>
> Ah, an existing bug.
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps
  2010-04-11 17:13   ` Avi Kivity
@ 2010-04-12  2:07     ` Takuya Yoshikawa
  2010-04-12  9:13       ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  2:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando

(2010/04/12 2:13), Avi Kivity wrote:
> On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote:
>> For x86, we will change the allocation and free parts to do_mmap() and
>> do_munmap(). This patch makes it cleaner.
>>
>
> Should be done for all architectures. I don't want different ways of
> creating dirty bitmaps for different architectures.
>

OK. So the only difference for other(generic) implementation would be
if we pin bitmaps or not.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
  2010-04-11 17:15   ` Avi Kivity
@ 2010-04-12  2:15     ` Takuya Yoshikawa
  2010-04-12  9:19       ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  2:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando

(2010/04/12 2:15), Avi Kivity wrote:
> On 04/09/2010 12:35 PM, Takuya Yoshikawa wrote:
>> Currently, x86 vmalloc()s a dirty bitmap every time when we swich
>> to the next dirty bitmap. To avoid this, we use the double buffering
>> technique: we also move the bitmaps to userspace, so that extra
>> bitmaps will not use the precious kernel resource.
>>
>> This idea is based on Avi's suggestion.
>>
>
> Please fold this into the next patch. Introducing new data members
> without their users is hard to follow.

OK.

>
>
>> #define KVM_MAX_VCPUS 64
>> #define KVM_MEMORY_SLOTS 32
>> /* memory slots that does not exposed to userspace */
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index dd6bcf4..07092d6 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -110,7 +110,13 @@ struct kvm_memory_slot {
>> unsigned long npages;
>> unsigned long flags;
>> unsigned long *rmap;
>> +#ifndef __KVM_HAVE_USER_DIRTYBITMAP
>> unsigned long *dirty_bitmap;
>> +#else
>> + unsigned long __user *dirty_bitmap;
>> + unsigned long __user *dirty_bitmap_old;
>> + bool is_dirty;
>> +#endif
>
> And, if we make set_user_bit() generic, we don't need the ifdefs.

OK, but we have one problem: ia64. I checked all architectures' dirty bitmap
implementations and thought generalizing this work is not so hard except for
ia64. It's already too different from other parts.

#ifdef CONFIG_IA64
unsigned long *dirty_bitmap;
#else
...
#endif

is acceptable?

>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-11 17:21   ` Avi Kivity
@ 2010-04-12  2:29     ` Takuya Yoshikawa
  2010-04-12  9:22       ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  2:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando

(2010/04/12 2:21), Avi Kivity wrote:
> On 04/09/2010 12:38 PM, Takuya Yoshikawa wrote:
>> By this patch, bitmap allocation is replaced with do_mmap() and
>> bitmap manipulation is replaced with *_user() functions.
>>
>> Note that this does not change the APIs between kernel and user space.
>> To get more advantage from this hack, we need to add a new interface
>> for triggering the bitmap swith and getting the bitmap addresses: the
>> addresses is in user space and we can export them to qemu.
>>
>
> I would like to merge the new API together with the patchset, since
> without it, the improvement is insufficient.

OK, it would be only a simple API functions.

>
>> TODO:
>> 1. We want to use copy_in_user() for 32bit case too.
>
> Definitely. Why doesn't it work now?

Sadly we don't have that for 32bit. We have to implement by ourselves.

I tested two temporary implementations for 32bit:
   1. This version using copy_from_user() and copy_to_user() with
      not nice vmalloc().
   2. Loop with __get_user() and __put_user().

The result was 1 is much faster than 2.

>
>> Note that this is only for the compatibility issue: in the future,
>> we hope, qemu will not need to use this ioctl.
>> 2. We have to implement test_bit_user() to avoid extra set_bit.
>
> This was important in the days of shadow paging. I'm not so sure about
> it with nested paging, since we'll typically only fault a page once per
> iteration. Since we're very likely to actually write these days, the
> extra access is wasteful.

Nice news for me! So all we need to ask x86(asm-generic) people to merge are:

set bit user and copy_in_user 32bit version.

>
>>
>> +int kvm_arch_create_dirty_bitmap(struct kvm_memory_slot *memslot)
>> +{
>> + unsigned long user_addr1;
>> + unsigned long user_addr2;
>> + int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
>> +
>> + down_write(&current->mm->mmap_sem);
>> + user_addr1 = do_mmap(NULL, 0, dirty_bytes,
>> + PROT_READ | PROT_WRITE,
>> + MAP_PRIVATE | MAP_ANONYMOUS, 0);
>> + if (IS_ERR((void *)user_addr1)) {
>> + up_write(&current->mm->mmap_sem);
>> + return PTR_ERR((void *)user_addr1);
>> + }
>> + user_addr2 = do_mmap(NULL, 0, dirty_bytes,
>> + PROT_READ | PROT_WRITE,
>> + MAP_PRIVATE | MAP_ANONYMOUS, 0);
>> + if (IS_ERR((void *)user_addr2)) {
>> + do_munmap(current->mm, user_addr1, dirty_bytes);
>> + up_write(&current->mm->mmap_sem);
>> + return PTR_ERR((void *)user_addr2);
>> + }
>> + up_write(&current->mm->mmap_sem);
>
> I think a single do_mmap() call for both bitmaps is simpler in terms of
> error handling.

OK, I'll fix in the next version.

>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-12  1:29     ` Takuya Yoshikawa
@ 2010-04-12  9:12       ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-12  9:12 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/12/2010 04:29 AM, Takuya Yoshikawa wrote:
>> Should be called __set_bit_user() since it is non-atomic.
>
>
> Actually I first named it like that and then noticed that in the uaccess'
> convention, __ prefix means it is "with less checking" version.

On the other hand, for the bitops family, __ means nonatomic.

>
> I don't know which is better in this case, should be 
> "set_bit_user_non_atomic"
> though bit long? May be judged by x86 people.

And there's _inatomic which means no sleeping...  it's good to have a 
long name since it avoids confusion, especially in a newly introduced 
function.  So I like your last suggestion.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps
  2010-04-12  2:07     ` Takuya Yoshikawa
@ 2010-04-12  9:13       ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-12  9:13 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/12/2010 05:07 AM, Takuya Yoshikawa wrote:
> (2010/04/12 2:13), Avi Kivity wrote:
>> On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote:
>>> For x86, we will change the allocation and free parts to do_mmap() and
>>> do_munmap(). This patch makes it cleaner.
>>>
>>
>> Should be done for all architectures. I don't want different ways of
>> creating dirty bitmaps for different architectures.
>>
>
> OK. So the only difference for other(generic) implementation would be
> if we pin bitmaps or not.

Well, ppc shouldn't need to pin the bitmap either, I'd think.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
  2010-04-12  2:15     ` Takuya Yoshikawa
@ 2010-04-12  9:19       ` Avi Kivity
  2010-04-12  9:30         ` Takuya Yoshikawa
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-12  9:19 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/12/2010 05:15 AM, Takuya Yoshikawa wrote:
>
> OK, but we have one problem: ia64. I checked all architectures' dirty 
> bitmap
> implementations and thought generalizing this work is not so hard 
> except for
> ia64. It's already too different from other parts.
>
> #ifdef CONFIG_IA64
> unsigned long *dirty_bitmap;
> #else
> ...
> #endif
>
> is acceptable?

I think you can keep the bitmap in userspace, but replace the vmalloc() 
with get_user_pages() and vmap() (in arch/ia64).  'dirty_bitmap' can 
then be in kvm->arch.

Note: this will likely break ia64 without testing.  Please copy the 
patches to kvm-ia64@vger.kernel.org so they can test and fix them if 
they want to.

The patches should at least build, though.  If you don't have an ia64 
machine, I can build-test them for you.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-12  2:29     ` Takuya Yoshikawa
@ 2010-04-12  9:22       ` Avi Kivity
  2010-04-12 20:55         ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2010-04-12  9:22 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, fernando

On 04/12/2010 05:29 AM, Takuya Yoshikawa wrote:
>
>>
>>> TODO:
>>> 1. We want to use copy_in_user() for 32bit case too.
>>
>> Definitely. Why doesn't it work now?
>
> Sadly we don't have that for 32bit. We have to implement by ourselves.
>
> I tested two temporary implementations for 32bit:
>   1. This version using copy_from_user() and copy_to_user() with
>      not nice vmalloc().
>   2. Loop with __get_user() and __put_user().
>
> The result was 1 is much faster than 2.

What about copy_from_user()/copy_to_user() through a 512 byte buffer on 
the kernel stack?  That should give the speed of 1 without the vmalloc() 
(which will fail on 32 bit if you copy large blocks).

>
>>
>>> Note that this is only for the compatibility issue: in the future,
>>> we hope, qemu will not need to use this ioctl.
>>> 2. We have to implement test_bit_user() to avoid extra set_bit.
>>
>> This was important in the days of shadow paging. I'm not so sure about
>> it with nested paging, since we'll typically only fault a page once per
>> iteration. Since we're very likely to actually write these days, the
>> extra access is wasteful.
>
> Nice news for me! So all we need to ask x86(asm-generic) people to 
> merge are:
>
> set bit user and copy_in_user 32bit version.

They might still want test_bit_user and clear_bit_user for completeness :)

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
  2010-04-12  9:19       ` Avi Kivity
@ 2010-04-12  9:30         ` Takuya Yoshikawa
  0 siblings, 0 replies; 25+ messages in thread
From: Takuya Yoshikawa @ 2010-04-12  9:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, fernando


>
> I think you can keep the bitmap in userspace, but replace the vmalloc()
> with get_user_pages() and vmap() (in arch/ia64). 'dirty_bitmap' can then
> be in kvm->arch.
>
> Note: this will likely break ia64 without testing. Please copy the
> patches to kvm-ia64@vger.kernel.org so they can test and fix them if
> they want to.

OK, I'll keep in mind.

>
> The patches should at least build, though. If you don't have an ia64
> machine, I can build-test them for you.

Thank you, I don't have an ia64 machine here, so need your help.

>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-12  9:22       ` Avi Kivity
@ 2010-04-12 20:55         ` Fernando Luis Vazquez Cao
  2010-04-12 21:00           ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Fernando Luis Vazquez Cao @ 2010-04-12 20:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, mtosatti, kvm

Avi Kivity wrote:
> On 04/12/2010 05:29 AM, Takuya Yoshikawa wrote:
>>>> TODO:
>>>> 1. We want to use copy_in_user() for 32bit case too.
>>>
>>> Definitely. Why doesn't it work now?
>>
>> Sadly we don't have that for 32bit. We have to implement by ourselves.
>>
>> I tested two temporary implementations for 32bit:
>>   1. This version using copy_from_user() and copy_to_user() with
>>      not nice vmalloc().
>>   2. Loop with __get_user() and __put_user().
>>
>> The result was 1 is much faster than 2.
>
> What about copy_from_user()/copy_to_user() through a 512 byte buffer 
> on the kernel stack?

Reserving 512 bytes on the stack looks like too much, I'd rather kmalloc 
a 512 byte buffer at VM
creation time and pass it down to the dirty page tracking code. Would 
you be OK with such an
approach?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space"
  2010-04-12 20:55         ` Fernando Luis Vazquez Cao
@ 2010-04-12 21:00           ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-12 21:00 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao; +Cc: Takuya Yoshikawa, mtosatti, kvm

On 04/12/2010 11:55 PM, Fernando Luis Vazquez Cao wrote:
>>> Sadly we don't have that for 32bit. We have to implement by ourselves.
>>>
>>> I tested two temporary implementations for 32bit:
>>>   1. This version using copy_from_user() and copy_to_user() with
>>>      not nice vmalloc().
>>>   2. Loop with __get_user() and __put_user().
>>>
>>> The result was 1 is much faster than 2.
>>
>> What about copy_from_user()/copy_to_user() through a 512 byte buffer 
>> on the kernel stack?
>
>
> Reserving 512 bytes on the stack looks like too much, I'd rather 
> kmalloc a 512 byte buffer at VM
> creation time and pass it down to the dirty page tracking code. Would 
> you be OK with such an
> approach?

But a generic copy_in_user() can't access that (I guess I wasn't clear - 
just like I want a generic set_bit_user(), I want a generic 
copy_in_user() instead of ifdefs in my code).  But I guess we can make 
it as simple as possible using get_user()/put_user(), and someone will 
optimize it.

Alternatively use a 256 byte buffer, should be fine for the stack and 
for performance.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-11 17:08   ` Avi Kivity
  2010-04-12  1:29     ` Takuya Yoshikawa
@ 2010-04-21  4:56     ` Fernando Luis Vázquez Cao
  2010-04-21  8:09       ` Avi Kivity
  1 sibling, 1 reply; 25+ messages in thread
From: Fernando Luis Vázquez Cao @ 2010-04-21  4:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, mtosatti, kvm

On 04/12/2010 02:08 AM, Avi Kivity wrote:
>> +#define __set_bit_user_asm(nr, addr, err, errret)            \
>> +    asm volatile("1:    bts %1,%2\n"                \
>> +             "2:\n"                        \
>> +             ".section .fixup,\"ax\"\n"                \
>> +             "3:    mov %3,%0\n"                \
>> +             "    jmp 2b\n"                    \
>> +             ".previous\n"                    \
>> +             _ASM_EXTABLE(1b, 3b)                \
>> +             : "=r"(err)                    \
>> +             : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
>> +
>> +#define set_bit_user(nr, addr)                        \
>> +({                                    \
>> +    int __ret_sbu = 0;                        \
>> +                                    \
>> +    might_fault();                            \
>> +    if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))            \
>> +        __set_bit_user_asm(nr, addr, __ret_sbu, -EFAULT);    \
>> +    else                                \
>> +        __ret_sbu = -EFAULT;                    \
>> +                                    \
>> +    __ret_sbu;                            \
>> +})
>> +
>>    
> 
> Should be called __set_bit_user() since it is non-atomic.
> 
> An interesting wart is that this will use the kernel's word size instead
> of userspace word size for access.  So, a 32-bit process might allocate
> a 4-byte bitmap, and a 64-bit kernel will use a 64-bit access to touch
> it, which might result in a fault.  This might be resolved by
> documenting that userspace bitmaps must be a multiple of 64-bits in size
> and recommending that they be 64-bit aligned as well.

Yes, the inline assembler above generates a REX prefixed bts with the W field
set (48 0f ab), which means we have a 64 bit operand size. In addition to the
solution you propose we could also implement a legacy mode version that uses
a 32bit bts. Compat ioctls and that ilk could pontentially benefit from this.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space
  2010-04-21  4:56     ` Fernando Luis Vázquez Cao
@ 2010-04-21  8:09       ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2010-04-21  8:09 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao; +Cc: Takuya Yoshikawa, mtosatti, kvm

On 04/21/2010 07:56 AM, Fernando Luis Vázquez Cao wrote:
> On 04/12/2010 02:08 AM, Avi Kivity wrote:
>    
>>> +#define __set_bit_user_asm(nr, addr, err, errret)            \
>>> +    asm volatile("1:    bts %1,%2\n"                \
>>> +             "2:\n"                        \
>>> +             ".section .fixup,\"ax\"\n"                \
>>> +             "3:    mov %3,%0\n"                \
>>> +             "    jmp 2b\n"                    \
>>> +             ".previous\n"                    \
>>> +             _ASM_EXTABLE(1b, 3b)                \
>>> +             : "=r"(err)                    \
>>> +             : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
>>> +
>>> +#define set_bit_user(nr, addr)                        \
>>> +({                                    \
>>> +    int __ret_sbu = 0;                        \
>>> +                                    \
>>> +    might_fault();                            \
>>> +    if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))            \
>>> +        __set_bit_user_asm(nr, addr, __ret_sbu, -EFAULT);    \
>>> +    else                                \
>>> +        __ret_sbu = -EFAULT;                    \
>>> +                                    \
>>> +    __ret_sbu;                            \
>>> +})
>>> +
>>>
>>>        
>> Should be called __set_bit_user() since it is non-atomic.
>>
>> An interesting wart is that this will use the kernel's word size instead
>> of userspace word size for access.  So, a 32-bit process might allocate
>> a 4-byte bitmap, and a 64-bit kernel will use a 64-bit access to touch
>> it, which might result in a fault.  This might be resolved by
>> documenting that userspace bitmaps must be a multiple of 64-bits in size
>> and recommending that they be 64-bit aligned as well.
>>      
> Yes, the inline assembler above generates a REX prefixed bts with the W field
> set (48 0f ab), which means we have a 64 bit operand size. In addition to the
> solution you propose we could also implement a legacy mode version that uses
> a 32bit bts. Compat ioctls and that ilk could pontentially benefit from this.
>    

We generally try to avoid compat ioctls; declaring that the bitmap is 
little endian and uses 64-bit words works around all of the difficulties.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-04-21  8:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-09  9:27 [PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach Takuya Yoshikawa
2010-04-09  9:30 ` [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space Takuya Yoshikawa
2010-04-11 17:08   ` Avi Kivity
2010-04-12  1:29     ` Takuya Yoshikawa
2010-04-12  9:12       ` Avi Kivity
2010-04-21  4:56     ` Fernando Luis Vázquez Cao
2010-04-21  8:09       ` Avi Kivity
2010-04-09  9:32 ` [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps Takuya Yoshikawa
2010-04-11 17:12   ` Avi Kivity
2010-04-12  1:53     ` Takuya Yoshikawa
2010-04-09  9:34 ` [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy " Takuya Yoshikawa
2010-04-11 17:13   ` Avi Kivity
2010-04-12  2:07     ` Takuya Yoshikawa
2010-04-12  9:13       ` Avi Kivity
2010-04-09  9:35 ` [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps Takuya Yoshikawa
2010-04-11 17:15   ` Avi Kivity
2010-04-12  2:15     ` Takuya Yoshikawa
2010-04-12  9:19       ` Avi Kivity
2010-04-12  9:30         ` Takuya Yoshikawa
2010-04-09  9:38 ` [PATCH RFC 5/5] KVM: This is the main part of the "moving dirty bitmaps to user space" Takuya Yoshikawa
2010-04-11 17:21   ` Avi Kivity
2010-04-12  2:29     ` Takuya Yoshikawa
2010-04-12  9:22       ` Avi Kivity
2010-04-12 20:55         ` Fernando Luis Vazquez Cao
2010-04-12 21:00           ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.