linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
@ 2010-05-04 12:56 Takuya Yoshikawa
  2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 12:56 UTC (permalink / raw)
  To: avi, mtosatti, agraf
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, linux-kernel,
	kvm-ppc, yoshikawa.takuya, linuxppc-dev, mingo, paulus, hpa,
	tglx

Hi, sorry for sending this from my personal account.
The following patches are all from me:

  From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

  The 3rd version of "moving dirty bitmaps to user space".

From this version on, we have added the x86, ppc, and asm-generic people to the CC list.


[To KVM people]

Sorry for the late reply to your comments.

Avi,
 - I've written an answer to your question about patch 5/12: drivers/vhost/vhost.c.

 - I considered changing set_bit_user_non_atomic to an inline function,
   but left it as a macro because the other helpers in uaccess.h are written
   as macros. In any case, I hope the x86 people will give us appropriate
   suggestions about this.

 - I thought the documentation about making bitmaps 64-bit aligned should be
   written when we add an API to register user-allocated bitmaps, so probably
   in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the logical
   changes as small as possible.

   I personally tried to build these with cross compilers. For ia64, I could
   verify that the build succeeds with my patch series. But book3s failed,
   even without my patch series, with the following errors:

  arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
  arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
  make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
  make: *** [arch/powerpc/kvm] Error 2


About the changelog: there are two main changes from the 2nd version:
  1. I changed the treatment of clean slots (see patch 1/12).
     This one was already applied today, thanks!
  2. I changed the switch API (see patch 11/12).

To show this API's advantage, I also ran a test (see the end of this mail).


[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review patches 4,5/12. This is my first time sending patches to x86,
so please tell me if anything is missing.


[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see patches 6,7/12. I must first apologize that I have not tested
these yet, so they may not be ready for detailed review. But I would be
happy to receive any comments.

Alex, could you help me? Though I plan to get a PPC box in the future,
I currently cannot test these.



[To asm-generic people]

Hi, Arnd,

Please review patch 8/12. Is this kind of macro acceptable?





[Performance test]

We measured the TSC counts needed for the ioctl()s that get dirty logs in
the kernel.

Test environment

  AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory


1. GUI test (running Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable part of the results to compare how much time
the basic parts of the dirty log ioctl need.

                           get.org   get.opt  switch.opt

slots[7].len=32768          278379     66398     64024
slots[8].len=32768          181246       270       160
slots[7].len=32768          263961     64673     64494
slots[8].len=32768          181655       265       160
slots[7].len=32768          263736     64701     64610
slots[8].len=32768          182785       267       160
slots[7].len=32768          260925     65360     65042
slots[8].len=32768          182579       264       160
slots[7].len=32768          267823     65915     65682
slots[8].len=32768          186350       271       160

At a glance, we can see that our optimizations improve significantly on the
original get dirty log ioctl. This is true for both get.opt and switch.opt.
It has a really big impact for personal KVM users who run KVM in GUI mode
on their everyday PCs.

Next, we notice that switch.opt improves on get.opt by a hundred nanoseconds
or so for these slots. Although this may sound like a tiny improvement, it
is perceptible in the GUI's responses, such as mouse reactions.

To feel the difference, please try the GUI on your PC with our patch series!


2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.

                           get.org   get.opt  switch.opt

slots[0].len=655360         797383    261144    222181
slots[1].len=3757047808    2186721   1965244   1842824
slots[2].len=637534208     1433562   1012723   1031213
slots[3].len=131072         216858       331       331
slots[4].len=131072         121635       225       164
slots[5].len=131072         120863       356       164
slots[6].len=16777216       121746      1133       156
slots[7].len=32768          120415       230       278
slots[8].len=32768          120368       216       149
slots[0].len=655360         806497    194710    223582
slots[1].len=3757047808    2142922   1878025   1895369
slots[2].len=637534208     1386512   1021309   1000345
slots[3].len=131072         221118       459       296
slots[4].len=131072         121516       272       166
slots[5].len=131072         122652       244       173
slots[6].len=16777216       123226     99185       149
slots[7].len=32768          121803       457       505
slots[8].len=32768          121586       216       155
slots[0].len=655360         766113    211317    213179
slots[1].len=3757047808    2155662   1974790   1842361
slots[2].len=637534208     1481411   1020004   1031352
slots[3].len=131072         223100       351       295
slots[4].len=131072         122982       436       164
slots[5].len=131072         122100       300       503
slots[6].len=16777216       123653       779       151
slots[7].len=32768          122617       284       157
slots[8].len=32768          122737       253       149

For the slots other than 0, 1 and 2 we can see a similar improvement.

Considering that switch.opt does not depend on the bitmap length except in
kvm_mmu_slot_remove_write_access(), that function must be the cause of the
remaining usec-to-msec time consumption: there might also be some context
switches involved.

But note that this test was done with a workload which dirtied the memory
endlessly during the live-migration.

In a usual workload, the number of dirty pages varies a lot between
iterations, and we should gain a lot in the relatively clean cases.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
@ 2010-05-04 12:58 ` Takuya Yoshikawa
  2010-05-04 13:00 ` [RFC][PATCH 2/12] KVM: introduce slot level dirty state management Takuya Yoshikawa
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 12:58 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
when the memslot is clean it is only used as a zero-source for
copy_to_user() and is freed right after that. This patch uses clear_user()
instead of doing this unnecessary zero-source allocation.

Performance improvement: as expected, the time needed to allocate the
bitmap is eliminated. Furthermore, we avoid the TLB flush triggered by
vmalloc(), which has additional benefits. In my test, the improved ioctl
was about 4 to 10 times faster than the original one for clean slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c |   37 +++++++++++++++++++++++--------------
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
 
+		r = -ENOMEM;
+		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+		if (!slots) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		synchronize_srcu_expedited(&kvm->srcu);
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
+
+		r = -EFAULT;
+		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
+		vfree(dirty_bitmap);
+	} else {
+		r = -EFAULT;
+		if (clear_user(log->dirty_bitmap, n))
+			goto out;
 	}
 
 	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
-- 
1.7.0.4


* [RFC][PATCH 2/12] KVM: introduce slot level dirty state management
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
  2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
@ 2010-05-04 13:00 ` Takuya Yoshikawa
  2010-05-04 13:01 ` [RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps Takuya Yoshikawa
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:00 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

This patch introduces an is_dirty member for each memory slot.
Using this member, we can remove the dirty bitmap scans that are done in
get_dirty_log().

This is important for moving dirty bitmaps to user space because we have
no inexpensive way to check bitmaps in user space, and scanning bitmaps
just to check a memory slot's dirtiness would not be acceptable.

When we mark a slot dirty:
 - x86 and ppc: at the time of mark_page_dirty()
 - ia64: at the time of kvm_ia64_sync_dirty_log()
ia64 stores dirty logs in a different place and synchronizes them with
the logs of the memory slots right before get_dirty_log(), so we use that
timing to update is_dirty.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
CC: Alexander Graf <agraf@suse.de>
---
 arch/ia64/kvm/kvm-ia64.c  |   11 +++++++----
 arch/powerpc/kvm/book3s.c |    9 ++++-----
 arch/x86/kvm/x86.c        |    9 +++------
 include/linux/kvm_host.h  |    4 ++--
 virt/kvm/kvm_main.c       |   13 +++----------
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d5f4e91..17fd65c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1824,6 +1824,9 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	base = memslot->base_gfn / BITS_PER_LONG;
 
 	for (i = 0; i < n/sizeof(long); ++i) {
+		if (dirty_bitmap[base + i])
+			memslot->is_dirty = true;
+
 		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
 		dirty_bitmap[base + i] = 0;
 	}
@@ -1838,7 +1841,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	int r;
 	unsigned long n;
 	struct kvm_memory_slot *memslot;
-	int is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 	spin_lock(&kvm->arch.dirty_log_lock);
@@ -1847,16 +1849,17 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (r)
 		goto out;
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
-		memslot = &kvm->memslots->memslots[log->slot];
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 	r = 0;
 out:
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 28e785f..4b074f1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1191,20 +1191,18 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	struct kvm_vcpu *vcpu;
 	ulong ga, ga_end;
-	int is_dirty = 0;
 	int r;
 	unsigned long n;
 
 	mutex_lock(&kvm->slots_lock);
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
-		memslot = &kvm->memslots->memslots[log->slot];
-
+	if (memslot->is_dirty) {
 		ga = memslot->base_gfn << PAGE_SHIFT;
 		ga_end = ga + (memslot->npages << PAGE_SHIFT);
 
@@ -1213,6 +1211,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 
 	r = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b95a211..023c7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,10 +2740,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, i;
+	int r;
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
-	unsigned long is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2758,11 +2757,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
-
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
 		unsigned long *dirty_bitmap;
 
@@ -2784,6 +2780,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+		slots->memslots[log->slot].is_dirty = false;
 
 		old_slots = kvm->memslots;
 		rcu_assign_pointer(kvm->memslots, slots);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce027d5..0aa6ecb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -117,6 +117,7 @@ struct kvm_memory_slot {
 	unsigned long flags;
 	unsigned long *rmap;
 	unsigned long *dirty_bitmap;
+	bool is_dirty;
 	struct {
 		unsigned long rmap_pde;
 		int write_count;
@@ -335,8 +336,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 int kvm_dev_ioctl_check_extension(long ext);
 
-int kvm_get_dirty_log(struct kvm *kvm,
-			struct kvm_dirty_log *log, int *is_dirty);
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				struct kvm_dirty_log *log);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ab1a77..7ab6312 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -768,13 +768,11 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
 
-int kvm_get_dirty_log(struct kvm *kvm,
-			struct kvm_dirty_log *log, int *is_dirty)
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	struct kvm_memory_slot *memslot;
-	int r, i;
+	int r;
 	unsigned long n;
-	unsigned long any = 0;
 
 	r = -EINVAL;
 	if (log->slot >= KVM_MEMORY_SLOTS)
@@ -787,16 +785,10 @@ int kvm_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	for (i = 0; !any && i < n/sizeof(long); ++i)
-		any = memslot->dirty_bitmap[i];
-
 	r = -EFAULT;
 	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
 		goto out;
 
-	if (any)
-		*is_dirty = 1;
-
 	r = 0;
 out:
 	return r;
@@ -1193,6 +1185,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 
 		generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+		memslot->is_dirty = true;
 	}
 }
 
-- 
1.7.0.4


* [RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
  2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
  2010-05-04 13:00 ` [RFC][PATCH 2/12] KVM: introduce slot level dirty state management Takuya Yoshikawa
@ 2010-05-04 13:01 ` Takuya Yoshikawa
  2010-05-04 13:02 ` [RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit Takuya Yoshikawa
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:01 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

We will change the vmalloc() and vfree() calls to do_mmap() and do_munmap()
later. This patch makes that change easier and cleans up the code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 virt/kvm/kvm_main.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = vmalloc(dirty_bytes);
-		if (!new.dirty_bitmap)
+		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
-		memset(new.dirty_bitmap, 0, dirty_bytes);
 		/* destroy any largepage mappings for dirty tracking */
 		if (old.npages)
 			flush_shadow = 1;
-- 
1.7.0.4


* [RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2010-05-04 13:01 ` [RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps Takuya Yoshikawa
@ 2010-05-04 13:02 ` Takuya Yoshikawa
  2010-05-04 13:02 ` [RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:02 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

During the work on KVM's dirty page logging optimization, we encountered
the need for copy_in_user() for 32-bit x86 and ppc: it will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with the existing generic
copy-user helpers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/uaccess_32.h |    2 ++
 arch/x86/lib/usercopy_32.c        |   26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
 		(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
 		(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+		(void __user *to, const void __user *from, unsigned n);
 
 /**
  * __copy_to_user_inatomic: - Copy a block of data into user space, with less checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
 	WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:    Number of bytes to copy.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * Returns number of bytes that could not be copied.
+ * On success, this will be zero.
+ */
+unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned n)
+{
+	if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n)) {
+		if (movsl_is_ok(to, from, n))
+			__copy_user(to, from, n);
+		else
+			n = __copy_user_intel(to, (const void *)from, n);
+	}
+	return n;
+}
+EXPORT_SYMBOL(copy_in_user);
-- 
1.7.0.4


* [RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2010-05-04 13:02 ` [RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit Takuya Yoshikawa
@ 2010-05-04 13:02 ` Takuya Yoshikawa
  2010-05-04 13:03 ` [RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit Takuya Yoshikawa
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:02 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

During the work on KVM's dirty page logging optimization, we encountered
the need to manipulate bitmaps in user space efficiently. To achieve this,
we introduce a uaccess function for setting a bit in user space, following
Avi's suggestion.

  KVM currently uses dirty bitmaps for live-migration and VGA. Although we
  need to update them from the kernel side, copying them on every dirty log
  update is a big bottleneck. In particular, our tests showed that zero-copy
  bitmap manipulation improves the responsiveness of GUI interactions a lot.

We also found a similar need in drivers/vhost/vhost.c, in which the author
implemented set_bit_to_user() locally using inefficient functions: see the
TODO at the top of that file.

This kind of need will probably be common in the virtualization area.

So we introduce a macro, set_bit_user_non_atomic(), following the
implementation style of x86's uaccess functions.

Note: there is one restriction on this macro: bitmaps must be 64-bit
aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/uaccess.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * This macro set a bit of a bitmap in user space.
+ *
+ * Restriction: the bitmap pointed to by @addr must be 64-bit aligned:
+ * the kernel accesses the bitmap by its own word length, so bitmaps
+ * allocated by 32-bit processes may cause fault.
+ *
+ * Returns zero on success, or -EFAULT on error.
+ */
+#define __set_bit_user_non_atomic_asm(nr, addr, err, errret)		\
+	asm volatile("1:	bts %1,%2\n"				\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:	mov %3,%0\n"				\
+		     "	jmp 2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : "=r"(err)					\
+		     : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user_non_atomic(nr, addr)				\
+({									\
+	int __ret_sbu;							\
+									\
+	might_fault();							\
+	if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))			\
+		__set_bit_user_non_atomic_asm(nr, addr,	__ret_sbu, -EFAULT);\
+	else								\
+		__ret_sbu = -EFAULT;					\
+									\
+	__ret_sbu;							\
+})
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
-- 
1.7.0.4


* [RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (4 preceding siblings ...)
  2010-05-04 13:02 ` [RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
@ 2010-05-04 13:03 ` Takuya Yoshikawa
  2010-05-04 13:04 ` [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:03 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

During the work on KVM's dirty page logging optimization, we encountered
the need for copy_in_user() for 32-bit ppc and x86: it will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Alexander Graf <agraf@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/uaccess.h |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
 	return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+		const void __user *from, unsigned long n)
+{
+	unsigned long over;
+
+	if (likely(access_ok(VERIFY_READ, from, n) &&
+	    access_ok(VERIFY_WRITE, to, n)))
+		return __copy_tofrom_user(to, from, n);
+	if (((unsigned long)from < TASK_SIZE) ||
+	    ((unsigned long)to < TASK_SIZE)) {
+		over = max((unsigned long)from, (unsigned long)to)
+			+ n - TASK_SIZE;
+		return __copy_tofrom_user(to, from, n - over) + over;
+	}
+	return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4


* [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (5 preceding siblings ...)
  2010-05-04 13:03 ` [RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit Takuya Yoshikawa
@ 2010-05-04 13:04 ` Takuya Yoshikawa
  2010-05-11 16:00   ` Alexander Graf
  2010-05-04 13:05 ` [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro Takuya Yoshikawa
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:04 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

During the work of KVM's dirty page logging optimization, we encountered
the need of manipulating bitmaps in user space efficiantly. To achive this,
we introduce a uaccess function for setting a bit in user space following
Avi's suggestion.

  KVM is now using dirty bitmaps for live-migration and VGA. Although we need
  to update them from the kernel side, copying them every time to update the
  dirty log is a big bottleneck. In particular, our tests showed that zero-copy
  bitmap manipulation greatly improves the responsiveness of GUI operations.

We also found a similar need in drivers/vhost/vhost.c, in which the author
implemented set_bit_to_user() locally using inefficient functions: see the
TODO at the top of that file.

This kind of need is probably common in the virtualization area.

So we introduce a function set_bit_user_non_atomic().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Alexander Graf <agraf@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/uaccess.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 3a01ce8..f878326 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -321,6 +321,25 @@ do {								\
 	__gu_err;						\
 })
 
+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+	u8 __user *p;
+	u8 val;
+
+	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
+	if (!access_ok(VERIFY_WRITE, p, 1))
+		return -EFAULT;
+
+	if (__get_user(val, p))
+		return -EFAULT;
+
+	val |= 1U << (nr % BITS_PER_BYTE);
+	if (__put_user(val, p))
+		return -EFAULT;
+
+	return 0;
+}
+
 
 /* more complex routines */
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (6 preceding siblings ...)
  2010-05-04 13:04 ` [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
@ 2010-05-04 13:05 ` Takuya Yoshikawa
  2010-05-04 15:03   ` Arnd Bergmann
  2010-05-04 13:06 ` [RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic() Takuya Yoshikawa
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:05 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

Although we can use the *_le_bit() helpers to manipulate bitmaps arranged in
little-endian order, having the le bit offset calculation as a separate macro
gives us more freedom.

For example, KVM keeps little-endian dirty bitmaps for VGA and live-migration,
and they are used in user space too. To avoid bitmap copies between kernel
and user space, we want to update the bitmaps in user space directly.
To achieve this, the le bit offset combined with the *_user() functions
helps us a lot.

So let us factor out the le bit offset calculation by defining it as a new
macro: generic_le_bit_offset() .

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
CC: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/bitops/le.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 80e3bf1..ee445fb 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -9,6 +9,8 @@
 
 #if defined(__LITTLE_ENDIAN)
 
+#define generic_le_bit_offset(nr)	(nr)
+
 #define generic_test_le_bit(nr, addr) test_bit(nr, addr)
 #define generic___set_le_bit(nr, addr) __set_bit(nr, addr)
 #define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr)
@@ -25,6 +27,8 @@
 
 #elif defined(__BIG_ENDIAN)
 
+#define generic_le_bit_offset(nr)	((nr) ^ BITOP_LE_SWIZZLE)
+
 #define generic_test_le_bit(nr, addr) \
 	test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
 #define generic___set_le_bit(nr, addr) \
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic()
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (7 preceding siblings ...)
  2010-05-04 13:05 ` [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro Takuya Yoshikawa
@ 2010-05-04 13:06 ` Takuya Yoshikawa
  2010-05-04 13:07 ` [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space Takuya Yoshikawa
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:06 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

This is to avoid breaking the build for architectures other than x86 and ppc.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 arch/ia64/include/asm/kvm_host.h    |    5 +++++
 arch/powerpc/include/asm/kvm_host.h |    6 ++++++
 arch/s390/include/asm/kvm_host.h    |    6 ++++++
 arch/x86/include/asm/kvm_host.h     |    5 +++++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index a362e67..938041b 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -589,6 +589,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return 0;
+}
+
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0c9ad86..9463524 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 #include <asm/kvm_asm.h>
+#include <asm/uaccess.h>
 
 #define KVM_MAX_VCPUS 1
 #define KVM_MEMORY_SLOTS 32
@@ -287,4 +288,9 @@ struct kvm_vcpu_arch {
 #endif
 };
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 27605b6..36710ee 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -238,4 +238,10 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, unsigned long *);
+
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return 0;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..9e22df9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -795,4 +795,9 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+	return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (8 preceding siblings ...)
  2010-05-04 13:06 ` [RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic() Takuya Yoshikawa
@ 2010-05-04 13:07 ` Takuya Yoshikawa
  2010-05-11  3:28   ` Marcelo Tosatti
  2010-05-04 13:08 ` [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps Takuya Yoshikawa
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:07 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

We move dirty bitmaps to user space.

 - Allocation and destruction: we use do_mmap() and do_munmap().
   The new bitmap space is twice as long as the original one and we
   use the additional space for double buffering: this makes it
   possible to update the active bitmap while letting user space
   read the other one safely. For x86, we can also remove the vmalloc()
   in kvm_vm_ioctl_get_dirty_log().

 - Bitmap manipulations: we replace all functions which access dirty
   bitmaps with *_user() functions.

 - For ia64: moving the dirty bitmaps of memory slots does not affect
   ia64 much because it stores dirty logs in a different place rather
   than in the memory slots' dirty bitmaps: all we have to change are
   the sync and get of the dirty log, so ia64 does not need
   set_bit_user()-like functions.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
CC: Alexander Graf <agraf@suse.de>
---
 arch/ia64/kvm/kvm-ia64.c  |   15 +++++++++-
 arch/powerpc/kvm/book3s.c |    5 +++-
 arch/x86/kvm/x86.c        |   25 ++++++++----------
 include/linux/kvm_host.h  |    3 +-
 virt/kvm/kvm_main.c       |   62 +++++++++++++++++++++++++++++++++++++-------
 5 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 17fd65c..03503e6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	n = kvm_dirty_bitmap_bytes(memslot);
 	base = memslot->base_gfn / BITS_PER_LONG;
 
+	r = -EFAULT;
+	if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
+		goto out;
+
 	for (i = 0; i < n/sizeof(long); ++i) {
 		if (dirty_bitmap[base + i])
 			memslot->is_dirty = true;
 
-		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
+		if (__put_user(dirty_bitmap[base + i],
+			       &memslot->dirty_bitmap[i])) {
+			r = -EFAULT;
+			goto out;
+		}
 		dirty_bitmap[base + i] = 0;
 	}
 	r = 0;
@@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
 		n = kvm_dirty_bitmap_bytes(memslot);
-		memset(memslot->dirty_bitmap, 0, n);
+		if (clear_user(memslot->dirty_bitmap, n)) {
+			r = -EFAULT;
+			goto out;
+		}
 		memslot->is_dirty = false;
 	}
 	r = 0;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4b074f1..2a31d2f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
 		n = kvm_dirty_bitmap_bytes(memslot);
-		memset(memslot->dirty_bitmap, 0, n);
+		if (clear_user(memslot->dirty_bitmap, n)) {
+			r = -EFAULT;
+			goto out;
+		}
 		memslot->is_dirty = false;
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 023c7f8..32a3d94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
-		unsigned long *dirty_bitmap;
+		unsigned long __user *dirty_bitmap;
+		unsigned long __user *dirty_bitmap_old;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		r = -ENOMEM;
-		dirty_bitmap = vmalloc(n);
-		if (!dirty_bitmap)
+		dirty_bitmap = memslot->dirty_bitmap;
+		dirty_bitmap_old = memslot->dirty_bitmap_old;
+		r = -EFAULT;
+		if (clear_user(dirty_bitmap_old, n))
 			goto out;
-		memset(dirty_bitmap, 0, n);
 
 		r = -ENOMEM;
 		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots) {
-			vfree(dirty_bitmap);
+		if (!slots)
 			goto out;
-		}
+
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
+		slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
 		slots->memslots[log->slot].is_dirty = false;
 
 		old_slots = kvm->memslots;
 		rcu_assign_pointer(kvm->memslots, slots);
 		synchronize_srcu_expedited(&kvm->srcu);
-		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
 
 		r = -EFAULT;
-		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
-			vfree(dirty_bitmap);
+		if (copy_in_user(log->dirty_bitmap, dirty_bitmap, n))
 			goto out;
-		}
-		vfree(dirty_bitmap);
 	} else {
 		r = -EFAULT;
 		if (clear_user(log->dirty_bitmap, n))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0aa6ecb..c95e2b7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -116,7 +116,8 @@ struct kvm_memory_slot {
 	unsigned long npages;
 	unsigned long flags;
 	unsigned long *rmap;
-	unsigned long *dirty_bitmap;
+	unsigned long __user *dirty_bitmap;
+	unsigned long __user *dirty_bitmap_old;
 	bool is_dirty;
 	struct {
 		unsigned long rmap_pde;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3e3acad..ddcf65a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -437,8 +437,20 @@ out_err_nodisable:
 
 static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
-	vfree(memslot->dirty_bitmap);
+	unsigned long user_addr;
+	unsigned long n = kvm_dirty_bitmap_bytes(memslot);
+
+	if (!memslot->dirty_bitmap)
+		return;
+
+	user_addr = min((unsigned long)memslot->dirty_bitmap,
+			(unsigned long)memslot->dirty_bitmap_old);
+	down_write(&current->mm->mmap_sem);
+	do_munmap(current->mm, user_addr, 2 * n);
+	up_write(&current->mm->mmap_sem);
+
 	memslot->dirty_bitmap = NULL;
+	memslot->dirty_bitmap_old = NULL;
 }
 
 /*
@@ -472,8 +484,12 @@ void kvm_free_physmem(struct kvm *kvm)
 	int i;
 	struct kvm_memslots *slots = kvm->memslots;
 
-	for (i = 0; i < slots->nmemslots; ++i)
+	for (i = 0; i < slots->nmemslots; ++i) {
+		/* VM process will exit: we don't unmap by ourselves. */
+		slots->memslots[i].dirty_bitmap = NULL;
+		slots->memslots[i].dirty_bitmap_old = NULL;
 		kvm_free_physmem_slot(&slots->memslots[i], NULL);
+	}
 
 	kfree(kvm->memslots);
 }
@@ -527,14 +543,35 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 
 static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
-	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+	int err;
+	unsigned long user_addr;
+	unsigned long n = kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = vmalloc(dirty_bytes);
-	if (!memslot->dirty_bitmap)
-		return -ENOMEM;
+	down_write(&current->mm->mmap_sem);
+	user_addr = do_mmap(NULL, 0, 2 * n,
+			    PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANONYMOUS, 0);
+	up_write(&current->mm->mmap_sem);
+
+	if (IS_ERR((void *)user_addr)) {
+		err = PTR_ERR((void *)user_addr);
+		goto out;
+	}
+
+	memslot->dirty_bitmap = (unsigned long __user *)user_addr;
+	memslot->dirty_bitmap_old = (unsigned long __user *)(user_addr + n);
+	if (clear_user(memslot->dirty_bitmap, 2 * n)) {
+		err = -EFAULT;
+		goto out_unmap;
+	}
 
-	memset(memslot->dirty_bitmap, 0, dirty_bytes);
 	return 0;
+out_unmap:
+	down_write(&current->mm->mmap_sem);
+	do_munmap(current->mm, user_addr, 2 * n);
+	up_write(&current->mm->mmap_sem);
+out:
+	return err;
 }
 
 /*
@@ -799,7 +836,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	n = kvm_dirty_bitmap_bytes(memslot);
 
 	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+	if (copy_in_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
 		goto out;
 
 	r = 0;
@@ -1195,11 +1232,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	gfn = unalias_gfn(kvm, gfn);
 	memslot = gfn_to_memslot_unaliased(kvm, gfn);
 	if (memslot && memslot->dirty_bitmap) {
-		unsigned long rel_gfn = gfn - memslot->base_gfn;
+		int nr = generic_le_bit_offset(gfn - memslot->base_gfn);
 
-		generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
+		if (kvm_set_bit_user(nr, memslot->dirty_bitmap))
+			goto out_fault;
 		memslot->is_dirty = true;
 	}
+	return;
+
+out_fault:
+	printk(KERN_WARNING "%s: kvm_set_bit_user failed.\n", __func__);
 }
 
 /*
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (9 preceding siblings ...)
  2010-05-04 13:07 ` [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space Takuya Yoshikawa
@ 2010-05-04 13:08 ` Takuya Yoshikawa
  2010-05-11  3:43   ` Marcelo Tosatti
  2010-05-04 13:11 ` [RFC][PATCH 12/12 sample] qemu-kvm: use " Takuya Yoshikawa
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:08 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

Now that dirty bitmaps are accessible from user space, we export their
addresses to achieve zero-copy dirty log checking:

  KVM_GET_USER_DIRTY_LOG_ADDR

We also need an API for triggering a dirty bitmap switch to take full
advantage of the double-buffered bitmaps.

  KVM_SWITCH_DIRTY_LOG

See the documentation in this patch for precise explanations.

About the performance improvement: the most important feature of the switch
API is its light weight. In our tests, this appeared as improved
responsiveness of GUI operations.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
CC: Alexander Graf <agraf@suse.de>
---
 Documentation/kvm/api.txt |   87 +++++++++++++++++++++++++++++++++++++++++++++
 arch/ia64/kvm/kvm-ia64.c  |   27 +++++++++-----
 arch/powerpc/kvm/book3s.c |   18 +++++++--
 arch/x86/kvm/x86.c        |   44 ++++++++++++++++-------
 include/linux/kvm.h       |   11 ++++++
 include/linux/kvm_host.h  |    4 ++-
 virt/kvm/kvm_main.c       |   63 +++++++++++++++++++++++++++++----
 7 files changed, 220 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index a237518..c106c83 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -892,6 +892,93 @@ arguments.
 This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
 irqchip, the multiprocessing state must be maintained by userspace.
 
+4.39 KVM_GET_USER_DIRTY_LOG_ADDR
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below)
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_user_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead
+of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
+
+A bit about KVM_CAP_USER_DIRTY_LOG
+
+Before this ioctl was introduced, dirty bitmaps for dirty page logging were
+allocated in the kernel's memory space.  But we have now moved them to user
+space to get more flexiblity and performance.  To achive this move without
+breaking the compatibility, we will split KVM_CAP_USER_DIRTY_LOG capability
+into a few generations which can be identified by its check extension
+return values.
+
+This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the
+KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations.
+
+What you get
+
+By using this, you can get paired bitmap addresses which are accessible from
+user space.  See the explanation in 4.40 for the roles of these two bitmaps.
+
+How to Get
+
+Before calling this, you have to set the slot member of kvm_user_dirty_log
+to indicate the target memory slot.
+
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
+The addresses will be set in the paired members: dirty_bitmap and _old.
+
+Note
+
+In generation 1, we support bitmaps created in the kernel but not bitmaps
+created by users.  This means bitmap creation/destruction is done the same
+way as before when you instruct KVM via KVM_SET_USER_MEMORY_REGION
+(see 4.34) to start/stop logging.  Please do not try to free the exported
+bitmaps yourself, or KVM will access the freed area and fault.
+
+4.40 KVM_SWITCH_DIRTY_LOG
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see 4.39)
+Architectures: all
+Type: vm ioctl
+Parameters: memory slot id
+Returns: 0 if switched, 1 if not (slot was clean), -1 on error
+
+This ioctl allows you to switch the dirty log to the next one: it is a newer
+ioctl than KVM_GET_DIRTY_LOG for getting dirty page logs (see 4.7 for an
+explanation of dirty page logging; the log format is not changed).
+
+If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is strongly
+recommended over KVM_GET_DIRTY_LOG because it does not need a buffer
+copy between kernel and user space.
+
+How to Use
+
+Before calling this, you have to remember the paired addresses of dirty
+bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR (see 4.39):
+dirty_bitmap (being used now by kernel) and dirty_bitmap_old (not being
+used now and containing the last log).
+
+After calling this, the roles of these bitmaps will change as follows:
+if the return value was 0, the kernel cleared dirty_bitmap_old and began to
+use it for the next logging, so you can use the cold dirty_bitmap to check
+the log since the last switch.  If the return value was 1, no pages were
+dirty and the bitmap switch was not done.  Note that remembering which
+bitmap is now active is your responsibility, so you must update your record
+whenever you get the return value 0.
+
+Note
+
+Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG.  Please use
+only one of them, preferably this one, to avoid extra confusion.  We do not
+guarantee under which conditions KVM_GET_DIRTY_LOG causes a bitmap switch.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 03503e6..b590b80 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1801,8 +1801,7 @@ void kvm_arch_exit(void)
 	kvm_vmm_info = NULL;
 }
 
-static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
-		struct kvm_dirty_log *log)
+static int kvm_ia64_sync_dirty_log(struct kvm *kvm, int slot)
 {
 	struct kvm_memory_slot *memslot;
 	int r, i;
@@ -1812,10 +1811,10 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 			offsetof(struct kvm_vm_data, kvm_mem_dirty_log));
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (slot >= KVM_MEMORY_SLOTS)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &kvm->memslots->memslots[slot];
 	r = -ENOENT;
 	if (!memslot->dirty_bitmap)
 		goto out;
@@ -1843,8 +1842,8 @@ out:
 	return r;
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
-		struct kvm_dirty_log *log)
+static int kvm_ia64_update_dirty_log(struct kvm *kvm, int slot,
+				     unsigned long __user *log_bitmap)
 {
 	int r;
 	unsigned long n;
@@ -1853,15 +1852,15 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 	spin_lock(&kvm->arch.dirty_log_lock);
 
-	r = kvm_ia64_sync_dirty_log(kvm, log);
+	r = kvm_ia64_sync_dirty_log(kvm, slot);
 	if (r)
 		goto out;
 
-	r = kvm_get_dirty_log(kvm, log);
+	r = kvm_update_dirty_log(kvm, slot, log_bitmap);
 	if (r)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &kvm->memslots->memslots[slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
@@ -1879,6 +1878,16 @@ out:
 	return r;
 }
 
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return kvm_ia64_update_dirty_log(kvm, log->slot, log->dirty_bitmap);
+}
+
+int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot)
+{
+	return kvm_ia64_update_dirty_log(kvm, slot, NULL);
+}
+
 int kvm_arch_hardware_setup(void)
 {
 	return 0;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2a31d2f..54b3a76 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1185,8 +1185,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
-				      struct kvm_dirty_log *log)
+static int kvmppc_update_dirty_log(struct kvm *kvm, int slot,
+				   unsigned long __user *log_bitmap)
 {
 	struct kvm_memory_slot *memslot;
 	struct kvm_vcpu *vcpu;
@@ -1196,11 +1196,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	mutex_lock(&kvm->slots_lock);
 
-	r = kvm_get_dirty_log(kvm, log);
+	r = kvm_update_dirty_log(kvm, slot, log_bitmap);
 	if (r)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &kvm->memslots->memslots[slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (memslot->is_dirty) {
 		ga = memslot->base_gfn << PAGE_SHIFT;
@@ -1223,6 +1223,16 @@ out:
 	return r;
 }
 
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return kvmppc_update_dirty_log(kvm, log->slot, log->dirty_bitmap);
+}
+
+int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot)
+{
+	return kvmppc_update_dirty_log(kvm, slot, NULL);
+}
+
 int kvmppc_core_check_processor_compat(void)
 {
 	return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 32a3d94..7a31ab1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2737,8 +2737,8 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
-				      struct kvm_dirty_log *log)
+static int kvm_x86_update_dirty_log(struct kvm *kvm, int slot,
+				    unsigned long __user *log_bitmap)
 {
 	int r;
 	struct kvm_memory_slot *memslot;
@@ -2747,10 +2747,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (slot >= KVM_MEMORY_SLOTS)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &kvm->memslots->memslots[slot];
 	r = -ENOENT;
 	if (!memslot->dirty_bitmap)
 		goto out;
@@ -2764,7 +2764,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		unsigned long __user *dirty_bitmap_old;
 
 		spin_lock(&kvm->mmu_lock);
-		kvm_mmu_slot_remove_write_access(kvm, log->slot);
+		kvm_mmu_slot_remove_write_access(kvm, slot);
 		spin_unlock(&kvm->mmu_lock);
 
 		dirty_bitmap = memslot->dirty_bitmap;
@@ -2779,22 +2779,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 			goto out;
 
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
-		slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
-		slots->memslots[log->slot].is_dirty = false;
+		slots->memslots[slot].dirty_bitmap = dirty_bitmap_old;
+		slots->memslots[slot].dirty_bitmap_old = dirty_bitmap;
+		slots->memslots[slot].is_dirty = false;
 
 		old_slots = kvm->memslots;
 		rcu_assign_pointer(kvm->memslots, slots);
 		synchronize_srcu_expedited(&kvm->srcu);
 		kfree(old_slots);
 
-		r = -EFAULT;
-		if (copy_in_user(log->dirty_bitmap, dirty_bitmap, n))
-			goto out;
+		if (log_bitmap) {
+			r = -EFAULT;
+			if (copy_in_user(log_bitmap, dirty_bitmap, n))
+				goto out;
+		}
 	} else {
-		r = -EFAULT;
-		if (clear_user(log->dirty_bitmap, n))
+		if (log_bitmap) {
+			r = -EFAULT;
+			if (clear_user(log_bitmap, n))
+				goto out;
+		} else {
+			/* Tell the user about no switch. */
+			r = 1;
 			goto out;
+		}
 	}
 
 	r = 0;
@@ -2803,6 +2811,16 @@ out:
 	return r;
 }
 
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return kvm_x86_update_dirty_log(kvm, log->slot, log->dirty_bitmap);
+}
+
+int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot)
+{
+	return kvm_x86_update_dirty_log(kvm, slot, NULL);
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 23ea022..47980c2 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -322,6 +322,14 @@ struct kvm_dirty_log {
 	};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
@@ -524,6 +532,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PPC_OSI 52
 #define KVM_CAP_PPC_UNSET_IRQ 53
 #define KVM_CAP_ENABLE_CAP 54
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -620,6 +629,8 @@ struct kvm_clock_data {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO,  0x49, struct kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG      _IO(KVMIO,   0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE              _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c95e2b7..a94277a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -337,9 +337,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 int kvm_dev_ioctl_check_extension(long ext);
 
-int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+int kvm_update_dirty_log(struct kvm *kvm, int slot,
+			 unsigned long __user *log_bitmap);
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				struct kvm_dirty_log *log);
+int kvm_vm_ioctl_switch_dirty_log(struct kvm *kvm, int slot);
 
 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 				   struct
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ddcf65a..300a0c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -818,26 +818,55 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
 
-int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+static int kvm_vm_ioctl_get_user_dirty_log_addr(struct kvm *kvm,
+						struct kvm_user_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+
+	if (log->slot >= KVM_MEMORY_SLOTS)
+		return -EINVAL;
+
+	memslot = &kvm->memslots->memslots[log->slot];
+	if (!memslot->dirty_bitmap)
+		return -ENOENT;
+
+	log->dirty_bitmap = (unsigned long)memslot->dirty_bitmap;
+	log->dirty_bitmap_old = (unsigned long)memslot->dirty_bitmap_old;
+	return 0;
+}
+
+int kvm_update_dirty_log(struct kvm *kvm, int slot,
+			 unsigned long __user *log_bitmap)
 {
 	struct kvm_memory_slot *memslot;
 	int r;
-	unsigned long n;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_MEMORY_SLOTS)
+	if (slot >= KVM_MEMORY_SLOTS)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &kvm->memslots->memslots[slot];
 	r = -ENOENT;
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = kvm_dirty_bitmap_bytes(memslot);
+	if (log_bitmap) {
+		unsigned long n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -EFAULT;
-	if (copy_in_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+		r = -EFAULT;
+		if (copy_in_user(log_bitmap, memslot->dirty_bitmap, n))
+			goto out;
+	} else if (memslot->is_dirty) {
+		unsigned long __user *dirty_bitmap;
+
+		dirty_bitmap = memslot->dirty_bitmap;
+		memslot->dirty_bitmap = memslot->dirty_bitmap_old;
+		memslot->dirty_bitmap_old = dirty_bitmap;
+	} else {
+		/* Tell user space that no switch occurred. */
+		r = 1;
 		goto out;
+	}
 
 	r = 0;
 out:
@@ -1647,6 +1676,25 @@ static long kvm_vm_ioctl(struct file *filp,
 			goto out;
 		break;
 	}
+	case KVM_GET_USER_DIRTY_LOG_ADDR: {
+		struct kvm_user_dirty_log log;
+
+		r = -EFAULT;
+		if (copy_from_user(&log, argp, sizeof log))
+			goto out;
+		r = kvm_vm_ioctl_get_user_dirty_log_addr(kvm, &log);
+		if (r)
+			goto out;
+		r = -EFAULT;
+		if (copy_to_user(argp, &log, sizeof log))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SWITCH_DIRTY_LOG: {
+		r = kvm_vm_ioctl_switch_dirty_log(kvm, arg);
+		break;
+	}
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 	case KVM_REGISTER_COALESCED_MMIO: {
 		struct kvm_coalesced_mmio_zone zone;
@@ -1823,6 +1871,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg)
 	case KVM_CAP_USER_MEMORY:
 	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
 	case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS:
+	case KVM_CAP_USER_DIRTY_LOG:
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
 	case KVM_CAP_SET_BOOT_CPU_ID:
 #endif
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (10 preceding siblings ...)
  2010-05-04 13:08 ` [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps Takuya Yoshikawa
@ 2010-05-04 13:11 ` Takuya Yoshikawa
  2010-05-10 12:06 ` [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Avi Kivity
  2010-05-11 15:55 ` Alexander Graf
  13 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 13:11 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

We use the new API for lightweight dirty log access when KVM supports it.

This conflicts with Marcelo's patches, so please take this as a sample patch.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 kvm/include/linux/kvm.h |   11 ++++++
 qemu-kvm.c              |   81 ++++++++++++++++++++++++++++++++++++++++++-----
 qemu-kvm.h              |    1 +
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..efd9538 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -317,6 +317,14 @@ struct kvm_dirty_log {
 	};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
@@ -499,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_PPC_SEGSTATE 43
 
 #define KVM_CAP_PCI_SEGMENT 47
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -595,6 +604,8 @@ struct kvm_clock_data {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO,  0x49, struct kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG      _IO(KVMIO,   0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE              _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..98777f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -143,6 +143,8 @@ struct slot_info {
     unsigned long userspace_addr;
     unsigned flags;
     int logging_count;
+    unsigned long *dirty_bitmap;
+    unsigned long *dirty_bitmap_old;
 };
 
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
@@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr,
     return 1;
 }
 
+static int kvm_user_dirty_log_works(void)
+{
+    return kvm_state->user_dirty_log;
+}
+
+static int kvm_set_user_dirty_log(int slot)
+{
+    int r;
+    struct kvm_user_dirty_log dlog;
+
+    dlog.slot = slot;
+    r = kvm_vm_ioctl(kvm_state, KVM_GET_USER_DIRTY_LOG_ADDR, &dlog);
+    if (r < 0) {
+        DPRINTF("KVM_GET_USER_DIRTY_LOG_ADDR failed: %s\n", strerror(-r));
+        return r;
+    }
+    slots[slot].dirty_bitmap = (unsigned long *)
+                               ((unsigned long)dlog.dirty_bitmap);
+    slots[slot].dirty_bitmap_old = (unsigned long *)
+                                   ((unsigned long)dlog.dirty_bitmap_old);
+    return r;
+}
+
 /*
  * dirty pages logging control
  */
@@ -265,8 +290,16 @@ static int kvm_dirty_pages_log_change(kvm_context_t kvm,
         DPRINTF("slot %d start %llx len %llx flags %x\n",
                 mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags);
         r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, &mem);
-        if (r < 0)
+        if (r < 0) {
             fprintf(stderr, "%s: %m\n", __FUNCTION__);
+            return r;
+        }
+    }
+    if (flags & KVM_MEM_LOG_DIRTY_PAGES) {
+        r = kvm_set_user_dirty_log(slot);
+    } else {
+        slots[slot].dirty_bitmap = NULL;
+        slots[slot].dirty_bitmap_old = NULL;
     }
     return r;
 }
@@ -589,7 +622,6 @@ int kvm_register_phys_mem(kvm_context_t kvm,
                           unsigned long phys_start, void *userspace_addr,
                           unsigned long len, int log)
 {
-
     struct kvm_userspace_memory_region memory = {
         .memory_size = len,
         .guest_phys_addr = phys_start,
@@ -608,6 +640,9 @@ int kvm_register_phys_mem(kvm_context_t kvm,
         fprintf(stderr, "create_userspace_phys_mem: %s\n", strerror(-r));
         return -1;
     }
+    if (log) {
+        r = kvm_set_user_dirty_log(memory.slot);
+    }
     register_slot(memory.slot, memory.guest_phys_addr, memory.memory_size,
                   memory.userspace_addr, memory.flags);
     return 0;
@@ -652,6 +687,8 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start,
         fprintf(stderr, "destroy_userspace_phys_mem: %s", strerror(-r));
         return;
     }
+    slots[memory.slot].dirty_bitmap = NULL;
+    slots[memory.slot].dirty_bitmap_old = NULL;
 
     free_slot(memory.slot);
 }
@@ -692,6 +729,21 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
     return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
+static int kvm_switch_map(int slot)
+{
+    int r;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SWITCH_DIRTY_LOG, slot);
+    if (r == 0) {
+        unsigned long *dirty_bitmap;
+
+        dirty_bitmap = slots[slot].dirty_bitmap;
+        slots[slot].dirty_bitmap = slots[slot].dirty_bitmap_old;
+        slots[slot].dirty_bitmap_old = dirty_bitmap;
+    }
+    return r;
+}
+
 int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
                               unsigned long len, void *opaque,
                               int (*cb)(unsigned long start,
@@ -706,14 +758,25 @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
     for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
         if ((slots[i].len && (uint64_t) slots[i].phys_addr >= phys_addr)
             && ((uint64_t) slots[i].phys_addr + slots[i].len <= end_addr)) {
-            buf = qemu_malloc(BITMAP_SIZE(slots[i].len));
-            r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf);
-            if (r) {
+            if (kvm_user_dirty_log_works()) {
+                r = kvm_switch_map(i);
+                if (r == 1) { /* slot was clean */
+                    continue;
+                } else if (r < 0) {
+                    return r;
+                }
+                r = cb(slots[i].phys_addr, slots[i].len,
+                       slots[i].dirty_bitmap_old, opaque);
+            } else {
+                buf = qemu_malloc(BITMAP_SIZE(slots[i].len));
+                r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf);
+                if (r) {
+                    qemu_free(buf);
+                    return r;
+                }
+                r = cb(slots[i].phys_addr, slots[i].len, buf, opaque);
                 qemu_free(buf);
-                return r;
             }
-            r = cb(slots[i].phys_addr, slots[i].len, buf, opaque);
-            qemu_free(buf);
             if (r)
                 return r;
         }
@@ -2097,6 +2160,8 @@ static int kvm_create_context(void)
 #ifdef TARGET_I386
     destroy_region_works = kvm_destroy_memory_region_works(kvm_context);
 #endif
+    kvm_state->user_dirty_log
+        = kvm_check_extension(kvm_state, KVM_CAP_USER_DIRTY_LOG);
 
     r = kvm_arch_init_irq_routing();
     if (r < 0) {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index ba3808a..aea89df 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -973,6 +973,7 @@ struct KVMState {
 #endif
     int irqchip_in_kernel;
     int pit_in_kernel;
+    int user_dirty_log;
 
     struct kvm_context kvm_context;
 };
-- 
1.7.0.4


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-04 13:05 ` [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro Takuya Yoshikawa
@ 2010-05-04 15:03   ` Arnd Bergmann
  2010-05-04 16:08     ` Avi Kivity
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2010-05-04 15:03 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
> 
> Although we can use the *_le_bit() helpers to manipulate bitmaps arranged
> in little-endian order, having the le bit offset calculation as a separate
> macro gives us more freedom.
> 
> For example, KVM has le arranged dirty bitmaps for VGA and live migration,
> and they are used in user space too. To avoid bitmap copies between kernel
> and user space, we want to update the bitmaps in user space directly.
> To achieve this, the le bit offset combined with the *_user() functions
> helps us a lot.
> 
> So let us factor out the le bit offset calculation by defining it as a new
> macro: generic_le_bit_offset().

Does this work correctly if your user space is 32 bits (i.e. unsigned long
is a different size in user space and in the kernel) on both big- and
little-endian systems?

I'm not sure about all the details, but I think you cannot in general share
bitmaps between user space and kernel because of this.

	Arnd


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-04 15:03   ` Arnd Bergmann
@ 2010-05-04 16:08     ` Avi Kivity
  2010-05-05  2:59       ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Avi Kivity @ 2010-05-04 16:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, hpa, tglx, Takuya Yoshikawa

On 05/04/2010 06:03 PM, Arnd Bergmann wrote:
> On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
>    
>> Although we can use the *_le_bit() helpers to manipulate bitmaps arranged
>> in little-endian order, having the le bit offset calculation as a separate
>> macro gives us more freedom.
>>
>> For example, KVM has le arranged dirty bitmaps for VGA and live migration,
>> and they are used in user space too. To avoid bitmap copies between kernel
>> and user space, we want to update the bitmaps in user space directly.
>> To achieve this, the le bit offset combined with the *_user() functions
>> helps us a lot.
>>
>> So let us factor out the le bit offset calculation by defining it as a new
>> macro: generic_le_bit_offset().
>>      
> Does this work correctly if your user space is 32 bits (i.e. unsigned long
> is a different size in user space and in the kernel) on both big- and
> little-endian systems?
>
> I'm not sure about all the details, but I think you cannot in general share
> bitmaps between user space and kernel because of this.
>    

That's why the bitmaps are defined as little endian u64 aligned, even on 
big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
and u64 alignment ensures we can use long-sized bitops on mixed size 
systems.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-04 16:08     ` Avi Kivity
@ 2010-05-05  2:59       ` Takuya Yoshikawa
  2010-05-06 13:38         ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-05  2:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-arch, x86, kvm, Arnd Bergmann, kvm-ia64, fernando,
	mtosatti, agraf, kvm-ppc, linux-kernel, yoshikawa.takuya,
	linuxppc-dev, mingo, paulus, hpa, tglx

On Tue, 04 May 2010 19:08:23 +0300
Avi Kivity <avi@redhat.com> wrote:

> On 05/04/2010 06:03 PM, Arnd Bergmann wrote:
> > On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
...
> >> So let us use the le bit offset calculation part by defining it as a new
> >> macro: generic_le_bit_offset() .
> >>      
> > Does this work correctly if your user space is 32 bits (i.e. unsigned long
> > is a different size in user space and in the kernel) on both big- and
> > little-endian systems?
> >
> > I'm not sure about all the details, but I think you cannot in general share
> > bitmaps between user space and kernel because of this.
> >    
> 
> That's why the bitmaps are defined as little endian u64 aligned, even on 
> big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
> and u64 alignment ensures we can use long-sized bitops on mixed size 
> systems.

There was a suggestion to propose set_le_bit_user()-style macros.
But I thought those have the constraint you two explained and seemed to be
a little too specific to one area, like KVM.

So I decided to propose just the offset calculation macro.

  Thanks, Takuya


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-05  2:59       ` Takuya Yoshikawa
@ 2010-05-06 13:38         ` Arnd Bergmann
  2010-05-10 11:46           ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2010-05-06 13:38 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, Avi Kivity, hpa, tglx

On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
> > That's why the bitmaps are defined as little endian u64 aligned, even on 
> > big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
> > and u64 alignment ensures we can use long-sized bitops on mixed size 
> > systems.

Ok, I see.

> There was a suggestion to propose set_le_bit_user()-style macros.
> But I thought those have the constraint you two explained and seemed to be
> a little too specific to one area, like KVM.
> 
> So I decided to propose just the offset calculation macro.

I'm not sure I understand how this macro is going to be used though.
If you are just using this in kernel space, that's fine, please go for
it.

However, if the intention is to use the same macro in user space, putting
it into asm-generic/bitops/* is not going to help, because those headers
are not available in user space, and I wouldn't want to change that.

The definition of the macro is not part of the ABI, so just duplicate
it in KVM if you need it there.

	Arnd


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-06 13:38         ` Arnd Bergmann
@ 2010-05-10 11:46           ` Takuya Yoshikawa
  2010-05-10 12:01             ` Avi Kivity
  2010-05-10 12:01             ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-10 11:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, Avi Kivity,
	hpa, tglx, Takuya Yoshikawa

(2010/05/06 22:38), Arnd Bergmann wrote:
> On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
>>> That's why the bitmaps are defined as little endian u64 aligned, even on
>>> big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic,
>>> and u64 alignment ensures we can use long-sized bitops on mixed size
>>> systems.
>
> Ok, I see.
>
>> There was a suggestion to propose set_le_bit_user()-style macros.
>> But I thought those have the constraint you two explained and seemed to be
>> a little too specific to one area, like KVM.
>>
>> So I decided to propose just the offset calculation macro.
>
> I'm not sure I understand how this macro is going to be used though.
> If you are just using this in kernel space, that's fine, please go for
> it.

Yes, I'm only using it in kernel space: qemu has its own endian-related helpers.

So if you allow us to place this macro in asm-generic/bitops/*, it will help us.

Avi, what do you think? Do you want to place it in kvm.h ?


>
> However, if the intention is to use the same macro in user space, putting
> it into asm-generic/bitops/* is not going to help, because those headers
> are not available in user space, and I wouldn't want to change that.
>
> The definition of the macro is not part of the ABI, so just duplicate
> it in KVM if you need it there.
>
> 	Arnd


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-10 11:46           ` Takuya Yoshikawa
@ 2010-05-10 12:01             ` Avi Kivity
  2010-05-10 12:01             ` Arnd Bergmann
  1 sibling, 0 replies; 36+ messages in thread
From: Avi Kivity @ 2010-05-10 12:01 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, Arnd Bergmann, kvm, kvm-ia64, fernando,
	mtosatti, agraf, kvm-ppc, linux-kernel, linuxppc-dev, mingo,
	paulus, hpa, tglx, Takuya Yoshikawa

On 05/10/2010 02:46 PM, Takuya Yoshikawa wrote:
> (2010/05/06 22:38), Arnd Bergmann wrote:
>> On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
>>>> That's why the bitmaps are defined as little endian u64 aligned, 
>>>> even on
>>>> big endian 32-bit systems.  Little endian bitmaps are wordsize 
>>>> agnostic,
>>>> and u64 alignment ensures we can use long-sized bitops on mixed size
>>>> systems.
>>
>> Ok, I see.
>>
>>> There was a suggestion to propose set_le_bit_user()-style macros.
>>> But I thought those have the constraint you two explained and seemed to
>>> be a little too specific to one area, like KVM.
>>>
>>> So I decided to propose just the offset calculation macro.
>>
>> I'm not sure I understand how this macro is going to be used though.
>> If you are just using this in kernel space, that's fine, please go for
>> it.
>
> Yes, I'm only using it in kernel space: qemu has its own endian-related
> helpers.
>
> So if you allow us to place this macro in asm-generic/bitops/* it will 
> help us.
>
> Avi, what do you think? Do you want to place it in kvm.h ?

I really prefer anything that is generic to be outside kvm, even if kvm 
is the only user.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-10 11:46           ` Takuya Yoshikawa
  2010-05-10 12:01             ` Avi Kivity
@ 2010-05-10 12:01             ` Arnd Bergmann
  2010-05-10 12:09               ` Takuya Yoshikawa
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2010-05-10 12:01 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, Avi Kivity,
	hpa, tglx, Takuya Yoshikawa

On Monday 10 May 2010, Takuya Yoshikawa wrote:
> (2010/05/06 22:38), Arnd Bergmann wrote:
> > On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
> >> There was a suggestion to propose set_le_bit_user()-style macros.
> >> But I thought those have the constraint you two explained and seemed to be
> >> a little too specific to one area, like KVM.
> >>
> >> So I decided to propose just the offset calculation macro.
> >
> > I'm not sure I understand how this macro is going to be used though.
> > If you are just using this in kernel space, that's fine, please go for
> > it.
> 
> Yes, I'm only using it in kernel space: qemu has its own endian-related helpers.
> 
> So if you allow us to place this macro in asm-generic/bitops/*, it will help us.

No problem at all then. Thanks for the explanation.

Acked-by: Arnd Bergmann <arnd@arndb.de>


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (11 preceding siblings ...)
  2010-05-04 13:11 ` [RFC][PATCH 12/12 sample] qemu-kvm: use " Takuya Yoshikawa
@ 2010-05-10 12:06 ` Avi Kivity
  2010-05-10 12:26   ` Takuya Yoshikawa
  2010-05-11 15:55 ` Alexander Graf
  13 siblings, 1 reply; 36+ messages in thread
From: Avi Kivity @ 2010-05-10 12:06 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, hpa, tglx

On 05/04/2010 03:56 PM, Takuya Yoshikawa wrote:
> [Performance test]
>
> We measured the TSC cycles needed by the ioctl()s for getting dirty logs
> in the kernel.
>
> Test environment
>
>    AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory
>
>
> 1. GUI test (running Ubuntu guest in graphical mode)
>
>    sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...
>
> We show a relatively stable part to compare how much time is needed
> for the basic parts of the dirty log ioctls.
>
>                             get.org   get.opt  switch.opt
>
> slots[7].len=32768          278379     66398     64024
> slots[8].len=32768          181246       270       160
> slots[7].len=32768          263961     64673     64494
> slots[8].len=32768          181655       265       160
> slots[7].len=32768          263736     64701     64610
> slots[8].len=32768          182785       267       160
> slots[7].len=32768          260925     65360     65042
> slots[8].len=32768          182579       264       160
> slots[7].len=32768          267823     65915     65682
> slots[8].len=32768          186350       271       160
>
> At a glance, we can see that our optimization improved performance
> significantly compared to the original get dirty log ioctl. This is true
> for both get.opt and switch.opt, and it has a really big impact for
> personal KVM users who run KVM in GUI mode on their PCs.
>
> Next, we notice that switch.opt improved things by a hundred nanoseconds
> or so for these slots. Although this may sound like a tiny improvement,
> it is noticeable as a difference in GUI responses such as mouse reactions.
>    

100 ns... this is a bit on the low side (and if you can measure it 
interactively you have much better reflexes than I).

> To feel the difference, please try GUI on your PC with our patch series!
>    

No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt is 
problematic.  Have you tried profiling to see where the time is spent? 
(Well, I can guess: clearing the write access from the sptes.)

>
> 2. Live-migration test (4GB guest, write loop with 1GB buf)
>
> We also did a live-migration test.
>
>                             get.org   get.opt  switch.opt
>
> slots[0].len=655360         797383    261144    222181
> slots[1].len=3757047808    2186721   1965244   1842824
> slots[2].len=637534208     1433562   1012723   1031213
> slots[3].len=131072         216858       331       331
> slots[4].len=131072         121635       225       164
> slots[5].len=131072         120863       356       164
> slots[6].len=16777216       121746      1133       156
> slots[7].len=32768          120415       230       278
> slots[8].len=32768          120368       216       149
> slots[0].len=655360         806497    194710    223582
> slots[1].len=3757047808    2142922   1878025   1895369
> slots[2].len=637534208     1386512   1021309   1000345
> slots[3].len=131072         221118       459       296
> slots[4].len=131072         121516       272       166
> slots[5].len=131072         122652       244       173
> slots[6].len=16777216       123226     99185       149
> slots[7].len=32768          121803       457       505
> slots[8].len=32768          121586       216       155
> slots[0].len=655360         766113    211317    213179
> slots[1].len=3757047808    2155662   1974790   1842361
> slots[2].len=637534208     1481411   1020004   1031352
> slots[3].len=131072         223100       351       295
> slots[4].len=131072         122982       436       164
> slots[5].len=131072         122100       300       503
> slots[6].len=16777216       123653       779       151
> slots[7].len=32768          122617       284       157
> slots[8].len=32768          122737       253       149
>
> For slots other than 0,1,2 we can see the similar improvement.
>
> Considering that switch.opt does not depend on the bitmap length except
> for kvm_mmu_slot_remove_write_access(), that function is the likely cause
> of the usec-to-msec time consumption: there might also be some context
> switches.
>
> But note that this was done with a workload which dirtied the memory
> endlessly during the live migration.
>
> In a usual workload, the number of dirty pages varies a lot from iteration
> to iteration, and we should gain really a lot in relatively clean cases.
>    

Can you post such a test, for an idle large guest?

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
  2010-05-10 12:01             ` Arnd Bergmann
@ 2010-05-10 12:09               ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-10 12:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, x86, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, Avi Kivity,
	hpa, tglx, Takuya Yoshikawa


>>
>> Yes, I'm only using it in kernel space: qemu has its own endian-related helpers.
>>
>> So if you allow us to place this macro in asm-generic/bitops/*, it will help us.
>
> No problem at all then. Thanks for the explanation.
>
> Acked-by: Arnd Bergmann<arnd@arndb.de>

Thank you both. I will add your Acked-by from now on!

   Takuya


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-10 12:06 ` [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Avi Kivity
@ 2010-05-10 12:26   ` Takuya Yoshikawa
  2010-05-11 10:11     ` Takuya Yoshikawa
  2010-05-13 11:47     ` Avi Kivity
  0 siblings, 2 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-10 12:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, hpa, tglx,
	Takuya Yoshikawa


>>
>>                             get.org   get.opt  switch.opt
>>
>> slots[7].len=32768          278379     66398     64024
>> slots[8].len=32768          181246       270       160
>> slots[7].len=32768          263961     64673     64494
>> slots[8].len=32768          181655       265       160
>> slots[7].len=32768          263736     64701     64610
>> slots[8].len=32768          182785       267       160
>> slots[7].len=32768          260925     65360     65042
>> slots[8].len=32768          182579       264       160
>> slots[7].len=32768          267823     65915     65682
>> slots[8].len=32768          186350       271       160
>>
>> At a glance, we can see that our optimization improved performance
>> significantly compared to the original get dirty log ioctl. This is true
>> for both get.opt and switch.opt, and it has a really big impact for
>> personal KVM users who run KVM in GUI mode on their PCs.
>>
>> Next, we notice that switch.opt improved things by a hundred nanoseconds
>> or so for these slots. Although this may sound like a tiny improvement,
>> it is noticeable as a difference in GUI responses such as mouse reactions.
>
> 100 ns... this is a bit on the low side (and if you can measure it
> interactively you have much better reflexes than I).
>
>> To feel the difference, please try GUI on your PC with our patch series!
>
> No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt is
> problematic.  Have you tried profiling to see where the time is spent?
> (Well, I can guess: clearing the write access from the sptes.)

Sorry, but no, and I agree with your guess.
Anyway, I want to do some profiling to confirm it.


BTW, if we only think about the time improvement, the optimized get
(get.opt) may be enough at this stage.

But if we consider future expansion, like using user-allocated bitmaps, the
new APIs introduced for switch.opt won't be wasted, I think, because we
need a structure to get and export the bitmap addresses.


>>
>> In usual workload, the number of dirty pages varies a lot for each
>> iteration
>> and we should gain really a lot for relatively clean cases.
>
> Can you post such a test, for an idle large guest?

OK, I'll do that!

>


* Re: [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
  2010-05-04 13:07 ` [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space Takuya Yoshikawa
@ 2010-05-11  3:28   ` Marcelo Tosatti
  2010-05-12  6:27     ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Marcelo Tosatti @ 2010-05-11  3:28 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo, paulus, avi,
	hpa, tglx

On Tue, May 04, 2010 at 10:07:02PM +0900, Takuya Yoshikawa wrote:
> We move dirty bitmaps to user space.
> 
>  - Allocation and destruction: we use do_mmap() and do_munmap().
>    The new bitmap space is twice longer than the original one and we
>    use the additional space for double buffering: this makes it
>    possible to update the active bitmap while letting the user space
>    read the other one safely. For x86, we can also remove the vmalloc()
>    in kvm_vm_ioctl_get_dirty_log().
> 
>  - Bitmap manipulations: we replace all functions which access dirty
>    bitmaps with *_user() functions.
> 
>  - For ia64: moving the dirty bitmaps of memory slots does not affect
>    ia64 much because it uses a different place to store dirty logs
>    than the dirty bitmaps of memory slots: all we have to change
>    are the sync and get of the dirty log, so we don't need
>    set_bit_user()-like functions for ia64.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> CC: Avi Kivity <avi@redhat.com>
> CC: Alexander Graf <agraf@suse.de>
> ---
>  arch/ia64/kvm/kvm-ia64.c  |   15 +++++++++-
>  arch/powerpc/kvm/book3s.c |    5 +++-
>  arch/x86/kvm/x86.c        |   25 ++++++++----------
>  include/linux/kvm_host.h  |    3 +-
>  virt/kvm/kvm_main.c       |   62 +++++++++++++++++++++++++++++++++++++-------
>  5 files changed, 82 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 17fd65c..03503e6 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
>  	n = kvm_dirty_bitmap_bytes(memslot);
>  	base = memslot->base_gfn / BITS_PER_LONG;
>  
> +	r = -EFAULT;
> +	if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
> +		goto out;
> +
>  	for (i = 0; i < n/sizeof(long); ++i) {
>  		if (dirty_bitmap[base + i])
>  			memslot->is_dirty = true;
>  
> -		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
> +		if (__put_user(dirty_bitmap[base + i],
> +			       &memslot->dirty_bitmap[i])) {
> +			r = -EFAULT;
> +			goto out;
> +		}
>  		dirty_bitmap[base + i] = 0;
>  	}
>  	r = 0;
> @@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  	if (memslot->is_dirty) {
>  		kvm_flush_remote_tlbs(kvm);
>  		n = kvm_dirty_bitmap_bytes(memslot);
> -		memset(memslot->dirty_bitmap, 0, n);
> +		if (clear_user(memslot->dirty_bitmap, n)) {
> +			r = -EFAULT;
> +			goto out;
> +		}
>  		memslot->is_dirty = false;
>  	}
>  	r = 0;
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 4b074f1..2a31d2f 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
>  
>  		n = kvm_dirty_bitmap_bytes(memslot);
> -		memset(memslot->dirty_bitmap, 0, n);
> +		if (clear_user(memslot->dirty_bitmap, n)) {
> +			r = -EFAULT;
> +			goto out;
> +		}
>  		memslot->is_dirty = false;
>  	}
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 023c7f8..32a3d94 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  	/* If nothing is dirty, don't bother messing with page tables. */
>  	if (memslot->is_dirty) {
>  		struct kvm_memslots *slots, *old_slots;
> -		unsigned long *dirty_bitmap;
> +		unsigned long __user *dirty_bitmap;
> +		unsigned long __user *dirty_bitmap_old;
>  
>  		spin_lock(&kvm->mmu_lock);
>  		kvm_mmu_slot_remove_write_access(kvm, log->slot);
>  		spin_unlock(&kvm->mmu_lock);
>  
> -		r = -ENOMEM;
> -		dirty_bitmap = vmalloc(n);
> -		if (!dirty_bitmap)
> +		dirty_bitmap = memslot->dirty_bitmap;
> +		dirty_bitmap_old = memslot->dirty_bitmap_old;
> +		r = -EFAULT;
> +		if (clear_user(dirty_bitmap_old, n))
>  			goto out;
> -		memset(dirty_bitmap, 0, n);
>  
>  		r = -ENOMEM;
>  		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> -		if (!slots) {
> -			vfree(dirty_bitmap);
> +		if (!slots)
>  			goto out;
> -		}
> +
>  		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> -		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
> +		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
> +		slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
>  		slots->memslots[log->slot].is_dirty = false;
>  
>  		old_slots = kvm->memslots;
>  		rcu_assign_pointer(kvm->memslots, slots);
>  		synchronize_srcu_expedited(&kvm->srcu);
> -		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
>  		kfree(old_slots);
>  
>  		r = -EFAULT;
> -		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
> -			vfree(dirty_bitmap);
> +		if (copy_in_user(log->dirty_bitmap, dirty_bitmap, n))
>  			goto out;
> -		}
> -		vfree(dirty_bitmap);
>  	} else {
>  		r = -EFAULT;
>  		if (clear_user(log->dirty_bitmap, n))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0aa6ecb..c95e2b7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -116,7 +116,8 @@ struct kvm_memory_slot {
>  	unsigned long npages;
>  	unsigned long flags;
>  	unsigned long *rmap;
> -	unsigned long *dirty_bitmap;
> +	unsigned long __user *dirty_bitmap;
> +	unsigned long __user *dirty_bitmap_old;
>  	bool is_dirty;
>  	struct {
>  		unsigned long rmap_pde;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3e3acad..ddcf65a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -437,8 +437,20 @@ out_err_nodisable:
>  
>  static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
>  {
> -	vfree(memslot->dirty_bitmap);
> +	unsigned long user_addr;
> +	unsigned long n = kvm_dirty_bitmap_bytes(memslot);
> +
> +	if (!memslot->dirty_bitmap)
> +		return;
> +
> +	user_addr = min((unsigned long)memslot->dirty_bitmap,
> +			(unsigned long)memslot->dirty_bitmap_old);
> +	down_write(&current->mm->mmap_sem);
> +	do_munmap(current->mm, user_addr, 2 * n);
> +	up_write(&current->mm->mmap_sem);
> +
>  	memslot->dirty_bitmap = NULL;
> +	memslot->dirty_bitmap_old = NULL;
>  }
>  
>  /*
> @@ -472,8 +484,12 @@ void kvm_free_physmem(struct kvm *kvm)
>  	int i;
>  	struct kvm_memslots *slots = kvm->memslots;
>  
> -	for (i = 0; i < slots->nmemslots; ++i)
> +	for (i = 0; i < slots->nmemslots; ++i) {
> +		/* VM process will exit: we don't unmap by ourselves. */
> +		slots->memslots[i].dirty_bitmap = NULL;
> +		slots->memslots[i].dirty_bitmap_old = NULL;
>  		kvm_free_physmem_slot(&slots->memslots[i], NULL);
> +	}
>  
>  	kfree(kvm->memslots);
>  }
> @@ -527,14 +543,35 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>  
>  static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
>  {
> -	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
> +	int err;
> +	unsigned long user_addr;
> +	unsigned long n = kvm_dirty_bitmap_bytes(memslot);
>  
> -	memslot->dirty_bitmap = vmalloc(dirty_bytes);
> -	if (!memslot->dirty_bitmap)
> -		return -ENOMEM;
> +	down_write(&current->mm->mmap_sem);
> +	user_addr = do_mmap(NULL, 0, 2 * n,
> +			    PROT_READ | PROT_WRITE,
> +			    MAP_PRIVATE | MAP_ANONYMOUS, 0);
> +	up_write(&current->mm->mmap_sem);
> +
> +	if (IS_ERR((void *)user_addr)) {
> +		err = PTR_ERR((void *)user_addr);
> +		goto out;
> +	}
> +
> +	memslot->dirty_bitmap = (unsigned long __user *)user_addr;
> +	memslot->dirty_bitmap_old = (unsigned long __user *)(user_addr + n);
> +	if (clear_user(memslot->dirty_bitmap, 2 * n)) {
> +		err = -EFAULT;
> +		goto out_unmap;
> +	}
>  
> -	memset(memslot->dirty_bitmap, 0, dirty_bytes);
>  	return 0;
> +out_unmap:
> +	down_write(&current->mm->mmap_sem);
> +	do_munmap(current->mm, user_addr, 2 * n);
> +	up_write(&current->mm->mmap_sem);
> +out:
> +	return err;
>  }
>  
>  /*
> @@ -799,7 +836,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  	n = kvm_dirty_bitmap_bytes(memslot);
>  
>  	r = -EFAULT;
> -	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
> +	if (copy_in_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
>  		goto out;
>  
>  	r = 0;
> @@ -1195,11 +1232,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>  	gfn = unalias_gfn(kvm, gfn);
>  	memslot = gfn_to_memslot_unaliased(kvm, gfn);
>  	if (memslot && memslot->dirty_bitmap) {
> -		unsigned long rel_gfn = gfn - memslot->base_gfn;
> +		int nr = generic_le_bit_offset(gfn - memslot->base_gfn);
>  
> -		generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
> +		if (kvm_set_bit_user(nr, memslot->dirty_bitmap))
> +			goto out_fault;

mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
Must find a way to move it outside of the spinlock section.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
  2010-05-04 13:08 ` [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps Takuya Yoshikawa
@ 2010-05-11  3:43   ` Marcelo Tosatti
  2010-05-11  5:53     ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Marcelo Tosatti @ 2010-05-11  3:43 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, yoshikawa.takuya, linuxppc-dev, mingo, paulus, avi,
	hpa, tglx

On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:
> Now that dirty bitmaps are accessible from user space, we export their
> addresses to achieve zero-copy dirty log checking:
> 
>   KVM_GET_USER_DIRTY_LOG_ADDR
> 
> We also need an API for triggering a dirty bitmap switch to take full
> advantage of the double buffered bitmaps.
> 
>   KVM_SWITCH_DIRTY_LOG
> 
> See the documentation in this patch for precise explanations.
> 
> About the performance improvement: the most important feature of the switch
> API is its lightness. In our test, this showed up as improved responsiveness
> of GUI operations.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> CC: Avi Kivity <avi@redhat.com>
> CC: Alexander Graf <agraf@suse.de>
> ---
>  Documentation/kvm/api.txt |   87 +++++++++++++++++++++++++++++++++++++++++++++
>  arch/ia64/kvm/kvm-ia64.c  |   27 +++++++++-----
>  arch/powerpc/kvm/book3s.c |   18 +++++++--
>  arch/x86/kvm/x86.c        |   44 ++++++++++++++++-------
>  include/linux/kvm.h       |   11 ++++++
>  include/linux/kvm_host.h  |    4 ++-
>  virt/kvm/kvm_main.c       |   63 +++++++++++++++++++++++++++++----
>  7 files changed, 220 insertions(+), 34 deletions(-)
> 
> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> index a237518..c106c83 100644
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -892,6 +892,93 @@ arguments.
>  This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
>  irqchip, the multiprocessing state must be maintained by userspace.
>  
> +4.39 KVM_GET_USER_DIRTY_LOG_ADDR
> +
> +Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below)
> +Architectures: all
> +Type: vm ioctl
> +Parameters: struct kvm_user_dirty_log (in/out)
> +Returns: 0 on success, -1 on error
> +
> +This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead
> +of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
> +
> +A bit about KVM_CAP_USER_DIRTY_LOG
> +
> +Before this ioctl was introduced, dirty bitmaps for dirty page logging were
> +allocated in the kernel's memory space.  But we have now moved them to user
> +space to get more flexiblity and performance.  To achive this move without
> +breaking the compatibility, we will split KVM_CAP_USER_DIRTY_LOG capability
> +into a few generations which can be identified by its check extension
> +return values.
> +
> +This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the
> +KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations.
> +
> +What you get
> +
> +By using this, you can get paired bitmap addresses which are accessible from
> +user space.  See the explanation in 4.40 for the roles of these two bitmaps.
> +
> +How to Get
> +
> +Before calling this, you have to set the slot member of kvm_user_dirty_log
> +to indicate the target memory slot.
> +
> +struct kvm_user_dirty_log {
> +	__u32 slot;
> +	__u32 flags;
> +	__u64 dirty_bitmap;
> +	__u64 dirty_bitmap_old;
> +};
> +
> +The addresses will be set in the paired members: dirty_bitmap and _old.

Why not pass the bitmap address to the kernel, instead of having the
kernel allocate it. Because apparently you plan to do that in a new
generation anyway?

Also, why does the kernel need to know about different bitmaps?

One alternative would be:

KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
bitmap was clean, it returns 0, no switch performed. If the active
bitmap was dirty, the kernel switches to the new bitmap and returns 1.

> And the responsibility of cleaning the new bitmap could also be left
for userspace.


* Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
  2010-05-11  3:43   ` Marcelo Tosatti
@ 2010-05-11  5:53     ` Takuya Yoshikawa
  2010-05-11 14:07       ` Marcelo Tosatti
  0 siblings, 1 reply; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-11  5:53 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, linuxppc-dev, mingo, paulus, avi, hpa, tglx,
	Takuya Yoshikawa

(2010/05/11 12:43), Marcelo Tosatti wrote:
> On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:
>> +How to Get
>> +
>> +Before calling this, you have to set the slot member of kvm_user_dirty_log
>> +to indicate the target memory slot.
>> +
>> +struct kvm_user_dirty_log {
>> +	__u32 slot;
>> +	__u32 flags;
>> +	__u64 dirty_bitmap;
>> +	__u64 dirty_bitmap_old;
>> +};
>> +
>> +The addresses will be set in the paired members: dirty_bitmap and _old.
>
> Why not pass the bitmap address to the kernel, instead of having the
> kernel allocate it. Because apparently you plan to do that in a new
> generation anyway?

Yes, we want to make qemu allocate and free bitmaps in the future.
But currently, those are strictly tied to memory slot registration, and
changing that in one patch set is really difficult.

Anyway, we need a kernel-side allocation mechanism to keep the current
GET_DIRTY_LOG API. I don't mind not exporting kernel-allocated bitmaps
in this patch set and later introducing a bitmap registration mechanism
in another patch set.

As this RFC suggests, the kernel-side double buffering infrastructure is
designed to be as general as possible, so adding a new API like SWITCH can
be done naturally.

>
> Also, why does the kernel need to know about different bitmaps?

Because we need to support the current GET API. We don't have any way to get
a new bitmap in the case of GET, and we don't want to do_mmap() every time
for GET.

>
> One alternative would be:
>
> KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
> bitmap was clean, it returns 0, no switch performed. If the active
> bitmap was dirty, the kernel switches to the new bitmap and returns 1.
>
> And the responsibility of cleaning the new bitmap could also be left
> for userspace.
>

That is a beautiful approach, but we can do that only if we give up the
GET API.


I will follow your and Avi's advice about that kind of maintenance policy!
What do you think?


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-10 12:26   ` Takuya Yoshikawa
@ 2010-05-11 10:11     ` Takuya Yoshikawa
  2010-05-13 11:47     ` Avi Kivity
  1 sibling, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-11 10:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, hpa, tglx,
	Takuya Yoshikawa


>>>
>>> In a usual workload, the number of dirty pages varies a lot for each
>>> iteration
>>> and we should gain a lot in relatively clean cases.
>>
>> Can you post such a test, for an idle large guest?
>
> OK, I'll do!


Result of "low workload test" (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*****************************************
     get.org     get.opt    switch.opt

     1060875     310292     190335
     1076754     301295     188600
      655504     318284     196029
      529769     301471        325
      694796      70216     221172
      651868     353073     196184
      543339     312865     213236
     1061938      72785     203090
      689527     323901     249519
      621364     323881        473
     1063671      70703     192958
      915903     336318     174008
     1046462     332384        782
     1037942      72783     190655
      680122     318305     243544
      688156     314935     193526
      558658     265934     190550
      652454     372135     196270
      660140      68613        352
     1101947     378642     186575
         ...        ...        ...
*****************************************

As expected, the difference shows up more clearly here.

In this case, switch.opt reduced the time by about 1/3 (0.1 msec) compared
to get.opt for each iteration.

And the cleaner the slot, the bigger the ratio.






* Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
  2010-05-11  5:53     ` Takuya Yoshikawa
@ 2010-05-11 14:07       ` Marcelo Tosatti
  2010-05-12  6:03         ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Marcelo Tosatti @ 2010-05-11 14:07 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, linuxppc-dev, mingo, paulus, avi, hpa, tglx,
	Takuya Yoshikawa

On Tue, May 11, 2010 at 02:53:54PM +0900, Takuya Yoshikawa wrote:
> (2010/05/11 12:43), Marcelo Tosatti wrote:
> >On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:
> >>+How to Get
> >>+
> >>+Before calling this, you have to set the slot member of kvm_user_dirty_log
> >>+to indicate the target memory slot.
> >>+
> >>+struct kvm_user_dirty_log {
> >>+	__u32 slot;
> >>+	__u32 flags;
> >>+	__u64 dirty_bitmap;
> >>+	__u64 dirty_bitmap_old;
> >>+};
> >>+
> >>+The addresses will be set in the paired members: dirty_bitmap and _old.
> >
> >Why not pass the bitmap address to the kernel, instead of having the
> >kernel allocate it. Because apparently you plan to do that in a new
> >generation anyway?
> 
> Yes, we want to make qemu allocate and free bitmaps in the future.
> But currently, those are strictly tied with memory slot registration and
> changing it in one patch set is really difficult.
> 
> Anyway, we need kernel side allocation mechanism to keep the current
> GET_DIRTY_LOG api. I don't mind not exporting kernel allocated bitmaps
> in this patch set and later introducing a bitmap registration mechanism
> in another patch set.
> 
> As this RFC is suggesting, kernel side double buffering infrastructure is
> designed as general as possible and adding a new API like SWITCH can be done
> naturally.
> 
> >
> >Also, why does the kernel need to know about different bitmaps?
> 
> Because we need to support current GET API. We don't have any way to get
> a new bitmap in the case of GET and we don't want to do_mmap() every time
> for GET.
> 
> >
> >One alternative would be:
> >
> >KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
> >bitmap was clean, it returns 0, no switch performed. If the active
> >bitmap was dirty, the kernel switches to the new bitmap and returns 1.
> >
> >And the responsibility of cleaning the new bitmap could also be left
> >for userspace.
> >
> 
> That is a beautiful approach but we can do that only when we give up using
> GET api.
> 
> 
> I follow you and Avi's advice about that kind of maintenance policy!
> What do you think?

If you introduce a switch ioctl that frees the bitmap vmalloc'ed by the
current set_memory_region (if it's not freed already), after pointing the
memslot to the user-supplied one, it should be fine?


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
                   ` (12 preceding siblings ...)
  2010-05-10 12:06 ` [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Avi Kivity
@ 2010-05-11 15:55 ` Alexander Graf
  2010-05-12  9:19   ` Takuya Yoshikawa
  13 siblings, 1 reply; 36+ messages in thread
From: Alexander Graf @ 2010-05-11 15:55 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti,
	linux-kernel, kvm-ppc, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

Takuya Yoshikawa wrote:
> Hi, sorry for sending from my personal account.
> The following series are all from me:
>
>   From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
>   The 3rd version of "moving dirty bitmaps to user space".
>
> From this version, we add x86 and ppc and asm-generic people to CC lists.
>
>
> [To KVM people]
>
> Sorry for being late to reply your comments.
>
> Avi,
>  - I've wrote an answer to your question in patch 5/12: drivers/vhost/vhost.c .
>
>  - I've considered to change the set_bit_user_non_atomic to an inline function,
>    but did not change because the other helpers in the uaccess.h are written as
>    macros. Anyway, I hope that x86 people will give us appropriate suggestions
>    about this.
>
>  - I thought that documenting about making bitmaps 64-bit aligned will be
>    written when we add an API to register user-allocated bitmaps. So probably
>    in the next series.
>
> Avi, Alex,
>  - Could you check the ia64 and ppc parts, please? I tried to keep the logical
>    changes as small as possible.
>
>    I personally tried to build these with cross compilers. For ia64, I could check
>    build success with my patch series. But book3s, even without my patch series,
>    it failed with the following errors:
>
>   arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
>   arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
>   make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
>   make: *** [arch/powerpc/kvm] Error 2
>   

This is bad. I haven't encountered that one at all so far, but I guess
my compiler version is different from yours. Sigh.

>
> About changelog: there are two main changes from the 2nd version:
>   1. I changed the treatment of clean slots (see patch 1/12).
>      This was already applied today, thanks!
>   2. I changed the switch API. (see patch 11/12).
>
> To show this API's advantage, I also did a test (see the end of this mail).
>
>
> [To x86 people]
>
> Hi, Thomas, Ingo, Peter,
>
> Please review patches 4,5/12. Because this is my first time sending patches
> to x86, please tell me if anything is lacking.
>
>
> [To ppc people]
>
> Hi, Benjamin, Paul, Alex,
>
> Please see patches 6,7/12. I must first apologize that I have not tested
> these yet; in that sense, they may not be of sufficient quality for precise
> review. But I will be happy if you would give me any comments.
>
> Alex, could you help me? Though I have a plan to get a PPC box in the future,
> currently I cannot test these.
>   

Could you please point me to a git tree where everything's readily
applied? That would make testing a lot easier.

Alex


* Re: [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
  2010-05-04 13:04 ` [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
@ 2010-05-11 16:00   ` Alexander Graf
  2010-05-12  9:25     ` Takuya Yoshikawa
  0 siblings, 1 reply; 36+ messages in thread
From: Alexander Graf @ 2010-05-11 16:00 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti,
	linux-kernel, kvm-ppc, yoshikawa.takuya, linuxppc-dev, mingo,
	paulus, avi, hpa, tglx

Takuya Yoshikawa wrote:
> During the work of KVM's dirty page logging optimization, we encountered
> the need to manipulate bitmaps in user space efficiently. To achieve this,
> we introduce a uaccess function for setting a bit in user space following
> Avi's suggestion.
>
>   KVM is now using dirty bitmaps for live-migration and VGA. Although we need
>   to update them from kernel side, copying them every time for updating the
>   dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
>   manipulation improves responses of GUI manipulations a lot.
>
> We also found one similar need in drivers/vhost/vhost.c in which the author
> implemented set_bit_to_user() locally using inefficient functions: see TODO
> at the top of that.
>
> Probably, this kind of need would be common for virtualization area.
>
> So we introduce a function set_bit_user_non_atomic().
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> CC: Alexander Graf <agraf@suse.de>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/include/asm/uaccess.h |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 3a01ce8..f878326 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -321,6 +321,25 @@ do {								\
>  	__gu_err;						\
>  })
>  
> +static inline int set_bit_user_non_atomic(int nr, void __user *addr)
> +{
> +	u8 __user *p;
> +	u8 val;
> +
> +	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
>   

Does C do the + or the / first? Either way, I'd like to see brackets here :)


Alex


* Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
  2010-05-11 14:07       ` Marcelo Tosatti
@ 2010-05-12  6:03         ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-12  6:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, linuxppc-dev, mingo, paulus, avi, hpa, tglx,
	Takuya Yoshikawa


>>> One alternative would be:
>>>
>>> KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
>>> bitmap was clean, it returns 0, no switch performed. If the active
>>> bitmap was dirty, the kernel switches to the new bitmap and returns 1.
>>>
>>> And the responsibility of cleaning the new bitmap could also be left
>>> for userspace.
>>>
>>
>> That is a beautiful approach but we can do that only when we give up using
>> GET api.
>>
>>
>> I follow you and Avi's advice about that kind of maintenance policy!
>> What do you think?
>
> If you introduce a switch ioctl that frees the bitmap vmalloc'ed by the
> current set_memory_region (if its not freed already), after pointing the
> memslot to the user supplied one, it should be fine?
>

You mean switching from the vmalloc'ed (not do_mmap'ed) one to a
user-supplied one?

It may be possible, but in my view it makes things really complicated:
until some point we use set_bit, and then we use set_bit_user, etc.

IMO:
  - The number of slots is limited, so the size of the dirty_bitmap_old
    pointer is not problematic.
  - Neither the user side nor the kernel side needs to allocate buffers
    every time: once paired buffers are registered, we reuse them until the
    user side orders us to stop logging.
  - We have a tiny advantage if we need not copy_from_user() a bitmap
    address for the switch ioctl.

  => So I think having two __user bitmaps is not a bad thing.


* Re: [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
  2010-05-11  3:28   ` Marcelo Tosatti
@ 2010-05-12  6:27     ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-12  6:27 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, agraf, kvm-ppc,
	linux-kernel, linuxppc-dev, mingo, paulus, avi, hpa, tglx,
	Takuya Yoshikawa


>>   	r = 0;
>> @@ -1195,11 +1232,16 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>>   	gfn = unalias_gfn(kvm, gfn);
>>   	memslot = gfn_to_memslot_unaliased(kvm, gfn);
>>   	if (memslot&&  memslot->dirty_bitmap) {
>> -		unsigned long rel_gfn = gfn - memslot->base_gfn;
>> +		int nr = generic_le_bit_offset(gfn - memslot->base_gfn);
>>
>> -		generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
>> +		if (kvm_set_bit_user(nr, memslot->dirty_bitmap))
>> +			goto out_fault;
>
> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
> Must find a way to move it outside of the spinlock section.
>

Oh, it's a serious problem. I have to consider it.


Thanks,
   Takuya


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-11 15:55 ` Alexander Graf
@ 2010-05-12  9:19   ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-12  9:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti,
	linux-kernel, kvm-ppc, linuxppc-dev, mingo, paulus, avi, hpa,
	tglx, Takuya Yoshikawa


>> [To ppc people]
>>
>> Hi, Benjamin, Paul, Alex,
>>
>> Please see patches 6,7/12. I must first apologize that I have not tested
>> these yet; in that sense, they may not be of sufficient quality for precise
>> review. But I will be happy if you would give me any comments.
>>
>> Alex, could you help me? Though I have a plan to get a PPC box in the future,
>> currently I cannot test these.
>>
>
> Could you please point me to a git tree where everything's readily
> applied? That would make testing a lot easier.

OK, I'll prepare one. Probably on sourceforge or somewhere like Kemari.

Thanks,
   Takuya


>
> Alex
>


* Re: [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
  2010-05-11 16:00   ` Alexander Graf
@ 2010-05-12  9:25     ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-12  9:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti,
	linux-kernel, kvm-ppc, linuxppc-dev, mingo, paulus, avi, hpa,
	tglx, Takuya Yoshikawa


>>
>> +static inline int set_bit_user_non_atomic(int nr, void __user *addr)
>> +{
>> +	u8 __user *p;
>> +	u8 val;
>> +
>> +	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
>>
>
> Does C do the + or the / first? Either way, I'd like to see brackets here :)

OK, I'll change like that! I like brackets style too :)

>
>
> Alex
>


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-10 12:26   ` Takuya Yoshikawa
  2010-05-11 10:11     ` Takuya Yoshikawa
@ 2010-05-13 11:47     ` Avi Kivity
  2010-05-17  9:06       ` Takuya Yoshikawa
  1 sibling, 1 reply; 36+ messages in thread
From: Avi Kivity @ 2010-05-13 11:47 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: linux-arch, x86, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, mingo, paulus, hpa, tglx,
	Takuya Yoshikawa

On 05/10/2010 03:26 PM, Takuya Yoshikawa wrote:
>> No doubt get.org -> get.opt is measurable, but get.opt->switch.opt is
>> problematic. Have you tried profiling to see where the time is spent
>> (well I can guess, clearing the write access from the sptes).
>
>
> Sorry but no, and I agree with your guess.
> Anyway, I want to do some profiling to confirm this guess.
>
>
> BTW, if we only think about time performance, the optimized get (get.opt)
> may be enough at this stage.
>
> But if we consider future expansion, such as using user-allocated bitmaps,
> the new APIs introduced for switch.opt won't go to waste, I think, because
> we need a structure to get and export bitmap addresses.

User allocated bitmaps have the advantage of reducing pinned memory.  
However we have plenty more pinned memory allocated in memory slots, so 
by itself, user allocated bitmaps don't justify this change.

Perhaps if we optimize memory slot write protection (I have some ideas 
about this) we can make the performance improvement more pronounced.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
  2010-05-13 11:47     ` Avi Kivity
@ 2010-05-17  9:06       ` Takuya Yoshikawa
  0 siblings, 0 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-17  9:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, mtosatti, agraf,
	kvm-ppc, linux-kernel, linuxppc-dev, Takuya Yoshikawa


> User allocated bitmaps have the advantage of reducing pinned memory.
> However we have plenty more pinned memory allocated in memory slots, so
> by itself, user allocated bitmaps don't justify this change.

In that sense, what do you think about the question I sent last week?

=== REPOST 1 ===
 >>
 >> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
 >> Must find a way to move it outside of the spinlock section.
 >>
 >
 > Oh, it's a serious problem. I have to consider it.

Avi, Marcelo,

Sorry, but I have to say that the mmu_lock spinlock problem was completely out
of my mind. Although I looked through the code, it seems not easy to move
set_bit_user outside of the spinlock section without breaking the semantics of
its protection.

So this may take some time to solve.

But personally, I want to do something about x86's "vmalloc() every time"
problem, even though moving dirty bitmaps to user space cannot be achieved soon.

In that sense, do you mind if we do double buffering without moving dirty
bitmaps to user space?

I know that vmalloc() space is precious on x86, but even now, at the time of
get_dirty_log, we use the same amount of memory as double buffering would.
=== 1 END ===


>
> Perhaps if we optimize memory slot write protection (I have some ideas
> about this) we can make the performance improvement more pronounced.
>

It's really nice!

Even now we can measure the performance improvement from introducing the switch
ioctl when the guest is relatively idle, so the combination will be really
effective!

=== REPOST 2 ===
 >>
 >> Can you post such a test, for an idle large guest?
 >
 > OK, I'll do!


First, the result of the "low workload test" (running top during migration):

4GB guest
picked up slots[1](len=3757047808) only
*****************************************
     get.org     get.opt    switch.opt

     1060875     310292     190335
     1076754     301295     188600
      655504     318284     196029
      529769     301471        325
      694796      70216     221172
      651868     353073     196184
      543339     312865     213236
     1061938      72785     203090
      689527     323901     249519
      621364     323881        473
     1063671      70703     192958
      915903     336318     174008
     1046462     332384        782
     1037942      72783     190655
      680122     318305     243544
      688156     314935     193526
      558658     265934     190550
      652454     372135     196270
      660140      68613        352
     1101947     378642     186575
         ...        ...        ...
*****************************************

As expected, the difference shows up more clearly here.

In this case, switch.opt reduced the time by about 1/3 (~0.1 msec) compared to
get.opt for each iteration.

And when the slot is cleaner, the ratio is bigger.
=== 2 END ===

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2010-05-17  9:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
2010-05-04 13:00 ` [RFC][PATCH 2/12] KVM: introduce slot level dirty state management Takuya Yoshikawa
2010-05-04 13:01 ` [RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps Takuya Yoshikawa
2010-05-04 13:02 ` [RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit Takuya Yoshikawa
2010-05-04 13:02 ` [RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
2010-05-04 13:03 ` [RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit Takuya Yoshikawa
2010-05-04 13:04 ` [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
2010-05-11 16:00   ` Alexander Graf
2010-05-12  9:25     ` Takuya Yoshikawa
2010-05-04 13:05 ` [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro Takuya Yoshikawa
2010-05-04 15:03   ` Arnd Bergmann
2010-05-04 16:08     ` Avi Kivity
2010-05-05  2:59       ` Takuya Yoshikawa
2010-05-06 13:38         ` Arnd Bergmann
2010-05-10 11:46           ` Takuya Yoshikawa
2010-05-10 12:01             ` Avi Kivity
2010-05-10 12:01             ` Arnd Bergmann
2010-05-10 12:09               ` Takuya Yoshikawa
2010-05-04 13:06 ` [RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic() Takuya Yoshikawa
2010-05-04 13:07 ` [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space Takuya Yoshikawa
2010-05-11  3:28   ` Marcelo Tosatti
2010-05-12  6:27     ` Takuya Yoshikawa
2010-05-04 13:08 ` [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps Takuya Yoshikawa
2010-05-11  3:43   ` Marcelo Tosatti
2010-05-11  5:53     ` Takuya Yoshikawa
2010-05-11 14:07       ` Marcelo Tosatti
2010-05-12  6:03         ` Takuya Yoshikawa
2010-05-04 13:11 ` [RFC][PATCH 12/12 sample] qemu-kvm: use " Takuya Yoshikawa
2010-05-10 12:06 ` [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Avi Kivity
2010-05-10 12:26   ` Takuya Yoshikawa
2010-05-11 10:11     ` Takuya Yoshikawa
2010-05-13 11:47     ` Avi Kivity
2010-05-17  9:06       ` Takuya Yoshikawa
2010-05-11 15:55 ` Alexander Graf
2010-05-12  9:19   ` Takuya Yoshikawa
