linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
@ 2010-05-04 12:56 Takuya Yoshikawa
  2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Takuya Yoshikawa @ 2010-05-04 12:56 UTC (permalink / raw)
  To: avi, mtosatti, agraf
  Cc: linux-arch, arnd, kvm, kvm-ia64, fernando, x86, linux-kernel,
	kvm-ppc, yoshikawa.takuya, linuxppc-dev, mingo, paulus, hpa,
	tglx

Hi, sorry for sending from my personal account.
The following series are all from me:

  From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

  The 3rd version of "moving dirty bitmaps to user space".

>From this version, we add x86 and ppc and asm-generic people to CC lists.


[To KVM people]

Sorry for being late to reply your comments.

Avi,
 - I've wrote an answer to your question in patch 5/12: drivers/vhost/vhost.c .

 - I've considered to change the set_bit_user_non_atomic to an inline function,
   but did not change because the other helpers in the uaccess.h are written as
   macros. Anyway, I hope that x86 people will give us appropriate suggestions
   about this.

 - I thought that documenting about making bitmaps 64-bit aligned will be
   written when we add an API to register user-allocated bitmaps. So probably
   in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the logical
   changes as small as possible.

   I personally tried to build these with cross compilers. For ia64, I could check
   build success with my patch series. But book3s, even without my patch series,
   it failed with the following errors:

  arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
  arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
  make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
  make: *** [arch/powerpc/kvm] Error 2


About changelog: there are two main changes from the 2nd version:
  1. I changed the treatment of clean slots (see patch 1/12).
     This was already applied today, thanks!
  2. I changed the switch API. (see patch 11/12).

To show this API's advantage, I also did a test (see the end of this mail).


[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review the patches 4,5/12. Because this is the first experience for
me to send patches to x86, please tell me if this lacks anything.


[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see the patches 6,7/12. I first say sorry for that I've not tested these
yet. In that sense, these may not be in the quality for precise reviews. But I
will be happy if you would give me any comments.

Alex, could you help me? Though I have a plan to get PPC box in the future,
currently I cannot test these.



[To asm-generic people]

Hi, Arnd,

Please review the patch 8/12. This kind of macro is acceptable?





[Performance test]

We measured the tsc needed to the ioctl()s for getting dirty logs in
kernel.

Test environment

  AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory


1. GUI test (running Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable part to compare how much time is needed
for the basic parts of dirty log ioctl.

                           get.org   get.opt  switch.opt

slots[7].len=32768          278379     66398     64024
slots[8].len=32768          181246       270       160
slots[7].len=32768          263961     64673     64494
slots[8].len=32768          181655       265       160
slots[7].len=32768          263736     64701     64610
slots[8].len=32768          182785       267       160
slots[7].len=32768          260925     65360     65042
slots[8].len=32768          182579       264       160
slots[7].len=32768          267823     65915     65682
slots[8].len=32768          186350       271       160

At a glance, we know our optimization improved significantly compared
to the original get dirty log ioctl. This is true for both get.opt and
switch.opt. This has a really big impact for the personal KVM users who
drive KVM in GUI mode on their usual PCs.

Next, we notice that switch.opt improved a hundred nano seconds or so for
these slots. Although this may sound a bit tiny improvement, we can feel
this as a difference of GUI's responses like mouse reactions.

To feel the difference, please try GUI on your PC with our patch series!


2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.

                           get.org   get.opt  switch.opt

slots[0].len=655360         797383    261144    222181
slots[1].len=3757047808    2186721   1965244   1842824
slots[2].len=637534208     1433562   1012723   1031213
slots[3].len=131072         216858       331       331
slots[4].len=131072         121635       225       164
slots[5].len=131072         120863       356       164
slots[6].len=16777216       121746      1133       156
slots[7].len=32768          120415       230       278
slots[8].len=32768          120368       216       149
slots[0].len=655360         806497    194710    223582
slots[1].len=3757047808    2142922   1878025   1895369
slots[2].len=637534208     1386512   1021309   1000345
slots[3].len=131072         221118       459       296
slots[4].len=131072         121516       272       166
slots[5].len=131072         122652       244       173
slots[6].len=16777216       123226     99185       149
slots[7].len=32768          121803       457       505
slots[8].len=32768          121586       216       155
slots[0].len=655360         766113    211317    213179
slots[1].len=3757047808    2155662   1974790   1842361
slots[2].len=637534208     1481411   1020004   1031352
slots[3].len=131072         223100       351       295
slots[4].len=131072         122982       436       164
slots[5].len=131072         122100       300       503
slots[6].len=16777216       123653       779       151
slots[7].len=32768          122617       284       157
slots[8].len=32768          122737       253       149

For slots other than 0,1,2 we can see the similar improvement.

Considering the fact that switch.opt does not depend on the bitmap length
except for kvm_mmu_slot_remove_write_access(), this is the cause of some
usec to msec time consumption: there might be some context switches.

But note that this was done with the workload which dirtied the memory
endlessly during the live-migration.

In usual workload, the number of dirty pages varies a lot for each iteration
and we should gain really a lot for relatively clean cases.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2010-05-17  9:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-04 12:56 [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Takuya Yoshikawa
2010-05-04 12:58 ` [RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean Takuya Yoshikawa
2010-05-04 13:00 ` [RFC][PATCH 2/12] KVM: introduce slot level dirty state management Takuya Yoshikawa
2010-05-04 13:01 ` [RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps Takuya Yoshikawa
2010-05-04 13:02 ` [RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit Takuya Yoshikawa
2010-05-04 13:02 ` [RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
2010-05-04 13:03 ` [RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit Takuya Yoshikawa
2010-05-04 13:04 ` [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space Takuya Yoshikawa
2010-05-11 16:00   ` Alexander Graf
2010-05-12  9:25     ` Takuya Yoshikawa
2010-05-04 13:05 ` [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro Takuya Yoshikawa
2010-05-04 15:03   ` Arnd Bergmann
2010-05-04 16:08     ` Avi Kivity
2010-05-05  2:59       ` Takuya Yoshikawa
2010-05-06 13:38         ` Arnd Bergmann
2010-05-10 11:46           ` Takuya Yoshikawa
2010-05-10 12:01             ` Avi Kivity
2010-05-10 12:01             ` Arnd Bergmann
2010-05-10 12:09               ` Takuya Yoshikawa
2010-05-04 13:06 ` [RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic() Takuya Yoshikawa
2010-05-04 13:07 ` [RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space Takuya Yoshikawa
2010-05-11  3:28   ` Marcelo Tosatti
2010-05-12  6:27     ` Takuya Yoshikawa
2010-05-04 13:08 ` [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps Takuya Yoshikawa
2010-05-11  3:43   ` Marcelo Tosatti
2010-05-11  5:53     ` Takuya Yoshikawa
2010-05-11 14:07       ` Marcelo Tosatti
2010-05-12  6:03         ` Takuya Yoshikawa
2010-05-04 13:11 ` [RFC][PATCH 12/12 sample] qemu-kvm: use " Takuya Yoshikawa
2010-05-10 12:06 ` [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space Avi Kivity
2010-05-10 12:26   ` Takuya Yoshikawa
2010-05-11 10:11     ` Takuya Yoshikawa
2010-05-13 11:47     ` Avi Kivity
2010-05-17  9:06       ` Takuya Yoshikawa
2010-05-11 15:55 ` Alexander Graf
2010-05-12  9:19   ` Takuya Yoshikawa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).