From: Isaku Yamahata <yamahata@valinux.co.jp>
To: thfbjyddx <thfbjyddx@hotmail.com>
Cc: "t.hirofuchi" <t.hirofuchi@aist.go.jp>,
	qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>,
	"satoshi.itoh" <satoshi.itoh@aist.go.jp>
Subject: Re: Re: [PATCH 00/21][RFC] postcopy live migration
Date: Mon, 16 Jan 2012 19:17:55 +0900
Message-ID: <20120116101754.GB18937@valinux.co.jp>
In-Reply-To: <20120116065116.GB28686@valinux.co.jp>

On Mon, Jan 16, 2012 at 03:51:16PM +0900, Isaku Yamahata wrote:
> Thank you for your info.
> I suppose I found the cause, MSR_KVM_WALL_CLOCK and MSR_KVM_SYSTEM_TIME.
> Your kernel enables KVM paravirt_ops, right?
> 
> Although I'm preparing the next patch series including the fixes,
> you can also try postcopy by disabling paravirt_ops or disabling kvm
> (use tcg, i.e. -machine accel=tcg).

Disabling just the KVM pv clock should be enough.
Passing no-kvmclock on the guest kernel command line disables it.
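
To check from inside the guest whether the parameter actually took effect, the
kernel command line can be inspected. A minimal sketch, assuming a Linux guest
where /proc/cmdline is readable; the helper names has_kernel_param and
guest_has_no_kvmclock are hypothetical and not part of the patch series:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `param` appears as a whole
 * whitespace-separated token in `cmdline`, 0 otherwise. */
int has_kernel_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return 0;

    while (*p) {
        const char *start;

        /* skip separators between tokens */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        start = p;
        /* advance to the end of the current token */
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == plen && strncmp(start, param, plen) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: 1 if the running kernel was booted with
 * no-kvmclock, 0 if not, -1 if /proc/cmdline could not be read. */
int guest_has_no_kvmclock(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return has_kernel_param(buf, "no-kvmclock");
}
```

Matching whole tokens avoids a false positive on a longer parameter that merely
contains the string.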

> thanks,
> 
> 
> On Thu, Jan 12, 2012 at 09:26:03PM +0800, thfbjyddx wrote:
> >  
> > Do you know what wchan the process was blocked at?
> > kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) doesn't seem to block.
> >  
> > It's
> > WCHAN              COMMAND
> > umem_fault------qemu-system-x86
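
For reference, the same wchan value can also be read directly from /proc. A
minimal sketch, assuming a Linux host that exposes /proc/<pid>/wchan; the
helper name read_wchan is hypothetical:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy the contents of /proc/<pid>/wchan into buf.
 * Returns 0 on success, -1 if the file cannot be opened or len is 0. */
int read_wchan(pid_t pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/%ld/wchan", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* drop a trailing newline, if the kernel printed one */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling read_wchan() with the pid of the stuck qemu process should give the
same symbol that ps prints in its WCHAN column.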
> >  
> >  
> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > Tommy
> >  
> > From: Isaku Yamahata
> > Date: 2012-01-12 16:54
> > To: thfbjyddx
> > CC: t.hirofuchi; qemu-devel; kvm; satoshi.itoh
> > > Subject: Re: [Qemu-devel] Re: [PATCH 00/21][RFC] postcopy live migration
> > On Thu, Jan 12, 2012 at 04:29:44PM +0800, thfbjyddx wrote:
> > > > Hi, I've dug into this more these days
> > >  
> > > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > > migration-tcp: accepted migration
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > > 4872:4872 postcopy_incoming_ram_load:1057: done
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > >  
> > > There should be only a single EOS line. Just a copy & paste mistake?
> > >
> > > There must be two EOS lines: one comes from postcopy_outgoing_ram_save_live
> > > (...stage == QEMU_SAVE_LIVE_STAGE_PART) and the other from
> > > postcopy_outgoing_ram_save_live(...stage == QEMU_SAVE_LIVE_STAGE_END).
> > > I think in postcopy the ram_save_live call in the iterate part can be ignored,
> > > so why are qemu_put_byte(f, QEMU_VM_SECTION_PART) and
> > > qemu_put_byte(f, QEMU_VM_SECTION_END) still in the procedure? Are they essential?
> >  
> > Not so essential.
> >  
> > > > Can you please track it down one more step?
> > > > At which line did it get stuck in kvm_put_msrs()? kvm_put_msrs() doesn't
> > > > seem to block. (A backtrace from the debugger would be best.)
> > > >
> > > > It gets to kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) and never returns,
> > > > so it gets stuck there.
> >  
> > Do you know what wchan the process was blocked at?
> > kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) doesn't seem to block.
> >  
> >  
> > > > When I checked the EOS problem,
> > > > I just commented out qemu_put_byte(f, QEMU_VM_SECTION_PART); and
> > > > qemu_put_be32(f, se->section_id)
> > > > (I think this is the wrong way to fix it, and I don't know how it gets through)
> > > > and left just the se->save_live_state call in qemu_savevm_state_iterate.
> > > > It didn't get stuck at kvm_put_msrs(),
> > > > but it hit another error:
> > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > migration-tcp: accepted migration
> > > 2126:2126 postcopy_incoming_ram_load:1018: incoming ram load
> > > 2126:2126 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > 2126:2126 postcopy_incoming_ram_load:1057: done
> > > migration: successfully loaded vm state
> > > 2126:2126 postcopy_incoming_fork_umemd:1069: fork
> > > 2126:2126 postcopy_incoming_fork_umemd:1127: qemu pid: 2126 daemon pid: 2129
> > > 2130:2130 postcopy_incoming_umemd:1840: daemon pid: 2130
> > > 2130:2130 postcopy_incoming_umemd:1875: entering umemd main loop
> > > Can't find block !
> > > 2130:2130 postcopy_incoming_umem_ram_load:1526: shmem == NULL
> > > 2130:2130 postcopy_incoming_umemd:1882: exiting umemd main loop
> > > > At the same time, the destination node didn't show the EOS line.
> > > >
> > > > So I still can't solve the hang.
> > > > Thanks for your help!
> > > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > ━
> > > Tommy
> > >  
> > > From: Isaku Yamahata
> > > Date: 2012-01-11 10:45
> > > To: thfbjyddx
> > > CC: t.hirofuchi; qemu-devel; kvm; satoshi.itoh
> > > > Subject: Re: [Qemu-devel] Re: [PATCH 00/21][RFC] postcopy live migration
> > > On Sat, Jan 07, 2012 at 06:29:14PM +0800, thfbjyddx wrote:
> > > > Hello all!
> > >  
> > > > Hi, thank you for the detailed report. The procedure you've tried
> > > > basically looks good. Some comments below.
> > >  
> > > > > I got the qemu base version (03ecd2c80a64d030a22fe67cc7a60f24e17ff211) and
> > > > > patched it correctly,
> > > > > but it still didn't work and I got the same result as before.
> > > > > Outgoing node: Intel x86_64; incoming node: AMD x86_64. The guest image is on NFS.
> > > > >
> > > > > I think I should show what I did more clearly and hope somebody can figure out
> > > > > the problem.
> > > > 
> > > >  ・ 1, both in/out node patch the qemu and start on 3.1.7 kernel with umem
> > > > 
> > > >        ./configure --target-list=
> > > x86_64-softmmu --enable-kvm --enable-postcopy
> > > > --enable-debug
> > > >        make
> > > >        make install
> > > > 
> > > >  ・ 2, outgoing qemu:
> > > > 
> > > > > qemu-system-x86_64 -m 256 -hda xxx -monitor stdio -vnc :2 -usbdevice tablet
> > > > > -machine accel=kvm
> > > > > incoming qemu:
> > > > > qemu-system-x86_64 -m 256 -hda xxx -postcopy -incoming tcp:0:8888 -monitor
> > > > > stdio -vnc :2 -usbdevice tablet -machine accel=kvm
> > > > 
> > > >  ・ 3, outgoing node:
> > > > 
> > > > migrate -d -p -n tcp:(incoming node ip):8888
> > > >  
> > > > result:
> > > > 
> > > >  ・ outgoing qemu:
> > > > 
> > > > info status: VM-status: paused (finish-migrate);
> > > > 
> > > >  ・ incoming qemu:
> > > > 
> > > > > the monitor accepts no more input and the process (qemu-system-x86) can't be killed
> > > > >
> > > > > I enabled the debug flag in migration.c, migration-tcp.c and migration-postcopy.c:
> > > > 
> > > >  ・ outgoing qemu:
> > > > 
> > > > (qemu) migration-tcp: connect completed
> > > > migration: beginning savevm
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 1
> > > > migration: iterate
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 2
> > > > migration: done iterating
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 3
> > > > 4500:4500 postcopy_outgoing_begin:716: outgoing begin
> > > > 
> > > >  ・ incoming qemu:
> > > > 
> > > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > > migration-tcp: accepted migration
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > > 4872:4872 postcopy_incoming_ram_load:1057: done
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > >  
> > > > There should be only a single EOS line. Just a copy & paste mistake?
> > >  
> > >  
> > > > > From the result:
> > > > > It didn't get to "successfully loaded vm state",
> > > > > so it is still in qemu_loadvm_state. I found it is in
> > > > > cpu_synchronize_all_post_init->kvm_arch_put_registers->kvm_put_msrs and got
> > > > > stuck there.
> > >  
> > > > Can you please track it down one more step?
> > > > At which line did it get stuck in kvm_put_msrs()? kvm_put_msrs() doesn't
> > > > seem to block. (A backtrace from the debugger would be best.)
> > > >
> > > > If possible, can you please test with a simpler configuration,
> > > > i.e. drop as many devices as possible: no usbdevice, no disk...
> > > > That will simplify debugging.
> > >  
> > > thanks,
> > >  
> > > > > Can anyone give some advice on the problem?
> > > > Thanks very much~
> > > >  
> > > > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > ━
> > > ━
> > > > Tommy
> > > >  
> > > > From: Isaku Yamahata
> > > > Date: 2011-12-29 09:25
> > > > To: kvm; qemu-devel
> > > > CC: yamahata; t.hirofuchi; satoshi.itoh
> > > > Subject: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
> > > > Intro
> > > > =====
> > > > This patch series implements postcopy live migration.[1]
> > > > > As discussed at KVM Forum 2011, a dedicated character device is used for
> > > > > distributed shared memory between the migration source and destination.
> > > > > Now we can discuss/benchmark/compare with precopy. I believe there is
> > > > > much room for improvement.
> > > >  
> > > > [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
> > > >  
> > > >  
> > > > Usage
> > > > =====
> > > > > You need to load the umem character device on the host before starting migration.
> > > > > Postcopy can be used with both the tcg and kvm accelerators. The implementation
> > > > > depends only on the Linux umem character device, and the driver-dependent code
> > > > > is split out into its own file.
> > > > > I tested only the host page size == guest page size case, but the implementation
> > > > > allows the host page size != guest page size case.
> > > >  
> > > > The following options are added with this patch series.
> > > > - incoming part
> > > >   command line options
> > > >   -postcopy [-postcopy-flags <flags>]
> > > >   where flags is for changing behavior for benchmark/debugging
> > > >   Currently the following flags are available
> > > >   0: default
> > > >   1: enable touching page request
> > > >  
> > > >   example:
> > > >   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
> > > >  
> > > > > - outgoing part
> > > > >   options for migrate command
> > > > >   migrate [-p [-n]] URI
> > > > >   -p: indicate postcopy migration
> > > > >   -n: disable background transfer of pages. This is for benchmarking/debugging
> > > >  
> > > >   example:
> > > >   migrate -p -n tcp:<dest ip address>:4444
> > > >  
> > > >  
> > > > TODO
> > > > ====
> > > > > - benchmark/evaluation, especially how async page fault affects the result.
> > > > > - improvement/optimization
> > > > >   At the moment, what I'm aware of is at least:
> > > > >   - touching pages in the incoming qemu process from the fd handler seems
> > > > >     suboptimal. Create a dedicated thread?
> > > > >   - making the incoming socket non-blocking
> > > > >   - the outgoing handler seems suboptimal, causing latency.
> > > > > - catch up with the memory API change
> > > > > - consider the FUSE/CUSE possibility
> > > > - and more...
> > > >  
> > > > basic postcopy work flow
> > > > ========================
> > > >         qemu on the destination
> > > >               |
> > > >               V
> > > >         open(/dev/umem)
> > > >               |
> > > >               V
> > > >         UMEM_DEV_CREATE_UMEM
> > > >               |
> > > >               V
> > > >         Here we have two file descriptors to
> > > >         umem device and shmem file
> > > >               |
> > > >               |                                  umemd
> > > >               |                                  daemon on the destination
> > > >               |
> > > >               V    create pipe to communicate
> > > >         fork()---------------------------------------,
> > > >               |                                      |
> > > >               V                                      |
> > > >         close(socket)                                V
> > > >         close(shmem)                              mmap(shmem file)
> > > >               |                                      |
> > > >               V                                      V
> > > >         mmap(umem device) for guest RAM           close(shmem file)
> > > >               |                                      |
> > > >         close(umem device)                           |
> > > >               |                                      |
> > > >               V                                      |
> > > >         wait for ready from daemon <----pipe-----send ready message
> > > >               |                                      |
> > > >               |                                 Here the daemon takes over 
> > > >         send ok------------pipe---------------> the owner of the socket    
> > > >               |         to the source              
> > > >               V                                      |
> > > >         entering post copy stage                     |
> > > >         start guest execution                        |
> > > >               |                                      |
> > > >               V                                      V
> > > >         access guest RAM                          UMEM_GET_PAGE_REQUEST
> > > >               |                                      |
> > > >               V                                      V
> > > >         page fault ------------------------------>page offset is returned
> > > >         block                                        |
> > > >                                                      V
> > > >                                                   pull page from the source
> > > >                                                   write the page contents
> > > >                                                   to the shmem.
> > > >                                                      |
> > > >                                                      V
> > > >         unblock     <-----------------------------UMEM_MARK_PAGE_CACHED
> > > >         the fault handler returns the page
> > > >         page fault is resolved
> > > >               |
> > > >               |                                   pages can be sent
> > > >               |                                   backgroundly
> > > >               |                                      |
> > > >               |                                      V
> > > >               |                                   UMEM_MARK_PAGE_CACHED
> > > >               |                                      |
> > > >               V                                      V
> > > >         The specified pages<-----pipe------------request to touch pages
> > > >         are made present by                          |
> > > >         touching guest RAM.                          |
> > > >               |                                      |
> > > >               V                                      V
> > > >              reply-------------pipe-------------> release the cached page
> > > >               |                                   madvise(MADV_REMOVE)
> > > >               |                                      |
> > > >               V                                      V
> > > >  
> > > >                  all the pages are pulled from the source
> > > >  
> > > >               |                                      |
> > > >               V                                      V
> > > >         the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
> > > >        (note: I'm not sure if this can be implemented or not)
> > > >               |                                      |
> > > >               V                                      V
> > > >         migration completes                        exit()
> > > >  
> > > >  
> > > >  
> > > > Isaku Yamahata (21):
> > > >   arch_init: export sort_ram_list() and ram_save_block()
> > > >   arch_init: export RAM_SAVE_xxx flags for postcopy
> > > >   arch_init/ram_save: introduce constant for ram save version = 4
> > > >   arch_init: refactor host_from_stream_offset()
> > > >   arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
> > > >   arch_init: refactor ram_save_block()
> > > >   arch_init/ram_save_live: factor out ram_save_limit
> > > >   arch_init/ram_load: refactor ram_load
> > > >   exec.c: factor out qemu_get_ram_ptr()
> > > >   exec.c: export last_ram_offset()
> > > >   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
> > > >   savevm: qemu_pending_size() to return pending buffered size
> > > >   savevm, buffered_file: introduce method to drain buffer of buffered
> > > >     file
> > > >   migration: export migrate_fd_completed() and migrate_fd_cleanup()
> > > >   migration: factor out parameters into MigrationParams
> > > >   umem.h: import Linux umem.h
> > > >   update-linux-headers.sh: teach umem.h to update-linux-headers.sh
> > > >   configure: add CONFIG_POSTCOPY option
> > > >   postcopy: introduce -postcopy and -postcopy-flags option
> > > >   postcopy outgoing: add -p and -n option to migrate command
> > > >   postcopy: implement postcopy livemigration
> > > >  
> > > >  Makefile.target                 |    4 +
> > > >  arch_init.c                     |  260 ++++---
> > > >  arch_init.h                     |   20 +
> > > >  block-migration.c               |    8 +-
> > > >  buffered_file.c                 |   20 +-
> > > >  buffered_file.h                 |    1 +
> > > >  configure                       |   12 +
> > > >  cpu-all.h                       |    9 +
> > > >  exec-obsolete.h                 |    1 +
> > > >  exec.c                          |   75 +-
> > > >  hmp-commands.hx                 |   12 +-
> > > >  hw/hw.h                         |    7 +-
> > > >  linux-headers/linux/umem.h      |   83 ++
> > > >  migration-exec.c                |    8 +
> > > >  migration-fd.c                  |   30 +
> > > >  migration-postcopy-stub.c       |   77 ++
> > > >  migration-postcopy.c            |
> > >  1891 +++++++++++++++++++++++++++++++++++++++
> > > >  migration-tcp.c                 |   37 +-
> > > >  migration-unix.c                |   32 +-
> > > >  migration.c                     |   53 +-
> > > >  migration.h                     |   49 +-
> > > >  qemu-common.h                   |    2 +
> > > >  qemu-options.hx                 |   25 +
> > > >  qmp-commands.hx                 |   10 +-
> > > >  savevm.c                        |   31 +-
> > > >  scripts/update-linux-headers.sh |    2 +-
> > > >  sysemu.h                        |    4 +-
> > > >  umem.c                          |  379 ++++++++
> > > >  umem.h                          |  105 +++
> > > >  vl.c                            |   20 +-
> > > >  30 files changed, 3086 insertions(+), 181 deletions(-)
> > > >  create mode 100644 linux-headers/linux/umem.h
> > > >  create mode 100644 migration-postcopy-stub.c
> > > >  create mode 100644 migration-postcopy.c
> > > >  create mode 100644 umem.c
> > > >  create mode 100644 umem.h
> > > >  
> > > >  
> > > >  
> > >  
> > > -- 
> > > yamahata
> > >  
> > >  
> >  
> > -- 
> > yamahata
> >  
> >  
> 
> -- 
> yamahata
> 

-- 
yamahata

WARNING: multiple messages have this Message-ID (diff)
From: Isaku Yamahata <yamahata@valinux.co.jp>
To: thfbjyddx <thfbjyddx@hotmail.com>
Cc: "t.hirofuchi" <t.hirofuchi@aist.go.jp>,
	qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>,
	"satoshi.itoh" <satoshi.itoh@aist.go.jp>
Subject: Re: [Qemu-devel] 回??: [PATCH 00/21][RFC] postcopy live?migration
Date: Mon, 16 Jan 2012 19:17:55 +0900	[thread overview]
Message-ID: <20120116101754.GB18937@valinux.co.jp> (raw)
In-Reply-To: <20120116065116.GB28686@valinux.co.jp>

On Mon, Jan 16, 2012 at 03:51:16PM +0900, Isaku Yamahata wrote:
> Thank you for your info.
> I suppose I found the cause, MSR_KVM_WALL_CLOCK and MSR_KVM_SYSTEM_TIME.
> Your kernel enables KVM paravirt_ops, right?
> 
> Although I'm preparing the next path series including the fixes,
> you can also try postcopy by disabling paravirt_ops or disabling kvm
> (use tcg i.e. -machine accel:tcg).

Disabling KVM pv clock would be ok.
Passing no-kvmclock to guest kernel disables it.

> thanks,
> 
> 
> On Thu, Jan 12, 2012 at 09:26:03PM +0800, thfbjyddx wrote:
> >  
> > Do you know what wchan the process was blocked at?
> > kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) doesn't seem to block.
> >  
> > It's
> > WCHAN              COMMAND
> > umem_fault------qemu-system-x86
> >  
> >  
> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > Tommy
> >  
> > From: Isaku Yamahata
> > Date: 2012-01-12 16:54
> > To: thfbjyddx
> > CC: t.hirofuchi; qemu-devel; kvm; satoshi.itoh
> > Subject: Re: [Qemu-devel]回??: [PATCH 00/21][RFC] postcopy live?migration
> > On Thu, Jan 12, 2012 at 04:29:44PM +0800, thfbjyddx wrote:
> > > Hi , I've dug more thess days
> > >  
> > > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > > migration-tcp: accepted migration
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > > 4872:4872 postcopy_incoming_ram_load:1057: done
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > >  
> > > There should be only single EOS line. Just copy & past miss?
> > >  
> > > There must be two EOS for one is coming from postcopy_outgoing_ram_save_live
> > > (...stage == QEMU_SAVE_LIVE_STAGE_PART) and the other is
> > > postcopy_outgoing_ram_save_live(...stage == QEMU_SAVE_LIVE_STAGE_END)
> > > I think in postcopy the ram_save_live in the iterate part can be ignore
> > > so why there still have the qemu_put_byte(f, QEMU_VM_SECTON_PART) and
> > > qemu_put_byte(f, QEMU_VM_SECTON_END) in the procedure? Is it essential?
> >  
> > Not so essential.
> >  
> > > Can you please track it down one more step?
> > > Which line did it stuck in kvm_put_msrs()? kvm_put_msrs() doesn't seem to
> > > block.(backtrace by the debugger would be best.)
> > >
> > > it gets to the kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) and never return
> > > so it gets stuck
> >  
> > Do you know what wchan the process was blocked at?
> > kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) doesn't seem to block.
> >  
> >  
> > > when I check the EOS problem
> > > I just annotated the qemu_put_byte(f, QEMU_VM_SECTION_PART);
> >  and qemu_put_be32
> > > (f, se->section_id)
> > >  (I think this is a wrong way to fix it and I don't know how it get through)
> > > and leave just the se->save_live_state in the qemu_savevm_state_iterate
> > > it didn't get stuck at kvm_put_msrs()
> > > but it has some other error
> > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > migration-tcp: accepted migration
> > > 2126:2126 postcopy_incoming_ram_load:1018: incoming ram load
> > > 2126:2126 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > 2126:2126 postcopy_incoming_ram_load:1057: done
> > > migration: successfully loaded vm state
> > > 2126:2126 postcopy_incoming_fork_umemd:1069: fork
> > > 2126:2126 postcopy_incoming_fork_umemd:1127: qemu pid: 2126 daemon pid: 2129
> > > 2130:2130 postcopy_incoming_umemd:1840: daemon pid: 2130
> > > 2130:2130 postcopy_incoming_umemd:1875: entering umemd main loop
> > > Can't find block !
> > > 2130:2130 postcopy_incoming_umem_ram_load:1526: shmem == NULL
> > > 2130:2130 postcopy_incoming_umemd:1882: exiting umemd main loop
> > > and at the same time , the destination node didn't show the EOS
> > >  
> > > so I still can't solve the stuck problem
> > > Thanks for your help~!
> > > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > ━
> > > Tommy
> > >  
> > > From: Isaku Yamahata
> > > Date: 2012-01-11 10:45
> > > To: thfbjyddx
> > > CC: t.hirofuchi; qemu-devel; kvm; satoshi.itoh
> > > Subject: Re: [Qemu-devel]回??: [PATCH 00/21][RFC] postcopy live migration
> > > On Sat, Jan 07, 2012 at 06:29:14PM +0800, thfbjyddx wrote:
> > > > Hello all!
> > >  
> > > Hi, thank you for detailed report. The procedure you've tried looks
> > > good basically. Some comments below.
> > >  
> > > > I got the qemu basic version(03ecd2c80a64d030a22fe67cc7a60f24e17ff211) and
> > > > patched it correctly
> > > > but it still didn't make sense and I got the same scenario as before
> > > > outgoing node intel x86_64; incoming node amd x86_64. guest image is on nfs
> > > >  
> > > >
> >  I think I should show what I do more clearly and hope somebody can figure out
> > > > the problem
> > > > 
> > > >  ・ 1, both in/out node patch the qemu and start on 3.1.7 kernel with umem
> > > > 
> > > >        ./configure --target-list=
> > > x86_64-softmmu --enable-kvm --enable-postcopy
> > > > --enable-debug
> > > >        make
> > > >        make install
> > > > 
> > > >  ・ 2, outgoing qemu:
> > > > 
> > > > qemu-system-x86_64 -m 256 -hda xxx -monitor stdio -vnc: 2 -usbdevice tablet
> > > > -machine accel=kvm
> > > > incoming qemu:
> > > > qemu-system-x86_64 -m 256 -hda xxx -postcopy -incoming tcp:0:8888 -monitor
> > > > stdio -vnc: 2 -usbdevice tablet -machine accel=kvm
> > > > 
> > > >  ・ 3, outgoing node:
> > > > 
> > > > migrate -d -p -n tcp:(incoming node ip):8888
> > > >  
> > > > result:
> > > > 
> > > >  ・ outgoing qemu:
> > > > 
> > > > info status: VM-status: paused (finish-migrate);
> > > > 
> > > >  ・ incoming qemu:
> > > > 
> > > > can't type any more and can't kill the process(qemu-system-x86)
> > > >  
> > > > I open the debug flag in migration.c migration-tcp.c migration-postcopy.c:
> > > > 
> > > >  ・ outgoing qemu:
> > > > 
> > > > (qemu) migration-tcp: connect completed
> > > > migration: beginning savevm
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 1
> > > > migration: iterate
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 2
> > > > migration: done iterating
> > > > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 3
> > > > 4500:4500 postcopy_outgoing_begin:716: outgoing begin
> > > > 
> > > >  ・ incoming qemu:
> > > > 
> > > > (qemu) migration-tcp: Attempting to start an incoming migration
> > > > migration-tcp: accepted migration
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > > > 4872:4872 postcopy_incoming_ram_load:1057: done
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > > > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > > > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > > > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > >  
> > > There should be only single EOS line. Just copy & past miss?
> > >  
> > >  
> > > > from the result:
> > > > It didn't get to the "successfully loaded vm state"
> > > > So it still in the qemu_loadvm_state, and I found it's in
> > > > cpu_synchronize_all_post_init->kvm_arch_put_registers->kvm_put_msrs and got
> > > > stuck
> > >  
> > > Can you please track it down one more step?
> > > Which line did it stuck in kvm_put_msrs()? kvm_put_msrs() doesn't seem to
> > > block.(backtrace by the debugger would be best.)
> > >  
> > > If possible, can you please test with more simplified configuration.
> > > i.e. drop device as much as possible i.e. no usbdevice, no disk...
> > > So the debug will be simplified.
> > >  
> > > thanks,
> > >  
> > > > Does anyone give some advises on the problem?
> > > > Thanks very much~
> > > >  
> > > > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > ━
> > > ━
> > > > Tommy
> > > >  
> > > > From: Isaku Yamahata
> > > > Date: 2011-12-29 09:25
> > > > To: kvm; qemu-devel
> > > > CC: yamahata; t.hirofuchi; satoshi.itoh
> > > > Subject: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
> > > > Intro
> > > > =====
> > > > This patch series implements postcopy live migration.[1]
> > > > As discussed at KVM forum 2011, dedicated character device is used for
> > > > distributed shared memory between migration source and destination.
> > > > Now we can discuss/benchmark/compare with precopy. I believe there are
> > > > much rooms for improvement.
> > > >  
> > > > [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
> > > >  
> > > >  
> > > > Usage
> > > > =====
> > > > You need load umem character device on the host before starting migration.
> > > > Postcopy can be used for tcg and kvm accelarator. The implementation depend
> > > > on only linux umem character device. But the driver dependent code is split
> > > > into a file.
> > > > I tested only host page size ==
> >  guest page size case, but the implementation
> > > > allows host page size != guest page size case.
> > > >  
> > > > The following options are added with this patch series.
> > > > - incoming part
> > > >   command line options
> > > >   -postcopy [-postcopy-flags <flags>]
> > > >   where flags is for changing behavior for benchmark/debugging
> > > >   Currently the following flags are available
> > > >   0: default
> > > >   1: enable touching page request
> > > >  
> > > >   example:
> > > >   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
> > > >  
> > > > - outging part
> > > >   options for migrate command 
> > > >   migrate [-p [-n]] URI
> > > >   -p: indicate postcopy migration
> > > >   -n: disable background transferring pages: This is for benchmark/
> > debugging
> > > >  
> > > >   example:
> > > >   migrate -p -n tcp:<dest ip address>:4444
> > > >  
> > > >  
> > > > TODO
> > > > ====
> > > > - benchmark/evaluation. Especially how async page fault affects the result.
> > > > - improve/optimization
> > > >   At the moment at least what I'm aware of is
> > > >   - touching pages in incoming qemu process by fd handler seems suboptimal.
> > > >     creating dedicated thread?
> > > >   - making incoming socket non-blocking
> > > >   - outgoing handler seems suboptimal causing latency.
> > > > - catch up memory API change
> > > > - consider on FUSE/CUSE possibility
> > > > - and more...
> > > >  
> > > > basic postcopy work flow
> > > > ========================
> > > >         qemu on the destination
> > > >               |
> > > >               V
> > > >         open(/dev/umem)
> > > >               |
> > > >               V
> > > >         UMEM_DEV_CREATE_UMEM
> > > >               |
> > > >               V
> > > >         Here we have two file descriptors to
> > > >         umem device and shmem file
> > > >               |
> > > >               |                                  umemd
> > > >               |                                  daemon on the destination
> > > >               |
> > > >               V    create pipe to communicate
> > > >         fork()---------------------------------------,
> > > >               |                                      |
> > > >               V                                      |
> > > >         close(socket)                                V
> > > >         close(shmem)                              mmap(shmem file)
> > > >               |                                      |
> > > >               V                                      V
> > > >         mmap(umem device) for guest RAM           close(shmem file)
> > > >               |                                      |
> > > >         close(umem device)                           |
> > > >               |                                      |
> > > >               V                                      |
> > > >         wait for ready from daemon <----pipe-----send ready message
> > > >               |                                      |
> > > >               |                                 Here the daemon takes over
> > > >         send ok------------pipe---------------> ownership of the socket
> > > >               |                                 to the source
> > > >               V                                      |
> > > >         entering post copy stage                     |
> > > >         start guest execution                        |
> > > >               |                                      |
> > > >               V                                      V
> > > >         access guest RAM                          UMEM_GET_PAGE_REQUEST
> > > >               |                                      |
> > > >               V                                      V
> > > >         page fault ------------------------------>page offset is returned
> > > >         block                                        |
> > > >                                                      V
> > > >                                                   pull page from the source
> > > >                                                   write the page contents
> > > >                                                   to the shmem.
> > > >                                                      |
> > > >                                                      V
> > > >         unblock     <-----------------------------UMEM_MARK_PAGE_CACHED
> > > >         the fault handler returns the page
> > > >         page fault is resolved
> > > >               |
> > > >               |                                   pages can be sent
> > > >               |                                   in the background
> > > >               |                                      |
> > > >               |                                      V
> > > >               |                                   UMEM_MARK_PAGE_CACHED
> > > >               |                                      |
> > > >               V                                      V
> > > >         The specified pages<-----pipe------------request to touch pages
> > > >         are made present by                          |
> > > >         touching guest RAM.                          |
> > > >               |                                      |
> > > >               V                                      V
> > > >              reply-------------pipe-------------> release the cached page
> > > >               |                                   madvise(MADV_REMOVE)
> > > >               |                                      |
> > > >               V                                      V
> > > >  
> > > >                  all the pages are pulled from the source
> > > >  
> > > >               |                                      |
> > > >               V                                      V
> > > >         the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
> > > >        (note: I'm not sure if this can be implemented or not)
> > > >               |                                      |
> > > >               V                                      V
> > > >         migration completes                        exit()
> > > >  
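The first half of the diagram above (shared backing file, fork(), pipe, "ready" message) can be approximated with plain POSIX calls. This is only a sketch under stated assumptions: it does not use /dev/umem or any UMEM_* ioctl, a tmpfile() stands in for the shmem file, the parent maps that file directly instead of mapping the umem device, and page-fault blocking is not modelled. The names handshake_sketch() and SKETCH_PAGE_SIZE are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical stand-in for the qemu/umemd handshake. The child ("daemon")
 * fills one shared page and reports ready over a pipe; the parent ("qemu")
 * waits for that message and then reads the page.
 * Returns the first byte of the page, or -1 on error.
 */
int handshake_sketch(unsigned char fill)
{
    int ready_pipe[2];
    int shmem_fd;
    int result = -1;
    pid_t pid;
    FILE *backing = tmpfile();      /* stands in for the shmem file */

    if (backing == NULL) {
        return -1;
    }
    shmem_fd = fileno(backing);
    if (ftruncate(shmem_fd, SKETCH_PAGE_SIZE) == -1 ||
        pipe(ready_pipe) == -1) {
        fclose(backing);
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(ready_pipe[0]);
        close(ready_pipe[1]);
        fclose(backing);
        return -1;
    }

    if (pid == 0) {
        /* Daemon side: map the backing file and write the page contents. */
        unsigned char *page;
        char ready = 'R';

        close(ready_pipe[0]);
        page = mmap(NULL, SKETCH_PAGE_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED, shmem_fd, 0);
        if (page == MAP_FAILED) {
            _exit(1);
        }
        memset(page, fill, SKETCH_PAGE_SIZE);
        /* Send the ready message only after the page is populated. */
        if (write(ready_pipe[1], &ready, 1) != 1) {
            _exit(1);
        }
        _exit(0);
    } else {
        /* qemu side: map the same file, then wait for the ready message. */
        char msg = 0;
        unsigned char *ram;

        close(ready_pipe[1]);
        ram = mmap(NULL, SKETCH_PAGE_SIZE, PROT_READ, MAP_SHARED,
                   shmem_fd, 0);
        if (ram != MAP_FAILED) {
            if (read(ready_pipe[0], &msg, 1) == 1 && msg == 'R') {
                result = ram[0];
            }
            munmap(ram, SKETCH_PAGE_SIZE);
        }
        close(ready_pipe[0]);
        fclose(backing);
        waitpid(pid, NULL, 0);
    }
    return result;
}
```

Calling handshake_sketch(0x5a) should return 0x5a once the child has written the page and signalled ready. In the flow described in the diagram, the write into the shmem would be followed by UMEM_MARK_PAGE_CACHED to unblock the faulting vcpu, which this sketch leaves out.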
> > > >  
> > > >  
> > > > Isaku Yamahata (21):
> > > >   arch_init: export sort_ram_list() and ram_save_block()
> > > >   arch_init: export RAM_SAVE_xxx flags for postcopy
> > > >   arch_init/ram_save: introduce constant for ram save version = 4
> > > >   arch_init: refactor host_from_stream_offset()
> > > >   arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
> > > >   arch_init: refactor ram_save_block()
> > > >   arch_init/ram_save_live: factor out ram_save_limit
> > > >   arch_init/ram_load: refactor ram_load
> > > >   exec.c: factor out qemu_get_ram_ptr()
> > > >   exec.c: export last_ram_offset()
> > > >   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
> > > >   savevm: qemu_pending_size() to return pending buffered size
> > > >   savevm, buffered_file: introduce method to drain buffer of buffered
> > > >     file
> > > >   migration: export migrate_fd_completed() and migrate_fd_cleanup()
> > > >   migration: factor out parameters into MigrationParams
> > > >   umem.h: import Linux umem.h
> > > >   update-linux-headers.sh: teach umem.h to update-linux-headers.sh
> > > >   configure: add CONFIG_POSTCOPY option
> > > >   postcopy: introduce -postcopy and -postcopy-flags option
> > > >   postcopy outgoing: add -p and -n option to migrate command
> > > >   postcopy: implement postcopy livemigration
> > > >  
> > > >  Makefile.target                 |    4 +
> > > >  arch_init.c                     |  260 ++++---
> > > >  arch_init.h                     |   20 +
> > > >  block-migration.c               |    8 +-
> > > >  buffered_file.c                 |   20 +-
> > > >  buffered_file.h                 |    1 +
> > > >  configure                       |   12 +
> > > >  cpu-all.h                       |    9 +
> > > >  exec-obsolete.h                 |    1 +
> > > >  exec.c                          |   75 +-
> > > >  hmp-commands.hx                 |   12 +-
> > > >  hw/hw.h                         |    7 +-
> > > >  linux-headers/linux/umem.h      |   83 ++
> > > >  migration-exec.c                |    8 +
> > > >  migration-fd.c                  |   30 +
> > > >  migration-postcopy-stub.c       |   77 ++
> > > >  migration-postcopy.c            | 1891 +++++++++++++++++++++++++++++++++++++++
> > > >  migration-tcp.c                 |   37 +-
> > > >  migration-unix.c                |   32 +-
> > > >  migration.c                     |   53 +-
> > > >  migration.h                     |   49 +-
> > > >  qemu-common.h                   |    2 +
> > > >  qemu-options.hx                 |   25 +
> > > >  qmp-commands.hx                 |   10 +-
> > > >  savevm.c                        |   31 +-
> > > >  scripts/update-linux-headers.sh |    2 +-
> > > >  sysemu.h                        |    4 +-
> > > >  umem.c                          |  379 ++++++++
> > > >  umem.h                          |  105 +++
> > > >  vl.c                            |   20 +-
> > > >  30 files changed, 3086 insertions(+), 181 deletions(-)
> > > >  create mode 100644 linux-headers/linux/umem.h
> > > >  create mode 100644 migration-postcopy-stub.c
> > > >  create mode 100644 migration-postcopy.c
> > > >  create mode 100644 umem.c
> > > >  create mode 100644 umem.h
> > > >  
> > > >  
> > > >  
> > >  
> > > -- 
> > > yamahata
> > >  
> > >  
> >  
> > -- 
> > yamahata
> >  
> >  
> 
> -- 
> yamahata
> 

-- 
yamahata

Thread overview: 88+ messages
2011-12-29  1:25 [PATCH 00/21][RFC] postcopy live migration Isaku Yamahata
2011-12-29  1:25 ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 01/21] arch_init: export sort_ram_list() and ram_save_block() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 02/21] arch_init: export RAM_SAVE_xxx flags for postcopy Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 03/21] arch_init/ram_save: introduce constant for ram save version = 4 Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 04/21] arch_init: refactor host_from_stream_offset() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 05/21] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 06/21] arch_init: refactor ram_save_block() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 07/21] arch_init/ram_save_live: factor out ram_save_limit Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 08/21] arch_init/ram_load: refactor ram_load Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 09/21] exec.c: factor out qemu_get_ram_ptr() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 10/21] exec.c: export last_ram_offset() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 11/21] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 12/21] savevm: qemu_pending_size() to return pending buffered size Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 13/21] savevm, buffered_file: introduce method to drain buffer of buffered file Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 14/21] migration: export migrate_fd_completed() and migrate_fd_cleanup() Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 15/21] migration: factor out parameters into MigrationParams Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 16/21] umem.h: import Linux umem.h Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 17/21] update-linux-headers.sh: teach umem.h to update-linux-headers.sh Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 18/21] configure: add CONFIG_POSTCOPY option Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 19/21] postcopy: introduce -postcopy and -postcopy-flags option Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:25 ` [PATCH 20/21] postcopy outgoing: add -p and -n option to migrate command Isaku Yamahata
2011-12-29  1:25   ` [Qemu-devel] " Isaku Yamahata
2011-12-29  1:26 ` [PATCH 21/21] postcopy: implement postcopy livemigration Isaku Yamahata
2011-12-29  1:26   ` [Qemu-devel] " Isaku Yamahata
2011-12-29 15:51   ` Orit Wasserman
2011-12-29 15:51     ` Orit Wasserman
2012-01-04  3:34     ` Isaku Yamahata
2012-01-04  3:34       ` [Qemu-devel] " Isaku Yamahata
2011-12-29 16:06   ` Avi Kivity
2011-12-29 16:06     ` [Qemu-devel] " Avi Kivity
2012-01-04  3:29     ` Isaku Yamahata
2012-01-04  3:29       ` [Qemu-devel] " Isaku Yamahata
2012-01-12 14:15       ` Avi Kivity
2012-01-12 14:15         ` [Qemu-devel] " Avi Kivity
2011-12-29 22:39 ` [PATCH 00/21][RFC] postcopy live migration Anthony Liguori
2011-12-29 22:39   ` [Qemu-devel] " Anthony Liguori
2012-01-01  9:43   ` Orit Wasserman
2012-01-01  9:43     ` [Qemu-devel] " Orit Wasserman
2012-01-01 16:27     ` Stefan Hajnoczi
2012-01-01 16:27       ` Stefan Hajnoczi
2012-01-02  9:28       ` Dor Laor
2012-01-02  9:28         ` Dor Laor
2012-01-02 17:22         ` Stefan Hajnoczi
2012-01-02 17:22           ` [Qemu-devel] " Stefan Hajnoczi
2012-01-01  9:52   ` Dor Laor
2012-01-01  9:52     ` [Qemu-devel] " Dor Laor
2012-01-04  1:30     ` Takuya Yoshikawa
2012-01-04  1:30       ` [Qemu-devel] " Takuya Yoshikawa
2012-01-04  3:48     ` Michael Roth
2012-01-04  3:48       ` [Qemu-devel] " Michael Roth
2012-01-04  3:51   ` Isaku Yamahata
2012-01-04  3:51     ` Isaku Yamahata
     [not found] ` <BLU0-SMTP161AC380D472854F48E33A5BC9A0@phx.gbl>
2012-01-11  2:45   ` Re: " Isaku Yamahata
2012-01-11  2:45     ` [Qemu-devel] " Isaku Yamahata
2012-01-12  8:29     ` thfbjyddx
2012-01-12  8:29       ` [Qemu-devel] " thfbjyddx
2012-01-12  8:54       ` Re: [PATCH 00/21][RFC] postcopy live migration Isaku Yamahata
2012-01-12  8:54         ` [Qemu-devel] " Isaku Yamahata
2012-01-12 13:26         ` thfbjyddx
2012-01-12 13:26           ` [Qemu-devel] " thfbjyddx
2012-01-16  6:51           ` Isaku Yamahata
2012-01-16  6:51             ` [Qemu-devel] " Isaku Yamahata
2012-01-16 10:17             ` Isaku Yamahata [this message]
2012-01-16 10:17               ` Isaku Yamahata
2012-03-12  8:36               ` thfbjyddx
2012-03-12  8:36                 ` [Qemu-devel] " thfbjyddx
2012-03-13  3:21                 ` Isaku Yamahata
2012-03-13  3:21                   ` [Qemu-devel] " Isaku Yamahata
