* 33 VCPUs in HVM guests with live migration with Linux hangs
@ 2014-04-04 20:44 Konrad Rzeszutek Wilk
  2014-04-07  8:32 ` Ian Campbell
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-04 20:44 UTC (permalink / raw)
To: xen-devel; +Cc: boris.ostrovsky, david.vrabel

When live migrating I found out that if you try with more than 32 VCPUs
the guest gets stuck. It works OK when booting - all 33 VCPUs show up.
I use this small config:

kernel = "hvmloader"
device_model_version = 'qemu-xen-traditional'
vcpus = 33
builder='hvm'
memory=1024
serial='file:/var/log/xen/console-bootstrap-x86_64-pvhvm'
name="m"
disk = [ 'file:/mnt/lab/bootstrap-x86_64/root_image.iso,hdc:cdrom,r','phy:/dev/guests/bootstrap-x86_64-pvhvm,xvda,w']
boot="dn"
vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
vnc=1
vnclisten="0.0.0.0"
usb=1
usbdevice="tablet"

And do a migration:

m                33  1023    33     -b----      14.3
-bash-4.1# xl migrate m localhost
root@localhost's password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/418)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/418)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected
xc: Reloading memory pages: 262144/1045504   25%
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
Migration successful.

which completes successfully. 'xl vcpu-list' tells me that all 32 VCPUs
are blocked except one - the 33rd.
Its stack seems to alternate between:

Call Trace:
[<ffffffff81128a5a>] multi_cpu_stop+0x9a <--
ffff8800365b5da0: [<ffffffff811289c0>] multi_cpu_stop
ffff8800365b5dc0: [<ffffffff811290aa>] cpu_stopper_thread+0x4a
ffff8800365b5de0: [<ffffffff816f7081>] __schedule+0x381
ffff8800365b5e38: [<ffffffff810cbf90>] smpboot_thread_fn
ffff8800365b5e80: [<ffffffff810cc0d8>] smpboot_thread_fn+0x148
ffff8800365b5eb0: [<ffffffff810cbf90>] smpboot_thread_fn
ffff8800365b5ec0: [<ffffffff810c498e>] kthread+0xce
ffff8800365b5f28: [<ffffffff810c48c0>] kthread
ffff8800365b5f50: [<ffffffff81703e0c>] ret_from_fork+0x7c
ffff8800365b5f80: [<ffffffff810c48c0>] kthread

and

[<ffffffff8108f3c8>] pvclock_clocksource_read+0x18 <--
ffff880038603ef0: [<ffffffff81045698>] xen_clocksource_read+0x28
ffff880038603f00: [<ffffffff81057909>] sched_clock+0x9
ffff880038603f10: [<ffffffff810d7b85>] sched_clock_local+0x25
ffff880038603f40: [<ffffffff810d7ca8>] sched_clock_cpu+0xb8
ffff880038603f60: [<ffffffff810d840e>] irqtime_account_irq+0x4e
ffff880038603f80: [<ffffffff810a5279>] irq_enter+0x39
ffff880038603f90: [<ffffffff813f8480>] xen_evtchn_do_upcall+0x20
ffff880038603fb0: [<ffffffff817058ed>] xen_hvm_callback_vector+0x6d

which implies that the CPU is receiving interrupts, but is somehow stuck
in a thread doing something - probably waiting on a mutex. When the CPU
(33) started (before migration) this was its stack:

Call Trace:
[<ffffffff8108e846>] native_safe_halt+0x6 <--
ffff8800377d1e90: [<ffffffff8105989a>] default_idle+0x1a
ffff8800377d1eb0: [<ffffffff810591f6>] arch_cpu_idle+0x26
ffff8800377d1ec0: [<ffffffff810f6d76>] cpu_startup_entry+0xa6
ffff8800377d1ef0: [<ffffffff81109a55>] clockevents_register_device+0x105
ffff8800377d1f30: [<ffffffff8108236e>] start_secondary+0x19e

The only culprit I could think of was commit
d5b17dbff83d63fb6bf35daec21c8ebfb8d695b5 "xen/smp/pvhvm: Don't point
per_cpu(xen_vpcu, 33 and larger) to shared_info", which I had reverted -
but that did not help.
So the questions are:

1) Has anybody else actually booted HVM guests with more than 32 VCPUs
   and tried to migrate?
2) If yes, have you seen this before?

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: 33 VCPUs in HVM guests with live migration with Linux hangs
  2014-04-07  8:32 ` Ian Campbell
From: Ian Campbell @ 2014-04-07 8:32 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, boris.ostrovsky, david.vrabel

On Fri, 2014-04-04 at 16:44 -0400, Konrad Rzeszutek Wilk wrote:
> The only culprit I could think of was commit d5b17dbff83d63fb6bf35daec21c8ebfb8d695b5
> "xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info"
>
> which I had reverted - but that did not help.

Looking at the commit log, that isn't surprising. That message hints
that there is some later setup which calls VCPUOP_register_vcpu_info
and causes cpus >= 33 to work (since, as the message says, shared_info
only has space for 32 cpus). I think you want to be investigating the
(lack of) calls to VCPUOP_register_vcpu_info.

> So the questions are:
>
> 1) Has anybody else actually booted HVM guests with more than 32 VCPUs
>    and tried to migrate?

Not me.

Ian.
* [PATCH] Fixes for more than 32 VCPUs migration for HVM guests (v1).
  2014-04-08 17:25 ` konrad
From: konrad @ 2014-04-08 17:25 UTC (permalink / raw)
To: xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

These two patches (one for Linux, one for Xen) allow PVHVM guests to use
the per-cpu VCPU mechanism after migration. Currently when a PVHVM guest
migrates, all the per-cpu information is lost and we fall back on the
shared_info structure - regardless of whether the HVM guest has 2 or 128
CPUs. Since that structure has an array for only 32 CPUs, a PVHVM guest
can only be migrated if it has at most 32 CPUs. These patches fix that
and allow more than 32 VCPUs to be migrated with PVHVM Linux guests.

The Linux diff is:

 arch/x86/xen/enlighten.c | 21 ++++++++++++++++-----
 arch/x86/xen/suspend.c   |  6 +-----
 arch/x86/xen/time.c      |  3 +++
 3 files changed, 20 insertions(+), 10 deletions(-)

while the Xen one is:

 xen/arch/x86/hvm/hvm.c | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
* [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-08 17:25 ` konrad
From: konrad @ 2014-04-08 17:25 UTC (permalink / raw)
To: xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

When we migrate an HVM guest, by default our shared_info can
only hold up to 32 CPUs. As such the hypercall
VCPUOP_register_vcpu_info was introduced which allowed us to
setup per-page areas for VCPUs. This means we can boot a PVHVM
guest with more than 32 VCPUs. During migration the per-cpu
structure is allocated fresh by the hypervisor (vcpu_info_mfn
is set to INVALID_MFN) so that the newly migrated guest
can make the VCPUOP_register_vcpu_info hypercall.

Unfortunately we end up triggering this condition:
    /* Run this command on yourself or on other offline VCPUS. */
    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

which means we are unable to setup the per-cpu VCPU structures
for running vCPUS. The Linux PV code paths make this work by
iterating over every vCPU with:

1) is the target CPU up? (VCPUOP_is_up hypercall)
2) if yes, then VCPUOP_down to pause it.
3) VCPUOP_register_vcpu_info
4) if we brought it down, then VCPUOP_up to bring it back up

But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
not allowed on HVM guests we can't do this. This patch
enables them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/hvm.c | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 38c491e..b5b92fe 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3470,6 +3470,9 @@ static long hvm_vcpu_op(
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
+    case VCPUOP_down:
+    case VCPUOP_up:
+    case VCPUOP_is_up:
        rc = do_vcpu_op(cmd, vcpuid, arg);
        break;
    default:
--
1.7.7.6
* Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-08 18:18 ` Roger Pau Monné
From: Roger Pau Monné @ 2014-04-08 18:18 UTC (permalink / raw)
To: konrad, xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

On 08/04/14 19:25, konrad@kernel.org wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> When we migrate an HVM guest, by default our shared_info can
> only hold up to 32 CPUs. As such the hypercall
> VCPUOP_register_vcpu_info was introduced which allowed us to
> setup per-page areas for VCPUs. This means we can boot a PVHVM
> guest with more than 32 VCPUs. During migration the per-cpu
> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> is set to INVALID_MFN) so that the newly migrated guest
> can make the VCPUOP_register_vcpu_info hypercall.
>
> Unfortunately we end up triggering this condition:
>     /* Run this command on yourself or on other offline VCPUS. */
>     if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>
> which means we are unable to setup the per-cpu VCPU structures
> for running vCPUS. The Linux PV code paths make this work by
> iterating over every vCPU with:
>
> 1) is the target CPU up? (VCPUOP_is_up hypercall)
> 2) if yes, then VCPUOP_down to pause it.
> 3) VCPUOP_register_vcpu_info
> 4) if we brought it down, then VCPUOP_up to bring it back up
>
> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> not allowed on HVM guests we can't do this. This patch
> enables them.

Hmmm, this looks like a very convoluted approach to something that
could be solved more easily IMHO. What we do on FreeBSD is put all
vCPUs into suspension, which means that all vCPUs except vCPU#0 will
be in the cpususpend_handler, see:

http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

Then on resume we unblock the "suspended" CPUs, and the first thing
they do is call cpu_ops.cpu_resume, which is basically going to set up
the vcpu_info using VCPUOP_register_vcpu_info. Not sure if something
similar is possible under Linux, but it seems easier and doesn't
require any Xen-side changes.

Roger.
* Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-08 18:53 ` Konrad Rzeszutek Wilk
From: Konrad Rzeszutek Wilk @ 2014-04-08 18:53 UTC (permalink / raw)
To: Roger Pau Monné
Cc: konrad, xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> On 08/04/14 19:25, konrad@kernel.org wrote:
>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>
>> When we migrate an HVM guest, by default our shared_info can
>> only hold up to 32 CPUs. As such the hypercall
>> VCPUOP_register_vcpu_info was introduced which allowed us to
>> setup per-page areas for VCPUs. This means we can boot a PVHVM
>> guest with more than 32 VCPUs. During migration the per-cpu
>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>> is set to INVALID_MFN) so that the newly migrated guest
>> can make the VCPUOP_register_vcpu_info hypercall.
>>
>> Unfortunately we end up triggering this condition:
>>     /* Run this command on yourself or on other offline VCPUS. */
>>     if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>
>> which means we are unable to setup the per-cpu VCPU structures
>> for running vCPUS. The Linux PV code paths make this work by
>> iterating over every vCPU with:
>>
>> 1) is the target CPU up? (VCPUOP_is_up hypercall)
>> 2) if yes, then VCPUOP_down to pause it.
>> 3) VCPUOP_register_vcpu_info
>> 4) if we brought it down, then VCPUOP_up to bring it back up
>>
>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>> not allowed on HVM guests we can't do this. This patch
>> enables them.
>
> Hmmm, this looks like a very convoluted approach to something that
> could be solved more easily IMHO. What we do on FreeBSD is put all
> vCPUs into suspension, which means that all vCPUs except vCPU#0 will
> be in the cpususpend_handler, see:
>
> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

How do you 'suspend' them? If I remember right there is a disadvantage
to doing this, as you have to bring all the CPUs "offline". In Linux
that means using stop_machine, which is a pretty big hammer and
increases the latency for migration.

> Then on resume we unblock the "suspended" CPUs, and the first thing
> they do is call cpu_ops.cpu_resume, which is basically going to set up
> the vcpu_info using VCPUOP_register_vcpu_info. Not sure if something
> similar is possible under Linux, but it seems easier and doesn't
> require any Xen-side changes.
>
> Roger.
* Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-09  7:37 ` Roger Pau Monné
From: Roger Pau Monné @ 2014-04-09 7:37 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: konrad, xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>> On 08/04/14 19:25, konrad@kernel.org wrote:
>>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>
>>> When we migrate an HVM guest, by default our shared_info can
>>> only hold up to 32 CPUs. As such the hypercall
>>> VCPUOP_register_vcpu_info was introduced which allowed us to
>>> setup per-page areas for VCPUs. This means we can boot a PVHVM
>>> guest with more than 32 VCPUs. During migration the per-cpu
>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>>> is set to INVALID_MFN) so that the newly migrated guest
>>> can make the VCPUOP_register_vcpu_info hypercall.
>>>
>>> Unfortunately we end up triggering this condition:
>>>     /* Run this command on yourself or on other offline VCPUS. */
>>>     if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>
>>> which means we are unable to setup the per-cpu VCPU structures
>>> for running vCPUS. The Linux PV code paths make this work by
>>> iterating over every vCPU with:
>>>
>>> 1) is the target CPU up? (VCPUOP_is_up hypercall)
>>> 2) if yes, then VCPUOP_down to pause it.
>>> 3) VCPUOP_register_vcpu_info
>>> 4) if we brought it down, then VCPUOP_up to bring it back up
>>>
>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>> not allowed on HVM guests we can't do this. This patch
>>> enables them.
>>
>> Hmmm, this looks like a very convoluted approach to something that
>> could be solved more easily IMHO. What we do on FreeBSD is put all
>> vCPUs into suspension, which means that all vCPUs except vCPU#0 will
>> be in the cpususpend_handler, see:
>>
>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>
> How do you 'suspend' them? If I remember right there is a disadvantage
> to doing this, as you have to bring all the CPUs "offline". In Linux
> that means using stop_machine, which is a pretty big hammer and
> increases the latency for migration.

In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:

http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289

Which makes all APs call cpususpend_handler, so we know all APs are
stuck in a while loop with interrupts disabled:

http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459

Then on resume the APs are taken out of the while loop, and the first
thing they do before returning from the IPI handler is register the new
per-cpu vcpu_info area. But I'm not sure this is something that can be
accomplished easily on Linux.

I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
8-way box, and it seems to be working fine :).

Roger.
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-09 15:34 ` Konrad Rzeszutek Wilk
From: Konrad Rzeszutek Wilk @ 2014-04-09 15:34 UTC (permalink / raw)
To: Roger Pau Monné
Cc: keir, linux-kernel, konrad, david.vrabel, jbeulich, xen-devel, boris.ostrovsky

On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>>> On 08/04/14 19:25, konrad@kernel.org wrote:
>>>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>
>>>> When we migrate an HVM guest, by default our shared_info can
>>>> only hold up to 32 CPUs. As such the hypercall
>>>> VCPUOP_register_vcpu_info was introduced which allowed us to
>>>> setup per-page areas for VCPUs. This means we can boot a PVHVM
>>>> guest with more than 32 VCPUs. During migration the per-cpu
>>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>>>> is set to INVALID_MFN) so that the newly migrated guest
>>>> can make the VCPUOP_register_vcpu_info hypercall.
>>>>
>>>> Unfortunately we end up triggering this condition:
>>>>     /* Run this command on yourself or on other offline VCPUS. */
>>>>     if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>>
>>>> which means we are unable to setup the per-cpu VCPU structures
>>>> for running vCPUS. The Linux PV code paths make this work by
>>>> iterating over every vCPU with:
>>>>
>>>> 1) is the target CPU up? (VCPUOP_is_up hypercall)
>>>> 2) if yes, then VCPUOP_down to pause it.
>>>> 3) VCPUOP_register_vcpu_info
>>>> 4) if we brought it down, then VCPUOP_up to bring it back up
>>>>
>>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>>> not allowed on HVM guests we can't do this. This patch
>>>> enables them.
>>>
>>> Hmmm, this looks like a very convoluted approach to something that
>>> could be solved more easily IMHO. What we do on FreeBSD is put all
>>> vCPUs into suspension, which means that all vCPUs except vCPU#0 will
>>> be in the cpususpend_handler, see:
>>>
>>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>
>> How do you 'suspend' them? If I remember right there is a disadvantage
>> to doing this, as you have to bring all the CPUs "offline". In Linux
>> that means using stop_machine, which is a pretty big hammer and
>> increases the latency for migration.
>
> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
>
> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
>
> Which makes all APs call cpususpend_handler, so we know all APs are
> stuck in a while loop with interrupts disabled:
>
> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
>
> Then on resume the APs are taken out of the while loop, and the first
> thing they do before returning from the IPI handler is register the new
> per-cpu vcpu_info area. But I'm not sure this is something that can be
> accomplished easily on Linux.

That is a bit of what 'stop_machine' would do. It puts all of the CPUs
in whatever function you want. But I am not sure of the latency impact -
as in, what if the migration takes longer and all of the CPUs sit there
spinning? Another variant of that is 'smp_call_function'.

Then when we resume we would need a shared mailbox (easy enough, I
think) to tell us that the migration has been done, and then we would
need to call VCPUOP_register_vcpu_info. But if the migration has taken
quite long I fear that the watchdogs might kick in and start complaining
about stuck CPUs, especially if we are migrating an overcommitted guest.
With this patch the latency for the vCPUs to be 'paused', 'initted',
and 'unpaused' is, I think, much much smaller.

Ugh, let's postpone this exercise of using 'smp_call_function' until the
end of the summer and see. That functionality should be shared with the
PV code path IMHO.

> I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
> 8-way box, and it seems to be working fine :).

Awesome!

> Roger.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.
  2014-04-09 15:38 ` David Vrabel
From: David Vrabel @ 2014-04-09 15:38 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Roger Pau Monné, konrad, xen-devel, boris.ostrovsky, linux-kernel, keir, jbeulich

On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
>> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>>>> On 08/04/14 19:25, konrad@kernel.org wrote:
>>>>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>>
>>>>> When we migrate an HVM guest, by default our shared_info can
>>>>> only hold up to 32 CPUs. As such the hypercall
>>>>> VCPUOP_register_vcpu_info was introduced which allowed us to
>>>>> setup per-page areas for VCPUs. This means we can boot a PVHVM
>>>>> guest with more than 32 VCPUs. During migration the per-cpu
>>>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>>>>> is set to INVALID_MFN) so that the newly migrated guest
>>>>> can make the VCPUOP_register_vcpu_info hypercall.
>>>>>
>>>>> Unfortunately we end up triggering this condition:
>>>>>     /* Run this command on yourself or on other offline VCPUS. */
>>>>>     if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>>>
>>>>> which means we are unable to setup the per-cpu VCPU structures
>>>>> for running vCPUS. The Linux PV code paths make this work by
>>>>> iterating over every vCPU with:
>>>>>
>>>>> 1) is the target CPU up? (VCPUOP_is_up hypercall)
>>>>> 2) if yes, then VCPUOP_down to pause it.
>>>>> 3) VCPUOP_register_vcpu_info
>>>>> 4) if we brought it down, then VCPUOP_up to bring it back up
>>>>>
>>>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>>>> not allowed on HVM guests we can't do this. This patch
>>>>> enables them.
>>>>
>>>> Hmmm, this looks like a very convoluted approach to something that
>>>> could be solved more easily IMHO. What we do on FreeBSD is put all
>>>> vCPUs into suspension, which means that all vCPUs except vCPU#0 will
>>>> be in the cpususpend_handler, see:
>>>>
>>>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>>
>>> How do you 'suspend' them? If I remember right there is a disadvantage
>>> to doing this, as you have to bring all the CPUs "offline". In Linux
>>> that means using stop_machine, which is a pretty big hammer and
>>> increases the latency for migration.
>>
>> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
>>
>> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
>>
>> Which makes all APs call cpususpend_handler, so we know all APs are
>> stuck in a while loop with interrupts disabled:
>>
>> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
>>
>> Then on resume the APs are taken out of the while loop, and the first
>> thing they do before returning from the IPI handler is register the new
>> per-cpu vcpu_info area. But I'm not sure this is something that can be
>> accomplished easily on Linux.
>
> That is a bit of what 'stop_machine' would do. It puts all of the CPUs
> in whatever function you want. But I am not sure of the latency impact -
> as in, what if the migration takes longer and all of the CPUs sit there
> spinning? Another variant of that is 'smp_call_function'.

I tested stop_machine() on all CPUs during suspend once and it was
awful: 100s of ms of additional downtime. Perhaps a hand-rolled
IPI-and-park-in-handler would be quicker than the full stop_machine().

David
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-09 15:38 ` David Vrabel @ 2014-04-09 15:55 ` Konrad Rzeszutek Wilk 2014-04-09 15:55 ` [Xen-devel] " Konrad Rzeszutek Wilk 1 sibling, 0 replies; 36+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-09 15:55 UTC (permalink / raw) To: David Vrabel Cc: keir, linux-kernel, konrad, jbeulich, xen-devel, boris.ostrovsky, Roger Pau Monné On Wed, Apr 09, 2014 at 04:38:37PM +0100, David Vrabel wrote: > On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote: > > On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote: > >> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote: > >>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote: > >>>> On 08/04/14 19:25, konrad@kernel.org wrote: > >>>>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > >>>>> > >>>>> When we migrate an HVM guest, by default our shared_info can > >>>>> only hold up to 32 CPUs. As such the hypercall > >>>>> VCPUOP_register_vcpu_info was introduced which allowed us to > >>>>> setup per-page areas for VCPUs. This means we can boot PVHVM > >>>>> guest with more than 32 VCPUs. During migration the per-cpu > >>>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn > >>>>> is set to INVALID_MFN) so that the newly migrated guest > >>>>> can do make the VCPUOP_register_vcpu_info hypercall. > >>>>> > >>>>> Unfortunatly we end up triggering this condition: > >>>>> /* Run this command on yourself or on other offline VCPUS. */ > >>>>> if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) > >>>>> > >>>>> which means we are unable to setup the per-cpu VCPU structures > >>>>> for running vCPUS. The Linux PV code paths make this work by > >>>>> iterating over every vCPU with: > >>>>> > >>>>> 1) is target CPU up (VCPUOP_is_up hypercall?) > >>>>> 2) if yes, then VCPUOP_down to pause it. 
> >>>>> 3) VCPUOP_register_vcpu_info > >>>>> 4) if it was down, then VCPUOP_up to bring it back up > >>>>> > >>>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are > >>>>> not allowed on HVM guests we can't do this. This patch > >>>>> enables this. > >>>> > >>>> Hmmm, this looks like a very convoluted approach to something that could > >>>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into > >>>> suspension, which means that all vCPUs except vCPU#0 will be in the > >>>> cpususpend_handler, see: > >>>> > >>>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 > >>> > >>> How do you 'suspend' them? If I remember there is a disadvantage of doing > >>> this as you have to bring all the CPUs "offline". That in Linux means using > >>> the stop_machine which is pretty big hammer and increases the latency for migration. > >> > >> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0: > >> > >> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289 > >> > >> Which makes all APs call cpususpend_handler, so we know all APs are > >> stuck in a while loop with interrupts disabled: > >> > >> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459 > >> > >> Then on resume the APs are taken out of the while loop and the first > >> thing they do before returning from the IPI handler is registering the > >> new per-cpu vcpu_info area. But I'm not sure this is something that can > >> be accomplished easily on Linux. > > > > That is a bit of what the 'stop_machine' would do. It puts all of the > > CPUs in whatever function you want. But I am not sure of the latency impact - as > > in what if the migration takes longer and all of the CPUs sit there spinning. > > Another variant of that is the 'smp_call_function'. > > I tested stop_machine() on all CPUs during suspend once and it was > awful: 100s of ms of additional downtime. Yikes. 
> > Perhaps a hand-rolled IPI-and-park-in-handler would be quicker the full > stop_machine(). But that is clearly a bigger patch than this little bug-fix. Do you want to just take this patch as is and then later on I can work on prototyping the 'IPI-and-park-in-handler'? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-08 18:53 ` Konrad Rzeszutek Wilk 2014-04-09 7:37 ` Roger Pau Monné @ 2014-04-09 7:37 ` Roger Pau Monné 2014-04-09 8:33 ` Ian Campbell 2014-04-09 8:33 ` [Xen-devel] " Ian Campbell 3 siblings, 0 replies; 36+ messages in thread From: Roger Pau Monné @ 2014-04-09 7:37 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: keir, linux-kernel, konrad, david.vrabel, jbeulich, xen-devel, boris.ostrovsky On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote: >> On 08/04/14 19:25, konrad@kernel.org wrote: >>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >>> >>> When we migrate an HVM guest, by default our shared_info can >>> only hold up to 32 CPUs. As such the hypercall >>> VCPUOP_register_vcpu_info was introduced which allowed us to >>> setup per-page areas for VCPUs. This means we can boot PVHVM >>> guest with more than 32 VCPUs. During migration the per-cpu >>> structure is allocated fresh by the hypervisor (vcpu_info_mfn >>> is set to INVALID_MFN) so that the newly migrated guest >>> can do make the VCPUOP_register_vcpu_info hypercall. >>> >>> Unfortunatly we end up triggering this condition: >>> /* Run this command on yourself or on other offline VCPUS. */ >>> if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) >>> >>> which means we are unable to setup the per-cpu VCPU structures >>> for running vCPUS. The Linux PV code paths make this work by >>> iterating over every vCPU with: >>> >>> 1) is target CPU up (VCPUOP_is_up hypercall?) >>> 2) if yes, then VCPUOP_down to pause it. >>> 3) VCPUOP_register_vcpu_info >>> 4) if it was down, then VCPUOP_up to bring it back up >>> >>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are >>> not allowed on HVM guests we can't do this. This patch >>> enables this. >> >> Hmmm, this looks like a very convoluted approach to something that could >> be solved more easily IMHO. 
What we do on FreeBSD is put all vCPUs into >> suspension, which means that all vCPUs except vCPU#0 will be in the >> cpususpend_handler, see: >> >> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 > > How do you 'suspend' them? If I remember there is a disadvantage of doing > this as you have to bring all the CPUs "offline". That in Linux means using > the stop_machine which is pretty big hammer and increases the latency for migration. In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0: http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289 Which makes all APs call cpususpend_handler, so we know all APs are stuck in a while loop with interrupts disabled: http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459 Then on resume the APs are taken out of the while loop and the first thing they do before returning from the IPI handler is registering the new per-cpu vcpu_info area. But I'm not sure this is something that can be accomplished easily on Linux. I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my 8-way box, and it seems to be working fine :). Roger. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-08 18:53 ` Konrad Rzeszutek Wilk 2014-04-09 7:37 ` Roger Pau Monné 2014-04-09 7:37 ` Roger Pau Monné @ 2014-04-09 8:33 ` Ian Campbell 2014-04-09 8:33 ` [Xen-devel] " Ian Campbell 3 siblings, 0 replies; 36+ messages in thread From: Ian Campbell @ 2014-04-09 8:33 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: keir, linux-kernel, konrad, david.vrabel, jbeulich, xen-devel, boris.ostrovsky, Roger Pau Monné On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote: > > On 08/04/14 19:25, konrad@kernel.org wrote: > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > > > > > When we migrate an HVM guest, by default our shared_info can > > > only hold up to 32 CPUs. As such the hypercall > > > VCPUOP_register_vcpu_info was introduced which allowed us to > > > setup per-page areas for VCPUs. This means we can boot PVHVM > > > guest with more than 32 VCPUs. During migration the per-cpu > > > structure is allocated fresh by the hypervisor (vcpu_info_mfn > > > is set to INVALID_MFN) so that the newly migrated guest > > > can do make the VCPUOP_register_vcpu_info hypercall. > > > > > > Unfortunatly we end up triggering this condition: > > > /* Run this command on yourself or on other offline VCPUS. */ > > > if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) > > > > > > which means we are unable to setup the per-cpu VCPU structures > > > for running vCPUS. The Linux PV code paths make this work by > > > iterating over every vCPU with: > > > > > > 1) is target CPU up (VCPUOP_is_up hypercall?) > > > 2) if yes, then VCPUOP_down to pause it. > > > 3) VCPUOP_register_vcpu_info > > > 4) if it was down, then VCPUOP_up to bring it back up > > > > > > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are > > > not allowed on HVM guests we can't do this. This patch > > > enables this. 
> > > > Hmmm, this looks like a very convoluted approach to something that could > > be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into > > suspension, which means that all vCPUs except vCPU#0 will be in the > > cpususpend_handler, see: > > > > http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 > > How do you 'suspend' them? If I remember there is a disadvantage of doing > this as you have to bring all the CPUs "offline". That in Linux means using > the stop_machine which is pretty big hammer and increases the latency for migration. Yes, this is why the ability to have the toolstack save/restore the secondary vcpu state was added. It's especially important for checkpointing, but it's relevant to regular migrate as a performance improvement too. It's not just stop-machine, IIRC it's a tonne of udev events relating to cpus off/onlinign etc too and all the userspace activity which that implies. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-09 8:33 ` [Xen-devel] " Ian Campbell @ 2014-04-09 9:04 ` Roger Pau Monné 2014-04-09 9:04 ` [Xen-devel] " Roger Pau Monné 1 sibling, 0 replies; 36+ messages in thread From: Roger Pau Monné @ 2014-04-09 9:04 UTC (permalink / raw) To: Ian Campbell, Konrad Rzeszutek Wilk Cc: keir, linux-kernel, konrad, david.vrabel, jbeulich, xen-devel, boris.ostrovsky On 09/04/14 10:33, Ian Campbell wrote: > On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote: >> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote: >>> On 08/04/14 19:25, konrad@kernel.org wrote: >>>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >>>> >>>> When we migrate an HVM guest, by default our shared_info can >>>> only hold up to 32 CPUs. As such the hypercall >>>> VCPUOP_register_vcpu_info was introduced which allowed us to >>>> setup per-page areas for VCPUs. This means we can boot PVHVM >>>> guest with more than 32 VCPUs. During migration the per-cpu >>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn >>>> is set to INVALID_MFN) so that the newly migrated guest >>>> can do make the VCPUOP_register_vcpu_info hypercall. >>>> >>>> Unfortunatly we end up triggering this condition: >>>> /* Run this command on yourself or on other offline VCPUS. */ >>>> if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) >>>> >>>> which means we are unable to setup the per-cpu VCPU structures >>>> for running vCPUS. The Linux PV code paths make this work by >>>> iterating over every vCPU with: >>>> >>>> 1) is target CPU up (VCPUOP_is_up hypercall?) >>>> 2) if yes, then VCPUOP_down to pause it. >>>> 3) VCPUOP_register_vcpu_info >>>> 4) if it was down, then VCPUOP_up to bring it back up >>>> >>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are >>>> not allowed on HVM guests we can't do this. This patch >>>> enables this. 
>>> >>> Hmmm, this looks like a very convoluted approach to something that could >>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into >>> suspension, which means that all vCPUs except vCPU#0 will be in the >>> cpususpend_handler, see: >>> >>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 >> >> How do you 'suspend' them? If I remember there is a disadvantage of doing >> this as you have to bring all the CPUs "offline". That in Linux means using >> the stop_machine which is pretty big hammer and increases the latency for migration. > > Yes, this is why the ability to have the toolstack save/restore the > secondary vcpu state was added. It's especially important for > checkpointing, but it's relevant to regular migrate as a performance > improvement too. > > It's not just stop-machine, IIRC it's a tonne of udev events relating to > cpus off/onlinign etc too and all the userspace activity which that > implies. Well, what's done on FreeBSD is nothing like that: it's called the cpususpend handler, but it's not off-lining CPUs or anything like that; it just places the CPU in a while loop inside an IPI handler, so we can do something like this with all APs: while (suspended) pause(); register_vcpu_info(); So the registration of the vcpu_info area happens just after the CPU is woken from suspension and before it leaves the IPI handler, and it's the CPU itself that calls VCPUOP_register_vcpu_info (so we can avoid the gate in Xen that prevents registering the vcpu_info area for vCPUs other than ourselves). Roger. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-08 18:18 ` [Xen-devel] " Roger Pau Monné 2014-04-08 18:53 ` Konrad Rzeszutek Wilk @ 2014-04-08 18:53 ` Konrad Rzeszutek Wilk 1 sibling, 0 replies; 36+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-08 18:53 UTC (permalink / raw) To: Roger Pau Monné Cc: keir, linux-kernel, konrad, david.vrabel, jbeulich, xen-devel, boris.ostrovsky On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote: > On 08/04/14 19:25, konrad@kernel.org wrote: > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > > > When we migrate an HVM guest, by default our shared_info can > > only hold up to 32 CPUs. As such the hypercall > > VCPUOP_register_vcpu_info was introduced which allowed us to > > setup per-page areas for VCPUs. This means we can boot PVHVM > > guest with more than 32 VCPUs. During migration the per-cpu > > structure is allocated fresh by the hypervisor (vcpu_info_mfn > > is set to INVALID_MFN) so that the newly migrated guest > > can do make the VCPUOP_register_vcpu_info hypercall. > > > > Unfortunatly we end up triggering this condition: > > /* Run this command on yourself or on other offline VCPUS. */ > > if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) > > > > which means we are unable to setup the per-cpu VCPU structures > > for running vCPUS. The Linux PV code paths make this work by > > iterating over every vCPU with: > > > > 1) is target CPU up (VCPUOP_is_up hypercall?) > > 2) if yes, then VCPUOP_down to pause it. > > 3) VCPUOP_register_vcpu_info > > 4) if it was down, then VCPUOP_up to bring it back up > > > > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are > > not allowed on HVM guests we can't do this. This patch > > enables this. > > Hmmm, this looks like a very convoluted approach to something that could > be solved more easily IMHO. 
What we do on FreeBSD is put all vCPUs into > suspension, which means that all vCPUs except vCPU#0 will be in the > cpususpend_handler, see: > > http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 How do you 'suspend' them? If I remember there is a disadvantage of doing this as you have to bring all the CPUs "offline". That in Linux means using the stop_machine which is pretty big hammer and increases the latency for migration. > > Then on resume we unblock the "suspended" CPUs, and the first thing they > do is call cpu_ops.cpu_resume which is basically going to setup the > vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar > is possible under Linux, but it seems easier and doesn't require any > Xen-side changes. > > Roger. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-08 17:25 ` konrad 2014-04-08 18:18 ` [Xen-devel] " Roger Pau Monné @ 2014-04-08 18:18 ` Roger Pau Monné 2014-04-09 9:06 ` Jan Beulich 2014-04-09 9:06 ` Jan Beulich 3 siblings, 0 replies; 36+ messages in thread From: Roger Pau Monné @ 2014-04-08 18:18 UTC (permalink / raw) To: konrad, xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich On 08/04/14 19:25, konrad@kernel.org wrote: > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > When we migrate an HVM guest, by default our shared_info can > only hold up to 32 CPUs. As such the hypercall > VCPUOP_register_vcpu_info was introduced which allowed us to > setup per-page areas for VCPUs. This means we can boot PVHVM > guest with more than 32 VCPUs. During migration the per-cpu > structure is allocated fresh by the hypervisor (vcpu_info_mfn > is set to INVALID_MFN) so that the newly migrated guest > can do make the VCPUOP_register_vcpu_info hypercall. > > Unfortunatly we end up triggering this condition: > /* Run this command on yourself or on other offline VCPUS. */ > if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) > > which means we are unable to setup the per-cpu VCPU structures > for running vCPUS. The Linux PV code paths make this work by > iterating over every vCPU with: > > 1) is target CPU up (VCPUOP_is_up hypercall?) > 2) if yes, then VCPUOP_down to pause it. > 3) VCPUOP_register_vcpu_info > 4) if it was down, then VCPUOP_up to bring it back up > > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are > not allowed on HVM guests we can't do this. This patch > enables this. Hmmm, this looks like a very convoluted approach to something that could be solved more easily IMHO. 
What we do on FreeBSD is put all vCPUs into suspension, which means that all vCPUs except vCPU#0 will be in the cpususpend_handler, see: http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460 Then on resume we unblock the "suspended" CPUs, and the first thing they do is call cpu_ops.cpu_resume which is basically going to setup the vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar is possible under Linux, but it seems easier and doesn't require any Xen-side changes. Roger. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-08 17:25 ` konrad 2014-04-08 18:18 ` [Xen-devel] " Roger Pau Monné 2014-04-08 18:18 ` Roger Pau Monné @ 2014-04-09 9:06 ` Jan Beulich 2014-04-09 9:06 ` Jan Beulich 3 siblings, 0 replies; 36+ messages in thread From: Jan Beulich @ 2014-04-09 9:06 UTC (permalink / raw) To: konrad; +Cc: keir, linux-kernel, david.vrabel, xen-devel, boris.ostrovsky >>> On 08.04.14 at 19:25, <konrad@kernel.org> wrote: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -3470,6 +3470,9 @@ static long hvm_vcpu_op( > case VCPUOP_stop_singleshot_timer: > case VCPUOP_register_vcpu_info: > case VCPUOP_register_vcpu_time_memory_area: > + case VCPUOP_down: > + case VCPUOP_up: > + case VCPUOP_is_up: This, if I checked it properly, leaves only VCPUOP_initialise, VCPUOP_send_nmi, and VCPUOP_get_physid disallowed for HVM. None of which look inherently bad to be used by HVM (but VCPUOP_initialise certainly would need closer checking), so I wonder whether either the wrapper shouldn't be dropped altogether or at least be converted from a white list approach to a black list one. Jan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-09 9:06 ` Jan Beulich @ 2014-04-09 15:27 ` Konrad Rzeszutek Wilk 2014-04-09 15:36 ` Jan Beulich 2014-04-09 15:36 ` Jan Beulich 2014-04-09 15:27 ` Konrad Rzeszutek Wilk 1 sibling, 2 replies; 36+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-09 15:27 UTC (permalink / raw) To: Jan Beulich Cc: konrad, david.vrabel, xen-devel, boris.ostrovsky, linux-kernel, keir On Wed, Apr 09, 2014 at 10:06:12AM +0100, Jan Beulich wrote: > >>> On 08.04.14 at 19:25, <konrad@kernel.org> wrote: > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -3470,6 +3470,9 @@ static long hvm_vcpu_op( > > case VCPUOP_stop_singleshot_timer: > > case VCPUOP_register_vcpu_info: > > case VCPUOP_register_vcpu_time_memory_area: > > + case VCPUOP_down: > > + case VCPUOP_up: > > + case VCPUOP_is_up: > > This, if I checked it properly, leaves only VCPUOP_initialise, > VCPUOP_send_nmi, and VCPUOP_get_physid disallowed for HVM. > None of which look inherently bad to be used by HVM (but > VCPUOP_initialise certainly would need closer checking), so I > wonder whether either the wrapper shouldn't be dropped altogether > or at least be converted from a white list approach to a black list one. I was being conservative here because I did not want to allow the other ones without at least testing them. Perhaps that can be done as a separate patch and this just as a bug-fix? > > Jan > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-09 15:27 ` Konrad Rzeszutek Wilk @ 2014-04-09 15:36 ` Jan Beulich 2014-04-22 18:34 ` Konrad Rzeszutek Wilk 2014-04-22 18:34 ` Konrad Rzeszutek Wilk 2014-04-09 15:36 ` Jan Beulich 1 sibling, 2 replies; 36+ messages in thread From: Jan Beulich @ 2014-04-09 15:36 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: david.vrabel, konrad, xen-devel, boris.ostrovsky, linux-kernel, keir >>> On 09.04.14 at 17:27, <konrad.wilk@oracle.com> wrote: > On Wed, Apr 09, 2014 at 10:06:12AM +0100, Jan Beulich wrote: >> >>> On 08.04.14 at 19:25, <konrad@kernel.org> wrote: >> > --- a/xen/arch/x86/hvm/hvm.c >> > +++ b/xen/arch/x86/hvm/hvm.c >> > @@ -3470,6 +3470,9 @@ static long hvm_vcpu_op( >> > case VCPUOP_stop_singleshot_timer: >> > case VCPUOP_register_vcpu_info: >> > case VCPUOP_register_vcpu_time_memory_area: >> > + case VCPUOP_down: >> > + case VCPUOP_up: >> > + case VCPUOP_is_up: >> >> This, if I checked it properly, leaves only VCPUOP_initialise, >> VCPUOP_send_nmi, and VCPUOP_get_physid disallowed for HVM. >> None of which look inherently bad to be used by HVM (but >> VCPUOP_initialise certainly would need closer checking), so I >> wonder whether either the wrapper shouldn't be dropped altogether >> or at least be converted from a white list approach to a black list one. > > I was being conservative here because I did not want to allow the > other ones without at least testing it. > > Perhaps that can be done as a seperate patch and this just as > a bug-fix? I'm clearly not in favor of the patch as is - minimally I'd want it to convert the white list to a black list. And once you do this it would seem rather natural to not pointlessly add entries. Jan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-09 15:36 ` Jan Beulich @ 2014-04-22 18:34 ` Konrad Rzeszutek Wilk 2014-04-23 8:57 ` Jan Beulich 2014-04-23 8:57 ` Jan Beulich 2014-04-22 18:34 ` Konrad Rzeszutek Wilk 1 sibling, 2 replies; 36+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-22 18:34 UTC (permalink / raw) To: Jan Beulich Cc: david.vrabel, konrad, xen-devel, boris.ostrovsky, linux-kernel, keir On Wed, Apr 09, 2014 at 04:36:53PM +0100, Jan Beulich wrote: > >>> On 09.04.14 at 17:27, <konrad.wilk@oracle.com> wrote: > > On Wed, Apr 09, 2014 at 10:06:12AM +0100, Jan Beulich wrote: > >> >>> On 08.04.14 at 19:25, <konrad@kernel.org> wrote: > >> > --- a/xen/arch/x86/hvm/hvm.c > >> > +++ b/xen/arch/x86/hvm/hvm.c > >> > @@ -3470,6 +3470,9 @@ static long hvm_vcpu_op( > >> > case VCPUOP_stop_singleshot_timer: > >> > case VCPUOP_register_vcpu_info: > >> > case VCPUOP_register_vcpu_time_memory_area: > >> > + case VCPUOP_down: > >> > + case VCPUOP_up: > >> > + case VCPUOP_is_up: > >> > >> This, if I checked it properly, leaves only VCPUOP_initialise, > >> VCPUOP_send_nmi, and VCPUOP_get_physid disallowed for HVM. > >> None of which look inherently bad to be used by HVM (but > >> VCPUOP_initialise certainly would need closer checking), so I > >> wonder whether either the wrapper shouldn't be dropped altogether > >> or at least be converted from a white list approach to a black list one. > > > > I was being conservative here because I did not want to allow the > > other ones without at least testing it. > > > > Perhaps that can be done as a seperate patch and this just as > > a bug-fix? > > I'm clearly not in favor of the patch as is - minimally I'd want it to > convert the white list to a black list. And once you do this it would > seem rather natural to not pointlessly add entries. 
With this patch: diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index b5b92fe..5eee790 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3455,34 +3455,6 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } } -static long hvm_vcpu_op( - int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) -{ - long rc; - - switch ( cmd ) - { - case VCPUOP_register_runstate_memory_area: - case VCPUOP_get_runstate_info: - case VCPUOP_set_periodic_timer: - case VCPUOP_stop_periodic_timer: - case VCPUOP_set_singleshot_timer: - case VCPUOP_stop_singleshot_timer: - case VCPUOP_register_vcpu_info: - case VCPUOP_register_vcpu_time_memory_area: - case VCPUOP_down: - case VCPUOP_up: - case VCPUOP_is_up: - rc = do_vcpu_op(cmd, vcpuid, arg); - break; - default: - rc = -ENOSYS; - break; - } - - return rc; -} - typedef unsigned long hvm_hypercall_t( unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); @@ -3517,30 +3489,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return compat_memory_op(cmd, arg); } -static long hvm_vcpu_op_compat32( - int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) -{ - long rc; - - switch ( cmd ) - { - case VCPUOP_register_runstate_memory_area: - case VCPUOP_get_runstate_info: - case VCPUOP_set_periodic_timer: - case VCPUOP_stop_periodic_timer: - case VCPUOP_set_singleshot_timer: - case VCPUOP_stop_singleshot_timer: - case VCPUOP_register_vcpu_info: - case VCPUOP_register_vcpu_time_memory_area: - rc = compat_vcpu_op(cmd, vcpuid, arg); - break; - default: - rc = -ENOSYS; - break; - } - - return rc; -} static long hvm_physdev_op_compat32( int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) @@ -3563,7 +3511,7 @@ static long hvm_physdev_op_compat32( static hvm_hypercall_t *const hvm_hypercall64_table[NR_hypercalls] = { [ __HYPERVISOR_memory_op ] = (hvm_hypercall_t *)hvm_memory_op, [ __HYPERVISOR_grant_table_op ] = (hvm_hypercall_t *)hvm_grant_table_op, 
- [ __HYPERVISOR_vcpu_op ] = (hvm_hypercall_t *)hvm_vcpu_op, + HYPERCALL(vcpu_op), [ __HYPERVISOR_physdev_op ] = (hvm_hypercall_t *)hvm_physdev_op, HYPERCALL(xen_version), HYPERCALL(console_io), @@ -3583,7 +3531,7 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = { [ __HYPERVISOR_memory_op ] = (hvm_hypercall_t *)hvm_memory_op_compat32, [ __HYPERVISOR_grant_table_op ] = (hvm_hypercall_t *)hvm_grant_table_op_compat32, - [ __HYPERVISOR_vcpu_op ] = (hvm_hypercall_t *)hvm_vcpu_op_compat32, + COMPAT_CALL(vcpu_op), [ __HYPERVISOR_physdev_op ] = (hvm_hypercall_t *)hvm_physdev_op_compat32, COMPAT_CALL(xen_version), HYPERCALL(console_io), And with an HVM guest poking at the rest of the VCPUOPs: VCPUOP_get_physid, VCPUOP_initialise, and VCPUOP_send_nmi - either before the CPU is up or when it is up - all work. That is: VCPUOP_get_physid would return -EINVAL; VCPUOP_initialise would return either -EEXIST or 0, and in either case the content of the ctxt was full of zeros. The VCPUOP_send_nmi did cause the HVM guest to get an NMI and it spat out 'Dazed and confused'. It also noticed corruption: [ 3.611742] Corrupted low memory at c000fffc (fffc phys) = 00029b00 [ 2.386785] Corrupted low memory at ffff88000000fff8 (fff8 phys) = 2990000000000 Which is odd because there does not seem to be anything in the hypervisor's path that would cause this. > > Jan > ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating. 2014-04-22 18:34 ` Konrad Rzeszutek Wilk @ 2014-04-23 8:57 ` Jan Beulich 2014-04-23 8:57 ` Jan Beulich 1 sibling, 0 replies; 36+ messages in thread From: Jan Beulich @ 2014-04-23 8:57 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: keir, linux-kernel, konrad, david.vrabel, xen-devel, boris.ostrovsky >>> On 22.04.14 at 20:34, <konrad.wilk@oracle.com> wrote: > With this patch: > [...] > And with an HVM guest poking at the rest of VCPUOPs: VCPUOP_get_physid, > VCPUOP_initialise, and VCPUOP_send_nmi - either before the CPU is up or > when it is up - work. > > That is: the VCPUOP_get_physid would return -EINVAL; VCPUOP_initialise > would return either -EEXIST or 0, and in either case > the content of the ctxt was full of zeros. Good. > The VCPUOP_send_nmi did cause the HVM to get an NMI and it spitted out > 'Dazed and confused'. It also noticed corruption: > > [ 3.611742] Corrupted low memory at c000fffc (fffc phys) = 00029b00 > [ 2.386785] Corrupted low memory at ffff88000000fff8 (fff8 phys) = > 2990000000000 > > Which is odd because there does not seem to be anything in the path > of hypervisor that would cause this. Indeed. This looks a little like a segment descriptor got modified here with a descriptor table base of zero and a selector of 0xfff8. That corruption needs to be hunted down in any case before enabling VCPUOP_send_nmi for HVM. Jan ^ permalink raw reply [flat|nested] 36+ messages in thread
* [LINUX PATCH 2/2] xen/pvhvm: Support more than 32 VCPUs when migrating. 2014-04-08 17:25 ` [PATCH] Fixes for more than 32 VCPUs migration for HVM guests (v1) konrad 2014-04-08 17:25 ` [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating konrad 2014-04-08 17:25 ` konrad @ 2014-04-08 17:25 ` konrad 2014-04-08 17:25 ` konrad 3 siblings, 0 replies; 36+ messages in thread From: konrad @ 2014-04-08 17:25 UTC (permalink / raw) To: xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> When Xen migrates an HVM guest, by default its shared_info can only hold up to 32 CPUs. As such the hypercall VCPUOP_register_vcpu_info was introduced which allowed us to set up per-page areas for VCPUs. This means we can boot PVHVM guests with more than 32 VCPUs. During migration the per-cpu structure is allocated freshly by the hypervisor (vcpu_info_mfn is set to INVALID_MFN) so that the newly migrated guest can make a VCPUOP_register_vcpu_info hypercall. Unfortunately we end up triggering this condition in Xen: /* Run this command on yourself or on other offline VCPUS. */ if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) which means we are unable to set up the per-cpu VCPU structures for running vCPUS. The Linux PV code paths make this work by iterating over every vCPU with: 1) is the target CPU up (VCPUOP_is_up hypercall)? 2) if yes, then VCPUOP_down to pause it. 3) VCPUOP_register_vcpu_info 4) if it was up before step 2, then VCPUOP_up to bring it back up But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are not allowed on HVM guests we can't do this. However with the Xen commit "hvm: Support more than 32 VCPUS when migrating." we can do this. As such first check if VCPUOP_is_up is actually possible before trying this dance. As most of this dance code is done already in 'xen_setup_vcpu' let's make it callable on both PV and HVM. This means moving one of the checks out to 'xen_setup_runstate_info'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/enlighten.c | 21 ++++++++++++++++----- arch/x86/xen/suspend.c | 6 +----- arch/x86/xen/time.c | 3 +++ 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 201d09a..af8be96 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -236,12 +236,23 @@ static void xen_vcpu_setup(int cpu) void xen_vcpu_restore(void) { int cpu; + bool vcpuops = true; + const struct cpumask *mask; - for_each_possible_cpu(cpu) { + mask = xen_pv_domain() ? cpu_possible_mask : cpu_online_mask; + + /* Only Xen 4.5 and higher supports this. */ + if (HYPERVISOR_vcpu_op(VCPUOP_is_up, smp_processor_id(), NULL) == -ENOSYS) + vcpuops = false; + + for_each_cpu(cpu, mask) { bool other_cpu = (cpu != smp_processor_id()); - bool is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL); + bool is_up = false; + + if (vcpuops) + is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL); - if (other_cpu && is_up && + if (vcpuops && other_cpu && is_up && HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL)) BUG(); @@ -250,7 +261,7 @@ void xen_vcpu_restore(void) if (have_vcpu_info_placement) xen_vcpu_setup(cpu); - if (other_cpu && is_up && + if (vcpuops && other_cpu && is_up && HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) BUG(); } @@ -1751,7 +1762,7 @@ void __ref xen_hvm_init_shared_info(void) for_each_online_cpu(cpu) { /* Leave it to be NULL. 
*/ if (cpu >= MAX_VIRT_CPUS) - continue; + per_cpu(xen_vcpu, cpu) = NULL; /* Triggers xen_vcpu_setup.*/ per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; } } diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 45329c8..6fb3298 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -33,11 +33,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) xen_hvm_init_shared_info(); xen_callback_vector(); xen_unplug_emulated_devices(); - if (xen_feature(XENFEAT_hvm_safe_pvclock)) { - for_each_online_cpu(cpu) { - xen_setup_runstate_info(cpu); - } - } + xen_vcpu_restore(); #endif } diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 7b78f88..d4feb2e 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -105,6 +105,9 @@ void xen_setup_runstate_info(int cpu) { struct vcpu_register_runstate_memory_area area; + if (xen_hvm_domain() && !(xen_feature(XENFEAT_hvm_safe_pvclock))) + return; + area.addr.v = &per_cpu(xen_runstate, cpu); if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [LINUX PATCH 2/2] xen/pvhvm: Support more than 32 VCPUs when migrating.
  2014-04-08 17:25 ` konrad
@ 2014-04-09  8:03 ` Jan Beulich
From: Jan Beulich @ 2014-04-09 8:03 UTC (permalink / raw)
To: konrad; +Cc: keir, linux-kernel, david.vrabel, xen-devel, boris.ostrovsky

>>> On 08.04.14 at 19:25, <konrad@kernel.org> wrote:
> +	/* Only Xen 4.5 and higher supports this. */
> +	if (HYPERVISOR_vcpu_op(VCPUOP_is_up, smp_processor_id(), NULL) == -ENOSYS)
> +		vcpuops = false;

Did you mean to say "for HVM guests" in the comment? And of course the
comment could quickly become stale if we backported the Xen side change
to e.g. 4.4.1.

Jan
* [PATCH] Fixes for more than 32 VCPUs migration for HVM guests (v1).
From: konrad @ 2014-04-08 17:25 UTC (permalink / raw)
To: xen-devel, david.vrabel, boris.ostrovsky, linux-kernel, keir, jbeulich

These two patches (one for Linux, one for Xen) allow PVHVM guests to use
the per-cpu VCPU mechanism after migration. Currently, when a PVHVM guest
migrates, all of the per-cpu information is lost and we fall back on the
shared_info structure, regardless of whether the HVM guest has 2 or 128
CPUs. Since that structure has an array for only 32 CPUs, a PVHVM guest
can only be migrated with up to 32 CPUs. These patches fix that and allow
more than 32 VCPUs to be migrated with PVHVM Linux guests.

The Linux diff is:

 arch/x86/xen/enlighten.c |   21 ++++++++++++++++-----
 arch/x86/xen/suspend.c   |    6 +-----
 arch/x86/xen/time.c      |    3 +++
 3 files changed, 20 insertions(+), 10 deletions(-)

while the Xen one is:

 xen/arch/x86/hvm/hvm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
end of thread, other threads:[~2014-04-23 8:57 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-04 20:44 33 VCPUs in HVM guests with live migration with Linux hangs Konrad Rzeszutek Wilk
2014-04-07  8:32 ` Ian Campbell
2014-04-08 17:25   ` [PATCH] Fixes for more than 32 VCPUs migration for HVM guests (v1) konrad
2014-04-08 17:25     ` [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating konrad
2014-04-08 17:25       ` konrad
2014-04-08 18:18       ` [Xen-devel] " Roger Pau Monné
2014-04-08 18:53         ` Konrad Rzeszutek Wilk
2014-04-09  7:37           ` Roger Pau Monné
2014-04-09 15:34             ` Konrad Rzeszutek Wilk
2014-04-09 15:34             ` [Xen-devel] " Konrad Rzeszutek Wilk
2014-04-09 15:38               ` David Vrabel
2014-04-09 15:55                 ` Konrad Rzeszutek Wilk
2014-04-09 15:55                 ` [Xen-devel] " Konrad Rzeszutek Wilk
2014-04-09 15:38               ` David Vrabel
2014-04-09  7:37           ` Roger Pau Monné
2014-04-09  8:33           ` Ian Campbell
2014-04-09  8:33           ` [Xen-devel] " Ian Campbell
2014-04-09  9:04             ` Roger Pau Monné
2014-04-09  9:04             ` [Xen-devel] " Roger Pau Monné
2014-04-08 18:53         ` Konrad Rzeszutek Wilk
2014-04-08 18:18       ` Roger Pau Monné
2014-04-09  9:06       ` Jan Beulich
2014-04-09  9:06       ` Jan Beulich
2014-04-09 15:27         ` Konrad Rzeszutek Wilk
2014-04-09 15:36           ` Jan Beulich
2014-04-22 18:34             ` Konrad Rzeszutek Wilk
2014-04-23  8:57               ` Jan Beulich
2014-04-23  8:57               ` Jan Beulich
2014-04-22 18:34             ` Konrad Rzeszutek Wilk
2014-04-09 15:36           ` Jan Beulich
2014-04-09 15:27         ` Konrad Rzeszutek Wilk
2014-04-08 17:25     ` [LINUX PATCH 2/2] xen/pvhvm: Support more than 32 VCPUs " konrad
2014-04-08 17:25       ` konrad
2014-04-09  8:03       ` Jan Beulich
2014-04-09  8:03       ` Jan Beulich
2014-04-08 17:25   ` [PATCH] Fixes for more than 32 VCPUs migration for HVM guests (v1) konrad