Date: Thu, 1 Dec 2016 14:16:15 +0100
From: Greg Kurz
Message-ID: <20161201141615.13c658cb@bahia>
In-Reply-To: <1479248275-18889-1-git-send-email-david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [RFCv2 00/12] Clean up compatibility mode handling
To: David Gibson
Cc: clg@kaod.org, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com, agraf@suse.de, qemu-ppc@nongnu.org, qemu-devel@nongnu.org, abologna@redhat.com, thuth@redhat.com, lvivier@redhat.com

On Wed, 16 Nov 2016 09:17:43 +1100
David Gibson wrote:

> This series is a significant rework to how we handle CPU compatibility
> modes on ppc.
>

David,

Please find below the results of the migration tests.

> * Information about compatibility modes was previously open coded and
>   scattered across a number of functions in both target-ppc and spapr
>   code.  It's now brought together into a common table of
>   compatibility modes.
>
> * There was significant conceptual confusion about what a
>   compatibility mode means, and how it interacts with the machine
>   type.  This cleans that up, clarifying that a compatibility mode
>   (as an externally set option) only makes sense on machine types
>   that don't permit the guest hypervisor privilege (i.e. 'pseries')
>
> * It was previously the user's (or management layer's) responsibility
>   to determine compatibility of CPUs on either end for migration.
>   This uses the compatibility modes to check that properly during an
>   incoming migration.
>
> * Some ill-considered sanity checks broke migration from 2.6 to 2.7,
>   due to some new instruction classes being added.  This should avoid
>   a repeat of that problem for 2.8 (we may be able to backport a
>   minimal subset to 2.7-stable to fix the existing problem).
>
> Patches 1-3 are preliminary cleanups which could stand on their own.
> Patches 4-12 are the compatibility mode cleanup proper.
>
> So far, this has been minimally tested.  There are quite a few
> migration cases to check.
> For example:
>
> Basic:
>
> 1) Boot guest with -cpu host
>    Should go into POWER8 compat mode after CAS
>    Previously would have been raw mode
>
> 2) Boot guest with -machine pseries,max-cpu-compat=power7 -cpu host
>    Should go into POWER7 compat mode
>
> 3) Boot guest with -cpu host,compat=power7
>    Should act as (2), but print a warning
>
> 4) Boot guest via libvirt with power7 compat mode specified in XML
>    Should act as (3), (2) once we fix libvirt
>
> 5) Hack guest to only advertise power7 compatibility, boot with -cpu host
>    Should go into POWER7 compat mode after CAS
>
> 6) Hack guest to only advertise real PVRs
>    Should remain in POWER8 raw mode after CAS
>
> 7) Hack guest to only advertise real PVRs
>    Boot with -machine pseries,max-cpu-compat=power8
>    Should fail at CAS time
>
> 8) Hack guest to only advertise power7 compatibility, boot with -cpu host
>    Reboot to normal guest
>    Should go to power7 compat mode after CAS of boot 1
>    Should revert to raw mode on reboot
>    Should go to power8 compat mode after CAS of boot 2
>
> Migration:
>

The QEMU command line used to test migration is as follows:

ppc64-softmmu/qemu-system-ppc64 \
    -snapshot \
    -nodefaults \
    -no-shutdown \
    -nographic \
    -device virtio-blk-pci,drive=drive0 \
    -drive file=/home/greg/images/fedora24-ppc64.qcow2,id=drive0,if=none \
    -global virtio-blk-pci.disable-legacy=off \
    -global virtio-blk-pci.disable-modern=on \
    -device virtio-net,netdev=netdev0,mac=C0:FF:EE:00:00:66,id=net0 \
    -netdev tap,id=netdev0,vhost=off,helper=/usr/libexec/qemu-bridge-helper \
    -global virtio-net-pci.disable-legacy=off \
    -global virtio-net-pci.disable-modern=on \
    -m 4G \
    -serial mon:stdio \
    -trace spapr_cas_pvr

Note that the virtio devices are explicitly configured to run in legacy
mode because I couldn't pass tests 15 and 16 otherwise, with various
issues including QEMU getting killed by the OOM killer! I'll focus on
these issues separately.
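For reference while reading the results, the CAS outcomes expected in tests 1 to 8 amount, as I read them, to a selection rule: pick the newest compatibility mode the guest advertises, within the host's and the max-cpu-compat limits, and fall back to raw mode only when no cap is set and the guest advertises the real PVR. The C sketch below encodes that rule under stated assumptions: cas_select_pvr(), CAS_RAW_MODE and CAS_FAILED are hypothetical names, not QEMU functions; the guest's CAS list is reduced to exact PVR values (the real ibm,client-architecture-support vector carries mask/value pairs); and the two logical PVRs are the ones printed by the spapr_cas_pvr traces in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Logical PVRs as printed by the spapr_cas_pvr traces in this report. */
#define PVR_COMPAT_POWER7 0x0f000003u
#define PVR_COMPAT_POWER8 0x0f000004u

/* Hypothetical results: 0 mirrors "new=0" (raw mode) in the traces. */
#define CAS_RAW_MODE 0x00000000u
#define CAS_FAILED   0xffffffffu

/* Known compatibility modes, oldest first; the index gives the ordering. */
static const uint32_t compat_modes[] = { PVR_COMPAT_POWER7, PVR_COMPAT_POWER8 };
#define N_COMPAT_MODES (sizeof(compat_modes) / sizeof(compat_modes[0]))

/* Position of a logical PVR in compat_modes[], or -1 if it is unknown. */
static int compat_rank(uint32_t pvr)
{
    for (size_t i = 0; i < N_COMPAT_MODES; i++) {
        if (compat_modes[i] == pvr) {
            return (int)i;
        }
    }
    return -1;
}

/* Whether the guest's CAS list contains exactly this PVR value. */
static int guest_offers(const uint32_t *pvrs, size_t n, uint32_t pvr)
{
    for (size_t i = 0; i < n; i++) {
        if (pvrs[i] == pvr) {
            return 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper, not QEMU code.
 * host_pvr:    real PVR of the host CPU (-cpu host)
 * host_compat: newest compatibility mode the host CPU can run
 * max_compat:  max-cpu-compat cap, or 0 when the option is not given
 * pvrs, n:     PVR values the guest advertised at CAS time
 */
uint32_t cas_select_pvr(uint32_t host_pvr, uint32_t host_compat,
                        uint32_t max_compat, const uint32_t *pvrs, size_t n)
{
    int limit = compat_rank(host_compat);

    assert(n == 0 || pvrs != NULL);
    if (max_compat != 0 && compat_rank(max_compat) < limit) {
        /* An unknown cap ranks -1, so no compat mode can match. */
        limit = compat_rank(max_compat);
    }
    /* Prefer the newest compatibility mode that the guest advertises. */
    for (int i = limit; i >= 0; i--) {
        if (guest_offers(pvrs, n, compat_modes[i])) {
            return compat_modes[i];
        }
    }
    /* Raw mode only if uncapped and the guest knows the real PVR. */
    if (max_compat == 0 && guest_offers(pvrs, n, host_pvr)) {
        return CAS_RAW_MODE;
    }
    return CAS_FAILED;
}
```

Fed with the POWER8 PVR from the register dump in test 11 (0x004d0200) and guest lists modelled on tests 1, 2, 5, 6 and 7, this sketch returns POWER8 compat, POWER7 compat, POWER7 compat, raw mode and failure respectively, matching the expectations quoted above.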
> 9) Boot guest with qemu-2.6 -machine pseries-2.6 -cpu host
>    Migrate to qemu-2.8 -machine pseries-2.6 -cpu host
>    Should work, end up running in power8 raw mode
>

== QEMU-2.6 ==
spapr_cas_pvr current=0, cpu_match=1, new=0, compat flags=6

== guest (source) ==
cpu : POWER8 (raw), altivec supported

== guest (target) ==
cpu : POWER8 (raw), altivec supported

> 10) Boot guest with qemu-2.7 -machine pseries-2.7 -cpu host
>     Migrate to qemu-2.8 -machine pseries-2.7 -cpu host
>     Should work, end up running in power8 raw mode
>

== QEMU-2.7 ==
spapr_cas_pvr current=0, cpu_match=1, new=0, compat flags=2000000000000006

== guest (source) ==
cpu : POWER8 (raw), altivec supported

== guest (target) ==
cpu : POWER8 (raw), altivec supported

> 11) Boot guest with qemu-2.7 -machine pseries-2.7 -cpu host,compat=power7
>     Migrate to qemu-2.8 -machine pseries-2.7 -cpu host,compat=power7
>     Should work, be running in POWER7 compat after, but give warning like
>     (3)
>

== QEMU-2.7 ==
spapr_cas_pvr current=f000003, cpu_match=1, new=f000003, compat flags=2000000000000006

== guest (source) ==
cpu : POWER7 (architected), altivec supported

== QEMU-2.8 ==
CPU 'compat' property is deprecated and has no effect; use max-cpu-compat machine property instead

Migration completes but the guest gets a program interrupt:

(qemu) info registers
NIP 0000000000000700   LR c0000000008309ac CTR c000000000830b40 XER 0000000020000000 CPU#0
MSR 8000000000001000 HID0 0000000000000000  HF 8000000000000000 iidx 3 didx 3
TB 00000000 00000000 DECR 00000000
GPR00 0000000000000000 0000000000000000 0000000000000000 000000007fef0000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000020000000 6000000060000000 6000000060006180
GPR12 c000000000081000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 20000000  [ E  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
 SRR0 6000000060006180  SRR1 c000000000081000    PVR 00000000004d0200 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
 CFAR 0000000000000000
 SDR1 0000000000000007   DAR 0000000000000000  DSISR 0000000000000000

The same happens with a pseries-2.6 machine. Do you have any suggestions
on how to debug this? The values in SRR0 and SRR1 look weird compared to
what is described in the ISA...

> 12) Boot guest with qemu-2.7 -machine pseries-2.7 -cpu host,compat=power7
>     Migrate to qemu-2.8 -machine pseries-2.7,max-cpu-compat=power7 -cpu host
>     Should work, be running in POWER7 compat after, no warning
>

Same as 11), except without the CPU 'compat' warning, for both
pseries-2.6 and pseries-2.7.

It seems to be related to the compat mode itself, as I also hit the
error when running with qemu-2.8 -machine pseries-2.8,max-cpu-compat=power8
on a POWER8 host.

> 13) Boot to SLOF with qemu-2.6 -machine pseries-2.6 -cpu host
>     Migrate to qemu-2.8 -machine pseries-2.6 -cpu host
>     ?
>

Migration succeeds, and typing 'boot' at the SLOF prompt successfully
boots the system:

== QEMU-2.8 ==
spapr_cas_pvr current=0, explicit_match=1, new=f000004

== guest (target) ==
cpu : POWER8 (architected), altivec supported

> 14) Boot to SLOF with qemu-2.7 -machine pseries-2.7 -cpu host
>     Migrate to qemu-2.8 -machine pseries-2.7 -cpu host
>     ?
>

Same as 13).

> 15) Boot to SLOF with qemu-2.7 -machine pseries-2.7 -cpu host,compat=power7
>     Migrate to qemu-2.8 -machine pseries-2.7 -cpu host,compat=power7
>     ?
>

== QEMU-2.8 ==
CPU 'compat' property is deprecated and has no effect; use max-cpu-compat machine property instead

Migration succeeds, but this time SLOF boots the system automatically:

spapr_cas_pvr current=f000003, explicit_match=1, new=f000003

== guest (target) ==
cpu : POWER7 (architected), altivec supported

The same happens with pseries-2.6.

> 16) Boot to SLOF with qemu-2.7 -machine pseries-2.7 -cpu host,compat=power7
>     Migrate to qemu-2.8 -machine pseries-2.7,max-cpu-compat=power7 -cpu host
>     ?
>

Same as 15), except without the CPU 'compat' warning, for both
pseries-2.6 and pseries-2.7.

> 17) Boot guest with qemu-2.6 -machine pseries-2.6 -cpu host
>     Migrate to qemu-2.7.z -machine pseries-2.6 -cpu host
>     Should work
>

It doesn't. Migration fails on the destination:

error while loading state for instance 0x0 of device 'cpu'
load of migration failed: Invalid argument

> 18) Hack guest to only advertise power7 compatibility, boot with -cpu host
>     Boot with qemu-2.8, migrate to qemu-2.8
>     Should be in power7 compat mode after CAS on source, and still
>     in power7 compat mode on destination
>

Same failure as 11).

Cheers.
--
Greg

> Changes since RFCv1:
>  * Change CAS logic to prefer compatibility modes over raw mode
>  * Simplified by giving up on half-hearted attempts to maintain
>    backwards migration
>  * Folded migration stream changes into a single patch
>  * Removed some preliminary patches which are already merged
>
> David Gibson (12):
>   pseries: Always use core objects for CPU construction
>   pseries: Make cpu_update during CAS unconditional
>   ppc: Clean up and QOMify hypercall emulation
>   ppc: Rename cpu_version to compat_pvr
>   ppc: Rewrite ppc_set_compat()
>   ppc: Rewrite ppc_get_compat_smt_threads()
>   ppc: Validate compatibility modes when setting
>   pseries: Rewrite CAS PVR compatibility logic
>   ppc: Add ppc_set_compat_all()
>   pseries: Move CPU compatibility property to machine
>   pseries: Reset CPU compatibility mode
>   ppc: Rework CPU compatibility testing across migration
>
>  hw/ppc/spapr.c              | 158 ++++++++++++++++------------
>  hw/ppc/spapr_cpu_core.c     |  85 ++++++++++-----
>  hw/ppc/spapr_hcall.c        | 140 +++++++------------------
>  hw/ppc/trace-events         |   2 +-
>  include/hw/ppc/spapr.h      |  12 ++-
>  target-ppc/Makefile.objs    |   2 +-
>  target-ppc/compat.c         | 249 ++++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/cpu.h            |  49 +++++++--
>  target-ppc/excp_helper.c    |  11 +-
>  target-ppc/kvm.c            |   4 +-
>  target-ppc/kvm_ppc.h        |   4 +-
>  target-ppc/machine.c        |  87 ++++++++++++++--
>  target-ppc/translate_init.c | 157 +++++++---------------------
>  13 files changed, 607 insertions(+), 353 deletions(-)
>  create mode 100644 target-ppc/compat.c
>
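The diffstat adds target-ppc/compat.c, presumably the common table of compatibility modes described in the first bullet of the cover letter, and the last patch reworks CPU compatibility testing across migration. A minimal C sketch of how such a table could drive an incoming-migration check follows. The CompatMode type, compat_by_name(), incoming_cpu_ok() and the acceptance rule (raw mode needs identical real PVRs; a compat mode must be supported by the destination CPU and must not exceed its max-cpu-compat cap) are illustrative assumptions, not the code in the series. The two logical PVRs come from the spapr_cas_pvr traces in this report, and the thread counts are the SMT4 and SMT8 limits of POWER7 and POWER8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry; real QEMU field names may differ. */
typedef struct {
    const char *name;  /* value given to max-cpu-compat */
    uint32_t pvr;      /* logical (architected) PVR */
    int max_threads;   /* SMT threads per core allowed in this mode */
} CompatMode;

/* Oldest first; the PVRs are the ones seen in the spapr_cas_pvr traces. */
static const CompatMode compat_table[] = {
    { "power7", 0x0f000003u, 4 },
    { "power8", 0x0f000004u, 8 },
};
#define N_COMPAT (sizeof(compat_table) / sizeof(compat_table[0]))

/* Index of a logical PVR in compat_table[], or -1 if it is not listed. */
static int compat_index(uint32_t pvr)
{
    for (size_t i = 0; i < N_COMPAT; i++) {
        if (compat_table[i].pvr == pvr) {
            return (int)i;
        }
    }
    return -1;
}

/* Look up a mode by the name used with max-cpu-compat; NULL if unknown. */
const CompatMode *compat_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_COMPAT; i++) {
        if (strcmp(compat_table[i].name, name) == 0) {
            return &compat_table[i];
        }
    }
    return NULL;
}

/*
 * Hypothetical check: can the destination accept the incoming CPU state?
 * compat_pvr: compatibility PVR from the migration stream (0 = raw mode)
 * src_pvr:    real PVR of the source CPU
 * dst_pvr:    real PVR of the destination CPU
 * dst_compat: newest compatibility mode the destination CPU supports
 * max_compat: destination max-cpu-compat cap, or 0 if unset
 */
bool incoming_cpu_ok(uint32_t compat_pvr, uint32_t src_pvr, uint32_t dst_pvr,
                     uint32_t dst_compat, uint32_t max_compat)
{
    int idx;

    if (compat_pvr == 0) {
        /* Raw mode: the guest sees the real PVR, so it must not change. */
        return src_pvr == dst_pvr;
    }
    idx = compat_index(compat_pvr);
    if (idx < 0 || idx > compat_index(dst_compat)) {
        return false;  /* unknown mode, or newer than the destination CPU */
    }
    if (max_compat != 0 && idx > compat_index(max_compat)) {
        return false;  /* above the cap, or the cap is not a known mode */
    }
    return true;
}
```

Keeping the table ordered oldest first lets a single index comparison express both "supported by the destination CPU" and "allowed by the cap"; the strict PVR equality for raw mode is a simplification of this sketch.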