From: Alexander Graf
Subject: Re: Reset problem vs. MMIO emulation, hypercalls, etc...
Date: Thu, 2 Aug 2012 12:46:03 +0200
Message-ID: <9FDC9D63-65EB-46B2-A4CE-AE34860EA59A@suse.de>
In-Reply-To: <1343791031.16975.41.camel@pasglop>
References: <1343791031.16975.41.camel@pasglop>
To: Benjamin Herrenschmidt
Cc: Avi Kivity, KVM list, Paul Mackerras, kvm-ppc@vger.kernel.org,
 Marcelo Tosatti, Jan Kiszka

Avi, Marcelo, Jan? Any thoughts on this?

Alex

On 01.08.2012, at 05:17, Benjamin Herrenschmidt wrote:

> Hi Avi !
>
> We identified a problem on powerpc which is actually a generic
> issue, and Alex suggested we propose a generic fix. I want to make
> sure we are on the right track before writing an actual patch, as
> we would like it to go in ASAP (i.e. not waiting for the next merge
> window), since it fixes a nasty bug with reset in KVM.
>
> The basic issue is about doing a machine reset as a result of a
> hypervisor call, but the same problem can happen with MMIO/PIO
> emulation.
>
> After we exit as a result of such an operation, at the next
> KVM_RUN, KVM fetches the "results" of the operation (in the
> hypercall case that's a bunch of register values; in the MMIO read
> emulation case it's usually a single register value; x86 might have
> more subtle cases) and updates the VCPU state (i.e. registers) with
> that data.
>
> However, if a reset happens in between, we end up clobbering the
> reset state.
>
> I.e. what happens in qemu is roughly:
>
> - The hcall or MMIO that triggers the reset happens, goes to qemu,
>   which eventually calls qemu_system_reset_request().
>
> - This sets the global reset-pending flag and wakes up the main
>   loop. It also stops the current vcpu, so we do not return to the
>   kernel at this stage.
>
> - The main loop sees the flag and starts the reset process, which
>   begins with stopping all the VCPUs.
>
> - The reset handlers are called, which includes resetting the CPU
>   state; in our case (powerpc) that results in a SET_REGS ioctl to
>   establish a fresh state for booting.
>
> - The generic code then restarts all VCPUs, which return into
>   VCPU_RUN.
>
> - The VCPU(s) that exited as a result of MMIO emulation, a
>   hypercall, or similar (typically the one that triggered the
>   reset, but possibly others) then get some of their register state
>   "updated" by the result of the operation (in the hcall case, it's
>   a field in the mmap'ed run structure that clobbers GPR3 among
>   others).
>
> Now, this is generally not a big issue, as machines usually don't
> care much about the state of registers on reset.
>
> However, it matters for us in at least a couple of cases, because
> we build a device tree in qemu and pass a pointer to it in GPR3
> before we start the firmware or the guest kernel.
>
> This pointer is set in env->gpr[3], which a reset handler then
> writes to KVM with SET_REGS, but it ends up being clobbered by the
> return from the hypercall.
>
> My original idea to fix this was a ppc-specific "trick" whereby
> SET_REGS would clear the flags indicating that we have a pending
> hypercall (or similar), thus giving SET_REGS the side effect of
> aborting the current operation.
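
(To make the clobbering path concrete: on powerpc, the completion
logic that runs at the next KVM_RUN looks roughly like the sketch
below. This is simplified from the kvm_arch_vcpu_ioctl_run() path in
arch/powerpc/kvm/powerpc.c of that era; exact code may differ, but
the flag-driven structure is the point.)

	int i;

	if (vcpu->mmio_needed) {
		/* An MMIO read was emulated in qemu: copy the data
		 * from the shared run structure into the register the
		 * load targeted. */
		if (!vcpu->mmio_is_write)
			kvmppc_complete_mmio_load(vcpu, run);
		vcpu->mmio_needed = 0;
	} else if (vcpu->arch.hcall_needed) {
		/* A PAPR hypercall was handled in qemu: write its
		 * return value and output arguments into GPR3..GPR12.
		 * If a reset handler did SET_REGS in between, this
		 * overwrites the freshly established boot state,
		 * which is the bug described above. */
		kvmppc_set_gpr(vcpu, 3, run->papr_hcall.ret);
		for (i = 0; i < 9; ++i)
			kvmppc_set_gpr(vcpu, 4 + i,
				       run->papr_hcall.args[i]);
		vcpu->arch.hcall_needed = 0;
	}

(The ppc-specific trick would amount to having SET_REGS clear
mmio_needed and hcall_needed, so this block becomes a no-op after a
reset.)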
>
> However, Alex reckons we should instead have a generic ioctl
> (VCPU_ABORT_OPERATION) which qemu could use when there is a reset,
> and which we would call to generically abort all such operations
> (MMIO emulation, hcalls, etc.) on all archs when qemu decides to
> reset the processors.
>
> Alex and I have looked at various possible ways to fix this from
> inside qemu, but they are all extremely horrible & gross and might
> not even work in the general case.
>
> Avi, before I actually write a patch (adding the generic
> definition, documentation, and the powerpc implementation), I'd
> like to confirm with you that this is the right approach and that
> it can benefit from expedited upstreaming :-)
>
> Cheers,
> Ben.
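
(For illustration, here is a rough sketch of what the proposed
generic ioctl could look like. Nothing below exists yet: the ioctl
name comes from the proposal above, while
kvm_arch_vcpu_abort_operation() and the qemu call site are
hypothetical; only the *_needed flags are real powerpc fields.)

	/* kernel, generic vcpu ioctl dispatcher (hypothetical case): */
	case KVM_VCPU_ABORT_OPERATION:
		/* Forget any in-flight userspace-emulated operation
		 * so its results are not applied at the next
		 * KVM_RUN. */
		vcpu->mmio_needed = 0;
		r = kvm_arch_vcpu_abort_operation(vcpu);
		break;

	/* powerpc arch hook (hypothetical name), clearing the
	 * per-arch "completion pending" flags: */
	int kvm_arch_vcpu_abort_operation(struct kvm_vcpu *vcpu)
	{
		vcpu->arch.hcall_needed = 0;
		vcpu->arch.osi_needed = 0;
		vcpu->arch.dcr_needed = 0;
		return 0;
	}

	/* qemu, in the per-vcpu reset path, before SET_REGS pushes
	 * the fresh boot state down (call site is an assumption): */
	kvm_vcpu_ioctl(env, KVM_VCPU_ABORT_OPERATION);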