Reset problem vs. MMIO emulation, hypercalls, etc...

* Reset problem vs. MMIO emulation, hypercalls, etc...
@ 2012-08-01  3:17 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 62+ messages in thread
From: Benjamin Herrenschmidt @ 2012-08-01  3:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Alexander Graf, Paul Mackerras, kvm-ppc

Hi Avi !

We identified a problem on powerpc which seems to actually be a generic
issue, and Alex suggested we propose a generic fix. I want to make sure
we are on the right track first before proposing an actual patch as we
would like the patch to go in ASAP (ie not waiting the next merge
window) as it will fix an actual nasty bug with reset in KVM.

So the basic issue has to do with doing a machine reset as a result of a
hypervisor call, but the same problem should happen with MMIO/PIO
emulation.

After we do an exit as a result of such an operation, at the next
KVM_RUN, KVM will fetch the "results" of the operation (in the hypercall
case that's a bunch of register values, in the MMIO read emulation case
it's a single register value usually, x86 might have more subtle cases)
and we update the VCPU state (ie. registers) with that data.

However, what happens is that if a reset happens in between, we end up
clobbering the reset state.

IE. What happens in qemu is roughtly:

 - The hcall or MMIO that triggers the reset happens, goes to qemu,
which eventually calls qemu_system_reset_request()

 - This sets the global reset pending flag and wakes up the main loop.
It also does a stop of the current vcpu, so we do not return to the
kernel at this stage.

 - The main loop gets the flag, starts the reset process, which begins
with stopping all the VCPUs.

 - The reset handlers are called, which includes resetting the CPU
state, which in our case (powerpc) results in a SET_REGS ioctl to
establish a new fresh state for booting.

 - The generic code then restarts all VCPUs, which then return into
VCPU_RUN.

 - The VCPU(s) that did an exit as a result of MMIO emulation,
hypercall, or similiar (typically the one that triggered the reset but
possibly others) then gets some of their register state "updated" by the
result of the operation (in the hcall case, it's a field in the mmap'ed
run structure that clobbers GPR3 among others).

Now this is generally not a big issue as -usually- machines don't care
much about the state of registers on reset.

However it is for us, at least in a couple of cases because we build a
device-tree in qemu and we pass a pointer to it in GPR3 before we start
the firmware or the guest kernel.

This pointer is set in env->gpr[3] which then gets written by a reset
handler to KVM with SET_REGS, but that ends up being clobbered
eventually by the return from the hypercall.

My original idea to fix that was a ppc specific "trick" where we would
make SET_REGS clear the flags indicating that we had a pending hypercall
(or similar) and thus have SET_REGS have the side-effect of aborting a
current operation.

However, Alex reckons we should instead have a generic ioctl
(VCPU_ABORT_OPERATION) which qemu could use when there's a reset, that
we call to generically abort all such operations (MMIO emu, hcalls,
etc...) on all archs when qemu decides to reset the processors.

Alex and I have looked at various possible ways to try to fix that from
inside qemu but they are all extremely horrible & gross and might not
even work in the general case.

Avi, before I actually do a patch (that adds the generic definition,
doc, and powerpc implementation), I'd like to confirm with you that this
is the right approach and that this can benefit from expedited
upstreaming :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 62+ messages in thread