All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
@ 2018-10-01 17:03 Peter Maydell
  2018-10-01 18:12 ` Alex Bennée
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2018-10-01 17:03 UTC (permalink / raw)
  To: QEMU Developers
  Cc: Paolo Bonzini, Richard Henderson, Alex Bennée, Emilio G. Cota

I've been investigating a race condition where sometimes when my
guest writes to a device register which triggers a
qemu_system_reset_request(), it doesn't actually cause a clean reset,
but instead the guest CPU continues to execute instructions.
I managed to repro it under 'rr', which let me walk through enough
of what was going on to determine the following:

When a guest CPU thread calls qemu_system_reset_request(), this
results in a call to qemu_cpu_stop(current_cpu, true), to
make the CPU come back out to the main loop. We also set the
reset_requested flag, to get the IO thread to actually do the
reset.

The main loop thread runs main_loop_should_exit(). If there is a
pending reset, it calls pause_all_vcpus(), with the intention
that this quiesces all the guest CPUs before it starts messing
with reset actions.

pause_all_vcpus() just waits for every cpu to have cpu->stopped set.
However, if the running cpu has just called qemu_cpu_stop() on
itself then it will have set cpu->stopped true but not actually
made it out to the main loop yet. (In the case I'm looking at,
what happens is that as soon as the CPU thread unlocks the
iothread mutex in io_writex() after the device write, the
main thread runs and does all the reset operations.)

The reset code in the iothread then proceeds to start calling
various reset functions while the CPU thread is still inside
the exec loop, running generated code and so on. This doesn't
seem like what ought to happen. In particular it includes
calling cpu_common_reset(), which clears all kinds of flags
relevant to the still-executing CPU...

Any suggestions for how we should fix this?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-01 17:03 [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop() Peter Maydell
@ 2018-10-01 18:12 ` Alex Bennée
  2018-10-02  8:01   ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2018-10-01 18:12 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Paolo Bonzini, Richard Henderson, Emilio G. Cota


Peter Maydell <peter.maydell@linaro.org> writes:

> I've been investigating a race condition where sometimes when my
> guest writes to a device register which triggers a
> qemu_system_reset_request(), it doesn't actually cause a clean reset,
> but instead the guest CPU continues to execute instructions.
> I managed to repro it under 'rr', which let me walk through enough
> of what was going on to determine the following:
>
> When a guest CPU thread calls qemu_system_reset_request(), this
> results in a call to qemu_cpu_stop(current_cpu, true), to
> make the CPU come back out to the main loop. We also set the
> reset_requested flag, to get the IO thread to actually do the
> reset.
>
> The main loop thread runs main_loop_should_exit(). If there is a
> pending reset, it calls pause_all_vcpus(), with the intention
> that this quiesces all the guest CPUs before it starts messing
> with reset actions.
>
> pause_all_vcpus() just waits for every cpu to have cpu->stopped set.
> However, if the running cpu has just called qemu_cpu_stop() on
> itself then it will have set cpu->stopped true but not actually
> made it out to the main loop yet. (In the case I'm looking at,
> what happens is that as soon as the CPU thread unlocks the
> iothread mutex in io_writex() after the device write, the
> main thread runs and does all the reset operations.)
>
> The reset code in the iothread then proceeds to start calling
> various reset functions while the CPU thread is still inside
> the exec loop, running generated code and so on. This doesn't
> seem like what ought to happen. In particular it includes
> calling cpu_common_reset(), which clears all kinds of flags
> relevant to the still-executing CPU...

I would have thought the reset code should be scheduled via safe async
work to run in the vCPU context. Why should the main loop get involved
at all here?

>
> Any suggestions for how we should fix this?
>
> thanks
> -- PMM


--
Alex Bennée

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-01 18:12 ` Alex Bennée
@ 2018-10-02  8:01   ` Peter Maydell
  2018-10-02  8:58     ` Paolo Bonzini
  2018-10-02 10:00     ` Alex Bennée
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Maydell @ 2018-10-02  8:01 UTC (permalink / raw)
  To: Alex Bennée
  Cc: QEMU Developers, Paolo Bonzini, Richard Henderson, Emilio G. Cota

On 1 October 2018 at 19:12, Alex Bennée <alex.bennee@linaro.org> wrote:
> I would have thought the reset code should be scheduled via safe async
> work to run in the vCPU context. Why should the main loop get involved
> at all here?

The reset code is much older than the safe-async support for
running things in the vCPU context... Also, does the safe
async support work with KVM/HAX/Hypervisor.Framework? The
reset code has to handle all those, not just TCG.

Plus, which vCPU thread would you use? We're resetting
the entire system, so privileging an arbitrary vCPU
thread to do that doesn't seem any less odd than using
the main loop thread.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02  8:01   ` Peter Maydell
@ 2018-10-02  8:58     ` Paolo Bonzini
  2018-10-02  9:04       ` Peter Maydell
  2018-10-02 10:00     ` Alex Bennée
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2018-10-02  8:58 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée
  Cc: QEMU Developers, Richard Henderson, Emilio G. Cota

On 02/10/2018 10:01, Peter Maydell wrote:
> On 1 October 2018 at 19:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>> I would have thought the reset code should be scheduled via safe async
>> work to run in the vCPU context. Why should the main loop get involved
>> at all here?
> The reset code is much older than the safe-async support for
> running things in the vCPU context... Also, does the safe
> async support work with KVM/HAX/Hypervisor.Framework? The
> reset code has to handle all those, not just TCG.
> 
> Plus, which vCPU thread would you use? We're resetting
> the entire system, so privileging an arbitrary vCPU
> thread to do that doesn't seem any less odd than using
> the main loop thread.

I think there's two parts in this.

First, the reset code should indeed use run_on_cpu (it need not be safe
i.e. stop-the-world; just run it in the vCPU thread).  It certainly
doesn't do this right now.

Second, when run_on_cpu's callback runs we should make sure that
cpu_can_run() is false.  I think that's already the case, but it's worth
asserting.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02  8:58     ` Paolo Bonzini
@ 2018-10-02  9:04       ` Peter Maydell
  2018-10-02  9:59         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2018-10-02  9:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, QEMU Developers, Richard Henderson, Emilio G. Cota

On 2 October 2018 at 09:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> First, the reset code should indeed use run_on_cpu (it need not be safe
> i.e. stop-the-world; just run it in the vCPU thread).  It certainly
> doesn't do this right now.

I don't understand this part. We're resetting the entire world:
surely we need to stop the entire world first ?

(Also, other things use pause_all_vcpus() and hit this race
condition, like VM suspend and shutdown.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02  9:04       ` Peter Maydell
@ 2018-10-02  9:59         ` Paolo Bonzini
  2018-10-02 10:34           ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2018-10-02  9:59 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, QEMU Developers, Richard Henderson, Emilio G. Cota

On 02/10/2018 11:04, Peter Maydell wrote:
> On 2 October 2018 at 09:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> First, the reset code should indeed use run_on_cpu (it need not be safe
>> i.e. stop-the-world; just run it in the vCPU thread).  It certainly
>> doesn't do this right now.
> 
> I don't understand this part. We're resetting the entire world:
> surely we need to stop the entire world first ?

Most of the world is stopped because it only runs with BQL taken.  vCPU
isn't, so we ensure it is stopped by: 1) using run_on_cpu to synchronize
with the executed TBs (or KVM_RUN) 2) ensuring the execution loop is
paused after reset, which is the cpu_can_run part that you snipped.

"Safe" CPU work items on the other hand ensure that _no_ vCPU is in the
execution loop, which is overkill here.

Paolo

> (Also, other things use pause_all_vcpus() and hit this race
> condition, like VM suspend and shutdown.)
> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02  8:01   ` Peter Maydell
  2018-10-02  8:58     ` Paolo Bonzini
@ 2018-10-02 10:00     ` Alex Bennée
  2018-10-02 10:31       ` Peter Maydell
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2018-10-02 10:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Paolo Bonzini, Richard Henderson, Emilio G. Cota


Peter Maydell <peter.maydell@linaro.org> writes:

> On 1 October 2018 at 19:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>> I would have thought the reset code should be scheduled via safe async
>> work to run in the vCPU context. Why should the main loop get involved
>> at all here?
>
> The reset code is much older than the safe-async support for
> running things in the vCPU context... Also, does the safe
> async support work with KVM/HAX/Hypervisor.Framework? The
> reset code has to handle all those, not just TCG.

the *_run_on_cpu functions should be safe for all users although KVM
stuff seems to use the direct run_on_cpu stuff more. The events are
consumed in the wait_io logic that all accelerators share - in the outer
loop in cpus-common.c

> Plus, which vCPU thread would you use?

Each vCPU should reset it's own data. For one thing it avoids issue
with barriers across threads.

> We're resetting
> the entire system, so privileging an arbitrary vCPU
> thread to do that doesn't seem any less odd than using
> the main loop thread.

Sure - but they do give predictable semantics. If in this case the cpu
sourcing the request scheduled async tasks to stop the cpu to everything
else and a safe task to it's own thread it can be assured everyone has
done their "work" (stopping in this case) and is in a known state.

Does qemu_system_reset_request() make any particular promises of what
order things should happen in?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02 10:00     ` Alex Bennée
@ 2018-10-02 10:31       ` Peter Maydell
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2018-10-02 10:31 UTC (permalink / raw)
  To: Alex Bennée
  Cc: QEMU Developers, Paolo Bonzini, Richard Henderson, Emilio G. Cota

On 2 October 2018 at 11:00, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On 1 October 2018 at 19:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>>> I would have thought the reset code should be scheduled via safe async
>>> work to run in the vCPU context. Why should the main loop get involved
>>> at all here?
>>
>> The reset code is much older than the safe-async support for
>> running things in the vCPU context... Also, does the safe
>> async support work with KVM/HAX/Hypervisor.Framework? The
>> reset code has to handle all those, not just TCG.
>
> the *_run_on_cpu functions should be safe for all users although KVM
> stuff seems to use the direct run_on_cpu stuff more. The events are
> consumed in the wait_io logic that all accelerators share - in the outer
> loop in cpus-common.c
>
>> Plus, which vCPU thread would you use?
>
> Each vCPU should reset it's own data. For one thing it avoids issue
> with barriers across threads.

That seems a very long way from where we are at the moment,
where the semantics are that a CPU is just another kind of
device, and we should ensure that nothing in the system is
executing before we try to reset any of it. (Otherwise
you get into all kinds of nasty conditions where a vCPU
is still running and executes writes to devices that have
already reset, or causes calls into a different vCPU
that again is in the process of resetting).

>> We're resetting
>> the entire system, so privileging an arbitrary vCPU
>> thread to do that doesn't seem any less odd than using
>> the main loop thread.
>
> Sure - but they do give predictable semantics. If in this case the cpu
> sourcing the request scheduled async tasks to stop the cpu to everything
> else and a safe task to it's own thread it can be assured everyone has
> done their "work" (stopping in this case) and is in a known state.
>
> Does qemu_system_reset_request() make any particular promises of what
> order things should happen in?

AIUI the promise is that when the reset occurs the entire
system should end up in the state as if QEMU had just
been started. There is no guarantee about ordering between
different reset methods/functions (which is a bit of a
can of worms of its own), but any device can assume that
nobody else is going to call into it during the reset
process (either before its reset function runs, or after).

Basically it should be like:
 * whole VM pauses
 * we do all the reset work
 * start the VM

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02  9:59         ` Paolo Bonzini
@ 2018-10-02 10:34           ` Peter Maydell
  2018-10-02 16:46             ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2018-10-02 10:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, QEMU Developers, Richard Henderson, Emilio G. Cota

On 2 October 2018 at 10:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/10/2018 11:04, Peter Maydell wrote:
>> On 2 October 2018 at 09:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> First, the reset code should indeed use run_on_cpu (it need not be safe
>>> i.e. stop-the-world; just run it in the vCPU thread).  It certainly
>>> doesn't do this right now.
>>
>> I don't understand this part. We're resetting the entire world:
>> surely we need to stop the entire world first ?
>
> Most of the world is stopped because it only runs with BQL taken.  vCPU
> isn't, so we ensure it is stopped by: 1) using run_on_cpu to synchronize
> with the executed TBs (or KVM_RUN) 2) ensuring the execution loop is
> paused after reset, which is the cpu_can_run part that you snipped.

Maybe I just don't understand what you're suggesting should be
done via run-on-cpu. But it seems to me that the problem here
is that cpu_stop_current() should not call qemu_cpu_stop()
immediately, but instead arrange that this vCPU calls qemu_cpu_stop()
when it gets back out of the execution loop.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02 10:34           ` Peter Maydell
@ 2018-10-02 16:46             ` Paolo Bonzini
  2018-10-02 16:57               ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2018-10-02 16:46 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, QEMU Developers, Richard Henderson, Emilio G. Cota

On 02/10/2018 12:34, Peter Maydell wrote:
> On 2 October 2018 at 10:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 02/10/2018 11:04, Peter Maydell wrote:
>>> On 2 October 2018 at 09:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> First, the reset code should indeed use run_on_cpu (it need not be safe
>>>> i.e. stop-the-world; just run it in the vCPU thread).  It certainly
>>>> doesn't do this right now.
>>>
>>> I don't understand this part. We're resetting the entire world:
>>> surely we need to stop the entire world first ?
>>
>> Most of the world is stopped because it only runs with BQL taken.  vCPU
>> isn't, so we ensure it is stopped by: 1) using run_on_cpu to synchronize
>> with the executed TBs (or KVM_RUN) 2) ensuring the execution loop is
>> paused after reset, which is the cpu_can_run part that you snipped.
> 
> Maybe I just don't understand what you're suggesting should be
> done via run-on-cpu. But it seems to me that the problem here
> is that cpu_stop_current() should not call qemu_cpu_stop()
> immediately, but instead arrange that this vCPU calls qemu_cpu_stop()
> when it gets back out of the execution loop.

It will already do it in qemu_wait_io_event_common.  So perhaps it's
enough to do cpu_exit(cpu) in pause_all_vcpus() and cpu_stop_current(),
instead of cpu_stop_current() which incorrectly sets cpu->stopped to true?

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
  2018-10-02 16:46             ` Paolo Bonzini
@ 2018-10-02 16:57               ` Peter Maydell
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2018-10-02 16:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, QEMU Developers, Richard Henderson, Emilio G. Cota

On 2 October 2018 at 17:46, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/10/2018 12:34, Peter Maydell wrote:
>> Maybe I just don't understand what you're suggesting should be
>> done via run-on-cpu. But it seems to me that the problem here
>> is that cpu_stop_current() should not call qemu_cpu_stop()
>> immediately, but instead arrange that this vCPU calls qemu_cpu_stop()
>> when it gets back out of the execution loop.
>
> It will already do it in qemu_wait_io_event_common.  So perhaps it's
> enough to do cpu_exit(cpu) in pause_all_vcpus() and cpu_stop_current(),
> instead of cpu_stop_current() which incorrectly sets cpu->stopped to true?

Sounds plausible in the cpu_stop_current() case, but for
pause_all_vcpus(), we need to (continue to) do something that makes
the current CPU immediately set cpu->stopped to true, because the
second part of the function is going to wait until all cpus
(including this one) have cpu->stopped set. (That is, if the vcpu
thread calls pause_all_vcpus() then we can't say "wait for this
thread to get back out to the main loop": that's a deadlock. We
could leave pause_all_vcpus() as it is, or if we cared we'd
need to rewrite all the places that call pause_all_vcpus() from
the vcpu thread to do something else instead. That's basically
kvmvapic.c -- which could probably be rewritten to do something
else than its current pause_all_vcpus/do stuff/resume_all_vcpus
sequence -- plus vm_stop(), which has a related FIXME comment
in it already.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-10-02 16:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-01 17:03 [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop() Peter Maydell
2018-10-01 18:12 ` Alex Bennée
2018-10-02  8:01   ` Peter Maydell
2018-10-02  8:58     ` Paolo Bonzini
2018-10-02  9:04       ` Peter Maydell
2018-10-02  9:59         ` Paolo Bonzini
2018-10-02 10:34           ` Peter Maydell
2018-10-02 16:46             ` Paolo Bonzini
2018-10-02 16:57               ` Peter Maydell
2018-10-02 10:00     ` Alex Bennée
2018-10-02 10:31       ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.