* [Qemu-devel] global_mutex and multithread.
@ 2015-01-15 10:25 Frederic Konrad
  2015-01-15 10:34 ` Peter Maydell
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Frederic Konrad @ 2015-01-15 10:25 UTC (permalink / raw)
  To: mttcg, qemu-devel; +Cc: Peter Maydell, Alexander Graf, Paolo Bonzini

Hi everybody,

In the case of multithreaded TCG, what is the best way to handle qemu_global_mutex?
We thought to have one mutex per vCPU and then synchronize the vCPU threads when
they exit (e.g. in tcg_exec_all).

Does that make sense?

Thanks,
Fred

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 10:25 [Qemu-devel] global_mutex and multithread Frederic Konrad
@ 2015-01-15 10:34 ` Peter Maydell
  2015-01-15 10:41   ` Frederic Konrad
  2015-01-15 10:44 ` Paolo Bonzini
  2015-01-15 11:12 ` Paolo Bonzini
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2015-01-15 10:34 UTC (permalink / raw)
  To: Frederic Konrad; +Cc: mttcg, Paolo Bonzini, qemu-devel, Alexander Graf

On 15 January 2015 at 10:25, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> Hi everybody,
>
> In the case of multithreaded TCG, what is the best way to handle qemu_global_mutex?

It shouldn't need any changes I think. You're basically bringing
TCG into line with what KVM already has -- one thread per guest
CPU; and qemu_global_mutex already works fine in that model.

-- PMM
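Peter's "one thread per guest CPU" model can be sketched in a few lines. This is only an illustration of the locking discipline, not QEMU's actual API (struct vcpu, run_guest_slice and the block count are invented): every vCPU thread holds the one global mutex except while guest code is actually executing, which is the discipline KVM's vCPU loop already follows.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t qemu_global_mutex = PTHREAD_MUTEX_INITIALIZER;

struct vcpu {
    int index;
    bool stop;
    long executed_blocks;
};

/* stand-in for executing a chunk of guest code (cpu_exec in real QEMU) */
static void run_guest_slice(struct vcpu *cpu)
{
    cpu->executed_blocks++;
}

static void *vcpu_thread_fn(void *opaque)
{
    struct vcpu *cpu = opaque;

    pthread_mutex_lock(&qemu_global_mutex);
    while (!cpu->stop) {
        /* drop the global lock only while the guest runs */
        pthread_mutex_unlock(&qemu_global_mutex);
        run_guest_slice(cpu);
        pthread_mutex_lock(&qemu_global_mutex);
        /* interrupt handling / device emulation would happen here, locked */
        if (cpu->executed_blocks >= 1000) {
            cpu->stop = true;
        }
    }
    pthread_mutex_unlock(&qemu_global_mutex);
    return NULL;
}
```

With one such thread per guest CPU, no per-vCPU mutex is needed: the single global mutex already serializes everything that happens outside guest execution.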


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 10:34 ` Peter Maydell
@ 2015-01-15 10:41   ` Frederic Konrad
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Konrad @ 2015-01-15 10:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: mttcg, Paolo Bonzini, qemu-devel, Alexander Graf

On 15/01/2015 11:34, Peter Maydell wrote:
> On 15 January 2015 at 10:25, Frederic Konrad <fred.konrad@greensocs.com> wrote:
>> Hi everybody,
>>
>> In the case of multithreaded TCG, what is the best way to handle qemu_global_mutex?
> It shouldn't need any changes I think. You're basically bringing
> TCG into line with what KVM already has -- one thread per guest
> CPU; and qemu_global_mutex already works fine in that model.
>
> -- PMM
Hi Peter,

Thanks for your reply.
True, that makes sense. Don't we still need to synchronize the vCPUs when
they exit, though?

Fred


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 10:25 [Qemu-devel] global_mutex and multithread Frederic Konrad
  2015-01-15 10:34 ` Peter Maydell
@ 2015-01-15 10:44 ` Paolo Bonzini
  2015-01-15 11:12 ` Paolo Bonzini
  2 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 10:44 UTC (permalink / raw)
  To: Frederic Konrad, mttcg, qemu-devel; +Cc: Peter Maydell, Alexander Graf



On 15/01/2015 11:25, Frederic Konrad wrote:
> Hi everybody,
> 
> In the case of multithreaded TCG, what is the best way to handle
> qemu_global_mutex?
> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
> they exit (e.g. in tcg_exec_all).

The basic ideas from Jan's patch in
http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.

RAM block reordering doesn't exist anymore, having been replaced with
mru_block.

The patch reacquired the lock when entering MMIO or PIO emulation.
That's enough while there is only one VCPU thread.

Once you have >1 VCPU thread you'll need the RCU work that I am slowly
polishing and sending out.  That's because one device can change the
memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
and that's not thread-safe.

And later on, once devices start being converted to run outside the BQL,
that can be changed to use new functions address_space_rw_unlocked /
io_mem_read_unlocked / io_mem_write_unlocked.  Something like that is
already visible at https://github.com/bonzini/qemu/commits/rcu (ignore
patches after "kvm: Switch to unlocked MMIO").

Paolo
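The shape of that scheme - translated code runs without the global lock, and only the MMIO/PIO slow path takes it back - might look like the following sketch. The function names here are illustrative stand-ins, not the ones from Jan's patch:

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t qemu_global_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint32_t mmio_reg;   /* stand-in for some device register */

/* slow path: device emulation still runs under the global lock */
static uint32_t io_mem_read_locked(uint64_t addr)
{
    uint32_t val;

    pthread_mutex_lock(&qemu_global_mutex);
    val = mmio_reg + (uint32_t)addr;    /* pretend device logic */
    pthread_mutex_unlock(&qemu_global_mutex);
    return val;
}

/* fast path: plain RAM accesses from translated code take no lock at all */
static uint8_t ram_read(const uint8_t *ram, uint64_t addr)
{
    return ram[addr];
}
```

As long as there is only one vCPU thread this is sufficient; with several, the unlocked fast path is exactly where the memory-map race described above comes from.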


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 10:25 [Qemu-devel] global_mutex and multithread Frederic Konrad
  2015-01-15 10:34 ` Peter Maydell
  2015-01-15 10:44 ` Paolo Bonzini
@ 2015-01-15 11:12 ` Paolo Bonzini
  2015-01-15 11:14   ` Alexander Graf
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 11:12 UTC (permalink / raw)
  To: Frederic Konrad, mttcg, qemu-devel; +Cc: Peter Maydell, Alexander Graf

[now with correct listserver address]

On 15/01/2015 11:25, Frederic Konrad wrote:
> Hi everybody,
> 
> In the case of multithreaded TCG, what is the best way to handle
> qemu_global_mutex?
> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
> they exit (e.g. in tcg_exec_all).
> 
> Does that make sense?

The basic ideas from Jan's patch in
http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.

RAM block reordering doesn't exist anymore, having been replaced with
mru_block.

The patch reacquired the lock when entering MMIO or PIO emulation.
That's enough while there is only one VCPU thread.

Once you have >1 VCPU thread you'll need the RCU work that I am slowly
polishing and sending out.  That's because one device can change the
memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
and that's not thread-safe.

And later on, once devices start being converted to run outside the BQL,
that can be changed to use new functions address_space_rw_unlocked /
io_mem_read_unlocked / io_mem_write_unlocked.  Something like that is
already visible at https://github.com/bonzini/qemu/commits/rcu (ignore
patches after "kvm: Switch to unlocked MMIO").

Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 11:12 ` Paolo Bonzini
@ 2015-01-15 11:14   ` Alexander Graf
  2015-01-15 11:26     ` Paolo Bonzini
  2015-01-15 13:30     ` Frederic Konrad
  2015-01-15 12:51   ` Frederic Konrad
  2015-01-15 19:07   ` Mark Burton
  2 siblings, 2 replies; 22+ messages in thread
From: Alexander Graf @ 2015-01-15 11:14 UTC (permalink / raw)
  To: Paolo Bonzini, Frederic Konrad, mttcg, qemu-devel; +Cc: Peter Maydell



On 15.01.15 12:12, Paolo Bonzini wrote:
> [now with correct listserver address]
> 
> On 15/01/2015 11:25, Frederic Konrad wrote:
>> Hi everybody,
>>
>> In the case of multithreaded TCG, what is the best way to handle
>> qemu_global_mutex?
>> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
>> they exit (e.g. in tcg_exec_all).
>>
>> Does that make sense?
> 
> The basic ideas from Jan's patch in
> http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.
> 
> RAM block reordering doesn't exist anymore, having been replaced with
> mru_block.
> 
> The patch reacquired the lock when entering MMIO or PIO emulation.
> That's enough while there is only one VCPU thread.
> 
> Once you have >1 VCPU thread you'll need the RCU work that I am slowly
> polishing and sending out.  That's because one device can change the
> memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
> and that's not thread-safe.

You'll have a similar problem with tb_flush() if you use a single TB
cache. Just introduce a big-hammer function for now that IPIs all the
other threads, waits until they have halted, does the atomic operation
(like changing the memory map or flushing the TB cache), then lets them
continue.

We can later get rid of all the callers of this, one by one.


Alex
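A toy version of that big hammer, using a condition variable in place of real IPIs (every name below is invented for the sketch): the caller flags all the other vCPU threads, waits until each has halted, performs the critical operation alone, then releases everyone.

```c
#include <pthread.h>
#include <stdbool.h>

#define NR_VCPUS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool exit_request;   /* set in place of sending a real IPI */
static int halted;
static bool resume;

/* called by each vCPU thread once it notices the exit request */
static void vcpu_halt_and_wait(void)
{
    pthread_mutex_lock(&lock);
    halted++;
    pthread_cond_broadcast(&cond);
    while (!resume) {
        pthread_cond_wait(&cond, &lock);
    }
    halted--;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* "big hammer": run fn() with every other vCPU known to be stopped */
static void run_with_all_vcpus_halted(void (*fn)(void))
{
    pthread_mutex_lock(&lock);
    exit_request = true;            /* a real version would IPI here */
    while (halted < NR_VCPUS - 1) {
        pthread_cond_wait(&cond, &lock);
    }
    fn();                           /* e.g. flush the TB cache or change the memory map */
    exit_request = false;
    resume = true;
    pthread_cond_broadcast(&cond);
    while (halted > 0) {
        pthread_cond_wait(&cond, &lock);
    }
    resume = false;
    pthread_mutex_unlock(&lock);
}
```

This is deliberately coarse: everything stops for every "atomic" operation, which is why the callers would later be converted one by one to finer-grained schemes.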


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 11:14   ` Alexander Graf
@ 2015-01-15 11:26     ` Paolo Bonzini
  2015-01-15 13:30     ` Frederic Konrad
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 11:26 UTC (permalink / raw)
  To: Alexander Graf, Frederic Konrad, mttcg, qemu-devel; +Cc: Peter Maydell



On 15/01/2015 12:14, Alexander Graf wrote:
>> > 
>> > Once you have >1 VCPU thread you'll need the RCU work that I am slowly
>> > polishing and sending out.  That's because one device can change the
>> > memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
>> > and that's not thread-safe.
> You'll have a similar problem with tb_flush() if you use a single TB
> cache. Just introduce a big-hammer function for now that IPIs all the
> other threads, waits until they have halted, does the atomic operation
> (like changing the memory map or flushing the TB cache), then lets them
> continue.

For the memory map I played with just using cpu_interrupt instead of
waiting for the other CPUs to halt.  That's safe for 1 VCPU thread,
but not for >1 thread.  Perhaps we can exit the other CPUs with a signal
and cpu_resume_from_signal.

Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 11:12 ` Paolo Bonzini
  2015-01-15 11:14   ` Alexander Graf
@ 2015-01-15 12:51   ` Frederic Konrad
  2015-01-15 12:56     ` Paolo Bonzini
  2015-01-15 19:07   ` Mark Burton
  2 siblings, 1 reply; 22+ messages in thread
From: Frederic Konrad @ 2015-01-15 12:51 UTC (permalink / raw)
  To: Paolo Bonzini, mttcg, qemu-devel; +Cc: Peter Maydell, Alexander Graf

On 15/01/2015 12:12, Paolo Bonzini wrote:
> [now with correct listserver address]
>
> On 15/01/2015 11:25, Frederic Konrad wrote:
>> Hi everybody,
>>
>> In the case of multithreaded TCG, what is the best way to handle
>> qemu_global_mutex?
>> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
>> they exit (e.g. in tcg_exec_all).
>>
>> Does that make sense?
> The basic ideas from Jan's patch in
> http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.
>
> RAM block reordering doesn't exist anymore, having been replaced with
> mru_block.
>
> The patch reacquired the lock when entering MMIO or PIO emulation.
> That's enough while there is only one VCPU thread.
>
> Once you have >1 VCPU thread you'll need the RCU work that I am slowly
> polishing and sending out.  That's because one device can change the
> memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
> and that's not thread-safe.
>
> And later on, once devices start being converted to run outside the BQL,
> that can be changed to use new functions address_space_rw_unlocked /
> io_mem_read_unlocked / io_mem_write_unlocked.  Something like that is
> already visible at https://github.com/bonzini/qemu/commits/rcu (ignore
> patches after "kvm: Switch to unlocked MMIO").
>
> Paolo
>
>
>
Hi Paolo,

Thanks for the reply.

As I understand it, Jan's idea is to unlock the global mutex during TCG
execution.
Is that right?
So that means it's currently not the case, and that today we can't run two
TCG threads at the same time?

About the RCU: are there a lot of devices which change the memory map?

Thanks,
Fred


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 12:51   ` Frederic Konrad
@ 2015-01-15 12:56     ` Paolo Bonzini
  2015-01-15 13:27       ` Frederic Konrad
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 12:56 UTC (permalink / raw)
  To: Frederic Konrad, mttcg, qemu-devel; +Cc: Peter Maydell, Alexander Graf



On 15/01/2015 13:51, Frederic Konrad wrote:
> 
> 
> Thanks for the reply.
> 
> As I understand it, Jan's idea is to unlock the global mutex during TCG
> execution.
> Is that right?
> So that means it's currently not the case, and that today we can't run two
> TCG threads at the same time?

Yes.

> About the RCU: are there a lot of devices which change the memory map?

All PCI devices (when you program their BARs), but apart from that not
much.  As a first approximation, the patches on github which use
CPU_INTERRUPT_TLBFLUSH should work even for multiple TCG threads.

I'll clean them up a bit further so that CPU_INTERRUPT_TLBFLUSH is used
for CPUs other than the running one; the running CPU instead uses
tlb_flush directly.
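Roughly, that scheme reads as follows. CPU_INTERRUPT_TLBFLUSH comes from the patches mentioned above, but the structures and helpers in this sketch are simplified stand-ins for the real ones: the CPU making the change flushes its own TLB synchronously and only flags the others, which flush themselves the next time they check for pending interrupts.

```c
#include <stdbool.h>

#define NR_CPUS 4
#define CPU_INTERRUPT_TLBFLUSH (1 << 0)

struct cpu_state {
    unsigned interrupt_request;
    bool tlb_valid;
};

static struct cpu_state cpus[NR_CPUS];

static void tlb_flush(struct cpu_state *cpu)
{
    cpu->tlb_valid = false;     /* real code would clear the TLB arrays */
}

static void tlb_flush_all(struct cpu_state *current_cpu)
{
    for (int i = 0; i < NR_CPUS; i++) {
        if (&cpus[i] == current_cpu) {
            tlb_flush(current_cpu);         /* safe: it's our own TLB */
        } else {
            /* not safe to touch another CPU's TLB from this thread, so
             * just ask it to flush itself at its next interrupt check */
            cpus[i].interrupt_request |= CPU_INTERRUPT_TLBFLUSH;
        }
    }
}

/* each vCPU runs this when it checks for pending interrupt work */
static void cpu_handle_interrupt(struct cpu_state *cpu)
{
    if (cpu->interrupt_request & CPU_INTERRUPT_TLBFLUSH) {
        cpu->interrupt_request &= ~CPU_INTERRUPT_TLBFLUSH;
        tlb_flush(cpu);
    }
}
```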

If anyone can sum up how cpu_resume_from_signal works, that would also
be helpful.

Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 12:56     ` Paolo Bonzini
@ 2015-01-15 13:27       ` Frederic Konrad
  2015-01-15 13:30         ` Peter Maydell
  0 siblings, 1 reply; 22+ messages in thread
From: Frederic Konrad @ 2015-01-15 13:27 UTC (permalink / raw)
  To: Paolo Bonzini, mttcg, qemu-devel
  Cc: Peter Maydell, Mark Burton, Alexander Graf

On 15/01/2015 13:56, Paolo Bonzini wrote:
>
> On 15/01/2015 13:51, Frederic Konrad wrote:
>>
>> Thanks for the reply.
>>
>> As I understand it, Jan's idea is to unlock the global mutex during TCG
>> execution.
>> Is that right?
>> So that means it's currently not the case, and that today we can't run two
>> TCG threads at the same time?
> Yes.
>
>> About the RCU: are there a lot of devices which change the memory map?
> All PCI devices (when you program their BARs), but apart from that not
> much.  As a first approximation, the patches on github which use
> CPU_INTERRUPT_TLBFLUSH should work even for multiple TCG threads.

Ok that makes sense. Thanks!

Fred

PS: Any idea why listserver is dropped from listserver.greensocs.com?
>
> I'll clean them up a bit further so that CPU_INTERRUPT_TLBFLUSH is used
> for CPUs other than the running one; the running CPU instead uses
> tlb_flush directly.
>
> If anyone can sum up how cpu_resume_from_signal works, that would also
> be helpful.
>
> Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 13:27       ` Frederic Konrad
@ 2015-01-15 13:30         ` Peter Maydell
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2015-01-15 13:30 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Paolo Bonzini, Mark Burton, qemu-devel, Alexander Graf

On 15 January 2015 at 13:27, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> PS: Any idea why listserver is dropped from listserver.greensocs.com?

Paolo's mail client apparently has a bizarre allergy to the correct
address...

-- PMM


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 11:14   ` Alexander Graf
  2015-01-15 11:26     ` Paolo Bonzini
@ 2015-01-15 13:30     ` Frederic Konrad
  2015-01-15 13:34       ` Mark Burton
  1 sibling, 1 reply; 22+ messages in thread
From: Frederic Konrad @ 2015-01-15 13:30 UTC (permalink / raw)
  To: Alexander Graf, Paolo Bonzini, mttcg, qemu-devel; +Cc: Peter Maydell

On 15/01/2015 12:14, Alexander Graf wrote:
>
> On 15.01.15 12:12, Paolo Bonzini wrote:
>> [now with correct listserver address]
>>
>> On 15/01/2015 11:25, Frederic Konrad wrote:
>>> Hi everybody,
>>>
>>> In the case of multithreaded TCG, what is the best way to handle
>>> qemu_global_mutex?
>>> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
>>> they exit (e.g. in tcg_exec_all).
>>>
>>> Does that make sense?
>> The basic ideas from Jan's patch in
>> http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.
>>
>> RAM block reordering doesn't exist anymore, having been replaced with
>> mru_block.
>>
>> The patch reacquired the lock when entering MMIO or PIO emulation.
>> That's enough while there is only one VCPU thread.
>>
>> Once you have >1 VCPU thread you'll need the RCU work that I am slowly
>> polishing and sending out.  That's because one device can change the
>> memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
>> and that's not thread-safe.
> You'll have a similar problem with tb_flush() if you use a single TB
> cache. Just introduce a big-hammer function for now that IPIs all the
> other threads, waits until they have halted, does the atomic operation
> (like changing the memory map or flushing the TB cache), then lets them
> continue.
>
> We can later get rid of all the callers of this, one by one.
>
>
> Alex
Maybe we can put a flag in the TB to say it's being executed, so tb_alloc
won't try to reallocate it?

Maybe it's a bad idea, though, and it will actually be slower than exiting
and waiting for all the other CPUs.

Fred


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 13:30     ` Frederic Konrad
@ 2015-01-15 13:34       ` Mark Burton
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Burton @ 2015-01-15 13:34 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, Paolo Bonzini, Peter Maydell, Alexander Graf, qemu-devel

I think we call that flag “please don’t reallocate this TB until at least after a CPU has exited and we do a global flush”… So if we sync and get all CPUs to exit on a global flush, the flag is only there as a figment of our imagination…
i.e. we’re safe without it?

Wish I could say the same of global_mutex :-(


Cheers

Mark.

> On 15 Jan 2015, at 14:30, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> 
> On 15/01/2015 12:14, Alexander Graf wrote:
>> 
>> On 15.01.15 12:12, Paolo Bonzini wrote:
>>> [now with correct listserver address]
>>> 
>>> On 15/01/2015 11:25, Frederic Konrad wrote:
>>>> Hi everybody,
>>>> 
>>>> In the case of multithreaded TCG, what is the best way to handle
>>>> qemu_global_mutex?
>>>> We thought to have one mutex per vCPU and then synchronize the vCPU threads when
>>>> they exit (e.g. in tcg_exec_all).
>>>>
>>>> Does that make sense?
>>> The basic ideas from Jan's patch in
>>> http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.
>>> 
>>> RAM block reordering doesn't exist anymore, having been replaced with
>>> mru_block.
>>> 
>>> The patch reacquired the lock when entering MMIO or PIO emulation.
>>> That's enough while there is only one VCPU thread.
>>> 
>>> Once you have >1 VCPU thread you'll need the RCU work that I am slowly
>>> polishing and sending out.  That's because one device can change the
>>> memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
>>> and that's not thread-safe.
>> You'll have a similar problem with tb_flush() if you use a single TB
>> cache. Just introduce a big-hammer function for now that IPIs all the
>> other threads, waits until they have halted, does the atomic operation
>> (like changing the memory map or flushing the TB cache), then lets them
>> continue.
>> 
>> We can later get rid of all the callers of this, one by one.
>> 
>> 
>> Alex
> Maybe we can put a flag in the TB to say it's being executed, so tb_alloc
> won't try to reallocate it?
> 
> Maybe it's a bad idea, though, and it will actually be slower than exiting
> and waiting for all the other CPUs.
> 
> Fred


	+44 (0)20 7100 3485 x 210
	+33 (0)5 33 52 01 77 x 210
	+33 (0)603762104
	mark.burton


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 11:12 ` Paolo Bonzini
  2015-01-15 11:14   ` Alexander Graf
  2015-01-15 12:51   ` Frederic Konrad
@ 2015-01-15 19:07   ` Mark Burton
  2015-01-15 20:27     ` Paolo Bonzini
  2 siblings, 1 reply; 22+ messages in thread
From: Mark Burton @ 2015-01-15 19:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, jan.kiszka, Alexander Graf, qemu-devel,
	KONRAD Frédéric

Still in agony on this issue - I’ve CC’d Jan as his patch looks important…

The patch below would seem to offer by far the best result here. (If only we could get it working ;-) )
	It allows threads to proceed as we want them to, and it means we don’t have to ‘count’ the number of CPUs that are executing code (and could therefore potentially access IO space)…

However - if we go this route - the current patch is only for x86. (Apart from the fact that we still seem to land in a deadlock…)

One thing I wonder - why do we need to go to the extent of mutexing in the TCG like this? Why can’t you simply put a mutex get/release on the slow path? If the core is going to do ‘fast path’ access to the memory - even if that memory was IO mapped - would it matter if it didn’t have the mutex?

(It would help - I think - if we understood why you believed this patch wouldn’t work with SMP - I thought that was to do with the ‘round-robin’ mechanism - we’ve removed that for multi-thread anyway - but I guess we may have missed something there?)

Cheers

Mark.


> On 15 Jan 2015, at 12:12, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> [now with correct listserver address]
> 
> On 15/01/2015 11:25, Frederic Konrad wrote:
>> Hi everybody,
>> 
>> In case of multithread TCG what is the best way to handle
>> qemu_global_mutex?
>> We though to have one mutex per vcpu and then synchronize vcpu threads when
>> they exit (eg: in tcg_exec_all).
>> 
>> Is that making sense?
> 
> The basic ideas from Jan's patch in
> http://article.gmane.org/gmane.comp.emulators.qemu/118807 still apply.
> 
> RAM block reordering doesn't exist anymore, having been replaced with
> mru_block.
> 
> The patch reacquired the lock when entering MMIO or PIO emulation.
> That's enough while there is only one VCPU thread.
> 
> Once you have >1 VCPU thread you'll need the RCU work that I am slowly
> polishing and sending out.  That's because one device can change the
> memory map, and that will cause a tlb_flush for all CPUs in tcg_commit,
> and that's not thread-safe.
> 
> And later on, once devices start being converted to run outside the BQL,
> that can be changed to use new functions address_space_rw_unlocked /
> io_mem_read_unlocked / io_mem_write_unlocked.  Something like that is
> already visible at https://github.com/bonzini/qemu/commits/rcu (ignore
> patches after "kvm: Switch to unlocked MMIO").
> 
> Paolo
> 
> 
> 




* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 19:07   ` Mark Burton
@ 2015-01-15 20:27     ` Paolo Bonzini
  2015-01-15 20:53       ` Mark Burton
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 20:27 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Peter Maydell, jan.kiszka, qemu-devel, Alexander Graf,
	KONRAD Frédéric



On 15/01/2015 20:07, Mark Burton wrote:
> However - if we go this route -the current patch is only for x86.
> (apart from the fact that we still seem to land in a deadlock…)

Jan said he had it working at least on ARM (MusicPal).

> One thing I wonder - why do we need to go to the extent of mutexing
> in the TCG like this? Why can’t you simply put a mutex get/release on
> the slow path? If the core is going to do ‘fast path’ access to the
> memory - even if that memory was IO mapped - would it matter if it
> didn’t have the mutex?

Because there is no guarantee that the memory map isn't changed by a
core under the feet of another.  The TLB (in particular the "iotlb") is
only valid with reference to a particular memory map.

Changes to the memory map certainly happen in the slow path, but lookups
are part of the fast path.  Even an rwlock is too slow for a fast path,
hence the plan of going with RCU.

Paolo
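A minimal sketch of what RCU buys here, assuming a toy memory map with a single field (this is not QEMU's actual RCU implementation): readers reach the current map through one atomic pointer load on the fast path and take no lock at all, while a writer builds a complete replacement map and publishes it atomically on the slow path.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct memory_map {
    int generation;     /* stands in for the real dispatch tables / iotlb data */
};

static _Atomic(struct memory_map *) current_map;

/* fast path: what a TLB/iotlb lookup conceptually does */
static int lookup_generation(void)
{
    struct memory_map *map = atomic_load_explicit(&current_map,
                                                  memory_order_acquire);
    return map->generation;
}

/* slow path: build a new map and publish it atomically; the old one may
 * only be freed after all readers are done with it - that grace-period
 * wait (synchronize_rcu in the real thing) is elided here */
static void publish_new_map(int generation)
{
    struct memory_map *map = malloc(sizeof(*map));

    map->generation = generation;
    atomic_store_explicit(&current_map, map, memory_order_release);
}
```

The elided grace period is precisely the hard part: a reader that loaded the old pointer must be allowed to finish before the old map is reclaimed.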


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 20:27     ` Paolo Bonzini
@ 2015-01-15 20:53       ` Mark Burton
  2015-01-15 21:41         ` Paolo Bonzini
  2015-01-15 21:41         ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Mark Burton @ 2015-01-15 20:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, jan.kiszka, qemu-devel, Alexander Graf,
	KONRAD Frédéric


> On 15 Jan 2015, at 21:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 15/01/2015 20:07, Mark Burton wrote:
>> However - if we go this route -the current patch is only for x86.
>> (apart from the fact that we still seem to land in a deadlock…)
> 
> Jan said he had it working at least on ARM (MusicPal).

Yeah - our problem is when we enable multiple threads - which I don’t believe Jan did…
Indeed - he specifically says that doesn’t work… :-)

> 
>> One thing I wonder - why do we need to go to the extent of mutexing
>> in the TCG like this? Why can’t you simply put a mutex get/release on
>> the slow path? If the core is going to do ‘fast path’ access to the
>> memory - even if that memory was IO mapped - would it matter if it
>> didn’t have the mutex?
> 
> Because there is no guarantee that the memory map isn't changed by a
> core under the feet of another.  The TLB (in particular the "iotlb") is
> only valid with reference to a particular memory map.

> 
> Changes to the memory map certainly happen in the slow path, but lookups
> are part of the fast path.  Even an rwlock is too slow for a fast path,
> hence the plan of going with RCU.

Could we arrange the world such that lookups ‘succeed’ (the wheels don’t fall off) - either getting the old value or the new, but not getting rubbish - and still only take the mutex if we are going to make alterations to the MM itself? (I haven’t looked at the code around that… so sorry if the question is ridiculous.)

Cheers

Mark.

> 
> Paolo




* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 20:53       ` Mark Burton
@ 2015-01-15 21:41         ` Paolo Bonzini
  2015-01-15 21:41         ` Paolo Bonzini
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 21:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, jan.kiszka, Alexander Graf,
	KONRAD Frédéric



On 15/01/2015 21:53, Mark Burton wrote:
>> Jan said he had it working at least on ARM (MusicPal).
> 
> Yeah - our problem is when we enable multiple threads - which I don’t believe Jan did…

Multithreaded TCG, or single-threaded TCG with SMP?

>>> One thing I wonder - why do we need to go to the extent of mutexing
>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>> the slow path? If the core is going to do ‘fast path’ access to the
>>> memory - even if that memory was IO mapped - would it matter if it
>>> didn’t have the mutex?
>>
>> Because there is no guarantee that the memory map isn't changed by a
>> core under the feet of another.  The TLB (in particular the "iotlb") is
>> only valid with reference to a particular memory map.
> 
>>
>> Changes to the memory map certainly happen in the slow path, but lookups
>> are part of the fast path.  Even an rwlock is too slow for a fast path,
>> hence the plan of going with RCU.
> 
> Could we arrange the world such that lookups ‘succeed’ (the wheels
> don’t fall off) - either getting the old value or the new, but not getting
> rubbish - and still only take the mutex if we are going to make
> alterations to the MM itself? (I haven’t looked at the code around that…
> so sorry if the question is ridiculous.)

That's the definition of RCU. :)  Look at the docs in
http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
information. :)

It's still not trivial to make it 100% correct, but at the same time
it's not too hard to prepare something decent to play with.  Also, most
of the work can be done with KVM so it's more or less independent from
what you guys have been doing so far.

Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 20:53       ` Mark Burton
  2015-01-15 21:41         ` Paolo Bonzini
@ 2015-01-15 21:41         ` Paolo Bonzini
  2015-01-16  7:25           ` Mark Burton
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2015-01-15 21:41 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Peter Maydell, jan.kiszka, Alexander Graf, qemu-devel,
	KONRAD Frédéric



On 15/01/2015 21:53, Mark Burton wrote:
>> Jan said he had it working at least on ARM (MusicPal).
> 
> Yeah - our problem is when we enable multiple threads - which I don’t believe Jan did…

Multithreaded TCG, or single-threaded TCG with SMP?

>>> One thing I wonder - why do we need to go to the extent of mutexing
>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>> the slow path? If the core is going to do ‘fast path’ access to the
>>> memory - even if that memory was IO mapped - would it matter if it
>>> didn’t have the mutex?
>>
>> Because there is no guarantee that the memory map isn't changed by a
>> core under the feet of another.  The TLB (in particular the "iotlb") is
>> only valid with reference to a particular memory map.
> 
>>
>> Changes to the memory map certainly happen in the slow path, but lookups
>> are part of the fast path.  Even an rwlock is too slow for a fast path,
>> hence the plan of going with RCU.
> 
> Could we arrange the world such that lookups ‘succeed’ (the wheels
> don’t fall off) - either getting the old value or the new, but not getting
> rubbish - and still only take the mutex if we are going to make
> alterations to the MM itself? (I haven’t looked at the code around that…
> so sorry if the question is ridiculous.)

That's the definition of RCU. :)  Look at the docs in
http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
information. :)

It's still not trivial to make it 100% correct, but at the same time
it's not too hard to prepare something decent to play with.  Also, most
of the work can be done with KVM so it's more or less independent from
what you guys have been doing so far.

Paolo


* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-15 21:41         ` Paolo Bonzini
@ 2015-01-16  7:25           ` Mark Burton
  2015-01-16  8:07             ` Jan Kiszka
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Burton @ 2015-01-16  7:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, jan.kiszka, Alexander Graf, qemu-devel,
	KONRAD Frédéric


> On 15 Jan 2015, at 22:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 15/01/2015 21:53, Mark Burton wrote:
>>> Jan said he had it working at least on ARM (MusicPal).
>> 
>> Yeah - our problem is when we enable multiple threads - which I don’t believe Jan did…
> 
> Multithreaded TCG, or single-threaded TCG with SMP?

He mentions SMP - I assume that’s single-threaded…

> 
>>>> One thing I wonder - why do we need to go to the extent of mutexing
>>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>>> the slow path? If the core is going to do ‘fast path’ access to the
>>>> memory - even if that memory was IO mapped - would it matter if it
>>>> didn’t have the mutex?
>>> 
>>> Because there is no guarantee that the memory map isn't changed by a
>>> core under the feet of another.  The TLB (in particular the "iotlb") is
>>> only valid with reference to a particular memory map.
>> 
>>> 
>>> Changes to the memory map certainly happen in the slow path, but lookups
>>> are part of the fast path.  Even an rwlock is too slow for a fast path,
>>> hence the plan of going with RCU.
>> 
>> Could we arrange the world such that lookups ‘succeed’ (the wheels
>> don’t fall off) - either getting the old value or the new, but not getting
>> rubbish - and still only take the mutex if we are going to make
>> alterations to the MM itself? (I haven’t looked at the code around that…
>> so sorry if the question is ridiculous.)
> 
> That's the definition of RCU. :)  Look at the docs in
> http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
> information. :)

Ahh - I see !

> 
> It's still not trivial to make it 100% correct, but at the same time
> it's not too hard to prepare something decent to play with.  Also, most
> of the work can be done with KVM so it's more or less independent from
> what you guys have been doing so far.

Yes - the issue is if we end up relying on it.
But - I see what you mean - these two things can ‘dovetail’ together “independently”: Jan’s patch will be good for now, and later we can use RCU to make it work more generally (and more efficiently).

So - our only small problem is getting Jan’s patch to work for multi-thread :-))

Cheers

Mark.

> 
> Paolo


	 +44 (0)20 7100 3485 x 210
 +33 (0)5 33 52 01 77x 210

	+33 (0)603762104
	mark.burton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-16  7:25           ` Mark Burton
@ 2015-01-16  8:07             ` Jan Kiszka
  2015-01-16  8:43               ` Frederic Konrad
  2015-01-16  8:52               ` Mark Burton
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Kiszka @ 2015-01-16  8:07 UTC (permalink / raw)
  To: Mark Burton, Paolo Bonzini
  Cc: mttcg, Peter Maydell, KONRAD Frédéric, qemu-devel,
	Alexander Graf

On 2015-01-16 08:25, Mark Burton wrote:
> 
>> On 15 Jan 2015, at 22:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>>
>> On 15/01/2015 21:53, Mark Burton wrote:
>>>> Jan said he had it working at least on ARM (MusicPal).
>>>
>>> yeah - our problem is when we enable multi-threads - which I dont believe Jan did…
>>
>> Multithreaded TCG, or single-threaded TCG with SMP?
> 
> He mentions SMP, - I assume thats single-threaded ….

Yes, I didn't patch anything towards multi-threaded SMP. Main reason:
there was no answer on how to emulate the memory model of the target
architecture on top of the host one, which is mandatory if you let the
emulated CPUs run unsynchronized in parallel. Did this change?

> 
>>
>>>>> One thing I wonder - why do we need to go to the extent of mutexing
>>>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>>>> the slow path? If the core is going to do ‘fast path’ access to the
>>>>> memory, - even if that memory was IO mapped - would it matter if it
>>>>> didn’t have the mutex?
>>>>
>>>> Because there is no guarantee that the memory map isn't changed by a
>>>> core under the feet of another.  The TLB (in particular the "iotlb") is
>>>> only valid with reference to a particular memory map.
>>>
>>>>
>>>> Changes to the memory map certainly happen in the slow path, but lookups
>>>> are part of the fast path.  Even an rwlocks is too slow for a fast path,
>>>> hence the plan of going with RCU.
>>>
>>> Could we arrange the world such that lookups ‘succeed’ (the wheels
>>> dont fall off) -ether getting the old value, or the new, but not getting
>>> rubbish - and we still only take the mutex if we are going to make
>>> alterations to the MM itself? (I have’t looked at the code around that…
>>> so sorry if the question is ridiculous).
>>
>> That's the definition of RCU. :)  Look at the docs in
>> http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
>> information. :)
> 
> Ahh - I see !
> 
>>
>> It's still not trivial to make it 100% correct, but at the same time
>> it's not too hard to prepare something decent to play with.  Also, most
>> of the work can be done with KVM so it's more or less independent from
>> what you guys have been doing so far.
> 
> Yes - the issue is if we end up relying on it.
> But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently).
> 
> So - our only small problem is getting Jan’s patch to work for multi-thread :-))

See above regarding the potential dimension.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-16  8:07             ` Jan Kiszka
@ 2015-01-16  8:43               ` Frederic Konrad
  2015-01-16  8:52               ` Mark Burton
  1 sibling, 0 replies; 22+ messages in thread
From: Frederic Konrad @ 2015-01-16  8:43 UTC (permalink / raw)
  To: Jan Kiszka, Mark Burton, Paolo Bonzini
  Cc: mttcg, Peter Maydell, qemu-devel, Alexander Graf

On 16/01/2015 09:07, Jan Kiszka wrote:
> On 2015-01-16 08:25, Mark Burton wrote:
>>> On 15 Jan 2015, at 22:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>
>>>
>>> On 15/01/2015 21:53, Mark Burton wrote:
>>>>> Jan said he had it working at least on ARM (MusicPal).
>>>> yeah - our problem is when we enable multi-threads - which I dont believe Jan did…
>>> Multithreaded TCG, or single-threaded TCG with SMP?
>> He mentions SMP, - I assume thats single-threaded ….
> Yes, I didn't patched anything towards multi-threaded SMP. Main reason:
> there was no answer on how to emulated the memory models of that target
> architecture over the host one which is mandatory if you let the
> emulated CPUs run unsynchronized in parallel. Did this change?
Hi Jan,

Actually that's what we are trying to do: running emulated CPUs in parallel.

I get a double mutex_lock error (i.e. a lock with no matching unlock); I must
have missed something during the "rebase".
I'll check.

Thanks,
Fred

>>>>>> One thing I wonder - why do we need to go to the extent of mutexing
>>>>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>>>>> the slow path? If the core is going to do ‘fast path’ access to the
>>>>>> memory, - even if that memory was IO mapped - would it matter if it
>>>>>> didn’t have the mutex?
>>>>> Because there is no guarantee that the memory map isn't changed by a
>>>>> core under the feet of another.  The TLB (in particular the "iotlb") is
>>>>> only valid with reference to a particular memory map.
>>>>> Changes to the memory map certainly happen in the slow path, but lookups
>>>>> are part of the fast path.  Even an rwlocks is too slow for a fast path,
>>>>> hence the plan of going with RCU.
>>>> Could we arrange the world such that lookups ‘succeed’ (the wheels
>>>> dont fall off) -ether getting the old value, or the new, but not getting
>>>> rubbish - and we still only take the mutex if we are going to make
>>>> alterations to the MM itself? (I have’t looked at the code around that…
>>>> so sorry if the question is ridiculous).
>>> That's the definition of RCU. :)  Look at the docs in
>>> http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
>>> information. :)
>> Ahh - I see !
>>
>>> It's still not trivial to make it 100% correct, but at the same time
>>> it's not too hard to prepare something decent to play with.  Also, most
>>> of the work can be done with KVM so it's more or less independent from
>>> what you guys have been doing so far.
>> Yes - the issue is if we end up relying on it.
>> But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently).
>>
>> So - our only small problem is getting Jan’s patch to work for multi-thread :-))
> See above regarding the potential dimension.
>
> Jan
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] global_mutex and multithread.
  2015-01-16  8:07             ` Jan Kiszka
  2015-01-16  8:43               ` Frederic Konrad
@ 2015-01-16  8:52               ` Mark Burton
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Burton @ 2015-01-16  8:52 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: mttcg, Peter Maydell, Alexander Graf, qemu-devel, Paolo Bonzini,
	KONRAD Frédéric


> On 16 Jan 2015, at 09:07, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
> On 2015-01-16 08:25, Mark Burton wrote:
>> 
>>> On 15 Jan 2015, at 22:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> 
>>> 
>>> On 15/01/2015 21:53, Mark Burton wrote:
>>>>> Jan said he had it working at least on ARM (MusicPal).
>>>> 
>>>> yeah - our problem is when we enable multi-threads - which I dont believe Jan did…
>>> 
>>> Multithreaded TCG, or single-threaded TCG with SMP?
>> 
>> He mentions SMP, - I assume thats single-threaded ….
> 
> Yes, I didn't patched anything towards multi-threaded SMP. Main reason:
> there was no answer on how to emulated the memory models of that target
> architecture over the host one which is mandatory if you let the
> emulated CPUs run unsynchronized in parallel. Did this change?
> 

No - we just decided to stop pressing the button…
I think this is the ‘x86 on ARM’ issue? Our plan is to get ARM on x86 working (or that class), and then worry about the other way round.

I don’t see why SMP ARM on x86 wouldn’t work with your patch?

Cheers

Mark.

>> 
>>> 
>>>>>> One thing I wonder - why do we need to go to the extent of mutexing
>>>>>> in the TCG like this? Why can’t you simply put a mutex get/release on
>>>>>> the slow path? If the core is going to do ‘fast path’ access to the
>>>>>> memory, - even if that memory was IO mapped - would it matter if it
>>>>>> didn’t have the mutex?
>>>>> 
>>>>> Because there is no guarantee that the memory map isn't changed by a
>>>>> core under the feet of another.  The TLB (in particular the "iotlb") is
>>>>> only valid with reference to a particular memory map.
>>>> 
>>>>> 
>>>>> Changes to the memory map certainly happen in the slow path, but lookups
>>>>> are part of the fast path.  Even an rwlocks is too slow for a fast path,
>>>>> hence the plan of going with RCU.
>>>> 
>>>> Could we arrange the world such that lookups ‘succeed’ (the wheels
>>>> dont fall off) -ether getting the old value, or the new, but not getting
>>>> rubbish - and we still only take the mutex if we are going to make
>>>> alterations to the MM itself? (I have’t looked at the code around that…
>>>> so sorry if the question is ridiculous).
>>> 
>>> That's the definition of RCU. :)  Look at the docs in
>>> http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more
>>> information. :)
>> 
>> Ahh - I see !
>> 
>>> 
>>> It's still not trivial to make it 100% correct, but at the same time
>>> it's not too hard to prepare something decent to play with.  Also, most
>>> of the work can be done with KVM so it's more or less independent from
>>> what you guys have been doing so far.
>> 
>> Yes - the issue is if we end up relying on it.
>> But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently).
>> 
>> So - our only small problem is getting Jan’s patch to work for multi-thread :-))
> 
> See above regarding the potential dimension.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-01-16  8:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-15 10:25 [Qemu-devel] global_mutex and multithread Frederic Konrad
2015-01-15 10:34 ` Peter Maydell
2015-01-15 10:41   ` Frederic Konrad
2015-01-15 10:44 ` Paolo Bonzini
2015-01-15 11:12 ` Paolo Bonzini
2015-01-15 11:14   ` Alexander Graf
2015-01-15 11:26     ` Paolo Bonzini
2015-01-15 13:30     ` Frederic Konrad
2015-01-15 13:34       ` Mark Burton
2015-01-15 12:51   ` Frederic Konrad
2015-01-15 12:56     ` Paolo Bonzini
2015-01-15 13:27       ` Frederic Konrad
2015-01-15 13:30         ` Peter Maydell
2015-01-15 19:07   ` Mark Burton
2015-01-15 20:27     ` Paolo Bonzini
2015-01-15 20:53       ` Mark Burton
2015-01-15 21:41         ` Paolo Bonzini
2015-01-15 21:41         ` Paolo Bonzini
2015-01-16  7:25           ` Mark Burton
2015-01-16  8:07             ` Jan Kiszka
2015-01-16  8:43               ` Frederic Konrad
2015-01-16  8:52               ` Mark Burton
