* [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
@ 2015-05-12 15:32 Peter Maydell
  2015-05-12 15:43 ` Richard Henderson
  2015-05-12 18:17 ` Paolo Bonzini
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Maydell @ 2015-05-12 15:32 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Paolo Bonzini, Pavel Dovgaluk, Richard Henderson

In order for -icount to work, it's important for the target
translate.c code to correctly bracket any generated code which
can "do I/O" with gen_io_start()/gen_io_end() calls. But
does anybody know exactly what the criteria are here for this?
It would be nice if we could document this in a comment in
gen_icount.h -- I'm happy to write one up if somebody will just
tell me what the right answer is :-)

thanks
-- PMM

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 15:32 [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ? Peter Maydell
@ 2015-05-12 15:43 ` Richard Henderson
  2015-05-12 15:54   ` Peter Maydell
  2015-05-12 18:17 ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2015-05-12 15:43 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers; +Cc: Paolo Bonzini, Pavel Dovgaluk

On 05/12/2015 08:32 AM, Peter Maydell wrote:
> In order for -icount to work, it's important for the target
> translate.c code to correctly bracket any generated code which
> can "do I/O" with gen_io_start()/gen_io_end() calls. But
> does anybody know exactly what the criteria are here for this?
> It would be nice if we could document this in a comment in
> gen_icount.h -- I'm happy to write one up if somebody will just
> tell me what the right answer is :-)

I'm really not sure.

So far I've assumed "i/o"-like insns, and those that can read some sort of
cycle counter.  So while that handles easy cases like "inb" and "rdcc", it
certainly doesn't handle any target for which all i/o is memory mapped.

Which is sorta most of them these days, so the utility seems to be low...


r~

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 15:43 ` Richard Henderson
@ 2015-05-12 15:54   ` Peter Maydell
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2015-05-12 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, QEMU Developers, Pavel Dovgaluk

On 12 May 2015 at 16:43, Richard Henderson <rth@twiddle.net> wrote:
> On 05/12/2015 08:32 AM, Peter Maydell wrote:
>> In order for -icount to work, it's important for the target
>> translate.c code to correctly bracket any generated code which
>> can "do I/O" with gen_io_start()/gen_io_end() calls. But
>> does anybody know exactly what the criteria are here for this?
>> It would be nice if we could document this in a comment in
>> gen_icount.h -- I'm happy to write one up if somebody will just
>> tell me what the right answer is :-)
>
> I'm really not sure.
>
> So far I've assumed "i/o"-like insns, and those that can read some sort of
> cycle counter.  So while that handles easy cases like "inb" and "rdcc", it
> certainly doesn't handle any target for which all i/o is memory mapped.

I think the "mmio access" case is already dealt with in the
softmmu_template.h handlers, isn't it? If the CPU isn't in a
"can do IO" state then the io_read/write handlers call
cpu_io_recompile(), which figures out how far through the TB
we were (using the machinery we already have for converting
host addresses of faults into guest PC values), and creates
a new TB which stops with the MMIO load/store. (I don't
entirely understand cpu_io_recompile(), though -- it looks
rather tricksy.)

-- PMM
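
As an illustration of the MMIO path described above, here is a toy model
in C. It is a sketch under stated assumptions, not QEMU code: a
translation block is reduced to a start PC and an instruction count, only
the last instruction of a block is allowed to do I/O, and an MMIO access
from any earlier instruction causes the block to be rebuilt so that it
ends at that instruction. The names ToyTB, toy_can_do_io,
toy_io_recompile and toy_run_tb are hypothetical; the real
cpu_io_recompile() is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy translation block: 'len' guest instructions starting at 'pc'. */
typedef struct {
    unsigned pc;
    unsigned len;
} ToyTB;

/* Assumption of this model: only the final instruction may do I/O. */
static bool toy_can_do_io(const ToyTB *tb, unsigned index)
{
    return index == tb->len - 1;
}

/* Hypothetical stand-in for cpu_io_recompile(): rebuild the block so
 * that the instruction at 'index' becomes its last instruction. */
static ToyTB toy_io_recompile(const ToyTB *tb, unsigned index)
{
    assert(index < tb->len);
    ToyTB shorter = { tb->pc, index + 1 };
    return shorter;
}

/* Run a block whose instruction 'mmio_index' touches MMIO.
 * Returns how many translations were needed (1 or 2) and stores the
 * block that finally executed the access in *final. */
static int toy_run_tb(ToyTB tb, unsigned mmio_index, ToyTB *final)
{
    int translations = 1;

    if (!toy_can_do_io(&tb, mmio_index)) {
        tb = toy_io_recompile(&tb, mmio_index);
        translations++;
    }
    *final = tb;
    return translations;
}
```

In this model an MMIO access at instruction 3 of an 8-instruction block
costs one extra translation and leaves a 4-instruction block, while an
access at the last instruction needs no retranslation.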

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 15:32 [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ? Peter Maydell
  2015-05-12 15:43 ` Richard Henderson
@ 2015-05-12 18:17 ` Paolo Bonzini
  2015-05-12 19:41   ` Peter Maydell
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-05-12 18:17 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers; +Cc: Pavel Dovgaluk, Richard Henderson



On 12/05/2015 17:32, Peter Maydell wrote:
> In order for -icount to work, it's important for the target
> translate.c code to correctly bracket any generated code which
> can "do I/O" with gen_io_start()/gen_io_end() calls. But
> does anybody know exactly what the criteria are here for this?
> It would be nice if we could document this in a comment in
> gen_icount.h -- I'm happy to write one up if somebody will just
> tell me what the right answer is :-)

It's any instruction that can cause an icount read, typically through
QEMU_CLOCK_VIRTUAL or cpu_get_ticks().

Paolo
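
A minimal sketch of the rule stated above, written as a toy code emitter
in C rather than as real TCG code. It assumes that the translator knows,
per instruction, whether the generated helper might read the instruction
counter, and that bracketing is only needed when icount is in use. The
names ToyOp, ToyOpBuf, toy_emit and toy_translate_insn are hypothetical,
and OP_IO_START/OP_IO_END merely stand in for whatever
gen_io_start()/gen_io_end() emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_IO_START, OP_HELPER, OP_IO_END } ToyOp;

/* Tiny fixed-size buffer of emitted ops. */
typedef struct {
    ToyOp ops[8];
    size_t n;
} ToyOpBuf;

static void toy_emit(ToyOpBuf *buf, ToyOp op)
{
    assert(buf->n < sizeof(buf->ops) / sizeof(buf->ops[0]));
    buf->ops[buf->n++] = op;
}

/* Translate one instruction. The helper call is bracketed only when
 * icount is enabled AND the instruction may cause an icount read
 * (for example via a virtual-clock or tick-counter query). */
static void toy_translate_insn(ToyOpBuf *buf, bool use_icount,
                               bool may_read_icount)
{
    bool bracket = use_icount && may_read_icount;

    if (bracket) {
        toy_emit(buf, OP_IO_START);
    }
    toy_emit(buf, OP_HELPER);
    if (bracket) {
        toy_emit(buf, OP_IO_END);
    }
}
```

With both flags set the buffer holds start, helper, end; in every other
case it holds only the helper op.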

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 18:17 ` Paolo Bonzini
@ 2015-05-12 19:41   ` Peter Maydell
  2015-05-13  8:42     ` Paolo Bonzini
  2015-05-13  6:57   ` Pavel Dovgaluk
       [not found]   ` <16201.3286528692$1431500273@news.gmane.org>
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2015-05-12 19:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: QEMU Developers, Pavel Dovgaluk, Richard Henderson

On 12 May 2015 at 19:17, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 12/05/2015 17:32, Peter Maydell wrote:
>> In order for -icount to work, it's important for the target
>> translate.c code to correctly bracket any generated code which
>> can "do I/O" with gen_io_start()/gen_io_end() calls. But
>> does anybody know exactly what the criteria are here for this?
>> It would be nice if we could document this in a comment in
>> gen_icount.h -- I'm happy to write one up if somebody will just
>> tell me what the right answer is :-)
>
> It's any instruction that can cause an icount read, typically through
> QEMU_CLOCK_VIRTUAL or cpu_get_ticks().

Also anything that can cause a CPU interrupt, since tcg_handle_interrupt()
will call cpu_abort() if the CPU gets an interrupt while it's not
in a 'can do IO' state.

Anything else?

[How are -icount and multi-threaded TCG going to interact? Do we
just say "you get one or the other but not both" ?]

-- PMM
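
The interrupt condition mentioned in the message above can be sketched
as a small C model. This is an assumption-laden illustration, not the
body of tcg_handle_interrupt(): the CPU is reduced to three fields, and
instead of calling cpu_abort() the function returns a value saying that
the real code would abort. ToyIrqCPU, ToyIrqResult and
toy_handle_interrupt are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool use_icount;            /* -icount is active */
    bool can_do_io;             /* CPU is in a "can do IO" state */
    unsigned interrupt_request; /* pending interrupt bits */
} ToyIrqCPU;

typedef enum { TOY_IRQ_OK, TOY_IRQ_WOULD_ABORT } ToyIrqResult;

/* Record the interrupt, then report whether raising it in the current
 * state is legal under icount. */
static ToyIrqResult toy_handle_interrupt(ToyIrqCPU *cpu, unsigned mask)
{
    assert(mask != 0);
    cpu->interrupt_request |= mask;

    if (cpu->use_icount && !cpu->can_do_io) {
        /* The real code is described as calling cpu_abort() here. */
        return TOY_IRQ_WOULD_ABORT;
    }
    return TOY_IRQ_OK;
}
```

The point of the model is only that an instruction able to raise a CPU
interrupt has to run with can_do_io set when icount is enabled.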

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 18:17 ` Paolo Bonzini
  2015-05-12 19:41   ` Peter Maydell
@ 2015-05-13  6:57   ` Pavel Dovgaluk
       [not found]   ` <16201.3286528692$1431500273@news.gmane.org>
  2 siblings, 0 replies; 11+ messages in thread
From: Pavel Dovgaluk @ 2015-05-13  6:57 UTC (permalink / raw)
  To: 'Paolo Bonzini', 'Peter Maydell',
	'QEMU Developers'
  Cc: 'Richard Henderson'

> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
> On 12/05/2015 17:32, Peter Maydell wrote:
> > In order for -icount to work, it's important for the target
> > translate.c code to correctly bracket any generated code which
> > can "do I/O" with gen_io_start()/gen_io_end() calls. But
> > does anybody know exactly what the criteria are here for this?
> > It would be nice if we could document this in a comment in
> > gen_icount.h -- I'm happy to write one up if somebody will just
> > tell me what the right answer is :-)
> 
> It's any instruction that can cause an icount read, typically through
> QEMU_CLOCK_VIRTUAL or cpu_get_ticks().

Doesn't this mean that ARM has an incorrect implementation of icount?
MMIO is common for this platform, but none of the memory accesses are
surrounded with gen_io_start()/gen_io_end().

Pavel Dovgalyuk

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
       [not found]   ` <16201.3286528692$1431500273@news.gmane.org>
@ 2015-05-13  8:32     ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-05-13  8:32 UTC (permalink / raw)
  To: Pavel Dovgaluk, 'Peter Maydell', 'QEMU Developers'
  Cc: 'Richard Henderson'



On 13/05/2015 08:57, Pavel Dovgaluk wrote:
>> > It's any instruction that can cause an icount read, typically through
>> > QEMU_CLOCK_VIRTUAL or cpu_get_ticks().
> Doesn't this mean that ARM has an incorrect implementation of icount?
> MMIO is common for this platform, but none of the memory accesses are
> surrounded with gen_io_start()/gen_io_end().

See here:

    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu_can_do_io(cpu)) {
        cpu_io_recompile(cpu, retaddr);
    }

in softmmu_template.h.

Paolo

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-12 19:41   ` Peter Maydell
@ 2015-05-13  8:42     ` Paolo Bonzini
  2015-05-13  9:41       ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2015-05-13  8:42 UTC (permalink / raw)
  To: Peter Maydell; +Cc: mttcg, QEMU Developers, Pavel Dovgaluk, Richard Henderson



On 12/05/2015 21:41, Peter Maydell wrote:
>> > It's any instruction that can cause an icount read, typically through
>> > QEMU_CLOCK_VIRTUAL or cpu_get_ticks().
> Also anything that can cause a CPU interrupt, since tcg_handle_interrupt()
> will call cpu_abort() if the CPU gets an interrupt while it's not
> in a 'can do IO' state.
> 
> Anything else?
> 
> [How are -icount and multi-threaded TCG going to interact? Do we
> just say "you get one or the other but not both" ?]

For -icount and SMP, yes.  I even posted a patch to that end once.

You can get -icount and multi-threaded TCG (which for UP is simply TCG
with execution out of the BQL) together I think.  For example you could
handle cpu->icount_decr.u16.low == 0 like cpu->halted, hanging the CPU
thread until QEMU_CLOCK_VIRTUAL timers have been processed.  The I/O
thread would have to kick the CPU after processing QEMU_CLOCK_VIRTUAL
timers---not hard to do.

In fact, I suspect cpu->halted should become a kind of bitmap, and "wait
for interrupt" should be just one bit in there.  Any operation that
requires synchronization with other VCPUs should use cpu->halted so that
VCPUs can still run foreign code with run_on_vcpu.  This was the plan I
outlined to Frederic and Mark for flushing TLB remotely, at least.

Paolo
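
The two ideas in the message above (treating an exhausted instruction
budget like cpu->halted, and turning cpu->halted into a bitmap of
reasons) can be sketched together in C. This is a single-threaded toy
under stated assumptions, not a proposal for the real data structures:
ToyHaltCPU, the TOY_HALT_* bits, toy_cpu_exec and toy_timers_processed
are all hypothetical, and icount_low only models
cpu->icount_decr.u16.low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reasons a vCPU may not run; any set bit keeps it stopped. */
enum {
    TOY_HALT_WFI    = 1u << 0, /* guest is waiting for an interrupt */
    TOY_HALT_ICOUNT = 1u << 1  /* budget used up, waiting for timers */
};

typedef struct {
    uint32_t halted;     /* bitmap of TOY_HALT_* reasons */
    uint16_t icount_low; /* remaining instruction budget */
} ToyHaltCPU;

static bool toy_cpu_runnable(const ToyHaltCPU *cpu)
{
    return cpu->halted == 0;
}

/* CPU thread side: execute up to 'n' instructions. When the budget
 * reaches zero, set the icount bit and stop. Returns the number of
 * instructions actually executed. */
static unsigned toy_cpu_exec(ToyHaltCPU *cpu, unsigned n)
{
    unsigned done = 0;

    while (done < n && toy_cpu_runnable(cpu)) {
        if (cpu->icount_low == 0) {
            cpu->halted |= TOY_HALT_ICOUNT;
            break;
        }
        cpu->icount_low--;
        done++;
    }
    return done;
}

/* I/O thread side: once virtual-clock timers have been processed,
 * grant a new budget and clear only the icount reason ("kick"). */
static void toy_timers_processed(ToyHaltCPU *cpu, uint16_t new_budget)
{
    assert(new_budget > 0);
    cpu->icount_low = new_budget;
    cpu->halted &= ~(uint32_t)TOY_HALT_ICOUNT;
}
```

Because each reason has its own bit, clearing the icount bit does not
wake a CPU that is also waiting for an interrupt. A threaded version
would need a lock and condition variable around these fields, which the
sketch leaves out.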

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-13  8:42     ` Paolo Bonzini
@ 2015-05-13  9:41       ` Peter Maydell
  2015-05-13 10:03         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2015-05-13  9:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: mttcg, QEMU Developers, Pavel Dovgaluk, Richard Henderson

On 13 May 2015 at 09:42, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 12/05/2015 21:41, Peter Maydell wrote:
>>> > It's any instruction that can cause an icount read, typically through
>>> > QEMU_CLOCK_VIRTUAL or cpu_get_ticks().
>> Also anything that can cause a CPU interrupt, since tcg_handle_interrupt()
>> will call cpu_abort() if the CPU gets an interrupt while it's not
>> in a 'can do IO' state.
>>
>> Anything else?
>>
>> [How are -icount and multi-threaded TCG going to interact? Do we
>> just say "you get one or the other but not both" ?]
>
> For -icount and SMP, yes.  I even posted a patch to that end once.

I don't see why -icount and SMP need to be mutually exclusive.
If we're round-robining between the SMP CPUs then they should
all stay deterministic, I would have thought?

> You can get -icount and multi-threaded TCG (which for UP is simply TCG
> with execution out of the BQL) together I think.  For example you could
> handle cpu->icount_decr.u16.low == 0 like cpu->halted, hanging the CPU
> thread until QEMU_CLOCK_VIRTUAL timers have been processed.  The I/O
> thread would have to kick the CPU after processing QEMU_CLOCK_VIRTUAL
> timers---not hard to do.

Multithreaded TCG for a UP guest isn't very interesting though...

-- PMM

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-13  9:41       ` Peter Maydell
@ 2015-05-13 10:03         ` Paolo Bonzini
  2015-05-13 12:30           ` Frederic Konrad
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2015-05-13 10:03 UTC (permalink / raw)
  To: Peter Maydell; +Cc: mttcg, QEMU Developers, Pavel Dovgaluk, Richard Henderson



On 13/05/2015 11:41, Peter Maydell wrote:
> > For -icount and SMP, yes.  I even posted a patch to that end once.
> 
> I don't see why -icount and SMP need to be mutually exclusive.
> If we're round-robining between the SMP CPUs then they should
> all stay deterministic, I would have thought?

No, because the round-robin switches happen non-deterministically when
the I/O thread kicks the VCPU in qemu_mutex_lock_iothread.

It gets worse with BQL-free TCG which lets you remove the kicks
altogether (and the round-robin disappears in favor of true
multithreading).  Even if you could keep the kicks, having both round-robin
and multi-threading would be extra complication in the code and a cause of
bitrot.
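
To illustrate why kicks break determinism, here is a toy round-robin
scheduler in C. It is a sketch under simplifying assumptions: two vCPUs,
a fixed slice length, one instruction per step, and a single kick that
forces an early switch at a given step. toy_round_robin,
toy_traces_equal and TOY_NO_KICK are hypothetical names, and in a real
run the kick time depends on host timing, which is exactly what makes
the interleaving unpredictable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NO_KICK ((size_t)-1)

/* Write into trace[0..total) the digit ('0' or '1') of the vCPU that
 * runs at each step, followed by a terminating NUL. The caller must
 * provide at least total + 1 bytes. vCPUs alternate every 'slice'
 * steps; a kick at step 'kick_at' ends the current slice early. */
static void toy_round_robin(char *trace, size_t total, size_t slice,
                            size_t kick_at)
{
    int cur = 0;
    size_t used = 0;

    assert(slice > 0);
    for (size_t step = 0; step < total; step++) {
        if (used == slice || step == kick_at) {
            cur ^= 1; /* switch to the other vCPU */
            used = 0;
        }
        trace[step] = (char)('0' + cur);
        used++;
    }
    trace[total] = '\0';
}

/* Two runs are "the same" only if their interleavings match. */
static int toy_traces_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With no kick, every run of 8 steps and slice 4 yields the same
interleaving; moving the kick from step 2 to step 5 yields two different
interleavings for the same guest code.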

> > You can get -icount and multi-threaded TCG (which for UP is simply TCG
> > with execution out of the BQL) together I think.  For example you could
> > handle cpu->icount_decr.u16.low == 0 like cpu->halted, hanging the CPU
> > thread until QEMU_CLOCK_VIRTUAL timers have been processed.  The I/O
> > thread would have to kick the CPU after processing QEMU_CLOCK_VIRTUAL
> > timers---not hard to do.
> 
> Multithreaded TCG for a UP guest isn't very interesting though...

BQL-free TCG is interesting though, for two reasons:

1) maintainability: get rid of all the aforementioned "kick VCPU" stuff
in qemu_mutex_lock_iothread;

2) performance: allow handling of I/O events to run in parallel with the
VCPU, rather than the lockstep technique we have now.  This improves
performance, so that for example you might get rid of the artificial
ratelimiting in ptimer.c.

In case it wasn't clear, BQL-freedom is the main reason why I'm
interested in multithreaded TCG! :)

Paolo

* Re: [Qemu-devel] when does a target frontend need to use gen_io_start()/gen_io_end() ?
  2015-05-13 10:03         ` Paolo Bonzini
@ 2015-05-13 12:30           ` Frederic Konrad
  0 siblings, 0 replies; 11+ messages in thread
From: Frederic Konrad @ 2015-05-13 12:30 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Maydell
  Cc: mttcg, QEMU Developers, Pavel Dovgaluk, Richard Henderson

Hi,

On 13/05/2015 12:03, Paolo Bonzini wrote:
>
> On 13/05/2015 11:41, Peter Maydell wrote:
>>> For -icount and SMP, yes.  I even posted a patch to that end once.
>> I don't see why -icount and SMP need to be mutually exclusive.
>> If we're round-robining between the SMP CPUs then they should
>> all stay deterministic, I would have thought?
> No, because the round-robin switches happen non-deterministically when
> the I/O thread kicks the VCPU in qemu_mutex_lock_iothread.
>
> It gets worse with BQL-free TCG which lets you remove the kicks
> altogether (and the round-robin disappears in favor of true
> multithreading).  Even if you could keep the kicks, having both round-robin
> and multi-threading would be extra complication in the code and a cause of
> bitrot.

If you're talking about kick_thread and kick_cpu, I think we should keep
them. We need to be able to synchronize all VCPUs somehow when we want to
do, for example, a tb/tlb flush or tb_invalidate, so that we are sure no
other VCPU is executing a TB.

But then yes, we will probably have pain with icount and make it less
deterministic than it is today (it's broken in my series though, because
of cpu_exec_nocache, which does a tb_invalidate).

Fred

>>> You can get -icount and multi-threaded TCG (which for UP is simply TCG
>>> with execution out of the BQL) together I think.  For example you could
>>> handle cpu->icount_decr.u16.low == 0 like cpu->halted, hanging the CPU
>>> thread until QEMU_CLOCK_VIRTUAL timers have been processed.  The I/O
>>> thread would have to kick the CPU after processing QEMU_CLOCK_VIRTUAL
>>> timers---not hard to do.
>> Multithreaded TCG for a UP guest isn't very interesting though...
> BQL-free TCG is interesting though, for two reasons:
>
> 1) maintainability: get rid of all the aforementioned "kick VCPU" stuff
> in qemu_mutex_lock_iothread;
>
> 2) performance: allow handling of I/O events to run in parallel with the
> VCPU, rather than the lockstep technique we have now.  This improves
> performance, so that for example you might get rid of the artificial
> ratelimiting in ptimer.c.
>
> In case it wasn't clear, BQL-freedom is the main reason why I'm
> interested in multithreaded TCG! :)
>
> Paolo
