All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions
@ 2017-05-01  1:57 Michael Eager
  2017-05-02  7:59 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Eager @ 2017-05-01  1:57 UTC (permalink / raw)
  To: qemu-devel

I'm working with an emulation for a proprietary processor on an older QEMU source base.  It looks 
like the problem I am seeing in the old sources would still be present in the current source base.

I'm seeing incorrect values when there is a write to a memory-mapped I/O device when icount is set. 
  What I see happening is that a TB with ~20 instructions is executed which contains a write to the 
MM I/O address.  When it gets to the io_write routine, can_do_io is false, which results in a call 
to cpu_io_recompile.

cpu_io_recompile does what it (sort of) says it is supposed to do: it builds a new TB with the I/O 
instruction as the last instruction in the block, then re-issues the TB.  The problem is that the 
new TB contains the instructions before the I/O instruction, so they are executed a second time.

There is a note in cpu_io_recompile which mentions the situation where the I/O instruction is not 
the first in the TB has not been handled:

     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
      * the first in the TB) then we end up generating a whole new TB and
      *  repeating the fault, which is horribly inefficient.
      *  Better would be to execute just this insn uncached, or generate a
      *  second new TB.

I'm a bit unclear what this is saying.  A new TB is generated which is issued setting can_do_io, and 
which passes through io_write without a problem.  I'm not sure what fault is repeated.

It seems to me that what cpu_io_recompile should do is create a new TB with the n instructions which 
were already executed and cache it.  Then it should create a TB with only the I/O instruction, 
setting can_do_io, and re-issue this TB.

Am I understanding this correctly?

This code has been around for a long, long time.  Has anyone noticed this problem in the past?


-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions
  2017-05-01  1:57 [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions Michael Eager
@ 2017-05-02  7:59 ` Paolo Bonzini
  2017-05-02 17:59   ` Michael Eager
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2017-05-02  7:59 UTC (permalink / raw)
  To: Michael Eager, qemu-devel



On 01/05/2017 03:57, Michael Eager wrote:
> 
> 
> I'm seeing incorrect values when there is a write to a memory-mapped I/O
> device when icount is set.  What I see happening is that a TB with ~20
> instructions is executed which contains a write to the MM I/O address. 
> When it gets to the io_write routine, can_do_io is false, which results
> in a call to cpu_io_recompile.
> 
> cpu_io_recompile does what it (sort of) says it is supposed to do: it
> builds a new TB with the I/O instruction as the last instruction in the
> block, then re-issues the TB.  The problem is that the new TB contains
> the instructions before the I/O instruction, so they are executed a
> second time.

They shouldn't.  When called from cpu_io_recompile,
cpu_restore_state_from_tb should compute the I/O instruction's target PC
from the host PC (stored in retaddr).

Then what happens is the following:

- cpu_io_recompile generates a new TB ending with the I/O instruction.
This new TB has a hash table conflict with the old TB (same
PC/cs_base/flags) the old TB is implicitly removed

- cpu_io_recompile calls cpu_loop_exit_noexc, which goes back to the
execution loop with updated PC

- because the PC is different, a new TB is looked up for the I/O
instruction's PC.  The TB probably is not there and translation starts
again, this time at the I/O instruction

- the new TB, when executed, causes cpu_io_recompile to fire again.
This is the inefficient part mentioned in cpu_io_recompile

- cpu_io_recompile now compiles a one-instruction TB and goes back to
the execution loop

- finally the execution loop executes the one-instruction TB for the I/O
instruction, then it can go on

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions
  2017-05-02  7:59 ` Paolo Bonzini
@ 2017-05-02 17:59   ` Michael Eager
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Eager @ 2017-05-02 17:59 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

On 05/02/2017 12:59 AM, Paolo Bonzini wrote:
>
>
> On 01/05/2017 03:57, Michael Eager wrote:
>>
>>
>> I'm seeing incorrect values when there is a write to a memory-mapped I/O
>> device when icount is set.  What I see happening is that a TB with ~20
>> instructions is executed which contains a write to the MM I/O address.
>> When it gets to the io_write routine, can_do_io is false, which results
>> in a call to cpu_io_recompile.
>>
>> cpu_io_recompile does what it (sort of) says it is supposed to do: it
>> builds a new TB with the I/O instruction as the last instruction in the
>> block, then re-issues the TB.  The problem is that the new TB contains
>> the instructions before the I/O instruction, so they are executed a
>> second time.
>
> They shouldn't.  When called from cpu_io_recompile,
> cpu_restore_state_from_tb should compute the I/O instruction's target PC
> from the host PC (stored in retaddr).
>
> Then what happens is the following:
>
> - cpu_io_recompile generates a new TB ending with the I/O instruction.
> This new TB has a hash table conflict with the old TB (same
> PC/cs_base/flags) the old TB is implicitly removed
>
> - cpu_io_recompile calls cpu_loop_exit_noexc, which goes back to the
> execution loop with updated PC
>
> - because the PC is different, a new TB is looked up for the I/O
> instruction's PC.  The TB probably is not there and translation starts
> again, this time at the I/O instruction
>
> - the new TB, when executed, causes cpu_io_recompile to fire again.
> This is the inefficient part mentioned in cpu_io_recompile
>
> - cpu_io_recompile now compiles a one-instruction TB and goes back to
> the execution loop
>
> - finally the execution loop executes the one-instruction TB for the I/O
> instruction, then it can go on

Thanks for the explanation.

The old QEMU source base I'm working with predates the refactoring of cpu_restore_state
by a couple months (sigh).  It also doesn't have the patches which changed the argument
from env to cpu.  I tried to cherry-pick the patches but there are too many conflicts.

I patched the old code to do what I suggested: a new TB for the already executed
instructions and a TB for the I/O instruction.  It works and has the small advantage
of not requiring a second trip through cpu_io_recompile.  But it will be a merge
conflict when we (eventually) upgrade to more recent QEMU sources.


-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions
@ 2017-05-01  0:55 Michael Eager
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Eager @ 2017-05-01  0:55 UTC (permalink / raw)
  To: qemu-devel

I'm working with an emulation for a proprietary processor on an older QEMU source base.  It looks 
like the problem I am seeing in the old sources would still be present in the current source base.

I'm seeing incorrect values when there is a write to a memory-mapped I/O device when icount is set. 
  What I see happening is that a TB with ~20 instructions is executed which contains a write to the 
MM I/O address.  When it gets to the io_write routine, can_do_io is false, which results in a call 
to cpu_io_recompile.

cpu_io_recompile does what it (sort of) says it is supposed to do: it builds a new TB with the I/O 
instruction as the last instruction in the block, then re-issues the TB.  The problem is that the 
new TB contains the instructions before the I/O instruction, so they are executed a second time.

There is a note in cpu_io_recompile which mentions the situation where the I/O instruction is not 
the first in the TB has not been handled:

     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
      * the first in the TB) then we end up generating a whole new TB and
      *  repeating the fault, which is horribly inefficient.
      *  Better would be to execute just this insn uncached, or generate a
      *  second new TB.

I'm a bit unclear what this is saying.  A new TB is generated which is issued setting can_do_io, and 
which passes through io_write without a problem.  I'm not sure what fault is repeated.

It seems to me that what cpu_io_recompile should do is create a new TB with the n instructions which 
were already executed and cache it.  Then it should create a TB with only the I/O instruction, 
setting can_do_io, and re-issue this TB.

Am I understanding this correctly?

This code has been around for a long, long time.  Has anyone noticed this problem in the past?


-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-05-02 18:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-01  1:57 [Qemu-devel] cpu_io_recompile, icount, and re-issued instructions Michael Eager
2017-05-02  7:59 ` Paolo Bonzini
2017-05-02 17:59   ` Michael Eager
  -- strict thread matches above, loose matches on Subject: below --
2017-05-01  0:55 Michael Eager

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.