All of lore.kernel.org
 help / color / mirror / Atom feed
* Detecting Faulting Instructions From Plugins
@ 2021-01-30  3:23 Aaron Lindsay
  2021-01-30 18:55 ` Aaron Lindsay
  2021-02-04 21:31 ` Aaron Lindsay
  0 siblings, 2 replies; 12+ messages in thread
From: Aaron Lindsay @ 2021-01-30  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, cota, Alex Bennée

Hello,

I appear to be seeing that if I register a callback for an instruction
via `qemu_plugin_register_vcpu_insn_exec_cb` I receive a callback even
if the instruction faults. For example, if an instruction attempts to
load memory from a page which isn't currently mapped by the OS, I
receive two calls for that instruction - one before the page fault, and
one afterwards when the load succeeds.

Two questions:
1. Is this considered a bug or a "feature"?
2.a. If a bug, is there a good way to detect this from inside the
	 tcg/plugin infrastructure and avoid calling the callback for the
	 faulting execution of the instruction?
2.b. If a "feature", is there a good way to detect this from my plugin?

Thanks!

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-01-30  3:23 Detecting Faulting Instructions From Plugins Aaron Lindsay
@ 2021-01-30 18:55 ` Aaron Lindsay
  2021-02-01 12:07   ` Alex Bennée
  2021-02-04 21:31 ` Aaron Lindsay
  1 sibling, 1 reply; 12+ messages in thread
From: Aaron Lindsay @ 2021-01-30 18:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, cota, Alex Bennée

On Jan 29 22:23, Aaron Lindsay wrote:
> 1. Is this considered a bug or a "feature"?
> 2.a. If a bug, is there a good way to detect this from inside the
> 	 tcg/plugin infrastructure and avoid calling the callback for the
> 	 faulting execution of the instruction?
> 2.b. If a "feature", is there a good way to detect this from my plugin?

I think I've convinced myself the current behavior *is* a "feature", and
working as intended since the instruction can be considered
architecturally committed, even if it faults (ARM statement). But I am
still unsure of the best way to detect whether a load/store instruction
has faulted from within the plugin interface, so I welcome thoughts
there.

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-01-30 18:55 ` Aaron Lindsay
@ 2021-02-01 12:07   ` Alex Bennée
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2021-02-01 12:07 UTC (permalink / raw)
  To: Aaron Lindsay; +Cc: cota, richard.henderson, qemu-devel


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Jan 29 22:23, Aaron Lindsay wrote:
>> 1. Is this considered a bug or a "feature"?
>> 2.a. If a bug, is there a good way to detect this from inside the
>> 	 tcg/plugin infrastructure and avoid calling the callback for the
>> 	 faulting execution of the instruction?
>> 2.b. If a "feature", is there a good way to detect this from my plugin?
>
> I think I've convinced myself the current behavior *is* a "feature", and
> working as intended since the instruction can be considered
> architecturally committed, even if it faults (ARM statement).

Yes I think that's correct. I assume in between the page-fault handler
has run and ensured the page is properly mapped and then returned to
*re*-execute the instruction.

> But I am
> still unsure of the best way to detect whether a load/store instruction
> has faulted from within the plugin interface, so I welcome thoughts
> there.

A hacky method would be to:

  a) track last executed PC on a per-vCPU basis in the callbacks for
  load/store
  b) have a specific callback handler for the head of the vector table
  for synchronous exceptions.

It wouldn't be ideal because you need some knowledge of where the table
is and depending on the architecture the entry may be shared between
synchronous (i.e. the last instruction that attempted to execute) and
asynchronous ones (e.g. a timer just happened to fire).

If we get to the point we can expose the register values to the plugin
then it becomes a much simpler operation because there will be state
information exposed somewhere in the register state.

>
> -Aaron


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-01-30  3:23 Detecting Faulting Instructions From Plugins Aaron Lindsay
  2021-01-30 18:55 ` Aaron Lindsay
@ 2021-02-04 21:31 ` Aaron Lindsay
  2021-02-05 11:19   ` Alex Bennée
  1 sibling, 1 reply; 12+ messages in thread
From: Aaron Lindsay @ 2021-02-04 21:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, cota, richard.henderson

On Jan 29 22:23, Aaron Lindsay wrote:
> I appear to be seeing that if I register a callback for an instruction
> via `qemu_plugin_register_vcpu_insn_exec_cb` I receive a callback even
> if the instruction faults.

I was wrong about what I was seeing - I made some false assumptions
based on incomplete information. After adding some additional
instrumentation, I do not believe what I am seeing is explained by
translation faults, and think it might even be a bug.

For the below output, I've got a plugin which registers a callback via
`qemu_plugin_register_vcpu_insn_exec_cb` for each instruction executed.
I've enabled `-d in_asm` and added prints in my instruction execution
callback when it sees the opcode for the `ldr` instruction in question.
I'm running a local source build of the v5.2.0 release.

Note in the output below the instruction at 0xffffdd2f1d4102c0 is
getting re-translated for some reason, and that two callbacks are made
to my function registered with qemu_plugin_register_vcpu_insn_exec_cb
(the "*** saw encoding"... output) for what should be one instruction
execution.

Do you have any tips for debugging this further or ideas for ensuring the
callback is called only once for this instruction?

----------------
IN:
0xffffdd2f1d410250:  aa1e03e9  mov      x9, x30
0xffffdd2f1d410254:  d503201f  nop
0xffffdd2f1d410258:  a9bc7bfd  stp      x29, x30, [sp, #-0x40]!
0xffffdd2f1d41025c:  910003fd  mov      x29, sp
0xffffdd2f1d410260:  a90153f3  stp      x19, x20, [sp, #0x10]
0xffffdd2f1d410264:  b000f2d3  adrp     x19, #0xffffdd2f1f269000
0xffffdd2f1d410268:  911c4273  add      x19, x19, #0x710
0xffffdd2f1d41026c:  a9025bf5  stp      x21, x22, [sp, #0x20]
0xffffdd2f1d410270:  f000cad6  adrp     x22, #0xffffdd2f1ed6b000
0xffffdd2f1d410274:  aa0003f5  mov      x21, x0
0xffffdd2f1d410278:  f9409674  ldr      x20, [x19, #0x128]
0xffffdd2f1d41027c:  913d42d6  add      x22, x22, #0xf50
0xffffdd2f1d410280:  f9001bf7  str      x23, [sp, #0x30]
0xffffdd2f1d410284:  91003297  add      x23, x20, #0xc
0xffffdd2f1d410288:  91004294  add      x20, x20, #0x10
0xffffdd2f1d41028c:  1400000d  b        #0xffffdd2f1d4102c0

----------------
IN:
0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290

*** saw encoding 0xb94002e2 (@ 504107673 instructions)
----------------
IN:
0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]

*** saw encoding 0xb94002e2 (@ 504107674 instructions)
----------------
IN:
0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290

Thanks!

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-04 21:31 ` Aaron Lindsay
@ 2021-02-05 11:19   ` Alex Bennée
  2021-02-05 14:26     ` Aaron Lindsay via
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2021-02-05 11:19 UTC (permalink / raw)
  To: Aaron Lindsay; +Cc: cota, richard.henderson, qemu-devel


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Jan 29 22:23, Aaron Lindsay wrote:
>> I appear to be seeing that if I register a callback for an instruction
>> via `qemu_plugin_register_vcpu_insn_exec_cb` I receive a callback even
>> if the instruction faults.
>
> I was wrong about what I was seeing - I made some false assumptions
> based on incomplete information. After adding some additional
> instrumentation, I do not believe what I am seeing is explained by
> translation faults, and think it might even be a bug.

Hmm it's possibly a corner case we don't handle for plugins. Currently
we put instruction callbacks before the actual instruction because we
know a) it has to execute and b) we might not execute any code after -
for example if we do a jump. We do actually put placeholders before and
after each instruction because we do need them for memory
instrumentation.

> For the below output, I've got a plugin which registers a callback via
> `qemu_plugin_register_vcpu_insn_exec_cb` for each instruction executed.
> I've enabled `-d in_asm` and added prints in my instruction execution
> callback when it sees the opcode for the `ldr` instruction in question.
> I'm running a local source build of the v5.2.0 release.
>
> Note in the output below the instruction at 0xffffdd2f1d4102c0 is
> getting re-translated for some reason, and that two callbacks are made
> to my function registered with qemu_plugin_register_vcpu_insn_exec_cb
> (the "*** saw encoding"... output) for what should be one instruction
> execution.

I wonder is that load accessing a HW location? I suspect what is
happening is we detect a io_readx/io_writex when ->can_do_io is not
true. As HW access can only happen at the end of a block (because it may
change system state) we trigger a recompile of that instruction and try again.

> Do you have any tips for debugging this further or ideas for ensuring the
> callback is called only once for this instruction?

If you also plant a memory callback you should only see one load
happening for that instruction. Could you verify that?

> ----------------
> IN:
> 0xffffdd2f1d410250:  aa1e03e9  mov      x9, x30
> 0xffffdd2f1d410254:  d503201f  nop
> 0xffffdd2f1d410258:  a9bc7bfd  stp      x29, x30, [sp, #-0x40]!
> 0xffffdd2f1d41025c:  910003fd  mov      x29, sp
> 0xffffdd2f1d410260:  a90153f3  stp      x19, x20, [sp, #0x10]
> 0xffffdd2f1d410264:  b000f2d3  adrp     x19, #0xffffdd2f1f269000
> 0xffffdd2f1d410268:  911c4273  add      x19, x19, #0x710
> 0xffffdd2f1d41026c:  a9025bf5  stp      x21, x22, [sp, #0x20]
> 0xffffdd2f1d410270:  f000cad6  adrp     x22, #0xffffdd2f1ed6b000
> 0xffffdd2f1d410274:  aa0003f5  mov      x21, x0
> 0xffffdd2f1d410278:  f9409674  ldr      x20, [x19, #0x128]
> 0xffffdd2f1d41027c:  913d42d6  add      x22, x22, #0xf50
> 0xffffdd2f1d410280:  f9001bf7  str      x23, [sp, #0x30]
> 0xffffdd2f1d410284:  91003297  add      x23, x20, #0xc
> 0xffffdd2f1d410288:  91004294  add      x20, x20, #0x10
> 0xffffdd2f1d41028c:  1400000d  b        #0xffffdd2f1d4102c0
>
> ----------------
> IN:
> 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
> 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
> 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
> 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290
>
> *** saw encoding 0xb94002e2 (@ 504107673 instructions)
> ----------------
> IN:
> 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
>
> *** saw encoding 0xb94002e2 (@ 504107674 instructions)
> ----------------
> IN:
> 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
> 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
> 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290

I think you can work around this in your callback by looking for a
double execution but that exposes rather more of the knowledge of what
is going on behind the scenes than we intended for the plugin interface.
The point is you shouldn't need to know the details of the translator to
write your instruments.

My initial thought is that maybe when we install the callbacks we should
place them after translation if we know there is a guest load/store
happening. However my concern is having such heuristics might miss other
cases - could you see a load from HW indirect jump instruction for
example? It also has the potential to get confusing when we add the
ability to access register values.

I'll see if I can document the limitation for now though while we think
about the best way forward.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 11:19   ` Alex Bennée
@ 2021-02-05 14:26     ` Aaron Lindsay via
  2021-02-05 15:03       ` Alex Bennée
  0 siblings, 1 reply; 12+ messages in thread
From: Aaron Lindsay via @ 2021-02-05 14:26 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, richard.henderson

On Feb 05 11:19, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> 
> > For the below output, I've got a plugin which registers a callback via
> > `qemu_plugin_register_vcpu_insn_exec_cb` for each instruction executed.
> > I've enabled `-d in_asm` and added prints in my instruction execution
> > callback when it sees the opcode for the `ldr` instruction in question.
> > I'm running a local source build of the v5.2.0 release.
> >
> > Note in the output below the instruction at 0xffffdd2f1d4102c0 is
> > getting re-translated for some reason, and that two callbacks are made
> > to my function registered with qemu_plugin_register_vcpu_insn_exec_cb
> > (the "*** saw encoding"... output) for what should be one instruction
> > execution.
> 
> I wonder is that load accessing a HW location? I suspect what is
> happening is we detect a io_readx/io_writex when ->can_do_io is not
> true. As HW access can only happen at the end of a block (because it may
> change system state) we trigger a recompile of that instruction and try again.

I just added additional instrumentation, and
`qemu_plugin_hwaddr_is_io(hwaddr)` returns true in the mem_cb for this
access.

> > Do you have any tips for debugging this further or ideas for ensuring the
> > callback is called only once for this instruction?
> 
> If you also plant a memory callback you should only see one load
> happening for that instruction. Could you verify that?

Yes, I've verified there is only one load happening for the instruction,
and that the ordering of callbacks for this instruction is 1) first
insn_exec_cb, 2) second insn_exec_cb, 3) mem_cb.

Is there anything else you'd like me to check to validate your theory?

> > ----------------
> > IN:
> > 0xffffdd2f1d410250:  aa1e03e9  mov      x9, x30
> > 0xffffdd2f1d410254:  d503201f  nop
> > 0xffffdd2f1d410258:  a9bc7bfd  stp      x29, x30, [sp, #-0x40]!
> > 0xffffdd2f1d41025c:  910003fd  mov      x29, sp
> > 0xffffdd2f1d410260:  a90153f3  stp      x19, x20, [sp, #0x10]
> > 0xffffdd2f1d410264:  b000f2d3  adrp     x19, #0xffffdd2f1f269000
> > 0xffffdd2f1d410268:  911c4273  add      x19, x19, #0x710
> > 0xffffdd2f1d41026c:  a9025bf5  stp      x21, x22, [sp, #0x20]
> > 0xffffdd2f1d410270:  f000cad6  adrp     x22, #0xffffdd2f1ed6b000
> > 0xffffdd2f1d410274:  aa0003f5  mov      x21, x0
> > 0xffffdd2f1d410278:  f9409674  ldr      x20, [x19, #0x128]
> > 0xffffdd2f1d41027c:  913d42d6  add      x22, x22, #0xf50
> > 0xffffdd2f1d410280:  f9001bf7  str      x23, [sp, #0x30]
> > 0xffffdd2f1d410284:  91003297  add      x23, x20, #0xc
> > 0xffffdd2f1d410288:  91004294  add      x20, x20, #0x10
> > 0xffffdd2f1d41028c:  1400000d  b        #0xffffdd2f1d4102c0
> >
> > ----------------
> > IN:
> > 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
> > 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
> > 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
> > 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290
> >
> > *** saw encoding 0xb94002e2 (@ 504107673 instructions)
> > ----------------
> > IN:
> > 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
> >
> > *** saw encoding 0xb94002e2 (@ 504107674 instructions)
> > ----------------
> > IN:
> > 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
> > 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
> > 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290
> 
> I think you can work around this in your callback by looking for a
> double execution but that exposes rather more of the knowledge of what
> is going on behind the scenes than we intended for the plugin interface.
> The point is you shouldn't need to know the details of the translator to
> write your instruments.

Yes, working around it in that way was initial my thought as well. I
think there may be a few (albeit unlikely) corner cases this wouldn't
work correctly for - like self-branches. I don't think that's a major
roadblock for now, but I'd love to help work towards a cleaner solution
in the long-term.

> My initial thought is that maybe when we install the callbacks we should
> place them after translation if we know there is a guest load/store
> happening. However my concern is having such heuristics might miss other
> cases - could you see a load from HW indirect jump instruction for
> example? It also has the potential to get confusing when we add the
> ability to access register values.

Assuming you're right that TCG is detecting "a io_readx/io_writex when
->can_do_io is not true", could we detect this case when it occurs and
omit the instruction callbacks for the re-translation of the single
instruction (allow the initial callback to stand instead of trying to
turn back time, in a way, to prevent it)? Maybe there would have be some
bookkeeping in the plugin infrastructure side rather than entirely
omitting the callbacks when re-translating, in case that translation got
re-used in a case which didn't hit the same behavior and shouldn't be
skipped?

I admit I don't understand all the intricacies here, so what I suggest
may not be reasonable...

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 14:26     ` Aaron Lindsay via
@ 2021-02-05 15:03       ` Alex Bennée
  2021-02-05 15:33         ` Aaron Lindsay via
  2021-02-05 15:42         ` Aaron Lindsay via
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Bennée @ 2021-02-05 15:03 UTC (permalink / raw)
  To: Aaron Lindsay; +Cc: cota, richard.henderson, qemu-devel


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 05 11:19, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> 
>> > For the below output, I've got a plugin which registers a callback via
>> > `qemu_plugin_register_vcpu_insn_exec_cb` for each instruction executed.
>> > I've enabled `-d in_asm` and added prints in my instruction execution
>> > callback when it sees the opcode for the `ldr` instruction in question.
>> > I'm running a local source build of the v5.2.0 release.
>> >
>> > Note in the output below the instruction at 0xffffdd2f1d4102c0 is
>> > getting re-translated for some reason, and that two callbacks are made
>> > to my function registered with qemu_plugin_register_vcpu_insn_exec_cb
>> > (the "*** saw encoding"... output) for what should be one instruction
>> > execution.
>> 
>> I wonder is that load accessing a HW location? I suspect what is
>> happening is we detect a io_readx/io_writex when ->can_do_io is not
>> true. As HW access can only happen at the end of a block (because it may
>> change system state) we trigger a recompile of that instruction and try again.
>
> I just added additional instrumentation, and
> `qemu_plugin_hwaddr_is_io(hwaddr)` returns true in the mem_cb for this
> access.
>
>> > Do you have any tips for debugging this further or ideas for ensuring the
>> > callback is called only once for this instruction?
>> 
>> If you also plant a memory callback you should only see one load
>> happening for that instruction. Could you verify that?
>
> Yes, I've verified there is only one load happening for the instruction,
> and that the ordering of callbacks for this instruction is 1) first
> insn_exec_cb, 2) second insn_exec_cb, 3) mem_cb.
>
> Is there anything else you'd like me to check to validate your theory?

No I think that pretty much confirms the theory.

>> > ----------------
>> > IN:
>> > 0xffffdd2f1d410250:  aa1e03e9  mov      x9, x30
>> > 0xffffdd2f1d410254:  d503201f  nop
>> > 0xffffdd2f1d410258:  a9bc7bfd  stp      x29, x30, [sp, #-0x40]!
>> > 0xffffdd2f1d41025c:  910003fd  mov      x29, sp
>> > 0xffffdd2f1d410260:  a90153f3  stp      x19, x20, [sp, #0x10]
>> > 0xffffdd2f1d410264:  b000f2d3  adrp     x19, #0xffffdd2f1f269000
>> > 0xffffdd2f1d410268:  911c4273  add      x19, x19, #0x710
>> > 0xffffdd2f1d41026c:  a9025bf5  stp      x21, x22, [sp, #0x20]
>> > 0xffffdd2f1d410270:  f000cad6  adrp     x22, #0xffffdd2f1ed6b000
>> > 0xffffdd2f1d410274:  aa0003f5  mov      x21, x0
>> > 0xffffdd2f1d410278:  f9409674  ldr      x20, [x19, #0x128]
>> > 0xffffdd2f1d41027c:  913d42d6  add      x22, x22, #0xf50
>> > 0xffffdd2f1d410280:  f9001bf7  str      x23, [sp, #0x30]
>> > 0xffffdd2f1d410284:  91003297  add      x23, x20, #0xc
>> > 0xffffdd2f1d410288:  91004294  add      x20, x20, #0x10
>> > 0xffffdd2f1d41028c:  1400000d  b        #0xffffdd2f1d4102c0
>> >
>> > ----------------
>> > IN:
>> > 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
>> > 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
>> > 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
>> > 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290
>> >
>> > *** saw encoding 0xb94002e2 (@ 504107673 instructions)
>> > ----------------
>> > IN:
>> > 0xffffdd2f1d4102c0:  b94002e2  ldr      w2, [x23]
>> >
>> > *** saw encoding 0xb94002e2 (@ 504107674 instructions)
>> > ----------------
>> > IN:
>> > 0xffffdd2f1d4102c4:  12002441  and      w1, w2, #0x3ff
>> > 0xffffdd2f1d4102c8:  710fec3f  cmp      w1, #0x3fb
>> > 0xffffdd2f1d4102cc:  54fffe29  b.ls     #0xffffdd2f1d410290
>> 
>> I think you can work around this in your callback by looking for a
>> double execution but that exposes rather more of the knowledge of what
>> is going on behind the scenes than we intended for the plugin interface.
>> The point is you shouldn't need to know the details of the translator to
>> write your instruments.
>
> Yes, working around it in that way was initial my thought as well. I
> think there may be a few (albeit unlikely) corner cases this wouldn't
> work correctly for - like self-branches. I don't think that's a major
> roadblock for now, but I'd love to help work towards a cleaner solution
> in the long-term.

A perhaps lighter weight mechanism is to detect load/store insns and
install a memory callback for those instructions instead of an
instruction callback. That way you only have one callback and it will
always be one that will execute once. Plugins are certainly allowed to
make decisions based on the guest instructions - hence we give access to
that data at translation time.

>
>> My initial thought is that maybe when we install the callbacks we should
>> place them after translation if we know there is a guest load/store
>> happening. However my concern is having such heuristics might miss other
>> cases - could you see a load from HW indirect jump instruction for
>> example? It also has the potential to get confusing when we add the
>> ability to access register values.
>
> Assuming you're right that TCG is detecting "a io_readx/io_writex when
> ->can_do_io is not true", could we detect this case when it occurs and
> omit the instruction callbacks for the re-translation of the single
> instruction (allow the initial callback to stand instead of trying to
> turn back time, in a way, to prevent it)? Maybe there would have be some
> bookkeeping in the plugin infrastructure side rather than entirely
> omitting the callbacks when re-translating, in case that translation got
> re-used in a case which didn't hit the same behavior and shouldn't be
> skipped?

They are happening in two separate phases. The translation phase has no
idea what the runtime condition will be. Once we get to runtime it's too
late - and we trigger a new translation phase.

I'll see what Richard thinks. I must admit I thought can_do_io was only
an issue for -icount modes but I think the real picture is slightly more
confused than that.

>
> I admit I don't understand all the intricacies here, so what I suggest
> may not be reasonable...
>
> -Aaron


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 15:03       ` Alex Bennée
@ 2021-02-05 15:33         ` Aaron Lindsay via
  2021-02-05 15:42         ` Aaron Lindsay via
  1 sibling, 0 replies; 12+ messages in thread
From: Aaron Lindsay via @ 2021-02-05 15:33 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, richard.henderson

On Feb 05 15:03, Alex Bennée wrote:
> I'll see what Richard thinks. I must admit I thought can_do_io was only
> an issue for -icount modes but I think the real picture is slightly more
> confused than that.

I am using -icount. I apologize for not including that originally - I
didn't realize it mattered.

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 15:03       ` Alex Bennée
  2021-02-05 15:33         ` Aaron Lindsay via
@ 2021-02-05 15:42         ` Aaron Lindsay via
  2021-02-05 16:41           ` Alex Bennée
  2021-02-11 17:27           ` Alex Bennée
  1 sibling, 2 replies; 12+ messages in thread
From: Aaron Lindsay via @ 2021-02-05 15:42 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, richard.henderson

On Feb 05 15:03, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > Assuming you're right that TCG is detecting "a io_readx/io_writex when
> > ->can_do_io is not true", could we detect this case when it occurs and
> > omit the instruction callbacks for the re-translation of the single
> > instruction (allow the initial callback to stand instead of trying to
> > turn back time, in a way, to prevent it)? Maybe there would have be some
> > bookkeeping in the plugin infrastructure side rather than entirely
> > omitting the callbacks when re-translating, in case that translation got
> > re-used in a case which didn't hit the same behavior and shouldn't be
> > skipped?
> 
> They are happening in two separate phases. The translation phase has no
> idea what the runtime condition will be. Once we get to runtime it's too
> late - and we trigger a new translation phase.

I believe I understand why we can't catch the initial translation. To
make sure I'm communicating well, my current understanding is that the
timeline for this case goes something like:

1) translate large block of instructions, including ldr
2) attempt to execute ldr, calling instruction callback
3) notice that access is to IO, trigger re-translation of single
   ldr instruction
4) execute block with single ldr instruction to completion, calling both
   instruction and memory callbacks

I was wondering if it would be possible to inform the re-translation in
step 3 that it's for a re-translated IO access so that it could
ultimately cause the second of the duplicate instruction callbacks to be
omitted during execution in 4.

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 15:42         ` Aaron Lindsay via
@ 2021-02-05 16:41           ` Alex Bennée
  2021-02-11 17:27           ` Alex Bennée
  1 sibling, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2021-02-05 16:41 UTC (permalink / raw)
  To: Aaron Lindsay; +Cc: cota, richard.henderson, qemu-devel


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 05 15:03, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > Assuming you're right that TCG is detecting "a io_readx/io_writex when
>> > ->can_do_io is not true", could we detect this case when it occurs and
>> > omit the instruction callbacks for the re-translation of the single
>> > instruction (allow the initial callback to stand instead of trying to
>> > turn back time, in a way, to prevent it)? Maybe there would have be some
>> > bookkeeping in the plugin infrastructure side rather than entirely
>> > omitting the callbacks when re-translating, in case that translation got
>> > re-used in a case which didn't hit the same behavior and shouldn't be
>> > skipped?
>> 
>> They are happening in two separate phases. The translation phase has no
>> idea what the runtime condition will be. Once we get to runtime it's too
>> late - and we trigger a new translation phase.
>
> I believe I understand why we can't catch the initial translation. To
> make sure I'm communicating well, my current understanding is that the
> timeline for this case goes something like:
>
> 1) translate large block of instructions, including ldr
> 2) attempt to execute ldr, calling instruction callback
> 3) notice that access is to IO, trigger re-translation of single
>    ldr instruction
> 4) execute block with single ldr instruction to completion, calling both
>    instruction and memory callbacks
>
> I was wondering if it would be possible to inform the re-translation in
> step 3 that it's for a re-translated IO access so that it could
> ultimately cause the second of the duplicate instruction callbacks to be
> omitted during execution in 4.

Currently we invalidate the previous TB and save the new TB in the
translation cache before restarting the execution loop so the new TB
gets picked up. However we could certainly have a different mechanism
which ensures the next block is not cached. I think if we extend CFLAGS
down to gen_intermediate_code and translator_loop we could ask it not to
instrument that block. It wouldn't be the most efficient solution but
then again this is icount so...

Richard,

What do you think?


>
> -Aaron


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-05 15:42         ` Aaron Lindsay via
  2021-02-05 16:41           ` Alex Bennée
@ 2021-02-11 17:27           ` Alex Bennée
  2021-02-11 18:35             ` Aaron Lindsay via
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2021-02-11 17:27 UTC (permalink / raw)
  To: Aaron Lindsay; +Cc: cota, richard.henderson, qemu-devel


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 05 15:03, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > Assuming you're right that TCG is detecting "a io_readx/io_writex when
>> > ->can_do_io is not true", could we detect this case when it occurs and
>> > omit the instruction callbacks for the re-translation of the single
>> > instruction (allow the initial callback to stand instead of trying to
>> > turn back time, in a way, to prevent it)? Maybe there would have be some
>> > bookkeeping in the plugin infrastructure side rather than entirely
>> > omitting the callbacks when re-translating, in case that translation got
>> > re-used in a case which didn't hit the same behavior and shouldn't be
>> > skipped?
>> 
>> They are happening in two separate phases. The translation phase has no
>> idea what the runtime condition will be. Once we get to runtime it's too
>> late - and we trigger a new translation phase.
>
> I believe I understand why we can't catch the initial translation. To
> make sure I'm communicating well, my current understanding is that the
> timeline for this case goes something like:
>
> 1) translate large block of instructions, including ldr
> 2) attempt to execute ldr, calling instruction callback
> 3) notice that access is to IO, trigger re-translation of single
>    ldr instruction
> 4) execute block with single ldr instruction to completion, calling both
>    instruction and memory callbacks
>
> I was wondering if it would be possible to inform the re-translation in
> step 3 that it's for a re-translated IO access so that it could
> ultimately cause the second of the duplicate instruction callbacks to be
> omitted during execution in 4.

This is what I've done - re-executed blocks are compiled with CF_NOINSTR
which skips any instrumentation. If you could test the series I posted and
confirm the problem is solved that would be great:

  Subject: [PATCH  v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix)
  Date: Wed, 10 Feb 2021 22:10:32 +0000
  Message-Id: <20210210221053.18050-1-alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Detecting Faulting Instructions From Plugins
  2021-02-11 17:27           ` Alex Bennée
@ 2021-02-11 18:35             ` Aaron Lindsay via
  0 siblings, 0 replies; 12+ messages in thread
From: Aaron Lindsay via @ 2021-02-11 18:35 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, richard.henderson

On Feb 11 17:27, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > On Feb 05 15:03, Alex Bennée wrote:
> >> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> >> > Assuming you're right that TCG is detecting "a io_readx/io_writex when
> >> > ->can_do_io is not true", could we detect this case when it occurs and
> >> > omit the instruction callbacks for the re-translation of the single
> >> > instruction (allow the initial callback to stand instead of trying to
> >> > turn back time, in a way, to prevent it)? Maybe there would have be some
> >> > bookkeeping in the plugin infrastructure side rather than entirely
> >> > omitting the callbacks when re-translating, in case that translation got
> >> > re-used in a case which didn't hit the same behavior and shouldn't be
> >> > skipped?
> >> 
> >> They are happening in two separate phases. The translation phase has no
> >> idea what the runtime condition will be. Once we get to runtime it's too
> >> late - and we trigger a new translation phase.
> >
> > I believe I understand why we can't catch the initial translation. To
> > make sure I'm communicating well, my current understanding is that the
> > timeline for this case goes something like:
> >
> > 1) translate large block of instructions, including ldr
> > 2) attempt to execute ldr, calling instruction callback
> > 3) notice that access is to IO, trigger re-translation of single
> >    ldr instruction
> > 4) execute block with single ldr instruction to completion, calling both
> >    instruction and memory callbacks
> >
> > I was wondering if it would be possible to inform the re-translation in
> > step 3 that it's for a re-translated IO access so that it could
> > ultimately cause the second of the duplicate instruction callbacks to be
> > omitted during execution in 4.
> 
> This is what I've done - re-executed blocks are compiled with CF_NOINSTR
> which skips any instrumentation. If you could test the series I posted and
> confirm the problem is solved that would be great:
> 
>   Subject: [PATCH  v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix)
>   Date: Wed, 10 Feb 2021 22:10:32 +0000
>   Message-Id: <20210210221053.18050-1-alex.bennee@linaro.org>

Yes, I absolutely will. I worked on getting some local changes rebased
on top of these already this morning and am hoping to finish that up
today and to be able to report back by tomorrow.

Thanks for the quick turnaround on a fix!

-Aaron


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-02-11 19:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-30  3:23 Detecting Faulting Instructions From Plugins Aaron Lindsay
2021-01-30 18:55 ` Aaron Lindsay
2021-02-01 12:07   ` Alex Bennée
2021-02-04 21:31 ` Aaron Lindsay
2021-02-05 11:19   ` Alex Bennée
2021-02-05 14:26     ` Aaron Lindsay via
2021-02-05 15:03       ` Alex Bennée
2021-02-05 15:33         ` Aaron Lindsay via
2021-02-05 15:42         ` Aaron Lindsay via
2021-02-05 16:41           ` Alex Bennée
2021-02-11 17:27           ` Alex Bennée
2021-02-11 18:35             ` Aaron Lindsay via

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.