All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] reverse execution.
@ 2013-05-07 18:27 KONRAD Frédéric
  2013-05-09 17:54 ` Blue Swirl
  0 siblings, 1 reply; 17+ messages in thread
From: KONRAD Frédéric @ 2013-05-07 18:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Mark Burton, fred.konrad

Hi,

We are trying to find a way to do reverse execution happen with QEMU.

Actually, it is possible to debug the guest through the gdbstub, we want to
make the reverse execution possible with GDB as well.

How we are trying to make that working (basically without optimisation):

-QEMU takes regular snapshot of the VM:
    that can be done with the save vm code without optimisation first.

-When the VM is stopped and GDB requests a reverse-step:
    load the last snapshot and replay to one instruction before the 
current PC.

There are one issue with that for now (for a basic running reverse 
execution):
     -How to stop one instruction before the actual PC.

We though that using "-icount" and stop the guest a little time before the
actual position would give us the right behavior (We use a qemu_timer with
vm_clock to stop the vm at the good time), but it seems that it is not
deterministic, and not reproducable.

Is that normal?

We don't make any input during the replay, and we though that it can be 
caused
by some timer interruption but "-icount" is using a virtual timer as I
understand?

We have two other ideas:

     -Using TCI and count each instruction executed by the processor, 
then stop
         one instruction before the actual position. This seems slower.

     -Using single-step to count each instruction, then stop one instruction
         before the actual position.

Would that be better?

For now we can restore the VM from the last snapshot, when we do a 
reverse-step
but we can't stop at the exact position.

Thanks,
Fred

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-07 18:27 [Qemu-devel] [RFC] reverse execution KONRAD Frédéric
@ 2013-05-09 17:54 ` Blue Swirl
  2013-05-17 17:23   ` KONRAD Frédéric
  0 siblings, 1 reply; 17+ messages in thread
From: Blue Swirl @ 2013-05-09 17:54 UTC (permalink / raw)
  To: KONRAD Frédéric; +Cc: Mark Burton, qemu-devel

On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric
<fred.konrad@greensocs.com> wrote:
> Hi,
>
> We are trying to find a way to do reverse execution happen with QEMU.
>
> Actually, it is possible to debug the guest through the gdbstub, we want to
> make the reverse execution possible with GDB as well.
>
> How we are trying to make that working (basically without optimisation):
>
> -QEMU takes regular snapshot of the VM:
>    that can be done with the save vm code without optimisation first.
>
> -When the VM is stopped and GDB requests a reverse-step:
>    load the last snapshot and replay to one instruction before the current
> PC.
>
> There are one issue with that for now (for a basic running reverse
> execution):
>     -How to stop one instruction before the actual PC.

Add a special translation mode for reverse execution where the next PC
is checked after each instruction. Alternatively, you could make
temporary snapshots during this mode (first 1s intervals, then 0.1s
etc) which could be used to find the location. I think this way was
discussed briefly earlier in the list, please check the archives.

>
> We though that using "-icount" and stop the guest a little time before the
> actual position would give us the right behavior (We use a qemu_timer with
> vm_clock to stop the vm at the good time), but it seems that it is not
> deterministic, and not reproducable.
>
> Is that normal?
>
> We don't make any input during the replay, and we though that it can be
> caused
> by some timer interruption but "-icount" is using a virtual timer as I
> understand?
>
> We have two other ideas:
>
>     -Using TCI and count each instruction executed by the processor, then
> stop
>         one instruction before the actual position. This seems slower.
>
>     -Using single-step to count each instruction, then stop one instruction
>         before the actual position.
>
> Would that be better?
>
> For now we can restore the VM from the last snapshot, when we do a
> reverse-step
> but we can't stop at the exact position.
>
> Thanks,
> Fred
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-09 17:54 ` Blue Swirl
@ 2013-05-17 17:23   ` KONRAD Frédéric
  2013-05-17 17:54     ` Peter Maydell
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: KONRAD Frédéric @ 2013-05-17 17:23 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Mark Burton, qemu-devel

On 09/05/2013 19:54, Blue Swirl wrote:
> On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric
> <fred.konrad@greensocs.com> wrote:
>> Hi,
>>
>> We are trying to find a way to do reverse execution happen with QEMU.
>>
>> Actually, it is possible to debug the guest through the gdbstub, we want to
>> make the reverse execution possible with GDB as well.
>>
>> How we are trying to make that working (basically without optimisation):
>>
>> -QEMU takes regular snapshot of the VM:
>>     that can be done with the save vm code without optimisation first.
>>
>> -When the VM is stopped and GDB requests a reverse-step:
>>     load the last snapshot and replay to one instruction before the current
>> PC.
>>
>> There are one issue with that for now (for a basic running reverse
>> execution):
>>      -How to stop one instruction before the actual PC.
> Add a special translation mode for reverse execution where the next PC
> is checked after each instruction. Alternatively, you could make
> temporary snapshots during this mode (first 1s intervals, then 0.1s
> etc) which could be used to find the location. I think this way was
> discussed briefly earlier in the list, please check the archives.
>

Hi, thanks for your answer!

I didn't find the discussion in the archive.. Do you have a clue? (Title 
or sender?)

For now we tried some other things which are not working very well,

It appeared that the replay is not deterministic even with icount:
     - the whole icount mechanism is not saved with save_vm (which can 
be achieved by moving qemu_icount to TimerState according to Paolo)
     - replaying two times the same thing and stopping at a specific 
breakpoint show two differents vmclock, so replaying the
         same amount of time don't work, and we abandoned this idea.

We tried to count the amount of time tcg_qemu_tb_exec exited with having 
executed some TB and we stopped one before for the replay.
This is nearly working but:
     - tcg_qemu_tb_exec exits a little more time during the first 
replay, seems the TB linked list is split dynamically?
     - this works with the next replay (reverse-stepi) but we can't stop 
at the exact PC instruction with this method.

So we will try to add an instruction counter in the CPUState and 
increments it after each instruction in the translation code,
which I think is approximately what you suggest.
Then when replaying the code from the snapshot, we will check the amount 
of executed instruction and stop one instruction before.
Maybe we can re-use icount mechanism but this might be a lot more 
complicated as it is a de-counter?

Can this be working?

Maybe we will need to trace the PC from the snapshot to the exact 
location? Or use both mechanism to get the right location?

Thanks,
Fred

>> We though that using "-icount" and stop the guest a little time before the
>> actual position would give us the right behavior (We use a qemu_timer with
>> vm_clock to stop the vm at the good time), but it seems that it is not
>> deterministic, and not reproducable.
>>
>> Is that normal?
>>
>> We don't make any input during the replay, and we though that it can be
>> caused
>> by some timer interruption but "-icount" is using a virtual timer as I
>> understand?
>>
>> We have two other ideas:
>>
>>      -Using TCI and count each instruction executed by the processor, then
>> stop
>>          one instruction before the actual position. This seems slower.
>>
>>      -Using single-step to count each instruction, then stop one instruction
>>          before the actual position.
>>
>> Would that be better?
>>
>> For now we can restore the VM from the last snapshot, when we do a
>> reverse-step
>> but we can't stop at the exact position.
>>
>> Thanks,
>> Fred
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-17 17:23   ` KONRAD Frédéric
@ 2013-05-17 17:54     ` Peter Maydell
  2013-05-17 19:16       ` Mark Burton
  2013-05-18 18:52     ` Blue Swirl
  2013-05-19  4:37     ` Rob Landley
  2 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2013-05-17 17:54 UTC (permalink / raw)
  To: KONRAD Frédéric; +Cc: Blue Swirl, Mark Burton, qemu-devel

On 17 May 2013 18:23, KONRAD Frédéric <fred.konrad@greensocs.com> wrote:
> It appeared that the replay is not deterministic even with icount:
>     - the whole icount mechanism is not saved with save_vm (which can be
> achieved by moving qemu_icount to TimerState according to Paolo)
>     - replaying two times the same thing and stopping at a specific
> breakpoint show two differents vmclock, so replaying the
>         same amount of time don't work, and we abandoned this idea.

Personally I think icount is supposed to be deterministic,
and if it isn't then it should be fixed to be so. Does anybody
who understands it better than me disagree?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-17 17:54     ` Peter Maydell
@ 2013-05-17 19:16       ` Mark Burton
  2013-05-23  1:57         ` Edgar E. Iglesias
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Burton @ 2013-05-17 19:16 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Blue Swirl, qemu-devel, KONRAD Frédéric

I wish I could say I understood it better, but at this point any insight would be gratefully received. However, what does seem clear is that the intent and purpose of Icount is subtly different, and possibly orthogonal to what we're trying to achieve.

And - actually, determinism (or the lack of it), is defiantly an issue, but - for now - we have spent much of this week finding a bit of code that avoids any non-determanistic behavior - simply so we can make sure the mechanisms work - THEN we will tackle the thorny subject of what is causing non-determanistic behavior (by which, I _suspect_ I mean, what devices or timers are not adhering to the icount mechanism).

To recap, as I understand things, setting the icount value in the command line is intended to give a rough "instructions per second" mechanism. One of the effects of that is to make things more deterministic.  Our overall intent is to allow the user that has hit a bug, to step backwards.

After much discussion (!) I'm convinced by the argument that I might in the end want both of these things. I might want to set some sort of instructions per second value (and change it between runs), and if/when I hit a bug, go backwards.

Thus far, so good. 

underneath the hood, icount keeps a counter in the TCG environment which is decremented (as Fred says) and the icount mechanism plays with it as it feels fit.
The bottom line is that, orthogonal to this, we need a separate 'counter' which is almost identical to the icount counter, in order to count instructions for the reverse execution mechanism.

We have looked at re-using the icount counter as Fred said, but that soon ends you up in a whole heap of pain. Our conclusion - it would be much cleaner to have a separate dedicated counter, then you can simply use either mechanism independent of the other.
On this subject - I would like to hear any and all views.

Having said all of that, in BOTH cases, we need determinism.

In our case, determinism is very tightly defined (which - I suspect may not be the case for icount). In our case, having returned to a snapshot, the subsequent execution must follow the EXACT SAME path that it did last time. no if's no buts. Not IO no income tax, no VAT, no money back no guarantee….

Right now, what Fred has found is that sometimes things 'drift'… we will (of course) be looking into that. But, for now, our principle concern is to take a simple bit of code, with no IO, and nothing that causes non-determanism - save a snapshot at the beginning of the sequence, run, hit a breakpoint, return to the breakpoint, and be able to _exactly_ return to the place we came from.

As Fred said, we imagined that we could do this based on TBs, at least as a 'block' level (which actually may be good enough for us). However, our mechanism for counting TB's was badly broken. None the less, we learnt a lot about TB's - and about some of the non-determaistic behavior that will come to haunt us later. We also concluded that counting TBs is always going to be second rate, and if we're going to do this properly, we need to count instructions. Finally, we have concluded that re-using the icount counters is going to be very painful, we need to re-use the same mechanism, but we need dedicated counters…


Again, please, all - pitch in and say what you think. Fred and I have been scratching out head all week on this, and I'm not convinced we have come up with the right answers, so any input would be most welcome.


Cheers

Mark.






On 17 May 2013, at 19:54, Peter Maydell wrote:

> On 17 May 2013 18:23, KONRAD Frédéric <fred.konrad@greensocs.com> wrote:
>> It appeared that the replay is not deterministic even with icount:
>>    - the whole icount mechanism is not saved with save_vm (which can be
>> achieved by moving qemu_icount to TimerState according to Paolo)
>>    - replaying two times the same thing and stopping at a specific
>> breakpoint show two differents vmclock, so replaying the
>>        same amount of time don't work, and we abandoned this idea.
> 
> Personally I think icount is supposed to be deterministic,
> and if it isn't then it should be fixed to be so. Does anybody
> who understands it better than me disagree?
> 
> thanks
> -- PMM


	 +44 (0)20 7100 3485 x 210
 +33 (0)5 33 52 01 77x 210
 707-356-0783 x 210
	+33 (0)603762104
	mark.burton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-17 17:23   ` KONRAD Frédéric
  2013-05-17 17:54     ` Peter Maydell
@ 2013-05-18 18:52     ` Blue Swirl
  2013-05-22 16:14       ` KONRAD Frédéric
  2013-05-19  4:37     ` Rob Landley
  2 siblings, 1 reply; 17+ messages in thread
From: Blue Swirl @ 2013-05-18 18:52 UTC (permalink / raw)
  To: KONRAD Frédéric; +Cc: Mark Burton, qemu-devel

On Fri, May 17, 2013 at 5:23 PM, KONRAD Frédéric
<fred.konrad@greensocs.com> wrote:
> On 09/05/2013 19:54, Blue Swirl wrote:
>>
>> On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric
>> <fred.konrad@greensocs.com> wrote:
>>>
>>> Hi,
>>>
>>> We are trying to find a way to do reverse execution happen with QEMU.
>>>
>>> Actually, it is possible to debug the guest through the gdbstub, we want
>>> to
>>> make the reverse execution possible with GDB as well.
>>>
>>> How we are trying to make that working (basically without optimisation):
>>>
>>> -QEMU takes regular snapshot of the VM:
>>>     that can be done with the save vm code without optimisation first.
>>>
>>> -When the VM is stopped and GDB requests a reverse-step:
>>>     load the last snapshot and replay to one instruction before the
>>> current
>>> PC.
>>>
>>> There are one issue with that for now (for a basic running reverse
>>> execution):
>>>      -How to stop one instruction before the actual PC.
>>
>> Add a special translation mode for reverse execution where the next PC
>> is checked after each instruction. Alternatively, you could make
>> temporary snapshots during this mode (first 1s intervals, then 0.1s
>> etc) which could be used to find the location. I think this way was
>> discussed briefly earlier in the list, please check the archives.
>>
>
> Hi, thanks for your answer!
>
> I didn't find the discussion in the archive.. Do you have a clue? (Title or
> sender?)

Paul Brook (long time QEMU developer) made a paper about this together
with Daniel Jacobowitz:
http://www.linuxsymposium.org/archives/GCC/Reprints-2007/jacobowitz-reprint.pdf

IIRC Paul also mentioned some techniques on the list at that time but
I couldn't find that in the archives.

Other related discussions:
http://article.gmane.org/gmane.comp.emulators.qemu/88447
http://article.gmane.org/gmane.comp.emulators.qemu/94861
http://article.gmane.org/gmane.comp.emulators.qemu/154572

Also this site contains some overview of reverse debugging:
http://jakob.engbloms.se/archives/1554

>
> For now we tried some other things which are not working very well,
>
> It appeared that the replay is not deterministic even with icount:
>     - the whole icount mechanism is not saved with save_vm (which can be
> achieved by moving qemu_icount to TimerState according to Paolo)
>     - replaying two times the same thing and stopping at a specific
> breakpoint show two differents vmclock, so replaying the
>         same amount of time don't work, and we abandoned this idea.
>
> We tried to count the amount of time tcg_qemu_tb_exec exited with having
> executed some TB and we stopped one before for the replay.
> This is nearly working but:
>     - tcg_qemu_tb_exec exits a little more time during the first replay,
> seems the TB linked list is split dynamically?
>     - this works with the next replay (reverse-stepi) but we can't stop at
> the exact PC instruction with this method.
>
> So we will try to add an instruction counter in the CPUState and increments
> it after each instruction in the translation code,
> which I think is approximately what you suggest.
> Then when replaying the code from the snapshot, we will check the amount of
> executed instruction and stop one instruction before.
> Maybe we can re-use icount mechanism but this might be a lot more
> complicated as it is a de-counter?
>
> Can this be working?
>
> Maybe we will need to trace the PC from the snapshot to the exact location?

That should be easy, but not the fastest way.

> Or use both mechanism to get the right location?

Yes, you could load VM from previous snapshot and then use icount or
just host timer to get approximately halfway. Make a new snapshot and
then try again, starting from that snapshot. When you get close
enough, singlestep to the final instruction.

>
> Thanks,
> Fred
>
>
>>> We though that using "-icount" and stop the guest a little time before
>>> the
>>> actual position would give us the right behavior (We use a qemu_timer
>>> with
>>> vm_clock to stop the vm at the good time), but it seems that it is not
>>> deterministic, and not reproducable.
>>>
>>> Is that normal?
>>>
>>> We don't make any input during the replay, and we though that it can be
>>> caused
>>> by some timer interruption but "-icount" is using a virtual timer as I
>>> understand?
>>>
>>> We have two other ideas:
>>>
>>>      -Using TCI and count each instruction executed by the processor,
>>> then
>>> stop
>>>          one instruction before the actual position. This seems slower.
>>>
>>>      -Using single-step to count each instruction, then stop one
>>> instruction
>>>          before the actual position.
>>>
>>> Would that be better?
>>>
>>> For now we can restore the VM from the last snapshot, when we do a
>>> reverse-step
>>> but we can't stop at the exact position.
>>>
>>> Thanks,
>>> Fred
>>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-17 17:23   ` KONRAD Frédéric
  2013-05-17 17:54     ` Peter Maydell
  2013-05-18 18:52     ` Blue Swirl
@ 2013-05-19  4:37     ` Rob Landley
  2013-05-19  7:21       ` Peter Maydell
  2 siblings, 1 reply; 17+ messages in thread
From: Rob Landley @ 2013-05-19  4:37 UTC (permalink / raw)
  To: KONRAD Frédéric; +Cc: Blue Swirl, Mark Burton, qemu-devel

On 05/17/2013 12:23:51 PM, KONRAD Frédéric wrote:
> On 09/05/2013 19:54, Blue Swirl wrote:
>> On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric
>> <fred.konrad@greensocs.com> wrote:
>>> Hi,
>>> 
>>> We are trying to find a way to do reverse execution happen with  
>>> QEMU.
...
> For now we tried some other things which are not working very well,
> 
> It appeared that the replay is not deterministic even with icount:

You're aware that reverse execution means you have the "come from"  
problem, right? (The opposite of goto.)

You literally _can't_ figure out your control flow by running the code  
backwards. It's equivalent to solving the halting problem. The best you  
can do is log and replay.

Rob

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19  4:37     ` Rob Landley
@ 2013-05-19  7:21       ` Peter Maydell
  2013-05-19 20:09         ` Mark Burton
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2013-05-19  7:21 UTC (permalink / raw)
  To: Rob Landley
  Cc: Blue Swirl, Mark Burton, qemu-devel, KONRAD Frédéric

On 19 May 2013 05:37, Rob Landley <rob@landley.net> wrote:
> On 05/17/2013 12:23:51 PM, KONRAD Frédéric wrote:
>> It appeared that the replay is not deterministic even with icount:

> You're aware that reverse execution means you have the "come from" problem,
> right? (The opposite of goto.)
>
> You literally _can't_ figure out your control flow by running the code
> backwards. It's equivalent to solving the halting problem. The best you can
> do is log and replay.

Yes, of course -- 'reverse execution' is just the usual phrase
for the user-visible effect. However if your *forwards* replay
isn't deterministic then it all goes pear-shaped, which is
what Fred is complaining about.

-- PMM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19  7:21       ` Peter Maydell
@ 2013-05-19 20:09         ` Mark Burton
  2013-05-19 21:20           ` Peter Maydell
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mark Burton @ 2013-05-19 20:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Blue Swirl, qemu-devel, KONRAD Frédéric

Spot on Peter,
The (simplistic) plan is simply to take a snapshot at regular intervals, when you want to step backwards, you return to a snapshot, and then re-run forwards to 'just before you started'.

To answer Blauwirbel, we can't "approximate" this - or 'binary search' for the right place - we absolutely have to know exactly how to stop exactly 'just before'. To do that we need a measure of where we are. Time isn't really what we want (It has a nasty habit of moving forwards even when we are pretending to reverse it!). Ideally we need an instruction count. (I say ideally, we could actually get away with something like a block count, but if blocks are linked together etc, we would have to be careful).

I still favor a dedicated counter for this, it seems to make more sense. But - I am wondering about a 'basic block counter' rather than an 'instruction counter'.
	Note - what I understand by a basic block is something that ends in a jump/branch of some description. Hence, one thing I think you can say about a basic block is that each PC value within it is unique. Hence, if I know the number of basic blocks executed, and the current PC, then I should be able to re-run to there (assuming a deterministic system of course).

I'd be interested to know (a) if there is a sensible place for adding a basic block counter, and (b) if people like this route better or worse than an instruction counter?

Cheers

Mark.


On 19 May 2013, at 09:21, Peter Maydell wrote:

> On 19 May 2013 05:37, Rob Landley <rob@landley.net> wrote:
>> On 05/17/2013 12:23:51 PM, KONRAD Frédéric wrote:
>>> It appeared that the replay is not deterministic even with icount:
> 
>> You're aware that reverse execution means you have the "come from" problem,
>> right? (The opposite of goto.)
>> 
>> You literally _can't_ figure out your control flow by running the code
>> backwards. It's equivalent to solving the halting problem. The best you can
>> do is log and replay.
> 
> Yes, of course -- 'reverse execution' is just the usual phrase
> for the user-visible effect. However if your *forwards* replay
> isn't deterministic then it all goes pear-shaped, which is
> what Fred is complaining about.
> 
> -- PMM


	 +44 (0)20 7100 3485 x 210
 +33 (0)5 33 52 01 77x 210
 707-356-0783 x 210
	+33 (0)603762104
	mark.burton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19 20:09         ` Mark Burton
@ 2013-05-19 21:20           ` Peter Maydell
       [not found]             ` <CAD2=zRDphd7N5gCQeX6oVQP=HEbRRMpcwPKEVDj46DHKhgkKMw@mail.gmail.com>
  2013-05-20  5:28             ` Mark Burton
  2013-05-19 21:39           ` Rob Landley
  2013-05-29 12:37           ` Pavel Dovgaluk
  2 siblings, 2 replies; 17+ messages in thread
From: Peter Maydell @ 2013-05-19 21:20 UTC (permalink / raw)
  To: Mark Burton; +Cc: Blue Swirl, qemu-devel, KONRAD Frédéric

On 19 May 2013 21:09, Mark Burton <mark.burton@greensocs.com> wrote:
>         Note - what I understand by a basic block is something that ends in a
> jump/branch of some description. Hence, one thing I think you can say about a
> basic block is that each PC value within it is unique. Hence, if I know the
> number of basic blocks executed, and the current PC, then I should be able to
> re-run to there (assuming a deterministic system of course).

Assuming your rerun is strictly deterministic so you always exit
the basic block the same way each time, then yes, this amounts
to optimising the "instruction count" by doing it as "X basic
blocks then Y instructions". You could actually have this
really do the instruction count for you, if you track insns
per block at translation time. (There is some fiddling necessary
when we take an unexpected exception in the middle of a block
due to a load/store fault.)

> I'd be interested to know (a) if there is a sensible place for
> adding a basic block counter, and (b) if people like this route
> better or worse than an instruction counter?

I think you're probably best off getting the instruction counter
working properly before trying to optimise it...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19 20:09         ` Mark Burton
  2013-05-19 21:20           ` Peter Maydell
@ 2013-05-19 21:39           ` Rob Landley
  2013-05-20  5:34             ` Mark Burton
  2013-05-29 12:37           ` Pavel Dovgaluk
  2 siblings, 1 reply; 17+ messages in thread
From: Rob Landley @ 2013-05-19 21:39 UTC (permalink / raw)
  To: Mark Burton
  Cc: Blue Swirl, Peter Maydell, qemu-devel, KONRAD Frédéric

On 05/19/2013 03:09:14 PM, Mark Burton wrote:
> Spot on Peter,
> The (simplistic) plan is simply to take a snapshot at regular  
> intervals,
> when you want to step backwards, you return to a snapshot, and then  
> re-run
> forwards to 'just before you started'.

You'd have to snapshot all of memory because any of it could be used  
for branching decisions. You'd have to snapshot the state of I/O  
devices for the same reason. This includes serial ports and keyboards  
and your hardware random number generator and the timer interrupt and  
disk interrupts, all of which you have to log and replay the input  
from, and get the timing exactly right for the interrupts they  
generate. (It has to happen at the right spot because it's used to  
update the random number pool, it can affect scheduling decisions...)

Good luck with that.

A horrid thing you might do is log the instruction pointer every time  
it changes into a (giant) ring buffer. Possibly instrument tcg to write  
up that register every time it has to actually know it (jumps and when  
interrupts happen). (You don't have to know "advanced to next  
instruction". You do have to know "advanced to something other than  
next instruction".) It'll be slow and painful, but might be possible.

Then again to make it work you'd have to log not just where you went  
but where you came _from_ so you could see that you got there and it's  
time to jump again (interrupts again, doesn't mean it's a normal  
departure point, could be a signal). And the problem is do you record  
the target's idea of "from" or the translated host idea of "from"...

Rob

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
       [not found]             ` <CAD2=zRDphd7N5gCQeX6oVQP=HEbRRMpcwPKEVDj46DHKhgkKMw@mail.gmail.com>
@ 2013-05-19 21:47               ` Brendan Dolan-Gavitt
  0 siblings, 0 replies; 17+ messages in thread
From: Brendan Dolan-Gavitt @ 2013-05-19 21:47 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

Argh, reply all is needed.
On May 19, 2013 4:45 PM, "Brendan Dolan-Gavitt" <brendandg@gatech.edu>
wrote:

> We had to do something similar for our (soon to be released) record and
> replay implementation. To ensure interrupts are delivered at precisely the
> right time we added a global 64 bit instruction counter and then modified
> translate.c for each architecture to emit tcg ops that increment it before
> each instruction. At the end of translating a TB we store the number of
> instructions in that TB.
>
> Then during replay, before every basic block executes (we disabled
> chaining) we check whether the current instruction count plus the number of
> instructions in the next TB is greater than the instruction count recorded
> for the next interrupt. If so, we retranslate and terminate the block at
> the right point so we can deliver the interrupt.
>
> There is one gotcha to this,  which is that care had to be taken to not
> interfere with the search_pc mechanism.
>
> If the above explanation isn't clear feel free to ask questions or just
> wait 2 weeks and read the code. I'm not sure how much work it would be to
> integrate what we have into the gdb stub but I'd be delighted if someone
> took on the task.
>
> Cheers,
> Brendan
> On 19 May 2013 21:09, Mark Burton <mark.burton@greensocs.com> wrote:
> >         Note - what I understand by a basic block is something that ends
> in a
> > jump/branch of some description. Hence, one thing I think you can say
> about a
> > basic block is that each PC value within it is unique. Hence, if I know
> the
> > number of basic blocks executed, and the current PC, then I should be
> able to
> > re-run to there (assuming a deterministic system of course).
>
> Assuming your rerun is strictly deterministic so you always exit
> the basic block the same way each time, then yes, this amounts
> to optimising the "instruction count" by doing it as "X basic
> blocks then Y instructions". You could actually have this
> really do the instruction count for you, if you track insns
> per block at translation time. (There is some fiddling necessary
> when we take an unexpected exception in the middle of a block
> due to a load/store fault.)
>
> > I'd be interested to know (a) if there is a sensible place for
> > adding a basic block counter, and (b) if people like this route
> > better or worse than an instruction counter?
>
> I think you're probably best off getting the instruction counter
> working properly before trying to optimise it...
>
> thanks
> -- PMM
>
>

[-- Attachment #2: Type: text/html, Size: 3134 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19 21:20           ` Peter Maydell
       [not found]             ` <CAD2=zRDphd7N5gCQeX6oVQP=HEbRRMpcwPKEVDj46DHKhgkKMw@mail.gmail.com>
@ 2013-05-20  5:28             ` Mark Burton
  1 sibling, 0 replies; 17+ messages in thread
From: Mark Burton @ 2013-05-20  5:28 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Blue Swirl, Mark Burton, qemu-devel, KONRAD Frédéric



On 19 May 2013, at 23:20, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 19 May 2013 21:09, Mark Burton <mark.burton@greensocs.com> wrote:
>>        Note - what I understand by a basic block is something that ends in a
>> jump/branch of some description. Hence, one thing I think you can say about a
>> basic block is that each PC value within it is unique. Hence, if I know the
>> number of basic blocks executed, and the current PC, then I should be able to
>> re-run to there (assuming a deterministic system of course).
> 
> Assuming your rerun is strictly deterministic so you always exit
> the basic block the same way each time, then yes, this amounts
> to optimising the "instruction count" by doing it as "X basic
> blocks then Y instructions".

Yes that's exactly what I'm saying 
But....
I guess it's not quite so simple...

> You could actually have this
> really do the instruction count for you, if you track insns
> per block at translation time. (There is some fiddling necessary
> when we take an unexpected exception in the middle of a block
> due to a load/store fault.)


To be clear, we're talking about eg a memory exception that effectively causes an unexpected branch in the middle of a basic block.

The basic blocks used to execute the exception will be counted of course.

The issue is the instruction count of executed instructions as you enter the exception will be wrong.

However it will be consistent and deterministic, so long as the exception itself is deterministic... Which it should be.

The rule that the pc will be unique will be maintained.
Hence if we choose the 'count blocks' and then find the pc I believe we might 'get away' with this case?
If we count the instructions themselves then we have some fixing to do, as you say.

> 
>> I'd be interested to know (a) if there is a sensible place for
>> adding a basic block counter, and (b) if people like this route
>> better or worse than an instruction counter?
> 
> I think you're probably best off getting the instruction counter
> working properly before trying to optimise it...
> 

I agree, 

The advantage of your approach is that the instruction counter may have other uses, and it's robust.

We can start, as you say, simply, and then use a basic block approach to optimise it if need be...

Still, if people see no use in an instruction count, and if people can point at a decent place to annotate basic blocks, then we can do without it....

Cheers

Mark
> thanks
> -- PMM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19 21:39           ` Rob Landley
@ 2013-05-20  5:34             ` Mark Burton
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Burton @ 2013-05-20  5:34 UTC (permalink / raw)
  To: Rob Landley
  Cc: Blue Swirl, Peter Maydell, Mark Burton, qemu-devel,
	KONRAD Frédéric



On 19 May 2013, at 23:39, Rob Landley <rob@landley.net> wrote:

> On 05/19/2013 03:09:14 PM, Mark Burton wrote:
>> Spot on Peter,
>> The (simplistic) plan is simply to take a snapshot at regular intervals,
>> when you want to step backwards, you return to a snapshot, and then re-run
>> forwards to 'just before you started'.
> 
> You'd have to snapshot all of memory because any of it could be used for branching decisions. You'd have to

Yes. Qemu helps us there we believe.

> snapshot the state of I/O devices for the same reason.

Exactly.
> This includes serial ports and keyboards and your hardware random number generator and the timer interrupt and disk interrupts, all of which you have to log and replay the input from, and get the timing exactly right for the interrupts they

Yes. We are not there yet , but, 
A) Icount seems to help make some of this more deterministic.
B) we can record the io queues activities, this has to be done for migration too....(as I understand it)

But - as I say, we're not there yet...

> generate. (It has to happen at the right spot because it's used to update the random number pool, it can affect scheduling decisions...)
> 
> Good luck with that.
> 
> A horrid thing you might do is log the instruction pointer every time it changes into a (giant) ring buffer. Possibly instrument tcg to write up that register every time it has to actually know it (jumps and when interrupts happen). (You don't have to know "advanced to next instruction". You do have to know "advanced to something other than next instruction".) It'll be slow and painful, but might be possible.
> 

Actually I don't believe this is enough - when the code accesses the device it needs to get the right values , it's not good enough to force the processor to a specific address...

But, maybe I misunderstand your idea?

Cheers

Mark

> Then again to make it work you'd have to log not just where you went but where you came _from_ so you could see that you got there and it's time to jump again (interrupts again, doesn't mean it's a normal departure point, could be a signal). And the problem is do you record the target's idea of "from" or the translated host idea of "from"...
> 
> Rob

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-18 18:52     ` Blue Swirl
@ 2013-05-22 16:14       ` KONRAD Frédéric
  0 siblings, 0 replies; 17+ messages in thread
From: KONRAD Frédéric @ 2013-05-22 16:14 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Mark Burton, qemu-devel

On 18/05/2013 20:52, Blue Swirl wrote:
> On Fri, May 17, 2013 at 5:23 PM, KONRAD Frédéric
> <fred.konrad@greensocs.com> wrote:
>> On 09/05/2013 19:54, Blue Swirl wrote:
>>> On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric
>>> <fred.konrad@greensocs.com> wrote:
>>>> Hi,
>>>>
>>>> We are trying to find a way to do reverse execution happen with QEMU.
>>>>
>>>> Actually, it is possible to debug the guest through the gdbstub, we want
>>>> to
>>>> make the reverse execution possible with GDB as well.
>>>>
>>>> How we are trying to make that working (basically without optimisation):
>>>>
>>>> -QEMU takes regular snapshot of the VM:
>>>>      that can be done with the save vm code without optimisation first.
>>>>
>>>> -When the VM is stopped and GDB requests a reverse-step:
>>>>      load the last snapshot and replay to one instruction before the
>>>> current
>>>> PC.
>>>>
>>>> There are one issue with that for now (for a basic running reverse
>>>> execution):
>>>>       -How to stop one instruction before the actual PC.
>>> Add a special translation mode for reverse execution where the next PC
>>> is checked after each instruction. Alternatively, you could make
>>> temporary snapshots during this mode (first 1s intervals, then 0.1s
>>> etc) which could be used to find the location. I think this way was
>>> discussed briefly earlier in the list, please check the archives.
>>>
>> Hi, thanks for your answer!
>>
>> I didn't find the discussion in the archive.. Do you have a clue? (Title or
>> sender?)
> Paul Brook (long time QEMU developer) made a paper about this together
> with Daniel Jacobowitz:
> http://www.linuxsymposium.org/archives/GCC/Reprints-2007/jacobowitz-reprint.pdf
>
> IIRC Paul also mentioned some techniques on the list at that time but
> I couldn't find that in the archives.
>
> Other related discussions:
> http://article.gmane.org/gmane.comp.emulators.qemu/88447
> http://article.gmane.org/gmane.comp.emulators.qemu/94861
> http://article.gmane.org/gmane.comp.emulators.qemu/154572
>
> Also this site contains some overview of reverse debugging:
> http://jakob.engbloms.se/archives/1554
>

Thanks for your help :).
>> For now we tried some other things which are not working very well,
>>
>> It appeared that the replay is not deterministic even with icount:
>>      - the whole icount mechanism is not saved with save_vm (which can be
>> achieved by moving qemu_icount to TimerState according to Paolo)
>>      - replaying two times the same thing and stopping at a specific
>> breakpoint show two differents vmclock, so replaying the
>>          same amount of time don't work, and we abandoned this idea.
>>
>> We tried to count the amount of time tcg_qemu_tb_exec exited with having
>> executed some TB and we stopped one before for the replay.
>> This is nearly working but:
>>      - tcg_qemu_tb_exec exits a little more time during the first replay,
>> seems the TB linked list is split dynamically?
>>      - this works with the next replay (reverse-stepi) but we can't stop at
>> the exact PC instruction with this method.
>>
>> So we will try to add an instruction counter in the CPUState and increments
>> it after each instruction in the translation code,
>> which I think is approximately what you suggest.
>> Then when replaying the code from the snapshot, we will check the amount of
>> executed instruction and stop one instruction before.
>> Maybe we can re-use icount mechanism but this might be a lot more
>> complicated as it is a de-counter?
>>
>> Can this be working?
>>
>> Maybe we will need to trace the PC from the snapshot to the exact location?
> That should be easy, but not the fastest way.
>
>> Or use both mechanism to get the right location?
> Yes, you could load VM from previous snapshot and then use icount or
> just host timer to get approximately halfway. Make a new snapshot and
> then try again, starting from that snapshot. When you get close
> enough, singlestep to the final instruction.

Well, finally we plan to do approximately this way:

We added a translation block counter, which is incremented by TCG code
the same way as icount, and raise a debug exception when we are at the
right location.

Unfortunately this is not sufficient in term of precision, we can jump 
back 1 TB.

We have two choices:
     a/ keep this "executed translation block" counter and "step by 
step" go to the right location.
     b/ transform this counter in a "executed instruction" counter like 
icount and do
         "step by step" execution when we are replaying.

The first is a bit difficult, as we don't have the exact PC location 
where to stop, and the second can
be really slow (I don't have performance measure at the moment).

Maybe we can try mixing both: replaying to the start of the right TB, 
then step by step going to the
right PC.

Fred
>> Thanks,
>> Fred
>>
>>
>>>> We though that using "-icount" and stop the guest a little time before
>>>> the
>>>> actual position would give us the right behavior (We use a qemu_timer
>>>> with
>>>> vm_clock to stop the vm at the good time), but it seems that it is not
>>>> deterministic, and not reproducable.
>>>>
>>>> Is that normal?
>>>>
>>>> We don't make any input during the replay, and we though that it can be
>>>> caused
>>>> by some timer interruption but "-icount" is using a virtual timer as I
>>>> understand?
>>>>
>>>> We have two other ideas:
>>>>
>>>>       -Using TCI and count each instruction executed by the processor,
>>>> then
>>>> stop
>>>>           one instruction before the actual position. This seems slower.
>>>>
>>>>       -Using single-step to count each instruction, then stop one
>>>> instruction
>>>>           before the actual position.
>>>>
>>>> Would that be better?
>>>>
>>>> For now we can restore the VM from the last snapshot, when we do a
>>>> reverse-step
>>>> but we can't stop at the exact position.
>>>>
>>>> Thanks,
>>>> Fred
>>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-17 19:16       ` Mark Burton
@ 2013-05-23  1:57         ` Edgar E. Iglesias
  0 siblings, 0 replies; 17+ messages in thread
From: Edgar E. Iglesias @ 2013-05-23  1:57 UTC (permalink / raw)
  To: Mark Burton
  Cc: Blue Swirl, Peter Maydell, qemu-devel, KONRAD Frédéric

On Fri, May 17, 2013 at 09:16:06PM +0200, Mark Burton wrote:
> I wish I could say I understood it better, but at this point any insight would be gratefully received. However, what does seem clear is that the intent and purpose of Icount is subtly different, and possibly orthogonal to what we're trying to achieve.
> 
> And - actually, determinism (or the lack of it), is defiantly an issue, but - for now - we have spent much of this week finding a bit of code that avoids any non-determanistic behavior - simply so we can make sure the mechanisms work - THEN we will tackle the thorny subject of what is causing non-determanistic behavior (by which, I _suspect_ I mean, what devices or timers are not adhering to the icount mechanism).
> 
> To recap, as I understand things, setting the icount value in the command line is intended to give a rough "instructions per second" mechanism. One of the effects of that is to make things more deterministic.  Our overall intent is to allow the user that has hit a bug, to step backwards.
> 
> After much discussion (!) I'm convinced by the argument that I might in the end want both of these things. I might want to set some sort of instructions per second value (and change it between runs), and if/when I hit a bug, go backwards.
> 
> Thus far, so good. 
> 
> underneath the hood, icount keeps a counter in the TCG environment which is decremented (as Fred says) and the icount mechanism plays with it as it feels fit.
> The bottom line is that, orthogonal to this, we need a separate 'counter' which is almost identical to the icount counter, in order to count instructions for the reverse execution mechanism.
> 
> We have looked at re-using the icount counter as Fred said, but that soon ends you up in a whole heap of pain. Our conclusion - it would be much cleaner to have a separate dedicated counter, then you can simply use either mechanism independent of the other.
> On this subject - I would like to hear any and all views.
> 
> Having said all of that, in BOTH cases, we need determinism.
> 
> In our case, determinism is very tightly defined (which - I suspect may not be the case for icount). In our case, having returned to a snapshot, the subsequent execution must follow the EXACT SAME path that it did last time. no if's no buts. Not IO no income tax, no VAT, no money back no guarantee….
> 
> Right now, what Fred has found is that sometimes things 'drift'… we will (of course) be looking into that. But, for now, our principle concern is to take a simple bit of code, with no IO, and nothing that causes non-determanism - save a snapshot at the beginning of the sequence, run, hit a breakpoint, return to the breakpoint, and be able to _exactly_ return to the place we came from.
> 
> As Fred said, we imagined that we could do this based on TBs, at least as a 'block' level (which actually may be good enough for us). However, our mechanism for counting TB's was badly broken. None the less, we learnt a lot about TB's - and about some of the non-determaistic behavior that will come to haunt us later. We also concluded that counting TBs is always going to be second rate, and if we're going to do this properly, we need to count instructions. Finally, we have concluded that re-using the icount counters is going to be very painful, we need to re-use the same mechanism, but we need dedicated counters…
> 
> 
> Again, please, all - pitch in and say what you think. Fred and I have been scratching out head all week on this, and I'm not convinced we have come up with the right answers, so any input would be most welcome.

Hi,

This was a long time ago, but I recall having issues with determenism when
hacking on TLMu. Ditching the display timer helped.

IIRC, I was getting 100% reproducable runs after that.

Cheers,
Edgar

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC] reverse execution.
  2013-05-19 20:09         ` Mark Burton
  2013-05-19 21:20           ` Peter Maydell
  2013-05-19 21:39           ` Rob Landley
@ 2013-05-29 12:37           ` Pavel Dovgaluk
  2 siblings, 0 replies; 17+ messages in thread
From: Pavel Dovgaluk @ 2013-05-29 12:37 UTC (permalink / raw)
  To: 'Mark Burton', 'Peter Maydell'
  Cc: 'Blue Swirl', 'qemu-devel',
	'KONRAD Frédéric'

Hello.

> Spot on Peter,
> The (simplistic) plan is simply to take a snapshot at regular intervals, when you want to step
> backwards, you return to a snapshot, and then re-run forwards to 'just before you started'.
> 
> To answer Blauwirbel, we can't "approximate" this - or 'binary search' for the right place -
> we absolutely have to know exactly how to stop exactly 'just before'. To do that we need a
> measure of where we are. Time isn't really what we want (It has a nasty habit of moving
> forwards even when we are pretending to reverse it!). Ideally we need an instruction count. (I
> say ideally, we could actually get away with something like a block count, but if blocks are
> linked together etc, we would have to be careful).
> 
> I still favor a dedicated counter for this, it seems to make more sense. But - I am wondering
> about a 'basic block counter' rather than an 'instruction counter'.
> 	Note - what I understand by a basic block is something that ends in a jump/branch of
> some description. Hence, one thing I think you can say about a basic block is that each PC
> value within it is unique. Hence, if I know the number of basic blocks executed, and the
> current PC, then I should be able to re-run to there (assuming a deterministic system of
> course).
> 
> I'd be interested to know (a) if there is a sensible place for adding a basic block counter,
> and (b) if people like this route better or worse than an instruction counter?

  We have implemented reverse debugging (through GDB) in QEMU. Translation of the instructions
was changed to increment a special instruction counter before execution of every instruction. 
This counter is used to synchronize events that happen asynchronously in regular execution 
mode (timers, disk operations, etc). Checkpointing is also used in our implementation to 
rollback to previous state and then execute forward.
  Trying to count basic blocks is much more complicated task, because they may be interrupted
with exceptions caused by MMU. Blocks' bounds are not deterministic because they
depend on additional code that you translate, instruction where you stopped (using single-step
or reverse-single-step commands).

  You can find some details here:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6178942&contentType=Conferenc
e+Publications

Pavel Dovgaluk

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-05-29 12:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-07 18:27 [Qemu-devel] [RFC] reverse execution KONRAD Frédéric
2013-05-09 17:54 ` Blue Swirl
2013-05-17 17:23   ` KONRAD Frédéric
2013-05-17 17:54     ` Peter Maydell
2013-05-17 19:16       ` Mark Burton
2013-05-23  1:57         ` Edgar E. Iglesias
2013-05-18 18:52     ` Blue Swirl
2013-05-22 16:14       ` KONRAD Frédéric
2013-05-19  4:37     ` Rob Landley
2013-05-19  7:21       ` Peter Maydell
2013-05-19 20:09         ` Mark Burton
2013-05-19 21:20           ` Peter Maydell
     [not found]             ` <CAD2=zRDphd7N5gCQeX6oVQP=HEbRRMpcwPKEVDj46DHKhgkKMw@mail.gmail.com>
2013-05-19 21:47               ` Brendan Dolan-Gavitt
2013-05-20  5:28             ` Mark Burton
2013-05-19 21:39           ` Rob Landley
2013-05-20  5:34             ` Mark Burton
2013-05-29 12:37           ` Pavel Dovgaluk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.