All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
@ 2016-07-24 12:42 Benjamin Herrenschmidt
  2016-07-24 12:51 ` Benjamin Herrenschmidt
  2016-07-25  0:34 ` Richard Henderson
  0 siblings, 2 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-24 12:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger

Hi !

I need help from TCG experts here. I was chasing down a bug causing
some stuff to crash when using vector ops with a ppc32 guest on x86,
but pulling that string led to a whole mess that *may* be affecting a
pile of architetures unless I'm misunderstanding something...

So basically what happens is that some instruction emulation helpers,
like in my case stvebx (target-ppc/mem_helper.c) are doing calls to
cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the
argument.

That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr.

However, if that faults, when tlb_fill() gets eventually called, what I
observe is not 0 in "retaddr" but ... -2.

The reason, as far as I understand, is that cpu_stb_data_ra() calls
helper_ret_stb_mmu() which does:

     retaddr -= GETPC_ADJ;

(which is -2)

Now a whole pile of tlb_fill() implementations (in fact all of them one
way or another) do:

        if (likely(retaddr)) {
            /* now we have a real cpu fault */
            cpu_restore_state(cs, retaddr);
        }

So that test is missed. The result that I obseve is that the fault gets
reported for the wrong instruction (the wrong PC).

Now I did try changing the above test out of curiosity to also check
against -2, but the end result still mis-reported the fault. So
something's going deeper than that I figured out so far...

What *did* work was to copy what x86 does, which is to change my
helper_stvebx() to not use cpu_stb_data at all, but instead use
cpu_stb_data_ra(...., GETPC()), which mimmics what x86 does for some of
it's helpers.

That fixed the specific problem I was chasing.

However, there are a ton of other helpers, in powerpc, s390 and other
archs, doing that cpu_stb_data() the same way we do, so I really wonder
what's going on here.

Some advice would be very much appreciated ;-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt
@ 2016-07-24 12:51 ` Benjamin Herrenschmidt
  2016-07-24 12:52   ` Benjamin Herrenschmidt
  2016-07-25  0:34 ` Richard Henderson
  1 sibling, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-24 12:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger

On Sun, 2016-07-24 at 22:42 +1000, Benjamin Herrenschmidt wrote:
> 
> What *did* work was to copy what x86 does, which is to change my
> helper_stvebx() to not use cpu_stb_data at all, but instead use
> cpu_stb_data_ra(...., GETPC()), which mimmics what x86 does for some
> of
> it's helpers.
> 
> That fixed the specific problem I was chasing.
> 
> However, there are a ton of other helpers, in powerpc, s390 and other
> archs, doing that cpu_stb_data() the same way we do, so I really
> wonder
> what's going on here.
> 
> Some advice would be very much appreciated ;-)

FYI: This probably completely wrong patch (but it was easier than
hacking all the helpers) fixed the problem for me. With this (and the
video driver I wrote that I will publish asap), I can now reliably boot
various versions of MacOS X in qemu ppc using a 7400 CPU.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-24 12:51 ` Benjamin Herrenschmidt
@ 2016-07-24 12:52   ` Benjamin Herrenschmidt
  2016-07-25  0:36     ` Richard Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-24 12:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger

On Sun, 2016-07-24 at 22:51 +1000, Benjamin Herrenschmidt wrote:
> 
> FYI: This probably completely wrong patch (but it was easier than
> hacking all the helpers) fixed the problem for me. With this (and the
> video driver I wrote that I will publish asap), I can now reliably
> boot
> various versions of MacOS X in qemu ppc using a 7400 CPU.

And here's the patch:

diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index eaf69a1..13e8881 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -111,7 +111,7 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
 static inline RES_TYPE
 glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
 {
-    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);
+    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC());
 }
 
 #if DATA_SIZE <= 2
@@ -149,7 +149,7 @@ glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
 static inline int
 glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
 {
-    return glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);
+    return glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC());
 }
 #endif
 
@@ -191,7 +191,7 @@ static inline void
 glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr,
                                       RES_TYPE v)
 {
-    glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(env, ptr, v, 0);
+    glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(env, ptr, v, GETPC());
 }
 
 #endif /* !SOFTMMU_CODE_ACCESS */

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt
  2016-07-24 12:51 ` Benjamin Herrenschmidt
@ 2016-07-25  0:34 ` Richard Henderson
  2016-07-25  5:15   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2016-07-25  0:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On 07/24/2016 06:12 PM, Benjamin Herrenschmidt wrote:
> Hi !
>
> I need help from TCG experts here. I was chasing down a bug causing
> some stuff to crash when using vector ops with a ppc32 guest on x86,
> but pulling that string led to a whole mess that *may* be affecting a
> pile of architetures unless I'm misunderstanding something...
>
> So basically what happens is that some instruction emulation helpers,
> like in my case stvebx (target-ppc/mem_helper.c) are doing calls to
> cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the
> argument.
>
> That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr.
>
> However, if that faults, when tlb_fill() gets eventually called, what I
> observe is not 0 in "retaddr" but ... -2.
>
> The reason, as far as I understand, is that cpu_stb_data_ra() calls
> helper_ret_stb_mmu() which does:
>
>      retaddr -= GETPC_ADJ;
>
> (which is -2)

Ouch, yes.

I noticed a related problem recently, while working on the cmpxchg patch set.

In my opinion, we should (1) merge GETRA and GETPC so there's no confusion 
between the two, (2) push all adjustment down to the final moment before use, 
perhaps in cpu_restore_state.

Thus a null value would be properly retained until checked, and one can easily 
call the memory helper functions without confusion.


r~

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-24 12:52   ` Benjamin Herrenschmidt
@ 2016-07-25  0:36     ` Richard Henderson
  2016-07-25  0:46       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2016-07-25  0:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On 07/24/2016 06:22 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2016-07-24 at 22:51 +1000, Benjamin Herrenschmidt wrote:
>> >
>> > FYI: This probably completely wrong patch (but it was easier than
>> > hacking all the helpers) fixed the problem for me. With this (and the
>> > video driver I wrote that I will publish asap), I can now reliably
>> > boot
>> > various versions of MacOS X in qemu ppc using a 7400 CPU.
> And here's the patch:
>
> diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
> index eaf69a1..13e8881 100644
> --- a/include/exec/cpu_ldst_template.h
> +++ b/include/exec/cpu_ldst_template.h
> @@ -111,7 +111,7 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
>  static inline RES_TYPE
>  glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
>  {
> -    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);
> +    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC());
>  }
>

These functions would have to be always_inline for this to work.  Otherwise you 
could get the helper's PC, not the TCG caller's PC.

But let's try to fix this the other way.



r~

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25  0:36     ` Richard Henderson
@ 2016-07-25  0:46       ` Benjamin Herrenschmidt
  2016-07-25  0:51         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25  0:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote:
> These functions would have to be always_inline for this to work. 
> Otherwise you 
> could get the helper's PC, not the TCG caller's PC.
> 
> But let's try to fix this the other way.

I could use some help there as I don't really understand the PC fixup /
adjustment mechanism in qemu... 

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25  0:46       ` Benjamin Herrenschmidt
@ 2016-07-25  0:51         ` Benjamin Herrenschmidt
  2016-07-25 14:00           ` Richard Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25  0:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 10:46 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote:
> > 
> > These functions would have to be always_inline for this to work. 
> > Otherwise you 
> > could get the helper's PC, not the TCG caller's PC.
> > 
> > But let's try to fix this the other way.
> 
> I could use some help there as I don't really understand the PC fixup
> adjustment mechanism in qemu... 

One thing I can do, but I don't know whether that's worthwhile (you
tell me), is change all translation helpers in powerpc to do like
x86, which is to pass the RA along and never use the non_ra() variants.

But that's quite a bit of churn, so let me know if your plan is better.

Are those functions always meant to be called within translation
helpers ? IE, the fault can safely longjmp out and it's just a matter
of finding the proper instruction PC to report ?

Or can they also be called outside of that context ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25  0:34 ` Richard Henderson
@ 2016-07-25  5:15   ` Benjamin Herrenschmidt
  2016-07-25 14:12     ` Richard Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25  5:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 06:04 +0530, Richard Henderson wrote:
> I noticed a related problem recently, while working on the cmpxchg patch set.
> 
> In my opinion, we should (1) merge GETRA and GETPC so there's no confusion 
> between the two, (2) push all adjustment down to the final moment before use, 
> perhaps in cpu_restore_state.
> 
> Thus a null value would be properly retained until checked, and one can easily 
> call the memory helper functions without confusion.

Ok, after a bit more scrubbing of the code I think I understand what you
mean.

Now assuming we fix that, there is still a problem if the target code, such
as the PPC code, calls a helper that might cause a fault without first
updating the PC in the env, right ? IE. On powerpc for example, that means
that any instruction using a helper that might potentially do loads or stores
needs to first call gen_update_nip().

(The same way we seem to do before generating other exceptions such as traps,
well provided we do it correctly everywhere, I need to double check).

Either that, or change the helpers to capture the PC using GETPC/GETRA from
the first level of helper function (so as to ensure the return address is
correct).

Am I right ?

IE. Even if we fix the 0 vs. -2 problem, I still need this patch:

--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -6916,6 +6916,7 @@ static void gen_lve##name(DisasContext *ctx)                            \
         if (size > 1) {                                                 \
             tcg_gen_andi_tl(EA, EA, ~(size - 1));                       \
         }                                                               \
+        gen_update_nip(ctx, ctx->nip - 4);                              \
         rs = gen_avr_ptr(rS(ctx->opcode));                              \
         gen_helper_lve##name(cpu_env, rs, EA);                          \
         tcg_temp_free(EA);                                              \
@@ -6937,6 +6938,7 @@ static void gen_stve##name(DisasContext *ctx)                           \
         if (size > 1) {                                                 \
             tcg_gen_andi_tl(EA, EA, ~(size - 1));                       \
         }                                                               \
+        gen_update_nip(ctx, ctx->nip - 4);                              \
         rs = gen_avr_ptr(rS(ctx->opcode));                              \
         gen_helper_stve##name(cpu_env, rs, EA);                         \
         tcg_temp_free(EA);                                              \

(And possibly others I haven't yet audited)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25  0:51         ` Benjamin Herrenschmidt
@ 2016-07-25 14:00           ` Richard Henderson
  2016-07-25 21:42             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2016-07-25 14:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On 07/25/2016 06:21 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2016-07-25 at 10:46 +1000, Benjamin Herrenschmidt wrote:
>> On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote:
>>>
>>> These functions would have to be always_inline for this to work.
>>> Otherwise you
>>> could get the helper's PC, not the TCG caller's PC.
>>>
>>> But let's try to fix this the other way.
>>
>> I could use some help there as I don't really understand the PC fixup
>> adjustment mechanism in qemu...
>
> One thing I can do, but I don't know whether that's worthwhile (you
> tell me), is change all translation helpers in powerpc to do like
> x86, which is to pass the RA along and never use the non_ra() variants.
>
> But that's quite a bit of churn, so let me know if your plan is better.

It is worthwhile, because that allows you to drop some state saving within the 
translation.  With the assumption being, of course, that the exceptional case 
is rare and so we save on the state saving most of the time.

Or, as appears to be the case for e.g. LVEBX, there currently is no such state 
save, and so any exceptions raised by this insn will point to the wrong insn, 
and so the conversion fixes a bug.

> Are those functions always meant to be called within translation
> helpers ? IE, the fault can safely longjmp out and it's just a matter
> of finding the proper instruction PC to report ?

Correct.

> Or can they also be called outside of that context ?

No, not without a valid return address.

E.g. it's not valid to have one helper call another, and for the second helper 
use GETPC.  For this, typically, one must factor out a common function which 
accepts a "retaddr" argument, which the callers must fill in with GETPC.


r~

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25  5:15   ` Benjamin Herrenschmidt
@ 2016-07-25 14:12     ` Richard Henderson
  2016-07-25 21:46       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2016-07-25 14:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On 07/25/2016 10:45 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2016-07-25 at 06:04 +0530, Richard Henderson wrote:
>> I noticed a related problem recently, while working on the cmpxchg patch set.
>>
>> In my opinion, we should (1) merge GETRA and GETPC so there's no confusion
>> between the two, (2) push all adjustment down to the final moment before use,
>> perhaps in cpu_restore_state.
>>
>> Thus a null value would be properly retained until checked, and one can easily
>> call the memory helper functions without confusion.
>
> Ok, after a bit more scrubbing of the code I think I understand what you
> mean.
>
> Now assuming we fix that, there is still a problem if the target code, such
> as the PPC code, calls a helper that might cause a fault without first
> updating the PC in the env, right ? IE. On powerpc for example, that means
> that any instruction using a helper that might potentially do loads or stores
> needs to first call gen_update_nip().

Well, that's the bug with the current code, yes.

But what we gain by transforming this code to use (via indirection) 
cpu_loop_exit_restore, is the ability to reconstruct values, like NIP, without 
first having saved them.

Any data that you give to tcg_gen_insn_start will be given to 
restore_state_to_opc.  PPC currently does

     tcg_gen_insn_start(ctx.nip);

and

     env->nip = data[0];

so you're good there.

For some targets, we also restore part of the flags computation with this 
mechanism.  With more trickery, ARM is (intending to?) compute exception 
syndrome information with this.  As I understand it, this is very much akin to 
the PPC gen_set_access_type, so perhaps in future there's some savings to be 
had there.

> Either that, or change the helpers to capture the PC using GETPC/GETRA from
> the first level of helper function (so as to ensure the return address is
> correct).
>
> Am I right ?

You are right.

> IE. Even if we fix the 0 vs. -2 problem, I still need this patch:
>
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -6916,6 +6916,7 @@ static void gen_lve##name(DisasContext *ctx)                            \
>          if (size > 1) {                                                 \
>              tcg_gen_andi_tl(EA, EA, ~(size - 1));                       \
>          }                                                               \
> +        gen_update_nip(ctx, ctx->nip - 4);                              \
>          rs = gen_avr_ptr(rS(ctx->opcode));                              \
>          gen_helper_lve##name(cpu_env, rs, EA);                          \
>          tcg_temp_free(EA);                                              \
> @@ -6937,6 +6938,7 @@ static void gen_stve##name(DisasContext *ctx)                           \
>          if (size > 1) {                                                 \
>              tcg_gen_andi_tl(EA, EA, ~(size - 1));                       \
>          }                                                               \
> +        gen_update_nip(ctx, ctx->nip - 4);                              \
>          rs = gen_avr_ptr(rS(ctx->opcode));                              \
>          gen_helper_stve##name(cpu_env, rs, EA);                         \
>          tcg_temp_free(EA);                                              \
>
> (And possibly others I haven't yet audited)

Yes indeed.  I'm amused by this, since I read your emails in the wrong order, 
and so discovered exactly this problem while answering the other email.


r~

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25 14:00           ` Richard Henderson
@ 2016-07-25 21:42             ` Benjamin Herrenschmidt
  2016-07-25 22:19               ` Peter Maydell
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25 21:42 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 19:30 +0530, Richard Henderson wrote:
> > Or can they also be called outside of that context ?
> 
> No, not without a valid return address.
> 
> E.g. it's not valid to have one helper call another, and for the second helper 
> use GETPC.  For this, typically, one must factor out a common function which 
> accepts a "retaddr" argument, which the callers must fill in with GETPC.

Right, I've figured that out. I notice that the cpu_ldl_code() are
sprinkled in parts that are "chancy".

For example we have one in powerpc_excp() to read the faulting
instruction, though that *should* never fail it's till not great.

I haven't completely figured out what code path instruction translation
faults take, I assume we longjmp out of the translation loop the same
was as we do out of the execution loop ?

Note: I've started cleaning that on ppc (but not fixing the -2 bug yet)
in there: very much a work in progress but I'd be happy to have initial
feedback (ignore patch 1 about MOL OSI, it's unrelated):

https://github.com/ozbenh/qemu/commits/wip

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25 14:12     ` Richard Henderson
@ 2016-07-25 21:46       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25 21:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 19:42 +0530, Richard Henderson wrote:
> For some targets, we also restore part of the flags computation with this 
> mechanism.  With more trickery, ARM is (intending to?) compute exception 
> syndrome information with this.  As I understand it, this is very much akin to 
> the PPC gen_set_access_type, so perhaps in future there's some savings to be 
> had there

Indeed. Another issue we have is generating the opcode bits in DSISR.
Now thankfully for most modern CPUs this is no longer done in HW but at
the moment, I dont like how we call cpu_ldl_code() in powerpc_excp.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25 21:42             ` Benjamin Herrenschmidt
@ 2016-07-25 22:19               ` Peter Maydell
  2016-07-25 22:42                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2016-07-25 22:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger

On 25 July 2016 at 22:42, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Mon, 2016-07-25 at 19:30 +0530, Richard Henderson wrote:
>> > Or can they also be called outside of that context ?
>>
>> No, not without a valid return address.
>>
>> E.g. it's not valid to have one helper call another, and for the second helper
>> use GETPC.  For this, typically, one must factor out a common function which
>> accepts a "retaddr" argument, which the callers must fill in with GETPC.
>
> Right, I've figured that out. I notice that the cpu_ldl_code() are
> sprinkled in parts that are "chancy".
>
> For example we have one in powerpc_excp() to read the faulting
> instruction, though that *should* never fail it's till not great.

Mmm. Strictly speaking you can't guarantee that that load will
work, because the code you're executing could be still cached
in the QEMU TLB while the (perverse) guest rewrites the page table
so the ldl_code won't work.

You might like to take a look at how target-arm handles the
similar case (wanting to know information about source register
numbers etc for some memory faults). We use a TCG instruction
parameter to save the information as we translate the code
in target-arm/translate-a64.c, then in restore_state_to_opc()
(called after the fault) we can fish it out and put it in the
CPU state struct the same way we regain the faulting PC. Then
in the exception handler it's available for use. Commit aaa1f954d4
is the meat of it.

Alternatively if you just want to do "a load at virtual address
but don't longjump anywhere" you probably need to:
 * do the virt-to-phys translation manually (ie by calling
   some ppc specific function), so you can capture success/failure
   of the MMU access checks (and without setting guest visible
   MMU fault status info)
 * use address_space_ldl() on the result, so you can pass in
   a MemTxResult* to capture success/failure of the phys access

That's pretty clunky. I meant at some point to look at
whether we could provide versions of the cpu_ldl* type
functions that allowed a MemTxResult*, but never got to it.

> I haven't completely figured out what code path instruction translation
> faults take, I assume we longjmp out of the translation loop the same
> was as we do out of the execution loop ?

I suggest you step through it in a debugger if you fancy a laugh.
As I recall it works more by luck than judgement and includes
rather more repeated work than is actually necessary. (That
is, we do longjump out, but not necessarily to the most
sensible point.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25 22:19               ` Peter Maydell
@ 2016-07-25 22:42                 ` Benjamin Herrenschmidt
  2016-07-25 23:22                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25 22:42 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger

On Mon, 2016-07-25 at 23:19 +0100, Peter Maydell wrote:
> > For example we have one in powerpc_excp() to read the faulting
> > instruction, though that *should* never fail it's till not great.
> 
> Mmm. Strictly speaking you can't guarantee that that load will
> work, because the code you're executing could be still cached
> in the QEMU TLB while the (perverse) guest rewrites the page table
> so the ldl_code won't work.

Right, it worries me, it's unlikely but possible.

> You might like to take a look at how target-arm handles the
> similar case (wanting to know information about source register
> numbers etc for some memory faults). We use a TCG instruction
> parameter to save the information as we translate the code
> in target-arm/translate-a64.c, then in restore_state_to_opc()
> (called after the fault) we can fish it out and put it in the
> CPU state struct the same way we regain the faulting PC. Then
> in the exception handler it's available for use. Commit aaa1f954d4
> is the meat of it.

We do something a bit different on ppc where we store the access type
before every access, however the DSISR case is special in that on older
CPUs, it's expected to contains a whole subset of the opcode which is
quite a bit more info than what you want here...

I'm thinking maybe we should use a form of load that returns an error
instead of longjmp'ing, and if we do error out, flush the tb for that
instruction and replay which should cause the translate path to reload
the TLB for it but it's still fishy.

> Alternatively if you just want to do "a load at virtual address
> but don't longjump anywhere" you probably need to:
>  * do the virt-to-phys translation manually (ie by calling
>    some ppc specific function), so you can capture success/failure
>    of the MMU access checks (and without setting guest visible
>    MMU fault status info)
>  * use address_space_ldl() on the result, so you can pass in
>    a MemTxResult* to capture success/failure of the phys access

Yup.

> That's pretty clunky. I meant at some point to look at
> whether we could provide versions of the cpu_ldl* type
> functions that allowed a MemTxResult*, but never got to it.
> 
> > I haven't completely figured out what code path instruction
> translation faults take, I assume we longjmp out of the translation
> loop the same
> > was as we do out of the execution loop ?
> 
> I suggest you step through it in a debugger if you fancy a laugh.
> As I recall it works more by luck than judgement and includes
> rather more repeated work than is actually necessary. (That
> is, we do longjump out, but not necessarily to the most
> sensible point.)

Haha, I was starting to guess that :-) But that isn't a battle I want
to fight today, enough on my plate already !

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
  2016-07-25 22:42                 ` Benjamin Herrenschmidt
@ 2016-07-25 23:22                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-25 23:22 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger

On Tue, 2016-07-26 at 08:42 +1000, Benjamin Herrenschmidt wrote:

> We do something a bit different on ppc where we store the access type
> before every access, however the DSISR case is special in that on
> older
> CPUs, it's expected to contains a whole subset of the opcode which is
> quite a bit more info than what you want here...
> 
> I'm thinking maybe we should use a form of load that returns an error
> instead of longjmp'ing, and if we do error out, flush the tb for that
> instruction and replay which should cause the translate path to
> reload
> the TLB for it but it's still fishy.

I have a better idea !

This is only a problem for alignment interrupts, and those are very
rare, we only generate them in some cases of broken forms like
trying to do a ll/sc on an unaligned address.

So I'm thinking I'm just going to pass the opcode to the helper
in the error_code field.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-07-25 23:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt
2016-07-24 12:51 ` Benjamin Herrenschmidt
2016-07-24 12:52   ` Benjamin Herrenschmidt
2016-07-25  0:36     ` Richard Henderson
2016-07-25  0:46       ` Benjamin Herrenschmidt
2016-07-25  0:51         ` Benjamin Herrenschmidt
2016-07-25 14:00           ` Richard Henderson
2016-07-25 21:42             ` Benjamin Herrenschmidt
2016-07-25 22:19               ` Peter Maydell
2016-07-25 22:42                 ` Benjamin Herrenschmidt
2016-07-25 23:22                   ` Benjamin Herrenschmidt
2016-07-25  0:34 ` Richard Henderson
2016-07-25  5:15   ` Benjamin Herrenschmidt
2016-07-25 14:12     ` Richard Henderson
2016-07-25 21:46       ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.