* [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? @ 2016-07-24 12:42 Benjamin Herrenschmidt 2016-07-24 12:51 ` Benjamin Herrenschmidt 2016-07-25 0:34 ` Richard Henderson 0 siblings, 2 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-24 12:42 UTC (permalink / raw) To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger Hi ! I need help from TCG experts here. I was chasing down a bug causing some stuff to crash when using vector ops with a ppc32 guest on x86, but pulling that string led to a whole mess that *may* be affecting a pile of architetures unless I'm misunderstanding something... So basically what happens is that some instruction emulation helpers, like in my case stvebx (target-ppc/mem_helper.c) are doing calls to cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the argument. That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr. However, if that faults, when tlb_fill() gets eventually called, what I observe is not 0 in "retaddr" but ... -2. The reason, as far as I understand, is that cpu_stb_data_ra() calls helper_ret_stb_mmu() which does: retaddr -= GETPC_ADJ; (which is -2) Now a whole pile of tlb_fill() implementations (in fact all of them one way or another) do: if (likely(retaddr)) { /* now we have a real cpu fault */ cpu_restore_state(cs, retaddr); } So that test is missed. The result that I obseve is that the fault gets reported for the wrong instruction (the wrong PC). Now I did try changing the above test out of curiosity to also check against -2, but the end result still mis-reported the fault. So something's going deeper than that I figured out so far... What *did* work was to copy what x86 does, which is to change my helper_stvebx() to not use cpu_stb_data at all, but instead use cpu_stb_data_ra(...., GETPC()), which mimmics what x86 does for some of it's helpers. That fixed the specific problem I was chasing. However, there are a ton of other helpers, in powerpc, s390 and other archs, doing that cpu_stb_data() the same way we do, so I really wonder what's going on here. Some advice would be very much appreciated ;-) Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt @ 2016-07-24 12:51 ` Benjamin Herrenschmidt 2016-07-24 12:52 ` Benjamin Herrenschmidt 2016-07-25 0:34 ` Richard Henderson 1 sibling, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-24 12:51 UTC (permalink / raw) To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger On Sun, 2016-07-24 at 22:42 +1000, Benjamin Herrenschmidt wrote: > > What *did* work was to copy what x86 does, which is to change my > helper_stvebx() to not use cpu_stb_data at all, but instead use > cpu_stb_data_ra(...., GETPC()), which mimmics what x86 does for some > of > it's helpers. > > That fixed the specific problem I was chasing. > > However, there are a ton of other helpers, in powerpc, s390 and other > archs, doing that cpu_stb_data() the same way we do, so I really > wonder > what's going on here. > > Some advice would be very much appreciated ;-) FYI: This probably completely wrong patch (but it was easier than hacking all the helpers) fixed the problem for me. With this (and the video driver I wrote that I will publish asap), I can now reliably boot various versions of MacOS X in qemu ppc using a 7400 CPU. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-24 12:51 ` Benjamin Herrenschmidt @ 2016-07-24 12:52 ` Benjamin Herrenschmidt 2016-07-25 0:36 ` Richard Henderson 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-24 12:52 UTC (permalink / raw) To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger On Sun, 2016-07-24 at 22:51 +1000, Benjamin Herrenschmidt wrote: > > FYI: This probably completely wrong patch (but it was easier than > hacking all the helpers) fixed the problem for me. With this (and the > video driver I wrote that I will publish asap), I can now reliably > boot > various versions of MacOS X in qemu ppc using a 7400 CPU. And here's the patch: diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h index eaf69a1..13e8881 100644 --- a/include/exec/cpu_ldst_template.h +++ b/include/exec/cpu_ldst_template.h @@ -111,7 +111,7 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env, static inline RES_TYPE glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr) { - return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0); + return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC()); } #if DATA_SIZE <= 2 @@ -149,7 +149,7 @@ glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env, static inline int glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr) { - return glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(env, ptr, 0); + return glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC()); } #endif @@ -191,7 +191,7 @@ static inline void glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr, RES_TYPE v) { - glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(env, ptr, v, 0); + glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(env, ptr, v, GETPC()); } #endif /* !SOFTMMU_CODE_ACCESS */ ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-24 12:52 ` Benjamin Herrenschmidt @ 2016-07-25 0:36 ` Richard Henderson 2016-07-25 0:46 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Richard Henderson @ 2016-07-25 0:36 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On 07/24/2016 06:22 PM, Benjamin Herrenschmidt wrote: > On Sun, 2016-07-24 at 22:51 +1000, Benjamin Herrenschmidt wrote: >> > >> > FYI: This probably completely wrong patch (but it was easier than >> > hacking all the helpers) fixed the problem for me. With this (and the >> > video driver I wrote that I will publish asap), I can now reliably >> > boot >> > various versions of MacOS X in qemu ppc using a 7400 CPU. > And here's the patch: > > diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h > index eaf69a1..13e8881 100644 > --- a/include/exec/cpu_ldst_template.h > +++ b/include/exec/cpu_ldst_template.h > @@ -111,7 +111,7 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env, > static inline RES_TYPE > glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr) > { > - return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0); > + return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, GETPC()); > } > These functions would have to be always_inline for this to work. Otherwise you could get the helper's PC, not the TCG caller's PC. But let's try to fix this the other way. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 0:36 ` Richard Henderson @ 2016-07-25 0:46 ` Benjamin Herrenschmidt 2016-07-25 0:51 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 0:46 UTC (permalink / raw) To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote: > These functions would have to be always_inline for this to work. > Otherwise you > could get the helper's PC, not the TCG caller's PC. > > But let's try to fix this the other way. I could use some help there as I don't really understand the PC fixup / adjustment mechanism in qemu... Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 0:46 ` Benjamin Herrenschmidt @ 2016-07-25 0:51 ` Benjamin Herrenschmidt 2016-07-25 14:00 ` Richard Henderson 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 0:51 UTC (permalink / raw) To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 10:46 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote: > > > > These functions would have to be always_inline for this to work. > > Otherwise you > > could get the helper's PC, not the TCG caller's PC. > > > > But let's try to fix this the other way. > > I could use some help there as I don't really understand the PC fixup > adjustment mechanism in qemu... One thing I can do, but I don't know whether that's worthwhile (you tell me), is change all translation helpers in powerpc to do like x86, which is to pass the RA along and never use the non_ra() variants. But that's quite a bit of churn, so let me know if your plan is better. Are those functions always meant to be called within translation helpers ? IE, the fault can safely longjmp out and it's just a matter of finding the proper instruction PC to report ? Or can they also be called outside of that context ? Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 0:51 ` Benjamin Herrenschmidt @ 2016-07-25 14:00 ` Richard Henderson 2016-07-25 21:42 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Richard Henderson @ 2016-07-25 14:00 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On 07/25/2016 06:21 AM, Benjamin Herrenschmidt wrote: > On Mon, 2016-07-25 at 10:46 +1000, Benjamin Herrenschmidt wrote: >> On Mon, 2016-07-25 at 06:06 +0530, Richard Henderson wrote: >>> >>> These functions would have to be always_inline for this to work. >>> Otherwise you >>> could get the helper's PC, not the TCG caller's PC. >>> >>> But let's try to fix this the other way. >> >> I could use some help there as I don't really understand the PC fixup >> adjustment mechanism in qemu... > > One thing I can do, but I don't know whether that's worthwhile (you > tell me), is change all translation helpers in powerpc to do like > x86, which is to pass the RA along and never use the non_ra() variants. > > But that's quite a bit of churn, so let me know if your plan is better. It is worthwhile, because that allows you to drop some state saving within the translation. With the assumption being, of course, that the exceptional case is rare and so we save on the state saving most of the time. Or, as appears to be the case for e.g. LVEBX, there currently is no such state save, and so any exceptions raised by this insn will point to the wrong insn, and so the conversion fixes a bug. > Are those functions always meant to be called within translation > helpers ? IE, the fault can safely longjmp out and it's just a matter > of finding the proper instruction PC to report ? Correct. > Or can they also be called outside of that context ? No, not without a valid return address. E.g. it's not valid to have one helper call another, and for the second helper use GETPC. For this, typically, one must factor out a common function which accepts a "retaddr" argument, which the callers must fill in with GETPC. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 14:00 ` Richard Henderson @ 2016-07-25 21:42 ` Benjamin Herrenschmidt 2016-07-25 22:19 ` Peter Maydell 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 21:42 UTC (permalink / raw) To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 19:30 +0530, Richard Henderson wrote: > > Or can they also be called outside of that context ? > > No, not without a valid return address. > > E.g. it's not valid to have one helper call another, and for the second helper > use GETPC. For this, typically, one must factor out a common function which > accepts a "retaddr" argument, which the callers must fill in with GETPC. Right, I've figured that out. I notice that the cpu_ldl_code() are sprinkled in parts that are "chancy". For example we have one in powerpc_excp() to read the faulting instruction, though that *should* never fail it's till not great. I haven't completely figured out what code path instruction translation faults take, I assume we longjmp out of the translation loop the same was as we do out of the execution loop ? Note: I've started cleaning that on ppc (but not fixing the -2 bug yet) in there: very much a work in progress but I'd be happy to have initial feedback (ignore patch 1 about MOL OSI, it's unrelated): https://github.com/ozbenh/qemu/commits/wip Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 21:42 ` Benjamin Herrenschmidt @ 2016-07-25 22:19 ` Peter Maydell 2016-07-25 22:42 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Peter Maydell @ 2016-07-25 22:19 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger On 25 July 2016 at 22:42, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Mon, 2016-07-25 at 19:30 +0530, Richard Henderson wrote: >> > Or can they also be called outside of that context ? >> >> No, not without a valid return address. >> >> E.g. it's not valid to have one helper call another, and for the second helper >> use GETPC. For this, typically, one must factor out a common function which >> accepts a "retaddr" argument, which the callers must fill in with GETPC. > > Right, I've figured that out. I notice that the cpu_ldl_code() are > sprinkled in parts that are "chancy". > > For example we have one in powerpc_excp() to read the faulting > instruction, though that *should* never fail it's till not great. Mmm. Strictly speaking you can't guarantee that that load will work, because the code you're executing could be still cached in the QEMU TLB while the (perverse) guest rewrites the page table so the ldl_code won't work. You might like to take a look at how target-arm handles the similar case (wanting to know information about source register numbers etc for some memory faults). We use a TCG instruction parameter to save the information as we translate the code in target-arm/translate-a64.c, then in restore_state_to_opc() (called after the fault) we can fish it out and put it in the CPU state struct the same way we regain the faulting PC. Then in the exception handler it's available for use. Commit aaa1f954d4 is the meat of it. Alternatively if you just want to do "a load at virtual address but don't longjump anywhere" you probably need to: * do the virt-to-phys translation manually (ie by calling some ppc specific function), so you can capture success/failure of the MMU access checks (and without setting guest visible MMU fault status info) * use address_space_ldl() on the result, so you can pass in a MemTxResult* to capture success/failure of the phys access That's pretty clunky. I meant at some point to look at whether we could provide versions of the cpu_ldl* type functions that allowed a MemTxResult*, but never got to it. > I haven't completely figured out what code path instruction translation > faults take, I assume we longjmp out of the translation loop the same > was as we do out of the execution loop ? I suggest you step through it in a debugger if you fancy a laugh. As I recall it works more by luck than judgement and includes rather more repeated work than is actually necessary. (That is, we do longjump out, but not necessarily to the most sensible point.) thanks -- PMM ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 22:19 ` Peter Maydell @ 2016-07-25 22:42 ` Benjamin Herrenschmidt 2016-07-25 23:22 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 22:42 UTC (permalink / raw) To: Peter Maydell Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 23:19 +0100, Peter Maydell wrote: > > For example we have one in powerpc_excp() to read the faulting > > instruction, though that *should* never fail it's till not great. > > Mmm. Strictly speaking you can't guarantee that that load will > work, because the code you're executing could be still cached > in the QEMU TLB while the (perverse) guest rewrites the page table > so the ldl_code won't work. Right, it worries me, it's unlikely but possible. > You might like to take a look at how target-arm handles the > similar case (wanting to know information about source register > numbers etc for some memory faults). We use a TCG instruction > parameter to save the information as we translate the code > in target-arm/translate-a64.c, then in restore_state_to_opc() > (called after the fault) we can fish it out and put it in the > CPU state struct the same way we regain the faulting PC. Then > in the exception handler it's available for use. Commit aaa1f954d4 > is the meat of it. We do something a bit different on ppc where we store the access type before every access, however the DSISR case is special in that on older CPUs, it's expected to contains a whole subset of the opcode which is quite a bit more info than what you want here... I'm thinking maybe we should use a form of load that returns an error instead of longjmp'ing, and if we do error out, flush the tb for that instruction and replay which should cause the translate path to reload the TLB for it but it's still fishy. > Alternatively if you just want to do "a load at virtual address > but don't longjump anywhere" you probably need to: > * do the virt-to-phys translation manually (ie by calling > some ppc specific function), so you can capture success/failure > of the MMU access checks (and without setting guest visible > MMU fault status info) > * use address_space_ldl() on the result, so you can pass in > a MemTxResult* to capture success/failure of the phys access Yup. > That's pretty clunky. I meant at some point to look at > whether we could provide versions of the cpu_ldl* type > functions that allowed a MemTxResult*, but never got to it. > > > I haven't completely figured out what code path instruction > translation faults take, I assume we longjmp out of the translation > loop the same > > was as we do out of the execution loop ? > > I suggest you step through it in a debugger if you fancy a laugh. > As I recall it works more by luck than judgement and includes > rather more repeated work than is actually necessary. (That > is, we do longjump out, but not necessarily to the most > sensible point.) Haha, I was starting to guess that :-) But that isn't a battle I want to fight today, enough on my plate already ! Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 22:42 ` Benjamin Herrenschmidt @ 2016-07-25 23:22 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 23:22 UTC (permalink / raw) To: Peter Maydell Cc: Richard Henderson, QEMU Developers, Paolo Bonzini, Christian Borntraeger On Tue, 2016-07-26 at 08:42 +1000, Benjamin Herrenschmidt wrote: > We do something a bit different on ppc where we store the access type > before every access, however the DSISR case is special in that on > older > CPUs, it's expected to contains a whole subset of the opcode which is > quite a bit more info than what you want here... > > I'm thinking maybe we should use a form of load that returns an error > instead of longjmp'ing, and if we do error out, flush the tb for that > instruction and replay which should cause the translate path to > reload > the TLB for it but it's still fishy. I have a better idea ! This is only a problem for alignment interrupts, and those are very rare, we only generate them in some cases of broken forms like trying to do a ll/sc on an unaligned address. So I'm thinking I'm just going to pass the opcode to the helper in the error_code field. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt 2016-07-24 12:51 ` Benjamin Herrenschmidt @ 2016-07-25 0:34 ` Richard Henderson 2016-07-25 5:15 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 15+ messages in thread From: Richard Henderson @ 2016-07-25 0:34 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On 07/24/2016 06:12 PM, Benjamin Herrenschmidt wrote: > Hi ! > > I need help from TCG experts here. I was chasing down a bug causing > some stuff to crash when using vector ops with a ppc32 guest on x86, > but pulling that string led to a whole mess that *may* be affecting a > pile of architetures unless I'm misunderstanding something... > > So basically what happens is that some instruction emulation helpers, > like in my case stvebx (target-ppc/mem_helper.c) are doing calls to > cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the > argument. > > That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr. > > However, if that faults, when tlb_fill() gets eventually called, what I > observe is not 0 in "retaddr" but ... -2. > > The reason, as far as I understand, is that cpu_stb_data_ra() calls > helper_ret_stb_mmu() which does: > > retaddr -= GETPC_ADJ; > > (which is -2) Ouch, yes. I noticed a related problem recently, while working on the cmpxchg patch set. In my opinion, we should (1) merge GETRA and GETPC so there's no confusion between the two, (2) push all adjustment down to the final moment before use, perhaps in cpu_restore_state. Thus a null value would be properly retained until checked, and one can easily call the memory helper functions without confusion. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 0:34 ` Richard Henderson @ 2016-07-25 5:15 ` Benjamin Herrenschmidt 2016-07-25 14:12 ` Richard Henderson 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 5:15 UTC (permalink / raw) To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 06:04 +0530, Richard Henderson wrote: > I noticed a related problem recently, while working on the cmpxchg patch set. > > In my opinion, we should (1) merge GETRA and GETPC so there's no confusion > between the two, (2) push all adjustment down to the final moment before use, > perhaps in cpu_restore_state. > > Thus a null value would be properly retained until checked, and one can easily > call the memory helper functions without confusion. Ok, after a bit more scrubbing of the code I think I understand what you mean. Now assuming we fix that, there is still a problem if the target code, such as the PPC code, calls a helper that might cause a fault without first updating the PC in the env, right ? IE. On powerpc for example, that means that any instruction using a helper that might potentially do loads or stores needs to first call gen_update_nip(). (The same way we seem to do before generating other exceptions such as traps, well provided we do it correctly everywhere, I need to double check). Either that, or change the helpers to capture the PC using GETPC/GETRA from the first level of helper function (so as to ensure the return address is correct). Am I right ? IE. Even if we fix the 0 vs. -2 problem, I still need this patch: --- a/target-ppc/translate.c +++ b/target-ppc/translate.c @@ -6916,6 +6916,7 @@ static void gen_lve##name(DisasContext *ctx) \ if (size > 1) { \ tcg_gen_andi_tl(EA, EA, ~(size - 1)); \ } \ + gen_update_nip(ctx, ctx->nip - 4); \ rs = gen_avr_ptr(rS(ctx->opcode)); \ gen_helper_lve##name(cpu_env, rs, EA); \ tcg_temp_free(EA); \ @@ -6937,6 +6938,7 @@ static void gen_stve##name(DisasContext *ctx) \ if (size > 1) { \ tcg_gen_andi_tl(EA, EA, ~(size - 1)); \ } \ + gen_update_nip(ctx, ctx->nip - 4); \ rs = gen_avr_ptr(rS(ctx->opcode)); \ gen_helper_stve##name(cpu_env, rs, EA); \ tcg_temp_free(EA); \ (And possibly others I haven't yet audited) Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 5:15 ` Benjamin Herrenschmidt @ 2016-07-25 14:12 ` Richard Henderson 2016-07-25 21:46 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Richard Henderson @ 2016-07-25 14:12 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On 07/25/2016 10:45 AM, Benjamin Herrenschmidt wrote: > On Mon, 2016-07-25 at 06:04 +0530, Richard Henderson wrote: >> I noticed a related problem recently, while working on the cmpxchg patch set. >> >> In my opinion, we should (1) merge GETRA and GETPC so there's no confusion >> between the two, (2) push all adjustment down to the final moment before use, >> perhaps in cpu_restore_state. >> >> Thus a null value would be properly retained until checked, and one can easily >> call the memory helper functions without confusion. > > Ok, after a bit more scrubbing of the code I think I understand what you > mean. > > Now assuming we fix that, there is still a problem if the target code, such > as the PPC code, calls a helper that might cause a fault without first > updating the PC in the env, right ? IE. On powerpc for example, that means > that any instruction using a helper that might potentially do loads or stores > needs to first call gen_update_nip(). Well, that's the bug with the current code, yes. But what we gain by transforming this code to use (via indirection) cpu_loop_exit_restore, is the ability to reconstruct values, like NIP, without first having saved them. Any data that you give to tcg_gen_insn_start will be given to restore_state_to_opc. PPC currently does tcg_gen_insn_start(ctx.nip); and env->nip = data[0]; so you're good there. For some targets, we also restore part of the flags computation with this mechanism. With more trickery, ARM is (intending to?) compute exception syndrome information with this. As I understand it, this is very much akin to the PPC gen_set_access_type, so perhaps in future there's some savings to be had there. > Either that, or change the helpers to capture the PC using GETPC/GETRA from > the first level of helper function (so as to ensure the return address is > correct). > > Am I right ? You are right. > IE. Even if we fix the 0 vs. -2 problem, I still need this patch: > > --- a/target-ppc/translate.c > +++ b/target-ppc/translate.c > @@ -6916,6 +6916,7 @@ static void gen_lve##name(DisasContext *ctx) \ > if (size > 1) { \ > tcg_gen_andi_tl(EA, EA, ~(size - 1)); \ > } \ > + gen_update_nip(ctx, ctx->nip - 4); \ > rs = gen_avr_ptr(rS(ctx->opcode)); \ > gen_helper_lve##name(cpu_env, rs, EA); \ > tcg_temp_free(EA); \ > @@ -6937,6 +6938,7 @@ static void gen_stve##name(DisasContext *ctx) \ > if (size > 1) { \ > tcg_gen_andi_tl(EA, EA, ~(size - 1)); \ > } \ > + gen_update_nip(ctx, ctx->nip - 4); \ > rs = gen_avr_ptr(rS(ctx->opcode)); \ > gen_helper_stve##name(cpu_env, rs, EA); \ > tcg_temp_free(EA); \ > > (And possibly others I haven't yet audited) Yes indeed. I'm amused by this, since I read your emails in the wrong order, and so discovered exactly this problem while answering the other email. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? 2016-07-25 14:12 ` Richard Henderson @ 2016-07-25 21:46 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-25 21:46 UTC (permalink / raw) To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Christian Borntraeger On Mon, 2016-07-25 at 19:42 +0530, Richard Henderson wrote: > For some targets, we also restore part of the flags computation with this > mechanism. With more trickery, ARM is (intending to?) compute exception > syndrome information with this. As I understand it, this is very much akin to > the PPC gen_set_access_type, so perhaps in future there's some savings to be > had there Indeed. Another issue we have is generating the opcode bits in DSISR. Now thankfully for most modern CPUs this is no longer done in HW but at the moment, I dont like how we call cpu_ldl_code() in powerpc_excp. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-07-25 23:23 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-07-24 12:42 [Qemu-devel] TCG problem with cpu_{st,ld}x_data ? Benjamin Herrenschmidt 2016-07-24 12:51 ` Benjamin Herrenschmidt 2016-07-24 12:52 ` Benjamin Herrenschmidt 2016-07-25 0:36 ` Richard Henderson 2016-07-25 0:46 ` Benjamin Herrenschmidt 2016-07-25 0:51 ` Benjamin Herrenschmidt 2016-07-25 14:00 ` Richard Henderson 2016-07-25 21:42 ` Benjamin Herrenschmidt 2016-07-25 22:19 ` Peter Maydell 2016-07-25 22:42 ` Benjamin Herrenschmidt 2016-07-25 23:22 ` Benjamin Herrenschmidt 2016-07-25 0:34 ` Richard Henderson 2016-07-25 5:15 ` Benjamin Herrenschmidt 2016-07-25 14:12 ` Richard Henderson 2016-07-25 21:46 ` Benjamin Herrenschmidt
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.