* [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
@ 2018-03-19 3:15 Richard Henderson
2018-03-19 6:30 ` Pavel Dovgalyuk
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Richard Henderson @ 2018-03-19 3:15 UTC
To: qemu-devel; +Cc: Pavel.Dovgaluk, peter.maydell, pbonzini
We have confused the number of instructions that have been
executed in the TB with the number of instructions needed
to repeat the I/O instruction.

We have used cpu_restore_state_from_tb, which means that
the guest pc is pointing to the I/O instruction. The only
time the answer to the latter question is not 1 is when
MIPS or SH4 need to re-execute the branch for the delay
slot as well.

We must rely on cpu->cflags_next_tb to generate the next TB,
as otherwise we have a race condition with other guest cpus
within the TB cache.

Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.

r~
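
For reference, the counting rule from the commit message can be modelled
outside QEMU roughly as below. This is only a sketch under stated
assumptions: ModelCpu, ModelTb, io_recompile_count, next_tb_cflags and the
MODEL_CF_* values are hypothetical stand-ins chosen for illustration, not
QEMU's real types, helpers or constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not QEMU's real definitions. */
#define MODEL_CF_COUNT_MASK 0x7fffu  /* assumed: low bits hold the insn count */
#define MODEL_CF_LAST_IO    0x8000u  /* assumed: "last insn may do I/O" flag */

typedef struct {
    uint64_t pc;          /* guest pc after state restore: the I/O insn */
    bool in_delay_slot;   /* models MIPS_HFLAG_BMASK / DELAY_SLOT being set */
    uint16_t icount_low;  /* models cpu->icount_decr.u16.low */
} ModelCpu;

typedef struct {
    uint64_t pc;          /* guest pc of the first insn in the TB */
} ModelTb;

/*
 * Return the number of insns needed to repeat the I/O insn: 1, or 2 when
 * the I/O insn sits in a delay slot and is not the first insn of the TB.
 * In the latter case the pc is rewound over the branch (branch_len bytes),
 * the delay-slot state is cleared and one more insn is budgeted.
 */
static uint32_t io_recompile_count(ModelCpu *cpu, const ModelTb *tb,
                                   uint32_t branch_len)
{
    uint32_t n = 1;

    assert(branch_len == 2 || branch_len == 4);
    if (cpu->in_delay_slot && cpu->pc != tb->pc) {
        cpu->pc -= branch_len;
        cpu->icount_low++;
        cpu->in_delay_slot = false;
        n = 2;
    }
    return n;
}

/* Pack the count into the flags requested for the next TB. */
static uint32_t next_tb_cflags(uint32_t base_cflags, uint32_t n)
{
    assert(n <= MODEL_CF_COUNT_MASK);
    return base_cflags | MODEL_CF_LAST_IO | n;
}
```

The point of the model is that n depends only on the restored pc and the
delay-slot state, not on how many instructions of the old TB had already
executed.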
---
accel/tcg/translate-all.c | 37 ++++++++++---------------------------
1 file changed, 10 insertions(+), 27 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5ad1b919bc..d4190602d1 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1728,8 +1728,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     CPUArchState *env = cpu->env_ptr;
 #endif
     TranslationBlock *tb;
-    uint32_t n, flags;
-    target_ulong pc, cs_base;
+    uint32_t n;
 
     tb_lock();
     tb = tb_find_pc(retaddr);
@@ -1737,44 +1736,33 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
                   (void *)retaddr);
     }
-    n = cpu->icount_decr.u16.low + tb->icount;
     cpu_restore_state_from_tb(cpu, tb, retaddr);
-    /* Calculate how many instructions had been executed before the fault
-       occurred. */
-    n = n - cpu->icount_decr.u16.low;
-    /* Generate a new TB ending on the I/O insn. */
-    n++;
+
     /* On MIPS and SH, delay slot instructions can only be restarted if
        they were already the first instruction in the TB. If this is not
        the first instruction in a TB then re-execute the preceding
        branch. */
+    n = 1;
 #if defined(TARGET_MIPS)
-    if ((env->hflags & MIPS_HFLAG_BMASK) != 0 && n > 1) {
+    if ((env->hflags & MIPS_HFLAG_BMASK) != 0
+        && env->active_tc.PC != tb->pc) {
         env->active_tc.PC -= (env->hflags & MIPS_HFLAG_B16 ? 2 : 4);
         cpu->icount_decr.u16.low++;
         env->hflags &= ~MIPS_HFLAG_BMASK;
+        n = 2;
     }
 #elif defined(TARGET_SH4)
     if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
-            && n > 1) {
+        && env->pc != tb->pc) {
         env->pc -= 2;
         cpu->icount_decr.u16.low++;
         env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+        n = 2;
     }
 #endif
-    /* This should never happen. */
-    if (n > CF_COUNT_MASK) {
-        cpu_abort(cpu, "TB too big during recompile");
-    }
 
-    pc = tb->pc;
-    cs_base = tb->cs_base;
-    flags = tb->flags;
-    tb_phys_invalidate(tb, -1);
-
-    /* Execute one IO instruction without caching
-       instead of creating large TB. */
-    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;
+    /* Generate a new TB executing the I/O insn. */
+    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
 
     if (tb->cflags & CF_NOCACHE) {
         if (tb->orig_tb) {
@@ -1785,11 +1773,6 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         tb_remove(tb);
     }
 
-    /* Generate new TB instead of the current one. */
-    /* FIXME: In theory this could raise an exception. In practice
-       we have already translated the block once so it's probably ok. */
-    tb_gen_code(cpu, pc, cs_base, flags, curr_cflags() | CF_LAST_IO | n);
-
     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
      * the first in the TB) then we end up generating a whole new TB and
      * repeating the fault, which is horribly inefficient.
--
2.14.3
* Re: [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
From: Pavel Dovgalyuk @ 2018-03-19 6:30 UTC
To: 'Richard Henderson', qemu-devel
Cc: Pavel.Dovgaluk, peter.maydell, pbonzini
> From: Richard Henderson [mailto:richard.henderson@linaro.org]
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction. The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.
>
Works for Ciro's ARM sample and doesn't break icount and replay for i386.
Tested-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Pavel Dovgalyuk
* Re: [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
From: Paolo Bonzini @ 2018-03-19 15:54 UTC
To: Richard Henderson, qemu-devel; +Cc: Pavel.Dovgaluk, peter.maydell
On 19/03/2018 04:15, Richard Henderson wrote:
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction. The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.
Thanks, let me know if you prefer to send a pull request yourself, or if
I should include it in the next.
Thanks,
Paolo
* Re: [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
From: Richard Henderson @ 2018-03-20 0:39 UTC
To: Paolo Bonzini, qemu-devel; +Cc: Pavel.Dovgaluk, peter.maydell
On 03/19/2018 11:54 PM, Paolo Bonzini wrote:
> On 19/03/2018 04:15, Richard Henderson wrote:
>> We have confused the number of instructions that have been
>> executed in the TB with the number of instructions needed
>> to repeat the I/O instruction.
>>
>> We have used cpu_restore_state_from_tb, which means that
>> the guest pc is pointing to the I/O instruction. The only
>> time the answer to the latter question is not 1 is when
>> MIPS or SH4 need to re-execute the branch for the delay
>> slot as well.
>>
>> We must rely on cpu->cflags_next_tb to generate the next TB,
>> as otherwise we have a race condition with other guest cpus
>> within the TB cache.
>>
>> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>
>> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.
>
> Thanks, let me know if you prefer to send a pull request yourself, or if
> I should include it in the next.
I'm at Linaro Connect this week. Please include this in your next.
r~
* Re: [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
From: Philippe Mathieu-Daudé @ 2018-03-20 0:52 UTC
To: Richard Henderson, qemu-devel; +Cc: peter.maydell, Pavel.Dovgaluk, pbonzini
On 03/19/2018 04:15 AM, Richard Henderson wrote:
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction. The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>