All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
@ 2021-07-25 17:44 Peter Maydell
  2021-07-25 17:44 ` [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low Peter Maydell
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Peter Maydell @ 2021-07-25 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This patchset fixes the intermittent hang seen when running a guest in
icount mode, as reported in
  https://gitlab.com/qemu-project/qemu/-/issues/499 .

The underlying cause of the hang is that code in cpu_loop_exec_tb()
was using CF_COUNT_MASK as the maximum possible number of instructions
it would try to execute from a TB when it set the icount_decr.u16.low
field. This is wrong, because (a) that field can validly be set to any
unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
number of insns in the TB.

Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
"should we ask for a TB with exactly this many insns in it?" condition
so that instead of testing "cpu->icount_extra == 0", which should be
always true if (insns_left > 0 && insns_left < tb->icount), we assert
it instead.  This assertion would have caught the bug fixed in patch
one.

Tested using the same iterating loop test described in the bug report;
without the fix QEMU hangs within a handful of iterations. With the
fix it managed 175 successful iterations before I got bored and hit ^C.

thanks
-- PMM

Peter Maydell (2):
  accel/tcg: Don't use CF_COUNT_MASK as the max value of
    icount_decr.u16.low
  accel/tcg: Remove unnecessary check on icount_extra in
    cpu_loop_exec_tb()

 accel/tcg/cpu-exec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low
  2021-07-25 17:44 [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Peter Maydell
@ 2021-07-25 17:44 ` Peter Maydell
  2021-07-26  9:17   ` Philippe Mathieu-Daudé
  2021-07-25 17:44 ` [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb() Peter Maydell
  2021-07-25 18:11 ` [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Richard Henderson
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2021-07-25 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

In cpu_loop_exec_tb() we were bounding the number of insns we might
try to execute in a TB using CF_COUNT_MASK.  This is incorrect,
because we can validly put up to 0xffff into icount_decr.u16.low.  In
particular, since commit 78ff82bb1b67c0d7 reduced CF_COUNT_MASK to
511 this meant that we would incorrectly only try to execute 511
instructions in a 512-instruction TB, which could result in QEMU
hanging when in icount mode.

Use the actual maximum value, which is 0xffff. (This brings this code
in to line with the similar logic in icount_prepare_for_run() in
tcg-accel-ops-icount.c.)

Fixes: 78ff82bb1b67c0d7
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/499
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/tcg/cpu-exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index fc895cf51e4..6e8dc291197 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -834,7 +834,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
     /* Ensure global icount has gone forward */
     icount_update(cpu);
     /* Refill decrementer and continue execution.  */
-    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
+    insns_left = MIN(0xffff, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
     cpu->icount_extra = cpu->icount_budget - insns_left;
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb()
  2021-07-25 17:44 [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Peter Maydell
  2021-07-25 17:44 ` [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low Peter Maydell
@ 2021-07-25 17:44 ` Peter Maydell
  2021-07-25 17:45   ` Peter Maydell
  2021-07-25 18:11 ` [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Richard Henderson
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2021-07-25 17:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

In cpu_loop_exec_tb(), we decide whether to look for a TB with
exactly insns_left instructions in it using the condition
 (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)

The check for icount_extra == 0 is unnecessary, because we just set
  insns_left = MIN(0xffff, cpu->icount_budget);
  icount_extra = icount_budget - insns_left;
and so icount_extra can only be non-zero if icount_budget > 0xffff
and insns_left == 0xffff. But in that case insns_left >= tb->icount
because 0xffff is much larger than TCG_MAX_INSNS, so the condition
will be false anyway.

Remove the unnecessary check, and instead assert:
 * that we are only going to execute a partial TB here if the
   icount budget has run out (ie icount_extra == 0)
 * that the number of insns we're going to execute does fit into
   the CF_COUNT_MASK

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
You could argue that we don't need the asserts, if you like.
The first one would have caught the bug fixed in the previous
commit, though.
---
 accel/tcg/cpu-exec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 6e8dc291197..5aa42fbff35 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -843,7 +843,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
      * execute we need to ensure we find/generate a TB with exactly
      * insns_left instructions in it.
      */
-    if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
+    if (insns_left > 0 && insns_left < tb->icount)  {
+        assert(insns_left <= CF_COUNT_MASK);
+        assert(cpu->icount_extra == 0);
         cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
     }
 #endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb()
  2021-07-25 17:44 ` [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb() Peter Maydell
@ 2021-07-25 17:45   ` Peter Maydell
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2021-07-25 17:45 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Richard Henderson

On Sun, 25 Jul 2021 at 18:44, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> In cpu_loop_exec_tb(), we decide whether to look for a TB with
> exactly insns_left instructions in it using the condition
>  (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)
>
> The check for icount_extra == 0 is unnecessary, because we just set
>   insns_left = MIN(0xffff, cpu->icount_budget);
>   icount_extra = icount_budget - insns_left;
> and so icount_extra can only be non-zero if icount_budget > 0xffff
> and insns_left == 0xffff. But in that case insns_left >= tb->icount
> because 0xffff is much larger than TCG_MAX_INSNS, so the condition
> will be false anyway.
>
> Remove the unnecessary check, and instead assert:
>  * that we are only going to execute a partial TB here if the
>    icount budget has run out (ie icount_extra == 0)
>  * that the number of insns we're going to execute does fit into
>    the CF_COUNT_MASK
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> You could argue that we don't need the asserts, if you like.
> The first one would have caught the bug fixed in the previous
> commit, though.

"first" in the bulleted list, "second" in the order I put them in the code...

> ---
>  accel/tcg/cpu-exec.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 6e8dc291197..5aa42fbff35 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -843,7 +843,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>       * execute we need to ensure we find/generate a TB with exactly
>       * insns_left instructions in it.
>       */
> -    if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
> +    if (insns_left > 0 && insns_left < tb->icount)  {
> +        assert(insns_left <= CF_COUNT_MASK);
> +        assert(cpu->icount_extra == 0);
>          cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
>      }
>  #endif

-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
  2021-07-25 17:44 [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Peter Maydell
  2021-07-25 17:44 ` [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low Peter Maydell
  2021-07-25 17:44 ` [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb() Peter Maydell
@ 2021-07-25 18:11 ` Richard Henderson
  2021-07-26 16:54   ` Richard Henderson
  2 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2021-07-25 18:11 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel

On 7/25/21 7:44 AM, Peter Maydell wrote:
> This patchset fixes the intermittent hang seen when running a guest in
> icount mode, as reported in
>    https://gitlab.com/qemu-project/qemu/-/issues/499 .
> 
> The underlying cause of the hang is that code in cpu_loop_exec_tb()
> was using CF_COUNT_MASK as the maximum possible number of instructions
> it would try to execute from a TB when it set the icount_decr.u16.low
> field. This is wrong, because (a) that field can validly be set to any
> unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
> reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
> number of insns in the TB.
> 
> Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
> value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
> "should we ask for a TB with exactly this many insns in it?" condition
> so that instead of testing "cpu->icount_extra == 0", which should be
> always true if (insns_left > 0 && insns_left < tb->icount), we assert
> it instead.  This assertion would have caught the bug fixed in patch
> one.
> 
> Tested using the same iterating loop test described in the bug report;
> without the fix QEMU hangs within a handful of iterations. With the
> fix it managed 175 successful iterations before I got bored and hit ^C.
> 
> thanks
> -- PMM
> 
> Peter Maydell (2):
>    accel/tcg: Don't use CF_COUNT_MASK as the max value of
>      icount_decr.u16.low
>    accel/tcg: Remove unnecessary check on icount_extra in
>      cpu_loop_exec_tb()

Nice one.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low
  2021-07-25 17:44 ` [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low Peter Maydell
@ 2021-07-26  9:17   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-07-26  9:17 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel; +Cc: Richard Henderson

On 7/25/21 7:44 PM, Peter Maydell wrote:
> In cpu_loop_exec_tb() we were bounding the number of insns we might
> try to execute in a TB using CF_COUNT_MASK.  This is incorrect,
> because we can validly put up to 0xffff into icount_decr.u16.low.  In
> particular, since commit 78ff82bb1b67c0d7 reduced CF_COUNT_MASK to
> 511 this meant that we would incorrectly only try to execute 511
> instructions in a 512-instruction TB, which could result in QEMU
> hanging when in icount mode.

Nice catch...

> Use the actual maximum value, which is 0xffff. (This brings this code
> in to line with the similar logic in icount_prepare_for_run() in
> tcg-accel-ops-icount.c.)
> 
> Fixes: 78ff82bb1b67c0d7
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/499
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  accel/tcg/cpu-exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index fc895cf51e4..6e8dc291197 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -834,7 +834,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>      /* Ensure global icount has gone forward */
>      icount_update(cpu);
>      /* Refill decrementer and continue execution.  */
> -    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
> +    insns_left = MIN(0xffff, cpu->icount_budget);

Or UINT16_MAX.

Regardless:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

>      cpu_neg(cpu)->icount_decr.u16.low = insns_left;
>      cpu->icount_extra = cpu->icount_budget - insns_left;
>  
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
  2021-07-25 18:11 ` [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Richard Henderson
@ 2021-07-26 16:54   ` Richard Henderson
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2021-07-26 16:54 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel

On 7/25/21 8:11 AM, Richard Henderson wrote:
> On 7/25/21 7:44 AM, Peter Maydell wrote:
>> This patchset fixes the intermittent hang seen when running a guest in
>> icount mode, as reported in
>>    https://gitlab.com/qemu-project/qemu/-/issues/499 .
>>
>> The underlying cause of the hang is that code in cpu_loop_exec_tb()
>> was using CF_COUNT_MASK as the maximum possible number of instructions
>> it would try to execute from a TB when it set the icount_decr.u16.low
>> field. This is wrong, because (a) that field can validly be set to any
>> unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
>> reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
>> number of insns in the TB.
>>
>> Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
>> value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
>> "should we ask for a TB with exactly this many insns in it?" condition
>> so that instead of testing "cpu->icount_extra == 0", which should be
>> always true if (insns_left > 0 && insns_left < tb->icount), we assert
>> it instead.  This assertion would have caught the bug fixed in patch
>> one.
>>
>> Tested using the same iterating loop test described in the bug report;
>> without the fix QEMU hangs within a handful of iterations. With the
>> fix it managed 175 successful iterations before I got bored and hit ^C.
>>
>> thanks
>> -- PMM
>>
>> Peter Maydell (2):
>>    accel/tcg: Don't use CF_COUNT_MASK as the max value of
>>      icount_decr.u16.low
>>    accel/tcg: Remove unnecessary check on icount_extra in
>>      cpu_loop_exec_tb()
> 
> Nice one.
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Queued for 6.1.

r~



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-07-26 16:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-25 17:44 [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Peter Maydell
2021-07-25 17:44 ` [PATCH for-6.1 1/2] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low Peter Maydell
2021-07-26  9:17   ` Philippe Mathieu-Daudé
2021-07-25 17:44 ` [PATCH for-6.1 2/2] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb() Peter Maydell
2021-07-25 17:45   ` Peter Maydell
2021-07-25 18:11 ` [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode Richard Henderson
2021-07-26 16:54   ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.